My Casualzone blog, which is hosted on Blogspot, suddenly ran into a problem: "URLs restricted by robots.txt", which I spotted when checking Google Webmaster Tools. It really frustrated me because I had not made any major changes. The report showed two URLs with this problem, detected on 28 Jan 2009, not long before today.
Note: to sign up for Webmaster Tools, click "About Google" on the Google search page, then "Webmaster Centre", then "Sign in to Webmaster Tools".
After checking Google Help, it appeared the issue was mostly related to the robots.txt file, which lets you add restrictions on which pages crawlers may visit. That looked strange to me because I had never edited this file. Then I realised there was an "Analyse robots.txt" option in the Tools section of Webmaster Tools, as shown in the picture above.
robots.txt file
The image above shows the original robots.txt file. I did not really understand how it worked, but one thing I noticed was that it had a Disallow rule for /search. I was not sure how that rule got added, but I began to suspect it was the culprit. I then checked my problematic URLs to see whether they contained anything related to /search.
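For readers who cannot see the screenshot, the original Blogspot robots.txt presumably looked something like the following. This is a reconstruction based on the tester's "blocked by line 5" result, so the exact contents may differ slightly:
------------------------------------------------------------------------------------------------------
User-agent: Mediapartners-Google
Disallow:

User-agent: *
Disallow: /search

Noindex: /feedReaderJson
Sitemap: http://casualzone.blogspot.com/feeds/posts/default?orderby=updated
------------------------------------------------------------------------------------------------------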
By observation, I noticed that all Blogspot tag (label) links have the form http://your_blog_name.blogspot.com/search/label/xxxx. For instance, one of my label tags: http://casualzone.blogspot.com/search/label/Semenyih
That meant not only the two reported URLs were impacted: every URL with a tag was affected. Wow, it was a big problem!
I tested the URLs one by one to confirm this. I entered an affected URL into the "Test URL" box shown in the picture above and ran the test. The result showed that the URL was blocked by line 5: Disallow: /search. That verified my suspicion. The next step: how to fix it?
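The tester's verdict can be reproduced outside Webmaster Tools. Here is a minimal sketch using Python's standard urllib.robotparser module, with the blocking rule pasted in as a string; the label URL is the example from this post, and the second URL is a made-up ordinary post address added for contrast:

```python
from urllib.robotparser import RobotFileParser

# The original rules, including the catch-all Disallow: /search
rules = """\
User-agent: Mediapartners-Google
Disallow:

User-agent: *
Disallow: /search
""".splitlines()

parser = RobotFileParser()
parser.parse(rules)

label_url = "http://casualzone.blogspot.com/search/label/Semenyih"
post_url = "http://casualzone.blogspot.com/2009/01/some-post.html"  # hypothetical

print(parser.can_fetch("Googlebot", label_url))  # False: blocked by Disallow: /search
print(parser.can_fetch("Googlebot", post_url))   # True: ordinary post URLs are unaffected
```

This matches what the tester reported: every /search/label/... URL is blocked for all crawlers except Mediapartners-Google, while normal post URLs pass.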
------------------------------------------------------------------------------------------------------
User-agent: Mediapartners-Google
Disallow:
User-agent: *
Disallow:
Noindex: /feedReaderJson
Sitemap: http://casualzone.blogspot.com/feeds/posts/default?orderby=updated
------------------------------------------------------------------------------------------------------
Basically, I removed /search from the Disallow section, as shown above. Then I re-ran the URL test. It showed no problem: "Allowed by line 5", as depicted in the figure below.
The test was successful, without error.
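The same check can be repeated against the edited file: with /search removed, the label URL comes back as allowed. Another small sketch with Python's standard urllib.robotparser:

```python
from urllib.robotparser import RobotFileParser

# The edited rules: the Disallow line under "User-agent: *" is now empty
fixed_rules = """\
User-agent: Mediapartners-Google
Disallow:

User-agent: *
Disallow:
""".splitlines()

parser = RobotFileParser()
parser.parse(fixed_rules)

label_url = "http://casualzone.blogspot.com/search/label/Semenyih"
print(parser.can_fetch("Googlebot", label_url))  # True: no longer blocked
```

An empty Disallow line means "nothing is disallowed", so all crawlers may fetch the label pages.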
What was the next action: save the new robots.txt file? The bad news is that Blogspot does not allow users to change robots.txt. Too bad. That means those label/tag links will not be indexed. That is acceptable, because according to Google's reply my URLs will not be duplicated in the index. If you really want a label page to be indexed, for example my hiking label, which collects all my pages about hiking, mountain, and hill adventures, then you have to create a page and manually add links back to those pages.
You can only change robots.txt if you host the site yourself, or on a server that lets you upload a robots.txt file to your main domain. If that is allowed, you can remove the Disallow statement from robots.txt and upload it to your site's root. You can also check your robots.txt by typing http://your_domain.com/robots.txt or http://your_domain.blogspot.com/robots.txt
In short, my effort was in vain. Sigh..
7 comments:
I have the same URL restrictions problem, but Google does not want duplicate content in their index, so I can understand why they "Disallow: /search". Any way to entirely remove this error from Google Webmaster Tools, CasualZonE?
I like your blog very much.
Feel free to leave a comment on my paid surveys blog
I am also facing this nonsense problem!!
But what is the solution?
As for blogger.com, you have no way of fixing it. However, it does not impact your blogs on blogger.com; it is just that Google does not want to duplicate the links (label/tag links).
One more reason to go to wordpress:( I really thought I could make blogger work but I'm not so sure now, just so many problems.
I am not sure what problem you faced. As for the "Disallow: /search" problem that I faced previously, it went away after a couple of weeks. Hence, do not waste too much time on this problem.
Hi Kok-Siang,
There are a lot of small problems with blogger but most are fixable.
In google tools it shows:
Total URLs: 401
Indexed URLs: 283
So all the search URLs are not indexed but maybe not a problem because like you said then it would be duplicate content.
What do you mean the problem is fixed after a few weeks? Are your /search indexed now?
I don't have any errors in Webmaster Tools now (in the Overview tab). As for indexing, all my URL pages have a low index count, which is highly associated with blog traffic.
Post a Comment