Sunday, February 01, 2009

URLs restricted by robots.txt problem

URLs restricted by robots.txt problem: Two URLs were impacted

My Casualzone blog, which is hosted on Blogspot, suddenly ran into a problem: Google Webmaster Tools reported "URLs restricted by robots.txt". It really frustrated me because I had not made any major changes. The report showed that two URLs were affected and that the problem was detected on 28 Jan 2009, only a few days ago.

Sign in to Webmaster Tools

Note: to sign in to Webmaster Tools, click "About Google" on the Google search page, then click "Webmaster Centre", then click "Sign in to Webmaster Tools".

Analyse robots.txt

According to Google's help pages, this problem is usually related to the robots.txt file, which lets a site owner restrict which pages crawlers may fetch. That seemed strange to me because I had never edited this file. Then I noticed that there was an "Analyse robots.txt" option in the Tools section of Webmaster Tools, as shown in the picture above.

robots.txt file

The image above shows the original robots.txt file. Frankly, I did not understand how it worked. One thing I noticed was that it had a Disallow line for /search. I was not sure how this line had been added, but I started to suspect it was the culprit. I then checked my problematic URLs to see whether they had anything to do with /search.

Looking at my blog, I noticed that every Blogspot label (tag) link has the form http://your_blog_name.blogspot.com/search/label/xxxx. For instance, one of my label links is http://casualzone.blogspot.com/search/label/Semenyih
That meant that not only the two reported URLs were impacted: every label URL was affected. Wow, it was a big problem!
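To make the pattern concrete, here is a minimal Python sketch that builds a label URL in the form described above and checks whether its path falls under /search. The helper names label_url and is_under_search are made up for illustration, and the post URL in the last line is a hypothetical example, not a real page on my blog.

```python
from urllib.parse import quote, urlparse


def label_url(blog_name: str, label: str) -> str:
    """Build a Blogspot label (tag) URL following the pattern in the post."""
    return f"http://{blog_name}.blogspot.com/search/label/{quote(label)}"


def is_under_search(url: str) -> bool:
    """True if the URL path starts with /search, the prefix named by the Disallow line."""
    return urlparse(url).path.startswith("/search")


url = label_url("casualzone", "Semenyih")
print(url)
print(is_under_search(url))
# Hypothetical ordinary post URL, for contrast: its path is not under /search.
print(is_under_search("http://casualzone.blogspot.com/2009/02/some-post.html"))
```

Since every label link is built the same way, a single Disallow: /search line covers all of them at once.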

Test impacted URL: blocked by disallow: /search

I tested the URLs one by one to confirm my suspicion. I entered each impacted URL into the "Test URL" box shown in the picture above and ran the test. The result showed that the URL was blocked by line 5: Disallow: /search. This confirmed my suspicion. The next question was how to fix it.
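The "blocked by line 5" result can be reproduced offline with a rough Python sketch. The original file only appeared in a screenshot, so ORIGINAL_RULES below is my assumed reconstruction (the same groups as the file listed in this post, with /search on line 5). The function find_rule is a made-up helper and a deliberate simplification: it does plain prefix matching and returns the first matching rule, which is not necessarily how Google's own tester works internally.

```python
# Assumed reconstruction of the original rules (the real file was only shown as an image).
ORIGINAL_RULES = """\
User-agent: Mediapartners-Google
Disallow:

User-agent: *
Disallow: /search
"""


def find_rule(robots_txt, path, agent="*"):
    """Return (line_number, line, blocked) for the first Allow/Disallow rule in the
    group for `agent` whose prefix matches `path`, or None if no rule matches."""
    agents = []        # user-agents named by the group currently being read
    seen_rule = False  # True once the current group has at least one rule
    for number, raw in enumerate(robots_txt.splitlines(), start=1):
        line = raw.split("#", 1)[0].strip()  # drop comments and whitespace
        if ":" not in line:
            continue
        field, value = (part.strip() for part in line.split(":", 1))
        field = field.lower()
        if field == "user-agent":
            if seen_rule:  # a User-agent line after rules starts a new group
                agents, seen_rule = [], False
            agents.append(value.lower())
        elif field in ("allow", "disallow"):
            seen_rule = True
            if agent.lower() in agents and path.startswith(value):
                # An empty Disallow value restricts nothing, so it does not block.
                blocked = field == "disallow" and value != ""
                return number, line, blocked
    return None


result = find_rule(ORIGINAL_RULES, "/search/label/Semenyih")
print(result)
```

Under these assumptions the label path matches the rule on line 5 of the reconstructed file, the same line the Webmaster Tools test pointed at.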

Removed /search from the Disallow section
------------------------------------------------------------------------------------------------------
User-agent: Mediapartners-Google
Disallow:

User-agent: *
Disallow:
Noindex: /feedReaderJson

Sitemap: http://casualzone.blogspot.com/feeds/posts/default?orderby=updated
------------------------------------------------------------------------------------------------------

Basically, I removed /search from the Disallow line. Then I reran the URL test. It reported no problem: "Allowed by line 5", as shown in the figure below.


The test was successful, without errors
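The same result can be checked outside Webmaster Tools with Python's standard urllib.robotparser module. This is only a sketch: EDITED_RULES is the edited file from this post, "Googlebot" is the user-agent name I assume for the crawler, and the standard-library parser simply ignores lines it does not recognise, such as Noindex.

```python
from urllib.robotparser import RobotFileParser

# The edited rules from this post, with /search removed from the Disallow line.
EDITED_RULES = """\
User-agent: Mediapartners-Google
Disallow:

User-agent: *
Disallow:
Noindex: /feedReaderJson

Sitemap: http://casualzone.blogspot.com/feeds/posts/default?orderby=updated
"""

parser = RobotFileParser()
parser.parse(EDITED_RULES.splitlines())  # parse from text, no network access needed

test_url = "http://casualzone.blogspot.com/search/label/Semenyih"
allowed = parser.can_fetch("Googlebot", test_url)  # "Googlebot" is an assumed agent name
print(allowed)
```

An empty Disallow line restricts nothing, so the label URL is allowed once /search is gone.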

What was the next step? Save the new robots.txt file? The bad news is that Blogspot does not allow users to change robots.txt. That means those label (tag) links will not be indexed. That is fine, because according to Google's reply it keeps my pages from being indexed as duplicate content. If you really want a label page to be indexed, for example my hiking label, which collects all my posts about hiking, mountain and hill adventures, you have to create a page yourself and manually add links to those posts.

You can only change robots.txt if you host the site yourself, or on a server that allows you to upload a file (robots.txt) to the root of your domain. In that case, you can remove the Disallow statement from robots.txt and upload the file to the root of your domain. You can also check your current robots.txt by visiting http://your_domain.com/robots.txt or http://your_domain.blogspot.com/robots.txt
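Note that the file is named robots.txt (with an "s") and always sits at the root of the host, whatever page you start from. A small Python sketch of how to derive that address and load the live file follows. The helper names robots_url and load_live_rules are made up for illustration, and load_live_rules needs network access, so it is defined here but not called.

```python
from urllib.parse import urljoin
from urllib.robotparser import RobotFileParser


def robots_url(site_url: str) -> str:
    """Return the robots.txt address at the root of the host of `site_url`."""
    return urljoin(site_url, "/robots.txt")


def load_live_rules(site_url: str) -> RobotFileParser:
    """Download and parse the live robots.txt for a site (requires network access)."""
    parser = RobotFileParser(robots_url(site_url))
    parser.read()
    return parser


print(robots_url("http://casualzone.blogspot.com/search/label/Semenyih"))
```

With the parser returned by load_live_rules, calling can_fetch on a label URL shows whether the rules currently being served still block it.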

In short, my effort was in vain. Sigh...

7 comments:

Surveys Pay said...

I have the same URL restrictions problem, but Google does not want duplicate content in their index, so I can understand why they use "Disallow: /search". Is there any way to remove this error entirely from Google Webmaster Tools, CasualZone?

I like your blog very much.

Feel free to leave a comment on my paid surveys blog

Anonymous said...

I am also facing this nonsense problem!!
But what is the solution?

Kok-Siang said...

As for blogger.com, you have no way to fix it. However, it does not hurt your blog on blogger.com; Google simply does not want to index duplicate links (label/tag links).

David Bergeron said...

One more reason to go to wordpress:( I really thought I could make blogger work but I'm not so sure now, just so many problems.

Kok-Siang said...

I am not sure what problem you faced. As for the "Disallow: /search" problem that I faced previously, it went away after a couple of weeks. So do not waste too much time on this problem.

David Bergeron said...

Hi Kok-Siang,

There are a lot of small problems with blogger but most are fixable.

In google tools it shows:

Total URLs: 401
Indexed URLs: 283

So all the search URLs are not indexed but maybe not a problem because like you said then it would be duplicate content.

What do you mean the problem is fixed after a few weeks? Are your /search indexed now?

Kok-Siang said...

I don't have any errors in Webmaster Tools now (in the Overview tab). As for indexing, all my URL pages are in low index, which is highly associated with blog traffic.