
Get the Google out of there!

There are times when you just don’t want Google to crawl certain pages: login pages, development pages that aren’t ready to be seen or used by web surfers, temporary areas, and other content like that. But how can you stop Googlebot from crawling those pages?

Well, you can use pattern matching within robots.txt to block Googlebot: it understands both the ‘*’ wildcard and the ‘$’ end-of-URL anchor. And she goes a-like-a dis:

1) Add a parameter such as ‘googlebot=nocrawl’ to the URLs of the pages you don’t want Googlebot to crawl, or at least not yet, for example: ‘http://www.seoworkers.com/tools/analyzer.html?googlebot=nocrawl’ (see the small helper sketched after step 2).

2) Next, add the following to your robots.txt:

User-agent: Googlebot
Disallow: *googlebot=nocrawl
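If you have a lot of internal links to tag for step 1, a tiny helper can append the marker for you. Here is a minimal sketch in Python; the function name add_nocrawl is just for illustration, and the marker is the ‘googlebot=nocrawl’ example from above (any marker works, as long as your robots.txt rule matches it):

from urllib.parse import urlsplit, urlunsplit

# Append the "nocrawl" marker from step 1 to a URL,
# preserving any query string the URL already has.
def add_nocrawl(url, marker="googlebot=nocrawl"):
    scheme, netloc, path, query, fragment = urlsplit(url)
    query = f"{query}&{marker}" if query else marker
    return urlunsplit((scheme, netloc, path, query, fragment))

print(add_nocrawl("http://www.seoworkers.com/tools/analyzer.html"))
# http://www.seoworkers.com/tools/analyzer.html?googlebot=nocrawl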

It’s really quite simple. Now, Googlebot might still see the links to the “taboo” pages, but it will not crawl them. The link may still show up in the search results if an external site links to that page.

If you want to make sure that the link (snippet) will not show up in Google’s index at all, check this excellent article on how to prevent that: Bot Herding: The Ultimate Tool for PageRank Sculpting.
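In a nutshell, the usual tool for that job is the robots meta tag. Since Googlebot has to crawl the page to see the tag, don’t also block that page in robots.txt:

<meta name="robots" content="noindex">

For non-HTML files, the same directive can be sent as an X-Robots-Tag HTTP response header.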

By the way, if you only want to block a URL when ‘googlebot=nocrawl’ comes at the very end of the URL, you can append the ‘$’ character to signify the end of the URL:

User-agent: Googlebot
Disallow: *googlebot=nocrawl$
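To see the difference the ‘$’ makes, here is a minimal Python sketch of the pattern-matching rules, where ‘*’ matches any sequence of characters and a trailing ‘$’ anchors the pattern to the end of the URL. This is just an illustration of the semantics, not Google’s actual code:

import re

# Translate a robots.txt pattern into a regular expression:
# '*' becomes '.*', and a trailing '$' anchors the match
# to the end of the URL.
def robots_match(pattern, url):
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    regex = ".*".join(re.escape(part) for part in pattern.split("*"))
    if anchored:
        regex += "$"
    return re.match(regex, url) is not None

print(robots_match("*googlebot=nocrawl",  "/page.html?googlebot=nocrawl"))      # True
print(robots_match("*googlebot=nocrawl",  "/page.html?googlebot=nocrawl&p=2"))  # True
print(robots_match("*googlebot=nocrawl$", "/page.html?googlebot=nocrawl"))      # True
print(robots_match("*googlebot=nocrawl$", "/page.html?googlebot=nocrawl&p=2"))  # False, something follows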

Matt Cutts, the head of Google’s Web Spam Team, published a great tutorial on his blog “Gadgets, Google, and SEO”, Googlebot: Keep Out!, describing some of these practices.

As you can see, there are ways to keep Google from crawling, and even indexing, those special sections of your web site.
