Optimizing Your Crawling and Indexing

August 10, 2009 by Pablo Palatnik


Google's Webmaster blog has a post on optimizing your site so that the algorithm can make the best sense of it, and it really comes down to one question: “How easy is it for search engines to crawl your site?” Platforms such as WordPress and other blogging software are built to be SEO friendly, which is why blogs tend to rank pretty well, for the most part.

* Remove user-specific details from URLs.

URL parameters that don’t change the content of the page—like session IDs or sort order—can be removed from the URL and put into a cookie. By putting this information in a cookie and 301 redirecting to a “clean” URL, you retain the information and reduce the number of URLs pointing to that same content.
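
As a rough illustration of the idea, here is a minimal Python sketch that splits a URL into a "clean" version and the user-specific values that would move into a cookie. The parameter names (`sessionid`, `sort`) and the function name `split_user_params` are assumptions made up for this example; your site will have its own names.

```python
from urllib.parse import urlsplit, urlunsplit, parse_qsl, urlencode

# Hypothetical parameter names; replace with the ones your site actually uses.
USER_SPECIFIC_PARAMS = {"sessionid", "sort"}

def split_user_params(url):
    """Return (clean_url, removed): the URL without user-specific
    parameters, plus a dict of the values that were stripped out."""
    parts = urlsplit(url)
    kept, removed = [], {}
    for key, value in parse_qsl(parts.query, keep_blank_values=True):
        if key.lower() in USER_SPECIFIC_PARAMS:
            removed[key] = value          # candidate for a cookie
        else:
            kept.append((key, value))     # changes the content, so it stays
    clean_url = urlunsplit(parts._replace(query=urlencode(kept)))
    return clean_url, removed

clean, cookie_values = split_user_params(
    "http://www.example.com/shoes?color=red&sessionid=abc123&sort=price"
)
```

If `cookie_values` is not empty, the server would store those values in a cookie and answer with a 301 redirect to `clean`. How you set the cookie and send the redirect depends on your platform, so that part is left out here.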

* Rein in infinite spaces.

Do you have a calendar that links to an infinite number of past or future dates (each with their own unique URL)? Do you have paginated data that returns a status code of 200 when you add &page=3563 to the URL, even if there aren’t that many pages of data? If so, you have an infinite crawl space on your website, and crawlers could be wasting their (and your!) bandwidth trying to crawl it all. Consider these tips for reining in infinite spaces.
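
For the pagination case, one way to close off the infinite space is to answer with a 404 whenever the requested page is past the end of the data. The sketch below shows only that decision; the function name `page_status` and the default of 20 items per page are assumptions for illustration, not something from Google's post.

```python
import math

def page_status(page, total_items, per_page=20):
    """Return the HTTP status code a paginated listing should send
    for the requested page number."""
    # An empty listing still has one (empty) first page.
    last_page = max(1, math.ceil(total_items / per_page))
    if 1 <= page <= last_page:
        return 200
    return 404  # past the end of the data: tell crawlers there is nothing here

status = page_status(3563, total_items=95)
```

With 95 items at 20 per page there are only five pages, so a request for page 3563 gets a 404 instead of an empty page with a 200.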

* Disallow actions Googlebot can’t perform.

Using your robots.txt file, you can disallow crawling of login pages, contact forms, shopping carts, and other pages whose sole functionality is something that a crawler can’t perform. (Crawlers are notoriously cheap and shy, so they don’t usually “Add to cart” or “Contact us.”) This lets crawlers spend more of their time crawling content that they can actually do something with.
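
To make this concrete, the sketch below holds a small robots.txt in a Python string and checks it with the standard library's `urllib.robotparser`. The paths `/login`, `/contact`, and `/cart` and the `www.example.com` host are placeholders; use whatever paths your own site has.

```python
from urllib.robotparser import RobotFileParser

# Example robots.txt content; the paths are placeholders for illustration.
ROBOTS_TXT = """\
User-agent: *
Disallow: /login
Disallow: /contact
Disallow: /cart
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

def crawlable(path, agent="Googlebot"):
    """Return True if the rules above let `agent` crawl `path`."""
    return parser.can_fetch(agent, "http://www.example.com" + path)
```

Calling `crawlable("/cart")` returns `False`, while `crawlable("/products/shoes")` returns `True`. Keep in mind that these rules match by prefix, so `Disallow: /cart` also blocks anything else whose path starts with `/cart`.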

* One man, one vote. One URL, one set of content.

In an ideal world, there’s a one-to-one pairing between URL and content: each URL leads to a unique piece of content, and each piece of content can only be accessed via one URL. The closer you can get to this ideal, the more streamlined your site will be for crawling and indexing. If your CMS or current site setup makes this difficult, you can use the rel=canonical element to indicate the preferred URL for a particular piece of content.
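
If your pages come out of a template, the canonical hint is just one extra line in the page's `<head>`. Here is a minimal sketch of a helper that builds it; the name `canonical_link` is made up for this example, and choosing which URL counts as the preferred one is still up to you.

```python
from html import escape

def canonical_link(preferred_url):
    """Build the <link> element that names the preferred URL for a page."""
    # Escape the URL so characters like & are valid inside an HTML attribute.
    return '<link rel="canonical" href="{}">'.format(
        escape(preferred_url, quote=True)
    )

tag = canonical_link("http://www.example.com/shoes?color=red&size=9")
```

Every duplicate version of the page (for example the same listing with a different sort order) would carry the same tag, pointing at the one preferred URL.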
