Did you know that Google has its own robots.txt? A robots.txt is a plain text file at the root of a site that tells search engine crawlers which paths they are asked not to crawl. So I checked it out, and this is what I saw in Google's robots.txt.
User-agent: zombies
Disallow: /brains

User-agent: *
Allow: /searchhistory/
Disallow: /news?output=xhtml&
Allow: /news?output=xhtml
Disallow: /search
Disallow: /groups
Disallow: /images
Disallow: /catalogs
Disallow: /catalogues
Disallow: /news
Disallow: /nwshp
Allow: /news?btcid=
Disallow: /news?btcid=*&
Allow: /news?btaid=
Disallow: /news?btaid=*&
Disallow: /setnewsprefs?
Disallow: /index.html?
Disallow: /?
Disallow: /addurl/image?
Disallow: /pagead/
Disallow: /relpage/
Disallow: /relcontent
Disallow: /sorry/
Disallow: /imgres
Disallow: /keyword/
Disallow: /u/
Disallow: /univ/
Disallow: /cobrand
Disallow: /custom
Disallow: /advanced_group_search
Disallow: /googlesite
Disallow: /preferences
Disallow: /setprefs
Disallow: /swr
Disallow: /url
Disallow: /default
Disallow: /m?
Disallow: /m/?
Disallow: /m/ig
Disallow: /m/lcb
Disallow: /m/news?
Disallow: /m/setnewsprefs?
Disallow: /m/search?
Disallow: /m/trends
Disallow: /wml?
Disallow: /wml/?
Disallow: /wml/search?
Disallow: /xhtml?
Disallow: /xhtml/?
Disallow: /xhtml/search?
Disallow: /xml?
Disallow: /imode?
Disallow: /imode/?
Disallow: /imode/search?
Disallow: /jsky?
Disallow: /jsky/?
Disallow: /jsky/search?
Disallow: /pda?
Disallow: /pda/?
Disallow: /pda/search?
Disallow: /sprint_xhtml
Disallow: /sprint_wml
Disallow: /pqa
Disallow: /palm
Disallow: /gwt/
Disallow: /purchases
Disallow: /hws
Disallow: /bsd?
Disallow: /linux?
Disallow: /mac?
Disallow: /microsoft?
Disallow: /unclesam?
Disallow: /answers/search?q=
Disallow: /local?
Disallow: /local_url
Disallow: /froogle?
Disallow: /products?
Disallow: /froogle_
Disallow: /product_
Disallow: /products_
Disallow: /print
Disallow: /books
Allow: /booksrightsholders
Disallow: /patents?
Disallow: /scholar?
Disallow: /complete
Disallow: /sponsoredlinks
Disallow: /videosearch?
Disallow: /videopreview?
Disallow: /videoprograminfo?
Disallow: /maps?
Disallow: /mapstt?
Disallow: /mapslt?
Disallow: /maps/stk/
Disallow: /maps/br?
Disallow: /mapabcpoi?
Disallow: /center
Disallow: /ie?
Disallow: /sms/demo?
Disallow: /katrina?
Disallow: /blogsearch?
Disallow: /blogsearch/
Disallow: /blogsearch_feeds
Disallow: /advanced_blog_search
Disallow: /reader/
Disallow: /uds/
Disallow: /chart?
Disallow: /transit?
Disallow: /mbd?
Disallow: /extern_js/
Disallow: /calendar/feeds/
Disallow: /calendar/ical/
Disallow: /cl2/feeds/
Disallow: /cl2/ical/
Disallow: /coop/directory
Disallow: /coop/manage
Disallow: /trends?
Disallow: /trends/music?
Disallow: /notebook/search?
Disallow: /music
Disallow: /musica
Disallow: /musicad
Disallow: /musicas
Disallow: /musicl
Disallow: /musics
Disallow: /musicsearch
Disallow: /musicsp
Disallow: /musiclp
Disallow: /browsersync
Disallow: /call
Disallow: /archivesearch?
Disallow: /archivesearch/url
Disallow: /archivesearch/advanced_search
Disallow: /base/search?
Disallow: /base/reportbadoffer
Disallow: /base/s2
Disallow: /urchin_test/
Disallow: /movies?
Disallow: /codesearch?
Disallow: /codesearch/feeds/search?
Disallow: /wapsearch?
Disallow: /safebrowsing
Disallow: /reviews/search?
Disallow: /orkut/albums
Disallow: /jsapi
Disallow: /views?
Disallow: /c/
Disallow: /cbk
Disallow: /recharge/dashboard/car
Disallow: /recharge/dashboard/static/
Disallow: /translate_c
Disallow: /translate_suggestion
Disallow: /s2/profiles/me
Allow: /s2/profiles
Disallow: /s2
Disallow: /transconsole/portal/
Disallow: /gcc/
Disallow: /aclk
Disallow: /cse?
Disallow: /tbproxy/
Disallow: /MerchantSearchBeta/
Disallow: /ime/
Disallow: /websites?
Disallow: /shenghuo/search?
Disallow: /support/forum/search?
Disallow: /reviews/polls/
Disallow: /hosted/images/
Disallow: /hosted/life/
Disallow: /newspapers?
Disallow: /search2001/search?
Disallow: /ppob/?
Disallow: /ppob?

Sitemap: http://www.gstatic.com/s2/sitemaps/profiles-sitemap.xml
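If you want to see how rules like the ones above are read by a crawler, here is a minimal sketch using Python's standard `urllib.robotparser` module. It parses only a short excerpt of the listing, held in memory, so nothing is fetched from the network. The crawler name `examplebot` and the `/about` path are made up for illustration. Keep in mind that Python's parser applies the first rule that matches, which may not be exactly how Google's own crawler resolves overlapping Allow and Disallow lines, so treat the results as an approximation.

```python
from urllib.robotparser import RobotFileParser

# A short excerpt of the rules quoted above, not the full file.
RULES = """\
User-agent: zombies
Disallow: /brains

User-agent: *
Allow: /searchhistory/
Disallow: /search
Disallow: /groups
"""


def build_parser(text: str) -> RobotFileParser:
    """Parse robots.txt text that is already in memory (no network access)."""
    parser = RobotFileParser()
    parser.parse(text.splitlines())
    return parser


def report(parser: RobotFileParser, agent: str, paths: list[str]) -> list[tuple[str, bool]]:
    """Return (path, allowed) pairs for one crawler name."""
    return [(path, parser.can_fetch(agent, path)) for path in paths]


if __name__ == "__main__":
    parser = build_parser(RULES)
    paths = ["/brains", "/search", "/searchhistory/", "/about"]
    # "examplebot" is a hypothetical crawler name that falls under "User-agent: *".
    for agent in ("zombies", "examplebot"):
        for path, allowed in report(parser, agent, paths):
            print(f"{agent:10} {path:16} {'allowed' if allowed else 'blocked'}")
```

With this excerpt, a crawler calling itself zombies is blocked only from /brains, while any other crawler is blocked from /search and /groups but may still fetch /searchhistory/ because the Allow line comes first.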
Take a look at how Google writes its robots.txt; you may want to implement something similar on your own site. I have written a tutorial on robots.txt, so check it out from the link. I just wanted to share what I found today while browsing the net.
I really wish I could manage my own robots.txt. Too bad I’m still using free Blogger hosting and have no control over this area.
That is one of the problems with free Blogspot hosting: you cannot give instructions to the search engines. Free blogs have their limitations.