Dec 03
Sorry, I almost forgot last night, when I was posting about slowing down Yahoo Slurp, that the same technique applies to MSNBot as well. The only difference is the bot name in the User-agent line.
User-Agent: msnbot
Crawl-Delay: 10
The two lines above in your robots.txt file instruct MSNBot to wait 10 seconds before it requests another file from your web site. Now, back to work!
Dec 02
Some people are complaining that Yahoo Slurp is overloading their web site and eating too many resources, including bandwidth.
To make Slurp back off with a custom delay between requests, you can use a robots.txt file placed in your web site's root directory, for example:
http://www.myfinestblogforever.com/robots.txt
The following robots.txt setup asks Yahoo Slurp to wait at least 30 seconds before each new GET request.
User-agent: Slurp
Crawl-delay: 30
The value 30 means 30 seconds. You can set “Crawl-delay: 60” instead, and the Yahoo Slurp bot nodes will wait 60 seconds between requests.
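If you want to double check that the file says what you think it says before uploading it, here is a minimal sketch using Python's standard urllib.robotparser module. It only shows how one parser reads the two lines; it says nothing about how Slurp itself behaves. The ROBOTS_TXT string and the crawl_delay_for helper are names made up for this example.

```python
from urllib.robotparser import RobotFileParser

# The same two lines as in the robots.txt example (assumed file content).
ROBOTS_TXT = """\
User-agent: Slurp
Crawl-delay: 30
"""

def crawl_delay_for(robots_text, user_agent):
    """Return the Crawl-delay (in seconds) that applies to user_agent, or None."""
    parser = RobotFileParser()
    # parse() takes an iterable of lines, so no network access is needed.
    parser.parse(robots_text.splitlines())
    return parser.crawl_delay(user_agent)

slurp_delay = crawl_delay_for(ROBOTS_TXT, "Slurp")
other_delay = crawl_delay_for(ROBOTS_TXT, "SomeOtherBot")  # hypothetical bot name

print("Slurp:", slurp_delay)
print("SomeOtherBot:", other_delay)
```

A bot that has no matching User-agent block gets no delay at all, so every bot you want to slow down needs its own block (or a "User-agent: *" block).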
However, some people still complain that Yahoo! Slurp somehow ignores this value, and that spider nodes from different data centers and locations keep crawling without honoring the minimum delay defined in robots.txt. We believe that changes to robots.txt take some extra time to be picked up. So get back to work on your web site and stop watching what the spiders are crawling.
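If you really cannot stop looking, a rough way to see whether the delay is being honored is to measure the smallest gap between the bot's requests in your access log. The sketch below assumes an Apache-style log where the timestamp sits in square brackets and the user agent string appears somewhere on the line. The sample lines, IP addresses and dates are made up for illustration, and min_gap_seconds is a hypothetical helper. It ignores time zone offsets, so it assumes every line uses the same one.

```python
import re
from datetime import datetime

# Matches the timestamp part of "[02/Dec/2006:10:00:00 +0000]" (assumed log format).
TIMESTAMP = re.compile(r"\[(\d{2}/\w{3}/\d{4}:\d{2}:\d{2}:\d{2})")

def min_gap_seconds(log_lines, bot_token):
    """Smallest gap in seconds between requests whose line mentions bot_token."""
    times = []
    for line in log_lines:
        if bot_token.lower() not in line.lower():
            continue
        match = TIMESTAMP.search(line)
        if match:
            times.append(datetime.strptime(match.group(1), "%d/%b/%Y:%H:%M:%S"))
    times.sort()
    gaps = [(later - earlier).total_seconds()
            for earlier, later in zip(times, times[1:])]
    return min(gaps) if gaps else None

# Made-up sample lines: two different nodes requesting pages from the same site.
SAMPLE_LOG = [
    '192.0.2.1 - - [02/Dec/2006:10:00:00 +0000] "GET / HTTP/1.0" 200 512 "-" "Yahoo! Slurp"',
    '192.0.2.2 - - [02/Dec/2006:10:00:31 +0000] "GET /a HTTP/1.0" 200 512 "-" "Yahoo! Slurp"',
    '192.0.2.1 - - [02/Dec/2006:10:00:45 +0000] "GET /b HTTP/1.0" 200 512 "-" "Yahoo! Slurp"',
]

smallest_gap = min_gap_seconds(SAMPLE_LOG, "Slurp")
print("Smallest gap between Slurp requests:", smallest_gap, "seconds")
```

In this made-up sample the smallest gap is shorter than a 30 second Crawl-delay, which is the kind of pattern people describe when several nodes crawl the same site at once.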