Why is robots.txt important?

What is robots.txt?
When a search engine crawler visits your site, it first looks for a special file called robots.txt. This file tells the crawler (also called a spider or robot) which pages of your site it may crawl and which pages it should ignore.
Where to place it?
The robots.txt file is a plain text file (no HTML) that must be placed in the root directory of your site, for example:
http://www.yourwebsite.com/robots.txt
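Because the file always sits at the root, its address can be worked out from any page URL on the same site. Here is a minimal Python sketch of that idea; the function name robots_url and the sample page URL are made up for illustration.

```python
from urllib.parse import urlsplit, urlunsplit


def robots_url(page_url: str) -> str:
    """Return the robots.txt URL for the site that serves page_url."""
    parts = urlsplit(page_url)
    # Keep only the scheme and host; drop the page path, query and fragment,
    # because robots.txt is looked up at the root of the site.
    return urlunsplit((parts.scheme, parts.netloc, "/robots.txt", "", ""))


# Hypothetical page URL on the example site used in this post.
print(robots_url("http://www.yourwebsite.com/blog/post.html?id=7"))
# prints http://www.yourwebsite.com/robots.txt
```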
How to create it?
It is a simple text file made up of records. Each record has basically two parts.
The first part is the User-agent line, which names the robot the record applies to. For example:
User-agent: googlebot
You may also use the wildcard character “*” to specify all robots:
User-agent: *
You can find user agent names in your own logs by checking for requests to robots.txt. Most major search engines have short names for their spiders.
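As a rough sketch of the log check described above, the Python snippet below counts the user agents that requested robots.txt. It assumes your server writes the common "combined" access log format; the sample lines, the bot names ExampleBot and OtherBot, and the function name robots_txt_agents are all invented for illustration, so adjust the pattern to your own log layout.

```python
import re
from collections import Counter

# Hypothetical sample lines in the "combined" access log format.
SAMPLE_LOG = [
    '192.0.2.1 - - [27/Sep/2011:10:00:00 +0000] "GET /robots.txt HTTP/1.1" 200 120 "-" "ExampleBot/1.0"',
    '192.0.2.9 - - [27/Sep/2011:10:00:05 +0000] "GET /index.html HTTP/1.1" 200 5120 "-" "SomeBrowser/5.0"',
    '192.0.2.2 - - [27/Sep/2011:10:01:00 +0000] "GET /robots.txt HTTP/1.1" 200 120 "-" "OtherBot/2.0"',
    '192.0.2.1 - - [27/Sep/2011:11:00:00 +0000] "GET /robots.txt HTTP/1.1" 200 120 "-" "ExampleBot/1.0"',
]

# Request line, status, size, referer, then the quoted user agent.
LINE_RE = re.compile(
    r'"(?:GET|HEAD) (?P<path>\S+) [^"]*" \d{3} \S+ "[^"]*" "(?P<agent>[^"]*)"'
)


def robots_txt_agents(lines):
    """Count user agents that requested /robots.txt."""
    counts = Counter()
    for line in lines:
        match = LINE_RE.search(line)
        # Ignore any query string when comparing the requested path.
        if match and match.group("path").split("?")[0] == "/robots.txt":
            counts[match.group("agent")] += 1
    return counts


for agent, hits in robots_txt_agents(SAMPLE_LOG).most_common():
    print(hits, agent)
```

With the sample lines above, ExampleBot/1.0 is counted twice and OtherBot/2.0 once; the browser request for /index.html is skipped.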
The second part of a record consists of Disallow: directive lines. These lines specify files and/or directories, written as paths that start with a slash from the root of your site. For example, the following line tells spiders that they may not download adminlogin.amty:
Disallow: /adminlogin.amty
You may also specify directories:
Disallow: /cgi-bin/
This blocks spiders from your cgi-bin directory.
The Disallow directive matches by prefix. The standard dictates that Disallow: /bob blocks both /bob.html and /bob/index.html (both the file bob and the files in the bob directory will be skipped).
If you leave the Disallow line blank, it indicates that ALL files may be retrieved. At least one Disallow line must be present in each User-agent record for the record to be valid. A completely empty robots.txt file is treated the same as if it were not present.
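The prefix matching and the blank Disallow rule can be tried out with Python's standard urllib.robotparser module. This is only a sketch of how one parser reads the rules; the helper make_checker and the agent name examplebot are made up for illustration, and real crawlers may differ in details.

```python
from urllib.robotparser import RobotFileParser


def make_checker(robots_txt: str):
    """Parse robots.txt text and return a function that tests a path."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())

    def allowed(path: str, agent: str = "examplebot") -> bool:
        # can_fetch expects a full URL; the host is the example site.
        return parser.can_fetch(agent, "http://www.yourwebsite.com" + path)

    return allowed


# Record with two Disallow lines, as discussed above.
with_rules = make_checker("User-agent: *\nDisallow: /bob\nDisallow: /cgi-bin/\n")
print(with_rules("/bob.html"))          # False: starts with /bob
print(with_rules("/bob/index.html"))    # False: starts with /bob
print(with_rules("/cgi-bin/test.cgi"))  # False: inside /cgi-bin/
print(with_rules("/index.html"))        # True: no rule matches

# A blank Disallow line allows everything.
blank_rule = make_checker("User-agent: *\nDisallow:\n")
print(blank_rule("/bob.html"))          # True
```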
A self-explanatory example:
Sitemap: http://www.yourwebsite.com/sitemap-web.xml
Sitemap: http://www.yourwebsite.com/sitemap-mobile.xml
Sitemap: http://www.yourwebsite.com/sitemap-image.xml
Sitemap: http://www.yourwebsite.com/sitemap-video.xml

User-Agent: *
Disallow: /wp/wp-admin/
Disallow: /wp/wp-includes/
Disallow: /wp/wp-content/
Disallow: /wp/wp-
Disallow: /go/
Disallow: /forums/profile/
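To see what this example file means in practice, it can be fed to Python's urllib.robotparser. The sketch below lists the sitemaps and tests a few paths; the tested paths (such as /wp/wp-login.php) and the agent name examplebot are hypothetical, and site_maps() needs Python 3.8 or newer.

```python
from urllib.robotparser import RobotFileParser

# The example file from this post, one directive per line.
EXAMPLE = """\
Sitemap: http://www.yourwebsite.com/sitemap-web.xml
Sitemap: http://www.yourwebsite.com/sitemap-mobile.xml
Sitemap: http://www.yourwebsite.com/sitemap-image.xml
Sitemap: http://www.yourwebsite.com/sitemap-video.xml

User-Agent: *
Disallow: /wp/wp-admin/
Disallow: /wp/wp-includes/
Disallow: /wp/wp-content/
Disallow: /wp/wp-
Disallow: /go/
Disallow: /forums/profile/
"""

parser = RobotFileParser()
parser.parse(EXAMPLE.splitlines())

# Sitemap lines are collected separately from the User-Agent record.
sitemaps = parser.site_maps()
print(len(sitemaps))  # 4


def allowed(path: str) -> bool:
    """Check a path on the example site for a hypothetical crawler."""
    return parser.can_fetch("examplebot", "http://www.yourwebsite.com" + path)


print(allowed("/wp/wp-login.php"))    # False: matches the /wp/wp- prefix
print(allowed("/go/offer"))           # False: inside /go/
print(allowed("/forums/profile/42"))  # False: inside /forums/profile/
print(allowed("/forums/topic/42"))    # True: no rule matches
```

Note that the /wp/wp- line already covers the three WordPress directories listed before it, since all of them start with that prefix.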
2. Try to avoid comments in robots.txt
What you can do with robots.txt
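You can stop crawlers from looking into your site's contents.
You can keep well-behaved crawlers out of cache folders and private folders. Keep in mind that robots.txt is only a request: it does not block visitors or crawlers that choose to ignore it.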
What to hide?
1. Cache folders & files
2. Search results
3. Login page
Also read about how robots.txt can be used to discover the actual path of a WordPress installation directory.

Blogger Broadcast
27 Sep, 2011
Hello, I have a couple of questions about the robots.txt file.
1) If my blog is on Blogger, how do I upload a robots.txt file?
2) I have the following URL that is restricted: http://www.bloggerbroadcast.com/search/label/Savings
URL restricted by robots.txt Sep 23, 2011
a) is this bad for my site to have these labels restricted? Is this why my search widget doesn’t work?
b) how do I remove these files from being restricted?
Thanks for your time, any help would be appreciated.
Amit Gupta
27 Sep, 2011
Unfortunately, you cannot upload robots.txt to a Blogger or WordPress site unless you host it on some other server.
Moreover, if you are searching on your own site, robots.txt doesn't interfere with your search.