What is robots.txt?
When a search engine crawler visits your site, it first looks for a special file called robots.txt. This file tells the search engine spider which pages of your site it may crawl and which pages it should ignore.
Where to place it?
The robots.txt file is a plain text file (no HTML) that must be placed in the root directory of your site, so that crawlers can request it as /robots.txt directly under your domain name.
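As a minimal sketch of where a crawler looks, the following Python snippet derives the robots.txt address from any page address. The helper name robots_url and the sample page URL are made up for illustration; only the scheme and host are kept, and the path is replaced by /robots.txt.

```python
from urllib.parse import urlsplit, urlunsplit


def robots_url(page_url: str) -> str:
    """Return the robots.txt URL for the site that serves page_url."""
    parts = urlsplit(page_url)
    # Keep scheme and host, drop the page path, query string and fragment.
    return urlunsplit((parts.scheme, parts.netloc, "/robots.txt", "", ""))


# Hypothetical page URL, used only to show the result.
print(robots_url("http://www.yourwebsite.com/forums/profile/?id=7"))
```

A robots.txt file placed in a subdirectory is not found this way, which is why the root directory matters.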
How to create it?
It is a simple text file made up of records. Each record basically has two parts: a User-agent line and one or more Disallow lines.
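The User-agent line specifies the robot that the record applies to. For example, a line such as
User-agent: Googlebot
addresses only Google's crawler. You may also use the wildcard character “*” to address all robots:
User-agent: *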
You can find user agent names in your own logs by checking for requests to robots.txt. Most major search engines have short names for their spiders.
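The log check described above could be sketched in Python as follows. This assumes the server writes the common "combined" access log format; the regular expression, the function name robots_txt_agents, and the two sample log lines (including the bot name ExampleBot) are illustrative assumptions, not taken from a real log.

```python
import re

# Assumed combined log format:
# ip - - [date] "METHOD path PROTOCOL" status size "referer" "user-agent"
LINE_RE = re.compile(
    r'"(?:GET|HEAD) (?P<path>\S+) [^"]*" \d{3} \S+ "[^"]*" "(?P<agent>[^"]*)"'
)


def robots_txt_agents(log_lines):
    """Collect the user-agent strings of clients that requested /robots.txt."""
    agents = set()
    for line in log_lines:
        match = LINE_RE.search(line)
        # Ignore any query string when comparing the requested path.
        if match and match.group("path").split("?")[0] == "/robots.txt":
            agents.add(match.group("agent"))
    return sorted(agents)


# Made-up sample lines, for illustration only.
sample = [
    '203.0.113.5 - - [10/Oct/2023:13:55:36 +0000] "GET /robots.txt HTTP/1.1" 200 120 "-" "ExampleBot/1.0"',
    '203.0.113.9 - - [10/Oct/2023:13:56:01 +0000] "GET /index.html HTTP/1.1" 200 5120 "-" "Mozilla/5.0"',
]
print(robots_txt_agents(sample))
```

In practice you would pass the lines of your real access log instead of the sample list; a different log format needs a different pattern.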
The second part of a record consists of Disallow: directive lines. These lines specify files and/or directories. For example, the following line instructs spiders not to download adminlogin.amty (assuming the file sits in the site root):
Disallow: /adminlogin.amty
You may also specify directories:
Disallow: /cgi-bin/
This would block spiders from your cgi-bin directory.
The Disallow directive matches by prefix. The standard dictates that /bob would disallow both /bob.html and /bob/index.html (both the file bob and the files in the bob directory will be excluded).
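The prefix behaviour can be checked with Python's standard urllib.robotparser module. This is only a sketch: the crawler name AnyBot and the path /alice.html are made up, and the module is one parser among several, so other crawlers may differ in edge cases.

```python
from urllib.robotparser import RobotFileParser

rules = RobotFileParser()
# Feed the record directly as a list of lines instead of fetching a file.
rules.parse(["User-agent: *", "Disallow: /bob"])

for path in ["/bob.html", "/bob/index.html", "/alice.html"]:
    # can_fetch() returns False when a Disallow prefix matches the path.
    print(path, rules.can_fetch("AnyBot", path))
```

The first two paths start with /bob and are refused; /alice.html matches no rule and is allowed.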
If you leave the Disallow line blank, it indicates that ALL files may be retrieved. At least one Disallow line must be present for each User-agent line for the record to be valid. A completely empty robots.txt file is treated the same as if it were not present.
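Both cases can be tried out with Python's standard urllib.robotparser module. The sketch below uses a made-up crawler name (AnyBot) and a made-up path, and shows how this particular parser treats a blank Disallow line and an empty file.

```python
from urllib.robotparser import RobotFileParser

# A record whose Disallow line is blank: nothing is blocked.
blank = RobotFileParser()
blank.parse(["User-agent: *", "Disallow:"])

# A completely empty robots.txt: no records at all.
empty = RobotFileParser()
empty.parse([])

print(blank.can_fetch("AnyBot", "/private/report.html"))
print(empty.can_fetch("AnyBot", "/private/report.html"))
```

Both calls report that the path may be fetched.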
A self-explanatory example:
Sitemap: http://www.yourwebsite.com/sitemap-web.xml
Sitemap: http://www.yourwebsite.com/sitemap-mobile.xml
Sitemap: http://www.yourwebsite.com/sitemap-image.xml
Sitemap: http://www.yourwebsite.com/sitemap-video.xml
User-Agent: *
Disallow: /wp/wp-admin/
Disallow: /wp/wp-includes/
Disallow: /wp/wp-content/
Disallow: /wp/wp-
Disallow: /go/
Disallow: /forums/profile/
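To see how a crawler might read the example above, here is a minimal sketch using Python's standard urllib.robotparser module. The crawler name AnyBot and the four test paths are made up for illustration, and site_maps() requires Python 3.8 or newer.

```python
from urllib.robotparser import RobotFileParser

EXAMPLE = """\
Sitemap: http://www.yourwebsite.com/sitemap-web.xml
Sitemap: http://www.yourwebsite.com/sitemap-mobile.xml
Sitemap: http://www.yourwebsite.com/sitemap-image.xml
Sitemap: http://www.yourwebsite.com/sitemap-video.xml
User-Agent: *
Disallow: /wp/wp-admin/
Disallow: /wp/wp-includes/
Disallow: /wp/wp-content/
Disallow: /wp/wp-
Disallow: /go/
Disallow: /forums/profile/
"""

parser = RobotFileParser()
parser.parse(EXAMPLE.splitlines())


def allowed(path):
    """Check a site-relative path against the example rules."""
    return parser.can_fetch("AnyBot", "http://www.yourwebsite.com" + path)


# Hypothetical paths chosen to exercise different rules.
for path in ["/wp/wp-admin/options.php", "/wp/wp-login.php",
             "/forums/profile/42", "/blog/hello-world/"]:
    print(path, "allowed" if allowed(path) else "blocked")

# The Sitemap lines are collected separately from the rules.
print(parser.site_maps())
```

Note that /wp/wp-login.php is blocked by the short prefix /wp/wp-, which also already covers the three longer /wp/wp-... rules listed before it.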
2. Try to avoid comments in robots.txt
What you can do with robots.txt
You can stop crawlers from looking into your site's contents.
You can keep compliant crawlers out of cache folders and private folders. Keep in mind that the file is only a request to crawlers, not access control.
What to hide?
1. Cache folders & files
2. Search results
3. Login page
Also make sure to read about how robots.txt can be used by an attacker to find the actual path of a WordPress installation directory.