The robots.txt file makes search engine bots more efficient when crawling your site. A good metaphor is that the robots.txt file acts like a signpost, pointing Google toward the content you want it to find. This article will show you how to create a robots.txt file to optimize your website’s SEO and increase traffic. Historically, sites using robots.txt files have tended to see slightly better SEO and web traffic than sites that do not use them.
1. Create the File
Begin by creating a plain-text file using Notepad or any text editing software, and save the new file as ‘robots’, all in lowercase.
2. Add Lines of Text to the File
Type the following two lines of text into the robots.txt file you just created:
User-agent: *
Disallow:
The * tells any robot checking your site that this line of text applies to all of them.
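If you wanted to address only one crawler rather than every bot, you could name it instead. For instance, Googlebot is the user agent Google’s crawler identifies itself with; the empty Disallow: line still permits full access:
User-agent: Googlebot
Disallow: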
3. Use the Disallow Lines to Direct the Bot’s Search
By using Disallow lines to limit the parts of your site that the Google bots are allowed to scan, you can make crawling more efficient: pages you don’t want the bots to focus on are cut out, leaving them to go directly to your content. An example of a Disallow line could be:
Disallow: /database/
This cuts the database section out of the crawl and helps streamline the bots’ search of your site.
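You can stack several Disallow lines in one file. A minimal sketch, where /tmp/ and /cgi-bin/ are placeholder paths used only for illustration:
User-agent: *
Disallow: /database/
Disallow: /tmp/
Disallow: /cgi-bin/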
4. Save the robots.txt File to Your Website
The final step is to save the newly created robots.txt file to the root directory of your website. Navigate to the root directory on your website’s hosting server and save the robots.txt file there.
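Once uploaded, the file is served from the top of your domain. A minimal complete file, assuming the example domain www.test.com used later in this article (the line starting with # is a comment, which bots ignore):
# Served from http://www.test.com/robots.txt
User-agent: *
Disallow: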
The Structure and Content of a robots.txt File
There are two elements to a robots.txt file. First, you name the user agent. After that, you give the commands stating which directories on your site should be read or ignored. The location of your site’s sitemap.xml file can also be stored in the robots.txt file to ensure that the crawler reaches the whole site. Below is the proper structure of a robots.txt file:
- The command that addresses the bot comes first.
User-agent:
- After User-agent: you can either name each bot individually or use an asterisk * to include all bots.
- Next come the command lines. Disallow: prevents the listed areas from being accessed by the bots, while Allow: grants access to the listed areas.
Below are a few sample robots.txt files:
Sample 1:
User-agent: seobot
Disallow: /nothere/
In this example, the bot named ‘seobot’ will not crawl the folder http://www.test.com/nothere/ or any of its subdirectories.
Sample 2:
User-agent: *
Allow: /
In this example, all user agents can access the entire site. However, bots crawl the whole site by default unless a Disallow command stops them, so the Allow: / line here is technically unnecessary.
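The conventional way to express the same rule is a group with an empty Disallow line, exactly as in step 2 above:
User-agent: *
Disallow: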
Sample 3:
User-agent: seobot
Disallow: /directory2/
Disallow: /directory3/
In this example, the bot named ‘seobot’ is told that it cannot view /directory2/ or /directory3/. Notice how each Disallow: command must be on a separate line.
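You can also give different bots different rules by writing one group per user agent, with a blank line between groups. A sketch reusing the hypothetical ‘seobot’ name:
User-agent: seobot
Disallow: /directory2/

User-agent: *
Disallow: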
Other Instructions a robots.txt File Can Use
We have mentioned a few command instructions above. Here is a descriptive list of the instructions:
- User-agent: Used to name the bots you wish to give commands to. Use the * to make the commands apply to all bots.
- Disallow: Instructs bots not to access any directories under the designated file path. The slash refers to all pages on the site, so Disallow: / would prevent bots from accessing any page.
- Allow: By default, every page on the site is marked as allowed. However, this command can be used to give access to specific file paths even when their parent directory is blocked by the Disallow: command. This is helpful if you want to block a directory but keep a particular page within it crawlable; see the sample after this list.
- Sitemap: Used to provide the location of your sitemap to the search engine bots.
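Putting the four instructions together, here is a minimal sketch; /private/ and /private/welcome.html are hypothetical paths, and the sitemap URL is a placeholder:
User-agent: *
Disallow: /private/
Allow: /private/welcome.html
Sitemap: http://www.test.com/sitemap.xml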
How Does a robots.txt File Affect Search Engine Optimization?
If used properly, a robots.txt file can significantly impact search engine optimization (SEO). It is vital not to restrict the search engine bots too much with the Disallow command: if they are too limited, the rank of your web pages will suffer. Before saving your file to the root directory, check it for errors. A single mistake could mean that crucial areas of your site are left out of the crawl, or that areas you intended to be ignored are crawled instead.
Google has a handy tool to check that your robots.txt file is working correctly. Use Google Search Console; it will list any pages blocked by your Disallow instructions under the headings ‘current status’ and ‘crawl errors.’ The main benefit of using robots.txt well is ensuring that the pages you do want found are fully indexed by the search engines that visit your site.
Advantages of Using a robots.txt on Your Website
The search engine spiders that index websites on the internet have a predetermined allowance for the number of pages they can crawl, known as a crawl budget. The main advantage of a robots.txt file is that it lets you block the spiders from various parts of your site so they can focus on the more SEO-friendly sections. For example, if your site sells t-shirts in many colours and sizes, each variant has its own valid URL for the bot to crawl. By disallowing those areas, as in the sketch below, you let the bot skip the endless colour and size pages and spend its budget on your most important content. If you create a robots.txt file for your site, you can benefit from these advantages.
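A minimal sketch of such a block, assuming the colour and size variants live under hypothetical paths like /shirts/colours/ and /shirts/sizes/:
User-agent: *
Disallow: /shirts/colours/
Disallow: /shirts/sizes/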
Disadvantages of Using a robots.txt File
Search engines are not obligated to follow the commands provided in a robots.txt file, so in the future the file may be ignored entirely. Another downside is that even if you disallow a section of your website, it can still be indexed in the search results if enough links to that section are found. This means the Google result for such a page will look blank, because the bots aren’t allowed to view it but know it is there. A robots.txt file also offers no protection from other people, so if a directory truly needs to stay private, it is highly recommended to use password protection on the web server. If these drawbacks worry you, you may decide not to create a robots.txt file for your site.
Concluding Thoughts on the robots.txt File
We have explained how to create a robots.txt file. Overall, a robots.txt file is easy to create and implement, and it can help boost SEO friendliness as well as increase web traffic for your site. The fact that search engines may ignore the file entirely in the future does not take away the benefits of implementing it today. That such a tiny file can help direct search engines to specific areas of your site is too big an opportunity to ignore. Hopefully, you have enjoyed reading this article and learned something about robots.txt files and how to use them. If you want to learn more about what a robots.txt file is, we have a more detailed article about that as well.
Frequently Asked Questions About robots.txt Files
Do I have to use a robots.txt file?
No, there is no need to use a robots.txt file. However, it can help make your site more SEO friendly and increase website traffic.
Where should I save my robots.txt file?
You should always save the robots.txt file in the root directory of your domain. Following our earlier example, you would find your robots.txt file at https://www.test.com/robots.txt. The filename is case sensitive and must always be lowercase; otherwise, it will not work.
Should I reference my sitemap in the robots.txt file?
You should include your sitemap.xml file in the robots.txt file, as it is considered good practice to do so. Use the Sitemap: command to reference it; multiple sitemaps can be referenced in one robots.txt file. Make sure also to submit your sitemap through Google Search Console and Bing Webmaster Tools.
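For example, referencing two placeholder sitemap URLs:
Sitemap: https://www.test.com/sitemap.xml
Sitemap: https://www.test.com/blog/sitemap.xml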
How do I know if my website already has a robots.txt file?
If there is already a robots.txt file on your website, you can find it at https://www.yourdomain.com/robots.txt. If you navigate to that URL and see plain text displayed, you have a working robots.txt file already set up, which you can edit if you wish.
Can a robots.txt file be generated for me?
Yes, many websites will generate robots.txt files for you. However, it is only a few lines of text, and anybody can create their own in a few minutes by following the simple guide featured above.