Robots.txt Sitemaps

Before diving into robots.txt sitemaps, let’s recall a few basics about robots.txt files. You can picture a robots.txt file as an instruction manual for search engine robots. It indicates which pages should or should not be crawled and which robots are allowed to crawl your website.

In other words, robots.txt is a plain text file that sits at the root of a website. Its content depends on the goal it is supposed to serve. For example, if you don’t want your site to be restrictive or selective, the ‘User-agent’ field can be set to the wildcard * and the ‘Disallow’ field can be left blank.

On the other hand, if you intend to give specific instructions to crawling robots, you will have to fill those fields with the ‘user-agents’ you want to address and the URLs you ‘disallow.’ One thing to keep in mind though: you can’t fully control the bots. Even bots that follow the instructions in your file will decide on their own what to do if the file is confusing or lacks required details.
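
To see how a crawler might read these two fields, here is a minimal sketch using Python’s standard urllib.robotparser module. The /private/ path and the ExampleBot user-agent are assumptions made up for illustration; real search engines may interpret rules somewhat differently.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical rules: every crawler is asked to stay out of /private/.
RESTRICTIVE = "User-agent: *\nDisallow: /private/\n"
# An empty Disallow value places no restriction at all.
OPEN = "User-agent: *\nDisallow:\n"


def build_parser(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt content that is already in memory (no network access)."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


restrictive = build_parser(RESTRICTIVE)
open_rules = build_parser(OPEN)
draft = "https://www.inflatablemugs.com/private/draft.html"

print(restrictive.can_fetch("ExampleBot", draft))  # False: matches the Disallow rule
print(open_rules.can_fetch("ExampleBot", draft))   # True: nothing is disallowed
```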

Speaking of more information, let’s not forget that each of your subdomains requires its own robots.txt file. So let’s say that your domain name is inflatablemugs.com: you will need a file for the domain, but also one for each subdomain related to it (e.g., contact.inflatablemugs.com, about.inflatablemugs.com, and so on).
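
As a small illustration of the one-file-per-host rule, the sketch below derives the robots.txt address that applies to any page URL. The helper name robots_txt_url and the page paths are hypothetical.

```python
from urllib.parse import urlsplit, urlunsplit


def robots_txt_url(page_url: str) -> str:
    """Return the robots.txt URL for the host that serves page_url."""
    parts = urlsplit(page_url)
    # Keep the scheme and host, replace the path, drop query and fragment.
    return urlunsplit((parts.scheme, parts.netloc, "/robots.txt", "", ""))


for page in (
    "https://inflatablemugs.com/products/red-mug.html",
    "https://contact.inflatablemugs.com/form?lang=en",
    "https://about.inflatablemugs.com/team.html",
):
    print(robots_txt_url(page))
```

Each host in the loop resolves to a different robots.txt address, which is why a file on the main domain does not cover the subdomains.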

What’s the Deal With Sitemaps?

Some of you may have come across expressions such as robots.txt sitemap without fully understanding what they stand for. We have already refreshed our memories about robots.txt in the first section of this article. Now, let’s take care of the sitemap part. Strictly speaking, the term Sitemaps protocol would be more appropriate.

After Google’s first attempts in 2005, a more elaborate Sitemaps protocol was announced in 2006 with the joint support of Google, Microsoft, and Yahoo!. Roughly speaking, the main idea was to support webmasters who wanted to give search engines more accurate information about their sites. A sitemap is comparable to an identity document for a website, since search engines rely on it to identify the site’s content.

There are different types of sitemaps, each one displaying specific elements: XML sitemaps, image sitemaps, video sitemaps, news sitemaps… Search engines pick up what they need from each of them. They also learn which parts of the site are the most important. In fact, sitemaps are the counterpart of robots.txt files: the former follow an inclusive logic, while the latter aim at excluding irrelevant items when necessary. We will now focus on how that partnership works and what to do when we want to add a sitemap to robots.txt.

How to Add Sitemap to Robots.txt Files

The most usual way to get robots.txt sitemaps (that is, to add a sitemap to robots.txt) is as follows:

Step 1: Determine Your Sitemap URL

First of all, you have to locate your XML sitemap, the most important type of sitemap, which highlights the pages that matter the most. If your site was created by someone else, contact the developer to get your sitemap information. Otherwise, you can look for the file through your hosting File Manager. Let’s take our inflatablemugs.com example one more time. The default URL for this sitemap would be:
https://www.inflatablemugs.com/sitemap.xml

If you happen to have several XML sitemaps (depending on the requirements of your site), those will be grouped into a sitemap index. In this case, the URL will look like this:
https://www.inflatablemugs.com/sitemap_index.xml
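
To make the idea of a sitemap index more concrete, here is a minimal sketch that reads one with Python’s standard library and lists the sitemaps it groups. The sample index content and the file names sitemap_1.xml and sitemap_2.xml are assumptions for illustration; the namespace is the one defined by the Sitemaps protocol.

```python
import xml.etree.ElementTree as ET

# Namespace used by sitemap and sitemap index files.
NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}

# Hypothetical sitemap index, as a server might return it.
SITEMAP_INDEX = b"""<?xml version="1.0" encoding="UTF-8"?>
<sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <sitemap><loc>https://www.inflatablemugs.com/sitemap_1.xml</loc></sitemap>
  <sitemap><loc>https://www.inflatablemugs.com/sitemap_2.xml</loc></sitemap>
</sitemapindex>
"""


def child_sitemaps(index_xml: bytes) -> list[str]:
    """Return the sitemap URLs listed in a sitemap index document."""
    root = ET.fromstring(index_xml)
    return [loc.text.strip() for loc in root.findall("sm:sitemap/sm:loc", NS)]


print(child_sitemaps(SITEMAP_INDEX))
```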

There are several other well-known ways to find or produce a sitemap. Among those, we can cite XML sitemap generators such as the one available on xml-sitemaps.com. This service is free for sites of up to 500 pages. You can also use Google Search if your site is already indexed by Google. If so, you can enter search operators such as:
filetype:xml site:inflatablemugs.com

Step 2: Locate Your Robots.txt File

If your site does not have one yet, you will need to create a plain text file named robots.txt. It’s quite easy to do. Remember the ‘User-agent’ and ‘Disallow’ fields? Well, they should be filled as follows:

User-agent: *
Disallow:

Yes, there is almost no information in this case because the goal here is not to restrict access but rather to be visible. The file then has to be placed in the root directory of your server. Once you are done, you can check that the file is reachable by typing its address in your browser, which would look like:
https://www.inflatablemugs.com/robots.txt

Needless to say, the domain in the above example should be replaced with your own domain name.

Step 3: Add a Sitemap to the Robots.txt File

Here we are. This is the final step, where you add your sitemap to the robots.txt file. The file needs to be slightly edited so that it includes the URL of your sitemap. With our previous example, inflatablemugs.com, the entry would look like this:
Sitemap: https://www.inflatablemugs.com/sitemap.xml

So this means that the robots.txt file will become:
Sitemap: https://www.inflatablemugs.com/sitemap.xml

User-agent: *
Disallow:
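
If you prefer to script this edit, the sketch below adds the Sitemap line to robots.txt content held in a string, then reads it back with Python’s standard urllib.robotparser module (site_maps() requires Python 3.8 or later). The helper name add_sitemap is hypothetical, and writing the result back to your server is left out.

```python
from urllib.robotparser import RobotFileParser


def add_sitemap(robots_txt: str, sitemap_url: str) -> str:
    """Return robots_txt with a Sitemap line for sitemap_url, added only once."""
    lines = robots_txt.splitlines()
    for line in lines:
        key, _, value = line.partition(":")
        if key.strip().lower() == "sitemap" and value.strip() == sitemap_url:
            return robots_txt  # already declared, nothing to change
    return "\n".join([f"Sitemap: {sitemap_url}", ""] + lines) + "\n"


original = "User-agent: *\nDisallow:\n"
sitemap_url = "https://www.inflatablemugs.com/sitemap.xml"
updated = add_sitemap(original, sitemap_url)
print(updated)

# Read the file back the way a crawler library would.
parser = RobotFileParser()
parser.parse(updated.splitlines())
print(parser.site_maps())
```

Calling add_sitemap a second time with the same URL returns the content unchanged, so the script can be run repeatedly without duplicating the entry.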

If you have a large website or several subsections, you will most likely need more than one sitemap. Grouping the different elements will help you keep things organized. This means that you should create a sitemap index file (a concept already mentioned in the previous paragraphs). Don’t worry; it is not as complicated as it may sound. It is simply an XML file that lists your other sitemap files.

To report several sitemaps in robots.txt, you have two main possibilities:

  • You can report your sitemap index file URL within the robots.txt file already discussed above. The result would look like this:
    Sitemap: https://www.inflatablemugs.com/sitemap_index.xml
    User-agent: *
    Disallow:
  • Or you can also report each sitemap file URL separately, as follows:
    Sitemap: https://www.inflatablemugs.com/sitemap_1.xml
    Sitemap: https://www.inflatablemugs.com/sitemap_2.xml
    User-agent: *
    Disallow:
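
For the first possibility, the sitemap index file itself has to exist. Here is a minimal sketch that builds one from a list of sitemap URLs with Python’s standard library (the xml_declaration argument requires Python 3.8 or later). The function name build_sitemap_index is hypothetical, and optional tags such as lastmod are left out.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap_index(sitemap_urls: list[str]) -> bytes:
    """Build a sitemap index document that lists the given sitemap URLs."""
    root = ET.Element("sitemapindex", xmlns=SITEMAP_NS)
    for url in sitemap_urls:
        sitemap = ET.SubElement(root, "sitemap")
        ET.SubElement(sitemap, "loc").text = url
    return ET.tostring(root, encoding="utf-8", xml_declaration=True)


index_xml = build_sitemap_index([
    "https://www.inflatablemugs.com/sitemap_1.xml",
    "https://www.inflatablemugs.com/sitemap_2.xml",
])
print(index_xml.decode("utf-8"))
```

The resulting bytes could be saved as sitemap_index.xml at the root of the site and then referenced from robots.txt with a single Sitemap line.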

Conclusion on Adding Sitemap to Robots.txt

As you can see, robots.txt files and sitemaps form an efficient combo: together, they tell search engines what to skip and what to look at on your website. By using them, you get the chance to enhance your website planning, provide the necessary information to search engines and, consequently, save a significant amount of time. In other words, you gather the key ingredients for a fully functioning website with a higher probability of getting indexed. If you want to take a few steps back and learn about creating robots.txt, go ahead and check out our other post.

Frequently Asked Questions

Most current search engines follow and respect robots.txt instructions. However, there may always be some exceptions. Plus, each search engine is likely to interpret robots.txt files in its own way.

Yes, this is important information for sitemaps. You will need to be as specific as possible by including the protocol (either http or https). Some servers may also require trailing slashes.

Not really. The ‘priority’ tag gives an idea of the importance of a certain page compared to the other pages on your site. However, it does not give any ranking advantage over other websites.

No, the importance or priority of a URL has nothing to do with its position in the list. It can be either at the top or at the end of a sitemap; this would not change anything regarding its priority. What matters is the ‘priority’ tag.
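
The point about position can be checked with a short sketch. The sitemap below is a made-up example in which the home page is listed second but carries the highest ‘priority’ value. Pages without the tag fall back to 0.5, the default defined by the Sitemaps protocol.

```python
import xml.etree.ElementTree as ET

NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}
DEFAULT_PRIORITY = 0.5  # protocol default when the tag is absent

# Hypothetical sitemap: the order of entries does not follow their priority.
SITEMAP = b"""<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://www.inflatablemugs.com/about.html</loc><priority>0.3</priority></url>
  <url><loc>https://www.inflatablemugs.com/</loc><priority>1.0</priority></url>
  <url><loc>https://www.inflatablemugs.com/contact.html</loc></url>
</urlset>
"""


def priorities(sitemap_xml: bytes) -> dict[str, float]:
    """Map each URL in a sitemap to its priority value."""
    root = ET.fromstring(sitemap_xml)
    result = {}
    for url in root.findall("sm:url", NS):
        loc = url.findtext("sm:loc", namespaces=NS).strip()
        raw = url.findtext("sm:priority", namespaces=NS)
        result[loc] = float(raw) if raw else DEFAULT_PRIORITY
    return result


by_url = priorities(SITEMAP)
# Sort by the tag value, not by position in the file.
ranked = sorted(by_url, key=by_url.get, reverse=True)
print(ranked)
```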

There are several options. Just make sure that the program you choose is able to produce valid plain text files. Emacs, Notepad, and TextEdit are some of the available programs that come to mind.

Yavuz Sadıkoğlu
Since his early years, Yavuz has been studying the inner workings of different digital environments.