Welcome to our troubleshooting guide for the most common indexing problems. It’s high time we wrote one, considering how often these issues come up across the web. We will pay particular attention to Google, since that is where we see most of the complaints. After all, it’s only logical that the majority of websites want to appear on the world’s top search engine.
But what exactly do we label as problems here?
Let us ask you one concrete question. Are you suddenly experiencing an unexplained drop in your website traffic? If so, it might simply be due to an indexing problem you aren’t aware of yet. So here’s the plan: we will break down the most common indexing problems and their solutions one by one. For each problem, you will get the solutions that are easiest to implement.
- Duplicate Content
- Poor Quality
- Crawl Budget
- Soft 404
- General Crawling and Scanning Issues
Duplicate Content
Perhaps one of the most common website indexing problems is duplicate content. This occurs when similar (if not entirely identical) content is displayed on multiple URLs. Search engines struggle to decide which one they should choose and show in their results. Oftentimes they just ignore the content in question altogether. This can have highly negative consequences, such as an entire website not showing up on Google.
This is not really anyone’s fault in particular. It may be a technical bug or something beyond your control. For example, some sites may be using parts of your content without you knowing it (content syndication).
You may also be in the middle of an ongoing migration process from HTTP to HTTPS. Or you may be using two versions of your site (one with prefixes such as www and the other without). Not to mention printer-friendly pages and session IDs that keep track of your visitors. All of these scenarios can cause site indexing problems because they look like duplicate content.
Search engines may lump together URLs that bear any resemblance to one another. Likewise, the same query parameters appearing in a different order, or the same terms written in different cases (upper versus lower), can cause confusion. Compare:
- www.inflatablemugs.com/homepage/?a=1&b=2 and www.inflatablemugs.com/homepage/?b=2&a=1
- www.inflatablemugs.com/products and www.inflatablemugs.com/Products
Parameter-related issues can also arise when your online store offers different versions/colors of the same product.
What to Do
One of the best troubleshooting practices here is to use canonical URLs (rel=”canonical”). By doing so, you tell search engines which version of a page is the primary one and that the so-called duplications are merely copies of it. Ranking signals from the copies are then consolidated into the canonical URL.
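As an illustration, the canonical hint is a single line in the page’s <head>. The sketch below reuses the hypothetical inflatablemugs.com address from the examples above; every parameter or casing variant of that page would carry the same tag.

```html
<!-- In the <head> of each duplicate or parameter variant of the page -->
<link rel="canonical" href="https://www.inflatablemugs.com/products" />
```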
Redirection also works fine. Use a 301 (permanent) redirect to send crawlers from the duplicate page to the original page. This strategy also consolidates ranking power, since the pages reinforce each other.
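On an Apache server, for example, such redirects can live in the .htaccess file. The sketch below uses illustrative paths and the hypothetical hostname from the earlier examples; it also covers the www versus non-www situation mentioned above (the rewrite part assumes mod_rewrite is enabled).

```apache
# .htaccess (Apache). Paths and hostnames are illustrative.
# Permanently redirect a duplicate URL to the original page.
Redirect 301 /products-copy /products

# Force a single hostname so the www and non-www versions don't compete.
RewriteEngine On
RewriteCond %{HTTP_HOST} ^inflatablemugs\.com$ [NC]
RewriteRule ^(.*)$ https://www.inflatablemugs.com/$1 [R=301,L]
```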
You can also add a robots meta tag to the HTML head of the pages you don’t want to see indexed. This won’t prevent Google from crawling them, but they won’t be indexed.
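For reference, such a tag could look like the line below; the optional ‘follow’ value keeps link discovery open while still blocking indexing.

```html
<!-- In the <head> of a page that may be crawled but should not be indexed -->
<meta name="robots" content="noindex, follow">
```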
Poor Quality
Have you already gotten a ‘Crawled – Currently Not Indexed’ message after submitting a URL to Google Search Console? The explanation is less mysterious than it seems. One thing most web experts agree on is the importance of quality over quantity. No one should create a website just for the sake of it and then fill it with mediocre material.
For example, website owners who care about their rankings simply cannot afford to practice ‘cheap’ SEO. Just think about keywords. You can’t use them randomly. They should instead be part of a coherent whole that follows basic rules of grammar, semantics, argumentation, originality, and so on. Unfinished or fragmented content and plagiarism are also very likely to create indexing problems.
What to Do
Pay attention to content quality. We are aware that there’s no such thing as 100% novelty. Every ‘new’ idea and creation is inspired by, and an extension of, older ones. Nevertheless, you should always try to add a personal touch.
Your headlines, descriptions, and body texts should be unique and match the objectives you have set for your site. Not confident enough about your skills? Don’t hesitate to hire or collaborate with domain specialists such as copywriters.
Web designers and webmasters are also there to keep your site coherent and fix any related breakdowns. And, of course, refrain from plagiarism. Beyond the legal and ethical issues, it can also result in duplicate content.
If you really can’t avoid a few lower-quality elements on your website, that’s OK. Just instruct Google not to crawl them by using a robots.txt file. Similarly, noindex tags can prevent search engines from indexing certain sections.
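A minimal robots.txt sketch, assuming hypothetical /drafts/ and /print/ directories, is shown below. Keep the distinction in mind: robots.txt blocks crawling, while noindex blocks indexing, and a crawler barred by robots.txt cannot read a noindex tag placed on the page, so don’t combine the two on the same URL.

```
# robots.txt at the site root (directory names are hypothetical)
User-agent: *
Disallow: /drafts/
Disallow: /print/
```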
Crawl Budget
Another one of these common indexing problems is the crawl budget. Googlebot is not Led Zeppelin. It doesn’t say, ‘I’m gonna crawl’ at all costs. Jokes aside, the crawling mechanism is indeed an extremely resource-consuming one. When creating website indexes, all search engines and their web crawlers have to sort out numerous sophisticated parameters. Therefore, they have to proceed in an economical way by setting some quotas.
They simply can’t spend all their time on one single website. So they remain within a determined budget. For each website, they crawl and index only a certain number of pages. This limitation can lead to various indexing problems. Most typically, Google can ignore some of your URLs.
No need to panic, though. Search engines are generous enough despite the aforementioned quota. They usually restrict only websites with too many pages and redirects. But since it’s better to be safe than sorry, let’s check the possible precautions and solutions.
What to Do
Be ‘techniclean.’ Make sure that your website complies with basic contemporary technical requirements. Check your site speed and keep the site up to date. Use a sitemap along with a flat, interlinked site architecture.
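A bare-bones sitemap could look like the sketch below (the URL is illustrative). Listing it in robots.txt with a ‘Sitemap:’ line, or submitting it in Search Console, helps crawlers find it.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- sitemap.xml at the site root; the URL is illustrative -->
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url>
    <loc>https://www.inflatablemugs.com/products</loc>
  </url>
</urlset>
```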
Speaking of interlinked architecture, be careful with orphan pages. If you have any, integrate them into the rest of the site through internal and external links.
Internal links are, by the way, very budget-friendly elements. So, incorporate them into your site as much as you can. This will allow Googlebot to navigate across your pages more efficiently. Here again, avoid duplicate content in order not to waste the crawl budget assigned to your site.
Soft 404
Another one of these common indexing problems is the Soft 404 error, and it occurs frequently. You probably already know the classic HTTP 404: the standard response code indicating that a certain page hasn’t been found. It usually appears when the page in question has been removed or when a link is broken. This information is crucial for web crawlers.
That’s how they know they shouldn’t care about ‘dead’ pages anymore. So what about the Soft 404? Sometimes there’s a communication breakdown with the server: people trying to access non-existent pages get a simple ‘page not found’ message while the server still returns a 200 (OK) status instead of a proper HTTP 404. Or worse, they are redirected to a totally different page. These soft or ‘fake’ 404 situations create a mess because they mislead the crawlers.
What to Do
First of all, you have to be sure that there’s no false alarm. Google Search Console may sometimes treat certain pages as soft 404s for no apparent reason. Use the following verification procedure:
- Access your list of soft 404s in your Search Console dashboard. Log in to your account, reach your Coverage report through the ‘Coverage’ section in the left menu, and click on the ‘Submitted URL seems to be a Soft 404 error’ notification.
- Browse the list of soft 404s. While checking the list of soft 404 pages returned by the system, open the related URLs in new tabs and compare them with each other.
- Fix the possible errors. If the page(s) you are inspecting is/are valid, select the ‘Validate Fix’ option. This informs Google that you want them to be crawled and to appear in the search results.
You can check whether the operation has succeeded by testing the URLs in your browser.
At other times, there may be too little content or a problem with the overall quality. Revise and improve your pages, then resubmit them for indexing. Also make sure to delete any invalid pages and reconfigure the server to return the appropriate HTTP response codes (404 or 410).
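On an Apache server, for instance, a removed page can be marked as permanently gone with a one-line .htaccess rule (the path is hypothetical):

```apache
# .htaccess (Apache): answer with 410 Gone for a page that has been removed for good
Redirect gone /discontinued-product.html
```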
If you really need to keep a ‘problematic’ page, add a noindex directive in its header. That way, search engines will know they should not index that specific page. You may also redirect a defective page to a valid one by adding a 301 redirect rule to your .htaccess file.
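A minimal sketch of both options on Apache, assuming mod_headers is available and using hypothetical file and path names:

```apache
# Keep the file reachable but send a noindex signal in the HTTP response header
<Files "internal-report.pdf">
  Header set X-Robots-Tag "noindex"
</Files>

# Permanently redirect a defective page to a valid one
Redirect 301 /broken-page /working-page
```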
General Crawling and Scanning Issues
Common indexing problems may appear in a multitude of other forms requiring case-by-case investigation. A very common example is configuration mistakes in robots.txt files. Even the slightest error in them may prevent Googlebot from scanning your pages properly. So make sure to check everything once again from A to Z. Be it your user-agent directives or simply the placement of slashes, every single detail counts.
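To see how much a single character matters, compare the two Disallow rules in the sketch below (directory names are hypothetical; everything after a # is treated as a comment).

```
# robots.txt: the scope of a rule depends on the exact path you write
User-agent: *
Disallow: /private/   # blocks only URLs under /private/
# Disallow: /         # would block the entire site, a common accidental slip
```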
Another general and frequent issue is related to website size. A recent study has shown that smaller websites are more affected by a reduced Google crawl budget. In other words, the granted budget may be proportional to the site size. But some bigger sites suffer from a similar problem too because, well, they have many more components. As for duplicate content, smaller sites tend to be more repetitive, partly because their content is less diversified.
Note that scanning and Google crawling issues happen more often on larger websites. There’s simply too much information to be processed.
What to Do
There’s no specific formula to report here except handling your website according to its size. Those of you dealing with larger sites ought to spend more time on inspection activities. Otherwise, the huge number of pages and elements can quickly become overwhelming.
Additional Note on Background Check Issues
If you ventured into some shady virtual activities in the past and were penalized for them, you should clean this up. Otherwise, there’s a high chance that Google will keep ignoring your website.
Terminate any ongoing legal proceedings. Then make a fresh start. Build a brand new domain and site from scratch with totally updated content. Most importantly, be sure to play by the official rules this time.
As you can tell, we haven’t touched much upon the Google Indexing API; but if you wish, you can get detailed information by clicking on the link.
Getting Rid of Indexing Problems
In this article, we have reviewed some of the most common indexing problems, especially with Google. It was of the utmost importance to investigate their respective solutions for obvious reasons. Like what? Doing justice to your SEO efforts, securing a higher ranking and more authority for your website, and improving the user experience of your visitors. Yes, all of that and even more. A smoothly running indexing process is like a safety valve for your website.
Frequently Asked Questions About Indexing Problems
Why isn’t one of my pages being indexed?
Recheck your code to see whether it’s uniform and coherent across your entire website. You may also have accidentally activated a noindex directive for that particular page. If so, cancel it in order to restore the indexability of your page.
Why does indexing matter for my traffic?
A major part of website traffic comes from search engine results, so being indexed has a positive effect on the visibility and ranking of your site.
Does mobile optimization affect indexing?
Yes, it does, more and more, especially since 2018. Google explained back then that its mobile-first indexing system would become predominant and prioritized over time. So switching to mobile optimization is a good idea if you haven’t done it yet.
How can I find out when Google last crawled my page?
You may check it through the URL Inspection tool in Google Search Console. See the specific date and time next to the ‘Last Crawl’ section.
Is there a quick way to test my site’s performance?
Yes, some online services provide quick verification. One of them is webpagetest.org, where you can run a site performance test.