Pages with Feed and swcfpc=1 are not being indexed by google. - Support

Resolved Shreyas Goel
1 year, 8 months ago

Rank Math free

Around 50 pages with feed and swcfpc=1 in their URLs are showing as not indexed. Google shouldn’t crawl them and I don’t want them indexed, so why are they being crawled? Please solve this issue.

Viewing 11 replies - 1 through 11 (of 11 total)

RankBot
1 year, 8 months ago
Hello,

I understand your concern about certain pages with "feed" and "swcfpc=1" parameters not being indexed by Google. If these pages shouldn’t be crawled and indexed, there are a few steps you can take to address this issue:
1. Robots.txt File: You can block these specific pages from being crawled by adding directives to your website’s robots.txt file. This file tells search engine crawlers which pages or files they should not crawl.
2. Noindex Meta Tag: You can add a noindex meta tag to the pages you don’t want to be indexed. This tag tells search engines not to include those pages in their index.
3. Canonical Tags: If these pages are duplicates or variations of other pages on your site, you can use canonical tags to specify the preferred version of the content that should be indexed.
4. Content Audit: Perform a content audit to identify low-quality or duplicate content on your website. Remove or improve these pages to enhance the overall quality of your site.
If you need further assistance in implementing these solutions or have any other questions, please let us know by assigning this ticket to a human agent using the available buttons. Thank you for reaching out, and we’re here to help you resolve this issue.


Sources:
Why Google Hasn’t Indexed Your Post, Page, and/or Website? » Rank Math
How to Fix "Crawled – Currently Not Indexed" Error in Google Search Console » Rank Math
John
1 year, 8 months ago

Rank Math free

Having your website indexed by Google is vital for your content’s visibility in search results.
Ike
1 year, 8 months ago
Hello,

Thanks for contacting us, and sorry for any inconvenience that might have been caused due to that.

Feed URLs are not created for humans but for RSS feed crawlers and readers. They are only basic code versions of your actual content pages.

They are not meant to be indexed, and you most likely don’t want them indexed, since Google doesn’t like them and would probably not show them to users anyway.

In this case, please add these rules to your robots.txt to disallow them from being crawled:
```
User-Agent: *
Disallow: */feed/
Disallow: *swcfpc=1*
```
Here’s how you can edit the robots.txt file using Rank Math:
https://rankmath.com/kb/add-sitemaps-to-robots-txt/#num-2-2-navigate-to-edit-robots-txt

This should save your crawl budget and after the next few crawls, the warnings should be removed from GSC.

Hope that helps, and please do not hesitate to let us know if you need our assistance with anything else.

Thanks.
Shreyas Goel
1 year, 8 months ago

Rank Math free

As you suggested, I changed the default robots.txt file and added this code:
User-Agent: *
Disallow: /feed/
Disallow: /swcfpc=1/
Disallow: /wp-admin/
Disallow: /wp-admin/admin-ajax.php

https://confidentpersonality.com/sitemap_index.xm

However, now Lighthouse is showing this:

robots.txt is not valid 1 error found
If your robots.txt file is malformed, crawlers may not be able to understand how you want your website to be crawled or indexed. Learn more about robots.txt.

Reinelle
1 year, 8 months ago

Hello,

Your robots.txt is already working fine.

This can happen when you use a content delivery network (CDN) like Cloudflare since Rank Math does not create a physical robots.txt file on your site.

Instead, the file is created and displayed when a user visits your robots.txt URL.

So, you can safely ignore this error if your robots.txt file is already available.

Here’s a link for more information:
https://rankmath.com/kb/fix-common-robots-txt-issues/#robots-txt-is-not-valid-in-pagespeed-insights

Looking forward to helping you.
Ike
1 year, 8 months ago
Hello,

You are correct: your robots.txt is not yet blocking the feed and swcfpc=1 URLs correctly.

The Disallow rules for them should be as we shared earlier:
```
Disallow: */feed/
Disallow: *swcfpc=1*
```
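To illustrate why the two sets of rules behave differently, here is a minimal Python sketch of wildcard matching as Google documents it for robots.txt (`*` matches any run of characters, and a pattern is matched from the start of the path and query string). It is a simplification that ignores Allow rules and rule precedence, and the sample paths are hypothetical, not taken from the site.

```python
import re

def rule_to_regex(pattern: str) -> re.Pattern:
    """Translate a robots.txt path pattern into a regex.

    '*' matches any run of characters and a trailing '$' anchors the end;
    everything else is literal. Matching starts at the beginning of the
    path-plus-query string.
    """
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    body = ".*".join(re.escape(part) for part in pattern.split("*"))
    return re.compile("^" + body + ("$" if anchored else ""))

def is_blocked(path: str, disallow_rules: list[str]) -> bool:
    """Return True if any Disallow pattern matches the path (Allow rules are ignored)."""
    return any(rule_to_regex(rule).match(path) for rule in disallow_rules)

literal_rules = ["/feed/", "/swcfpc=1/"]      # rules without wildcards
wildcard_rules = ["*/feed/", "*swcfpc=1*"]    # rules suggested in this thread

# Hypothetical URLs (path plus query string) for illustration only.
sample_paths = ["/some-post/feed/", "/some-post/?swcfpc=1"]
for path in sample_paths:
    print(path, is_blocked(path, literal_rules), is_blocked(path, wildcard_rules))
```

In this sketch, the literal rules only match URLs that begin with `/feed/` or `/swcfpc=1/`, so a post-level feed or a `?swcfpc=1` query string slips through, while the wildcard rules match both.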
Also, the syntax for sitemap in the robots.txt is incorrect; it should be:
Sitemap: https://confidentpersonality.com/sitemap_index.xml

Hope that helps, and please do not hesitate to let us know if you need our assistance with anything else.
Shreyas Goel
1 year, 8 months ago

Rank Math free

Rank Math, why are you not able to help me fix my problem? How many days should I suffer?
Confidentpersonality.com
Google Search Console is displaying this:
If your robots.txt file is malformed, crawlers may not be able to understand how you want your website to be crawled or indexed. Learn more about robots.txt.
Line # | Content | Error
8 | Sitemap: | Invalid sitemap URL
9 | https://confidentpersonality.com/sitemap_index.xml | Unknown directive

Also, my feed and swcfpc=1 pages are still being crawled. For once, assign someone to help me or someone who can take the matter into his hands; otherwise I will have to leave this plugin.

Shreyas Goel
1 year, 8 months ago

Rank Math free

User-Agent: *
Disallow: /wp-admin/
Allow: /wp-admin/admin-ajax.php
Disallow: */feed/
Disallow: *swcfpc=1*
Disallow: /cdn-cgi/

Sitemap:
https://confidentpersonality.com/sitemap_index.xml

This is the robots.txt. Please solve all the problems by dedicating an agent for half an hour.
Jeremy
1 year, 8 months ago
Hello,

Apologies for the issue, but the robots.txt is still in the wrong format.

Please use these rules instead:
```
User-Agent: *
Disallow: /wp-admin/
Allow: /wp-admin/admin-ajax.php
Disallow: */feed/
Disallow: *swcfpc=1*
Disallow: /cdn-cgi/

Sitemap: https://confidentpersonality.com/sitemap_index.xml
```
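As a rough illustration of why the earlier version was flagged, the Python sketch below checks a robots.txt file line by line. It assumes a small set of known directives and a very simple URL test, so it is only an approximation of what a real validator does; the `check_robots` helper is hypothetical and not part of Rank Math or Google's tooling.

```python
KNOWN_DIRECTIVES = {"user-agent", "allow", "disallow", "sitemap"}

def check_robots(text: str) -> list[tuple[int, str]]:
    """Return (line_number, problem) pairs for lines that look malformed."""
    problems = []
    for number, raw in enumerate(text.splitlines(), start=1):
        line = raw.split("#", 1)[0].strip()  # drop comments and whitespace
        if not line:
            continue
        field, sep, value = line.partition(":")
        field, value = field.strip().lower(), value.strip()
        if not sep or field not in KNOWN_DIRECTIVES:
            problems.append((number, "Unknown directive"))
        elif field == "sitemap" and not value.startswith(("http://", "https://")):
            problems.append((number, "Invalid sitemap URL"))
    return problems

# The earlier file: the Sitemap directive and its URL are on separate lines.
broken = """User-Agent: *
Disallow: /wp-admin/
Allow: /wp-admin/admin-ajax.php
Disallow: */feed/
Disallow: *swcfpc=1*
Disallow: /cdn-cgi/

Sitemap:
https://confidentpersonality.com/sitemap_index.xml
"""

# The corrected file keeps the directive and the URL on one line.
fixed = broken.replace("Sitemap:\nhttps", "Sitemap: https")

print(check_robots(broken))  # [(8, 'Invalid sitemap URL'), (9, 'Unknown directive')]
print(check_robots(fixed))   # []
```

The two problems the sketch reports for the earlier file are on lines 8 and 9, the same lines listed in the error report quoted above.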
As for the swcfpc=1 pages getting indexed by Google, let’s try forcing them to noindex through their robots meta.

Please use this filter code for your website:
```
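add_filter('rank_math/frontend/robots', function ($robots) {
	// Read the requested URI; fall back to an empty string if it is not set.
	$uri = isset($_SERVER['REQUEST_URI']) ? $_SERVER['REQUEST_URI'] : '';
	// strpos() returns false when there is no match, so compare strictly.
	if (strpos($uri, 'swcfpc=') !== false) {
		$robots['index'] = 'noindex';
		$robots['follow'] = 'nofollow';
	}
	return $robots;
});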
```
You may refer to this guide on how to add filters to your website: https://rankmath.com/kb/wordpress-hooks-actions-filters/

Let us know how that goes. Looking forward to helping you.
Shreyas Goel
1 year, 8 months ago

Rank Math free

I corrected the code. Tell me one thing: if these pages are listed under "Crawled – currently not indexed", is that a problem? Also, when I check their URLs in the URL Inspection tool, it shows "Blocked by robots.txt".

Ike
1 year, 8 months ago

Hello,

We tested your site and can see that the robots.txt is now set up correctly and blocking those URLs. After Googlebot has crawled your site for a while it will no longer report it as crawled but not indexed.

The "Blocked by robots.txt" report in the URL Inspection tool indicates that it is now set up correctly.

Hope that helps, and please do not hesitate to let us know if you need our assistance with anything else.
