Around 50 pages with feed and swcfpc=1 in their URLs are not being indexed. They shouldn’t be crawled by Google, and I don’t want them to be indexed. So why are they being crawled? Please solve this issue.
Hello,
I understand your concern about certain pages with "feed" and "swcfpc=1" parameters not being indexed by Google. If these pages shouldn’t be crawled and indexed, there are a few steps you can take to address this issue:
- Robots.txt File: You can block these specific pages from being crawled by adding directives to your website’s robots.txt file. This file tells search engine crawlers which pages or files they should not crawl.
- Noindex Meta Tag: You can add a noindex meta tag to the pages you don’t want to be indexed. This tag tells search engines not to include those pages in their index.
- Canonical Tags: If these pages are duplicates or variations of other pages on your site, you can use canonical tags to specify the preferred version of the content that should be indexed.
- Content Audit: Perform a content audit to identify low-quality or duplicate content on your website. Remove or improve these pages to enhance the overall quality of your site.
If you need further assistance in implementing these solutions or have any other questions, please let us know by assigning this ticket to a human agent using the available buttons. Thank you for reaching out, and we’re here to help you resolve this issue.
Hello,
Thanks for contacting us, and sorry for any inconvenience that might have been caused due to that.
Feed URLs are not created for humans but for RSS feed crawlers and readers. They are only basic code versions of your actual content pages.
Those are not meant to be indexed, and you most likely won’t want them indexed either, since Google doesn’t like them and would most likely not show them to users anyway.
In this case, please add these rules to your robots.txt to disallow them from being crawled:
User-Agent: *
Disallow: */feed/
Disallow: *swcfpc=1*
Here’s how you can edit the robots.txt file using Rank Math:
https://rankmath.com/kb/add-sitemaps-to-robots-txt/#num-2-2-navigate-to-edit-robots-txt
This should save your crawl budget, and after the next few crawls, the warnings should be removed from GSC.
Hope that helps, and please do not hesitate to let us know if you need our assistance with anything else.
Thanks.
As you suggested, I changed the default robots.txt file and added this code:
User-Agent: *
Disallow: /feed/
Disallow: /swcfpc=1/
Disallow: /wp-admin/
Disallow: /wp-admin/admin-ajax.php
https://confidentpersonality.com/sitemap_index.xm
However, now Lighthouse is showing this:
robots.txt is not valid 1 error found
If your robots.txt file is malformed, crawlers may not be able to understand how you want your website to be crawled or indexed. Learn more about robots.txt.
Hello,
Your robots.txt is already working fine.
This can happen when you use a content delivery network (CDN) like Cloudflare since Rank Math does not create a physical robots.txt file on your site.
Instead, the file is created and displayed when a user visits your robots.txt URL.
So, you can safely ignore this error if your robots.txt file is already available.
Here’s a link for more information:
https://rankmath.com/kb/fix-common-robots-txt-issues/#robots-txt-is-not-valid-in-pagespeed-insights
Looking forward to helping you.
Hello,
You are correct, your robots.txt is not blocking the feed and swcfpc=1 URLs fully and correctly:
The disallow rule for them should be as we shared earlier:
Disallow: */feed/
Disallow: *swcfpc=1*
Also, the syntax for sitemap in the robots.txt is incorrect; it should be:
Sitemap: https://confidentpersonality.com/sitemap_index.xml
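To illustrate why the leading `*` matters, here is a minimal Python sketch of Google-style robots.txt pattern matching, where `*` matches any run of characters, a trailing `$` anchors the end, and otherwise a rule only has to match the beginning of the URL path. The function name `rule_matches` and the sample paths are made up for illustration; this is a simplified model, not the code Google or Rank Math actually runs.

```python
import re

def rule_matches(pattern: str, path: str) -> bool:
    """Return True if a robots.txt path pattern matches a URL path.

    Simplified model: '*' matches any run of characters, a trailing '$'
    anchors the end of the URL, and otherwise the pattern only has to
    match a prefix of the path (including the query string).
    """
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    # Escape each literal piece, then join the pieces with '.*' for each '*'.
    regex = ".*".join(re.escape(part) for part in pattern.split("*"))
    if anchored:
        regex += "$"
    # re.match only matches at the start of the path, like a robots.txt rule.
    return re.match(regex, path) is not None

# Hypothetical URLs of the kind reported in this ticket.
feed_url = "/some-post/feed/"
cache_url = "/some-post/?swcfpc=1"

print(rule_matches("/feed/", feed_url))       # False: only paths starting with /feed/ match
print(rule_matches("*/feed/", feed_url))      # True
print(rule_matches("/swcfpc=1/", cache_url))  # False: no path starts with /swcfpc=1/
print(rule_matches("*swcfpc=1*", cache_url))  # True
```

In this model, `Disallow: /feed/` only covers the site-wide feed at the root, while `Disallow: */feed/` also covers the per-post feed URLs.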
Hope that helps, and please do not hesitate to let us know if you need our assistance with anything else.
Rank Math, why are you not able to help me fix my problem? How many days should I suffer?
Confidentpersonality.com
Google Search Console is displaying this:
If your robots.txt file is malformed, crawlers may not be able to understand how you want your website to be crawled or indexed. Learn more about robots.txt.
Line # | Content | Error
8 | Sitemap: | Invalid sitemap URL
9 | https://confidentpersonality.com/sitemap_index.xml | Unknown directive
Also, my feed and swcfpc=1 pages are still being crawled. For once, assign someone to help me, or someone who can take the matter into his own hands; otherwise I will have to leave this plugin.
User-Agent: *
Disallow: /wp-admin/
Allow: /wp-admin/admin-ajax.php
Disallow: */feed/
Disallow: *swcfpc=1*
Disallow: /cdn-cgi/
Sitemap:
https://confidentpersonality.com/sitemap_index.xml
This is the robots.txt. Please make sure all the problems get solved by dedicating an agent for half an hour.
Hello,
Apologies for the issue but the robots.txt is still in the wrong format.
Please use these rules instead, with each directive on its own line and the sitemap URL on the same line as Sitemap:
User-Agent: *
Disallow: /wp-admin/
Allow: /wp-admin/admin-ajax.php
Disallow: */feed/
Disallow: *swcfpc=1*
Disallow: /cdn-cgi/
Sitemap: https://confidentpersonality.com/sitemap_index.xml
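The two errors Search Console reported earlier in this ticket come from the Sitemap directive being split across two lines. As a rough illustration, here is a small Python line checker, assuming every robots.txt line must be a single `field: value` pair. The function `lint_robots`, the `KNOWN_FIELDS` set, and the shortened sample files are hypothetical, and the messages are only loosely modeled on the report quoted above, not on the real validator.

```python
from __future__ import annotations

# Assumption: only these four fields are treated as valid in this sketch.
KNOWN_FIELDS = {"user-agent", "allow", "disallow", "sitemap"}

def lint_robots(text: str) -> list[tuple[int, str]]:
    """Return (line number, problem) pairs for a robots.txt body."""
    problems = []
    for number, raw in enumerate(text.splitlines(), start=1):
        line = raw.split("#", 1)[0].strip()  # drop comments and padding
        if not line:
            continue
        field, sep, value = line.partition(":")
        field, value = field.strip().lower(), value.strip()
        if not sep or field not in KNOWN_FIELDS:
            # A bare URL parses as field "https", which is not a directive.
            problems.append((number, "unknown directive"))
        elif field == "sitemap" and not value.startswith(("http://", "https://")):
            problems.append((number, "invalid sitemap URL"))
    return problems

# Shortened versions of the broken and corrected files from this ticket.
broken = """User-Agent: *
Disallow: /cdn-cgi/
Sitemap:
https://confidentpersonality.com/sitemap_index.xml"""

fixed = """User-Agent: *
Disallow: /cdn-cgi/
Sitemap: https://confidentpersonality.com/sitemap_index.xml"""

print(lint_robots(broken))  # [(3, 'invalid sitemap URL'), (4, 'unknown directive')]
print(lint_robots(fixed))   # []
```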
As for the issue with the swcfpc=1 pages getting indexed by Google, let’s try forcing them to noindex in their robots meta.
Please use this filter code for your website:
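add_filter('rank_math/frontend/robots', function ($robots) {
    // Build the full URL of the current request.
    $url = home_url($_SERVER['REQUEST_URI'] ?? '');
    // strpos() returns false when there is no match, so compare strictly.
    if (strpos($url, 'swcfpc=') !== false) {
        $robots['index'] = 'noindex';
        $robots['follow'] = 'nofollow';
    }
    return $robots;
});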
You may refer to this guide on how to add filters to your website: https://rankmath.com/kb/wordpress-hooks-actions-filters/
Let us know how that goes. Looking forward to helping you.
I corrected the code. Tell me one thing: if these pages are listed under "Crawled – currently not indexed", is that a problem? Also, when I check their URLs in the URL Inspection tool, it shows "Blocked by robots.txt".
Hello,
We tested your site and can see that the robots.txt is now set up correctly and blocking those URLs. After Googlebot has crawled your site for a while it will no longer report it as crawled but not indexed.
The "Blocked by robots.txt" status in the URL Inspection tool indicates that it is now set up correctly.
Hope that helps, and please do not hesitate to let us know if you need our assistance with anything else.