Pages with feed and swcfpc=1 URLs are not being indexed by Google.

#821007
  • Resolved Shreyas Goel
    Rank Math free

    Around 50 pages with feed and swcfpc=1 URLs are not being indexed. They shouldn't be crawled by Google, and I don't want them to be indexed. So why are they being crawled? Please solve this issue.

Viewing 11 replies - 1 through 11 (of 11 total)
  • Hello,

    I understand your concern about certain URLs containing "feed" or the "swcfpc=1" parameter not being indexed by Google. If these pages shouldn't be crawled or indexed, there are a few steps you can take to address this issue:

    1. Robots.txt File: You can block these specific pages from being crawled by adding directives to your website’s robots.txt file. This file tells search engine crawlers which pages or files they should not crawl.

    2. Noindex Meta Tag: You can add a noindex meta tag to the pages you don’t want to be indexed. This tag tells search engines not to include those pages in their index.

    3. Canonical Tags: If these pages are duplicates or variations of other pages on your site, you can use canonical tags to specify the preferred version of the content that should be indexed.

    4. Content Audit: Perform a content audit to identify low-quality or duplicate content on your website. Remove or improve these pages to enhance the overall quality of your site.
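    For reference, this is roughly what the markup from points 2 and 3 looks like. The sketch below is a hypothetical PHP helper, build_head_tags(), that only assembles the tags as strings; on a WordPress site running Rank Math, the plugin normally outputs the robots meta and canonical tags itself, and the example URL is made up.

```php
<?php
// Hypothetical helper (not part of Rank Math or WordPress): builds the
// robots meta tag (point 2) and canonical link (point 3) as plain HTML.
function build_head_tags(bool $noindex, ?string $canonicalUrl): string
{
    $tags = [];
    if ($noindex) {
        // Asks search engines not to include this page in their index.
        $tags[] = '<meta name="robots" content="noindex">';
    }
    if ($canonicalUrl !== null) {
        // Points search engines at the preferred version of the content.
        $href = htmlspecialchars($canonicalUrl, ENT_QUOTES);
        $tags[] = '<link rel="canonical" href="' . $href . '">';
    }
    return implode("\n", $tags);
}

echo build_head_tags(true, 'https://example.com/original-post/'), "\n";
```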

    If you need further assistance in implementing these solutions or have any other questions, please let us know by assigning this ticket to a human agent using the available buttons. Thank you for reaching out, and we’re here to help you resolve this issue.

    Sources:
    Why Google Hasn’t Indexed Your Post, Page, and/or Website? » Rank Math
    How to Fix "Crawled – Currently Not Indexed" Error in Google Search Console » Rank Math

    John
    Rank Math free

    Hello,

    Thanks for contacting us, and sorry for any inconvenience this may have caused.

    Feed URLs are not created for humans but for RSS feed crawlers and readers. They are only stripped-down XML versions of your actual content pages.

    They are not meant to be indexed, and you most likely don't want them indexed either: Google generally does not show feed URLs to searchers anyway.

    In this case, please add these rules to your robots.txt to disallow them from being crawled:

    User-Agent: *
    Disallow: */feed/
    Disallow: *swcfpc=1*

    Here’s how you can edit the robots.txt file using Rank Math:
    https://rankmath.com/kb/add-sitemaps-to-robots-txt/#num-2-2-navigate-to-edit-robots-txt

    This should save crawl budget, and after the next few crawls the warnings should disappear from Google Search Console (GSC).

    Hope that helps, and please do not hesitate to let us know if you need our assistance with anything else.

    Thanks.

    As you suggested, I changed the default robots.txt file and added this code:
    User-Agent: *
    Disallow: /feed/
    Disallow: /swcfpc=1/
    Disallow: /wp-admin/
    Disallow: /wp-admin/admin-ajax.php

    https://confidentpersonality.com/sitemap_index.xm

    However, Lighthouse is now showing this:

    robots.txt is not valid: 1 error found
    If your robots.txt file is malformed, crawlers may not be able to understand how you want your website to be crawled or indexed. Learn more about robots.txt.

    Hello,

    Your robots.txt is already working fine.

    This can happen when you use a content delivery network (CDN) like Cloudflare, since Rank Math does not create a physical robots.txt file on your site.

    Instead, the file is created and displayed when a user visits your robots.txt URL.

    So, you can safely ignore this error if your robots.txt file is already available.

    Here’s a link for more information:
    https://rankmath.com/kb/fix-common-robots-txt-issues/#robots-txt-is-not-valid-in-pagespeed-insights

    Looking forward to helping you.

    Hello,

    You are correct: your robots.txt is not blocking the feed and swcfpc=1 URLs fully and correctly:
    https://imgur.com/SQVXLdl

    The Disallow rules for them should be the ones we shared earlier:

    Disallow: */feed/
    Disallow: *swcfpc=1*
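    To see why the earlier rules "/feed/" and "/swcfpc=1/" miss these URLs while the two rules above catch them, here is a minimal PHP sketch of Google-style robots.txt matching. It assumes Google's documented semantics (a rule is matched from the start of the URL path plus query string, "*" matches any run of characters, and a trailing "$" anchors the end of the URL). robots_rule_matches() is a hypothetical helper, not part of Rank Math or WordPress, and the example paths are made up.

```php
<?php
// Hypothetical helper: does a robots.txt path rule match a URL path?
// Assumes Google-style semantics: prefix match from the start of the path,
// '*' = any sequence of characters, trailing '$' = end of URL.
function robots_rule_matches(string $rule, string $path): bool
{
    $anchored = substr($rule, -1) === '$';
    if ($anchored) {
        $rule = substr($rule, 0, -1);
    }
    // Escape regex metacharacters, then turn each escaped '*' into '.*'.
    $regex = str_replace('\*', '.*', preg_quote($rule, '#'));
    return preg_match('#^' . $regex . ($anchored ? '$' : '') . '#', $path) === 1;
}

$paths = ['/feed/', '/my-post/feed/', '/my-post/?swcfpc=1'];
foreach (['/feed/', '/swcfpc=1/', '*/feed/', '*swcfpc=1*'] as $rule) {
    foreach ($paths as $path) {
        $result = robots_rule_matches($rule, $path) ? 'blocked' : 'not matched';
        printf("%-12s %-20s %s\n", $rule, $path, $result);
    }
}
```

    The key difference is the leading "*": without it, "/feed/" only matches the site-wide feed at the root, and "/swcfpc=1/" never matches a query string such as "?swcfpc=1".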

    Also, the Sitemap line in your robots.txt has the wrong syntax; the directive and the full URL belong on one line:
    Sitemap: https://confidentpersonality.com/sitemap_index.xml
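    For illustration, here is a minimal PHP sketch of a line-by-line check that flags a "Sitemap:" line with no URL on it, and flags a URL standing on its own line as an unknown directive. It assumes a simplified line format ("field: value", comments after "#") and only knows the four fields used in this thread; lint_robots_txt() is hypothetical and is not how Google or Rank Math actually validate the file.

```php
<?php
// Hypothetical mini-linter (simplified; only knows four fields).
// Returns a list of ['line' => int, 'error' => string] entries.
function lint_robots_txt(string $contents): array
{
    $known = ['user-agent', 'allow', 'disallow', 'sitemap'];
    $errors = [];
    foreach (preg_split('/\R/', $contents) as $i => $line) {
        $line = trim(preg_replace('/#.*/', '', $line)); // drop comments
        if ($line === '') {
            continue; // blank lines are fine
        }
        // Split "field: value" on the first colon only.
        $parts = explode(':', $line, 2);
        $field = strtolower(trim($parts[0]));
        $value = isset($parts[1]) ? trim($parts[1]) : null;
        if ($value === null || !in_array($field, $known, true)) {
            // e.g. a bare URL parses as field "https", which is not a directive
            $errors[] = ['line' => $i + 1, 'error' => 'Unknown directive'];
        } elseif ($field === 'sitemap' && filter_var($value, FILTER_VALIDATE_URL) === false) {
            // "Sitemap:" must be followed by an absolute URL on the same line
            $errors[] = ['line' => $i + 1, 'error' => 'Invalid sitemap URL'];
        }
    }
    return $errors;
}

$robotsTxt = "User-Agent: *\nDisallow: */feed/\n\nSitemap:\nhttps://confidentpersonality.com/sitemap_index.xml\n";
foreach (lint_robots_txt($robotsTxt) as $e) {
    printf("line %d: %s\n", $e['line'], $e['error']);
}
```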

    Hope that helps, and please do not hesitate to let us know if you need our assistance with anything else.

    Why is Rank Math not able to help me fix my problem? How many more days should I suffer?
    Confidentpersonality.com
    Google Search Console is displaying this:
    If your robots.txt file is malformed, crawlers may not be able to understand how you want your website to be crawled or indexed. Learn more about robots.txt.

    Line #   Content                                              Error
    8        Sitemap:                                             Invalid sitemap URL
    9        https://confidentpersonality.com/sitemap_index.xml   Unknown directive

    Also, my feed and swcfpc=1 pages are still being crawled. For once, assign someone who can take the matter into their own hands; otherwise I will have to leave this plugin.

    User-Agent: *
    Disallow: /wp-admin/
    Allow: /wp-admin/admin-ajax.php
    Disallow: */feed/
    Disallow: *swcfpc=1*
    Disallow: /cdn-cgi/

    Sitemap:
    https://confidentpersonality.com/sitemap_index.xml

    This is my robots.txt. Please make sure all the problems get solved by dedicating an agent to this for half an hour.

    Hello,

    Apologies for the issue, but the robots.txt is still in the wrong format: the sitemap URL must be on the same line as "Sitemap:".

    Please use these rules instead:

    User-Agent: *
    Disallow: /wp-admin/
    Allow: /wp-admin/admin-ajax.php
    Disallow: */feed/
    Disallow: *swcfpc=1*
    Disallow: /cdn-cgi/
    
    Sitemap: https://confidentpersonality.com/sitemap_index.xml
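    The Allow line in these rules works because, when an Allow and a Disallow rule both match a URL, Google applies the most specific (longest) matching rule, and on a tie the less restrictive Allow wins. Below is a minimal PHP sketch of that precedence. It assumes plain prefix rules only (the "*" and "$" wildcards are deliberately left out), and is_crawl_allowed() is a hypothetical helper, not a Rank Math or WordPress function.

```php
<?php
// Hypothetical helper: decides Allow vs Disallow by longest matching prefix.
// Simplification: plain prefix rules only, no '*' or '$' wildcards.
function is_crawl_allowed(array $rules, string $path): bool
{
    $best = null; // [matched length, isAllow]
    foreach ($rules as [$type, $prefix]) {
        if ($prefix === '' || strncmp($path, $prefix, strlen($prefix)) !== 0) {
            continue; // empty rule or no prefix match
        }
        $candidate = [strlen($prefix), $type === 'allow'];
        // A longer match wins; on equal length, an Allow rule replaces a Disallow.
        if ($best === null || $candidate[0] > $best[0]
            || ($candidate[0] === $best[0] && $candidate[1])) {
            $best = $candidate;
        }
    }
    return $best === null ? true : $best[1]; // no matching rule: crawling allowed
}

$rules = [
    ['disallow', '/wp-admin/'],
    ['allow', '/wp-admin/admin-ajax.php'],
];
var_dump(is_crawl_allowed($rules, '/wp-admin/admin-ajax.php')); // bool(true): longer Allow rule wins
var_dump(is_crawl_allowed($rules, '/wp-admin/options.php'));    // bool(false): blocked by /wp-admin/
```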

    As for the swcfpc=1 pages getting indexed by Google, let's try forcing them to noindex through their robots meta tag.

    Please use this filter code for your website:

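    add_filter('rank_math/frontend/robots', function ($robots) {
    	// Full URL of the current request, including the query string.
    	$url = home_url($_SERVER['REQUEST_URI']);
    	// strpos() returns false (not 0) when there is no match, so compare strictly.
    	if (strpos($url, 'swcfpc=') !== false) {
    		// Force noindex, nofollow on swcfpc URLs.
    		$robots["index"] = 'noindex';
    		$robots["follow"] = 'nofollow';
    	}
    	return $robots;
    });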
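    The filter above checks for the text "swcfpc=" anywhere in the URL, which would also match, for example, a search query containing "swcfpc=1". If that matters, a stricter variant can parse the query string and look for an actual swcfpc parameter. This is a hypothetical helper (has_swcfpc_param() is not a Rank Math function) built on PHP's standard parse_url() and parse_str(). One caveat: Google can only read a noindex robots meta tag on URLs it is allowed to crawl, so on URLs already blocked in robots.txt the noindex may never be seen.

```php
<?php
// Hypothetical helper (not a Rank Math or WordPress function): true only when
// the request URI carries a real "swcfpc" query parameter.
function has_swcfpc_param(string $requestUri): bool
{
    // parse_url() with PHP_URL_QUERY returns the part after "?", or null if absent.
    $query = parse_url($requestUri, PHP_URL_QUERY);
    if (!is_string($query) || $query === '') {
        return false;
    }
    // parse_str() decodes "a=1&b=2" into an array of parameters.
    parse_str($query, $params);
    return array_key_exists('swcfpc', $params);
}

// Inside the rank_math/frontend/robots callback, this could replace the
// strpos() test, e.g.: if (has_swcfpc_param($_SERVER['REQUEST_URI'])) { ... }
var_dump(has_swcfpc_param('/my-post/?swcfpc=1'));  // bool(true)
var_dump(has_swcfpc_param('/?s=swcfpc=1'));        // bool(false): only the "s" parameter exists
```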

    You may refer to this guide on how to add filters to your website: https://rankmath.com/kb/wordpress-hooks-actions-filters/

    Let us know how that goes. Looking forward to helping you.

    I corrected the code. Tell me one thing: if these pages stay under "Crawled but not indexed", is that a problem? Also, when I check their URLs in the URL Inspection tool, it shows "Blocked by robots.txt".

    Hello,

    We tested your site and can see that the robots.txt is now set up correctly and blocking those URLs. Once Googlebot has recrawled your site for a while, it will no longer report them as crawled but not indexed.

    The "Blocked by robots.txt" status in the URL Inspection tool confirms that the rules are now set up correctly.

    Hope that helps, and please do not hesitate to let us know if you need our assistance with anything else.
