sitemap problems – not being updated, and other issues

#378093
Viewing 11 replies - 1 through 11 (of 11 total)
  • BTW, I’m a pro user (not free)

    Anas
    Rank Math business

    Hello,

    Thank you for contacting Rank Math, and sorry for any inconvenience caused.

    You have opened this ticket with a Free Rank Math account.

    Could you please share the email you used to purchase the subscription?

    For the issues,
    1. Both the URLs are accessible and set to index, however, they are not included in the sitemap.

    Can you please follow these steps?
    1. Flush the Sitemap cache by following this video screencast:
    https://i.rankmath.com/pipRDp

    2. Exclude the Sitemap files of the Rank Math plugin in your caching plugin. The cache could be via a plugin or from the server. For plugins or Cloudflare, please follow this article:
    https://rankmath.com/kb/exclude-sitemaps-from-caching/
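
One way to confirm the exclusion took effect is to inspect the response headers returned for the sitemap URL. The following is a minimal sketch, assuming the cache layer reports hits through the `X-LiteSpeed-Cache` (LiteSpeed) or `CF-Cache-Status` (Cloudflare) response headers; the function name is hypothetical and not part of Rank Math.

```python
def served_from_cache(headers):
    """Return True if the response headers indicate a cache hit.

    Assumption: hits are reported via X-LiteSpeed-Cache or CF-Cache-Status.
    """
    # Header names are case-insensitive, so normalise them first.
    lowered = {name.lower(): value.strip().lower() for name, value in headers.items()}
    litespeed = lowered.get("x-litespeed-cache", "")
    cloudflare = lowered.get("cf-cache-status", "")
    return litespeed.startswith("hit") or cloudflare == "hit"
```

The headers can be collected with any HTTP client (for example `curl -I` against `sitemap_index.xml`) and passed in as a dictionary. If the function still returns True after the exclusion rules are saved, the sitemap is probably still being served from cache.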

    2. Please replace the robots.txt file with the following:

    User-agent: *
    Disallow: /wp-admin/
    Allow: /wp-admin/admin-ajax.php
    Disallow: */?orderby=price*
    Disallow: */?front_page*
    Disallow: /feed/
    
    Sitemap: https://www.watersportswarehouse.co.za/sitemap_index.xml
    

    You can edit robots.txt with Rank Math:
    https://rankmath.com/kb/how-to-edit-robots-txt-with-rank-math/

    For Page with redirects, once you have fixed the redirects, submit the URLs for validation in the Search Console.

    Please note that crawling and indexing of the website depend on the authority and crawl budget of the website.

    The Discovered – currently not indexed status means that Google knows about these URLs, but they haven’t crawled (and therefore indexed) them yet.

    Here is what Google says about this status:

    Discovered – currently not indexed: The page was found by Google, but not crawled yet. Typically, Google tried to crawl the URL but the site was overloaded; therefore Google had to reschedule the crawl. This is why the last crawl date is empty on the report.

    You need to wait for Google to crawl and index those URLs.

    The Crawled – currently not indexed report indicates that the content is eligible to appear in Google’s index, but Google is electing not to include it.

    You can check this link for more information:
    https://rankmath.com/kb/crawled-currently-not-indexed/

    Hope that helps.

    Thank you.

    Hello,

    I have updated the sensitive data as requested. Can you please check further?

    Thank you.

    I seem to have lost my reply when adding extra sensitive data… I’ll update again..

    Hi

    1. Pro vs Free
    I bought Rankmath pro via this email
    s****@w************************

    2. Excluding the sitemap from cache

    I use LiteSpeed. I notice that the screenshot in your document and the text of your document are different. I intend to follow the text, which says I need to add this to LiteSpeed:
    /(.*)sitemap(.*).xml
    /(.*)sitemap.xsl
    /sitemap_index.xsl
    /sitemap_index.xml

    3. My .htaccess

    Mine is very complex. I’ve added it to sensitive data….

    Do I just add the code you suggest to the top of my .htaccess..??? I will update the htaccess directly from my host file system.

    cheers, Bruce

    Hello,

    1. Would you want us to merge the two accounts together? If so what is the preferred email address to keep using the PRO features?

    2 & 3. The sitemap seems to be updated now and it’s no longer returning those URLs with the query parameter.

    Please note that it is completely normal for a newly published post/page/website to take time before it gets crawled or indexed by Google. It depends on a lot of factors. Your posting frequency + the domain authority are just two of the many factors Google considers when indexing some new URL. Google assigns a crawl budget to your website depending on these factors (especially these two) and that has a direct effect on how soon or how late your content can get indexed.

    Moreover, Google has said multiple times that a good percentage of a website might never get indexed.

    It is normal for 20% of a website to not be indexed, as per Google’s John Mueller:

    https://www.searchenginejournal.com/google-not-indexing-site/416717/

    Finally, it all boils down to the quality of the content, and if your content improves Google’s overall search index.

    Please check this article where one of the Google employees mentioned:

    “If your site relies on manual index submission for normal content, you need to significantly improve your site. Search console does not fix your site, you need to do that yourself.” – JohnMu
    https://www.seroundtable.com/google-manually-index-site-quality-28942.html

    With that said, here’s a good article on how to ensure that your website gets indexed regularly:
    https://ahrefs.com/blog/google-index/

    Hope this helps clarify your doubts.

    Don’t hesitate to get in touch if you have any other questions.

    Hi Miguel

    Point 1.
    S****@w************************** is the owner who pays the bills.
    b**********@g******** is my email (personal) address. I’m the developer. You can merge, but please keep my account as the main email… We can always change that if I get someone else to support him.

    Point 2 & 3. The sitemap seems to be updated now and it’s no longer returning those URLs with the query parameter.

    Yes, I’ve just checked them and I also don’t see the funny “parameter URL’s”… BUT…. I’ve done nothing yet..? Should I still put those entries into .htaccess and into Litespeed settings????

    I’m guessing I should. Please have a look at my 2 queries and advise me though.

    My Previous question…
    “2. Excluding the sitemap from cache

    I use LiteSpeed. I notice that the screenshot in your document and the text of your document are different. I intend to follow the text, which says I need to add this to LiteSpeed:
    /(.*)sitemap(.*).xml
    /(.*)sitemap.xsl
    /sitemap_index.xsl
    /sitemap_index.xml

    3. My .htaccess

    Mine is very complex. I’ve added it to sensitive data….

    Do I just add the code you suggest to the top of my .htaccess..??? I will update the htaccess directly from my host file system.”

    Lastly. I checked all my sitemaps.

    1. product_sitemap.
    All looks good. as you say, no funny parameters.
    All new products seem to be in the sitemap.
    However, I’m not sure the funny parameters were in my sitemap. They are however in my Google Coverage tab under Google search console

    2. post & page sitemap
    seem ok

    3. Category (wordpress post category)
    looks ok
    4. PWB brand sitemap.
    Looks good. All Brands present, even new ones.
    5. Local sitemap.
    All good. Where would I update my phone # ???

    6. Product-Category Sitemap.
    Probably my most important sitemap, and the one with issues…
    Some missing URL’s
    https://www.watersportswarehouse.co.za/product-category/surf-gear/rash-vests/
    https://www.watersportswarehouse.co.za/product-category/hydrofoil/wake-foiling/
    https://www.watersportswarehouse.co.za/product-category/easter-special/

    Then all my top-level product categories are missing from the sitemap. None of them has products attached directly; they contain subcategories instead.
    https://www.watersportswarehouse.co.za/product-category/kitesurfing/
    https://www.watersportswarehouse.co.za/product-category/waterski/
    https://www.watersportswarehouse.co.za/product-category/wake/
    https://www.watersportswarehouse.co.za/product-category/inflatable-boat-tube/
    https://www.watersportswarehouse.co.za/product-category/hydrofoil/
    https://www.watersportswarehouse.co.za/product-category/surf-gear/
    But these are the main pages of my site….?????

    If I go into search console and do an inspect URL on these top level URL’s I find them on Google. However it does say that these entries were not found in a sitemap… Surely they should be in the sitemap…

    regards, Bruce
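
A check like the one described above (comparing a list of expected category URLs against what a sitemap actually lists) can be sketched as follows. This is a minimal example, assuming the sitemap follows the standard sitemaps.org `urlset` format; the function name and the sample XML are illustrative and not taken from the site's real sitemap.

```python
import xml.etree.ElementTree as ET

# Namespace used by standard sitemap files.
SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}

def missing_from_sitemap(sitemap_xml, expected_urls):
    """Return the expected URLs that have no <loc> entry in the sitemap."""
    root = ET.fromstring(sitemap_xml)
    listed = {
        loc.text.strip()
        for loc in root.findall(".//sm:loc", SITEMAP_NS)
        if loc.text
    }
    return [url for url in expected_urls if url not in listed]

# Sample data only: a hand-written fragment, not the site's real sitemap.
SAMPLE_SITEMAP = """<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://www.watersportswarehouse.co.za/product-category/hydrofoil/wing-foiling/</loc></url>
</urlset>"""

EXPECTED = [
    "https://www.watersportswarehouse.co.za/product-category/hydrofoil/wing-foiling/",
    "https://www.watersportswarehouse.co.za/product-category/hydrofoil/",
]

print(missing_from_sitemap(SAMPLE_SITEMAP, EXPECTED))
```

With the sample data, only the top-level `hydrofoil` category is reported as missing, which mirrors the situation described in point 6.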

    Hello,

    1. If you would like to keep receiving notifications on this account we should keep both active to prevent any issues.

    2. In the settings for Litespeed we recommend adding the code that we have in the text area as the image only serves as an example of the place and not the final code.

    3. The code should be added to the end of the .htaccess file to make sure it’s the last rule being parsed.

    5. To update the data shown in the Local sitemap, open each of your location pages created with Rank Math and change the details inside its LocalBusiness Schema.

    If you only have one location you can change that by going to WordPress Dashboard > Rank Math > Titles & Meta > Local SEO.

    6. The top-level categories are not in the sitemap because they are essentially empty and are only used to house all of the subcategories.

    To include any empty categories you should enable that option under WordPress Dashboard > Rank Math > Sitemap Settings > Product Categories.

    Hope this helps clarify your doubts.

    Don’t hesitate to get in touch if you have any other questions.

    Hi Miguel

    I’ve done most of your changes, just not the .htaccess ones. I still have queries on this…
    I’ve just looked at the Google Coverage Tab on Google search console, to try and see what entries from Google I’d like to remove…
    My simple logic is that I should exclude
    1. All Order by URL’s…
    2. All Filter by Attributes
    3. Various other entries I seem to find. Mostly these entries seem to be coming from Google Ads
    I’ll go through each scenario now.

    1. Sort orders on all URL’s
    Surely I would like to exclude all “sort orders” from my sitemap. So no order by Price, by Rating, by menu_order
    So can I rather insert the below code to exclude all of these?
    /?orderby=*

    2. Different product attributes that I use to filter
    Examples are
    Gender= male, gender=female
    Size =…..
    Ride Capacity=
    These all end up with URL’s that I see on Google search console like
    /?filter_ride-capacity=*
    Or
    /?filter_pwb_list=*

    Can I exclude all of these with this

    /?filter*

    3. Other attribute filters
    They all look like
    /?attribute_pa_size=xxxl
    /?attribute_pa_colour=green

    I can presumably use this to exclude
    /?attribute_pa*

    4. How about the various different page #? Presumably we don’t exclude these as there are products on each page.

    5. Various other

    https://www.watersportswarehouse.co.za/?stock_cd=656 and others

    This to exclude these
    /?stock_cd=*

    6. Other parameters
    I see URL’s like
    https://www.watersportswarehouse.co.za/product-category/hydrofoil/wing-foiling/?gclid=EAIaIQobChMI8LSQ44Cy6wIVNMW7CB2a-AS6EAEYASAAEgJSZPD_BwE

    This to exclude these
    /?gclid=*

    So, Is this a better .htaccess file…

    User-agent: *
    Disallow: /wp-admin/
    Allow: /wp-admin/admin-ajax.php
    Disallow: */?dTribesID*
    Disallow: */?orderby=*
    Disallow: */?filter*
    Disallow: */?attribute_pa*
    Disallow: */?stock_cd=*
    Disallow: */?gclid=*
    Disallow: */?front_page*
    Disallow: */feed/

    Sitemap: https://www.watersportswarehouse.co.za/sitemap_index.xml
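
Before deploying rules like these, it can help to test which URLs they would block. The sketch below is a simplified approximation of Google-style pattern matching, where `*` matches any run of characters and a trailing `$` anchors the end of the URL. It only evaluates `Disallow` patterns and ignores `Allow` precedence, so treat it as a rough check rather than a full robots.txt parser. The function names are hypothetical.

```python
import re
from urllib.parse import urlsplit

# Disallow patterns copied from the proposed rules above.
RULES = [
    "/wp-admin/",
    "*/?dTribesID*",
    "*/?orderby=*",
    "*/?filter*",
    "*/?attribute_pa*",
    "*/?stock_cd=*",
    "*/?gclid=*",
    "*/?front_page*",
    "*/feed/",
]

def robots_pattern_to_regex(pattern):
    """Translate a wildcard robots.txt pattern into a compiled regex."""
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    # '*' matches any run of characters; everything else is literal.
    body = ".*".join(re.escape(part) for part in pattern.split("*"))
    return re.compile(body + ("$" if anchored else ""))

def is_disallowed(url, disallow_patterns):
    """Return True if the URL's path and query match any Disallow pattern."""
    parts = urlsplit(url)
    target = parts.path + ("?" + parts.query if parts.query else "")
    # Patterns are matched from the start of the path, hence .match().
    return any(robots_pattern_to_regex(p).match(target) for p in disallow_patterns)
```

One thing this makes visible: a pattern such as `*/?orderby=*` only matches when `orderby` is the first parameter directly after `/?`. A URL like `/shop/?paged=2&orderby=price` (a made-up example) would not be caught by it.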

    Hello,

    The first thing to note is that the only code that should be added to the .htaccess file is the one mentioned in the caching tutorial here: https://rankmath.com/kb/exclude-sitemaps-from-caching/#htaccess

    That code should only be added there if all the other options via the plugins don’t work on your website to generate the sitemap dynamically.

    The rules you mentioned throughout the reply should be added in the robots.txt file as those rules belong there and must be added before the last entry for the sitemap.

    Regarding the rules, those are correct to prevent Google from crawling those pages but the ones already present on SERPs will remain there unless you request their removal with the help of the following article: https://support.google.com/webmasters/answer/9689846?hl=en

    Last but not least, those URLs are no longer in the sitemap, and they may have been discovered through internal links on your website in the first place, since Google discovers URLs through internal linking as well as through sitemaps.

    For example, if you reference those pages with query parameters anywhere on your pages and Google sees those it will crawl them and try to index them.
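
To find such references, the links collected from a page can be scanned for the query parameters discussed in this thread. The following is a minimal sketch, assuming the links have already been extracted into a list; the function name and the prefix list are illustrative choices based on the parameters mentioned above.

```python
from urllib.parse import urlsplit, parse_qsl

# Parameter name prefixes taken from the URLs discussed in this thread.
WATCHED_PREFIXES = ("orderby", "filter", "attribute_pa", "stock_cd", "gclid")

def links_with_watched_params(links, prefixes=WATCHED_PREFIXES):
    """Return the links whose query string uses any watched parameter."""
    flagged = []
    for link in links:
        query = urlsplit(link).query
        names = [name for name, _ in parse_qsl(query, keep_blank_values=True)]
        # str.startswith accepts a tuple, so one call covers every prefix.
        if any(name.startswith(prefixes) for name in names):
            flagged.append(link)
    return flagged
```

Any link this returns is a candidate for cleanup (for example, linking to the plain category URL instead), so that Google has fewer parameterised URLs to discover.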

    Hope this helps clarify your doubts.

    Don’t hesitate to get in touch if you have any other questions.

    Thanks for pointing out my error re robots.txt and .htaccess… I got that wrong.

    My sitemaps look 100%.
    My inclination is to leave things now.. i.e. not put those rules in robots.txt, and not tell Google to remove these entries…

    Should I resubmit my new (well, updated) sitemap, or wait until Google reads it again???

    cheers, Bruce

    Hello,

    Usually, re-submitting the sitemap signals to Google that it should be read again, which can speed up the process initially.

    You can submit the sitemap to GSC by following the steps here: https://rankmath.com/kb/submit-sitemap-to-google/

    Don’t hesitate to get in touch if you have any other questions.

    Hello,

    Since we did not hear back from you for 15 days, we are assuming that you found the solution. We are closing this support ticket.

    If you still need assistance or any other help, please feel free to open a new support ticket, and we will be more than happy to assist.

    Thank you.


The ticket ‘sitemap problems – not being updated, and other issues’ is closed to new replies.