Issue with Post Sitemap Splitting, Duplicate URLs, and XML Error

#1008367
  • Resolved Aditi Negi
    Rank Math free

    Hi RankMath Team,

    We’re facing an issue with our blog post sitemap. Earlier, each sitemap (under /post-sitemap.xml) used to contain around 500 URLs. Recently, we started getting the following error when opening the sitemap:

    “error on line 2 at column 6: XML declaration allowed only at the start of the document.”

    We tried the usual fixes like removing whitespace, but the problem persisted. Eventually, we realized the issue was due to sitemap file size limits.

    To fix it, we first reduced the number of URLs per sitemap to 200, but the error was still there. Finally, at 100 URLs per sitemap, the error disappeared.
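    A note on the parser error quoted above: XML parsers typically report it when something precedes the XML declaration, or when the declaration appears a second time in the body. The sketch below is one possible way to inspect the raw bytes of a saved sitemap response. The helper name describe_prefix is hypothetical and is not part of Rank Math or WordPress.

```python
XML_DECL = b"<?xml"
UTF8_BOM = b"\xef\xbb\xbf"


def describe_prefix(body: bytes) -> str:
    """Describe what, if anything, precedes the XML declaration in a response body."""
    # A leading UTF-8 byte order mark is permitted before the declaration, so skip it.
    if body.startswith(UTF8_BOM):
        body = body[len(UTF8_BOM):]

    index = body.find(XML_DECL)
    if index == -1:
        return "no XML declaration found"
    if body.count(XML_DECL) > 1:
        return "more than one XML declaration in the body"
    if index == 0:
        return "declaration is at the start"

    prefix = body[:index]
    if prefix.strip() == b"":
        return f"{len(prefix)} whitespace byte(s) before the declaration"
    # Anything else is stray output, for example a PHP notice printed before the XML.
    return f"{len(prefix)} byte(s) of other output before the declaration: {prefix[:40]!r}"
```

    To try it, save the sitemap response to a file, read it in binary mode, and pass the bytes to describe_prefix. The result only narrows down where stray output sits; it does not say which plugin or theme file produced it.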

    Now we have 7 paginated blog sitemaps (like /post-sitemap1.xml, /post-sitemap2.xml, etc.), but I noticed some blog URLs are appearing in multiple sitemaps.

    Here are my main concerns:

    1. Is there any way to fit all 671 blog URLs into a single post sitemap?

    2. If not, will the presence of duplicate URLs across multiple post sitemaps cause any crawling or indexing issues?

    3. What’s the recommended way to resolve this and ensure proper sitemap structure without duplicate entries?

Viewing 3 replies - 1 through 3 (of 3 total)
  • Hello,

    We apologize for the issue you are facing with your blog post sitemap. Thank you for providing detailed information about the problem.

    Regarding your concerns:

    1. Fitting All URLs into a Single Post Sitemap: The sitemap protocol allows up to 50,000 URLs per sitemap file, but it’s generally recommended to keep the number much lower (around 200) to avoid server load issues. Unfortunately, Rank Math does not support combining all URLs into a single sitemap when their number exceeds the configured per-sitemap limit. Therefore, you will need to maintain multiple sitemaps as you currently have.

    2. Duplicate URLs Across Multiple Post Sitemaps: Having duplicate URLs across multiple sitemaps should not cause significant crawling or indexing issues, as search engines are designed to handle such scenarios. However, it is best practice to avoid duplicates to ensure that search engines can efficiently crawl your site without confusion.

    3. Recommended Way to Resolve This: To ensure proper sitemap structure without duplicate entries, consider the following steps:

      • Check for Exclusions: Make sure no posts are unintentionally excluded in the sitemap settings. You can check this by going to Rank Math SEO → Sitemap Settings and reviewing the options for each post type.
      • Review Custom Canonicals: Posts with a custom canonical URL may not appear in the sitemap. Ensure that all posts have the correct canonical settings.
      • Flush Permalinks: Sometimes, flushing permalinks can help resolve issues with sitemap generation. You can do this by navigating to Settings → Permalinks in your WordPress admin area and clicking “Save Changes.”
      • Check for Plugin Conflicts: Use the Health Check plugin to identify any potential conflicts with other plugins that might be affecting your sitemap.

    If you continue to experience issues or if there are specific URLs that are duplicating, please reply to this ticket with your site URL so we can assist you further.
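    To identify which specific URLs are duplicating before replying, a short script along the following lines may help. This is a minimal sketch, assuming each paginated sitemap has already been downloaded as raw bytes. The function names and the example.com URLs in the usage note are illustrative only and are not part of Rank Math.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# Namespace used by <urlset>, <url> and <loc> in the sitemap protocol.
SITEMAP_NS = "{http://www.sitemaps.org/schemas/sitemap/0.9}"


def extract_urls(sitemap_xml: bytes) -> list[str]:
    """Return the page URLs listed in one sitemap file."""
    root = ET.fromstring(sitemap_xml)
    # Only sitemap-namespace <loc> elements are read, so image entries are ignored.
    return [loc.text.strip() for loc in root.iter(f"{SITEMAP_NS}loc") if loc.text]


def find_duplicates(sitemaps: dict[str, bytes]) -> dict[str, list[str]]:
    """Map each URL that appears in more than one sitemap to the files listing it."""
    seen = defaultdict(list)
    for name, xml_bytes in sitemaps.items():
        # set() so a URL repeated inside one file is counted once for that file.
        for url in set(extract_urls(xml_bytes)):
            seen[url].append(name)
    return {url: sorted(names) for url, names in seen.items() if len(names) > 1}
```

    Calling find_duplicates with a dictionary such as {"post-sitemap1.xml": first_bytes, "post-sitemap2.xml": second_bytes} returns only the URLs listed in more than one file, which is the list worth including in a reply.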

    Thank you for raising this concern, and we hope this helps resolve your sitemap issues!

    Sources:
    Rank Math Sitemap Issues and Their Fixes » Rank Math

    Hi,
    Since the recommended limit is around 200 URLs per sitemap, and we currently have 671 blog posts live, I understand that having multiple post sitemaps is expected.

    However, I wanted to ask:
    Does having duplicate URLs across multiple post sitemaps cause any issues with crawl budget or have any negative SEO implications?
    If yes, what’s the best way to resolve this and avoid duplication while maintaining multiple sitemaps?
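    As a quick sanity check on the numbers in this thread, the expected count of paginated sitemaps follows from a ceiling division. The sketch below is only arithmetic; sitemap_pages is a hypothetical helper name.

```python
def sitemap_pages(total_urls: int, urls_per_sitemap: int) -> int:
    """Number of sitemap files needed when each file holds at most urls_per_sitemap URLs."""
    if urls_per_sitemap <= 0:
        raise ValueError("urls_per_sitemap must be positive")
    # Integer ceiling division: a partially filled last file still counts as a file.
    return (total_urls + urls_per_sitemap - 1) // urls_per_sitemap


for per_sitemap in (100, 200, 500):
    print(f"{per_sitemap} per sitemap -> {sitemap_pages(671, per_sitemap)} file(s)")
# 100 per sitemap -> 7 file(s)
# 200 per sitemap -> 4 file(s)
# 500 per sitemap -> 2 file(s)
```

    With 671 posts and 100 URLs per sitemap, 7 files is the expected count. If every post is listed exactly once, the URL counts of the 7 files add up to 671, so a larger total points to duplicate entries.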

    Hello,

    Sitemaps are only one of several methods Google uses to discover your URLs, so it is fine for your articles to be linked from other URLs in your sitemap.

    However, your posts should not be duplicated in another sitemap. Please exclude Rank Math's sitemap files from caching. The caching may come from a plugin or from the server. For plugins or Cloudflare, please follow this article:
    https://rankmath.com/kb/exclude-sitemaps-from-caching/
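    To check whether a sitemap response is still coming from a cache after adding the exclusion, the response headers can be inspected. The following is a rough sketch, assuming the headers have already been collected into a dictionary (for example from the browser's network tab). It looks only at Cloudflare's cf-cache-status header and the standard HTTP Age header; other caches may use different headers, and cache_hints is a hypothetical helper name.

```python
def cache_hints(headers: dict[str, str]) -> list[str]:
    """Return human-readable hints that a response was served from a cache."""
    # Header names are case-insensitive, so normalize them before lookup.
    normalized = {name.lower(): value.strip() for name, value in headers.items()}
    hints = []

    cf_status = normalized.get("cf-cache-status", "")
    if cf_status.upper() == "HIT":
        hints.append("cf-cache-status is HIT: Cloudflare served a cached copy")

    age = normalized.get("age", "")
    if age.isdigit() and int(age) > 0:
        hints.append(f"age is {age}: a cache reports the response is {age} second(s) old")

    return hints
```

    An empty list does not prove that no caching is involved, since a page cache inside WordPress may add no headers at all. It only means these two signals are absent.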

    If the issue persists, disable the transient cache for the sitemap by applying this filter. This will make sure that your XML sitemap always reflects the latest changes on your site:

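    /**
     * Filter if XML sitemap transient cache is enabled.
     *
     * @param boolean $unsigned Enable cache or not, defaults to true
     */
    // __return_false is a WordPress core helper that always returns false, so this switches the sitemap cache off.
    add_filter( 'rank_math/sitemap/enable_caching', '__return_false' );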

    If you’re not sure how to add this code, you can follow this guide:
    https://rankmath.com/kb/wordpress-hooks-actions-filters/

    Looking forward to helping you.
