High Resource Usage Due to Bot Interactions with Product Filters

#974178
  • Resolved Hüseyin Kocatepe
    Rank Math free

    Request for Sitemap Optimization

Dear Rank Math Support Team,

    I hope this message finds you well. I am reaching out regarding a significant resource usage issue on our website caused by bot interactions with our product filters. Over the course of a single day, we experienced a massive 14GB of cache buildup and CPU usage that exceeded 100%, leading to the temporary shutdown of our website.

    After investigating the issue, we identified that both YandexBot and BingBot were causing this spike due to their interactions with the product filters. As a temporary solution, we added the following blocks to our **robots.txt** and **.htaccess** files to prevent further issues:

    **robots.txt**:

    `
    User-agent: Yandex
    Disallow: /

    User-agent: YandexBot
    Disallow: /

    User-agent: YandexBot/3.0
    Disallow: /

    User-agent: bingbot/2.0
    Disallow: /

    Sitemap: https://longoni.com.tr/sitemap_index.xml
    `
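As a quick sanity check, the effect of these directives can be simulated with Python's standard-library `urllib.robotparser`. This is only a sketch: the version-suffixed tokens (`YandexBot/3.0`, `bingbot/2.0`) are nonstandard — crawlers match on the product token alone — so the simulation uses the plain tokens.

```python
import urllib.robotparser

# The site-wide blocks from the robots.txt above, using plain
# product tokens (version-suffixed User-agent tokens are nonstandard).
rules = """\
User-agent: Yandex
Disallow: /

User-agent: bingbot
Disallow: /
"""

rp = urllib.robotparser.RobotFileParser()
rp.parse(rules.splitlines())

# Yandex and Bing are blocked site-wide; other bots are unaffected.
print(rp.can_fetch("YandexBot/3.0", "https://longoni.com.tr/"))  # False
print(rp.can_fetch("bingbot/2.0", "https://longoni.com.tr/"))    # False
print(rp.can_fetch("Googlebot", "https://longoni.com.tr/"))      # True
```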

    **.htaccess**:

    `
    RewriteEngine On
    RewriteCond %{HTTP_USER_AGENT} YandexBot/3.0 [NC]
RewriteRule .* - [F,L]

    # Block YandexBot user-agent
    RewriteEngine On
    RewriteCond %{HTTP_USER_AGENT} YandexBot [NC]
RewriteRule .* - [F,L]
    `
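Since `RewriteCond` performs an unanchored, case-insensitive (`[NC]`) regular-expression match against the User-Agent header, the second (unversioned) `YandexBot` rule already covers `YandexBot/3.0`, making the first rule redundant. A minimal Python sketch of that matching behaviour:

```python
import re

def is_blocked(user_agent: str) -> bool:
    """Mimic the .htaccess rule: unanchored regex match, [NC] = ignore case."""
    return re.search(r"YandexBot", user_agent, re.IGNORECASE) is not None

print(is_blocked("Mozilla/5.0 (compatible; YandexBot/3.0; +http://yandex.com/bots)"))  # True
print(is_blocked("Mozilla/5.0 (compatible; Googlebot/2.1)"))                           # False
```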

While these changes have helped mitigate the issue, we believe the root cause is related to our sitemap. Given that you are the developers of Rank Math, we would appreciate your guidance on optimizing our sitemap to prevent crawler bots from triggering the product filters, thus resolving the issue in a more permanent way.

    Could you kindly advise us on the best approach to optimize our sitemap in this context?

    Thank you in advance for your help. Looking forward to your response.

    Best regards,
    Hüseyin Kocatepe
    longoni.com.tr
    h****************@g******** or h*****************@g********

Viewing 7 replies - 1 through 7 (of 7 total)
  • Hello,

    We apologize for the issue you are facing with high resource usage due to bot interactions with your product filters. Thank you for providing detailed information about the steps you’ve already taken to mitigate the problem.

    To optimize your sitemap and help prevent crawler bots from triggering your product filters, consider the following approaches:

    1. Review Your Sitemap Structure: Ensure that your sitemap only includes essential URLs that you want search engines to crawl. If your product filters generate numerous URLs, it might be beneficial to exclude them from the sitemap.

    2. Use the robots.txt File: You can further refine your robots.txt file to block specific parameters or paths that lead to product filters. For example:

      User-agent: *
      Disallow: /path-to-your-product-filters/

This will prevent all compliant bots from accessing those specific paths.

    3. Implement Canonical Tags: If you have multiple URLs for the same product due to filters, ensure that you are using canonical tags to point to the main product page. This helps search engines understand which version of the page should be indexed.

    4. Limit Crawling Frequency: You can set a crawl delay in your robots.txt file to limit how often bots can access your site. For example:

      User-agent: *
      Crawl-delay: 10

      Note that this may not apply to all bots, as Googlebot manages its own crawl rate.

    5. Monitor Bot Traffic: Use tools like Google Search Console to monitor how often bots are crawling your site and which URLs they are accessing. This can help you identify any problematic areas that need further optimization.

    6. Consider Using a CDN: If you haven’t already, implementing a Content Delivery Network (CDN) can help manage traffic spikes and reduce server load during high bot activity.

    For more detailed guidance on managing bot traffic and optimizing your sitemap, you can refer to our articles on bot traffic management and sitemap issues.

    If you continue to experience issues or need further assistance, please reply to this ticket, and a human agent from our support team will take over and provide additional help.

    Thank you for reaching out, and we hope this helps resolve your issue!

    Sources:
    Bot Traffic: How It Affects Your Website and What to Do About It » Rank Math
    Common robots.txt Issues & How to Fix Them » Rank Math

  • AI provided good answers, but when I look at our sitemap, there are no links related to product filters. I need guidance on how to properly configure the sitemap in the plugin settings.

  • Hello,

    Thank you so much for getting in touch.

Rank Math’s sitemap does not include dynamic URLs like product filter pages, so the issue likely stems from internal links rather than the sitemap itself. Bots probably discovered these URLs through crawlable links on your site. To discourage bots from following those links, we recommend setting them to nofollow.

Additionally, your robots.txt currently blocks the entire site, not just the filter URLs, so please review it to ensure it aligns with your goals. Here is a modified set of robots.txt directives we can suggest:

`
User-agent: Yandex
Disallow: /renk/turuncu/?filter_renk=*

User-agent: YandexBot
Disallow: /renk/turuncu/?filter_renk=*

User-agent: YandexBot/3.0
Disallow: /renk/turuncu/?filter_renk=*

User-agent: bingbot/2.0
Disallow: /renk/turuncu/?filter_renk=*
`
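One caveat: robots.txt `Disallow` rules are prefix matches, so the trailing `*` is redundant, and some parsers — including Python's stdlib `urllib.robotparser` — treat `*` literally rather than as a wildcard. The prefix form of the rule can be sanity-checked like this (the color value `kirmizi` is just an illustrative query string):

```python
import urllib.robotparser

# Prefix form of the suggested rule (no trailing "*", which
# urllib.robotparser would otherwise treat as a literal character).
rules = """\
User-agent: Yandex
Disallow: /renk/turuncu/?filter_renk=
"""

rp = urllib.robotparser.RobotFileParser()
rp.parse(rules.splitlines())

# The filter URL is blocked for Yandex; the base category page is not.
print(rp.can_fetch("YandexBot/3.0", "https://longoni.com.tr/renk/turuncu/?filter_renk=kirmizi"))  # False
print(rp.can_fetch("YandexBot/3.0", "https://longoni.com.tr/renk/turuncu/"))                      # True
```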

    Lastly, you can also use Bing’s URL Inspection Tool to identify how bots are discovering these pages.

    Don’t hesitate to get in touch with us if you have any other questions.

  • You’re awesome, thank you! However, there’s still something I don’t understand. I actually selected “Prevent indexing” for the color terms under the WooCommerce attributes menu. Yet, bots are still trying to crawl the dynamic color filters.

    Is there an issue with your plugin’s SEO settings customization related to preventing indexing of terms? Or does this restriction take some time to reflect on bots, similar to DNS changes (e.g., 24-72 hours)?

  • Hello,

The filtered URLs are already set to noindex in the page’s source code, which tells bots those pages shouldn’t be indexed. However, noindex does not by itself stop crawling.

Since those bots are still crawling the pages, you may need to add the disallow rules for them to stop the crawling as well.

    Looking forward to helping you.

    Thank you

  • Hello,

    We are super happy that we could address your concern. If you have any other questions in the future, know that we are here to help you.

    If you don’t mind me asking, could you please leave us a review (if you haven’t already) on https://wordpress.org/support/plugin/seo-by-rank-math/reviews/#new-post about your overall experience with Rank Math? We appreciate your time and patience.

    If you do have another question in the future, please feel free to create a new forum topic, and it will be our pleasure to assist you again.

    Thank you.


The ticket ‘High Resource Usage Due to Bot Interactions with Product Filters’ is closed to new replies.