Hello,
Thank you for your question regarding the adjustments to your robots.txt file. Your understanding of how the user-agent rules work is correct: the * wildcard does indeed cover all other user agents, so you won’t need to list every possible variant.
Your proposed rules would look like this:
User-agent: *
Disallow: /wp-content/uploads

User-agent: Bingbot
Allow: /wp-content/uploads

User-agent: Google-Extended
Allow: /wp-content/uploads
This setup disallows all user agents from accessing the /wp-content/uploads directory while allowing Bingbot and Google-Extended to access it. (Strictly speaking, Google-Extended is a product token that Google checks before using content for AI training, rather than a separate crawler.)
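If you want to sanity-check how these rules resolve for different crawlers before deploying them, here is a minimal sketch using Python’s built-in urllib.robotparser. The example.com URL and the SomeOtherBot name are placeholders, and Python’s substring-based group matching is not byte-for-byte identical to Google’s spec, though it agrees on this example:

from urllib import robotparser

# Feed the proposed rules straight into the standard-library parser.
rules = """\
User-agent: *
Disallow: /wp-content/uploads

User-agent: Bingbot
Allow: /wp-content/uploads

User-agent: Google-Extended
Allow: /wp-content/uploads
""".splitlines()

rp = robotparser.RobotFileParser()
rp.parse(rules)

# Hypothetical URL, purely for illustration.
url = "https://example.com/wp-content/uploads/photo.jpg"
print(rp.can_fetch("Bingbot", url))          # True  - matched by its own group
print(rp.can_fetch("Google-Extended", url))  # True  - matched by its own group
print(rp.can_fetch("SomeOtherBot", url))     # False - falls back to the * group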
Regarding your question about moving the rule up a directory level to /wp-content: you can certainly do that. If you want to disallow access to the entire /wp-content directory for all user agents except Bingbot and Google-Extended, you would adjust your rules as follows:
User-agent: *
Disallow: /wp-content/

User-agent: Bingbot
Allow: /wp-content/

User-agent: Google-Extended
Allow: /wp-content/
This will prevent all other user agents from crawling anything in the /wp-content directory while still allowing Bingbot and Google-Extended access.
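One subtlety worth knowing: under standard robots.txt group matching, a crawler that matches a named group ignores the * group entirely, so Bingbot and Google-Extended are governed only by their own Allow lines. A quick check with the same parser as above, again with placeholder URLs and agent names:

from urllib import robotparser

rules = """\
User-agent: *
Disallow: /wp-content/

User-agent: Bingbot
Allow: /wp-content/

User-agent: Google-Extended
Allow: /wp-content/
""".splitlines()

rp = robotparser.RobotFileParser()
rp.parse(rules)

# Named groups supersede the * group, so these agents see only their own rules.
for agent in ("Bingbot", "Google-Extended", "SomeOtherBot"):
    print(agent, rp.can_fetch(agent, "https://example.com/wp-content/themes/style.css"))
# -> Bingbot True, Google-Extended True, SomeOtherBot False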
Keep in mind that robots.txt is advisory: these rules discourage scraping by well-behaved crawlers, but they are not foolproof, as some scrapers simply ignore robots.txt directives.
If you have any further questions or need assistance with anything else, feel free to ask!
Sources:
What Are the Best Robots.txt File Rules for Multisite? » Rank Math
Using Rank Math’s Robots.txt Generator » Rank Math
This is an automated reply from Rank Math Support Bot. Your ticket has been assigned to a human agent and we will get back to you shortly.
Hello,
You seem to have marked this ticket as resolved.
We’re just following up to see whether this is a concern we can still assist you with. If your issue has been resolved, please let us know so we can mark this ticket as closed.
Thank you for choosing Rank Math.
Thanks, Adetayo,
much appreciated. I’ll be doing more research into this; I need to give clients the best possible advice on where to block and allow AI crawlers, depending on the content of their sites.
For example, from SiteGround’s documentation: “We block crawlers intended for AI model training to protect our clients’ website data and intellectual property and save resources.” So there are some nuances.