Limiting site crawls

#1120326
  • Hi, I want to adjust my robots.txt file so that Google-Extended and Bingbot can crawl everything, but for every other user agent I want to discourage scraping of images. Is the following the correct way to do this?

    User-agent: *
    Disallow: /wp-content/uploads

    User-agent: Bingbot
    Allow: /wp-content/uploads

    User-agent: Google-Extended
    Allow: /wp-content/uploads

    I am assuming that the * wildcard covers all other user agents and that I won’t have to methodically list every possible variant.

    Also, is it feasible to take it back a directory level to /wp-content?

    I am using the LLMS.txt option, and while I want to give every client’s site a good start, many of them are artists who are worried about copyright infringement of their creative work.

    Of course this is a polite approach and scraping can happen anyway, but it’s a start.

  • Hello,

    Thank you for your question regarding the adjustments to your robots.txt file. Your understanding of how the user-agent rules work is correct. The * wildcard does indeed cover all other user agents, so you won’t need to list every possible variant.

    Your proposed rules would look like this:

    User-agent: *
    Disallow: /wp-content/uploads
    
    User-agent: Bingbot
    Allow: /wp-content/uploads
    
    User-agent: Google-Extended
    Allow: /wp-content/uploads

    This setup disallows all other user agents from accessing the /wp-content/uploads directory while allowing Bingbot and Google-Extended to crawl it. Because a crawler follows the most specific user-agent group that matches it, Bingbot and Google-Extended obey their own Allow groups and ignore the * group entirely.

    Regarding your question about taking it back a directory level to /wp-content, you can certainly do that. If you want to disallow access to the entire /wp-content directory for all user agents except Bingbot and Google-Extended, you would adjust your rules as follows:

    User-agent: *
    Disallow: /wp-content/
    
    User-agent: Bingbot
    Allow: /wp-content/
    
    User-agent: Google-Extended
    Allow: /wp-content/

    This will prevent all other user agents from crawling anything in the /wp-content directory while still allowing Bingbot and Google-Extended access.
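
    If you would like to sanity-check the behaviour before publishing the file, here is a minimal sketch using Python’s built-in urllib.robotparser module. The rules are the ones above; the “SomeOtherBot” user agent and the example.com URLs are only placeholders:

    # Quick check of the proposed rules with Python's standard-library robots.txt parser.
    from urllib.robotparser import RobotFileParser

    rules = [
        "User-agent: *",
        "Disallow: /wp-content/",
        "",
        "User-agent: Bingbot",
        "Allow: /wp-content/",
        "",
        "User-agent: Google-Extended",
        "Allow: /wp-content/",
    ]

    rp = RobotFileParser()
    rp.parse(rules)

    # Bingbot and Google-Extended match their own groups and are allowed in.
    print(rp.can_fetch("Bingbot", "https://example.com/wp-content/uploads/photo.jpg"))          # True
    print(rp.can_fetch("Google-Extended", "https://example.com/wp-content/uploads/photo.jpg"))  # True

    # Every other agent falls back to the * group and is blocked from /wp-content/.
    print(rp.can_fetch("SomeOtherBot", "https://example.com/wp-content/uploads/photo.jpg"))     # False
    print(rp.can_fetch("SomeOtherBot", "https://example.com/sample-page/"))                     # True

    Note that Python’s parser is a simplified one (it applies the first matching rule in a group rather than the most specific), but for rules this simple the outcome matches what the major crawlers would do.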

    Keep in mind that while these rules can discourage scraping, they are not foolproof, as some scrapers may ignore the robots.txt directives.

    If you have any further questions or need assistance with anything else, feel free to ask!

    Sources:
    What Are the Best Robots.txt File Rules for Multisite? » Rank Math
    Using Rank Math’s Robots.txt Generator » Rank Math

    This is an automated reply from Rank Math Support Bot. Your ticket has been assigned to a human agent and we will get back to you shortly.

  • Hello,

    You seem to have marked this ticket as resolved.

    We’re just following up to see if this is a concern we can assist you with. Or, if your issue has been resolved, please let us know so we can mark this ticket as closed.

    Thank you for choosing Rank Math.

  • Thanks, Adetayo,

    much appreciated. I’ll be doing more research into this. I need to give the best advice to clients on where to block and allow AI, depending on the content of their sites.

    For example, from SiteGround’s documentation: “We block crawlers intended for AI model training to protect our clients’ website data and intellectual property and save resources.” So there are some nuances.
