Robots files query - Support

Hemant Kumar
2 weeks, 2 days ago

Rank Math free

Could you please guide me on how to block all URLs except some specific ones?

For example:

I want to block all posts on my website, but not the pages. For example, these posts:

https://abc.com/10-off-orders-over-79/
https://abc.com/oyo-rooms-hdfc-offer/

But I want Google to crawl only these specific URLs:

https://abc.com/store/

Could you please guide me on how to write a robots.txt file for this?

Is this Robots.txt correct?

User-agent: *
Disallow: */

Allow: */store/
Allow: /contact-us
Allow: /privacy-policy
Allow: /terms-of-use
Allow: /about-us
Allow: /sitemap

Thanks

Viewing 15 replies - 1 through 15 (of 18 total)


RankBot
2 weeks, 3 days ago
Hello,

I apologize for the issue you are facing. Thank you for reaching out with your query about blocking specific URLs in your robots.txt file while allowing Google to crawl certain ones.

To block everything on your website except specific URLs such as /store/, you can use the following robots.txt rules:
```
User-agent: *
Disallow: /*

Allow: /store/
Allow: /contact-us
Allow: /privacy-policy
Allow: /terms-of-use
Allow: /about-us
Allow: /sitemap
```
Your robots.txt file seems correct based on the information provided. It blocks all URLs except the ones specified with the Allow directive. Just ensure that the paths you have mentioned after Allow: are correct and match the actual URLs on your website.

If you encounter any issues or need further assistance, please feel free to ask.

Thank you.

Sources:
What Are the Best Robots.txt File Rules for Multisite? » Rank Math
How to Edit Your Robots.txt With Rank Math SEO » Rank Math
Common robots.txt Issues & How to Fix Them » Rank Math

This is an automated reply from Rank Math Support Bot. Your ticket has been assigned to a human agent and we will get back to you shortly.
Jeremy
2 weeks, 2 days ago
Hello,

Thank you so much for getting in touch.

The robots.txt configuration you shared seems all right and we have tested it here.

One note on the sitemap: it should be declared with a Sitemap directive, like this:
```
Sitemap: https://abc.com/sitemap_index.xml
```
Please try it and see if it works.
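
As a quick local check that the Sitemap line is picked up, here is a minimal Python sketch using the standard library's `urllib.robotparser`. The `ROBOTS_TXT` sample is a hypothetical file put together for illustration, not your live file.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content, for illustration only.
ROBOTS_TXT = """\
User-agent: *
Disallow: /wp-admin/
Allow: /wp-admin/admin-ajax.php

Sitemap: https://abc.com/sitemap_index.xml
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# site_maps() returns the list of declared sitemap URLs, or None if there are none.
sitemaps = parser.site_maps()
print(sitemaps)
```

Note that `Allow: /sitemap` only permits crawling of matching URLs; it is the `Sitemap:` line that tells crawlers where the sitemap is.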

Looking forward to helping you.
Hemant Kumar
2 weeks, 2 days ago

Rank Math free

Hello,

I have updated the sensitive data as requested. Can you please check further?

Thank you.

Jeremy
2 weeks ago

Hello,

Besides the allowed pages mentioned in the robots.txt file, can you please confirm whether you also intend to index your homepage?

Looking forward to helping you.

Hemant Kumar
2 weeks ago

Rank Math free

Yes. I also want my site's home page to be crawled. I want to stop Google from crawling posts, so that only the pages and the home page are crawled.

Adetayo
1 week, 6 days ago

Hello,

To stop search engine bots from indexing your posts, please head over to WordPress Dashboard > Rank Math SEO > Titles & Meta > Posts and then select the noindex option under Post Robots Meta.

This will set all your posts to noindex.

If this option isn’t available for you, make sure that you’re using the Advanced mode of Rank Math.
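
One way to double-check that the setting has taken effect is to look for the robots meta tag in a post's HTML. The following is a minimal Python sketch, assuming the post outputs a standard `<meta name="robots">` tag. The `RobotsMetaParser` class, the `has_noindex` helper, and the sample HTML are hypothetical and only for illustration.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collects the directives found in <meta name="robots" content="..."> tags."""

    def __init__(self):
        super().__init__()
        self.directives = []

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr = dict(attrs)
        if (attr.get("name") or "").lower() == "robots":
            content = attr.get("content") or ""
            # Split "noindex, follow" into ["noindex", "follow"].
            self.directives += [d.strip().lower() for d in content.split(",") if d.strip()]


def has_noindex(html: str) -> bool:
    """Return True if the HTML carries a robots meta tag containing noindex."""
    parser = RobotsMetaParser()
    parser.feed(html)
    return "noindex" in parser.directives


# Hypothetical page source, for illustration only.
SAMPLE_POST = '<html><head><meta name="robots" content="noindex, follow"/></head><body></body></html>'
SAMPLE_PAGE = '<html><head><meta name="robots" content="index, follow"/></head><body></body></html>'

print(has_noindex(SAMPLE_POST))
print(has_noindex(SAMPLE_PAGE))
```

To use it on a real post, you would pass in the page source copied from your browser's "View Source".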

Hope that helps, and please do not hesitate to let us know if you need our assistance with anything else.

Thank you.

Hemant Kumar
1 week, 6 days ago

Rank Math free

I have already enabled noindex, but Google is still crawling these posts. This wastes my website's crawl budget, and I don't want to spend it on unused posts.
Miguel
1 week, 6 days ago
Hello,

To allow only the pages you mentioned and the homepage, you can add the following rules if the homepage has a trailing slash:
```
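# One group for all crawlers. Keep its rules contiguous: some parsers treat a blank line as the end of a group.
User-agent: *
# Block everything by default
Disallow: /
# Allow the homepage ($ anchors the end of the URL) and the selected pages
Allow: /$
Allow: /store
Allow: /contact-us
Allow: /privacy-policy
Allow: /terms-of-use
Allow: /about-us
Allow: /sitemap
# Standard WordPress rules
Disallow: /wp-admin/
Allow: /wp-admin/admin-ajax.php

Sitemap: https://domain.com/sitemap_index.xml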
```
If the homepage doesn’t have a trailing slash the first allow rule should be the following:
```
Allow: $
```
You might need to make further modifications depending on whether you want sub-pages to be crawled, but this is a good starting point for testing some of the pages.
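
To see how the rules above interact, here is a minimal Python sketch of the matching logic. It assumes the crawler follows the longest-match precedence described in RFC 9309 and in Google's robots.txt documentation (the most specific matching rule wins, and Allow wins a tie). The `pattern_to_regex` and `is_allowed` helpers are hypothetical names written for this illustration, not part of any library.

```python
import re


def pattern_to_regex(pattern: str) -> re.Pattern:
    """Translate a robots.txt path pattern into a regex anchored at the start.

    '*' matches any run of characters; a trailing '$' anchors the end of the path.
    """
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    body = ".*".join(re.escape(part) for part in pattern.split("*"))
    return re.compile("^" + body + ("$" if anchored else ""))


def is_allowed(rules, path: str) -> bool:
    """Return True if `path` may be crawled under `rules`.

    `rules` is a list of (directive, pattern) pairs from one user-agent group.
    The longest matching pattern wins; on a tie, Allow wins.
    A path that matches no rule is allowed.
    """
    best_len, allowed = -1, True
    for directive, pattern in rules:
        if not pattern:
            continue  # an empty pattern matches nothing
        if pattern_to_regex(pattern).match(path):
            is_allow = directive.lower() == "allow"
            if len(pattern) > best_len or (len(pattern) == best_len and is_allow):
                best_len, allowed = len(pattern), is_allow
    return allowed


# The rules from the snippet above, as (directive, pattern) pairs.
RULES = [
    ("Disallow", "/"),
    ("Allow", "/$"),
    ("Allow", "/store"),
    ("Allow", "/contact-us"),
    ("Allow", "/privacy-policy"),
    ("Allow", "/terms-of-use"),
    ("Allow", "/about-us"),
    ("Allow", "/sitemap"),
    ("Disallow", "/wp-admin/"),
    ("Allow", "/wp-admin/admin-ajax.php"),
]

for path in ["/", "/store/", "/about-us/", "/10-off-orders-over-79/", "/oyo-rooms-hdfc-offer/"]:
    print(path, "allowed" if is_allowed(RULES, path) else "blocked")
```

Under these assumptions, the homepage and the listed pages come back as allowed, while the two example posts fall through to `Disallow: /` and are blocked. Treat this only as a local sanity check; the testing tool in Google Search Console remains the reference for how Google reads your file.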

Don’t hesitate to get in touch if you have any other questions.
Dipak Jaiswal
1 week, 6 days ago

Rank Math free

When I submit a post, I get a robots.txt error in Google Search Console, and the post is not indexed. When I check the settings, it shows that the robots.txt file is not found: N/A (We were unable to fetch the robots.txt file.) Whenever I run a live test, my post shows an error (Failed: robots.txt unreachable). Please give me a solution.

Great
1 week, 5 days ago

Hello @dipakjaiswal1,

Please create a new ticket and share your website URL with us. You can create a new ticket from here: https://support.rankmath.com/new-ticket/

Looking forward to hearing back from you.

Hemant Kumar
6 days, 11 hours ago

Rank Math free

Google is still crawling posts. Please check the screenshot: https://snipboard.io/G42wvY.jpg

Please check the robots.txt code and provide me with the correct code.

Miguel
6 days, 8 hours ago

Hello,

The robots.txt file from your website that is currently available to Google is not the same one we shared previously.

If you test our implementation in the testing tool, you'll see that the page you shared will not get crawled.

So, you would need to make sure that you update the file correctly with those rules and then clear the cache to allow Google to load the correct instructions.
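
To confirm that the live file actually contains the rules we shared, a comparison along the following lines may help. This is a minimal Python sketch: the `normalize`, `missing_rules`, and `fetch_robots` helpers are hypothetical names written for this illustration, and `EXPECTED` should be replaced with the full rule set you intend to publish.

```python
from urllib.request import urlopen

# A subset of the rules shared earlier in this ticket; extend as needed.
EXPECTED = [
    "User-agent: *",
    "Disallow: /",
    "Allow: /$",
    "Allow: /store",
]


def normalize(line: str):
    """Strip comments and whitespace and lowercase the directive name.

    Returns None for blank lines, comments, and lines without a directive.
    """
    line = line.split("#", 1)[0].strip()
    if ":" not in line:
        return None
    key, value = line.split(":", 1)
    return f"{key.strip().lower()}: {value.strip()}"


def missing_rules(live_text: str, expected) -> list:
    """Return the expected rules that do not appear in the live robots.txt text."""
    live = {normalize(line) for line in live_text.splitlines()}
    return [rule for rule in expected if normalize(rule) not in live]


def fetch_robots(base_url: str) -> str:
    """Download robots.txt from the site root (performs a network request)."""
    with urlopen(base_url.rstrip("/") + "/robots.txt", timeout=10) as response:
        return response.read().decode("utf-8", errors="replace")
```

For example, `missing_rules(fetch_robots("https://abc.com"), EXPECTED)` would return an empty list once the published file contains every expected rule. A caching plugin or CDN may keep serving an older copy until its cache is cleared.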

Don’t hesitate to get in touch if you have any other questions.

Hemant Kumar
5 days, 7 hours ago

Rank Math free

Please check now. It has also blocked my website's home page.
https://snipboard.io/pStgcQ.jpg
Miguel
5 days, 7 hours ago
Hello,

As we mentioned, if your homepage has a trailing slash, you should use the following allow rule:
```
Allow: /$
```
You are currently using the one without the trailing slash so this is the expected result.

You need to check the canonical URL of your homepage and set the rule based on whether it has a trailing slash.

It’s very important that you read the instructions carefully when dealing with the robots.txt file.

Thank you.
Hemant Kumar
5 days, 4 hours ago

Rank Math free

Now it is blocking the Home page also.

https://snipboard.io/jNcPTW.jpg
