Robots files query

#784600
  • Could you please guide me on how to block all URLs except some specific ones?

    For example:

    I want to block all posts on my website, but not the pages.

    https://abc.com/10-off-orders-over-79/
    https://abc.com/oyo-rooms-hdfc-offer/

    But I want Google to crawl only these specific URLs:

    https://abc.com/store/

    Could you please guide me on how to write a robots.txt file for this?

    Is this robots.txt correct?

    User-agent: *
    Disallow: */

    Allow: */store/
    Allow: /contact-us
    Allow: /privacy-policy
    Allow: /terms-of-use
    Allow: /about-us
    Allow: /sitemap

    Thanks

Viewing 15 replies - 1 through 15 (of 18 total)
  • Hello,

    I apologize for the issue you are facing. Thank you for reaching out with your query about blocking specific URLs in your robots.txt file while allowing Google to crawl certain ones.

    To block everything on your website except specific URLs such as /store/, so that Google crawls only those URLs, you can use the following robots.txt rules:

    User-agent: *
    Disallow: /*
    
    Allow: /store/
    Allow: /contact-us
    Allow: /privacy-policy
    Allow: /terms-of-use
    Allow: /about-us
    Allow: /sitemap

    Your robots.txt file seems correct based on the information provided. It blocks all URLs except the ones specified with the Allow directive. Just ensure that the paths you have mentioned after Allow: are correct and match the actual URLs on your website.

    If you encounter any issues or need further assistance, please feel free to ask.

    Thank you.

    Sources:
    What Are the Best Robots.txt File Rules for Multisite? » Rank Math
    How to Edit Your Robots.txt With Rank Math SEO » Rank Math
    Common robots.txt Issues & How to Fix Them » Rank Math

    This is an automated reply from Rank Math Support Bot. Your ticket has been assigned to a human agent and we will get back to you shortly.

    Hello,

    Thank you so much for getting in touch.

    The robots.txt configuration you shared seems all right and we have tested it here.

    One correction concerns the sitemap: it should be declared with a Sitemap directive that points to the full URL of your sitemap index, like this:

    
    Sitemap: https://abc.com/sitemap_index.xml

    Please try it and see if it works.
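Note that `Allow: /sitemap` only lets crawlers fetch URLs whose path starts with /sitemap; it does not tell them where the sitemap is. The `Sitemap:` line does that, and it takes an absolute URL. As a quick local check, Python's standard `urllib.robotparser` (Python 3.8+) reports the Sitemap lines it finds via `site_maps()`. The rule set below is abbreviated and uses the thread's placeholder domain abc.com. This module does not implement Google's `*`/`$` wildcards or longest-match precedence, so it is used here only to read the Sitemap line.

```python
from urllib.robotparser import RobotFileParser

# Abbreviated robots.txt using the thread's placeholder domain.
ROBOTS_TXT = """\
User-agent: *
Disallow: /wp-admin/
Allow: /wp-admin/admin-ajax.php

Sitemap: https://abc.com/sitemap_index.xml
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())  # parse() accepts a list of lines

# Sitemap lines are collected independently of any User-agent group.
print(parser.site_maps())  # ['https://abc.com/sitemap_index.xml']
```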

    Looking forward to helping you.

    Hello,

    I have updated the sensitive data as requested. Can you please check further?

    Thank you.

    Hello,

    Besides the allowed pages mentioned in the robots.txt file, can you please confirm whether you also intend to index your homepage?

    Looking forward to helping you.

    Hemant Kumar
    Rank Math free

    Yes. I also want Google to crawl my site's home page. I want to stop Google from crawling posts; I only want it to crawl pages and the home page.

    Hello,

    To stop search engine bots from indexing your posts, go to WordPress Dashboard > Rank Math SEO > Titles & Meta > Posts and select the noindex option under Post Robots Meta.

    This will set all your posts to noindex.

    If this option isn’t available for you, make sure that you’re using the Advanced mode of Rank Math.
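To confirm that a post actually carries the noindex setting, you can inspect its HTML for a robots meta tag. Keep in mind that noindex controls indexing, not crawling: Googlebot still has to fetch a page to read the tag. The following is a minimal sketch using Python's standard `html.parser`; it assumes you have saved the post's HTML (for example from "view source"), and it does not check the X-Robots-Tag HTTP header, which is another place noindex can be set. The class and function names are hypothetical.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collect directives from <meta name="robots"> and <meta name="googlebot"> tags."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        # html.parser already lowercases tag and attribute names.
        if tag != "meta":
            return
        attr = {name: (value or "") for name, value in attrs}
        if attr.get("name", "").lower() in ("robots", "googlebot"):
            for token in attr.get("content", "").split(","):
                self.directives.add(token.strip().lower())


def has_noindex(html: str) -> bool:
    """Return True if any robots/googlebot meta tag in the HTML contains 'noindex'."""
    parser = RobotsMetaParser()
    parser.feed(html)
    return "noindex" in parser.directives


sample = '<head><meta name="robots" content="noindex, follow"></head>'
print(has_noindex(sample))  # True
```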

    Hope that helps, and please do not hesitate to let us know if you need our assistance with anything else.

    Thank you.

    I have already enabled noindex, but Google is still crawling these posts. This wastes my website's crawl budget, and I don't want to spend it on unused posts.

    Hello,

    If your homepage URL has a trailing slash, you can add the following rules to allow only the homepage and the pages you mentioned:

    
    User-agent: *
    
    Disallow: /
    
    Allow: /$
    Allow: /store
    Allow: /contact-us
    Allow: /privacy-policy
    Allow: /terms-of-use
    Allow: /about-us
    Allow: /sitemap
    
    Disallow: /wp-admin/
    Allow: /wp-admin/admin-ajax.php
    
    Sitemap: https://domain.com/sitemap_index.xml
    

    If the homepage doesn't have a trailing slash, the first Allow rule should be:

    
    Allow: $
    

    You might need further modifications depending on whether you want sub-pages to be crawled, but this is a good starting point for testing some of the pages.
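The rule set in this reply relies on Google-style precedence: when several rules match a URL path, the rule with the longest pattern wins, Allow wins a tie, `*` matches any run of characters, and a trailing `$` anchors the end of the path. The following is a minimal Python sketch of that matching for testing paths locally before uploading the file. It is our own simplification (it ignores User-agent groups and percent-encoding) and is not Google's parser, so confirm the final file with Google's own tools. The function names are hypothetical; the rules are copied from the reply (homepage with a trailing slash).

```python
import re

# Rules from the reply above, in (directive, pattern) form.
RULES = [
    ("disallow", "/"),
    ("allow", "/$"),
    ("allow", "/store"),
    ("allow", "/contact-us"),
    ("allow", "/privacy-policy"),
    ("allow", "/terms-of-use"),
    ("allow", "/about-us"),
    ("allow", "/sitemap"),
    ("disallow", "/wp-admin/"),
    ("allow", "/wp-admin/admin-ajax.php"),
]


def pattern_to_regex(pattern: str) -> re.Pattern:
    """'*' matches any run of characters; a trailing '$' anchors the end of the path.
    Without '$', the pattern is a prefix match (re.match anchors only at the start)."""
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    body = ".*".join(re.escape(part) for part in pattern.split("*"))
    return re.compile(body + ("$" if anchored else ""))


def is_allowed(path: str, rules=RULES) -> bool:
    """Longest matching pattern wins; Allow wins a tie; no matching rule means allowed."""
    best_len, best_allow = -1, True
    for directive, pattern in rules:
        if pattern_to_regex(pattern).match(path):
            length = len(pattern)
            allow = directive == "allow"
            if length > best_len or (length == best_len and allow):
                best_len, best_allow = length, allow
    return best_allow


for path in ["/", "/store/", "/10-off-orders-over-79/", "/sitemap_index.xml"]:
    print(path, "allowed" if is_allowed(path) else "blocked")
```

Because rules without `$` are prefix matches, `Allow: /store` also allows any other path that starts with /store, such as a hypothetical post slug /store-coupons/, so check that no post slugs share a prefix with the allowed pages.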

    Don’t hesitate to get in touch if you have any other questions.

    When I submit a post, I get a robots.txt error in Google Search Console and the post is not indexed. When I check the settings there, it shows that the robots.txt file is not found: "N/A (We were unable to fetch the robots.txt file.)". Whenever I run a live test on a post, it shows the error "Failed: robots.txt unreachable". Please help me with a solution.
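A "robots.txt unreachable" error usually means Google could not get a usable response for /robots.txt. As we read Google's robots.txt documentation, 4xx responses (other than 429) are treated as if no robots.txt exists, while 5xx, 429, and network or DNS failures make Google temporarily treat the site as fully disallowed. The following is a minimal Python sketch for checking what status your server returns; the function names and category labels are our own, and the result is only a first diagnostic, not a reproduction of Googlebot's fetch (firewalls or security plugins may treat Googlebot differently).

```python
import urllib.error
import urllib.request
from typing import Optional


def fetch_robots_status(site: str, timeout: float = 10.0) -> Optional[int]:
    """Return the final HTTP status for <site>/robots.txt, or None if no response arrived."""
    url = site.rstrip("/") + "/robots.txt"
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:  # follows redirects
            return resp.status
    except urllib.error.HTTPError as err:  # 4xx/5xx responses still carry a status code
        return err.code
    except OSError:  # DNS failure, refused connection, timeout, TLS error, ...
        return None


def classify_robots_status(status: Optional[int]) -> str:
    """Map a status code to a rough category of how Google is documented to react."""
    if status is None:
        return "unreachable"   # no response at all
    if 200 <= status < 300:
        return "ok"            # file is fetched and its rules are applied
    if status == 429 or 500 <= status < 600:
        return "server_error"  # treated as a temporary full disallow
    if 400 <= status < 500:
        return "client_error"  # treated as if no robots.txt exists
    return "other"
```

Usage, with the placeholder domain from this thread: `classify_robots_status(fetch_robots_status("https://abc.com"))`. Anything other than "ok" points to a server, hosting, or firewall issue rather than to the rules in the file.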

    Hello @dipakjaiswal1,

    Please create a new ticket and share your website URL with us. You can create a new ticket from here: https://support.rankmath.com/new-ticket/

    Looking forward to hearing back from you.

    Google is still crawling posts. Please check the screenshot: https://snipboard.io/G42wvY.jpg

    Please check my robots.txt rules and provide me with the correct code.

    Hello,

    The robots.txt file from your website that is currently available to Google is not the same one we shared previously.

    If you test our version of the rules in the testing tool, you'll see that the page you shared is disallowed and will not get crawled.

    So you need to update the file with exactly those rules and then clear the cache so that Google loads the correct instructions.
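One way to confirm that the live file really contains the rules you intended is to compare the two texts line by line. The following is a minimal Python sketch of such a comparison; the function names and both sample files are hypothetical. It lowercases field names (they are case-insensitive) but keeps values as written, because URL paths are case-sensitive. It ignores group structure and rule order, so it only catches missing or mistyped lines.

```python
def normalize(text: str) -> list[str]:
    """Turn robots.txt text into 'field: value' lines, dropping comments and blanks."""
    lines = []
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # strip comments and surrounding spaces
        if not line or ":" not in line:
            continue
        field, value = line.split(":", 1)    # split on the first colon only (keeps URLs intact)
        lines.append(f"{field.strip().lower()}: {value.strip()}")
    return lines


def missing_directives(live: str, expected: str) -> list[str]:
    """Return expected directive lines that do not appear in the live file."""
    live_set = set(normalize(live))
    return [line for line in normalize(expected) if line not in live_set]


# Hypothetical example: the live file uses "Allow: $" instead of "Allow: /$".
expected = """User-agent: *
Disallow: /
Allow: /$
Allow: /store
"""
live = """user-agent: *
Disallow: /
Allow: $
Allow:/store   # shop page
"""
print(missing_directives(live, expected))  # ['allow: /$']
```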

    Don’t hesitate to get in touch if you have any other questions.

    Please check now. It also blocked my website's home page.
    https://snipboard.io/pStgcQ.jpg

    Hello,

    We mentioned that if you have the trailing slash for the homepage you should use the following allow rule:

    
    Allow: /$
    

    You are currently using the one without the trailing slash, so this is the expected result.

    You need to check your homepage's canonical URL and set the rule based on whether it has a trailing slash or not.
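To find the canonical URL without reading the page source by hand, you can extract the `<link rel="canonical">` tag from the homepage HTML and check whether it ends with a slash. The following is a minimal Python sketch using the standard `html.parser`; it assumes you have saved the homepage HTML (for example from "view source"), and the class and function names are hypothetical.

```python
from html.parser import HTMLParser
from typing import Optional


class CanonicalParser(HTMLParser):
    """Record the href of the first <link rel="canonical"> tag."""

    def __init__(self):
        super().__init__()
        self.canonical: Optional[str] = None

    def handle_starttag(self, tag, attrs):
        if tag != "link" or self.canonical is not None:
            return
        attr = {name: (value or "") for name, value in attrs}
        # rel can hold several space-separated tokens.
        if "canonical" in attr.get("rel", "").lower().split():
            self.canonical = attr.get("href")


def canonical_url(html: str) -> Optional[str]:
    """Return the canonical URL declared in the HTML, or None if there is none."""
    parser = CanonicalParser()
    parser.feed(html)
    return parser.canonical


html = '<head><link rel="canonical" href="https://abc.com/" /></head>'
url = canonical_url(html)
print(url, url.endswith("/"))  # https://abc.com/ True
```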

    It’s very important that you read the instructions carefully when dealing with the robots.txt file.

    Thank you.

    Now it is blocking the home page as well.

    https://snipboard.io/jNcPTW.jpg
