Hi, kindly refer to the sensitive data for the full info.
Hello,
Thank you for contacting Rank Math for help with your crawling and indexing issues.
Question 1: I am unsure why 2k pages with /feed/rss/ were added to “Crawled – currently not indexed” (which means it is possible to index such spam pages). It would be great to have an explanation of this.
/feed/rss/ URLs are not spam; they are normal WordPress URLs that are generated automatically for RSS feeds. If your website does not syndicate its posts (automatic content sharing for republishing on other platforms such as podcasts, news sites, etc.), you can disable RSS feeds. This guide explains how: https://www.wpbeginner.com/wp-tutorials/how-to-disable-rss-feeds-in-wordpress/
Question 2: How do we add the code so that it takes effect and makes sure that those 2k+ spam pages cannot be indexed?
Your robots.txt is correctly set up to disallow crawling of your RSS feed URLs. The already crawled URLs will eventually be removed from your GSC report.
Question 3: How do we remove all these spam pages?
It looks like you have done everything necessary, setting search results to ‘noindex’ and setting disallow on /search/ and /feed/ URLs in your robots.txt. You will need to wait up to a few months for URLs to be cleared from your GSC report.
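If you would like to confirm locally which URLs a set of Disallow rules blocks from crawling, here is a minimal sketch using Python's standard urllib.robotparser. The rules, the example.com URLs, and the is_crawl_allowed helper are assumptions for illustration, not a copy of your actual robots.txt. Note that this parser matches rules by path prefix only, so it may not behave exactly like Google's own matcher.

```python
from urllib.robotparser import RobotFileParser

# Assumed rules for illustration; replace with the contents of your own robots.txt.
ROBOTS_TXT = """\
User-agent: *
Disallow: /search/
Disallow: /feed/
"""


def is_crawl_allowed(robots_txt: str, url: str, user_agent: str = "Googlebot") -> bool:
    """Return True if the given rules allow `user_agent` to crawl `url`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    # Hypothetical URLs on a placeholder domain.
    for url in (
        "https://example.com/search/casino/feed/rss2/",
        "https://example.com/feed/rss/",
        "https://example.com/about/",
    ):
        print(url, "->", "allowed" if is_crawl_allowed(ROBOTS_TXT, url) else "blocked")
```

This only answers whether a crawler may fetch the URL. It says nothing about whether the URL itself still loads on the site.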
Hope that helps. Please let us know if you have questions.
Hi
Thanks for your reply.
Q1: /feed/rss/ URLs are not spam and are normal WordPress URLs that are generated automatically for RSS feeds.
Yes, I agree with this. However, my site was hit by internal site search spam.
Thousands of RSS feed URLs containing things like gambling keywords show up among these URLs.
“They target RSS feed versions of search results (e.g., /search/[spam]/feed/rss2/). This is particularly clever, and I suspect the main (or most impactful) example. That’s because other systems actively seek out and consume RSS feeds, and often convert URLs into links. That creates a link back to the attacking site on many more sites. Your WordPress site is just part of a ‘man in the middle’ attack.”
Here’s an example of a spammed URL >> https://snipboard.io/oA4xP2.jpg
Q2: Your robots.txt is correctly set up to disallow crawling of your RSS feed URLs.
Can you check whether the spam URL in the sensitive box is indexable? When I checked on my end, it still shows as indexable, even in the code.
I tried a different approach. For example, I tried publishing a post at domain.com/search/
By default, a rule like Disallow: /search/ in robots.txt should block the post from being crawled. But after publishing, it seems the post is indexable.
You can see the screenshot I shared in the sensitive data. You’ll see the tool shows that the URL is live and that it is possible to index it.
Q3: It looks like you have done everything necessary, setting search results to ‘noindex’ and setting disallow on /search/ and /feed/ URLs in your robots.txt.
Instead of just blocking them, how do we take down these spam feed URLs with all those gambling keywords?
Even if you block it, those spammed URLs are still live on the site.
Hello,
RSS feed attacks can have various causes, most often a compromised database or a corrupt WordPress installation. The most commonly reported issue involves AdSense injection.
My suggested course of action would be:
1. Disable FTP or change the FTP password, if you’re using it.
2. Change the DB password to something strong.
3. Use the Wordfence plugin to harden security.
Do you actively use your RSS feed? If not, you can restrict user access to it or disable it completely. You could also test blocking that page from visitors in Wordfence.
Q1: /feed/rss/ URLs are not spam and are normal WordPress URLs that are generated automatically for RSS feeds.
Since nothing was found even after running the scan, I would suggest searching the database as well. If you find any discrepancy, try removing it. It’s hard to pinpoint the actual cause of the hack, but we can add preventive measures to limit further misuse. I would suggest using Wordfence and blocking access to the RSS feed.
By default, a rule like Disallow: /search/ in robots.txt should block the post from being crawled. But after publishing, it seems the post is indexable.
The Disallow directive in the robots.txt file tells search engines not to crawl the matching URLs. Once crawlers encounter the directive, they skip those URLs and move on to the next. It does not change the page’s own code from “index” to “noindex”, which is why a tool that inspects the page can still report it as indexable.
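To check the page-level signal separately from robots.txt, one option is to look for a robots meta tag in the page's HTML. The sketch below uses Python's standard html.parser; the RobotsMetaParser class and has_noindex helper are hypothetical names for illustration, and the HTML snippets are made up.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collects directives from <meta name="robots" content="..."> tags."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attributes = dict(attrs)
        # Attribute values can be None when the attribute has no value.
        if (attributes.get("name") or "").lower() != "robots":
            return
        content = attributes.get("content") or ""
        for directive in content.split(","):
            directive = directive.strip().lower()
            if directive:
                self.directives.add(directive)


def has_noindex(html: str) -> bool:
    """Return True if the HTML carries a robots meta tag with 'noindex'."""
    parser = RobotsMetaParser()
    parser.feed(html)
    return "noindex" in parser.directives


if __name__ == "__main__":
    page = '<html><head><meta name="robots" content="noindex, follow"></head></html>'
    print(has_noindex(page))
```

For non-HTML responses such as feeds, the equivalent signal would be an X-Robots-Tag response header, which this sketch does not check.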
Question 3: How do we remove all these spam pages?
Since the scan found no infected files, the remaining place to cross-check is the database. Check whether those URLs are in the database and clean them out.
After cleaning, I would still suggest using a security plugin and disabling RSS if you aren’t using it.
Let us know how it goes.
Thank you
Hi,
I tried disabling the feed.
But the spam URL still loads as a live page, showing an error instead of a 404 page.
>> https://snipboard.io/4FGSCm.jpg
How do we resolve this?
Hello,
The feed URL already returns a status code of 500, since the feed is disabled. You can also refer to the screenshot attached in the sensitive data section for your reference.
Please note that URLs should have a status code of 200 to qualify for indexing. Here’s a link for more information:
https://www.searchenginejournal.com/google-checks-status-codes-before-anything-else-when-crawling-content/331251/
Hope that helps.
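If you want to check the status code of a URL yourself, here is a rough sketch using only Python's standard library. The describe_status and fetch_status helpers are hypothetical names, and the labels simply restate the rule of thumb above (200 qualifies, error codes do not); they are not an official classification.

```python
import urllib.error
import urllib.request


def describe_status(status_code: int) -> str:
    """Map an HTTP status code to a rough indexing-eligibility label."""
    if status_code == 200:
        return "eligible"
    if 300 <= status_code < 400:
        return "redirect"
    if 400 <= status_code < 500:
        return "client-error"
    if 500 <= status_code < 600:
        return "server-error"
    return "other"


def fetch_status(url: str, timeout: float = 10.0) -> int:
    """Return the final HTTP status code for `url`.

    urlopen follows redirects on its own, so this usually reports the
    status of the final destination rather than a 3xx code.
    """
    request = urllib.request.Request(url, method="HEAD")
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as error:
        # 4xx and 5xx responses are raised as HTTPError; the code is on the exception.
        return error.code


if __name__ == "__main__":
    # No network call here; pass a real URL to fetch_status to test your own site.
    for code in (200, 404, 500):
        print(code, "->", describe_status(code))
```

A disabled feed that answers with 500 would fall under "server-error" here, and a URL that has been removed entirely would normally fall under "client-error".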
Thank you.
Hello,
Since we did not hear back from you for 15 days, we are assuming that you found the solution. We are closing this support ticket.
If you still need assistance or any other help, please feel free to open a new support ticket, and we will be more than happy to assist.
Thank you.
The ticket ‘Internal Site Search Spam’ is closed to new replies.