add_filter( 'rank_math/frontend/robots', function( $robots ) {
    global $wp; // Without this, $wp is undefined inside the closure.
    $post_id = get_the_ID();
    if ( get_post_type( $post_id ) === 'job_listing' ) {
        $current_url = home_url( add_query_arg( array(), $wp->request ) );
        $terms = wp_get_post_terms( $post_id, 'job_languages', array( 'fields' => 'names' ) );
        if ( is_wp_error( $terms ) ) {
            return $robots; // Taxonomy lookup failed; leave robots untouched.
        }
        if ( in_array( 'Nederlands', $terms ) && strpos( $current_url, '/en/' ) !== false ) {
            // Dutch job served under the English URL.
            $robots['index'] = 'noindex';
            $robots['follow'] = 'follow';
        } elseif ( ! in_array( 'Nederlands', $terms ) && strpos( $current_url, '/en/' ) === false ) {
            // Non-Dutch job served under the Dutch URL.
            $robots['index'] = 'noindex';
            $robots['follow'] = 'follow';
        }
    }
    return $robots;
});
I wrote this code to noindex the pages I don't want to show up in Google. I was wondering whether I should also add a canonical pointing to the correct version, and whether I should write code that takes the “wrong” URLs out of my sitemap. I thought I'd share it already; looking forward to your response!
So I kept improving my code. An SEO expert advised me to set the pages to “noindex” and to set a canonical as well. This is the code I have right now. I first tried using a Rank Math filter, but it conflicted with my $robots filter, so I found this solution:
function modify_rank_math_canonical() {
    global $wp; // Without this, $wp is undefined inside the function.
    $post_id = get_the_ID();
    if ( get_post_type( $post_id ) === 'job_listing' ) {
        $current_url = home_url( add_query_arg( array(), $wp->request ) );
        $terms = wp_get_post_terms( $post_id, 'job_languages', array( 'fields' => 'names' ) );
        if ( is_wp_error( $terms ) ) {
            return; // Taxonomy lookup failed; leave the canonical untouched.
        }
        $page_uri = get_page_uri( $post_id ); // Was $post, which is undefined in this scope.
        $home_url = 'https://' . $_SERVER['HTTP_HOST'];
        $full_dutch_slug = $home_url . '/vacature/' . $page_uri;
        $full_english_slug = $home_url . '/en/vacancy/' . $page_uri;
        if ( in_array( 'Nederlands', $terms ) && strpos( $current_url, '/en/' ) !== false ) {
            // Dutch job on the English URL: point the canonical at the Dutch URL.
            remove_action( 'wp_head', 'rank_math_canonical' );
            echo '<link rel="canonical" href="' . esc_url( $full_dutch_slug ) . '" />' . "\n";
        } elseif ( ! in_array( 'Nederlands', $terms ) && strpos( $current_url, '/en/' ) === false ) {
            // Non-Dutch job on the Dutch URL: point the canonical at the English URL.
            remove_action( 'wp_head', 'rank_math_canonical' );
            echo '<link rel="canonical" href="' . esc_url( $full_english_slug ) . '" />' . "\n";
        }
    }
}
add_action( 'wp_head', 'modify_rank_math_canonical', 11 );
add_filter( 'rank_math/frontend/robots', function( $robots ) {
    global $wp; // Without this, $wp is undefined inside the closure.
    $post_id = get_the_ID();
    if ( get_post_type( $post_id ) === 'job_listing' ) {
        $current_url = home_url( add_query_arg( array(), $wp->request ) );
        $terms = wp_get_post_terms( $post_id, 'job_languages', array( 'fields' => 'names' ) );
        if ( is_wp_error( $terms ) ) {
            return $robots; // Taxonomy lookup failed; leave robots untouched.
        }
        if ( in_array( 'Nederlands', $terms ) && strpos( $current_url, '/en/' ) !== false ) {
            // Dutch job served under the English URL.
            if ( isset( $robots['index'] ) && $robots['index'] === 'index' ) {
                $robots['index'] = 'noindex';
            }
        } elseif ( ! in_array( 'Nederlands', $terms ) && strpos( $current_url, '/en/' ) === false ) {
            // Non-Dutch job served under the Dutch URL.
            if ( isset( $robots['index'] ) && $robots['index'] === 'index' ) {
                $robots['index'] = 'noindex';
            }
        }
    }
    return $robots;
});
I think this is OK now. This may well be outside the scope of this support, but if you could comment on whether this solution is a good use of your documentation, that would suffice. Thanks a lot!
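Both snippets above repeat the same "wrong language for this URL" check. A minimal sketch of how that check could be pulled into one helper follows. The function name fiks_is_wrong_language_url is hypothetical (it is not part of Rank Math or WordPress), and the sketch assumes, as in the code above, that Dutch jobs carry the 'Nederlands' term and that English URLs contain '/en/'.

```php
<?php
/**
 * Hypothetical helper (not a Rank Math or WordPress function).
 * Returns true when a job is being served under the wrong language URL:
 * a Dutch job on an '/en/' URL, or a non-Dutch job on a URL without '/en/'.
 */
function fiks_is_wrong_language_url( array $term_names, string $url ): bool {
    $is_dutch_job   = in_array( 'Nederlands', $term_names, true );
    $is_english_url = strpos( $url, '/en/' ) !== false;

    // Both true or both false means the language and the URL do not match.
    return $is_dutch_job === $is_english_url;
}

// Register the filter only when WordPress is loaded, so the helper
// can also be run and tested on its own.
if ( function_exists( 'add_filter' ) ) {
    add_filter( 'rank_math/frontend/robots', function ( $robots ) {
        global $wp;

        if ( ! is_singular( 'job_listing' ) ) {
            return $robots;
        }

        $terms = wp_get_post_terms( get_the_ID(), 'job_languages', array( 'fields' => 'names' ) );
        if ( is_wp_error( $terms ) ) {
            return $robots;
        }

        if ( fiks_is_wrong_language_url( $terms, home_url( $wp->request ) ) ) {
            $robots['index'] = 'noindex';
        }

        return $robots;
    } );
}
```

Keeping the decision in one pure function means the robots and canonical code cannot drift apart, and the rule can be checked without loading WordPress.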
Hello,
Thank you for contacting Rank Math for help with duplicate content issues on your multilingual site.
I checked your website and found several pages that had identical content on both the Dutch and English versions of your website. For example, the following two posts are both in Dutch even though one should be in English:
https://fiks.nl/vacature/dutch-boosting-group-traineeship-systems-engineering-en-systeemdenken-veenendaal-15828/
and
https://fiks.nl/en/vacancy/dutch-boosting-group-traineeship-systems-engineering-en-systeemdenken-veenendaal-15828/
Setting the extra pages to noindex as in the code snippet you shared should work to remove the duplicate content warnings. You can also set canonicals for the correct versions. Ideally, you’d create translations for the duplicate pages that need translations, so you wouldn’t need to noindex any pages.
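For the canonical part, a minimal sketch using Rank Math's rank_math/frontend/canonical filter could look like the following. The helper name fiks_correct_language_url is hypothetical, and the sketch assumes the URL patterns seen in this thread (/vacature/ for Dutch, /en/vacancy/ for English) and that the canonical passed to the filter is the URL of the page being viewed. The original poster reported that a canonical filter conflicted with the robots filter, so treat this as untested on that setup.

```php
<?php
/**
 * Hypothetical helper (not a Rank Math or WordPress function).
 * Maps a job URL to the URL of the language version that should be indexed.
 * Assumes Dutch jobs live under /vacature/ and English ones under /en/vacancy/.
 */
function fiks_correct_language_url( string $url, bool $is_dutch_job ): string {
    if ( $is_dutch_job ) {
        // Dutch job: rewrite an English URL to its Dutch counterpart.
        return str_replace( '/en/vacancy/', '/vacature/', $url );
    }

    if ( strpos( $url, '/en/' ) !== false ) {
        // Non-Dutch job that is already on the English URL.
        return $url;
    }

    // Non-Dutch job on the Dutch URL: rewrite to the English counterpart.
    return str_replace( '/vacature/', '/en/vacancy/', $url );
}

// Register the filter only when WordPress is loaded.
if ( function_exists( 'add_filter' ) ) {
    add_filter( 'rank_math/frontend/canonical', function ( $canonical ) {
        if ( ! is_singular( 'job_listing' ) || empty( $canonical ) ) {
            return $canonical;
        }

        $terms = wp_get_post_terms( get_the_ID(), 'job_languages', array( 'fields' => 'names' ) );
        if ( is_wp_error( $terms ) ) {
            return $canonical;
        }

        return fiks_correct_language_url( $canonical, in_array( 'Nederlands', $terms, true ) );
    } );
}
```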
Hope that helps. Please let us know if you have questions.
Hi,
Unfortunately, this is not yet resolved. With the above code, the jobs stop appearing in Google for Jobs. I checked, and a job posted a week ago was not in my sitemap yet. I have now refreshed the sitemap by changing the number of links that can be present in it, which updated it.
Now I see the job https://fiks.nl/vacature/saleslift-studio-stage-sales-bij-ambitieuze-sales-developer-amsterdam-20693/ in my sitemap, but it also lists the noindex job:
https://fiks.nl/en/vacancy/saleslift-studio-stage-sales-bij-ambitieuze-sales-developer-amsterdam-20693/
Two questions:
1. My sitemap apparently does not update itself. How do I fix this?
2. Why does a noindex page end up in my sitemap and how do I get rid of it?
Thanks!
Hello,
Thank you for your patience.
If the changes are not reflected in your sitemaps and they only refresh when you change the links per sitemap, please exclude the Rank Math sitemap files from caching. The cache could come from a plugin or from the server. For plugins or Cloudflare, please follow this article:
https://rankmath.com/kb/exclude-sitemaps-from-caching/
If the issue still persists, please try adding this filter to your active theme’s functions.php file:
add_filter( 'rank_math/sitemap/enable_caching', '__return_false');
Let us know how this goes.
Thank you.
Hello,
I have updated the sensitive data as requested. Can you please check further?
Thank you.
Hi Reinelle,
Thanks!
Unfortunately it didn’t solve it. I added the snippet to my functions.php on our test environment.
These entries are still in the sitemap:
https://sandbox.fiks.nl/vacature/fiks-cloned-first-time-beuningen-20818/ (Images: 0, Last Mod.: 2023-02-28 10:06 +00:00)
https://sandbox.fiks.nl/en/vacancy/fiks-cloned-first-time-beuningen-20818/ (Images: 0, Last Mod.: 2023-02-28 10:06 +00:00)
As you can see, only the first one is set to index; the second one is noindex, so it should not be in there, right?
I updated the permalinks and the number of pages for the sitemap, but that did not work either.
In the sensitive data I added a login for you, so you can have a look if you want.
Through the backend > Theme Editor > functions.php you can test code if you want. Don’t worry if you break the site; it’s a development site anyway.
Thanks a lot!
Hello,
When I checked your staging site, both URLs you shared were set to noindex and were no longer in the sitemap. Please confirm if this is correct and if you need further assistance.
Looking forward to helping you.
Apologies, the last jobs are both noindex for some reason… Something is setting everything I publish to noindex now. I will figure this out and report back when I have 2 jobs I can share.
My apologies, my team was testing things. I will make sure this will stay untouched:
https://sandbox.fiks.nl/vacature/fiks-one-should-be-expired-beuningen-20849/
https://sandbox.fiks.nl/en/vacancy/fiks-one-should-be-expired-beuningen-20849/
Both show up in the sitemap, although the first one should not, as it is set to noindex.
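One possible explanation, offered as an assumption rather than a confirmed diagnosis: the rank_math/frontend/robots filter only changes the robots meta tag printed on the front end, while the sitemap appears to be built from stored post data, so a URL noindexed through that filter can still be listed. A minimal sketch of excluding such entries with Rank Math's rank_math/sitemap/entry filter follows. The helper name fiks_keep_sitemap_url is hypothetical, and the sketch assumes that returning false from the filter drops the entry and that each language URL reaches the filter as its own entry, which depends on the multilingual plugin in use.

```php
<?php
/**
 * Hypothetical helper (not a Rank Math or WordPress function).
 * Returns true when a sitemap URL matches the language of the job:
 * Dutch jobs on URLs without '/en/', all other jobs on '/en/' URLs.
 */
function fiks_keep_sitemap_url( string $loc, array $term_names ): bool {
    $is_dutch_job   = in_array( 'Nederlands', $term_names, true );
    $is_english_url = strpos( $loc, '/en/' ) !== false;

    return $is_dutch_job !== $is_english_url;
}

// Register the filter only when WordPress is loaded.
if ( function_exists( 'add_filter' ) ) {
    add_filter( 'rank_math/sitemap/entry', function ( $url, $type, $object ) {
        // Only touch post entries for the job_listing post type.
        if ( 'post' !== $type || empty( $url['loc'] ) || empty( $object->ID ) ) {
            return $url;
        }
        if ( 'job_listing' !== get_post_type( $object->ID ) ) {
            return $url;
        }

        $terms = wp_get_post_terms( $object->ID, 'job_languages', array( 'fields' => 'names' ) );
        if ( is_wp_error( $terms ) ) {
            return $url;
        }

        // Assumption: a false return value removes the entry from the sitemap.
        return fiks_keep_sitemap_url( $url['loc'], $terms ) ? $url : false;
    }, 10, 3 );
}
```

After adding a filter like this, the sitemap cache would likely need to be cleared (or caching disabled with the rank_math/sitemap/enable_caching filter mentioned earlier in this thread) before the change becomes visible.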
Hello,
Thank you for the update. Please let us know when you are ready to continue troubleshooting.
Looking forward to helping you.
Hello,
Since we did not hear back from you for 15 days, we are assuming that you found the solution. We are closing this support ticket.
If you still need assistance or any other help, please feel free to open a new support ticket, and we will be more than happy to assist.
Thank you.