How to filter one of two multilingual posts with same ID out of the sitemap?

#572168
  • Resolved G
    Rank Math business

    This is a continuation of this support ticket.

    Hi, I am ready to start troubleshooting again. I tried numerous things, and I think I know what the problem is. I have a WordPress website that uses TranslatePress, so each of the job listings (posts) on my site has two URLs: one for English and one for Dutch. With some code I was able to make sure that only one of the two URLs is set to index and the other to noindex, as the jobs are not translated and I want to avoid duplicate content.

    So I get this done by this code:

    add_filter( 'rank_math/frontend/robots', function( $robots ) {
      global $wp; // required so that $wp->request below refers to the current request
      $post_id = get_the_ID();
      if ( get_post_type( $post_id ) === 'job_listing') {
          $current_url = home_url( add_query_arg( array(), $wp->request ) );
          $terms       = wp_get_post_terms( $post_id, 'job_languages', array( 'fields' => 'names' ) );
          if ( in_array( 'Nederlands', $terms ) && strpos( $current_url, '/en/' ) !== false ) {
              if ( isset( $robots['index'] ) && $robots['index'] === 'index' ) {
                  $robots['index']  = 'noindex';
              }
          } elseif ( !in_array( 'Nederlands', $terms ) && strpos( $current_url, '/en/' ) === false ) {
              if ( isset( $robots['index'] ) && $robots['index'] === 'index' ) {
                  $robots['index']  = 'noindex';
              }
          }
      }
      return $robots;
    });

    This works perfectly.

    In my sitemap however, still both URLs are submitted. I tried a lot of things, based on the code you guys helped me write a year ago to immediately filter out filled jobs from our sitemap, which also works perfectly.

    add_filter( 'rank_math/frontend/robots', function( $robots ) {
    	if(get_post_type() == 'job_listing' && get_post_meta( get_the_ID(), '_filled', true )) {
    		$robots["index"] = 'noindex';
    		$robots["follow"] = 'follow';
    	}
    	return $robots;
    });

    I messed around with that filter by filtering out the posts with noindex, but no success.

    I think the reason is that one post will always have 2 separate URLs and I only want to submit one of these URLs (the one which says index) to the sitemap and as soon as it becomes noindex it should be deleted from the sitemap. So when in the function I look for the post ID, I will always have one URL with index and one with noindex for the same post ID.

    Any idea how I can get this done? I can give you access to our development environment to troubleshoot. You should know, though, that every page there is set to noindex to prevent Google from indexing our dev environment. This might make troubleshooting more difficult. In essence, each job in our sitemap should appear once per post ID, but right now I have duplicates.

    Thanks!

Viewing 10 replies - 1 through 10 (of 10 total)
  • Hello,

    Thank you for contacting support, and sorry for any inconvenience this may have caused.

    When I try to view the sitemap or the admin area, I get an Access denied error. Please check the sensitive data section for a screenshot.

    To let us investigate this issue on your site, please whitelist or open access for India, Portugal, Bangladesh, and the Philippines.

    Looking forward to helping you.

    G
    Rank Math business

    Apologies, access is now granted for those countries. (As we are a job board for the Netherlands only, we want to limit who can access our site.)

    Does it work now? Thanks!

    Hello,

    The issue here is that you are adding noindex directives to the desired pages on the fly using the filter rank_math/frontend/robots whereas, on the backend, it is configured as index. That is why our sitemap is still including that URL.

    To remove these URLs from your sitemap, you also would need to run another filter to check each entry that will be added to the sitemap.

    You may refer to this filter code:

    add_filter( 'rank_math/sitemap/entry', function( $url, $type, $object ){
    	return $url;
    }, 10, 3 );

    You can tap into the object ($object->ID) to get the current post ID and apply the same condition you did in your rank_math/frontend/robots filter hook.

    Let us know how that goes. Looking forward to helping you on this one.

    G
    Rank Math business

    I tried this, but it doesn’t work:

    
    add_filter( 'rank_math/sitemap/entry', function( $url, $type, $object ) {
        $post_id = $object->ID;
        if ( get_post_type( $post_id ) === 'job_listing' ) {
            $current_url = get_permalink( $post_id );
            $terms       = wp_get_post_terms( $post_id, 'job_languages', array( 'fields' => 'names' ) );
            if ( in_array( 'Nederlands', $terms ) && strpos( $current_url, '/en/' ) !== false ) {
                return false; // do not include in sitemap
            } elseif ( ! in_array( 'Nederlands', $terms ) && strpos( $current_url, '/en/' ) === false ) {
                return false; // do not include in sitemap
            }
        }
        return $url; // include in sitemap
    }, 10, 3 );

    Anything you see in there that is wrong?

    Hello,

    Instead of evaluating the get_permalink() function, please try evaluating this instead: $url['loc'].

    So the code would look like this:

    add_filter( 'rank_math/sitemap/entry', function( $url, $type, $object ) {
        $post_id = $object->ID;
        if ( get_post_type( $post_id ) === 'job_listing' ) {
            $terms = wp_get_post_terms( $post_id, 'job_languages', array( 'fields' => 'names' ) );

            if ( in_array( 'Nederlands', $terms ) && strpos( $url['loc'], '/en/' ) !== false ) {
                return false; // do not include in sitemap
            }
        }
        return $url; // include in sitemap
    }, 10, 3 );

    After applying the code, please try flushing the sitemap cache by following this video guide: https://i.rankmath.com/pipRDp
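
    The language check can also be pulled into a small helper, so that both branches of the original robots condition are applied to $url['loc']. The following is only a sketch under assumptions: the helper name is_duplicate_language_url is hypothetical, the 'Nederlands' term and the '/en/' prefix are taken from the code earlier in this ticket, and the add_filter call is guarded so the file also loads outside WordPress. It only helps if the filter actually receives the '/en/' URLs, which this sketch does not establish.

```php
<?php
/**
 * Hypothetical helper (not part of Rank Math or TranslatePress).
 * Returns true when a URL is the unwanted language version of a job:
 * a Dutch job on an '/en/' URL, or a non-Dutch job on a non-'/en/' URL.
 */
function is_duplicate_language_url( array $terms, string $loc ): bool {
    $is_dutch_job = in_array( 'Nederlands', $terms, true );
    $is_en_url    = strpos( $loc, '/en/' ) !== false;

    // Both original branches reduce to "the two flags are equal".
    return $is_dutch_job === $is_en_url;
}

// Only register the filter when WordPress is loaded.
if ( function_exists( 'add_filter' ) ) {
    add_filter( 'rank_math/sitemap/entry', function ( $url, $type, $object ) {
        // Skip entries without a location or without a post ID.
        if ( ! is_array( $url ) || ! isset( $url['loc'], $object->ID ) ) {
            return $url;
        }
        if ( get_post_type( $object->ID ) !== 'job_listing' ) {
            return $url;
        }

        $terms = wp_get_post_terms( $object->ID, 'job_languages', array( 'fields' => 'names' ) );
        if ( ! is_array( $terms ) ) {
            return $url; // term lookup failed, so keep the entry
        }

        // Returning false drops the entry, as in the filter code above.
        return is_duplicate_language_url( $terms, $url['loc'] ) ? false : $url;
    }, 10, 3 );
}
```

    Keeping the decision in one function means the same check could be reused in the rank_math/frontend/robots filter, so the robots tag and the sitemap cannot drift apart.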

    If the issue persists, please do let us know and we will share this with our development team for further assistance.

    Looking forward to helping you.

    G
    Rank Math business

    Tried it again but no luck.

    I thought it might be wiser to simply filter out everything that is labelled noindex by the frontend robots filter. I messed around with some code for that, but also had no luck. What do you suggest?

    Hello,

    Allow me to share this with our developers. We should get back to you with advice or a solution.

    We appreciate your time and patience in the meantime.

    Thank you.

    Hello,

    The TranslatePress plugin already includes an option to remove the translated links from the sitemap. You can access it from WordPress Dashboard > Settings > TranslatePress > Advanced.

    If you enable that, only the default language of the website will get included in the sitemap and you don’t need to add any custom code for this feature.

    Don’t hesitate to get in touch if you have any other questions.

    G
    Rank Math business

    Hi, sorry, but this is not what I want. I want the URLs with English content to be in the sitemap and I want the URLs with Dutch content to be in the sitemap, but not both, hence custom code is needed. Thanks!

    Hello,

    That is not possible from our plugin because the data our sitemap receives is only for the main language and the other URL generation happens from the code inside the TranslatePress plugin.

    You can clearly see this by dumping the URL data from our sitemap filter and you’ll see only the Dutch permalinks appearing.
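
    One way to dump that data is to log each entry from the same rank_math/sitemap/entry filter. This is a rough sketch, not an official snippet: the helper name describe_sitemap_entry is hypothetical, the output goes to the PHP error log via error_log, and the add_filter call is guarded so the file also loads outside WordPress. The sitemap cache probably needs to be flushed before the filter runs again.

```php
<?php
/**
 * Hypothetical helper: build one log line for a sitemap entry.
 */
function describe_sitemap_entry( $url, $type, $object ): string {
    $loc = ( is_array( $url ) && isset( $url['loc'] ) ) ? $url['loc'] : '(no loc)';
    $id  = ( is_object( $object ) && isset( $object->ID ) ) ? (int) $object->ID : 0;

    return sprintf( 'sitemap entry: type=%s id=%d loc=%s', (string) $type, $id, $loc );
}

// Only register the filter when WordPress is loaded.
if ( function_exists( 'add_filter' ) ) {
    add_filter( 'rank_math/sitemap/entry', function ( $url, $type, $object ) {
        error_log( describe_sitemap_entry( $url, $type, $object ) );
        return $url; // leave the entry unchanged
    }, 10, 3 );
}
```

    If every logged loc is a default-language permalink, the '/en/' URLs are not passing through this filter and cannot be removed by it.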

    To reverse this, you need to get in touch with the TranslatePress team about disabling the language that is currently set as primary (Dutch) instead of the secondary language (English).

    Don’t hesitate to get in touch if you have any other questions.

    Hello,

    Since we did not hear back from you for 15 days, we are assuming that you found the solution. We are closing this support ticket.

    If you still need assistance or any other help, please feel free to open a new support ticket, and we will be more than happy to assist.

    Thank you.


The ticket ‘How to filter one of two multilingual posts with same ID out of the sitemap?’ is closed to new replies.