Deploying HREFLANG when the global site does not want to participate
September 14, 2021

Last week I encountered one of the latest mistaken reasons for not using hreflang elements on an enterprise website. The DevOps Manager said they did not need hreflang because they use IP address-based Geo Redirection, so users will “always get to the correct country website,” making hreflang unnecessary.

For those who are new to both the concept of Geo Redirects and hreflang elements, let me try to define and add context. If you have ever visited another country and tried to visit a website you’ve visited in the past, and ended up on the local language or country version of that site, you have experienced IP or Geo Redirection. Geo Redirection is an automated process that looks up the IP (internet protocol) address of the phone or computer you are on (like a form of caller ID), then based on the physical location of that user’s device, redirects them to a website the company believes is the best match for the visitor based on where they are connected to the internet.

For example, a visitor to a website sitting in a coffee shop in Munich, Germany would be detected as being both in Germany and in Munich. When they click an English link, they would be rerouted from the English URL to the company’s German homepage or the matching German equivalent URL, which would typically be in German. A visitor from Mountain View, California in the US trying to open the German URL would be routed to the English website. The question, and the problem, is this: what if that visitor from Mountain View, California is Google trying to visit and index the company’s Germany website?

Hreflang elements are annotations that tell participating search engines that a page has alternate language or country versions. When a user searches from a given market, the search engine can then show the local alternate rather than the version of the page it originally thought was the most relevant for the query.

Why is using IP Detection Only a problem?

The problem can be identified in the description of Geo Redirection: a “visitor’s IP address.” When a search engine comes to the website to catalog it for scoring and ranking, it is viewed as just another visitor and is sent to the site that matches its location. Search engines tend to crawl from their dominant markets, with Google and Bing crawling from the US and Yandex from Russia, so they are routed to the site designated for that market, making it nearly impossible to get the non-US versions of sites indexed.

Let’s use Google as our example. If Google’s Googlebot requests a page from your Germany, France, or Japan website and the IP detection maps Googlebot’s IP address to the US, the system will redirect Google to the US website. Every time Googlebot (from the US) tries to visit a non-US website with Geo Redirection, it is sent away, resulting in ONLY the US site getting indexed. If Google cannot index and score the Germany website, then it is impossible for that site to rank for any German language queries.

If Google only indexes the US site and, like many websites, it has the country in the title tag, a user from France or Japan may be hesitant to click since they know it is a US site priced in US Dollars and shipping will be expensive. What if the majority of searchers do not click the link? Working with multinationals, especially in Latin America, we found that at least 65% of searchers were not clicking into non-local market websites. If the searcher in Peru does actually click on the US listing, they will be pleasantly surprised when they are redirected to the company’s Peru website, but only a subset of users will take that risk. One of our multinational electronics companies deployed Hreflang Builder and updated their IP detection methodology to grant Google an exception, resulting in a 200% increase in regional traffic.

The solution to get the best outcome is to leverage Geo Redirection based on IP address but grant an exception in the redirect rules for search engine user agents. If the request for the Australia page comes from a US IP address but also presents the User Agent Googlebot, the search engine is allowed to pass into the Australian website.
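
The exception described above can be sketched in a few lines of Python. This is a minimal illustration, not a production rule set: the bot tokens, the country-to-path mapping, and the function names are all assumptions made for the example. Note that a user agent string can be faked, so a real deployment may also want to confirm that the request comes from the search engine’s own IP ranges.

```python
from typing import Optional

# Hypothetical values for illustration only.
SEARCH_BOT_TOKENS = ("googlebot", "bingbot", "yandexbot")
COUNTRY_SITES = {"US": "/us/", "AU": "/au/", "DE": "/de/"}


def is_search_bot(user_agent: str) -> bool:
    """True if the user agent string names a known search engine crawler."""
    ua = user_agent.lower()
    return any(token in ua for token in SEARCH_BOT_TOKENS)


def site_for_path(path: str) -> Optional[str]:
    """Return the country code of the site this path belongs to, if any."""
    for country, prefix in COUNTRY_SITES.items():
        if path.startswith(prefix):
            return country
    return None


def geo_redirect_target(path: str, visitor_country: str,
                        user_agent: str) -> Optional[str]:
    """Return the path to redirect to, or None to serve the requested page."""
    # The exception: crawlers always get the page they asked for.
    if is_search_bot(user_agent):
        return None
    requested = site_for_path(path)
    # Unknown site or unknown visitor country: leave the request alone.
    if requested is None or visitor_country not in COUNTRY_SITES:
        return None
    if requested == visitor_country:
        return None
    # Swap the country prefix and keep the rest of the path.
    rest = path[len(COUNTRY_SITES[requested]):]
    return COUNTRY_SITES[visitor_country] + rest
```

With these rules, a US shopper requesting `/au/cameras` is sent to `/us/cameras`, while a request for the same path that presents the Googlebot user agent is served the Australian page.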

This will allow Google to index the pages and score them. If the websites are also using hreflang, Google will see that there is a version for US English, Australian English, and even UK English. Since it not only has all versions but also understands your assignment of the pages to the language markets, it will present the Australian page to searchers in Australia and the US page to those in the United States.
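
To make the US, Australian, and UK English example concrete, here is a small Python sketch that builds the hreflang link elements for one set of alternate pages. The example.com URLs and the function name are placeholders, not anything from a real site.

```python
from html import escape
from typing import Dict, List


def hreflang_links(alternates: Dict[str, str]) -> List[str]:
    """Build one <link> element per language-region code and URL."""
    return [
        '<link rel="alternate" hreflang="{}" href="{}" />'.format(
            escape(code), escape(url)
        )
        for code, url in alternates.items()
    ]


# Placeholder URLs for illustration only.
alternates = {
    "en-us": "https://www.example.com/us/",
    "en-au": "https://www.example.com/au/",
    "en-gb": "https://www.example.com/uk/",
}

for link in hreflang_links(alternates):
    print(link)
```

The same full set of elements, including the entry for the page itself, goes into the head of each of the three pages so that every version points to all of the others.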

Are Search Engine Exemptions Cloaking?

A few of the Geo-Targeting vendors actually indicate that giving search engines an exemption is a form of “cloaking,” which is giving search engines one page and regular users another. If you think about this process logically, we are NOT switching the page we give to Google; we are giving Google the page it requested. In most cases, Google is visiting the page because we asked it to via an XML sitemap.

Locale-Adaptive Page Protocols Confusion

In 2015 Google announced a change in how they crawled and indexed locale-adaptive pages, which are pages that change their content based on the visitor’s location or language settings while the URL stays the same. The big change was that they were going to start crawling from various points around the world so that, if pages were set to be IP specific, Googlebot could reach the local versions. Unfortunately, many consultants and Geo Targeting vendors incorrectly cite this announcement as guidance on how to manage multinational websites. If you have a specific page for Spanish and another unique page for German, then you do not have a locale-adaptive site and the recommendations in that announcement are not relevant. Below is a direct quote from Google indicating you should have separate URLs for each language version and should leverage hreflang elements.

Note that these new configurations do not alter our recommendation to use separate URLs with rel=alternate hreflang annotations for each locale. We continue to support and recommend using separate URLs as they are still the best way for users to interact and share your content, and also to maximize indexing and better ranking of all variants of your content.

Until about 2020 many e-commerce systems took advantage of Java-based platforms that would swap content based on the location or language preference of the user. For example, the screen capture below shows the experience of a visitor to an Overstock.com page for a specific camera from both an IP in the US (left) and an IP in Sweden (right). Based on the location, the price was shown in US Dollars or Swedish Kronor, but the URL DID NOT change. This meant there was ONLY ONE actual URL for that product, even though it could be represented in multiple languages and currencies. As a result, only the US English version of the page would be indexed, and the site did not rank well in any other market in the world.

One of the ways that sites have been trying to get around this problem is to append a language or country parameter at the end of the URL. We had to build functionality into Hreflang Builder to accommodate this, and in one case a client’s system went nuts and appended every language and country combo to every URL, resulting in over 8,000 country and language entries in their hreflang. In another case, all of these parameter versions were made useless when a DevOps person added a canonical tag pointing to the root URL, thereby telling Google not to use any of the local language parameter URLs. Google recently addressed this problem, explaining the challenges parameters create for Google and, if you must use them, how to use Google’s URL parameter tool.
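
The canonical conflict described above is easy to check for once you have the canonical URL declared on each alternate page. The following Python sketch assumes you have already collected that data; the `lang` parameter name, the URLs, and the function name are hypothetical.

```python
from typing import Dict, List


def canonical_conflicts(declared_canonicals: Dict[str, str]) -> List[str]:
    """Return hreflang alternate URLs whose canonical tag points elsewhere.

    declared_canonicals maps each alternate URL to the canonical URL
    found on that page.
    """
    return sorted(
        url
        for url, canonical in declared_canonicals.items()
        if canonical != url
    )


# Hypothetical crawl data: every parameter version was given the
# root URL as its canonical.
pages = {
    "https://www.example.com/camera": "https://www.example.com/camera",
    "https://www.example.com/camera?lang=de": "https://www.example.com/camera",
    "https://www.example.com/camera?lang=sv": "https://www.example.com/camera",
}

for url in canonical_conflicts(pages):
    print("canonical points away from hreflang alternate:", url)
```

Any URL this reports is listed as an hreflang alternate while its own canonical tag tells the search engine to prefer a different URL, which is the situation that made the parameter versions useless.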

How to Correctly Implement IP Detection with HREFLang

First, to achieve the best results, use Geo Redirection based on IP address but make an exception in the redirect rules for search engine user agents. This way search engines can visit and index all pages no matter where they are crawling from.

Second, and this is the Google recommended way, properly implement hreflang. This is how Google will notice that there are alternate versions for different countries and show the Australian page to Australian searchers and the US page to US searchers.

The Myth of Global Crawling

Crawling the web from the far reaches of the world turned out to be far more complex and, I assume, more expensive than anticipated. By 2017 it was clear that doing so did not create an advantage for Google, so they slowed down its deployment. Google just last week confirmed that they primarily crawl websites from the United States, making IP Address-based Geo Redirects a challenge.

My own tests and those of others indicate that very few visits from non-US Googlebot are coming to large e-commerce websites. Google also suggests that you do your own tests by looking up the IP address of the various Googlebot visits to your local sites. Your IP detection vendor should be able to give you this data. <Rant> One such company, when pressed for this analysis for a client, said that it would be a privacy violation to give that data and did not want to grant search engines an exemption since they believed Google was crawling from all countries. The good news is the client changed providers to one that would grant the exception.</Rant>
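
If you want to run the Googlebot IP test yourself, a rough Python sketch of the tally looks like this. It assumes you can export (IP address, user agent) pairs from your server logs and that you have some way to map an IP to a country; the `country_of` lookup here is a stand-in for whatever database your IP detection vendor provides, and the addresses are reserved documentation IPs, not real Googlebot addresses.

```python
from collections import Counter
from typing import Callable, Iterable, Tuple


def googlebot_hits_by_country(
    records: Iterable[Tuple[str, str]],
    country_of: Callable[[str], str],
) -> Counter:
    """Count requests that present a Googlebot user agent, per country."""
    counts = Counter()
    for ip, user_agent in records:
        if "googlebot" in user_agent.lower():
            counts[country_of(ip)] += 1
    return counts


# Stand-in lookup table; a real one would query an IP geolocation database.
FAKE_GEO = {"192.0.2.10": "US", "192.0.2.11": "US", "198.51.100.7": "DE"}

GOOGLEBOT_UA = "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"

log_records = [
    ("192.0.2.10", GOOGLEBOT_UA),
    ("192.0.2.11", GOOGLEBOT_UA),
    ("198.51.100.7", GOOGLEBOT_UA),
    ("203.0.113.5", "Mozilla/5.0 (Windows NT 10.0; Win64; x64)"),
]

hits = googlebot_hits_by_country(log_records, lambda ip: FAKE_GEO.get(ip, "unknown"))
for country, count in hits.most_common():
    print(country, count)
```

Running this against the logs of each local site shows how many of its Googlebot visits actually came from outside the US.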
