Why so many hreflang errors?
March 19, 2024

Why is Hreflang So Annoying to SEOs?

Unsurprisingly, at another Search Conference, Gary Illyes felt the wrath of frustrated SEOs toward hreflang. For the past few months, I have been working on several Hreflang-specific online courses and have spent much time understanding the reasons for this frustration and how to mitigate it. 

Google is interested in alternative methods of hreflang because SEOs are frustrated.  Unfortunately, we are looking at this problem from the wrong viewpoint.  Can we reframe the problem to understand why they are frustrated and why most SEOs want to eliminate hreflang?

Why is hreflang so hard to implement correctly? Do we really think a different method can fix the fundamental problem of dysfunctional websites, organizational structures, and lack of rigid compliance and enforcement of the technique we have?

TLDR:

  • Hreflang frustrates SEOs, so let’s change the method.
  • Why not fix the root causes of frustration?
  • Inconsistent infrastructure complicates implementation.
  • Organizational dysfunctions make deployment nearly impossible.
  • Hreflang syntax rigidity restricts the freedom of developers to express themselves and copy others’ expressions.
  • Google should enforce current requirements better and educate companies.

Solving for the Frustration

Many SEOs are frustrated and have mental scars from trying to wrangle the tangled mess we call global infrastructure and negotiate simple changes at a worldwide level. Many advocate for hreflang to go away so they don’t have to deal with it.

Maybe we should stop complaining about how frustrating it is and try to get broader initiatives to fix the root causes I document below.  I find it strange that SEOs have moved mountains to fix underperforming websites to get better core web vitals scores, but DevOps has drawn the line at fixing the multitude of frustrating issues preventing hreflang implementation when there is real ROI attached to the fix. 

I firmly believe the cause is the lack of publicity and understanding of the revenue benefits of hreflang and the challenges to reaping those benefits. If global executives knew that their market websites would be excluded from the local search results, and understood the revenue and customer service implications, they would want to take action. But hreflang is relegated to the market level and to SEOs with only KPIs and no real ability or authority to drive meaningful change.

The real problem is that those at the level deciding what to fix are often not aware of the challenges and benefits. I guarantee that if a CEO or board knew they were losing $2 to $20 million a month due to cannibalization, they would fix their organizational and infrastructure problems. With over 100 companies adding Google warning statements to their annual financial reports, maybe it is time to recognize that this is a real global problem.

Hreflang is Fundamentally Simple

The problem is not hreflang. Hreflang is fundamentally simple. Hreflang requires you to use one of three methods to indicate the language and target country of the page, then list all of the pages that are alternates and their respective languages and target countries. Simple, right?
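To make that concrete, here is a minimal Python sketch of one of those methods, the HTML link tag. The example.com URLs and the function name are placeholders of mine, not taken from any real site. It shows the whole idea: each page in a cluster carries the same list of alternates, itself included.

```python
from html import escape


def hreflang_links(alternates: dict[str, str]) -> list[str]:
    """Build one <link> element per alternate.

    The same full set (including the page's own entry) is meant to be
    placed in the <head> of every page in the cluster.
    """
    return [
        f'<link rel="alternate" hreflang="{escape(code)}" href="{escape(url)}" />'
        for code, url in sorted(alternates.items())
    ]


# Hypothetical cluster for a single product page.
alternates = {
    "en-us": "https://www.example.com/us/shoe",
    "en-gb": "https://www.example.com/gb/shoe",
    "de-de": "https://www.example.com/de/shoe",
}

for line in hreflang_links(alternates):
    print(line)
```

The code is trivial. The hard part, as the rest of this post argues, is producing the `alternates` mapping in the first place.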

Yes, it can be. Hreflang Builder onboarded a multinational site with 77 market sites, each with about 50,000 pages, and generated its hreflang XML sitemaps in 45 minutes. What sorcery was necessary for this to happen? The domain and URL structure were uniform, requiring only a simple pattern match in Hreflang Builder. This is the same as a website that cares about users and generates a mobile-friendly site that maxes out its Core Web Vitals scores.
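A rough sketch shows why a uniform structure makes this so quick. This is not how Hreflang Builder works internally; it is an illustration under the assumption that every market lives in a folder on one domain and uses identical page paths. The market list, domain, and paths are made up.

```python
import xml.etree.ElementTree as ET

SM = "http://www.sitemaps.org/schemas/sitemap/0.9"
XHTML = "http://www.w3.org/1999/xhtml"
ET.register_namespace("", SM)
ET.register_namespace("xhtml", XHTML)

# Hypothetical mapping of hreflang value -> market folder.
MARKETS = {"en-us": "us", "en-gb": "gb", "de-de": "de"}


def build_sitemap(base: str, paths: list[str]) -> str:
    """Emit a hreflang XML sitemap, assuming every market uses the same paths."""
    urlset = ET.Element(f"{{{SM}}}urlset")
    for path in paths:
        # With uniform URLs, the whole cluster is a simple string pattern.
        cluster = {code: f"{base}/{folder}/{path}" for code, folder in MARKETS.items()}
        for url in cluster.values():
            entry = ET.SubElement(urlset, f"{{{SM}}}url")
            ET.SubElement(entry, f"{{{SM}}}loc").text = url
            # Every <url> lists all alternates, including itself.
            for code, alt in cluster.items():
                ET.SubElement(
                    entry, f"{{{XHTML}}}link", rel="alternate", hreflang=code, href=alt
                )
    return ET.tostring(urlset, encoding="unicode")


print(build_sitemap("https://www.example.com", ["shoes/air-1", "shoes/air-2"]))
```

The moment the paths differ between markets, the one-line `cluster` pattern stops working and someone has to supply the mapping.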

Conversely, onboarding a smaller site where every product has a different domain structure, URL format, SKU, and item quantities across markets takes significantly longer to implement.  We could use some machine learning to match some of the pages, but manual intervention was needed to align alternates for most pages.

If we really look deep into the source of frustration, we can see a few challenges that have nothing to do with hreflang and will be present with any alternative solution.

Challenge 1 – Fundamental Misunderstanding of the Purpose of Hreflang

The problem, and therefore the need for hreflang, is dealing with market relevance and content duplication challenges. Too many experts believe this is about language, mainly due to the attribute’s name, but it’s more than that. Google can undoubtedly understand the page’s language, which is why some SEOs argue we don’t need hreflang. The core issue, however, is content duplication. In their quest for global domination, brands create market-specific websites for various reasons. We are repeatedly told we must localize for market-specific nuances of language, sizes, prices, etc. We also have KPIs. Significant investments of time and resources are needed to launch market websites, so we need to ensure they deliver a return on that investment. Anyone working in this vertical has seen varying degrees of adherence to proper market localization.

I have seen thousands of before-and-after situations with hreflang. Most of the time, a company’s pages were not indexed because they were considered duplicates. This is evidenced by a large set of errors in the “Duplicate, Google chose different canonical than user” report. We give examples in the Aspirational Hreflang challenge post.

I will use Nike to illustrate the real problem that hreflang solves. Nike has 78 market sites, with 37 in English.  Of those in English, 22 have the dollar currency symbol.  Google, upon encountering each of these English sites, needs to determine if there is value in having all 37 English pages for a specific shoe or just one.  Google can satisfy a user by having one, but Nike wants all 37 to be indexed and presented correctly to users. Let us say Google agrees. How can Google know what market each is for?  The only signals presented for a market are the folder in the URL, maybe currency, and a country name in the anchor text with a link to the country selector. 

Understanding and correctly applying these scant signals puts a lot of demands on Google. Is there enough of a difference for these seemingly duplicate pages to accept them as having a unique purpose and then use these signals to show it to the market correctly? 

That is the problem in a nutshell. Google must determine if a seemingly duplicate page should be indexed and which language and market version of the SERPs it should be shown in. This challenge only gets bigger as we fold in the other 41 market and language versions. The hreflang attribute disambiguates this uncertainty, gives each alternate a specific purpose, and simply indicates to whom it should be presented. Why would we not want this gift from the Google gods?

Challenge 2 – Web Infrastructure Chaos

It is not that we don’t want this gift, because those who use it have benefitted greatly. We estimate that Hreflang Builder has captured over $100 million in lost revenue for clients that implemented hreflang correctly over the years.

The biggest challenge in implementing hreflang is aligning the alternate pages to each other. This happens because websites have little to no uniformity, resulting in multiple structures across markets. With Hreflang Builder, we have needed to develop forty automated methods of identifying and mapping alternates, yet many sites still need humans to indicate the actual alternate page. 
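As an illustration of what one automated matching method might look like, here is a Python sketch that clusters URLs by a product identifier found in the URL. This is not one of Hreflang Builder's actual methods. The SKU format (an optional two-letter market prefix followed by at least five digits) and all URLs are assumptions made for the example.

```python
import re
from collections import defaultdict

# Assumed SKU format: optional two-letter market prefix, then 5+ digits.
SKU_PATTERN = re.compile(r"(?:[A-Z]{2}-)?(\d{5,})")


def group_by_sku(urls_by_market: dict[str, list[str]]) -> dict[str, dict[str, str]]:
    """Cluster URLs that share a SKU; returns sku -> {hreflang value: url}."""
    clusters: dict[str, dict[str, str]] = defaultdict(dict)
    for code, urls in urls_by_market.items():
        for url in urls:
            match = SKU_PATTERN.search(url)
            if match:  # URLs without an identifier still need a human
                clusters[match.group(1)][code] = url
    return dict(clusters)


# Hypothetical market URLs with three different structures.
urls_by_market = {
    "en-us": ["https://www.example.com/us/shoes/US-123456"],
    "de-de": ["https://www.example.de/schuhe/DE-123456"],
    "fr-fr": ["https://www.example.fr/p/123456.html"],
}

print(group_by_sku(urls_by_market))
```

A method like this only helps when the markets agree on the identifier. When they do not, no pattern can recover the relationship.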

I have identified 87 different variations of setting the domain and country language. This ranges from combinations of ccTLD to gTLDs with subdomains and folders to both with parameters. We just set up a project with 12 variations, including the country and language folders at the 5th and 6th levels and a regional set of sites with the region at level 6 with parameters to set the language.

When asked why this was set up this way, the reasons ranged from representing organizational hierarchy to CMS variations, licensing costs, and market requests. Let’s move a level down to URL structures. This is where SEOs and organizational silos have done the most damage: keyword-rich folders and page names, localization, and categorization cause variations between sites.

Even with forced logical structures for categories and products from Salesforce Commerce Cloud and Shopify, these vary due to different SKUs and IDs, and a simple lack of coordination in setting these variables often results in no two markets using the same ID despite being the same product. In one mapping example, we saw multiple variations, from localized folders to SKUs with a regional or market prefix to sites on an old CMS without any unique identifier.

Much of this inconsistency is caused by those in the CMS of the year club, where there might be three to 17 different CMS systems across the globe. Multiple CMS implementations result in URL variations as well as merchandising challenges. For example, a consumer products company has some markets with a leading product page with selectors for sizes and colors. In contrast, other markets have individual pages for each variation but not a main page. What do you set as the alternate to the main product page? Is this a hreflang problem or Google’s problem?

Once, I suggested to an executive that their URL structure must have been inspired by monkeys throwing spaghetti against the wall. I was summoned by the CTO to explain my insult. He proclaimed great thought and logic were used, referring to his URL structure logic as “pure elegance.” “As elegant as a Pollock painting,” I exclaimed as I showed the mapping matrix. He was shocked at how the markets deviated from his masterpiece and admitted that the monkeys could have done better.

It is not always dysfunctional. Companies may use separate CMS pods to manage tax and legal variations. Sites on different CMS instances cannot share a parent-child relationship, preventing automated hreflang. Oh, and god forbid we have any coordination in structures between pods, but that rant is for our next challenge.

Challenge 3 – Organizational Chaos

Organizational dysfunction is where hreflang comes to die. The more decentralized the organization, the more geopolitical and organizational challenges are present, resulting in a much higher project mortality rate. At a recent International Search Summit in Barcelona, one global SEO told me their hreflang committee had 70 members, which is why it has not been implemented.

Due to the decentralized structures, many CMS or teams may be managing the website, and getting them to collaborate is nearly impossible. Especially for e-commerce companies that prioritize configurations and other sales-related projects, it is often impossible to get hreflang onto a dev cycle despite documented financial loss.

After three years of wading through organizational bureaucracy, we just onboarded a global auto company. The challenges initially ranged from who would pay to what markets needed. Then, they morphed into how to bypass market and agency GSC gatekeepers to make KPI adjustments to appease those markets that benefitted from cannibalization.

If you can believe it, a significant political challenge is getting the beneficiaries of cannibalization to give up their gains.  I have seen dozens of hreflang projects delayed or stopped due to a dominant market’s unwillingness to lose traffic they should never have received. This included a company with identified monthly cannibalization costs of $4 million. Still, the US team refused to discuss a hreflang solution for fear of failing to meet their SEO traffic KPIs. Since they controlled the website, they had all of the power.

There is always a challenge with who will pay for the solutions and the resources to fix site structures or map alternates.  Most decentralized organizations do not have budgets for global solutions, relying on all markets to chip in, and many do not have the budget to allocate. 

We have multiple Hreflang Builder clients now, with a regional SEO agency that only wants to manage the hreflang for markets in their remit. Despite knowing all sites should be managed, they either don’t like the drama or cannot collaborate with others due to competitive reasons, beneficiaries, or cannibalization.  Unfortunately, hreflang spans organizational structures, partners, and agency remits. 

Many companies with infrastructure or organizational challenges struggle with the decision to build or buy a hreflang solution. They look at the standard and ask how hard it could be to set the language and country and map alternates. Typically, a cocky programmer boasts they can build it over the weekend, not realizing the rat’s nest they have as a global infrastructure. This often results in major delays and a refusal to use alternative solutions as a bridge.

Challenge 4 – Rigidity of Hreflang syntax

Many argue the syntax is too rigid and that they should be able to adapt it as they see fit. What other HTML do developers believe they can alter at will? I have collected dozens of comments, like one from Myriam Jessier on LinkedIn, about deviations from the standard.

A recent poll on LinkedIn found sites have incorrect country codes for the following reasons.

  • 40% were not aware or did not read the standard.
  • 40% deviated from the standard because they needed a different setup.
  • 20% deviated because they copied a competitor.

Google needs to stop allowing deviations from the standard. I created a soon-to-be-released 2.5-hour course on troubleshooting hreflang errors and documented dozens of cases in which Google reps indicated they might figure out what the site means. SEOs shared that as a valid deviation from the standard.  

If you use en-uk rather than en-gb, guess what? It’s invalid. Relative URLs are invalid, and a missing self-reference is also invalid, breaking not just the pair but the cluster. In an 8-tweet exchange, an SEO argued that underscores between languages and countries must be okay since Google did not give an error in the old hreflang report. This should have been one response: “Standard says dashes,” period.
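None of these rules are hard to check before deployment. Here is a minimal Python sketch of such a check. The code lists are deliberately truncated samples; a real check would load the full ISO 639-1 and ISO 3166-1 alpha-2 lists, and this sketch does not handle script subtags. The function names and URLs are my own placeholders.

```python
from typing import Optional
from urllib.parse import urlparse

# Truncated sample lists for illustration only.
LANGUAGES = {"en", "de", "fr", "es"}
COUNTRIES = {"us", "gb", "de", "fr", "es", "mx"}


def check_value(value: str) -> Optional[str]:
    """Return a problem description, or None if the hreflang value passes."""
    v = value.lower()
    if v == "x-default":
        return None
    if "_" in v:
        return "separator must be a dash, not an underscore"
    lang, _, region = v.partition("-")
    if lang not in LANGUAGES:
        return f"unknown language code '{lang}'"
    if region and region not in COUNTRIES:
        return f"unknown country code '{region}'"
    return None


def check_page(page_url: str, alternates: dict[str, str]) -> list[str]:
    """Check one page's hreflang set: values, absolute URLs, self-reference."""
    problems = []
    for value, href in alternates.items():
        issue = check_value(value)
        if issue:
            problems.append(f"{value}: {issue}")
        parsed = urlparse(href)
        if not (parsed.scheme and parsed.netloc):
            problems.append(f"{value}: href '{href}' is not an absolute URL")
    if page_url not in alternates.values():
        problems.append("missing self-referencing entry")
    return problems


print(check_value("en-uk"))
print(check_page("https://www.example.com/gb/", {"en-gb": "/gb/"}))
```

Run against the examples above, `en-uk` fails on the country code, `en_us` fails on the separator, and a relative `href` fails twice: once for not being absolute and once because the page no longer references itself.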

Although no one in the poll selected this option, a reason to deviate from the standard can be CMS inflexibility. Most CMS use a two-letter designation to set language and country folders. The problem is that no regional abbreviation exists, so they may use LA for Latin America, ME for the Middle East, and AS for ASEAN or APAC. Of course, let’s not forget GL for their global site. Then, the CMS uses these codes for hreflang. The problem is that some of these codes belong to specific countries, not regions. For example, LA is Laos, AS is American Samoa, ME is Montenegro, and GL is Greenland, throwing chaos into the regional designation. I have had countless SEOs argue that Google can figure this out, and others state that it was never given an error in the old GSC hreflang reporting tool, so it must be okay.
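A collision like this is easy to surface with a lookup. The Python sketch below uses only the four codes named above; a real check would load the full ISO 3166-1 alpha-2 list. The CMS region labels and the function name are hypothetical.

```python
# Only the four codes mentioned above; a real check needs the full ISO list.
ISO_COUNTRIES = {
    "la": "Laos",
    "as": "American Samoa",
    "me": "Montenegro",
    "gl": "Greenland",
}


def region_collisions(cms_regions: dict[str, str]) -> dict[str, tuple[str, str]]:
    """Map each colliding code to (what the CMS means, what the code actually is)."""
    return {
        code: (intended, ISO_COUNTRIES[code.lower()])
        for code, intended in cms_regions.items()
        if code.lower() in ISO_COUNTRIES
    }


# Hypothetical CMS folder codes and their intended meaning.
cms_regions = {"LA": "Latin America", "ME": "Middle East", "GL": "Global", "EU": "Europe"}

for code, (intended, actual) in region_collisions(cms_regions).items():
    print(f"es-{code.lower()}: intended {intended}, but targets {actual}")
```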

Another significant challenge with the syntax is the requirement for cross-validation of the alternate pair. Cross-verification needs to happen to prevent manipulation, and Google needs to go to each page to reciprocate the alternates.  As noted, this is a challenge to implement but is further complicated by crawling and JavaScript rendering challenges.  For an organization that uses hreflang tags for 50 market sites using heavy JavaScript, Google must fetch and render all 50 alternates before it can validate the cluster. Depending on your crawl quality and frequency, some pairs may validate quicker, but completing all validations can take a while.  This is solved by using hreflang sitemaps.
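The reciprocity requirement itself can be expressed in a few lines. The Python sketch below assumes you have already collected, for each page, the set of alternate URLs it declares (for example from a crawl or from your sitemaps). The URLs and function name are placeholders.

```python
def missing_return_links(declared: dict[str, set[str]]) -> list[tuple[str, str]]:
    """Find (page, alternate) pairs where the alternate does not link back.

    `declared` maps each page URL to the set of alternate URLs it lists.
    """
    missing = []
    for page, alternates in declared.items():
        for alt in alternates:
            if alt != page and page not in declared.get(alt, set()):
                missing.append((page, alt))
    return sorted(missing)


# Hypothetical cluster: the DE page only lists itself.
us = "https://www.example.com/us/shoe"
gb = "https://www.example.com/gb/shoe"
de = "https://www.example.com/de/shoe"
declared = {
    us: {us, gb, de},
    gb: {us, gb},
    de: {de},
}

for page, alt in missing_return_links(declared):
    print(f"{page} lists {alt}, but {alt} does not link back")
```

With a sitemap, the `declared` mapping comes from one file you control, which is why the check is so much faster than waiting for every page to be fetched and rendered.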

Challenge 5 – Lack of Data and Reporting

When I shared an early edition with some friends, a few flagged the lack of data, information, and reporting as a major challenge. Aleyda and a few others on LinkedIn noted frustration with Google not clarifying the canonical choices of one market page over another. Previously, we had the hreflang reports in GSC that would flag various errors. The best we can do now is to use the “Duplicate, Google chose different canonical than user” report.

Most rank-checking tools don’t have a Brand Family Cannibalization report showing if another market’s pages rank instead of or higher than the intended market.  Rank Ranger was the first to offer this report by modifying their competitor report to allow a brand to add multiple domains to identify cross-market cannibalization.
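The logic behind such a report is simple if you can export ranked URLs per market. The Python sketch below is my own illustration, not Rank Ranger's implementation. It assumes a list of result URLs ordered by position for one query in one market; the URLs and hostnames are invented.

```python
from typing import Optional
from urllib.parse import urlparse


def cannibalizing_url(
    results: list[str], intended_prefix: str, family_hosts: set[str]
) -> Optional[str]:
    """Return the sibling-market URL that outranks the intended market, if any.

    `results` must be ordered by rank, best position first.
    """
    for url in results:
        if url.startswith(intended_prefix):
            return None  # the intended market is the first brand URL: no issue
        if urlparse(url).hostname in family_hosts:
            return url  # another market's page from the same brand ranks first
    return None


# Hypothetical Australian SERP where the US page outranks the AU page.
results = [
    "https://www.retailer.example/shoe",
    "https://www.example.com/us/shoe",
    "https://www.example.com/au/shoe",
]
print(cannibalizing_url(results, "https://www.example.com/au/", {"www.example.com"}))
```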

What is the real solution?

In the last part of his note, Gary from Google seeks ideas for a less annoying solution to delivering the same information. For the past year, I have tried to review different options for hreflang implementation, but nothing is more effective than our method. From my point of view, I created a list of alternative methods for hreflang and their pros and cons and emphasized the need for a global search roundtable. 

While looking for an alternative to something that works fine for many websites, let’s take some time to reframe the problem and see where we can make changes. As crawling, organic SERPs, and globalization become more complex, we need to develop awareness and solutions that generate real change in organizations, in content management systems, and in the overall understanding of hreflang and its implications for the business.