One of the biggest challenges in building the XML sitemap is matching the alternate URLs to each other. So far, we have been able to account for 37 of the 50 or so most common problems associated with building the files correctly. The following are some of the URL formats we have sorted out; you need to know which one your site uses.
Syntax 1 – Standard Country Code/Language Code Syntax – this is the most common format, where your UK site is managed under a www.mydomain.com/en/uk/ or /uk/en/ type structure and the structure is uniform across countries. It also works for sites that use /en-uk/ structures.
Questions: Does this format hold for all countries? If the placement differs by country, we will have to work with you to organize the initial load.
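As a rough illustration of Syntax 1, the sketch below pulls the language and country codes out of the first folder(s) of a URL. It is not our tool's implementation: the function name `split_locale` and the `order` parameter are hypothetical, and it assumes two-letter codes and a URL that includes its scheme (https://).

```python
import re
from urllib.parse import urlparse

# Hypothetical patterns for the two layouts described above.
TWO_FOLDERS = re.compile(r"^/([a-z]{2})/([a-z]{2})(?:/|$)", re.IGNORECASE)  # /en/uk/ or /uk/en/
HYPHENATED = re.compile(r"^/([a-z]{2})-([a-z]{2})(?:/|$)", re.IGNORECASE)   # /en-uk/


def split_locale(url, order="language-country"):
    """Return (language, country) read from the start of the path, or None.

    `order` states which code comes first on this site; two-letter codes
    alone cannot tell a language from a country, so it has to be supplied.
    """
    path = urlparse(url).path
    match = TWO_FOLDERS.match(path) or HYPHENATED.match(path)
    if match is None:
        return None
    first, second = match.group(1).lower(), match.group(2).lower()
    if order == "language-country":
        return first, second
    return second, first


print(split_locale("https://www.mydomain.com/en/uk/"))    # ('en', 'uk')
print(split_locale("https://www.mydomain.com/en-uk/"))    # ('en', 'uk')
print(split_locale("https://www.mydomain.com/uk/en/products", order="country-language"))  # ('en', 'uk')
```

Because the order has to be stated once for the whole site, a layout that differs from country to country cannot be handled by a single rule, which is why the question above matters.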
Syntax 2 – Country Code but Non-Local Language – a big problem is that many tools force the local-language pairing for a country. But what happens when you have a site for Norway whose content is in English? Most tools would assign /no/no/ to these URLs, when they actually need to be created as /no/en. Our tool allows you to map a unique country site but set the language to English.
Syntax 3 – Global Language Version – this is the case where you do not use country designators but have common languages. For example, if you use a /es version for “any” Spanish-speaking country, we can set it as “Global Spanish.” You do this by selecting the global language related to the site.
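Syntax 2 and Syntax 3 both come down to which hreflang value a page ends up with. The sketch below shows the difference between a forced local-language pairing, an explicit language override, and a language-only "global" value. The function `hreflang_value` and the `LOCAL_LANGUAGE` table are hypothetical names used for illustration only.

```python
# Hypothetical table of the local language a naive tool would pair with a country.
LOCAL_LANGUAGE = {"no": "no", "es": "es", "mx": "es"}


def hreflang_value(country=None, language=None):
    """Build an hreflang value of the form language-country, or language only.

    - country only: fall back to the assumed local language (the naive pairing)
    - country + language: the explicit language wins (Syntax 2)
    - language only: a global language version with no country (Syntax 3)
    """
    if country is None:
        if language is None:
            raise ValueError("need a language, a country, or both")
        return language.lower()
    country = country.lower()
    language = (language or LOCAL_LANGUAGE[country]).lower()
    return f"{language}-{country}"


print(hreflang_value("no"))           # no-no  (forced local pairing)
print(hreflang_value("no", "en"))     # en-no  (Norway site, English content)
print(hreflang_value(language="es"))  # es     ("Global Spanish")
```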
Syntax 4 – Regional Language Versions – while we know there is no country called LATAM or APAC, many sites use these for their regional Spanish or English sites. These are often set up as /latam/ or /apac/en/, which makes them hard to map to countries. We have built-in detectors that let you find URLs with /apac and then assign them to a single country or to multiple countries in the region. For more information please review Using HREFLang for Regional Sites.
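The idea of detecting a regional folder and fanning it out to several countries can be sketched as follows. The `REGIONS` table, its country lists, and the function name `regional_hreflangs` are assumptions made for the example; the real assignment is whatever you choose for your site.

```python
from urllib.parse import urlparse

# Assumed region membership and default language, for illustration only.
REGIONS = {
    "latam": {"default_language": "es", "countries": ["mx", "ar", "cl"]},
    "apac": {"default_language": "en", "countries": ["au", "sg"]},
}


def regional_hreflangs(url):
    """Return one hreflang value per country assigned to the URL's region folder.

    If the folder after the region is two letters (as in /apac/en/), it is
    treated as the language; otherwise the region's default language is used.
    Returns an empty list when no known region folder is found.
    """
    segments = [s.lower() for s in urlparse(url).path.split("/") if s]
    for index, segment in enumerate(segments):
        if segment in REGIONS:
            region = REGIONS[segment]
            following = segments[index + 1] if index + 1 < len(segments) else ""
            if len(following) == 2 and following.isalpha():
                language = following
            else:
                language = region["default_language"]
            return [f"{language}-{country}" for country in region["countries"]]
    return []


print(regional_hreflangs("https://www.mydomain.com/latam/"))    # ['es-mx', 'es-ar', 'es-cl']
print(regional_hreflangs("https://www.mydomain.com/apac/en/"))  # ['en-au', 'en-sg']
```

Each value in the list would point at the same regional URL, so one /latam/ page can serve as the alternate for every country assigned to it.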
Syntax 5 – Non-Standard Folders and Country Codes – we have encountered a number of multinationals that place business unit or product folders before the country and language designators. For example, www.bigglobalco.com/business_unit/us/en. For this syntax, we have developed a “Regex” element that allows you to tell the tool where your country and language elements are located.
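To give a feel for what such a regex might look like, here is a minimal sketch for the www.bigglobalco.com/business_unit/us/en example. The pattern and the `locate` helper are hypothetical and are not the syntax of the tool's “Regex” element; the pattern assumes exactly one leading folder followed by a two-letter country and a two-letter language.

```python
import re
from urllib.parse import urlparse

# One leading folder (business unit or product), then country, then language.
LOCALE_PATTERN = re.compile(
    r"^/[^/]+/(?P<country>[a-z]{2})/(?P<language>[a-z]{2})(?:/|$)"
)


def locate(url, pattern=LOCALE_PATTERN):
    """Return (country, language) using the named groups of `pattern`, or None."""
    match = pattern.match(urlparse(url).path.lower())
    if match is None:
        return None
    return match.group("country"), match.group("language")


print(locate("https://www.bigglobalco.com/business_unit/us/en"))  # ('us', 'en')
print(locate("https://www.bigglobalco.com/us/en"))                # None
```

Named groups keep the rule readable: the position of the codes can change from site to site, while the code that consumes `country` and `language` stays the same.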
Syntax 6 – Language or Country Parameters – unfortunately, if you are using a single top-level domain with language or country parameters to designate the pages for each country or language, we are not able to build hreflang entries for these pages, as they most likely are not unique pages. A number of sites use JavaScript plugins that “replace” local language elements but keep the same base URL.
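If you are unsure whether your site falls into this category, a quick check like the one below can flag URLs that carry the locale in the query string. The parameter names in `LOCALE_PARAMETERS` are guesses at common conventions, not a list taken from our tool, and the function name is hypothetical.

```python
from urllib.parse import parse_qs, urlparse

# Assumed parameter names; adjust to whatever your site actually uses.
LOCALE_PARAMETERS = {"lang", "language", "country", "locale"}


def locale_parameters(url):
    """Return the sorted query parameter names in `url` that look like locale switches."""
    query = parse_qs(urlparse(url).query)
    return sorted(name for name in query if name.lower() in LOCALE_PARAMETERS)


print(locale_parameters("https://www.mydomain.com/products?lang=fr&id=7"))  # ['lang']
print(locale_parameters("https://www.mydomain.com/fr/products"))            # []
```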
Syntax 7 – Default Language – if you have a global site that is not associated with any country or language, or if you use IP detection, you can set any version of the site to be the default version. Using the x-default option, we can tell the search engines that this is the version to show when there is no designated local version.
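For reference, the sketch below renders one sitemap entry with its alternates and an x-default link. It is an illustration only, not the output of our tool: the function name `url_entry` and the example URLs are hypothetical. The enclosing urlset element (not shown) must declare the xhtml namespace, xmlns:xhtml="http://www.w3.org/1999/xhtml", for the xhtml:link elements to be valid.

```python
from xml.sax.saxutils import escape, quoteattr


def url_entry(loc, alternates, default_url=None):
    """Render one <url> element as text.

    `alternates` maps hreflang values to URLs and should include `loc` itself;
    `default_url`, when given, is added as the x-default alternate.
    """
    links = dict(alternates)
    if default_url is not None:
        links["x-default"] = default_url
    lines = ["<url>", f"  <loc>{escape(loc)}</loc>"]
    for hreflang, href in links.items():
        lines.append(
            f'  <xhtml:link rel="alternate" hreflang={quoteattr(hreflang)} href={quoteattr(href)}/>'
        )
    lines.append("</url>")
    return "\n".join(lines)


print(url_entry(
    "https://www.mydomain.com/no/en/",
    {
        "en-no": "https://www.mydomain.com/no/en/",
        "en-us": "https://www.mydomain.com/us/en/",
    },
    default_url="https://www.mydomain.com/",
))
```

Every URL in the set should carry the same list of alternates, including itself, so the same mapping would be reused for the /us/en/ entry and for the default page.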
Custom Syntax – Custom Setup – if you have a global site that uses a number of different formats, we can set up a custom regex and map it.