From my research, international SEO is still struggling with the question of how to localise the part of a URL that comes after the domain name. This post reviews the subject in some detail, but if you would like the executive summary, here are my recommendations.
- Don’t forget the basics of best practice URL design.
- Always localise a URL for non-English pages when the suggested localised URL does not contain any special characters.
- Do not use special characters in landing pages that sit directly off the root of the domain name, e.g. www.mybrand.com/page, or at least provide a marketing URL without special characters that redirects to the page.
- For all other URL structures I’d like to see more evidence of the pros and cons, but I believe they should be fully localised using percent-encoded UTF-8 to ensure backwards compatibility.
To give these recommendations some context, let’s start with one of the golden rules of best practice URL optimisation: make sure that the structure of your URL describes the content of your page.
This has two benefits.
Most importantly, it is helpful to users: they can establish what a page is about without having seen its content. For example, when a link is emailed or shared it’s very often the URL that is seen first. Keyword context in a good URL structure is a fundamental component of increasing the click-through rate on a URL.
Search engines also value relevant keyword usage in the URL; indeed, in the 2011 SEOmoz Search Ranking Factors survey, SEO experts rated it the fourth most important element of page-level keyword usage.
How do you build URLs that describe the content of a page when a language includes, or is made up entirely of, special characters, and traditional convention dictates that URLs contain only ASCII characters (a-z, A-Z, 0-9, -, ., _)?
So herein lies the dilemma with URL structures: do you localise the URL or not? Is a localised URL better for SEO, usability and, crucially, for a website’s users? Will a localised URL break old browsers or my website? What are the technical issues with special characters in URLs?
How are special characters included in URLs?
Regardless of whether it’s right or wrong for good localised URL design, special characters are technically implemented in URLs through the use of percent encoding, also known as URL encoding.
Example of percent encoding: %D1%82%D1%83%D1%84%D0%BB%D0%B8 (the Russian word «туфли», meaning shoes)
Currently the UTF-8 character encoding is the standard used when percent-encoding any special characters that traditionally were not supported in a URL.
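To make percent encoding concrete, here is a small sketch using Python’s standard urllib.parse module; the words used are just illustrative examples:

```python
from urllib.parse import quote, unquote

# Each non-ASCII character is encoded as its UTF-8 bytes, with every
# byte written as a percent sign followed by two hex digits.
print(quote("chaussures-été"))  # chaussures-%C3%A9t%C3%A9
print(quote("туфли"))           # %D1%82%D1%83%D1%84%D0%BB%D0%B8

# Decoding reverses the process exactly.
print(unquote("%D1%82%D1%83%D1%84%D0%BB%D0%B8"))  # туфли
```

Note that ASCII letters, digits and the characters - . _ ~ pass through unchanged, which is why traditional English URLs never needed encoding at all.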
What are the SEO best practices for localised URL designs if I don’t want to or can’t include special characters in the URLs of my website?
Before we get into the debate: if you fall on the side of the argument that does not advocate the use of special characters in URLs, or more simply your blog, CMS or website does not support them, it’s still important to follow SEO best practice for localised URL structures.
You should continue to follow the mantras of good URL design, just as you do for English URLs. URLs should be descriptive and focused on the content of the page. This means that:
1. For languages that use the Roman alphabet, such as French and Spanish, translate the URL into those languages:
- www.mybrand.com/chaussures (French)
- www.mybrand.com/zapatos (Spanish)
A translated, keyword-rich URL will have much more impact when localised. Who in France or Spain would search for the English word “shoes” when looking for shoes in their own language?
2. Often these URLs do not get localised on websites whose multiple locales stem from an English original. There is an argument that the extra work of localising the URL outweighs the benefit, but I disagree: any process that removes the need to include an English word on a non-English site should be encouraged. When a Roman-alphabet language contains special characters, it has become common practice to replace each with its nearest ASCII equivalent, so an é with an acute accent simply becomes an e. It’s important to remember that these changes can significantly alter the meaning of a word, but given the original limitations of URL standards this approach does work. Indeed, search engines will often, though not in all circumstances, return the same results for a term with or without the special character.
3. For URLs in non-Roman-alphabet languages such as Russian, Chinese and Japanese, the default has been to just use English. As most URLs for web pages in these languages continue not to be localised, you won’t be at too much of a disadvantage. One option might be to localise the URLs using Pinyin or Romaji, the phonetic romanisations of Chinese and Japanese respectively, but I’ve not seen any evidence to suggest that this works.
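The accent-replacement approach described in point 2 can be sketched with Python’s standard library, using Unicode decomposition rather than a hand-written lookup table:

```python
import unicodedata

def strip_accents(text: str) -> str:
    """Replace accented Roman characters with their nearest ASCII
    equivalent, e.g. é -> e, as commonly done in URL slugs."""
    # NFKD decomposition splits each accented character into a base
    # character plus combining marks; the combining marks are dropped.
    decomposed = unicodedata.normalize("NFKD", text)
    return "".join(c for c in decomposed if not unicodedata.combining(c))

print(strip_accents("été"))    # ete
print(strip_accents("cañón"))  # canon
```

As noted above, this can change the meaning of a word, so it is a pragmatic compromise rather than a true localisation.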
Even though the above, if implemented, will make a big difference in maximising the benefits of a good URL in international SEO, the impact for non-Roman-alphabet languages is still limited. There are many arguments against full special-character URL localisation; I will try to tackle them one by one.
Do search engines even understand encoded special characters?
From my initial research, yes they do, or at least some do. I found this note from Google on Google Groups:
Yes, we can generally keep up with UTF-8 encoded URLs and we’ll generally show them to users in our search results (but link to your server with the URLs properly escaped). I would recommend that you also use escaped URLs in your links, to make sure that your site is compatible with older browsers that don’t understand straight UTF-8 URLs.
Indeed you can query Google for special characters in a URL only.
Take this example: http://www.google.com/search?q=inurl:%C3%BC
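Following Google’s advice above to link with escaped URLs, a link could be generated like this (the domain and the Russian slug are hypothetical examples):

```python
from urllib.parse import quote

# Escape the localised path so the href works even in older browsers
# that don't understand raw UTF-8 URLs.
raw_path = "/туфли"  # hypothetical localised landing page
href = "https://www.mybrand.com" + quote(raw_path, safe="/")
print(f'<a href="{href}">туфли</a>')
# <a href="https://www.mybrand.com/%D1%82%D1%83%D1%84%D0%BB%D0%B8">туфли</a>
```

The anchor text stays human-readable while the href itself is plain ASCII.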
Search engines aim to map user intent, so I’d expect them to give the same attention to special characters as they do to standard characters in a URL.
For pages deep within the structure of a site I feel this is a moot point. Take an English URL for a product like this…
Would you really expect many users to remember and type this in, even if it was made entirely of words? They are going to search for it, not go direct.
Even if keyboard input were a concern, most users who want to type in these types of URL are likely either to have a localised keyboard that supports the characters or to use keyboard shortcuts to get what they want.
However, a user may type in a landing page like…
For these types of URL at the root of the domain, I feel the best approach is to use URLs without special characters, or at the very least to provide a vanity URL without special characters in marketing communications that points to the page via a 301 redirect.
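A minimal sketch of such a vanity-URL mapping, with entirely hypothetical paths, might look like this:

```python
from urllib.parse import quote

# Map ASCII vanity URLs used in marketing to the fully localised,
# percent-encoded pages they should 301 to. All paths are hypothetical.
vanity_redirects = {
    "/shoes-ru": "/" + quote("туфли"),
    "/shoes-fr": "/" + quote("chaussures"),
}

def resolve(path: str):
    """Return a (status, location) pair: 301 for a known vanity URL,
    otherwise serve the requested path directly."""
    target = vanity_redirects.get(path)
    return (301, target) if target else (200, path)

print(resolve("/shoes-ru"))  # (301, '/%D1%82%D1%83%D1%84%D0%BB%D0%B8')
```

In practice this mapping would live in your web server or CMS redirect rules; the point is simply that the vanity URL is typeable on any keyboard while the destination stays fully localised.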
What benefit is there to the user if the URL is displayed with percent encoding in the address bar?
Modern browsers support UTF-8 character encoding. The only browsers on my list, beyond the most archaic, are IE6 and Firefox 2.0, which display the percent encoding instead. When testing this yourself you might find that a browser such as IE8 does not show the decoded URL, but this is often because the relevant language support is not installed for that browser.
For the users that matter, except those on old browsers such as IE6 (which, to be fair, still has high usage in countries like China), the URL will be displayed decoded in their own language.
So overall, much better for the user, and as we have seen better for search engines.
UPDATE: I’ve been looking at this further and I’m only getting consistent results in Chrome. I’m going to investigate this further.
Won’t an encoded URL create errors when sharing or bookmarking an encoded page?
I’ve read that it might, but on the social sites I’ve tested there were no issues, see screenshots below.
Link takes you to http://www.dmoz.org/World/Japanese/%E3%82%AA%E3%83%B3%E3%83%A9%E3%82%A4%E3%83%B3%E3%82%B7%E3%83%A7%E3%83%83%E3%83%97/%E3%82%AA%E3%83%BC%E3%82%AF%E3%82%B7%E3%83%A7%E3%83%B3/ without an issue.
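As a sanity check, the first localised segment of that dmoz URL decodes cleanly and re-encodes back to exactly the same string, so nothing is lost in a share or bookmark round trip:

```python
from urllib.parse import quote, unquote

# The first localised path segment of the shared dmoz link above;
# it decodes to the Japanese for "online shop".
url = ("http://www.dmoz.org/World/Japanese/"
       "%E3%82%AA%E3%83%B3%E3%83%A9%E3%82%A4%E3%83%B3"
       "%E3%82%B7%E3%83%A7%E3%83%83%E3%83%97/")
decoded = unquote(url)
print(decoded)  # http://www.dmoz.org/World/Japanese/オンラインショップ/

# Re-encoding the decoded form reproduces the original URL exactly.
assert quote(decoded, safe=":/") == url
```

This round-trip property is what makes percent-encoded URLs safe to pass through email clients, social networks and bookmarks.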
Will special characters break my website in IE6?
I’ve seen this mentioned on some forums, but I can’t find conclusive evidence that it will. I’d expect IE6, and your website in general, to break if you don’t properly encode the URLs, but if you use the applicable percent encoding for special characters the site should work without issues.
This said, I’d love to hear if anyone has found issues with special characters in IE6 or elsewhere when fully localising URLs for all languages.
Any other comments?
- Has anyone seen significant search benefits using special characters in URLs?
- Is there a URL localisation browser support cheat sheet out there to help us SEOers understand if users will see encoded URLs or percent encoding?
- Does anyone have any additional concerns or insights when localising URLs for non-English web pages?