In the beginning there was only ASCII. ASCII is the foundation that most other character sets build on. It covers the uppercase and lowercase letters A to Z, the digits 0 to 9, and some basic punctuation: 128 characters in all, including control codes. ASCII is about as close to universal as a character set gets, and practically every device understands it. At the dawn of SMS 20 years ago, ASCII characters were the only ones you could reliably send in an SMS message.
Not everyone speaks English. As much as we might wish everyone spoke one language that needed no characters beyond A to Z, it just isn't so. The world is a very interesting place, with an incredible variety of characters, accents, symbols, and punctuation. We can't expect the world to only accept SMS messages in plain ASCII.
In the old days, different language meant different encoding. In the past, if you wanted to send a message to a user in Greece, you had to use a Greek character set. If you wanted to send a message to a customer in Russia, you had to use an entirely different character set with Cyrillic characters. And the user had to have the same character set installed. If this sounds complicated, rest assured, it was much worse than it sounds!
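To make the problem concrete, here is a minimal Python sketch of what happens when sender and receiver disagree about the character set. The sample word and the choice of the ISO 8859-7 (Greek) and ISO 8859-5 (Cyrillic) code pages are illustrative assumptions, not a description of what any particular carrier used.

```python
# A Greek greeting, encoded with the Greek code page ISO 8859-7.
greek_text = "γεια"
raw_bytes = greek_text.encode("iso8859_7")

# Each character fits in a single byte, but the bytes only mean
# "Greek" if both sides agree on the code page.
print(len(raw_bytes))  # 4

# Decoded with the matching code page, the message survives.
right = raw_bytes.decode("iso8859_7")
print(right == greek_text)  # True

# Decoded with a Cyrillic code page, the same bytes turn into
# unrelated Cyrillic letters.
wrong = raw_bytes.decode("iso8859_5")
print(wrong == greek_text)  # False
```

The bytes themselves carry no hint about which code page they belong to, which is why both ends had to be configured the same way.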
The GSM encoding was created to solve this problem. For Western Europe, at least. The GSM 7-bit encoding (defined in GSM 03.38) covers most of the ASCII character set, plus a lot of common accented characters. A few ASCII characters, such as square brackets and curly braces, are only available through an extension table and count as two characters each. The encoding uses 7 bits per character instead of 8, which means the 140 bytes available in an SMS can carry 160 characters instead of only 140.
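The 160-versus-140 arithmetic, and the question of whether a message fits the GSM alphabet at all, can be sketched in a few lines of Python. The character tables below are transcribed from the GSM 03.38 default alphabet and its extension table as best I know them, so check them against the specification before relying on them. The function name `gsm_septets` is made up for this example.

```python
# Basic GSM 03.38 alphabet: each character costs one 7-bit septet.
# (Transcribed by hand; verify against the spec before production use.)
GSM_BASIC = set(
    "@£$¥èéùìòÇ\nØø\rÅåΔ_ΦΓΛΩΠΨΣΘΞÆæßÉ"
    " !\"#¤%&'()*+,-./0123456789:;<=>?"
    "¡ABCDEFGHIJKLMNOPQRSTUVWXYZÄÖÑÜ§"
    "¿abcdefghijklmnopqrstuvwxyzäöñüà"
)
# Extension table: each character costs two septets (escape + code).
GSM_EXTENDED = set("^{}\\[~]|€")


def gsm_septets(text):
    """Return the number of septets needed, or None if text is not GSM."""
    total = 0
    for ch in text:
        if ch in GSM_BASIC:
            total += 1
        elif ch in GSM_EXTENDED:
            total += 2
        else:
            return None
    return total


# An SMS carries 140 bytes of payload.
PAYLOAD_BITS = 140 * 8
print(PAYLOAD_BITS // 7)  # 160 characters at 7 bits each
print(PAYLOAD_BITS // 8)  # 140 characters at 8 bits each

print(gsm_septets("Hello"))   # 5
print(gsm_septets("[ok]"))    # 6, the brackets cost two septets each
print(gsm_septets("Привет"))  # None, Cyrillic is not in the GSM alphabet
```

A message fits in a single SMS when `gsm_septets` returns a number no greater than 160.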
Enter Unicode. Unicode was intended to be the one code to rule them all. It is a single system for representing every character in use today. It also includes characters from many ancient languages, and it still has plenty of room for other stuff (Emoji, anyone?). Unicode has room for about 1.1 million characters, and more than 100,000 of them have been assigned so far.
There are different Unicode encodings. Unicode is a great improvement, but we still need to turn those Unicode characters into bytes that computers can store and send. There are several different Unicode encodings, and each of them has strengths and weaknesses. UTF-8 is universal and has many strengths, but is not always well supported in the mobile world. UTF-16 is another modern encoding and is a good choice for Asian languages in particular, but it uses at least two bytes per character, which reduces your SMS message length to 70 characters. UCS-2 is an older, fixed-width predecessor of UTF-16 that cannot represent characters outside Unicode's Basic Multilingual Plane (most emoji, for example), but a lot of devices still require it.
Every provider is different. It would be great if all of the SMS carriers and aggregators out there supported the same encodings in the same way, uniformly around the world. They don't. (But you already knew that, didn't you?) Most carriers in Europe support the GSM character set, while those in the US generally do not. Japanese carriers sometimes support the Shift-JIS encoding. Some carriers can handle UTF-8, some insist on UCS-2.
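The trade-offs between these encodings show up as soon as you count bytes. The Python sketch below compares UTF-8 and UTF-16 sizes, checks whether a string stays within what UCS-2 can represent, and tests whether it fits in a single 140-byte SMS payload. The helper names are invented for this example, and the single-message check ignores any headers a real gateway may add.

```python
def encoded_sizes(text):
    """Byte length of text in UTF-8 and in UTF-16 (without a byte order mark)."""
    return {
        "utf-8": len(text.encode("utf-8")),
        "utf-16": len(text.encode("utf-16-be")),
    }


def fits_ucs2(text):
    """UCS-2 only covers code points up to U+FFFF."""
    return all(ord(ch) <= 0xFFFF for ch in text)


def fits_single_sms_utf16(text, payload_bytes=140):
    """True if the UTF-16 form of text fits in one SMS payload."""
    return len(text.encode("utf-16-be")) <= payload_bytes


print(encoded_sizes("hello"))  # {'utf-8': 5, 'utf-16': 10}
print(encoded_sizes("你好"))    # {'utf-8': 6, 'utf-16': 4}

emoji = "\U0001F600"           # a grinning face, outside the BMP
print(encoded_sizes(emoji))    # {'utf-8': 4, 'utf-16': 4}
print(fits_ucs2(emoji))        # False

print(fits_single_sms_utf16("你" * 70))  # True, exactly 140 bytes
print(fits_single_sms_utf16("你" * 71))  # False
```

Note that an emoji takes four bytes in UTF-16, so a message full of emoji runs out of room well before 70 characters.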
OK, I get it: international SMS is complicated. Now what?
If your business needs international reach with SMS, make sure to ask these questions of any potential vendor:
What countries do you support sending SMS messages to? If you need global reach, make sure your provider supports the countries you're interested in. A lot of aggregators specialize in specific countries or regions. Only a few have worldwide reach.
What languages and encodings does your provider support? US-based providers might only support sending messages in plain ASCII. European providers probably support GSM. Better providers will let you send your message to them in UTF-8 and do any necessary conversions for you.
Does your provider automatically normalize messages when it is necessary? If you're trying to send a message and the recipient can't receive it in the language you sent it in, does your provider try to send it as plain ASCII for you automatically?
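For a sense of what this kind of normalization involves, here is a rough Python sketch of an ASCII fallback built on the standard `unicodedata` module. It is only one possible approach, not a description of how any provider actually does it. The function name `to_ascii_fallback` and the `?` placeholder are assumptions made for the example.

```python
import unicodedata


def to_ascii_fallback(text, placeholder="?"):
    """Strip accents where possible; replace anything else non-ASCII."""
    # NFKD splits accented letters into a base letter plus combining marks.
    decomposed = unicodedata.normalize("NFKD", text)
    out = []
    for ch in decomposed:
        if unicodedata.combining(ch):
            continue  # drop the accent mark, keep the base letter
        out.append(ch if ch.isascii() else placeholder)
    return "".join(out)


print(to_ascii_fallback("Café déjà vu"))  # Cafe deja vu
print(to_ascii_fallback("Привет"))        # ??????
```

Accented Latin text degrades gracefully, but a message in a non-Latin script loses everything, so a fallback like this is only useful for some languages.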
Understanding all the intricacies of international SMS messaging can be daunting, especially if your provider doesn't give you much help. However, if global reach is a necessity for your business, taking the time to find a provider that understands these details can take much of the sting out of it.