Thursday, May 31, 2012

Braille and Unicode are more similar than I ever knew.

As you probably know, braille is a writing system that is widely used by visually impaired people. A couple of years ago, I visited China with my parents, where we went to take a look at the Terracotta Army. Suddenly, I noticed that they also had information in braille. Curious about how Chinese braille would feel, I walked over to the metal plate on the wall on which the information was written. None of the characters made sense to me, but what I did notice is that they used the same 6-point braille system that we use in Europe. To explain this, I'll first have to teach you a little bit about how braille works.

Each braille symbol consists of a matrix that is 3 cells high and 2 cells wide. Each of the cells in the matrix may contain a dot, and each cell has its own number. The numbers are laid out like this:

14
25
36

As an example, take the letter f. It is written by typing points 1, 2 and 4.
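The numbering above can be sketched in a few lines of Python. This is only an illustration: it relies on Unicode's own "Braille Patterns" block, which starts at code point U+2800 and stores point n in bit n-1. The function name `braille_cell` is made up for this example.

```python
BRAILLE_BASE = 0x2800  # first code point of Unicode's "Braille Patterns" block


def braille_cell(*dots):
    """Return the Unicode braille character with the given points raised."""
    code = BRAILLE_BASE
    for dot in dots:
        if not 1 <= dot <= 6:
            raise ValueError(f"6-point braille only has points 1 to 6, got {dot}")
        code |= 1 << (dot - 1)  # point n is stored in bit n-1
    return chr(code)


# Points 1, 2 and 4 together form the letter f.
print(braille_cell(1, 2, 4))
```

Calling `braille_cell()` without any points gives the empty cell, which is one of the 64 possible combinations.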

Now, going back to Chinese braille: a couple of weeks ago, I had the brilliant idea to just google "Chinese braille", and the first link I came across was a Wikipedia article. What a surprise. In this Wikipedia article, though, I found a great explanation of how different aspects of Chinese characters are shown in consecutive braille symbols. Standing in awe of this ingenious system, which can encode a language as complex as Chinese with only 2^6 possible symbols, I suddenly realized that western braille uses some of that magic too. We had to, with only 64 possible symbols. If you encoded all small and capital letters differently, that would already add up to 52 characters. Add to that our 10 digits, and we've already used up 62 characters, and we're still far from encoding everything. We haven't encoded any punctuation marks yet: the period, colon, semicolon, comma, question mark, exclamation mark and many more also want to be written down. And what about mathematical symbols for addition, division, multiplication and subtraction?
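For anyone who wants to check the counting, here is the same arithmetic as a tiny Python sketch. The variable names are just for illustration. Note that the 64 combinations include the cell with no points at all.

```python
cells = 2 ** 6       # each of the 6 points is either raised or not
letters = 26 * 2     # small and capital letters, if encoded separately
digits = 10
used = letters + digits

print("possible cells:", cells)
print("letters and digits:", used)
print("left for everything else:", cells - used)
```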

Here are a couple of tricks that braille uses to circumvent these issues. First of all, we reuse our small letters as capitals, but then with a capital sign in front of them. So to write anyone's name, you'd first write a capital sign, indicating that the letter after this sign is a capital. This sign consists of points 4 and 6. Some abbreviations and acronyms are written in all caps, and for that, we've got the permanent capital sign, which turns the entire next word into all caps. This significantly reduces the number of characters needed, but there's even more. We also reuse our letters as numbers. The a is the 1, the b is the 2 and so on until j, which is 0. You first write a number mark, and then any characters that follow, up to a space or another non-numerical character, are read as numbers. For example, let's say for a moment that the dollar sign is our number mark; then 123450 becomes $abcdej.
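The two tricks can be sketched in Python, using the same pretend notation as the example above. The dollar sign stands in for the number mark, and the caret is my own stand-in for the capital sign. The function name `to_braille_notation` is invented for this sketch, and it leaves out the permanent capital sign and other details of real braille.

```python
NUMBER_MARK = "$"    # stand-in for the number mark, as in the example above
CAPITAL_SIGN = "^"   # hypothetical stand-in for the capital sign

# 1 -> a, 2 -> b, ... 9 -> i, 0 -> j
DIGIT_TO_LETTER = dict(zip("1234567890", "abcdefghij"))


def to_braille_notation(text):
    """Rewrite text using only small letters plus the two marker signs."""
    out = []
    in_number = False
    for ch in text:
        if ch in DIGIT_TO_LETTER:
            if not in_number:
                out.append(NUMBER_MARK)  # one mark covers the whole run of digits
                in_number = True
            out.append(DIGIT_TO_LETTER[ch])
            continue
        in_number = False  # any non-digit ends the number
        if ch.isupper():
            out.append(CAPITAL_SIGN)  # the sign applies to the next letter only
            out.append(ch.lower())
        else:
            out.append(ch)
    return "".join(out)


print(to_braille_notation("123450"))
print(to_braille_notation("Anna has 2 cats"))
```

The first call gives back the `$abcdej` from the example, and the second shows both signs in one sentence.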

Coming from my thoughts about how different languages may use braille in different ways, I suddenly thought about how Unicode does exactly the same thing. For the people who don't know what Unicode is, it's a way to store text as numbers. A computer can only store numbers, so there needs to be a mapping from these numbers to the characters made visible to me as I type out this blog post.
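To make the mapping between characters and numbers a bit more tangible, here is a minimal Python sketch. It uses the built-in `ord` and `chr` functions, which convert between a character and its Unicode number. The helper name `to_numbers` is made up for this example.

```python
def to_numbers(text):
    """Return the Unicode number (code point) of every character in text."""
    return [ord(ch) for ch in text]


numbers = to_numbers("braille")
print(numbers)

# Going the other way: turn the numbers back into text.
print("".join(chr(n) for n in numbers))
```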

Many years ago, the most used standard was called ASCII, which only used 1 byte to store each character (1 octet to be completely precise, but I'm just going to call it a byte to keep things simple). A byte consists of 8 bits. A bit is a part of memory which can be either 1 or 0, so with 8 bits, you can form 2^8 = 256 symbols. Strictly speaking, ASCII itself only uses 7 of those bits, which gives 128 characters; the 8-bit extensions built on top of it filled in the rest. But as technology grew more popular, internationalization was needed, and there was no way that all those Arabic, Chinese, Russian, and all the other "weird" alphabets would ever fit into 256 possible symbols. Unicode's solution, at least in encodings such as UTF-8, was the same one that braille used: multiple consecutive bytes together form a new character. The similarity is not a great discovery, but still, it teases my mind to think about different ways to encode characters. Thinking about it even further, you may even consider our writing to be a character encoding, where each symbol we write is mapped to the idea of a certain character.
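As a small illustration, the sketch below uses UTF-8, one of Unicode's encodings and my own choice for this example, to show how many consecutive bytes different characters need. The function name `utf8_bytes` is invented for the sketch.

```python
def utf8_bytes(ch):
    """Return the list of byte values that UTF-8 uses to store ch."""
    return list(ch.encode("utf-8"))


# A plain Latin letter, an accented letter, a Russian letter, a Chinese character.
for ch in ["a", "é", "я", "中"]:
    stored = utf8_bytes(ch)
    print(ch, "takes", len(stored), "byte(s):", stored)
```

The plain letter still fits in a single byte, just like in ASCII, while the other characters are spread over two or three consecutive bytes.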
