Saturday 22 December 2007

Soft hyphen in ISO-8859-1 and Unicode

I just came across this blog post about Unicode and ISO 8859-1 being unclear on how to show a soft hyphen.

The article contains links to other blog posts and documents about this topic.

I will not summarize the problem here; read the articles if you're interested. However, I have a strong opinion on the topic: the character set standard should not define the application's behavior.

Sometimes you want to create an editor that shows the exact contents of a file, so that the user can see every byte in it. And sometimes the editor has another purpose, like making it simple to create a sales brochure.

In the first case, a soft hyphen should be visible to the user. Think of Notepad: the character 0xAD should be clearly visible there, no matter where you put it. In the second case, the soft hyphen character can be used to implement application-specific soft hyphen functionality, where the hyphen is only shown when it makes sense for the application's purpose, typically when a line breaks at that position.
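To make the two cases concrete, here is a minimal sketch in Python. It is only an illustration of the idea, not how any real editor does it: the function names `render_raw` and `render_layout`, the choice of "-" as the visible marker, and the simple greedy line breaking are all my own assumptions.

```python
SOFT_HYPHEN = "\u00ad"  # U+00AD, which is byte 0xAD in ISO-8859-1


def render_raw(text: str, marker: str = "-") -> str:
    """Byte-exact view (hypothetical): every soft hyphen is shown."""
    return text.replace(SOFT_HYPHEN, marker)


def render_layout(text: str, width: int) -> list[str]:
    """Layout view (hypothetical): a soft hyphen is shown only at a line break.

    Greedy wrapping. A word that still does not fit after breaking is
    left too long; a real layout engine would need to handle that.
    """
    lines = []
    line = ""
    for word in text.split():
        parts = word.split(SOFT_HYPHEN)
        whole = "".join(parts)
        sep = " " if line else ""
        if len(line) + len(sep) + len(whole) <= width:
            # The whole word fits, so its soft hyphens stay invisible.
            line += sep + whole
            continue
        # Try to keep the longest hyphenated prefix on the current line.
        placed = 0
        for k in range(len(parts) - 1, 0, -1):
            head = "".join(parts[:k]) + "-"
            if len(line) + len(sep) + len(head) <= width:
                line += sep + head
                placed = k
                break
        if line:
            lines.append(line)
        line = "".join(parts[placed:])
    if line:
        lines.append(line)
    return lines


sample = "hyphen\u00adation is fun"
print(render_raw(sample))         # the soft hyphen is always shown
print(render_layout(sample, 20))  # everything fits, so no hyphen is shown
print(render_layout(sample, 8))   # the hyphen appears at the line break
```

Both functions read the same character from the same file. Only the purpose of the application decides whether the user sees it.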

Some of the articles even mention the use of soft hyphens in HTML. That's really out of scope, since HTML already redefines so much of the layout. It seems somebody has forgotten that the whole purpose of HTML is to render text differently from how it appears in the source.
