The joy of Unicode

So, back in late 2008, rather soon after we got to start working on Drizzle full time, someone discovered unicodesnowmanforyou.com, or:

☃

Since we had decided that Drizzle was going to be UTF-8 everywhere,(after seeing for years how hard it was for people to get character sets correct in MySQL) we soon added ☃.test to the tree, which tried a few interesting things:

CREATE TABLE ☃; CREATE DATABASE ☃; etc etc

Because what better to show off UTF-8 than using odd Unicode characters for table names, database names and file names. Well… it turns out we were all good except if you attempted to check out the source tree on Solaris. It was some combination of Python, Bazaar and Solaris that meant you just got python stacktraces and no source tree. So, if you look now it’s actually snowman.test and has been since the end of 2008, because Solaris 10.

A little while later, I was talking to Anthony Baxter at OSDC in Sydney and he mentioned Unicode above 2^16 in UTF-8…. so, we had clef.test (we’d learned since ☃ and we were not going to tall it 𝄢.test).

Fast forward a few years to, well, this week, and I was talking to Jeremy Kerr about petitboot and telling the tail of snowman.test. So out came the crazy Unicode characters:

  • U+1F4A9 PILE OF POO 💩
  • U+1F435 MONKEY FACE 🐵
  • U+1F431 CAT FACE 🐱
  • U+1F602 FACE WITH TEARS OF JOY 😂
  • U+1F639 CAT FACE WITH TEARS OF JOY 😹

But guess what, there is no MONKEY FACE WITH TEARS OF JOY! I know, this is just unacceptable – TEARS OF JOY should be a modifier, because you may need U+1F6B9 MENS SYMBOL 🚹 with a TEARS OF JOY modifier at some point in your life.

Anyway, another place with tests involving odd Unicode characters is good for everyone, but still lacking if you need to boot an Operating System that’s MONKEY FACE WITH TEARS OF JOY.

9 thoughts on “The joy of Unicode

Leave a Reply

This site uses Akismet to reduce spam. Learn how your comment data is processed.