PHP and Unicode

In my last post about the testing goat I mentioned there’s now an official Unicode codepoint for “GOAT”, U+1F410.

At the time, I tried typing it in. Under Linux you just press ctrl-shift-u (you’ll see an underlined letter u), type the hex digits for the code you want, press space and continue on. Easy. Having installed the free Symbola font, I could see my little goat in the editor. Happy Days!

Until I went to preview the post, at which point my little goat, and everything after it, had disappeared. Fortunately it was the last thing in my post, but had it been higher up I’d have lost some of my work. Not good! It was late and I was tired so I left it out, a little disappointed.

Looking again this evening, I found there’s a known problem: WordPress gets confused if it sees a Unicode character above U+FFFF. If you install the Full UTF-8 plugin, it works again. Without a doubt, this plugin, or something like it, should be merged into the core. Right now.
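To see why U+FFFF is the dividing line, here is a rough Python 3 sketch (nothing to do with WordPress's own code). It shows that the goat sits outside the Basic Multilingual Plane, so it needs four bytes in UTF-8 and a surrogate pair in UTF-16, while everything up to U+FFFF fits in three UTF-8 bytes or fewer. The helper names `describe` and `is_outside_bmp` are made up for illustration. The assumption, not confirmed in this post, is that some layer underneath WordPress only allows for three bytes per character.

```python
GOAT = "\U0001F410"  # U+1F410 GOAT


def describe(ch):
    """Return (code point, UTF-8 byte count, UTF-16 code unit count) for one character."""
    code_point = ord(ch)
    utf8_bytes = len(ch.encode("utf-8"))
    # utf-16-le avoids the byte order mark, so every code unit is exactly 2 bytes
    utf16_units = len(ch.encode("utf-16-le")) // 2
    return code_point, utf8_bytes, utf16_units


def is_outside_bmp(ch):
    """True if the character lies above U+FFFF (outside the Basic Multilingual Plane)."""
    return ord(ch) > 0xFFFF


for ch in ("\u00e9", "\u20ac", GOAT):  # e-acute, euro sign, goat
    print(f"U+{ord(ch):04X}", describe(ch), is_outside_bmp(ch))
```

Anything that budgets three bytes per character copes with the first two and chokes on the third, which would fit the symptom of the post being cut off at the goat.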

PHP and Unicode

In my job I have the dubious pleasure of maintaining a very old PHP application. It jumps through several hoops to keep UTF-8 characters intact, but the hoops still work so I generally just leave them alone. This WordPress issue had me googling again, and it seems to confirm that PHP (the language WordPress is written in) still doesn’t support Unicode natively. Really. In 2014.
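What "no native Unicode" means in practice is that strings are just bytes. As a sketch, Python 3's `bytes` type can stand in for a byte-oriented string and `str` for a Unicode-aware one. This models the general problem rather than any specific PHP function, and `safe_truncate` is a hypothetical helper written for this example.

```python
text = "na\u00efve \U0001F410"   # "naive" with a diaeresis, a space, then the goat
raw = text.encode("utf-8")       # what a byte-oriented string actually holds

print(len(text))  # counts code points
print(len(raw))   # counts bytes

# Cutting at a byte offset can land in the middle of a character.
try:
    raw[:8].decode("utf-8")
    broken = False
except UnicodeDecodeError:
    broken = True
print("byte-level cut split a character:", broken)


def safe_truncate(data: bytes, limit: int) -> bytes:
    """Cut valid UTF-8 to at most `limit` bytes without leaving a partial character."""
    # errors="ignore" silently drops the incomplete sequence left at the end of the cut
    return data[:limit].decode("utf-8", errors="ignore").encode("utf-8")


print(safe_truncate(raw, 8).decode("utf-8"))
```

The "hoops" in an old byte-oriented codebase are essentially lots of little workarounds like `safe_truncate`, applied everywhere a string gets measured, cut, or compared.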

It seems that Unicode support for PHP was first proposed in 2005 for what was planned to be PHP 6. Nine years later, and we’re just at 5.6.1. I came across this presentation on Slideshare from 2011 describing how the PHP+Unicode project reached a certain point and just ran out of steam. It seems nothing has happened since.

The nine years of bad history associated with the name “PHP 6” even has people suggesting that the next actual major release of PHP should be called “PHP 7”. It’s that bad.

Conclusion, for now

That PHP application I maintain is well over ten years old. It’s fairly stable, but has accumulated various bits of cruft over time. Adding new features is awkward and really it needs a rewrite. Since it uses lots of international characters I’d really like clean Unicode support, so I’m strongly drawn to using Python 3. It’s nearly 6 years old and supports Unicode properly. Now I’ve to pick a web framework. I’ll probably have a go with Django for now, simply because Harry’s TDD book uses it.

Oh and finally, just because I can, even though WordPress doesn’t want me to, here’s a goat: 🐐

Assuming you’ve got a font for it, of course!

The testing goat?

Test Driven Development with Python at PyConIE14

I’m not new to Python at all. I still have a copy of André Lessa’s Python Developer’s Handbook, which the receipt says I bought on 15th March 2001, and covers Python 1.6. Unfortunately in all the years since I’ve never used it much. My postgrad studies mostly used Verilog and my day job generally involves bash scripts and maintaining some really old PHP.

Still, it’s a language I feel I want to use a lot, and I’ve attended the last two PyCons in Ireland. At PyConIE 2013 I went to a tutorial on Test-Driven Development by Harry Percival, and at PyConIE 2014 I won a copy of his book Test-Driven Development with Python. I say won, but there were simply 40 books being given away (20 of these, and 20 of High Performance Python) and I was 40th in the queue. It feels like winning something, at least 🙂

Anyway, at the 2013 tutorial Harry made reference to the “Testing Goat”, and I thought it was just a whimsical idea of his, but the goat was back in 2014 and it’s on the cover of his book.

A bit of googling and it seems the Python Testing Goat is a thing.

As best I can tell, the Testing Goat was a running joke at PyCon 2010 (see here) and it’s been a mascot for Python testing ever since. There was even a successful campaign to have O’Reilly put a goat on Harry’s book instead of the expected snake.

U+1F410 GOAT

2010 was also the year Unicode 6.0 was released, which added (amongst other things) U+1F410 GOAT. Surely not a coincidence?
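If you don't have a font for it, you can still check that the goat is real. A minimal Python 3 sketch using the standard `unicodedata` module (the variable names are just for illustration):

```python
import unicodedata

goat = chr(0x1F410)

# Official character name from the Unicode database bundled with Python
name = unicodedata.name(goat)
print(f"U+{ord(goat):04X} {name}")

# And the reverse: look the character up by its name
same_goat = unicodedata.lookup("GOAT")
print(same_goat == goat)
```

This only confirms the code point and its name, not which Unicode version introduced it.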

Hello world!

After having a blank site here for (too) many years, it’s finally time to put something here. I’m just at the end of PyConIE 2014 and energised to try writing about programming and IT stuff in general. There’s the theory that if you can’t explain something you don’t really understand it, so I’m going to try that approach to writing posts here about whatever I’m working on. We’ll see how it works out!