Unicode, UTF-8 and you

May 23, 2015

Unicode is not a panacea. Some people’s names can’t even be written in unicode. However, as far as universal encodings go, it is the best we have got — warts and all. It is the only reasonable way to represent text inside programs, except for very very specialized needs (no, you don’t qualify).

Now, programs are made of libraries, and often there are several layers of abstraction between the library and the program. Sometimes, some weird abstraction layer in the middle will make it hard to convey user configuration into the library’s guts. Code should figure things out itself, most of the time.

So, there are several ways to make dealing with unicode not-horrible.

Unicode internally

I’ve already mentioned it, but it bears repeating. Internal representation should use the language’s built-in type (str in Python 3, String in Java, unicode in Python 2). All formatting, templating, etc. should be, internally, represented as taking unicode parameters and returning unicode results.


Obviously, when interacting with an external protocol that allows the other side to specify encoding, follow the encoding it specifies. Your program should support, at least, UTF-8, UTF-16, UTF-32 and Latin-1 through Latin-9. When choosing output encoding, choose UTF-8 by default. If there is some way for the user to specify an encoding, allow choosing between that and UTF-16. Anything else should be under “Advanced…” or, possibly, not at all.


When reading input that is not marked with an encoding, attempt to decode as UTF-8, then as UTF-16 (most UTF-16 decoders will auto-detect endianity, but it is pretty easy to hand-hack if people put in the BOM. UTF-8/16 are unlikely to have false positives, so if either succeeds, it’s likely correct. Otherwise, as-ASCII-and-ignore-high-order is often the best that can be done. If it is reasonable, allow user-intervention in specifying the encoding.

When writing output, the default should be UTF-8. If it is non-trivial to allow user specification of the encoding, that is fine. If it is possible, UTF-16 should be offered (and BOM should be prepended to start-of-output). Other encodings are not recommended if there is no way to specify them: the reader will have to guess correctly. At the least, giving the user such options should be hidden behind an “Advanced…” option.

The most popular I/O that does not have explicit encoding, or any way to specify one, is file names on UNIX systems. UTF-8 should be assumed, and reasonably recovered from when it proves false. No other encoding is reasonable (UTF-16 is uniquely unsuitable since UNIX filenames cannot have NULs, and other encodings cannot encode some characters).

Lessons learned in porting to PyPy

April 30, 2015

I’m running the ncolony tests with pypy so I can add PyPy support. I expected it to be a no-op — but turned out I have ingrained expectations that are no longer true: moreover, PyPy is right, and I’m wrong.

  • “hello” is “hello” –> not necessarily true. PyPy does not intern strings like CPython does.
  • file(name, ‘w’).write(“string”) is bad and you should not do that. PyPy is garbage collected, not ref counted, so the file might not be closed for a while.

That was it! It was actually a pleasant experience :)

On Authentication in Web Applications

April 10, 2015

So you are writing a web application. Maybe it’s “2.0”, maybe it’s “1.0”. Maybe it’s 3.0? I am not sure if that’s a thing. Anyway, the important thing is that you want to know that people are who they say they are (I guess not if you’re reimplementing 4Chan or other anonymous-only applications). So you have some buttons that say “Log in” and “Sign up”. In the “Sign Up” page, you have the user enter in their e-mail address (twice, of course, because otherwise they’ll misspell it) and the password (twice, same reason), and possibly a username. You check the password is “strong enough”, and you have a little widget that rates it “weak”, “fair”, “strong” or whatever descriptive words you feel like today. Of course, being a civilized person, you store the passwords on the backend salted and hashed. You make sure that the password comparison is resistant to side-channel attacks. You add a CAPTCHA for the “forgot password” page to prevent mass-attacking it. Since you know your users are almost certainly using the same password on other sites, that do not do all of that, you also offer a 2-factor authentication scheme using Google Authenticator and falling back to SMS codes. Of course, before you store an SMS number as the fallback, you send it a trial code to make sure it is correct (right, WordPress.com?)

And, of course, no users use your 2-factor scheme, one day they get fished, and all their accounts are compromised.

Please note that the above paragraph is the absolute minimum you should do to be a responsible thing that says “sign up with your e-mail and password”. Also highly recommended is participating in a white-hat bug bounty, hiring dedicated pen-testers and having a lot of server-side heuristics to detect a brute-force attack and shut it down immediately.

Do you know who has the resources to do all of that correctly? I can think of two companies that get it all correct. Their names start with consecutive letters of the alphabet… :)

Yep, Facebook and Google actually have the security teams and expertise to check every single one of those boxes (with the exception of the silly little widget that rates your password strength which never in the history of mankind has ever caused a user to choose a different password, because they only remember one password for all the sites they use and it doesn’t change.) Please, for the love of kittens, puppies and hedgehogs, put a little “Sign-up with Facebook” and “Sign-up with Google+” widgets on your web-site. If you are worried about Facebook and Google “capturing” your users, just make sure to grab their (verified!) e-mail addresses when they sign up through OAuth, so that if you ever want to authenticate yourself, all you need to do is just have a “Forgot/recover password” widget, and you are on your way. You can even e-mail your users to tell them “Hey, FB/GOOG screwed us over, so start logging in with your password, and here is how you can recover the password”.

There really is no excuse not to offer this, in 2015. Unless you think your security team is roughly as good as Facebook’s.

Deploying With Docker

March 23, 2015

Glyph wrote about an interesting way to deploy Python applications to Docker. I wanted to try it out, and exercise ncolony a little. The result is NaNoAuto, which is still very preliminary. It was mostly written as a way to test out deploying to Docker and exercise ncolony. Several notes that I learned the hard way:

  • On Ubuntu, all docker client commands need to run as root to access the docker container’s UNIX domain socket. I think there is a way to run it on a host/port and ask for a username/password, but I have not figured that out.
  • Glyph’s scripts need a recent version of Docker. The built-in version for Ubuntu is too old — make sure to install docker from Docker’s apt repository.
  • The entry point you give Docker must not exit — and so must not daemonize. Glyph’s app used an explicit reactor running it (well, even worse — Klein’s implicit reactor). Civilized people, of course, use ‘twistd’. When building things like a civilized person, please remember to pass the “–nodaemonize” flag (AKA “-n”) to twistd.
  • On my personal TODO list is to add a hash checker to the runner, so that the wheels installed from the internet can have their hashes matched. It is possible that “peep” could play a role here.
  • The scripts are divided into “build-base” and “do-build”. “build-base” is the long-running but fairly stable part — it only changes if you change the basic platform (OS, Python interpreter, needed non-Pythonic parts). The “do-build” part is fairly fast, and needs to be run for every change to a Python file.

Note that this still means a little bit of overhead on top of building a new wheel for each change to a Python file. It should be possible to write something on top of “docker exec” that will copy Python files directly from source to container so as to facilitate even quicker edit/debug cycles.

Announcing: NColony — the infrastructure for your Python

March 14, 2015

For the last few days, I have been working on infrastructure for writing multi-process servers: basically, something that would be a good start-command for your Docker, or something you put in your ‘@reboot’ crontab to bring the entire thing app. It is still not tested, mature or production ready — there are still a lot of unimplemented features. I am putting the announcement up now, though, because at least everything that still needs to be done (as far as I know) to get it feature-ready is documented as an issue.

The code, issues and so on are all on the Github for ncolony. Patches for existing issues are welcome, as well as new issues. Questions, comments and code-critiques are also more than welcome.

sphinx-quickstart: Bad idea, or worst idea?

March 3, 2015

When reading the documentation for sphinx, it suggests getting started by running the sphinx-quickstart “wizard”. Inspired by such beloved pieces of software like PowerPoint, sphinx-quickstart will ask you questions about your project and then manufacture a bunch of files which, when built, produce pretty documentation. This…this is wrong. We don’t start writing a new project in Python by having it ask a bunch of questions and then manufacture setup.y. We document, we have samples and we let people who are, ideally, capable of writing complicated Python module to create those using their favorite editor.

One might think perhaps this is a design limitation of sphinx  but it is not really. It’s just a silly little thing in the documents. For a minimal (and somewhat typical) sphinx set up all you need is:



.. toctree::
   :maxdepth: 2


master_doc = 'index'
project = 'PROJECT'
copyright = '2015, YOUR NAME'
author = 'YOUR NAME'
version = release = '0.0.1'

Building the documentation, something that sphinx-quickstart will helpfully create a Makefile (that one calls recursively from their Python Makefile, I guess) and Windows batch file [in sphinx-quickstart’s defense, it asks whether to do those things]. Again, one might have the impression that perhaps sphinx is so complicated one might need those. Here is the complicated command needed to build the documentation:

sphinx-build -b html . _build/html

Sphinx is pretty well documented. Starting from this skeleton, it is easy to add extensions, change configuration options, etc. There is absolutely no excuse for sphinx-quickstart, a blemish on an otherwise pretty nifty documentation system. If you were also turned off by needing to run a quickstart script that fills your source directory with things, fear no more. Build your skeleton yourself, and enjoy beautiful docs!

Clash of Clans: War strategy tips

October 22, 2014

[I wanted a central place to put my CoC Clan War strategy tips so I can refer people to them.]


  • A town hall outside the walls is great for farming, but your war base should have amazingly good protection of the town hall, otherwise they get, essentially, a free one star (together with the one star for 50% that people can get by just taking down the outermost buildings with an archer rush, this almost always means free two stars).
  • Air defenses should be well protected, or even a simple “take out the air defenses” together with a castle dragon can let even relatively low-level town hall take out pretty impressive bases.
  • Do not bother with walls around your town hall. If giants got to the point where they have nothing to do but to take out your town hall, you have bigger problems and otherwise, the archers will take out your town hall while leaving the wall intact. Instead, use walls to protect the giants’ favorite targets: wizards and mortars.
  • Avoid the “eggshell” design: one strong outer wall with no internal walls. Once the outer wall is breached, your base is effectively wall-free.


  • Meta tip #1: Remember to question your instincts if you raid for trophies or loot a lot. War has different incentive schemas and different pros and cons.
  • Meta tip #2: [Continuation of meta tip #1] You have more than 30 seconds to figure out your strategy. You can look at the enemy village, and even tap buildings to see ranges. Spend some times thinking about your strategy before sending in your first troop,
  • If you are not the first to attack, watch replays of all previous attacks. Make sure to memorize:
    • Location of traps
    • Location of hidden tesla towers
    • Composition of clan castle
  • Go for the three stars. A one-star attack is almost useless, and a two-star attack is worth only 66% of a three star attack. Only go for the one-star if nothing else seems realistic.
  • The clan castle almost certainly has troops in it. Probably a mix of archers and wizards.
    • Check the range of the clan castle ahead of time, and plan how to draw the troops out.
    • Giants and, to a lesser extent, hog riders, are extremely vulnerable to castle troops, because they do not fight back. Clear castle troops before sending them in, if possible.
    • Make sure to draw out troops and kill them with either a lightening spell or archers/wizards (you’ll probably need the equivalent of 20-30 housing spaces, in either archers or wizards, to demolish a 20-25 sized castle).
    • If you draw out troops, try drawing them out to an area that’s free of other defensive buildings.
    • Make sure you drew out all troops: some will sometimes stay in. Better safe than sorry.
  • If you are going to send in dragons or healers, it is probably a good idea to get rid of the air defenses using giants or hog riders first. Air defenses are extremely effective against flying troops. [This does not apply if you plan your attack taking into the air defenses into account and being OK with the flying troops having a limited life time.]
  • Walls should be dealt with, if possible, by wall breakers and not keeping the giants there hacking at the wall. First, this is an inefficient use of the giants’ hit points, and worse, this wastes a lot of time. Time is of the essence for getting all three stars.
  • Send troops (somewhat) en-masse. Sending the archers or wizards one-by-one just lets the defensive buildings take them out one by one. Even better, send mass amounts of archers or wizards hidden behind giants to let giants soak up the pain while the archers take out buildings.


Get every new post delivered to your Inbox.

Join 318 other followers