New site, new blog, new book

May 5, 2018

My Little Subclass: Inheritance is Magic

April 29, 2017

New post on my blog.


Twisted Tutorial Webinar

April 19, 2017

Glyph and I are giving a tutorial about Twisted and web services at PyCon. In order to try it out, we are giving a webinar. Please come, learn, and let us know if you like it!


PYTHONPATH Considered Harmful

April 13, 2017

Another post on my new blog.


Shipping Python Applications in Docker

March 20, 2017

Posted on my new blog as an experiment!


On Raising Exceptions in Python

December 10, 2016

There is a lot of Python code in the wild which does something like:

raise SomeException("Could not fraz the buzz:"
                    "{} is less than {}".format(foo, quux)

This is, in general, a bad idea. Exceptions are not program panics. While they sometimes do terminate the program, or the execution thread with a traceback, they are different in that they can be caught. The code that catches the exception will sometimes have a way to recover: for example, maybe it’s not that important for the application to fraz the buzz if foo is 0. In that case, the code would look like:

try:
    some_function()
except SomeException as e:
    if ???

Oh, right. We do not have direct access to foo. If we formatted better, using repr, at least we could tell the difference between 0 and “0”: but we still would have to start parsing the representation out of the string.

Because of this, in general, it is better to raise exceptions like this:

raise SomeException("Could not fraz the buzz: foo is too small", foo, quux)

This way exception handling has a lot of power: it can introspect foo, introspect quux and introspect the string. If by some reason the exception class is raised and we want to verify the reason, checking string equality, while not ideal, is still better than trying to match string parts or regular expression matching.

When the exception is presented to the user interface, in that case, it will not look as nice. Exceptions, in general, should reach the UI only in extreme circumstances: and in those cases, having something that has as much information is useful for root cause analysis.


Moshe’z Messaging Preferences

December 8, 2016

The assumption here is that you have my phone number. If you don’t have my phone number, and you think that’s an oversight on my part, please send me an e-mail at zadka.moshe@gmail.com and ask for it. If you don’t have my phone number because I don’t know you, I am usually pretty responsive on e-mail.

In order of preference:


Don’t Mock the UNIX Filesystem

December 2, 2016

When writing unit tests, it is good to call functions with “mocks” or “fakes” — objects with equivalent interface but a simple, “fake” implementation. For example, instead of a real socket object, something that has recv() but returns “hello” the first time, and an empty string the second time. This is great! Instead of testing the vagaries of the other side of a socket connection, you can focus on testing your code — and force your code to handle corner cases, like recv() returning partial messages, that happen rarely on the same host (but not so rarely in more complex network environments).

There is one OS interface which it is wise not to mock — the venerable UNIX file system. Mocking the file system is the classic case of low-ROI effort:

  • It is easy to isolate: if functions get a parameter of “which directory to work inside”, tests can use a per-suite temporary directory. Directories are cheap to create and destroy.
  • It is reliable: the file system rarely fails — and if it does, your code is likely to get weird crashes anyway.
  • The surface area is enormous: open(), but also os.open, os.mkdir, os.rename, os.mknod, os.rename, shutil.copytree and others, plus modules calling out to C functions which call out to C’s fopen().

The first two items decrease the Return, since mocking the file system does not make the tests easier to write or the test run more reproducible, while the last one increases the Investment.

Do not mock the file system, or it will mock you back.


Belt & Suspenders: Why put a .pex file inside a Docker container?

November 19, 2016

Recently I have been talking about deploying Python, and some people had the reasonable question: if a .pex file is used for isolating dependencies, and a Docker container is used for isolating dependencies, why use both? Isn’t it redundant?

Why use containers?

I really like glyph’s explanation for containers: they isolate not just the filesystem stack but the processes and the network, giving a lot of the power that UNIX was supposed to give but missed out on. Containers isolate the file system, making it easier for code to write/read files from known locations. For example, its log files will be carefully segregated, and can be moved to arbitrary places by the operator without touching the code.

The other part is that none of the reasonable options packages Python and this means that a pex file would still have to be tested with multiple Pythons, and perhaps do some checking at start-up that it is using the right interpreter. If PyPy is the right choice, it is the choice the operator would have to make and implement.

Why use pex?

Containers are an easy sell. They are right on the hype train. But if we use containers, what use is pex?

In order to explain, it is worthwhile comparing a correctly built runtime container that is not using pex, with one that is: (parts that are not relevant have been removed)

ADD wheelhouse /wheelhouse
RUN . /appenv/bin/activate; \
    pip install --no-index -f wheelhouse DeployMe
COPY twist.pex /

Note that in the first option, we are left with extra gunk in the /wheelhouse directory. Note also that we still have to have pip and virtualenv installed in the runtime container. Pex files bring the double-dutch philosophy to its logical conclusion: do even more of the build on the builder side, do even less of it on the runtime side.


Deploying with Twisted: Saying Hello

November 13, 2016

Too Long: Didn’t Read

The build script builds a Docker container, moshez/sayhello:.

$ ./build MY_VERSION
$ docker run --rm -it --publish 8080:8080 \
  moshez/sayhello:MY_VERSION --port 8080

There will be a simple application running on port 8080.

If you own the domain name hello.example.com, you can point it at a machine that the domain resolves to and then run:

$ docker run --rm -it --publish 443:443 \
  moshez/sayhello:MY_VERSION --port le:/srv/www/certs:tcp:443 \
  --empty-file /srv/www/certs/hello.example.com.pem

It will result in the same application running on a secure web site: https://hello.example.com.

 

All source code is available on GitHub.

Introduction

WSGI has been a successful standard. Very successful. It allows people to write Python applications using many frameworks (Django, Pyramid, Flask and Bottle, to name but a few) and deploy using many different servers (uwsgi, gunicorn and Apache).

Twisted makes a good WSGI container. Like Gunicorn, it is pure Python, simplifying deployment. Like Apache, it sports a production-grade web server that does not need a front end.

Modern web applications tend to be complex beasts. In order to be trusted by users, they need to have TLS support, signed by a trusted CA. They also need to transmit a lot of static resources — images, CSS and JavaScript files, even if all HTML is dynamically generated. Deploying them often requires complicated set-ups.

Containers

Container images allow us to package an application with all of its dependencies. They often cause a temptation to use those as the configuration management. However, Dockerfile is a challenging language to write big parts of the application in. People writing WSGI applications probably think Python is a good programming language. The more of the application logic is in Python, the easier it is for a WSGI-based team to master it.

PEX

Pex is a way to package several Python “distributions” (sometimes informally called “Packages”, the things that are hosted by PyPI) into one file, optionally with an entry-point so that running the file will call a pre-defined function. It can take an explicit list of wheels but can also, as in our example here, take arguments compatible with the ones pip takes. The best practice is to give it a list of wheels, and build the wheels with pip wheel.

pkg_resources

The pkg_resources module allows access to files packaged in a distribution in a way that is agnostic to how the distribution was deployed. Specifically, it is possible to install a distribution as a zipped directory, instead of unpacking it into site-packages. The code:pex format relies on this feature of Python, so adherence to using pkg_resources to access data files is important in order to not break code:pex compatibility.

Let’s Encrypt

Let’s Encrypt is a free, automated, and open Certificate Authority. It has invented the ACME protocol in order to make getting secure certificates a simple operation. txacme is an implementation of an ACME client, i.e., something that asks for certificates, for Twisted applications. It uses the server endpoint plugin mechanism in order to allow any application that builds a listening endpoint to support ACME.

Twist

The twist command-line tools allows running any Twisted service plugin. Service plugins allow us to configure a service using Python, a pretty nifty language, while still allowing specific customizations at the point of use via command line parameters.

Putting it all together

Our setup.py files defines a distribution called sayhello. In it, we have three parts:

  • src/sayhello/wsgi.py: A simple Flask-based WSGI application
  • src/sayhello/data/index.html: an HTML file meant to serve as the root
  • src/twisted/plugins/sayhello.py: A Twist plugin

There is also some build infrastructure:

  • build is a Python script to run the build.
  • build.docker is a Dockerfile designed to build pex files, but not run as a production server.
  • run.docker is a Dockerfile designed for production container.

Note that build does not push the resulting container to DockerHub.

Credits

Glyph Lefkowitz has inspired me in his blog about how to build efficient containers. He has also spoken about how deploying applications should be no more than one file copy.

Tristan Seligmann has written txacme.

Amber “Hawkowl” Brown has written “twist”, which is much better at running Twisted-based services than the older “twistd”.

Of course, all mistakes and problems here are completely my responsibility.