Docker: Are we there yet?

February 1, 2016

Obviously, we know the answer. This post is intended to give me an easy place to point people when they ask me “so what’s wrong with Docker?”

[To clarify, I use Docker myself, and it is pretty neat. All the more reason missing features annoy me.]

Docker itself:

  • User namespaces — slated to land in February 2016, so pretty close.
  • Temporary adds/squashing — the issue is currently “closed”, and it suggests people use work-arounds.
  • Dockerfile syntax is limited — this is related to the issue above, but there are a lot of missing features in Dockerfile (for example, a simple form of “reuse” other than chaining). No clear idea when it will be possible to actually implement the build in terms of an API, because there is no link to an issue or PR.

Tooling:

  • Image size — Minimal versions of Debian, Ubuntu or CentOS are all unreasonably big. Alpine does a lot better. People really should move to Alpine. I am disappointed there is no competition on being a “minimal container-oriented distribution”.
  • Build determinism — Currently, almost all Dockerfiles in the wild call out to the network to grab some files while building. This is really bad — it assumes networking, depends on servers being up and assumes files on servers never change. The alternative seems to be checking big files into one’s own repo.
    • The first thing to do would be to have an easy way to disable networking while the container is being built.
    • The next thing would be a “download and compare hash” operation in a build-prep step, so that all dependencies can be downloaded and verified, while the hashes would be checked into the source.
    • Sadly, Alpine Linux specifically makes it non-trivial to “just download the package” from outside of Alpine.
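
The “download and compare hash” step could look roughly like the following. This is a minimal sketch, assuming the files have already been fetched into a local directory and that the expected SHA-256 digests are checked into the source as a simple mapping. The names `sha256_of` and `verify_all`, and the manifest layout, are hypothetical and not part of Docker or any existing tool.

```python
import hashlib
import os


def sha256_of(path, chunk_size=65536):
    """Return the hex SHA-256 digest of the file at `path`."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        # Read in chunks so large downloads do not need to fit in memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_all(directory, manifest):
    """Return the names in `manifest` that are missing or have the wrong hash.

    `manifest` maps file name -> expected hex digest (hypothetical layout).
    """
    bad = []
    for name, expected in sorted(manifest.items()):
        path = os.path.join(directory, name)
        if not os.path.isfile(path) or sha256_of(path) != expected:
            bad.append(name)
    return bad
```

A build-prep step would refuse to continue whenever `verify_all` returns a non-empty list, so the image build itself never needs to touch the network.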

 


Learning Python: The ecosystem

January 27, 2016

When first learning Python, the tutorial is a pretty good resource to get acquainted with the language and, to some extent, the standard library. I have written before about how to release open source libraries — but it is quite possible that one’s first foray into Python will not be to write a reusable library, but an application to accomplish something — maybe a web application with Django or a tool to send commands to your TV. Much of what I said there will not apply — no need for a README.rst if you are writing a Django app for your personal website!

However, it probably is useful to learn a few tools that the Python ecosystem engineered to make life more pleasant. In a perfect world, those would be built into Python: the “cargo” to Python’s “Rust”. However, in the world we live in, we must cobble together a good tool-chain from various open source projects. So strap in, and let’s begin!

The first three are cribbed from my “open source” link above, because good code hygiene is always important.

Testing

There are several reasonably good test runners. If there is no clear reason to choose one, py.test is a good default. “Using Twisted” is a good reason to choose trial. Using coverage is a no-brainer. It is good to run some functional tests too. Test runners should be able to help with this too, but even writing a Python program that fails if things are not working can be useful.
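
A “Python program that fails if things are not working” can be very small. The following is a minimal sketch, assuming the application is started from the command line and prints something recognizable. The `check` helper is hypothetical, and the command you pass to it is whatever starts your own program.

```python
import subprocess
import sys


def check(argv, expected_substring):
    """Run a command; return an error message, or None if it behaved."""
    result = subprocess.run(
        argv,
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,
        universal_newlines=True,
    )
    if result.returncode != 0:
        return "exit code {}".format(result.returncode)
    if expected_substring not in result.stdout:
        return "missing {!r} in output".format(expected_substring)
    return None


def run_checks(checks):
    """Return a list of failure descriptions for (argv, expected) pairs."""
    failures = []
    for argv, expected in checks:
        error = check(argv, expected)
        if error is not None:
            failures.append("{}: {}".format(" ".join(argv), error))
    return failures
```

A wrapper script can print whatever `run_checks` returns and call `sys.exit(1)` when the list is non-empty, which is all a test runner or CI system needs in order to notice the failure.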

Static checking

There are a lot of tools for static checking of Python programs: pylint, flake8 and more. Use at least one. Using more is not completely free (more ways to have to say “ignore this, this is ok”) but can be useful to catch more style and static issues. At worst, if there are local conventions that are not easily plugged into these checkers, write a Python program that will check for them and fail if those are violated.
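
As an illustration of such a home-grown checker, here is a minimal sketch. The convention it enforces (no tab characters, no trailing whitespace) is only a stand-in for whatever your local rules are, and the function names are hypothetical.

```python
import os


def violations_in(lines):
    """Yield (line_number, message) for each convention violation."""
    for number, line in enumerate(lines, start=1):
        text = line.rstrip("\n")
        if "\t" in text:
            yield number, "tab character"
        if text != text.rstrip():
            yield number, "trailing whitespace"


def check_tree(root):
    """Return 'path:line: message' strings for all .py files under root."""
    problems = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for filename in sorted(filenames):
            if not filename.endswith(".py"):
                continue
            path = os.path.join(dirpath, filename)
            with open(path) as handle:
                for number, message in violations_in(handle):
                    problems.append("{}:{}: {}".format(path, number, message))
    return problems
```

A small script that prints the result of `check_tree(".")` and exits with a non-zero status when it is non-empty behaves like any other static check.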

Meta testing

Use tox. Put tox.ini at the root of your project, and make sure that “tox” (with no arguments) works and runs your entire test-suite. All unit tests, functional tests and static checks should be run using tox.

Set tox to put all build artifacts in a build/ top-level directory.

Pex

A tox test-environment of “pex” should result in a Python EXecutable created and put somewhere under “build/”. Running your Python application to actually serve web pages should be as simple as taking that pex and running it without arguments. BoredBot shows an example of how to create such a pex that includes a web application, a Twisted application and a simple loop/sleep-based application. This pex build can take a requirements.txt file with exact dependencies, though if it is built by tox, you can inline those dependencies directly in the tox file.

Collaboration

If you do collaborate with others on the project, whether it is open source or not, it is best if the collaboration instructions are as easy as possible. Ideally, collaboration instructions should be no more complicated than “clone this, make changes, run tox, possibly do whatever manual verification using ‘./build/my-thing.pex’ and submit a pull request”.

If they are more complicated than that, consider investing some effort into changing the code to be more self-sufficient and make fewer assumptions about its environment. For example, default to a local SQLite-based database if no “--database” option is specified, and initialize it with whatever your code needs. This will also make it easier to practice the “infinite environment” methodology, since if one file is all it takes to “bring up” an environment, it should be easy enough to run it on a cloud server and allow people to look at it.
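
The SQLite default can be sketched with nothing but the standard library. This is a minimal illustration, assuming a command-line application. The default file name and the `notes` table are hypothetical placeholders for whatever your code actually needs.

```python
import argparse
import sqlite3

DEFAULT_DATABASE = "local.sqlite3"  # hypothetical default file name


def parse_args(argv):
    """Parse the command line, falling back to a local SQLite file."""
    parser = argparse.ArgumentParser()
    parser.add_argument(
        "--database",
        default=DEFAULT_DATABASE,
        help="path to the SQLite database file",
    )
    return parser.parse_args(argv)


def open_database(path):
    """Open (creating if needed) the database and ensure the schema exists."""
    connection = sqlite3.connect(path)
    # Hypothetical schema; IF NOT EXISTS makes initialization idempotent.
    connection.execute(
        "CREATE TABLE IF NOT EXISTS notes "
        "(id INTEGER PRIMARY KEY, body TEXT NOT NULL)"
    )
    connection.commit()
    return connection
```

With this shape, a fresh clone works with no setup at all, while a real deployment simply passes “--database” to point somewhere else.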


Big O for the working programmer

December 6, 2015

Let’s say you’re writing code, and you plan out one function. You try to figure out the constraints on the algorithmic efficiency of this function: how good should your algorithm be? This depends, of course, on many things. But less than you’d think. First, let’s assume you are ok with around a billion operations (conveniently, modern gigahertz processors do about a billion low-level operations per second, so it’s a good first assumption).

If your algorithm is O(n), that means n can’t be much bigger than a billion.

O(n**2) — n should be no more than the root of a billion — around 30,000.

O(n**3) — n should be no more than the third root of a billion — a thousand.

O(fib(n)) — n should be no more than 43

O(2**n) — a billion is right around 2**30, so n should be no more than 30.

O(n!) — n should be no more than 12

OK, now let’s assume you’re the NSA. You can fund a million cores, and your algorithm parallelizes perfectly. How do these numbers change?

O(n) — a quadrillion (10**15)

O(n**2) — 30 million

O(n**3) — hundred thousand

O(fib(n)) — 71

O(2**n) — 50

O(n!) — 16

You will notice that the difference between someone with a Raspberry Pi and a nation-state espionage agency is important for O(n)/O(n**2) algorithms, but quickly becomes meaningless for the higher-order ones. You will also notice log(n) was not really factored in: even for n around a billion, an extra log(n) factor would mean the difference between handling a billion and handling a hundred million.

This table is useful to memorize for a quick gut check: “is this algorithm feasible?” If the sizes you plan to attack are way smaller, then yes; if way bigger, then no; and if “right around”, that’s where you might need to go to a lower-level language or micro-optimize to get it to work. It is useful to know these things before starting to implement, regardless of whether the code is going to run on a smartwatch or in an NSA data center.
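
Both tables can be recomputed for any operation budget. The following is a minimal sketch: `largest_n` and `table` are hypothetical helper names, and the search assumes each cost function never decreases as n grows. The figures in the text are rounded for memorization, so the exact cut-offs computed here can differ from them by a step or two.

```python
import math


def fib(n):
    """Return the n-th Fibonacci number, with fib(1) == fib(2) == 1."""
    a, b = 0, 1
    for _ in range(n):
        a, b = b, a + b
    return a


def largest_n(cost, budget):
    """Return the largest n >= 1 with cost(n) <= budget (0 if there is none).

    Assumes cost is non-decreasing in n.
    """
    if cost(1) > budget:
        return 0
    high = 1
    while cost(high) <= budget:  # exponential search for an upper bound
        high *= 2
    low = high // 2  # invariant: cost(low) <= budget < cost(high)
    while high - low > 1:
        middle = (low + high) // 2
        if cost(middle) <= budget:
            low = middle
        else:
            high = middle
    return low


COSTS = [
    ("n", lambda n: n),
    ("n**2", lambda n: n ** 2),
    ("n**3", lambda n: n ** 3),
    ("fib(n)", fib),
    ("2**n", lambda n: 2 ** n),
    ("n!", math.factorial),
]


def table(budget):
    """Map each complexity class name to its largest feasible n."""
    return {name: largest_n(cost, budget) for name, cost in COSTS}
```

Calling `table(10**9)` corresponds to the single-machine case and `table(10**15)` to the million-core case.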

Conveniently, these numbers are also useful for memory performance — whether you need to support a one GB device or if you have a terabyte of memory.

 

 


Deploying Python Applications

October 10, 2015

Related to: 2L2T: DjangoCon Feedback; Deploying Python Applications with Docker – A Suggestion; Deploying With Docker; Software You Can Use

I have tried to make an example of some of the ways to avoid mistakes with BoredBot. Hopefully the README explains the problem domain and motivation, but here is the tl;dr:

Deploying applications is a place where it is possible to do things badly, amazingly badly and “oh God!”. I hope to slightly improve on this consensus with the current state of 2015 technology and get to “well, ok, this does not suck too much”. BoredBot is basically my attempt to show how combining NaCl, NColony, Pex and Docker can get you to a simple image which can be deployed to Amazon ECS, Google Container Engine or your own Docker servers without too much trouble.

There is still a lot BoredBot doesn’t do that I think any good deployment infrastructure should support — staging vs. production, dry-run mode, etc. Pull requests and issues are happily accepted!


Personal calling card

September 19, 2015

I have decided it is time to make a new personal calling card for myself. Please let me know what you think! I am reasonably sure that I’m not the only person, but one of a small number of people, who actually unit-tested their card: yep, the code is valid Python and will actually print the right details (if you “pip install attrs”, of course).

Back:

back

Front:

front


Kushiel’s Legacy (the Phedre Trilogy) — Sort of a book review

August 12, 2015

Below are spoilers for the Kushiel’s Legacy books (first trilogy). If you have not read them, I highly recommend reading them before looking below; they are pretty awesome. Technically, there are also spoilers for Lord of the Rings, though if you have not read it by now, I don’t know what else to do.



ncolony 0.0.2 released

June 27, 2015

Fixed:

Mostly internal cleanup release: running scripts is now nicer (all through “python -m ncolony <command>”), added Code of Conduct, releasing based on versioneer, cleaned up tox.ini, added HTTP healthchecker.

Available via PyPI and GitHub!

