Everything but the code

(Or, “So you want to build a new open-source Python project”)

This is not going to be a “here is a list of options, but you do what is right for you” pandering post. This is going to be a “this is 2015, there are right ways to do things, and here they are” post. This is going to be an opinionated, in-your-face, THIS IS BEST post. That said, if you think that anything here does not represent the best practices in 2015, please do leave a comment. Also, if you have specific opinions on most of these, you’re already ahead of the curve — the main target audience are people new to creating open-source Python projects, who could use a clear guide.

This post will also emphasize the new project. It is not worth it, necessarily, to switch to these things in an existing project — certainly not as a “whole-sale, stop the world and let’s change” thing. But when starting a new project with zero legacy, there is no reason to do things wrong.

tl:dr; Use GitHub, put MIT license in “LICENSE”, README.rst with badges, use py.test and coverage, flake8 for static checking, tox to run tests, package with setuptools, document with sphinx on RTD, Travis CI/Coveralls for continuous integration, SemVer and versioneer for versioning, support Py2+Py3+PyPy, avoid C extensions, avoid requirements.txt.

Where

When publishing kids’ photos, you do it on Facebook, because that’s where everybody is. LinkedIn is where you connect with colleagues and lead your professional life. When publishing projects, put them on GitHub, for exactly the same reason. Have GitHub pull requests be the way contributors propose changes. (There is no need to use the “merge” button on GitHub — it is fine to merge changes via git and push. But the official “how do I submit a change” should be “pull request”, because that’s what people know).

License

Put the license in a file called “LICENSE” at the root. If you do not have a specific reason to choose otherwise, MIT is reasonably permissive and compatible. Otherwise, use something like the license chooser and remember the three most important rules:

  • Don’t create your own license
  • No, really, don’t create your own license
  • Don’t do it

At the end of the license file, you can have a list of the contributors. This is an easy place to credit them. It is a good idea to ask people who send in pull requests to add themselves to the contributor list in their first one (this allows them to spell their name and e-mail exactly the way they want to).

Note that if you use the GPL or LGPL, they will recommend putting it in a file called “COPYING”. Put it in “LICENSE” (the licenses explicitly allow it as an option, and it makes it easier for people to find the license if it always has the same name).

README

The GitHub default is README.md, but README.rst (restructured text) is perfectly supported via Sphinx, and is a better place to put Python-related documentation, because ReST plays better with Pythonic toolchains. It is highly encouraged to put badges on top of the document to link to CI status (usually Travis), ReadTheDocs and PyPI.

Testing

There are several reasonably good test runners. If there is no clear reason to choose one, py.test is a good default. “Using Twisted” is a good reason to choose trial. Using the built-in unittest runner is not a good option — there is a reason the cottage industry of “test runner” evolved. Using coverage is a no-brainer. It is good to run some functional tests too. Test runners should be able to help with this too, but even writing a Python program that fails if things are not working can be useful.

Distribute your tests alongside your code, by putting them under a subpackage called “tests” of the main package. This allows people who “pip install …” to run the tests, which means sending you bug reports is a lot easier.

Static checking

There are a lot of tools for static checking of Python programs — pylint, flake8 and more. Use at least one. Using more is not completely free (more ways to have to say “ignore this, this is ok”) but can be useful to catch more style static issue. At worst, if there are local conventions that are not easily plugged into these checkers, write a Python program that will check for them and fail if those are violated.

Meta testing

Use tox. Put tox.ini at the root of your project, and make sure that “tox” (with no arguments) works and runs your entire test-suite. All unit tests, functional tests and static checks should be run using tox. It is not a bad idea to write a tox clause that builds and tests an installed wheel. This will require including all test code in the deployed package, which is a good idea.

Set tox to put all build artifacts in a build/ top-level directory.

Packaging

Have a setup.py file that uses setuptools. Tox will need it anyway to work correctly.

Structure

It is unlikely that you have a good reason to take more than one top-level name in the package namespace. Barring completely unavoidable name conflicts, your PyPI package name should be the same as your Python package name should be the same as your GitHub project. Your Python package should live at the top-level, not under “src/” or “py/”.

Documentation

Use sphinx for prose documentation. Put it in doc/ with a relevant conf.py. Use either pydoctor or sphinx autodoc for API docs. “Pydoctor” has the potential for nicer docs, sphinx is well integrated with ReadTheDocs. Configure ReadTheDocs to auto-build the documentation every check-in.

Continuous Integration

If you enjoy owning your own machines, or platform diversity in testing really matters, use buildbot. Otherwise, take advantage for free Travis CI and configure your project with a .travis.yml that breaks your tox tests into one test per Travis clause. Integrate with coveralls to have coverage monitored.

Version management

Use SemVer. Take advantage of versioneer to help you manage it.

Release

A full run of “tox” should leave in its wake tested .zip and .whl files. A successful, post-tag run of tox, combined with versioneer, should leave behind tested .zip and .whl. The release script could be as simple as “tox && git tag $1 && (tox || (git tag -d $1;exit 1) && cp …whl and zip locations… dist/”

GPG sign dist/ files, and then use “twine” to upload them to PyPI. Make sure to upload to TestPyPI first, and verify the upload, before uploading to PyPI. Twine is a great tool, but badly documented — among other things, it is hard to find information about .pypirc. “.pypirc” is an ini file, which needs to have the following sections:

.gitignore

  • build — all your build artifacts will go here
  • dist — this is where “ready to release” output will be
  • *.egg?info — this is an artifact of sdist that is really hard to put elsewhere
  • *.pyc — ignore byte-code files
  • .coverage — coverage artifact

Python versions

If all your dependencies support Python 2 and 3, support Python 2 and 3. That will almost certainly require using “six” (or one of its competitors, like “future”). Run your unit tests under both Python 2 and 3. Make sure to run your unit tests under PyPy, as well.

C extensions

Avoid, if possible. Certainly do not use C extensions for performance improvements before (1) making sure they’re needed (2) making sure they’re helpful (3) trying other performance improvements. Ideally structure your C extensions to be optional, and fall back to a slow(er) Python implementation if they are unavailable. If they speed up something more general than your specific needs, consider breaking them out into a C-only project which your Python will depend on.

If using C extensions, regardless of whether to improve performance or integrate with 3rd party libraries, use CFFI.

If C extensions have successfully been avoided, and Python 3 compatibility kept, build universal wheels.

requirements.txt

The only good “requirements.txt” file is a non-existing one. The “setup.py” file should have the dependencies (ideally as weak-versioned as possible, usually just a “>=” for a library that tends not to break backwards compatibility a lot). Tox will maintain the virtualenv needed based on the things in the tox file, and if needing to test against specific versions, this is where specific versions belong. The “requirements.txt” file belongs in Salt-style (Chef, Fab, …) configurations, Docker-style (Vagrant-, etc.) configurations or Pants-style (Buck-, etc.) build scripts when building a Pex. This is the “deployment configuration”, and needs to be decided by a deployer.

If your package has dependencies, they belong in a setup.py. Use extended_dependencies for test-only dependencies. Lists of test dependencies, and reproducible tests, can be configured in tox.ini. Tox will take care of pip-installing relevant packages in a virtualenv when running the tests.

Thanks

Thanks to John A. Booth, Dan Callahan, Robert Collins, Jack Diedrich,  Steve Holden for their reviews and insightful comments. Any mistakes that remain are mine!

Advertisements

3 Responses to Everything but the code

  1. rbtcollins says:

    I do like this, though I’d write a different list myself :P. Also – folk should really try unittest (in 3.5), unittest2 in <3.5. Its not what it used to be.

  2. What rbtcollins said. I wrote a checklist for this sort of thing earlier this year—https://thoughtstreams.io/jml/basic-stuff-around-writing-programs/—I’m glad someone came up with an opinionated version.

    Now, would you mind doing a thing that automates all of this?

  3. […] to get acquainted with the language and, to some extent, the standard library. I have written before about how to release open source libraries — but it is quite possible that one’s first […]

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s

%d bloggers like this: