__name__ == __main__ considered harmful

Every single Python tutorial shows the pattern of

# define functions, classes,
# etc.

if __name__ == '__main__':
    main()

This is not a good pattern. If the file is never going to be imported as a module, there is no reason not to call ‘main()’ unconditionally at the bottom. So the guard only matters in files that are also imported as modules, and that is exactly where it leads to unpredictable effects. If the file is run as a script and is also imported as ‘foo’, its code is executed twice, under two module names: ‘foo.something’ and ‘__main__.something’ are different objects, even though they come from the same code.

This leads to hilarious effects like @cache decorators not doing what they are supposed to, parallel registry lists and all kinds of other issues. Hilarious unless you spend a couple of hours debugging why ‘isinstance()’ is giving incorrect results.
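A minimal sketch of the effect, assuming a throwaway file named foo_demo.py (a hypothetical name, not from any real project) that defines a single class. The sketch executes the file once under the name '__main__' using runpy, imports it once under its own name, and compares the two resulting classes:

```python
import importlib
import runpy
import sys
import tempfile
from pathlib import Path

# Hypothetical module contents: one class, nothing else.
SOURCE = "class Something:\n    pass\n"


def load_twice():
    """Run the same file as '__main__' and import it as 'foo_demo'."""
    with tempfile.TemporaryDirectory() as tmp:
        path = Path(tmp) / "foo_demo.py"
        path.write_text(SOURCE)
        sys.path.insert(0, tmp)
        importlib.invalidate_caches()
        try:
            # Roughly what "python foo_demo.py" does to the file.
            as_main = runpy.run_path(str(path), run_name="__main__")
            # Roughly what "import foo_demo" does elsewhere in the program.
            as_module = importlib.import_module("foo_demo")
        finally:
            sys.path.remove(tmp)
            sys.modules.pop("foo_demo", None)
    return as_main["Something"], as_module.Something


main_cls, module_cls = load_twice()
print(main_cls.__module__, module_cls.__module__)  # __main__ foo_demo
print(main_cls is module_cls)                      # False
print(isinstance(main_cls(), module_cls))          # False
```

The same duplication applies to any module-level state, such as a cache or a registry list: each copy of the module gets its own.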

If you want to write a main module, make sure it cannot be imported. In this case, reversed stupidity is intelligence — just reverse the idiom:

# at the top
if __name__ != '__main__':
    raise ImportError("this module cannot be imported")

This, of course, will mean that this module cannot be unit tested: therefore, any non-trivial code should go in a different module that this one imports. Because of this, it is easy to gravitate towards a package. In that case, put the code above in a module called ‘__main__.py’. This will lead to the following layout for a simple package:

PACKAGE_NAME/
             __init__.py
                 # Empty
             __main__.py
                 if __name__ != '__main__':
                     raise ImportError("this module cannot be imported")
                 from PACKAGE_NAME import api
                 api.main()
             api.py
                 # Actual code
             test_api.py
                 import unittest
                 # Testing code

And then, when executing:

$ python -m PACKAGE_NAME arg1 arg2 arg3

This will work in any environment where the package is on sys.path: in particular, in any virtualenv where it was pip-installed. Unless a short command line is important, this allows skipping the console script in setup.py completely and letting “python -m” be the official CLI. Since pex supports setting a module as an entry point, if this tool needs to be deployed in other environments, it is easy to package it into a single file that will execute the script:

$ pex . --entry-point PACKAGE_NAME --output-file toolname
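For the api.py module in the layout above, one possible shape is sketched here. It is an assumption, not the only way to do it: the names build_parser and the "args" option are hypothetical, and the body just echoes its arguments. The point is that main() accepts an explicit argument list, so a test can call it directly without touching sys.argv:

```python
import argparse


def build_parser():
    """Build the command-line parser (hypothetical options)."""
    parser = argparse.ArgumentParser(prog="PACKAGE_NAME")
    parser.add_argument("args", nargs="*", help="positional arguments")
    return parser


def main(argv=None):
    """Parse argv (argparse falls back to sys.argv[1:] when it is None)
    and return a process exit code."""
    options = build_parser().parse_args(argv)
    print(" ".join(options.args))
    return 0
```

With this shape, __main__.py can pass the return value of api.main() to sys.exit(), and test_api.py can call api.main(["arg1", "arg2", "arg3"]) and check the result.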

One Response to __name__ == __main__ considered harmful

  1. eryk sun says:

    Testing for `'__main__'` is required for multiprocessing on Windows. The NT executive doesn’t expose a `fork` system call to the Windows environment (though it’s available to the POSIX and Linux subsystems). So the multiprocessing module has to spawn a new process and run the script with a name that’s not `'__main__'`. To avoid recursively spawning processes, setup code has to be gated by testing for `'__main__'`.

    Also, a module may have a script interface that implements a limited test or demo. I do this all the time and think it’s fine. It can be run using Python’s `-m` command-line option. That said, if a package is intended to function as both a library and an application, then it should define setuptools entry points. Installing the package creates scripts for the entry points. It also creates EXE wrappers on Windows.
