Karol Kuczmarski's Blog

Package managers’ appreciation day

Posted on Sat 26 March 2016 in Programming • Tagged with packages, package manager, node.js, npm, C++ • Leave a comment

By now you have probably heard about the infamous “npm-gate” that swept through the developer community over the last week. It has been brought up, discussed, covered, meta-discussed, satirized, and even featured by some mainstream media. Evidently the nerds have managed to stir up some serious trouble again, and it only took them 11 lines of that strange thing they call “code”.

No good things in small packages

When looking for a culprit, the one party that everyone pounced on immediately was of course the npm itself. With its myriad of packages that could each fit in a tweet, it invites to create the exact house of cards we’ve seen collapse.

This serves as a good wake-up call, of course. But it also compels to throw the baby out with the bathwater, and draw a conclusion that may be a little too far-fetched. Like perhaps declaring the entire idea of managing dependencies “the npm way” suspect. If packages tend to degenerate into something as ludicrous as isArray — to say nothing of left-pad, which started the whole debacle — then maybe this approach to software reusability has simply bankrupted itself?

A world without *pm

I’m right away responding to that with a resounding “No!”. Package management as a concept is not responsible for the poor decision making of one specific developer collective. And anyone who might think tools like npm do more harm than good I ask: have you recently written any C++?

See, C++ is the odd one among languages that at least pretend to be keeping up with the times. It doesn’t present a package management story at all. That’s right — the C++ “ecosystem”, as it stands now, has:

no package manager
no repository of packages
no unified way of managing dependencies
no way to isolate development environments of different projects from one another

Adding any kind of third-party dependency to a C++ project — especially a portable one, which is allegedly one of C++’s strengths — is a considerable pain, even when it doesn’t require any additional libraries by itself. And environment isolation? Some people are using Linux containers (!) for this, which is like dealing with a mosquito by shooting it with a howitzer.

Billions upon billions of lines
To build a C++ binary, you must first build the userspace.

But hey, at least they can use apt-get, right?…

So, string padding incidents aside, package managers are absolutely essential. Sure, we can and should discuss the merits of their particular flavors and implementation details —like whether it’s prudent to allow “delisting” of packages. As a whole, however, package managers deserve recognition as a crucial part of modern language tooling that we cannot really do without.

Requirements for Python’s pip

Posted on Sun 21 February 2016 in Code • Tagged with Python, pip, packages, dependencies • Leave a comment

In this post I’ll describe all (hopefully all!) the various ways you can specify a single dependency for a Python package.

This assumes pip is used for installation. The list of dependencies then goes either in install_requires= parameter of the setup function within setup.py, or as a separate requirements.txt file. Commonly, it will actually go in both places, with the latter being the canonical source of truth:

from setuptools import setup

with open('requirements.txt') as rf:
    setup(
        # ...
        install_requires=rf.readlines(),
    )

More details about this approach can be found in one of my previous posts.

Here, I will concentrate on the format of a single line in requirements.txt that defines a dependency. There are numerous variants that pip supports, and they are all described in excruciating detail in PEP 440. This post shall serve as a short reference on the most useful ones.

Package name (and version)

The simplest and most common option is to identify a dependency by its package name:

SQLAlchemy

This will locate it in a global index of packages, which is sometimes called a “cheese shop”. Currently, by far the most popular package registry for Python is PyPI, and pip uses it by default¹.

Without any further modifiers, pip will download and install the “current” version of the package — either the newest, or the one designated explicitly by a maintainer. This obviously makes the dependency somewhat unpredictable, for it can mean unintended upgrades that introduce breaking changes to your code.

To prevent this, you’d normally pin the dependency to an exact version²:

SQLAlchemy==0.9.10

Other comparison operators are also available:

SQLAlchemy>=0.9.10
SQLAlchemy<1.0.0

and can even be combined:

SQLAlchemy>=0.9,<1.0.0

Specs like that will make pip find the newest version that’s within given range. Assuming your dependency follows the semantic versioning scheme, this will allow you to stay on top of any minor bugfixes and improvements to an older release (0.9.x here), without the risk of accidentally upgrading to a new one (1.x) that your code is not compatible with yet.

Repository URL

Sometimes you want to live on the bleeding edge, though, and depend not just on the latest release, but the head commit to the package’s repository. This makes sense especially in large systems that are distributed among multiple repos, and where development happens in lockstep.

For those occasions, and a few others, pip can recognize direct repository URLs. They are in the format:

$VCS+$PROTOCOL://$URL@$LABEL#egg=$PACKAGE

where the $PROTOCOL part can be optional if the version control system has a default there. That’s for example the case for git, which is of course the most important VCS you’d be interested in³:

git://git.example.com/somepackage#egg=somepackage

Note that the #egg=$PACKAGE part is not a part of the $URL, and it’s only there to give a local name for the package distribution. This is what makes it possible to refer to it later via pip, if only to remove it with pip uninstall $PACKAGE. Of course, the sanest practice is to use the PyPI moniker if possible.

When no $LABEL is given, pip will use the HEAD, trunk, tip, or the equivalent default/current revision from the repo. Often though (at least in case of Git), you would also pick a branch, tag, or even a particular commit hash:

git+https://github.com/Xion/unmatcher.git@0.1.3.1#egg=unmatcher
git+ssh://github.com/You/yourpackage.git@master#egg=yourpackage
git+https://github.com/mitsuhiko/jinja2.git@5b498453b5898257b2287f14ef6c363799f1405a#egg=Jinja2

The last two options could be a good choice even with third party packages, when you don’t want to wait for a new PyPI release to get a necessary feature or an urgent bug fix.

Local filesystem

Lastly, you can ask pip to install a package from a local directory or archive. The former option is often used with the -e (--editable) flag for pip install. This installs the package in the so-called development mode, allowing you to edit its source code in-place:

$ pip install -e /home/me/Code/myotherpackage

You almost certainly don’t want to put this line in requirements.txt: you should still be pulling the other package from PyPI. But if it’s your own one — maybe a self-contained utility library used by your main program — this setup will be very helpful for making changes to it, informed by your own usage of the package.

This can be changed with --index_url flag to pip install. Running local indexes is a good practice for Python shops, especially those that rely on pip install as part of their deployment process. ↩
If the package uses semantic versioning, a possible alternative to == is ~=, which means “compatible” version. The precise meaning of this is somewhat complicated, but it roughly means that upgrades are permitted as long as nothing in the public interface changes. ↩
Other options include hg (Mercurial), svn, and bzr (Bazaar). ↩