Introducing callee: matchers for unittest.mock

Posted on Sun 20 March 2016 in News • Tagged with Python, testing, mocking, argument matchers

Combined with dynamic typing, the lax object model of Python makes it pretty easy to write unit tests that involve mocking some parts of the tested code. Still, there are plenty of third-party libraries — usually with “mock” somewhere in the name — which promise to make writing such tests even shorter and simpler. There has evidently been a need here that the standard library didn’t address adequately.

That changed, however, with Python 3.3. This version introduced a new unittest.mock module which essentially obsoleted all the other, third-party solutions. The module, also available as a backport for Python 2.6 and above, is flexible, easy to use, and offers most if not all of the requisite functionality.

Called it, maybe?

Well, with one notable exception. If we mock a function, or any other callable object:

# (production code)
class ProductionClass(object):
    def foo(self):
        self.florb(...)

# (test code)
from unittest import mock

something = ProductionClass()
something.florb = mock.Mock()

we have very limited capabilities when it comes to verifying how they were called. Sure, we can be extremely specific:

something.florb.assert_called_with('this particular string')

or very lenient:

something.florb.assert_called_with(mock.ANY)  # works for _any_ argument

but there is virtually nothing in between. Suppose we don’t care too much what exact string the method was called with, only that it was some string that’s long enough? Things get awkward very fast then:

for posargs, kwargs in something.florb.call_args_list:
    arg = posargs[0]
    self.assertIsInstance(arg, str)
    self.assertGreaterEqual(len(arg), 16)
    break
else:
    self.fail('required call to %r not found' % (something.florb,))

Yes, this is what’s required: an actual loop through all the mock’s calls[1]; unpacking tuples of positional and keyword arguments; and manual assertions that won’t even emit useful error messages when they fail.

Eww! Surely Python can do better than that?

Meet callee

Why, of course it can! Today, I’m releasing callee: a library of argument matchers compatible with unittest.mock. It enables you to write much simpler, more readable, and delightfully declarative assertions:

from callee import String, LongerOrEqual
something.florb.assert_called_with(String() & LongerOrEqual(16))

The wide array of matchers offered by callee includes numeric, string, and collection ones. And if none suits your particular needs, it’s a trivial matter to implement a custom one.
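For a taste of the latter, a custom matcher boils down to subclassing the library’s Matcher base class and implementing a single match method. Here’s a minimal sketch — the matcher name is made up, so consult callee’s documentation for the exact extension API:

from callee import Matcher

class DivisibleBy(Matcher):
    """Matches a number that's divisible by the given divisor."""
    def __init__(self, divisor):
        self.divisor = divisor

    def match(self, value):
        return value % self.divisor == 0

# hypothetical usage:
# something.florb.assert_called_with(DivisibleBy(4))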

Check the library’s documentation for more examples, or jump right in to the installation guide or a full matcher reference.


  1. A loop is not necessary if we simulate assert_called_once_with rather than assert_called_with, but we still have to dig into mock.call objects to eke out the arguments. 


Requirements for Python’s pip

Posted on Sun 21 February 2016 in Code • Tagged with Python, pip, packages, dependencies

In this post I’ll describe all (hopefully all!) the various ways you can specify a single dependency for a Python package.

This assumes pip is used for installation. The list of dependencies then goes either in the install_requires= parameter of the setup function within setup.py, or in a separate requirements.txt file. Commonly, it will actually go in both places, with the latter being the canonical source of truth:

from setuptools import setup

with open('requirements.txt') as rf:
    setup(
        # ...
        install_requires=rf.readlines(),
    )

More details about this approach can be found in one of my previous posts.
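One caveat: rf.readlines() passes comment lines and blank lines through verbatim. If your requirements.txt contains any, a slightly more defensive variant — just a sketch — may be in order:

from setuptools import setup

with open('requirements.txt') as rf:
    # skip blanks, comments, and pip-specific options like -e or -r
    requirements = [line.strip() for line in rf
                    if line.strip() and not line.startswith(('#', '-'))]

setup(
    # ...
    install_requires=requirements,
)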

Here, I will concentrate on the format of a single line in requirements.txt that defines a dependency. There are numerous variants that pip supports, and they are all described in excruciating detail in PEP 440. This post shall serve as a short reference on the most useful ones.

Package name (and version)

The simplest and most common option is to identify a dependency by its package name:

SQLAlchemy

This will locate it in a global index of packages, which is sometimes called a “cheese shop”. Currently, by far the most popular package registry for Python is PyPI, and pip uses it by default[1].

Without any further modifiers, pip will download and install the “current” version of the package — either the newest, or the one designated explicitly by a maintainer. This obviously makes the dependency somewhat unpredictable, for it can mean unintended upgrades that introduce breaking changes to your code.

To prevent this, you’d normally pin the dependency to an exact version[2]:

SQLAlchemy==0.9.10

Other comparison operators are also available:

SQLAlchemy>=0.9.10
SQLAlchemy<1.0.0

and can even be combined:

SQLAlchemy>=0.9,<1.0.0

Specs like that will make pip find the newest version that’s within the given range. Assuming your dependency follows the semantic versioning scheme, this will allow you to stay on top of any minor bugfixes and improvements to an older release (0.9.x here), without the risk of accidentally upgrading to a new one (1.x) that your code is not compatible with yet.

Repository URL

Sometimes you want to live on the bleeding edge, though, and depend not just on the latest release, but on the head commit of the package’s repository. This makes sense especially in large systems that are distributed among multiple repos, and where development happens in lockstep.

For those occasions, and a few others, pip can recognize direct repository URLs. They are in the format:

$VCS+$PROTOCOL://$URL@$LABEL#egg=$PACKAGE

where the $PROTOCOL part may be omitted if the version control system has a default protocol. That’s the case for git, for example, which is of course the most important VCS you’d be interested in[3]:

git://git.example.com/somepackage#egg=somepackage

Note that the #egg=$PACKAGE part is not a part of the $URL, and it’s only there to give a local name for the package distribution. This is what makes it possible to refer to it later via pip, if only to remove it with pip uninstall $PACKAGE. Of course, the sanest practice is to use the PyPI moniker if possible.

When no $LABEL is given, pip will use the HEAD, trunk, tip, or the equivalent default/current revision from the repo. Often though (at least in case of Git), you would also pick a branch, tag, or even a particular commit hash:

git+https://github.com/Xion/unmatcher.git@0.1.3.1#egg=unmatcher
git+ssh://github.com/You/yourpackage.git@master#egg=yourpackage
git+https://github.com/mitsuhiko/jinja2.git@5b498453b5898257b2287f14ef6c363799f1405a#egg=Jinja2

The last two options could be a good choice even with third party packages, when you don’t want to wait for a new PyPI release to get a necessary feature or an urgent bug fix.

Local filesystem

Lastly, you can ask pip to install a package from a local directory or archive. The former option is often used with the -e (--editable) flag for pip install. This installs the package in the so-called development mode, allowing you to edit its source code in-place:

$ pip install -e /home/me/Code/myotherpackage

You almost certainly don’t want to put this line in requirements.txt: third-party packages should still be pulled from PyPI. But if it’s your own package — maybe a self-contained utility library used by your main program — this setup will be very helpful for making changes to it, informed by your own usage of the package.


  1. This can be changed with the --index-url flag to pip install. Running local indexes is a good practice for Python shops, especially those that rely on pip install as part of their deployment process. 

  2. If the package uses semantic versioning, a possible alternative to == is ~=, the “compatible release” operator. Its precise meaning is somewhat complicated, but it roughly means that upgrades are permitted as long as nothing in the public interface changes. 

  3. Other options include hg (Mercurial), svn, and bzr (Bazaar). 


Retry idiom for Python

Posted on Wed 27 January 2016 in Code • Tagged with Python, exceptions, else

A relatively little known feature of Python is the else block for control flow statements other than if.

If you haven’t heard about it before, you can provide such a block for both while and for loops, as well as any variant of the try statement. Its functionality is roughly analogous in both cases:

  • in loops, the else block is executed if the loop didn’t exit abnormally (i.e. with break)
  • in try constructs, the else block runs if no exception happened

Likely because of the unique semantics that don’t exist in other languages, neither of those constructs has been seen much in real world code. Recently, however, I’ve found they can be combined into a very pythonic pattern that’s also quite useful.

The trick

Say you have a task that won’t always succeed. Perhaps it’s a request made to a janky server, or some other network operation that’s prone to timeouts. Since failures are likely to be transient, you’d like to retry it several more times before giving up permanently.

With a try/except block, you can detect those half-expected failures. With a simple loop, you can repeat the attempt as many times as you deem feasible. Combined, they can solve the problem rather neatly:

for _ in range(MAX_RETRIES):
    try:
        # ... do stuff ...
    except SomeTransientError:
        # ... log it, sleep, etc. ...
        continue
    else:
        break
else:
    raise PermanentError()
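To make the mechanics concrete, here’s the idiom fleshed out into a self-contained, runnable snippet; flaky_operation is a made-up stand-in for the janky server call:

import random
import time

MAX_RETRIES = 3


class SomeTransientError(Exception):
    """Stand-in for e.g. a network timeout."""


def flaky_operation():
    # Fails most of the time, purely for demonstration.
    if random.random() < 0.67:
        raise SomeTransientError('timed out')
    return 'response'


for _ in range(MAX_RETRIES):
    try:
        result = flaky_operation()
    except SomeTransientError:
        time.sleep(0.5)  # back off before the next attempt
        continue
    else:
        break
else:
    raise RuntimeError('operation failed after %d attempts' % MAX_RETRIES)

print(result)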

But why?

What’s the deal with the elses here, though? Are they both necessary?

The simple answer is of course no. else after either a loop or a try/except block is always syntactic sugar. Any code that contains it can be transformed into an equivalent snippet that utilizes different techniques to achieve the same effect.

But this view isn’t very useful, for many of the essential features in any programming language could be dismissed as superfluous using this reasoning. The real question is whether the above idiom is more readable and understandable than the alternatives.

To that, I posit, the answer is: absolutely.

Desugaring

Without the double else, this example would have to be written in a considerably more convoluted way:

retries = MAX_RETRIES
while retries > 0:
    try:
        # ... do stuff ...
        break
    except SomeTransientError:
        # ... log it, sleep, etc. ...
        retries -= 1
if retries == 0:
    raise PermanentError()

Although at first glance the difference may be minuscule, this version adds significant extra busywork the programmer has to pay careful attention to:

  • The retries variable now has to be explicit, because the final conditional statement must look at its value.
  • We can’t use a for loop anymore (e.g. for retries in range(MAX_RETRIES)), because we wouldn’t distinguish the “success at last try” and “retry limit exceeded” cases: they’d both result in retries equal to MAX_RETRIES - 1 after the loop[1].
  • As a result, we have to remember to decrement the counter ourselves upon an error.

Additionally, the break is easy to miss amidst the actual logic within the try block, both for the developer who writes the code and for any subsequent readers. An alternative is to move it outside of the try/except clause, but that in turn reintroduces continue into the except branch and further complicates the whole flow.
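For comparison, that variant would look roughly like this:

retries = MAX_RETRIES
while retries > 0:
    try:
        # ... do stuff ...
    except SomeTransientError:
        # ... log it, sleep, etc. ...
        retries -= 1
        continue
    break  # success; still easy to overlook down here
if retries == 0:
    raise PermanentError()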

In short, the desugared version is more error-prone (all those off-by-ones!) and also quite inscrutable.


  1. Or to 1, if we count from MAX_RETRIES down to zero. 


URL library for Python

Posted on Fri 27 November 2015 in Code • Tagged with Python, URL, furl

Python has many batteries included, but a few things are still conspicuously missing.

One of them is a standardized and convenient approach to URL manipulation, akin to the URI class in Java. There are some functions in urllib, of course (or urllib.parse in Python 3), but much like their HTTP-related comrades, they prove rather verbose and somewhat clunky.

HTTP, however, is solved by the Requests package, so you may wonder if there is some analogous package for URL operations. The answer is affirmative, and the library in question is, quite whimsically, called furl.

URL in a wrap

The sole interesting part of the furl interface is the furl class. It represents a single URL, broken down into its constituents, with properties and methods for both reading them out and replacing them with new values.

Thanks to this handy (and quite obvious) abstraction, common URL operations become quite simple and self-documenting:

from furl import furl


def to_absolute(url, base):
    """If given ``url`` is a relative path,
    make it relative to the ``base``.
    """
    furled = furl(url)
    if not furled.scheme:
        return furl(base).join(url).url
    return url


def is_same_origin(*urls):
    """Check whether URLs are from the same origin (host:port)."""
    origins = set(url.netloc for url in map(furl, urls))
    return len(origins) <= 1


def get_facebook_username(profile_url):
    """Get Facebook user ID from their profile URL."""
    furled = furl(profile_url)
    if not (furled.host == 'facebook.com' or
            furled.host.endswith('.facebook.com')):
        raise ValueError("not a Facebook URL: %s" % (profile_url,))
    return furled.path.segments[-1]

# etc.

This includes the extremely prevalent, yet very harmful pattern of building URLs through string interpolation:

url = '%s?%s' % (BASE_URL, urlencode(query_params))

Besides looking unpythonically ugly, it’s also inflexible and error-prone. If BASE_URL gains some innate query string params ('http://example.com/?a=b'), this method will start producing completely invalid URLs (with two question marks, e.g. 'http://example.com/?a=b?foo=bar').

The equivalent in furl has none of these flaws:

url = furl(BASE_URL).add(query_params).url
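For instance, a quick interactive session demonstrates the merging behavior — a sketch; furl appends the new parameters after the existing ones:

>>> from furl import furl
>>> furl('http://example.com/?a=b').add({'foo': 'bar'}).url
'http://example.com/?a=b&foo=bar'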

The full package

To see the full power of furl, I recommend having a look at its API documentation. It’s quite clear and should be very easy to use.


Celery task in a Flask request context

Posted on Tue 03 November 2015 in Code • Tagged with Celery, Flask, Python, queue, request context

Celery is an asynchronous task worker that’s frequently used for background processing in Python web apps. Rather than performing a time-consuming task within the request loop, we delegate it to a queue so that a worker process can pick it up when ready. The immediate benefit is much better latency for the end user. Pros also include easier scalability, since you can adjust the number of workers to your task load.

Examples of tasks that are usually best done in the background vary from fetching data through a third-party API to sending emails, and from pushing mobile notifications to pruning the database of stale records. Anything that may take more than a couple hundred milliseconds to complete — and isn’t absolutely essential to the current HTTP request — is typically a good candidate for asynchronous processing.

In Celery, workers are just regular Python processes, often running the exact same code that’s powering the app’s web frontend[1]. But unlike most of that code, they aren’t servicing any HTTP requests — they simply run some function with given arguments, both specified by whoever sent a task for execution. Indeed, those functions don’t even know who or what asked for them to be executed.

Neat, right? This is what we usually praise as decoupling, or separation of concerns. Those are valuable qualities quite apart from the UI and scaling benefits mentioned earlier.

Not quite web

But those qualities come with a trade-off. Task code is no longer web frontend code: it doesn’t run within the comfy environment of our web framework of choice. Losing that may be quite unnerving, actually, because in a typical web application many things are tied directly to the HTTP request pipeline. After all, this is what web applications do — respond to HTTP requests — so it often makes perfect sense e.g. to marry the request flow with database transactions, committing or rolling them back according to the HTTP status code that the app produced.

Tasks may also require a database, though, if only to assert the expected state of the world. The same goes for memcache, a Redis instance, or basically any resource used by the frontend code. Alas, it’s quite possible the very reason we delegate work to a task is to shift lengthy interactions with those external systems away from the UI. Obviously, we’re going to need them for that!

Fake a request

So one way or another, our tasks will most likely need some initialization and/or cleanup code. And since it’s probably the same code that most HTTP request handlers require and use already, why not just pretend we’re handling a request after all?

In Flask, we can pull that off rather easily. The test_request_context method is conveniently provided to allow for faking the request context — that is, an execution environment for HTTP handlers. Like the name suggests, it is used mostly for testing, but there is nothing stopping us from using it in tasks run by Celery.

We probably don’t want to call it directly, though. What would be better is to have Celery prepare the context first, and then run the task code as if it was an HTTP handler. For even better results, the context would preserve information extracted from the actual HTTP request, one that has sent the task for execution. Moving some work to the background would then be a trivial matter, for both the task and the original handler would operate within the same environment.

Convenient? I believe so. And as I’ll demonstrate next, it isn’t very complicated to implement either.

Wrap the decorator?

At least one piece of the solution should stand out as pretty obvious. Since our intention is to wrap the task’s code in some additional packaging — the request context — it seems fairly natural to write our own @task decorator:

import functools

from myapp import app, celery


def task(**kwargs):
    """Decorator function to apply to Celery tasks.
    Executes the actual task inside Flask's test request context.
    """
    def decorator(func):
        """Actual decorator."""
        @celery.task(**kwargs)
        @functools.wraps(func)
        def wrapped(*args, **kwargs):
            with app.test_request_context():
                return func(*args, **kwargs)

        return wrapped

    return decorator

Here, app is the Flask application, and celery is the Celery object that’s often configured alongside it.

The Task class

While this technique will give us a request context inside the task code, it won’t be the request context from which the task has been sent for execution. To replicate that context correctly, we need additional support on the sender side.

Internally, Celery converts every function we annotate with the @task decorator into a subclass of celery.app.task.Task. This process can be customized, for example by providing an explicit base= parameter to @task, which specifies a custom Task subclass to use in place of the default one. Within that subclass, we’re free to override any functionality that we need to.

In our case, we’ll scrape the current request context from Flask for all relevant information, and later use it to recreate the context in the task code.

But before we get to that, let’s just create the Task subclass and move the above execution logic to it. This way, users won’t have to use a completely new @task decorator which would needlessly couple them to a specific Celery app instance:

from celery import Task

class RequestContextTask(Task):
    """Base class for tasks that run inside a Flask request context."""
    abstract = True

    def __call__(self, *args, **kwargs):
        with app.test_request_context():
            return super(RequestContextTask, self).__call__(*args, **kwargs)

Instead, they can either set this class as base= for a specific task:

@celery.task(base=RequestContextTask)
def sync_user_data(user_id):
    # ...

or make it the new default for all their tasks:

celery = Celery(...)
celery.Task = RequestContextTask

Invocation patterns

When the frontend asks for a task to be executed, it most often uses the Task.delay method. It will package a payload containing the task arguments, and send it off through a broker — usually an AMQP-based queue, such as RabbitMQ — so that a Celery worker can pick it up and actually execute it.

But there are other means of task invocation. We can even run it “in place”, locally and synchronously, which is especially useful for various testing scenarios. Lastly, a task can also be retried from within its own code, terminating its current run and scheduling another attempt for some future date.

Obviously, for the RequestContextTask to be useful, it needs to behave correctly in every situation. Therefore we need to cover all the entry points I’ve mentioned — the asynchronous call, a synchronous invocation, and a task retry:

class RequestContextTask(Task):
    # ...

    def apply_async(self, args=None, kwargs=None, **rest):
        self._include_request_context(kwargs)
        return super(RequestContextTask, self) \
            .apply_async(args, kwargs, **rest)

    def apply(self, args=None, kwargs=None, **rest):
        self._include_request_context(kwargs)
        return super(RequestContextTask, self) \
            .apply(args, kwargs, **rest)

    def retry(self, args=None, kwargs=None, **rest):
        self._include_request_context(kwargs)
        return super(RequestContextTask, self) \
            .retry(args, kwargs, **rest)

Note that Task.apply_async is being called internally by Task.delay, so it’s only that first method that we have to override.

Context in a box

As you can deduce right away, the Flask-related magic is meant to go in the _include_request_context method. The idea is to prepare arguments for the eventual invocation of Flask.test_request_context, and pass them through an extra task parameter. Those arguments are relatively uncomplicated: they are just a medley of various pieces of information that we can easily obtain from Flask’s request object:

from flask import has_request_context, request


class RequestContextTask(Task):
    CONTEXT_ARG_NAME = '_flask_request_context'

    # ...

    def _include_request_context(self, kwargs):
        """Includes all the information about current HTTP request context
        as an additional argument to the task.
        """
        if not has_request_context():
            return

        context = {
            'path': request.path,
            'base_url': request.url_root,
            'method': request.method,
            'headers': dict(request.headers),
        }
        if '?' in request.url:
            context['query_string'] = request.url[(request.url.find('?') + 1):]

        kwargs[self.CONTEXT_ARG_NAME] = context

On the worker side, we simply unpack them and recreate the context:

from flask import make_response

class RequestContextTask(Task):
    # ...
    def __call__(self, *args, **kwargs):
        call = lambda: super(RequestContextTask, self).__call__(*args, **kwargs)

        context = kwargs.pop(self.CONTEXT_ARG_NAME, None)
        if context is None or has_request_context():
            return call()

        with app.test_request_context(**context):
            result = call()
            app.process_response(make_response(result or ''))

        return result

The only tricky part is calling Flask.process_response at the end. We need to do that for the @after_request hooks to execute correctly. This is quite crucial, because those hooks are where you’d normally put important cleanup code, like commit/rollback of the database transaction.
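To make that concrete, here’s a sketch of such a hook — hypothetical code, with db standing for whatever database session object your application uses:

from myapp import app, db  # hypothetical application & session objects

@app.after_request
def finalize_transaction(response):
    """Commit or roll back the transaction based on the outcome."""
    if response.status_code < 400:
        db.session.commit()
    else:
        db.session.rollback()
    return response

With RequestContextTask calling Flask.process_response, a hook like this runs for background tasks just as it does for regular requests.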

Complete solution

To see how all those code snippets fit together, see this gist. You may also want to have a look at the article on Celery integration in Flask docs for some tips on how to integrate it with your own project.


  1. This isn’t strictly necessary, as Celery supports sending tasks for execution by explicit name. For that request to reach the worker, however, the task broker configuration must be correctly shared between sender and the worker. 


Actually, Python enums are pretty OK

Posted on Sun 25 October 2015 in Code • Tagged with Python, enums, serialization

A little over two years ago, I was writing about my skepticism towards the addition of enum types to Python. My stance has changed somewhat since then, and I now think there is some non-trivial value that enums can add to Python code. It becomes especially apparent in circumstances involving any kind of persistence, from pickling to databases.

I will elaborate on that through the rest of this post.

Revision

First, let’s recap (or perhaps introduce) the important facts about enums in Python. An enum, or an enumeration type, is a special class that inherits from enum.Enum[1] and defines one or more constants:

from enum import Enum

class Cutlery(Enum):
    knife = 'knife'
    fork = 'fork'
    spoon = 'spoon'

Their values are then converted into singleton objects with distinct identity. They are available as attributes of the enumeration type:

>>> Cutlery.fork
<Cutlery.fork: 'fork'>

As a shorthand for when the values themselves are not important, the Enum class can be directly invoked with constant names as strings:

Cutlery = Enum('Cutlery', ['knife', 'fork', 'spoon'])

The resulting class offers some useful API that we’d expect from an enumeration type in any programming language:

>>> list(Cutlery)
[<Cutlery.knife: 'knife'>, <Cutlery.fork: 'fork'>, <Cutlery.spoon: 'spoon'>]
>>> [c.value for c in Cutlery]
['knife', 'fork', 'spoon']
>>> Cutlery.knife in Cutlery
True
>>> Cutlery('spoon')
<Cutlery.spoon: 'spoon'>
>>> Cutlery('spork')
Traceback (most recent call last):
# (...)
ValueError: 'spork' is not a valid Cutlery

How it was done previously

Of course, there is nothing really groundbreaking about assigning some values to “constants”. It’s been done like that since times immemorial, and quite often those constants have been grouped within finite sets.

Here’s an example of a comment model in some imaginary ORM, complete with a status “enumeration”:

class Comment(Model):
    APPROVED = 'approved'
    REJECTED = 'rejected'
    IN_REVIEW = 'in_review'
    STATES = frozenset([APPROVED, REJECTED, IN_REVIEW])

    author = StringProperty()
    text = TextProperty()
    state = StringProperty(choices=STATES)

# usage
comment = Comment(author="Xion", text="Boo")
comment.state = Comment.IN_REVIEW
comment.save()

Converting it to use an Enum is relatively straightforward:

class Comment(Model):
    class State(Enum):
        approved = 'approved'
        rejected = 'rejected'
        in_review = 'in_review'

    author = StringProperty()
    text = TextProperty()
    state = StringProperty(choices=[s.value for s in State])

comment = Comment(author="Xion", text="Meh.")
comment.state = Comment.State.approved.value
comment.save()

It is not apparent, though, if there is any immediate benefit of doing so. True, we no longer need to define the STATES set explicitly, but saving on this bit of boilerplate is balanced out by having to access the enum’s value when assigning it to a string property.

All in all, it seems like a wash — though at least we’ve established that enums are no worse than their alternatives :)

Enums are interoperable

Obviously, this doesn’t exactly sound like high praise. The thing is, we haven’t really replaced the previous solution completely. Remnants of it still linger in the way the state property is declared. Even though it’s supposed to hold enum constants, it is defined as a string, which at this point is more of an implementation detail of how those constants are serialized.

What we really need here is a kind of EnumProperty that’d allow us to work solely with enum objects. Before the introduction of a standard Enum base, however, there was little incentive for ORMs and other similar libraries to provide such functionality. But now, it makes much more sense to support enums as first-class citizens, at least for data exchange and serialization, because users can be expected to already prefer the standard Enum in their own code.

Thus, the previous example changes into something along these lines:

class Comment(Model):
    # ...
    state = EnumProperty(State)

comment = Comment(author="Xion", text="Yay!")
comment.state = Comment.State.approved  # no .value!
comment.save()

Details of EnumProperty, or some equivalent construct, are of course specific to any given data management library. In SQLAlchemy, for example, a custom column type can handle the necessary serialization and deserialization between Python code and SQL queries, allowing you to define your models like this[2]:

class Comment(Model):
    class State(Enum):
        # ...

    author = Column(String(255))
    text = Column(Text)
    state = Column(EnumType(State, impl=String(32)))

# usage like above

In any case, the point is to have Python code operate solely on enum objects, while the persistence layer takes care of converting between them and their serializable values.
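For illustration, here’s one way the EnumType used above could be sketched on top of SQLAlchemy’s TypeDecorator. This is a simplified, hypothetical implementation — EnumType and its impl= parameter are this article’s names, not an actual SQLAlchemy API:

from sqlalchemy.types import String, TypeDecorator


class EnumType(TypeDecorator):
    """Column type that stores Enum members as their .value."""
    impl = String  # default underlying SQL type

    def __init__(self, enum_class, impl=None):
        super(EnumType, self).__init__()
        self.enum_class = enum_class
        if impl is not None:
            self.impl = impl  # concrete storage type, e.g. String(32)

    def process_bind_param(self, value, dialect):
        # Python -> SQL: serialize the enum member to its value
        return None if value is None else value.value

    def process_result_value(self, value, dialect):
        # SQL -> Python: deserialize back into an enum member
        return None if value is None else self.enum_class(value)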

Enums are extensible

The other advantage Enums have over loose collections of constants is that they are proper types. Like all user-defined types in Python, they can have additional methods and properties defined on their instances, or even on the type itself.

Although this capability is (by default) not as powerful as e.g. in Java — where each enum constant can override a method in its own way — it can nevertheless be quite convenient at times. Typical use cases include constant classification:

class Direction(Enum):
    left = 37
    up = 38
    right = 39
    down = 40

    @property
    def is_horizontal(self):
        return self in (Direction.left, Direction.right)

    @property
    def is_vertical(self):
        return self in (Direction.down, Direction.up)

and conversion:

    def as_vector(self):
        return {
            Direction.left: (-1, 0),
            Direction.up: (0, -1),
            Direction.right: (1, 0),
            Direction.down: (0, 1),
        }.get(self)
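With those methods in place, the constants respond accordingly:

>>> Direction.left.is_horizontal
True
>>> Direction.up.as_vector()
(0, -1)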

For the latter, it would be handy to have Java’s ability to attach additional data to an enum constant. As it turns out, Python supports this feature natively in a very similar way. We simply have to override the enum’s __new__ method to parse out any extra values from the initializer and turn them into attributes of the enum instance:

class Direction(Enum):
    left = 37, (-1, 0)
    up = 38, (0, -1)
    right = 39, (1, 0)
    down = 40, (0, 1)

    def __new__(cls, keycode, vector):
        obj = object.__new__(cls)
        obj._value_ = keycode
        obj.vector = vector
        return obj

It’s possible, in fact, to insert any arbitrary computation here that yields the final _value_ of an enum constant[3]. This trick can be used to, for example, construct enums that automatically number themselves.
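For example, an auto-numbering enum — modeled closely on the AutoNumber example from the standard library docs — might look like this:

class AutoNumbered(Enum):
    def __new__(cls):
        value = len(cls.__members__) + 1  # next consecutive number
        obj = object.__new__(cls)
        obj._value_ = value
        return obj

    first = ()
    second = ()
    third = ()

# AutoNumbered.second.value == 2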

Finally, we can add static methods, class methods, or class properties to the Enum subclass, just like we would do with any other class (classproperty in the snippet below is not a standard built-in, but a commonly used recipe or library utility):

class MyEnum(Enum):
    # ...

    @classproperty
    def __values__(cls):
        return [m.value for m in cls]

Enums just are

All these things are possible primarily because of the most notable aspect of Python enums: their existence as an explicit concept. A syntactically unorganized bunch of constants cannot offer half of the highlighted features, because there is nowhere to pin them on.

For that reason alone, using enums as an explicit — rather than implicit — pattern seems worthwhile. The one benefit we’re certain to reap is better code structure through separation of important concepts.


  1. The enum module is part of the Python standard library since version 3.4 but a fully functional backport is available for Python 2.x as well. 

  2. It is even possible to instruct SQLAlchemy how to map Python enums to ENUM types in database engines that support it, but details are outside of the scope of this article. 

  3. If you’re fine with the enum’s value being the whole tuple (everything after the = sign), you can override __init__ instead of __new__ (cf. the planet example from standard docs). 


CSS class helper for Jinja

Posted on Thu 22 October 2015 in Code • Tagged with Jinja, CSS, Python, Flask

One of the great uses for a templating engine such as Jinja is reducing the clutter in our HTML source files. Even with the steady advances in the CSS[1] standards, various div.wrapper, div.container, div.content, and other presentational elements are still a common fact of life. Unless you’re one of the cool kids who use Web Components with Polymer, your main option for abstracting this boilerplate away is defining some more general template macros.

As with any kind of abstraction, it’s crucial to balance broad applicability with a potential for finely-tuned customization. In the case of wrapping HTML snippets in Jinja macros, an easy way to maintain flexibility is to allow the caller to specify some crucial attributes of the root element:

{#
  Renders the markup for a <button> from Twitter Bootstrap.
#}
{% macro button(text=none, style='default', type='button', class='') %}
  <button type="{{ type }}"
        class="btn btn-{{ style }}{% if class %} {{ class }}{% endif %}"
        {{ kwargs|xmlattr }}>
    {{ text }}
  </button>
{% endmacro %}

An element id would be a good example, and in the above macro it’s handled implicitly thanks to the {{ kwargs|xmlattr }} stanza.

class, however, is not that simple, because a macro like that usually needs to supply some CSS classes of its own. The operation of combining them with additional ones, passed by the caller, is awfully textual and error-prone. It’s easy, for example, to forget about the crucial space and run two class names together.

As if CSS wasn’t difficult enough to debug already!

Let them have list

The root cause of those problems is working at a level that’s too low for the task. The value of a class attribute may be encoded as a string, but it’s fundamentally a list of tokens. In the modern DOM API, for example, it is represented as exactly that: a DOMTokenList.

I’ve found it helpful to replicate a similar mechanism in Jinja templates. The result is a ClassList wrapper whose code I quote here in full:

from collections import Iterable, MutableSet


class ClassList(MutableSet):
    """Data structure for holding, and ultimately returning as a single string,
    a set of identifiers that should be managed like CSS classes.
    """
    def __init__(self, arg=None):
        """Constructor.
        :param arg: A single class name or an iterable thereof.
        """
        if arg is None:
            classes = []
        elif isinstance(arg, basestring):
            classes = arg.split()
        elif isinstance(arg, Iterable):
            classes = arg
        else:
            raise TypeError(
                "expected a string or string iterable, got %r" % type(arg))

        self.classes = set(filter(None, classes))

    def __contains__(self, class_):
        return class_ in self.classes

    def __iter__(self):
        return iter(self.classes)

    def __len__(self):
        return len(self.classes)

    def add(self, *classes):
        for class_ in classes:
            self.classes.add(class_)

    def discard(self, *classes):
        for class_ in classes:
            self.classes.discard(class_)

    def __str__(self):
        return " ".join(sorted(self.classes))

    def __html__(self):
        return 'class="%s"' % self if self else ""
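A quick interactive session shows the wrapper in action:

>>> classes = ClassList('btn btn-default')
>>> classes.add('pull-right')
>>> str(classes)
'btn btn-default pull-right'
>>> classes.__html__()
'class="btn btn-default pull-right"'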

To make it work with Flask, adorn the class with the Flask.template_global decorator:

from myflaskapp import app

@app.template_global('classlist')
class ClassList(MutableSet):
    # ...

Otherwise, if you’re working with a raw Jinja Environment, simply add ClassList to its global namespace directly:

jinja_env.globals['classlist'] = ClassList

In either case, I recommend following the apparent Jinja convention of naming template symbols as lowercasewithoutspaces.

Becoming classy

Usage of this new classlist helper is relatively straightforward. Since it accepts either a space-separated string or an iterable of CSS class names, a macro can wrap anything the caller passes as the value of the class attribute:

{% macro button(text=none, style='default', type='button', class='') %}
  {% set classes = classlist(class) %}
  {# ... #}

The classlist is capable of producing the complete class attribute syntax (i.e. class="..."), or omitting it entirely if it would be empty. All we need to do is evaluate it using the normal Jinja expression syntax:

  <button type="{{ type }}" {{ classes }} {{ kwargs|xmlattr }}>

Before that, though, we may want to include some additional classes that are specific to our macro. The button macro, for example, needs to add the Bootstrap-specific btn and btn-$STYLE classes to the <button> element it produces[2]:

  {% do classes.add('btn', 'btn-' + style) %}

After executing this statement, the final class attribute contains both the caller-provided classes, as well those two that were added explicitly.


  1. Believe it or not, we can finally center the content vertically without much of an issue! 

  2. The {% do %} block in Jinja allows executing statements (such as function calls) without evaluating the values they return. It is not exposed by default, but adding the standard jinja2.ext.do extension to the Environment makes it available. 


Reload Jinja macro files

Posted on Thu 24 September 2015 in Code • Tagged with Python, Jinja, Flask, Werkzeug, Flask-Script

The integration between Jinja templating engine and the Flask microframework for Python is quite seamless most of the time. It also facilitates rapid development: when we run our web application through Flask’s development server, any changes in Python code will get picked up immediately without restarting it. Similarly, Jinja’s default template loader will detect whether any template has been modified after its last compilation and recompile it if necessary.

One instance where it doesn’t seem to work that well, though, is template files that only contain Jinja macros. They apparently aren’t subject to the same caching rules that apply to regular templates. Modifications to those files alone may not be picked up by the Jinja Environment, causing render_template calls in Flask to (indirectly) use their stale versions.

I’ll be watching you

This problem can be alleviated by making the server watch those macro files explicitly, not unlike the Python sources it already monitors. When running a Flask server through the Flask.run method, you can pass additional keyword arguments which aren’t handled by Flask itself, but by the underlying WSGI scaffolding called Werkzeug. The automatic reloader is actually part of it and is quite configurable. Most importantly, it allows passing a list of extra_files= to watch:

import os

app.run(host='0.0.0.0', port=int(os.environ.get('PORT', 5000)),
        extra_files=[os.path.join(app.root_path, app.template_folder, 'macros.html')])

But of course it’s tedious and error-prone to keep this list up to date manually, so let’s try to do better.

…all of you

I’m going to assume you’re keeping all your Jinja macro files inside a single directory. This makes sense, as macros are reusable pieces of template code that are imported by multiple regular templates, so they shouldn’t be scattered around the codebase without some order. The folder may be named macros, util, common, or similar; what’s important is that all the macros have a designated place in the project source tree.

Under this assumption, it’s not difficult to programmatically compute their complete list[1]:

from pathlib import Path

from myapplication import app  # Flask application object


#: Directories under app's template folder that contain files
#: with only Jinja macros.
JINJA_MACRO_DIRS = ('macros',)


def get_extra_files():
    """Returns an iterable of the extra files that the Flask server
    should monitor for changes and reload the app if any has been modified.
    """
    app_root = Path(app.root_path)

    for dirname in JINJA_MACRO_DIRS:
        macros_dir = app_root / app.template_folder / dirname
        for filepath in macros_dir.rglob('*.html'):
            yield str(filepath)

If you’re using Flask blueprints, in addition to the global application templates you’d also want to monitor any blueprint-specific macro files:

        for bp in (app.blueprints or {}).values():
            macros_dir = Path(bp.root_path) / bp.template_folder / dirname
            for filepath in macros_dir.rglob('*.html'):
                yield str(filepath)

A new start

How you’re going to use the get_extra_files function from the snippet above depends on how exactly you’re running the Flask development server. In the simplest case, when Flask.run is invoked at the top-level scope of some module, it’s pretty obvious:

app.run(..., extra_files=list(get_extra_files()))

More realistically, though, you may be using the Flask-Script extension for all management tasks, including starting the server. If so, calling Flask.run will be relegated to it, so you’ll need to override the Server command to inject additional parameters there:

import os

from flask.ext.script import Manager, Server as _Server

from myapplication import app


class Server(_Server):
    def __init__(self, *args, **kwargs):
        if kwargs.get('use_reloader', True):
            kwargs.setdefault('extra_files', []).extend(get_extra_files())
        super(Server, self).__init__(*args, **kwargs)


manager = Manager(app, with_default_commands=False)

server = Server(port=int(os.environ.get('PORT', 5000)))
manager.add_command('server', server)

This new server command should work just like the default one provided by Flask-Script out of the box, except that all your Jinja macro files will now be monitored for changes.


  1. Like before, I’m using the pathlib module, for which there is a backport if you’re using Python 3.3 or older. 
