You Don’t Have to Interview like Google

Posted on Mon 28 December 2015 in Programming • Tagged with Google, interviews, startups, career, hiring

If you look at discussion forums for people in the CS industry — like /r/cscareerquestions or even just Hacker News — you’ll find lots of talk about the so-called Big Four companies.

This is mostly in the context of applying to them, or going through their interview process. Which of the large software corporations are discussed here tends to fluctuate a little bit, but both Google and Microsoft are invariably included, with Facebook popping up more often than not.

Because of their privileged positions as very desirable places to work, these companies tend to be taken as models for others to mimic1. Probably the most apparent and direct outcome is the increasing prevalence of “Google-style” interviews, which are now utilized by countless software shops around the world.

Whiteboard coding is what they are often called. Whether or not such interviews “work”, in the sense of adequately assessing the engineering aptitude of candidates, is a subject of intense debate. If some high-profile anecdotes are any indication, the most common complaint is that they fail to recognize competence by ignoring previous professional work, open source contributions, conference talks, and so on.

Instead, the whiteboard interview demands a showing of “pure” problem solving ability within a relatively short time window, without any broader context or even the usual tools of the trade: laptop, editor, and a search engine.

As a Googler who has gone through this process and has since conducted a few of those interviews himself, I find those complaints mostly valid, albeit misdirected.

The problem isn’t really that Google interviews like it does.

What’s really the issue is other companies implementing the same process, and not realizing it cannot possibly work for them.

Somewhat special

It is important to understand that Google’s stance on interviewing is influenced by some unique circumstances:

  • very high and steady influx of potential candidates

  • comparatively long tenure of the average engineer and the general focus on employee retention

  • huge and mostly proprietary software stack working at a scale that almost no others do

From this perspective, it makes sense to utilize cautious hiring strategies that may result in a high ratio of false negatives2. When rejections are cheap but the new hires are supposed to be here for the long run — partially because of the long ramp-up time necessary to become productive — it can be costly to give candidates the benefit of the doubt.

The last point also explains why proficiency with specific technologies is less useful than general flexibility grounded in strong fundamentals. To put it shortly, Google prefers you know CS rather than CSS.

Finally, whiteboard coding is just one input to the evaluation process, but it tends to be the most trustworthy one. Besides programming aptitude, the ideal candidate would show a proven track record of technical leadership while tackling complex problems that span a broad scope.
Unfortunately, those qualities are difficult to convey efficiently and reliably through the usual industry channels: a resume, references from previous jobs, GitHub profile, etc.

Different conditions

Given the above reasoning as to why “Google-style” interviews seem to work well for Google, I hope it’s evident why they are likely a poor choice for companies that don’t share the same characteristics as the Big Four.

For one, it is highly unusual for a software shop in today’s market to command a sizeable pool of qualified candidates. Software engineering vacancies often go unfilled for weeks and months, even if the company isn’t exactly looking for “rockstars”, “ninjas”, “gunslingers”, or whatever the silly term du jour is. Those who meet the requirements usually have their pick of many different offers, too.

The reality for many companies (especially startups) is also one where they are unable or unwilling to invest much in the retention of their employees. Because the employment relationship in this industry tends to be quite volatile (which isn’t necessarily a bad thing), it often makes sense for companies to look for a near-immediate payoff when hiring.

We owe it to the prevalent open source technologies that this isn’t entirely unreasonable. If your software stack is composed entirely of components that are available in the open, you can probably find engineers who are familiar with most of them. They can be productive members of your team almost literally from day one!

Right tool for the job

The most important observation is that every company is different, and following the One True Best Practice™ will likely prevent you from utilizing the best qualities of your workplace. Smaller shops, for example, could take a more personalized approach to hiring: let candidates actually sit with engineers, solve real-life problems, and have deep technical conversations.

Of course, you may still find whiteboard coding valuable in its own right. Indeed, a simple test for at least the basic programming skill appears to be a necessary sanity check.

But a full suite of difficult technical interviews with tough algorithmic problems that last a better part of the day? Most likely not.


  1. Some seem to have succeeded to such an extent that you may occasionally hear about “Big N”. This often includes some currently large and/or successful but still “hip” startups. IT is such a fashion industry sometimes… 

  2. Though on the flip side, it exacerbates the impostor syndrome among people who do get hired, as their success could easily be construed as mostly luck. 


Rust: first impressions

Posted on Thu 10 December 2015 in Code • Tagged with Rust, pointers, types, FP, OOP, traits

Having recently been writing some C++ code at work, I once again experienced the kind of exasperation that this cumbersome language evokes on a regular basis. When I was working in it less sporadically, I would shrug it off and tell myself it was all because of the low level it operates on. Superior performance was the other side of the deal, and it was supposed to make all the trade-offs worthwhile.

Now, however, I realized that running close to the metal by no means excuses the sort of clunkiness that C++ permits. For example, there really is no reason why the archaically asinine separation of header & source files — with its inevitable redundancy of declarations and definitions, worked around with Java-esque contraptions such as pimpl — is still the bread and butter of C++ programs.
The same goes for the lack of sane dependency management, or a universal, portable build system. None of those would be at odds with native compilation to machine code, or runtime speeds that are adequate for real-time programs.

Rather than dwelling on those gripes, I thought it’d be more productive to look around and see what the modern offering is in the domain of lower level, really fast languages. The search wasn’t long at all, because right now there seems to be just one viable contender: Rust1.

Rusty systems

Rust introduces itself as a “systems programming language”, which is quite a bold claim. The last time this phrase was applied to an emerging language — Go — what followed was a kind of word twisting that’s more indicative of politics than computer science.

But Rust’s pretense to the system level is well justified. It clearly provides the requisite toolkit for working directly with the hardware, be it embedded controllers or fully featured computers. It offers compilation to native machine code; direct memory access; running time guarantees thanks to the lack of GC-induced stops; and great interoperability through static and dynamic linkage.

In short, with Rust you can wreak havoc against the RAM and twiddle bits to your heart’s content.

Safe and sound

To be fair, though, the “havoc” part is not entirely accurate. Despite its focus on low-level, efficient computing, Rust aims to be a very safe language. Unlike C, it actively tries to prevent the programmer from shooting themselves in the foot — though it will hand you the gun if you but ask for it.

The safety guarantees provided by Rust apply to resource management, with specific emphasis on memory and pointers to it. The way most contemporary languages deal with memory is by introducing a garbage collector, which mostly (though not wholly) relieves the programmer from thinking about allocations and deallocations. However, this kind of global, stop-the-world garbage collection (e.g. mark-and-sweep) is costly and unpredictable, ruling it out as a mechanism for real-time systems.

For this reason, Rust doesn’t mandate a GC of this kind2. And although it offers mechanisms that are similar to smart pointers from C++ (e.g. std::shared_ptr), it is actually preferable and safer to use regular, “naked” pointers: &Foo versus Cell<Foo> or RefCell<Foo> (which are some of Rust’s “smart pointer” types).

The trick is in the clever compiler. As long as we use regular pointers, it is capable of detecting potential memory bugs at compilation time. They are referred to as “data races” in Rust’s terminology, and include perennial problems that will segfault any C code which wasn’t written with utmost care.

Part of those safety guarantees is also the default immutability of references (pointers). The simplest reference of type &Foo in Rust translates to something like const Foo * const in C3. You have to explicitly request mutability with the mut keyword, and Rust ensures there is always at most one mutable reference to any value, thus preventing problems caused by pointer aliasing.

But what if you really must sling raw pointers, and access arbitrary memory locations? Maybe you are programming a microcontroller where I/O is done through a special memory region. For those occasions, Rust has got you covered with the unsafe keyword:

// Read the state of a diode in some imaginary uC.
fn get_led_state(i: isize) -> bool {
    assert!(i >= 0 && i < 4, "There are FOUR lights!");
    let p: *const u8 = 0x1234 as *const u8;  // known memory location
    unsafe { *p.offset(i) != 0 }
}

Its usage, like in the above example, can be very localized, limited only to those places where it’s truly necessary and guarded by the appropriate checks. As a result, the interface exposed by the above function can be considered safe. The unrestricted memory access can be contained to where it’s really inevitable.

Typing counts

Ensuring memory safety is not the only way in which Rust differentiates itself from C. What separates those two languages is also a few decades of practice and research into programming semantics. It’s only natural to expect Rust to take advantage of this progress.

And advantage it takes. Although Rust’s type system isn’t nearly as advanced and complex as — say — Scala’s, it exhibits several interesting properties that are indicative of its relatively modern origin.

First, it mixes the two most popular programming paradigms — functional and object-oriented — in roughly equal concentrations, as opposed to being biased towards the latter. Rust doesn’t have interfaces or classes: it has traits and their implementations. Even though they often fulfill similar purposes of abstraction and encapsulation, these constructs are closer to the concepts of type classes and their instances, which are found for example in Haskell.

Still, the more familiar notions of OOP aren’t too far off. Most of the key functionality of classes, for example, can be simulated by implementing “default” traits for user-defined types:

struct Person {
    first_name: String,
    last_name: String,
}

impl Person {
    fn new(first_name: &str, last_name: &str) -> Person {
        Person {
            first_name: first_name.to_string(),
            last_name: last_name.to_string(),
        }
    }

    fn greet(&self) {
        println!("Hello, {}!", self.first_name);
    }
}

// usage
let p = Person::new("John", "Doe");
p.greet();

The second aspect of Rust’s type system that we’ve come to expect from a new language is its expressive power. Type inference is nowadays a staple, and above we can observe it in its simplest form. But it extends further: to generic parameters, closure arguments, and closure return values.

Generics, by the way, are quite nice as well. Besides their applicability to structs, type aliases, functions, traits, trait implementations, etc., they allow for constraining their arguments with traits. This is similar to the abandoned-and-not-quite-revived-yet idea of concepts in C++, or to an analogous mechanism from C#.

The third common trend in contemporary language design is the use of type system to solve common tasks. Rust doesn’t go full Haskell and opt for monads for everything, but its Option and Result types are evidently the functional approach to error handling4. To facilitate their use, a powerful pattern matching facility is also present in Rust.

Unexpectedly pythonic

If your general go-to language is Python, you will find Rust a very nice complement and possibly a valuable instrument in your coding arsenal. Interoperability between Python and Rust is stupidly easy, thanks to both the ctypes module and the extreme simplicity of creating portable, shared libraries in Rust. Offloading some expensive, GIL-bypassing computation to fast, native code written in Rust can thus be a relatively painless way of speeding up crucial parts of a Python program.
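To illustrate, here’s a minimal sketch of that interop, assuming a Rust library compiled as a shared object that exports a hypothetical gcd function through the C ABI:

import ctypes

# Load the shared library built by Cargo; the name and path are hypothetical
# (on Linux it would be something like target/release/libfastmath.so).
lib = ctypes.CDLL('./libfastmath.so')

# Declare the signature of the exported function, assumed to be defined
# in Rust as: #[no_mangle] pub extern "C" fn gcd(a: u64, b: u64) -> u64
lib.gcd.argtypes = [ctypes.c_uint64, ctypes.c_uint64]
lib.gcd.restype = ctypes.c_uint64

print(lib.gcd(48, 36))  # => 12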

But somewhat more surprisingly, Rust has quite a few bits that seem to be directly inspired by Python semantics. Granted, those two languages are conceptually pretty far apart in general, but the analogies are there:

  • The concept of iterators in Rust is very similar to iterables in Python. Even the for loop is basically identical: rather than manually increment a counter, both in Rust and Python you iterate over a range of numbers.
    Oh, and both languages have an enumerate method/function that yields pairs of (index, element).

  • Syntax for method definition in Rust uses the self keyword as the first argument to distinguish between instance methods and “class”/”static” methods (or associated functions in Rust’s parlance). This is even more pythonic than in actual Python, where self is technically just a convention, albeit an extremely strong one.

  • In either language, overloading operators doesn’t use any new keywords or special syntax, like it does in C++, C#, and others. Python accomplishes it through __magic__ methods, whereas Rust has very similarly named operator traits.

  • Rust basically has doctest. If you don’t know it, the doctest module is a standard Python testing utility that can run usage examples found in documentation comments and verify their correctness (see the sketch right below this list). The Rust version (rustdoc) is even more powerful and flexible, allowing you, for example, to mark additional boilerplate lines that should be run when testing examples, but not included in the generated documentation.
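If you haven’t seen doctest in action, here’s a quick sketch of a verifiable documentation example (the function itself is made up, of course):

def mean(xs):
    """Compute the arithmetic mean of a sequence of numbers.

    >>> mean([1, 2, 3, 4])
    2.5
    """
    return sum(xs) / float(len(xs))

if __name__ == '__main__':
    import doctest
    doctest.testmod()  # finds the >>> example above and verifies its output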

I’m sure the list doesn’t end here and will grow over time. As of this writing, for example, nightly builds of Rust already offer advanced slice pattern matching, which is very similar to the extended iterable unpacking from Python 3.

Is it worth it?

Depending on your background and the programming domain you are working in, you may be wondering if Rust is a language that’s worth looking into now, or in the near future.

Firstly, let me emphasize that it’s still in its early stages. Although the stable version 1.0 was released a good couple of months ago, the ecosystem isn’t nearly as diverse and abundant as in some of the other new languages.

If you are specifically looking to deploy Rust-written API servers, backends, and other — shall I use the word — microservices, then right now you’ll probably be better served by more established solutions, like Java with fibers, asynchronous Python on PyPy, Erlang, Go, node.js, or similar. I predict Rust will catch up here in the coming months, though, because the prospect of writing native-speed JSON slingers with relative ease is just too compelling to pass up.

The other interesting area for Rust is game programming, because it’s one of the few languages capable of supporting even the most demanding AAA+ productions. The good news is that portable, open source game engines are already here. The bad news is that most of the existing knowledge about designing and coding high performance games is geared towards writing (stripped down) C++. The community is also rather reluctant to adopt anything that may carry even a hint of potentially unknown performance implications. Although some inroads have been made (here’s, for example, an entity component system written in Rust), and I wouldn’t be surprised to see indie games written in Rust, it probably won’t take over the industry anytime soon.

When it comes to hardware, though, Rust may already have the upper hand. It is obviously a much easier language to program in than pure C. Along with its toolchain’s ability to produce minimal executables, it makes for a compelling language for programming microcontrollers and other embedded devices.

So in short, Rust is pretty nice. And if you have read that far, I think you should just go ahead and have a look for yourself :)


  1. Because as much as we’d like for D to finally get somewhere, at this point we may have better luck waiting for the Year of Linux on Desktop to dawn… 

  2. Of course, nobody has stopped the community from implementing it. 

  3. Strictly speaking, it’s a binding such as let x = &foo; that translates to it. The unadorned C pointer type Foo* would correspond to a mutable binding to a mutable reference in Rust, i.e. let mut x = &mut foo;. 

  4. Their Haskell equivalents are Maybe and Either type classes, respectively. 


URL library for Python

Posted on Fri 27 November 2015 in Code • Tagged with Python, URL, furl

Python has many batteries included, but a few things are still conspicuously missing.

One of them is a standardized and convenient approach to URL manipulation, akin to the URI class in Java. There are some functions in urllib, of course (or urllib.parse in Python 3), but much like their HTTP-related comrades, they prove rather verbose and somewhat clunky.

HTTP, however, is solved by the Requests package, so you may wonder if there is some analogous package for URL operations. The answer is affirmative, and the library in question is, quite whimsically, called furl.

URL in a wrap

The sole interesting part of the furl interface is the furl class. It represents a single URL, broken down to its constituents, with properties and methods for both reading them out and replacing with new values.

Thanks to this handy (and quite obvious) abstraction, common URL operations become quite simple and self-documenting:

from furl import furl


def to_absolute(url, base):
    """If given ``url`` is a relative path,
    make it relative to the ``base``.
    """
    furled = furl(url)
    if not furled.scheme:
        return furl(base).join(url).url
    return url


def is_same_origin(*urls):
    """Check whether URLs are from the same origin (host:port)."""
    origins = set(url.netloc for url in map(furl, urls))
    return len(origins) <= 1


def get_facebook_username(profile_url):
    """Get Facebook user ID from their profile URL."""
    furled = furl(profile_url)
    if not (furled.host == 'facebook.com' or
            furled.host.endswith('.facebook.com')):
        raise ValueError("not a Facebook URL: %s" % (profile_url,))
    return furled.path.segments[-1]

# etc.

This includes the extremely prevalent, yet very harmful pattern of building URLs through string interpolation:

url = '%s?%s' % (BASE_URL, urlencode(query_params))

Besides looking unpythonically ugly, it’s also inflexible and error-prone. If BASE_URL gains some innate query string params ('http://example.com/?a=b'), this method will start producing completely invalid URLs (with two question marks, e.g. 'http://example.com/?a=b?foo=bar').

The equivalent in furl has none of these flaws:

url = furl(BASE_URL).add(query_params).url
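We can even convince ourselves in the REPL that the troublesome case from before is handled properly; the exact ordering of parameters may conceivably differ, but the result is to this effect:

>>> from furl import furl
>>> furl('http://example.com/?a=b').add({'foo': 'bar'}).url
'http://example.com/?a=b&foo=bar'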

The full package

To see the full power of furl, I recommend having a look at its API documentation. It’s quite clear and should be very easy to use.


let: binding for Knockout

Posted on Wed 18 November 2015 in Code • Tagged with Knockout, JavaScript, web frontend, data binding

Knockout is a JavaScript “framework” that has always lurked in the shadows of other, more flashy ones. Even in the days of its relative novelty, the likes of Backbone or Ember had seemed to garner more widespread interest among frontend developers. Today, this hasn’t changed much, the difference being mainly that the spotlight is now taken by new actors (*cough* React *cough*).

But Knockout has a loyal following, and for good reasons. Possibly the most important one is why I’ve put the word “framework” in quotes. Knockout is, first and foremost, a data binding library: it doesn’t do much else besides tying DOM nodes to JavaScript objects.

This quality makes it both easy to learn and simple to integrate. In fact, it can very well live in just some small compartments of your web application, mingling easily with any server-side templating mechanism you might be using. It also interplays effortlessly with other JS libraries, and sticks very well to whatever duct tape you use to hold your frontend stack together.

Lastly, it’s also quite extensible. We can, for example, create our own bindings rather easily, extending the declarative language used to describe relationship between the data and UI.

In this post, I’m going to demonstrate this by implementing a very simple let: binding — a kind of “assignment” of an expression to a name.

From Knockout with: bluff

Out of the box, Knockout proffers the with: binding, a quite similar mechanism. The way it can be problematic is analogous to the widely discouraged with statement in JavaScript itself. Namely, it blends several namespaces together, making it harder to determine which object is being referred to. As a result, the code is more prone to errors.

On the other hand, freeing the developer from repeating long and complicated expressions is obviously valuable. Perhaps reducing them to nil is not the right approach, though, so how about we just shorten them to a more manageable length? Well, that’s exactly what the let: binding is meant to do:

<div data-bind="let: { addr: currentUser.personalInfo.address }">
  <p data-bind="text: addr.line1"></p>
  <!-- ko if: addr.line2 -->
    <p data-bind="text: addr.line2"></p>
  <!-- /ko -->
  <p>
    <span data-bind="text: add.city"></span>,
    <span data-bind="text: add.region"></span>
  </p>
  <p data-bind="text: add.country"></p>
</div>

Making it happen turns out to be pretty easy.

Binding contract

To define a Knockout binding, up to two things are needed. We have to specify what the library should do:

  • when the binding is first applied to a DOM node (the init method)
  • when any of the observed values changes (the update method)

Not every binding has to implement both methods. In our case, only init is necessary, because all we have to do is modify the binding context.

What’s that? Shortly speaking, a binding context is an object holding all the data you can potentially bind to your DOM nodes. Think of it as a namespace, or local scope: whatever’s in there can be used directly inside data-bind attributes.

let: it go

Therefore, all that the let: binding has to do is to extend the context with a mapping passed to it as an argument. Well, almost all:

ko.bindingHandlers['let'] = {
    init: function(element, valueAccessor, allBindings, viewModel, bindingContext) {
        var innerContext = bindingContext.extend(valueAccessor);
        ko.applyBindingsToDescendants(innerContext, element);
        return { controlsDescendantBindings: true };
    }
};

The resulting innerContext is a copy of the original bindingContext, augmented with additional properties from the argument of let: (those are available through valueAccessor). Once we have it, though, we need to handle it in a little special way.

Normally, Knockout processes all bindings recursively, passing down the same bindingContext (which ultimately comes from the root viewModel). But since we want to locally alter the context, we also need to interrupt this regular procedure and take care of the lower-level DOM nodes ourselves.

This is exactly what the overly-long ko.applyBindingsToDescendants function is doing. The only caveat is that Knockout has to be told explicitly about our intentions through the return value from init. Otherwise, it would try to apply the original bindingContext recursively, which in our case would amount to applying it twice. { controlsDescendantBindings: true } prevents Knockout from doing so erroneously.


Turn SQLAlchemy queries into literal SQL

Posted on Thu 12 November 2015 in Code • Tagged with SQLAlchemy, SQL, databases

Like I mentioned in the post about adding regular expression support, the nice thing about SQLAlchemy is the clean, layered structure of various abstractions it utilizes.

Regrettably, though, we know that software abstractions tend to leak. In particular, any ORM seems to be a poster child of this rule, especially in the popular opinion among developer crowds. By design, they’re intended to hide at least some details of the operations on a database, but those very details can be quite critical at times. There are situations when we simply want to know what exactly is going on, and how all those model classes, mappers, and relationships translate to the actual SQL code.

To make it a little more concrete, let’s focus on the SQLAlchemy Query class. Given such a query, we’d like to get the final SQL representation of it: the one that’s ultimately sent to the database. It could be useful for any number of things, from logging1 to profiling, or just displaying in the web page’s footer, or even solely for prototyping in the Python REPL.

In other words, we want to turn this:

db_session.query(User.id).filter(User.email == some_email)

into something like this:

SELECT users.id FROM users WHERE users.email = :1

regardless of the complexity of the query, the number of model classes it spans, or the number of relationship-related JOINs it involves.

It’s a dialect

There is one aspect we cannot really universalize, though. It’s the specific database backend that SQLAlchemy should compile our query for. Shrugging off syntactical and semantical differences between database engines is one thing that using an ORM can potentially buy us, but if we want to get down to the SQL level, we need to be specific about it.

In SQLAlchemy’s parlance, any specific variant of the SQL language is called a dialect. Most of the time, you’ll be interested in the particular dialect your database of choice is using. This is easily obtainable from the database Session:

dialect = db_session.bind.dialect

The resulting Dialect object is little more than a container for small tidbits of information, used by SQLAlchemy to handle various quirks of the database backends it supports. For our purposes, though, it can be treated as a completely opaque token.

Compile & unwrap

With the Dialect in hand, we can invoke the query compiler to get the textual representation of our Query. Or, to be more precise, the compiled version of the query’s Select statement:

>>> query = db_session.query(User.id).filter(User.email == 'foo@example.com')
>>> print(query.statement.compile(dialect=db_session.bind.dialect))
SELECT users.id
FROM users
WHERE users.email = %(email_1)s

Perhaps unsurprisingly, even after compilation the result is still just another object: the Compiled one. As you can see, however, the actual SQL text is just its __str__ing representation, which we can print directly or obtain with str() or unicode().
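As a side note, if you’d rather see the bound values inlined in place of the placeholders, the compiler can also be asked to render them literally. To my knowledge this works only for simple data types, such as strings and numbers, so it’s best treated as a debugging aid:

>>> print(query.statement.compile(
...     dialect=db_session.bind.dialect,
...     compile_kwargs={"literal_binds": True}))
SELECT users.id
FROM users
WHERE users.email = 'foo@example.com'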

Query.to_sql

But obviously, we don’t want to type the above incantations every time we need to take a peek at the generated SQL for an ORM query. Probably the best solution is to extend the Query class so that it offers this additional functionality under a new method:

from sqlalchemy.orm.query import Query as _Query


class Query(_Query):
    """Custom, enhanced SQLALchemy Query class."""

    def to_sql(self):
        """Return a literal SQL representation of the query."""
        dialect = self.session.bind.dialect
        return str(self.statement.compile(dialect=dialect))

This new Query class then needs to be passed as the query_cls argument to the constructor of Session. Details may vary a little bit depending on how exactly your application is set up, but in most cases, it should be easy enough to figure out.
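For a typical setup it would amount to something like the following sketch (the engine URL and the User model are obviously just placeholders):

from sqlalchemy import create_engine
from sqlalchemy.orm import sessionmaker

engine = create_engine('postgresql://localhost/mydb')
# have all sessions hand out our enhanced Query objects
Session = sessionmaker(bind=engine, query_cls=Query)

db_session = Session()
print(db_session.query(User.id).to_sql())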


  1. If you’re only interested in indiscriminate logging of all queries, setting the echo parameter in create_engine may be sufficient. Another alternative is to look directly at the logging configuration options for various parts of SQLAlchemy. 


Celery task in a Flask request context

Posted on Tue 03 November 2015 in Code • Tagged with Celery, Flask, Python, queue, request context

Celery is an asynchronous task worker that’s frequently used for background processing in Python web apps. Rather than performing a time-consuming task within the request loop, we delegate it to a queue so that a worker process can pick it up when ready. The immediate benefit is much better latency for the end user. Pros also include easier scalability, since you can adjust the number of workers to your task load.

Examples of tasks that are usually best done in the background vary from fetching data through a third-party API to sending emails, and from pushing mobile notifications to pruning the database of stale records. Anything that may take more than a couple hundred milliseconds to complete — and isn’t absolutely essential to the current HTTP request — is typically a good candidate for asynchronous processing.
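In its simplest form, delegating such work is just a matter of decorating a function and invoking it through .delay. Here’s a minimal sketch, with the broker URL and the task itself purely illustrative:

from celery import Celery

celery = Celery('myapp', broker='amqp://localhost')

@celery.task
def send_welcome_email(user_id):
    # ...render the message, talk to the mail service, etc.
    pass

# in the request handler, instead of calling send_welcome_email(42) directly:
send_welcome_email.delay(42)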

In Celery, workers are just regular Python processes, often running the exact same code that’s powering the app’s web frontend1. But unlike most of that code, they aren’t servicing any HTTP requests — they simply run some function with given arguments, both specified by whoever sent the task for execution. Indeed, those functions don’t even know who or what asked for them to be executed.

Neat, right? This is what we usually compliment as decoupling, or separation of concerns. These are valuable qualities regardless of the UI and scaling benefits mentioned earlier.

Not quite web

But those qualities come with a trade-off. Task code is no longer web frontend code: it doesn’t run within the comfy environment of our web framework of choice. Losing it may be quite unnerving, actually, because in a typical web application many things will be tied directly to the HTTP request pipeline. After all, this is what web applications do — respond to HTTP requests — so it often makes perfect sense e.g. to marry the request flow with database transactions, committing or rolling them back according to the HTTP status code that the app produced.

Tasks may also require a database, though, if only to assert the expected state of the world. The same goes for memcache, a Redis instance, or basically any resource used by the frontend code. Alas, it’s quite possible that the very reason we delegate work to a task is to shift lengthy interactions with those external systems away from the UI. Obviously, we’re going to need them for that!

Fake a request

So one way or another, our tasks will most likely need some initialization and/or cleanup code. And since it’s probably the same code that most HTTP request handlers require and use already, why not just pretend we’re handling a request after all?

In Flask, we can pull that off rather easily. The test_request_context method is conveniently provided to allow for faking the request context — that is, an execution environment for HTTP handlers. Like the name suggests, it is used mostly for testing, but there is nothing stopping us from using it in tasks run by Celery.

We probably don’t want to call it directly, though. What would be better is to have Celery prepare the context first, and then run the task code as if it were an HTTP handler. For even better results, the context would preserve information extracted from the actual HTTP request: the one that sent the task for execution. Moving some work to the background would then be a trivial matter, for both the task and the original handler would operate within the same environment.

Convenient? I believe so. And as I’ll demonstrate next, it isn’t very complicated to implement either.

Wrap the decorator?

At least one piece of the solution should stand out as pretty obvious. Since our intention is to wrap the task’s code in some additional packaging — the request context — it seems fairly natural to write our own @task decorator:

import functools

from myapp import app, celery


def task(**kwargs):
    """Decorator function to apply to Celery tasks.
    Executes the actual task inside Flask's test request context.
    """
    def decorator(func):
        """Actual decorator."""
        @celery.task(**kwargs)
        @functools.wraps(func)
        def wrapped(*args, **kwargs):
            with app.test_request_context():
                return func(*args, **kwargs)

        return wrapped

    return decorator

Here, app is the Flask application, and celery is the Celery object that’s often configured alongside it.

The Task class

While this technique will give us a request context inside the task code, it won’t be the request context from which the task has been sent for execution. To replicate that context correctly, we need additional support on the sender side.

Internally, Celery converts every function we annotate with the @task decorator into a subclass of celery.app.task.Task. This process can be customized, for example by providing an explicit base= parameter to @task, which specifies a custom Task subclass to use in place of the default one. Within that subclass, we’re free to override any functionality that we need to.

In our case, we’ll scrape the current request context from Flask for all relevant information, and later use it to recreate the context in the task code.

But before we get to that, let’s just create the Task subclass and move the above execution logic to it. This way, users won’t have to use a completely new @task decorator which would needlessly couple them to a specific Celery app instance:

from celery import Task

from myapp import app  # the Flask application, as before

class RequestContextTask(Task):
    """Base class for tasks that run inside a Flask request context."""
    abstract = True

    def __call__(self, *args, **kwargs):
        with app.test_request_context():
            return super(RequestContextTask, self).__call__(*args, **kwargs)

Instead, they can either set this class as base= for a specific task:

@celery.task(base=RequestContextTask)
def sync_user_data(user_id):
    # ...

or make it into the new default for all their tasks:

celery = Celery(...)
celery.Task = RequestContextTask

Invocation patterns

When the frontend asks for a task to be executed, it most often uses the Task.delay method. It will package a payload containing the task arguments, and send it off through a broker — usually an AMQP-based queue, such as RabbitMQ — so that a Celery worker can pick it up and actually execute it.

But there are other means of task invocation. We can even run it “in place”, locally and synchronously, which is especially useful for various testing scenarios. Lastly, a task can also be retried from within its own code, terminating its current run and scheduling another attempt for some future date.

Obviously, for the RequestContextTask to be useful, it needs to behave correctly in every situation. Therefore we need to cover all the entry points I’ve mentioned — the asynchronous call, a synchronous invocation, and a task retry:

class RequestContextTask(Task):
    # ...

    def apply_async(self, args=None, kwargs=None, **rest):
        self._include_request_context(kwargs)
        return super(RequestContextTask, self) \
            .apply_async(args, kwargs, **rest)

    def apply(self, args=None, kwargs=None, **rest):
        self._include_request_context(kwargs)
        return super(RequestContextTask, self) \
            .apply(args, kwargs, **rest)

    def retry(self, args=None, kwargs=None, **rest):
        self._include_request_context(kwargs)
        return super(RequestContextTask, self) \
            .retry(args, kwargs, **rest)

Note that Task.apply_async is being called internally by Task.delay, so it’s only that first method that we have to override.
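With those overrides in place, all the invocation styles transparently capture the sender’s context. Using the sync_user_data task from before:

sync_user_data.delay(42)          # asynchronously, through the broker
sync_user_data.apply(args=(42,))  # locally and synchronously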

Context in a box

As you can deduce right away, the Flask-related magic is meant to go in the _include_request_context method. The idea is to prepare arguments for the eventual invocation of Flask.test_request_context, and pass them through an extra task parameter. Those arguments are relatively uncomplicated: they are just a medley of various pieces of information that we can easily obtain from Flask’s request object:

from flask import has_request_context, request


class RequestContextTask(Task):
    CONTEXT_ARG_NAME = '_flask_request_context'

    # ...

    def _include_request_context(self, kwargs):
        """Includes all the information about current HTTP request context
        as an additional argument to the task.
        """
        if not has_request_context():
            return

        context = {
            'path': request.path,
            'base_url': request.url_root,
            'method': request.method,
            'headers': dict(request.headers),
        }
        if '?' in request.url:
            context['query_string'] = request.url[(request.url.find('?') + 1):]

        kwargs[self.CONTEXT_ARG_NAME] = context

On the worker side, we simply unpack them and recreate the context:

from flask import make_response

class RequestContextTask(Task):
    # ...
    def __call__(self, *args, **kwargs):
        call = lambda: super(RequestContextTask, self).__call__(*args, **kwargs)

        context = kwargs.pop(self.CONTEXT_ARG_NAME, None)
        if context is None or has_request_context():
            return call()

        with app.test_request_context(**context):
            result = call()
            app.process_response(make_response(result or ''))

        return result

The only tricky part is calling Flask.process_response at the end. We need to do that for the @after_request hooks to execute correctly. This is quite crucial, because those hooks are where you’d normally put important cleanup code, like commit/rollback of the database transaction.
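For illustration, a hook of that kind could look more or less like this (the db_session object is hypothetical):

from myapp import app, db_session

@app.after_request
def finish_db_transaction(response):
    # commit on success, roll back on application errors
    if response.status_code < 400:
        db_session.commit()
    else:
        db_session.rollback()
    return response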

Complete solution

To see how all those code snippets fit together, see this gist. You may also want to have a look at the article on Celery integration in Flask docs for some tips on how to integrate it with your own project.


  1. This isn’t strictly necessary, as Celery supports sending tasks for execution by explicit name. For that request to reach the worker, however, the task broker configuration must be correctly shared between the sender and the worker. 


Actually, Python enums are pretty OK

Posted on Sun 25 October 2015 in Code • Tagged with Python, enums, serialization

A little over two years ago, I was writing about my skepticism towards the addition of enum types to Python. My stance has changed somewhat since then, and I now think there is some non-trivial value that enums can add to Python code. It becomes especially apparent in circumstances involving any kind of persistence, from pickling to databases.

I will elaborate on that through the rest of this post.

Revision

First, let’s recap (or perhaps introduce) the important facts about enums in Python. An enum, or an enumeration type, is a special class that inherits from enum.Enum1 and defines one or more constants:

from enum import Enum

class Cutlery(Enum):
    knife = 'knife'
    fork = 'fork'
    spoon = 'spoon'

Their values are then converted into singleton objects with distinct identity, available as attributes of the enumeration type:

>>> Cutlery.fork
<Cutlery.fork: 'fork'>

As a shorthand for when the values themselves are not important, the Enum class can be directly invoked with constant names as strings:

Cutlery = Enum('Cutlery', ['knife', 'fork', 'spoon'])

The resulting class offers some useful API that we’d expect from an enumeration type in any programming language:

>>> list(Cutlery)
[<Cutlery.knife: 'knife'>, <Cutlery.fork: 'fork'>, <Cutlery.spoon: 'spoon'>]
>>> [c.value for c in Cutlery]
['knife', 'fork', 'spoon']
>>> Cutlery.knife in Cutlery
True
>>> Cutlery('spoon')
<Cutlery.spoon: 'spoon'>
>>> Cutlery('spork')
Traceback (most recent call last):
# (...)
ValueError: 'spork' is not a valid Cutlery

How it was done previously

Of course, there is nothing really groundbreaking about assigning some values to “constants”. It’s been done like that since times immemorial, and quite often those constants have been grouped within finite sets.

Here’s an example of a comment model in some imaginary ORM, complete with a status “enumeration”:

class Comment(Model):
    APPROVED = 'approved'
    REJECTED = 'rejected'
    IN_REVIEW = 'in_review'
    STATES = frozenset([APPROVED, REJECTED, IN_REVIEW])

    author = StringProperty()
    text = TextProperty()
    state = StringProperty(choices=STATES)

# usage
comment = Comment(author="Xion", text="Boo")
comment.state = Comment.IN_REVIEW
comment.save()

Converting it to use an Enum is relatively straightforward:

class Comment(Model):
    class State(Enum):
        approved = 'approved'
        rejected = 'rejected'
        in_review = 'in_review'

    author = StringProperty()
    text = TextProperty()
    state = StringProperty(choices=[s.value for s in State])

comment = Comment(author="Xion", text="Meh.")
comment.state = Comment.State.approved.value
comment.save()

It is not apparent, though, if there is any immediate benefit of doing so. True, we no longer need to define the STATES set explicitly, but saving on this bit of boilerplate is balanced out by having to access the enum’s value when assigning it to a string property.

All in all, it seems like a wash — though at least we’ve established that enums are no worse than their alternatives :)

Enums are interoperable

Obviously, this doesn’t exactly sound like high praise. Thing is, we haven’t really replaced the previous solution completely. Remnants of it still linger in the way the state property is declared. Even though it’s supposed to hold enum constants, it is defined as a string, which at this point is more of an implementation detail of how those constants are serialized.

What we really need here is a kind of EnumProperty that’d allow us to work solely with enum objects. Before the introduction of a standard Enum base, however, there was little incentive for ORMs and other similar libraries to provide such functionality. But now, it makes much more sense to support enums as first-class citizens, at least for data exchange and serialization, because users can be expected to already prefer the standard Enum in their own code.

Thus, the previous example changes into something along these lines:

class Comment(Model):
    # ...
    state = EnumProperty(State)

comment = Comment(author="Xion", text="Yay!")
comment.state = Comment.State.approved  # no .value!
comment.save()

Details of EnumProperty, or some equivalent construct, are of course specific to any given data management library. In SQLAlchemy, for example, a custom column type can handle the necessary serialization and deserialization between Python code and SQL queries, allowing you to define your models like this2:

class Comment(Model):
    class State(Enum):
        # ...

    author = Column(String(255))
    text = Column(Text)
    state = Column(EnumType(State, impl=String(32)))

# usage like above

In any case, the point is to have Python code operate solely on enum objects, while the persistence layer takes care of converting between them and their serializable values.

Enums are extensible

The other advantage Enums have over loose collections of constants is that they are proper types. Like all user-defined types in Python, they can have additional methods and properties defined on their instances, or even on the type itself.

Although this capability is (by default) not as powerful as e.g. in Java — where each enum constant can override a method in its own way — it can nevertheless be quite convenient at times. Typical use cases include constant classification:

class Direction(Enum):
    left = 37
    up = 38
    right = 39
    down = 40

    @property
    def is_horizontal(self):
        return self in (Direction.left, Direction.right)

    @property
    def is_vertical(self):
        return self in (Direction.down, Direction.up)

and conversion:

    def as_vector(self):
        return {
            Direction.left: (-1, 0),
            Direction.up: (0, -1),
            Direction.right: (1, 0),
            Direction.down: (0, 1),
        }.get(self)

For the latter, it would be handy to have the Java’s ability to attach additional data to an enum constant. As it turns out, Python supports this feature natively in a very similar way. We simply have to override enum’s __new__ method to parse out any extra values from the initializer and turn them into attributes of the enum instance:

class Direction(Enum):
    left = 37, (-1, 0)
    up = 38, (0, -1)
    right = 39, (1, 0)
    down = 40, (0, 1)

    def __new__(cls, keycode, vector):
        obj = object.__new__(cls)
        obj._value_ = keycode
        obj.vector = vector
        return obj

It’s possible, in fact, to insert any arbitrary computation here that yields the final _value_ of an enum constant3. This trick can be used to, for example, construct enums that automatically number themselves.
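Here’s how it can look, following the AutoNumber example from the standard docs:

class AutoNumber(Enum):
    def __new__(cls):
        value = len(cls.__members__) + 1  # number the constants from 1 upwards
        obj = object.__new__(cls)
        obj._value_ = value
        return obj

class Color(AutoNumber):
    red = ()
    green = ()
    blue = ()

# Color.blue.value == 3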

Finally, we can add static methods, class methods, or class properties to the Enum subclass, just like we would do with any other class:

class MyEnum(Enum):
    # ...

    @classproperty
    def __values__(cls):
        return [m.value for m in cls]

Enums just are

All these things are possible primarily because of the most notable aspect of Python enums: their existence as an explicit concept. A syntactically unorganized bunch of constants cannot offer half of the highlighted features, because there is nowhere to pin them on.

For that reason alone, using enums as an explicit — rather than implicit — pattern seems worthwhile. The one benefit we’re certain to reap is better code structure through separation of important concepts.


  1. The enum module is part of the Python standard library since version 3.4 but a fully functional backport is available for Python 2.x as well. 

  2. It is even possible to instruct SQLAlchemy how to map Python enums to ENUM types in database engines that support it, but details are outside of the scope of this article. 

  3. If you’re fine with the enum’s value being the whole tuple (everything after the = sign), you can override __init__ instead of __new__ (cf. the planet example from standard docs). 


CSS class helper for Jinja

Posted on Thu 22 October 2015 in Code • Tagged with Jinja, CSS, Python, Flask

One of the great uses for a templating engine such as Jinja is reducing the clutter in our HTML source files. Even with the steady advances in the CSS1 standards, various div.wrapper, div.container, div.content, and other presentational elements are still a common fact of life. Unless you’re one of the cool kids who use Web Components with Polymer, your main option for abstracting this boilerplate away is defining some more general template macros.

As with any kind of abstraction, it’s crucial to balance broad applicability with a potential for finely-tuned customization. In the case of wrapping HTML snippets in Jinja macros, an easy way to maintain flexibility is to allow the caller to specify some crucial attributes of the root element:

{#
  Renders the markup for a <button> from Twitter Bootstrap.
#}
{% macro button(text=none, style='default', type='button', class='') %}
  <button type="{{ type }}"
        class="btn btn-{{ style }}{% if class %} {{ class }}{% endif %}"
        {{ kwargs|xmlattr }}>
    {{ text }}
  </button>
{% endmacro %}

An element id would be a good example, and in the above macro it’s handled implicitly thanks to the {{ kwargs|xmlattr }} stanza.

class, however, is not that simple, because a macro like that usually needs to supply some CSS classes of its own. The operation of combining them with additional ones, passed by the caller, is awfully textual and error-prone. It’s easy, for example, to forget about the crucial space and run two class names together.

As if CSS wasn’t difficult enough to debug already!

Let them have list

The root cause for any of those problems is working at a level that’s too low for the task. The value of a class attribute may be encoded as a string, but it’s fundamentally a list of tokens. In the modern DOM API, for example, it is represented as exactly that: a DOMTokenList.

I’ve found it helpful to replicate a similar mechanism in Jinja templates. The result is a ClassList wrapper whose code I quote here in full:

from collections import Iterable, MutableSet


class ClassList(MutableSet):
    """Data structure for holding, and ultimately returning as a single string,
    a set of identifiers that should be managed like CSS classes.
    """
    def __init__(self, arg=None):
        """Constructor.
        :param arg: A single class name or an iterable thereof.
        """
        if isinstance(arg, basestring):
            classes = arg.split()
        elif isinstance(arg, Iterable):
            classes = arg
        elif arg is not None:
            raise TypeError(
                "expected a string or string iterable, got %r" % type(arg))
        else:
            classes = []  # no argument given, start with an empty class list

        self.classes = set(filter(None, classes))

    def __contains__(self, class_):
        return class_ in self.classes

    def __iter__(self):
        return iter(self.classes)

    def __len__(self):
        return len(self.classes)

    def add(self, *classes):
        for class_ in classes:
            self.classes.add(class_)

    def discard(self, *classes):
        for class_ in classes:
            self.classes.discard(class_)

    def __str__(self):
        return " ".join(sorted(self.classes))

    def __html__(self):
        return 'class="%s"' % self if self else ""

To make it work with Flask, adorn the class with the Flask.template_global decorator:

from myflaskapp import app

@app.template_global('classlist')
class ClassList(MutableSet):
    # ...

Otherwise, if you’re working with a raw Jinja Environment, simply add ClassList to its global namespace directly:

jinja_env.globals['classlist'] = ClassList

In either case, I recommend following the apparent Jinja convention of naming template symbols as lowercasewithoutspaces.

Becoming classy

Usage of this new classlist helper is relatively straightforward. Since it accepts either a space-separated string or an iterable of CSS class names, a macro can wrap anything the caller would pass as a value for the class attribute:

{% macro button(text=none, style='default', type='button', class='') %}
  {% set classes = classlist(class) %}
  {# ... #}

The classlist is capable of producing the complete class attribute syntax (i.e. class="..."), or omitting it entirely if it would be empty. All we need to do is evaluate it using the normal Jinja expression syntax:

  <button type="{{ type }}" {{ classes }} {{ kwargs|xmlattr }}>

Before that, though, we may want to include some additional classes that are specific to our macro. The button macro, for example, needs to add the Bootstrap-specific btn and btn-$STYLE classes to the <button> element it produces2:

  {% do classes.add('btn', 'btn-' + style) %}

After executing this statement, the final class attribute contains both the caller-provided classes, as well as the two that were added explicitly.
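As a result, a call such as:

{{ button(text='Save', style='primary', class='pull-right') }}

should render markup roughly along these lines:

<button type="button" class="btn btn-primary pull-right">Save</button>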


  1. Believe it or not, we can finally center the content vertically without much of an issue! 

  2. The {% do %} block in Jinja allows to execute statements (such as function calls) without evaluating values they return. It is not exposed by default, but adding a standard jinja2.ext.do extension to the Environment makes it available.