Flappy Bird in 1234 bytes of Bash

Posted on Thu 25 August 2016 in Code • Tagged with Bash, shell scripting, game programming, Flappy BirdLeave a comment

Contrary to an infamous opinion from a bygone era, 640KB is not really sufficient for anyone anymore. A typical website exceeds that easily, and executable programs are usually measured in megabytes.

But what if you only had 1234 bytes to work with?…

A friend of mine, Gynvael Coldwind, organized a game programming compo1 that had precisely this limitation. Unlike most demoscene ones, however, the size limit here applies to either the final binary or its source code. This can be chosen at the participant’s discretion.

Since my currently favorite compiled language produces the exact opposite of small binaries, I was quite intrigued by the source code option. But as the rules say, the final game must run on a clean installation (only standard packages) of either Windows or Ubuntu Linux. The choice of viable languages and technologies was therefore rather limited.

It was time to get a little creative.

Game theory

What must an environment provide to be a suitable platform for game development? Not much, really. We only need to be able to:

  • put stuff on the screen
  • react to user input
  • execute time-dependent logic

You could arguably get away without the last one, but the kind of games you would end up with had gone out of fashion about half a century ago. For the “real” arcade games, we really ought to run our code at least a dozen times per second.

There’s only a handful of standard technologies that allow all of this out of the box.

I’m a wee bit out of touch with Windows these days but on Linux, there’s one thing that I really wanted to take for a serious spin. And luckily for me, it also has one extremely terse language to go hand in hand with.

I’m talking, of course, about the ANSI terminal that can be scripted in Bash. If there ever was anything that worked anywhere by default, then this got to be it2.

…put into practice

Note that I’ve stressed the “terminal” part. The shell itself is a neat instrument, but (perhaps surprisingly) it doesn’t actually concern itself with displaying anything on the screen.

This has traditionally been the job of a terminal emulator. To this end, it has a couple of special codes that are undoubtedly useful for an aspiring indie shell game developer. They are what allows us to display things in a specific position on the screen, complete with chosen color, background color, and (text) style.

So this nails down our first requisite feature.

As for the second one, the vanilla read command supports everything we may need for handling user input. The only real “trick” is passing the -n flag which makes it wait for a specific number of characters (e.g. one) rather than a whole line ending with Enter. Add a few more flags — like the one that prevents text from being echoed back to the console — and you can make a rudimentary input loop:

KEY='\0'
while :; do
    read -rsn 1 KEY
done

I can imagine, however, that you’d want to do other things besides just waiting for input. Stuff like “updating the game state” and “drawing the next frame” is generally considered pretty important in games.

Normally, we would deal with those things in between checking for input events, leading to a particular structure of the so-called real-time loop.

But the shell doesn’t really handle input via “events”. Instead, you just ask for some text and wait until you get it. There is no “peek mode” that’d allow to squeeze in some rendering logic before the next key press.

What do we do, then, with a tight loop that leaves us no wiggle room?…

Why, we take a crowbar and pry it open!

(Don’t) be alarmed

Let’s start by noticing that to run some code whenever there is nothing else to do has a rough equivalent of running it periodically. This isn’t an exactly new observation: the setTimeout function in JavaScript has been the basis of “real-time” animation since the 90s era of falling snowflakes, and up to the contemporary browser games3.

Neither does the shell nor the hosting terminal support anything like setTimeout, though. But fortunately, they don’t need to: Linux itself does. And it accomplishes it quite effortlessly, due to the sole fact of being an operating system. All we have to do is access some of its capabilities directly from the shell script:

KEY='\0'
DT=0.05  # timeout value in seconds

tick() {
    # .. do stuff ...
    ( sleep $DT; kill ALRM $$ )&
}

trap tick ALRM
tick
while :; do
    read -rsn 1 KEY
done

What we’re doing here is set up the tick function to be a signal handler. A callback, if you will.

Inside of this callback, we can do all the state updates and drawing we need, as long as we follow it with “scheduling” of the next tick call. As a direct equivalent of a setTimeout invocation, this can be done by:

  • starting a subshell to run in the background (with &)
  • letting it sleep for however long we want to delay the next update
  • sending a signal to the main script (kill $$)

The signal we chose is of course SIGALRM4. Technically, however, it can be anything, as long as we can set up a trap to actually handle it.

In any case, success! Bash is officially a game programming platform!

Integration in parts

And so having figured out the technicalities, I was faced with the crucial dilemma: what game could I actually write?

Nothing too complicated, that’s for sure. After the initial scaffolding has used up about 1/4 of the harsh size limit, I knew that radical simplicity was the order of the day.

And so I went for possibly the most trivial game ever.

flap flap
Sorry, Pong!

Then, after hours of (ahem) meticulous research, I managed to reverse-engineer the core mechanic:

  • let the bird fall down with a constant acceleration
  • to jump, give it some upwards-facing velocity

Actually coding this in Bash was mostly a matter of finding out how to perform floating-point calculations. Rather unsurprisingly, this is done through an external program, while truncating of the fractional part involves — wait for it — string formatting.

Pipe dream

Based on the above nuggets of Stack Overflow wisdom, you’ve probably figured out that Bash isn’t exactly what you would call a programming language. With a little bit of perseverance, however, we can make it do our bidding… some fraction of the time.

So far, I had the player character — a beautiful red rectangle — fall down under the constant force of gravity, and maybe ascend if the Space key has been pressed. But a heroic protagonist necessitates the presence of formidable adversaries, so my next step was to figure out how to implement this crucial gameplay mechanic.

Which one?… Pipes, of course.

Pipes in Bash.

...ahem

It was pretty evident I’m gonna need to represent them somehow, and Bash isn’t exactly known for its strong repertoire of data structures. Starting from version 4.0, it does however have arrays, so there is at least something we can work with.

Let’s not get too carried away, though. The somewhat obvious idea of mirroring the entire game field in a (pseudo) 2D array of pipe/not-pipe turned out to be completely unworkable. The fill rate of most (all?) terminal emulators is nowhere near sufficient to permit redrawing of the whole screen and maintaining FPS value above the slideshow threshold.

What I went with instead was a 1D array for the pipe itself, and a separate variable to denote its horizontal position. Working from there, it wasn’t too hard to make it move, and eventually to check for its collision with the player object.

Fitting in

That, of course, was the most important milestone.
I added an objective.
It was an actual game.

And I still had about 100 bytes left!

Speaking of size, this is probably a good moment to talk about making the most of those meager 1234 bytes. It’s not exactly surprising that it was possible mostly thanks to minification.

While it’s extremely popular for JavaScript, the same abundance of minification utilities cannot be expected when it comes of shell scripts. Still, “bash minification” does return some useful search results, and one of them is what I used to shrink the final script.

Obviously, it didn’t go without some trouble. Since the minifier does little more than to swap newlines for semicolons, it got a few bugs that had to be ironed out. No big deal, really: a small batch of handcrafted, artisanal Python was enough to paper over the issues.

The other technique you can use to slim down is obfuscation, i.e. shortening of the identifiers. As the minifier didn’t offer this feature natively, I had to take care of it myself.

This lead to adding such interesting assignments as this p:

p=printf

which absolutely shouldn’t be confused with this p:

# put text at given position: p $x $y $text
p() { echo -en "\e[$2;${1}f$3"; }

The reason it works is that in POSIX shells, variables and functions effectively form two separate namespaces. Their members are thus referred to in two different ways:

p $X $Y "\e[1;37;41mB"  # call the p() function
$p "\e[?25l"  # expand the p variable (i.e. call `printf`)

Notice how functions have longer definitions but shorter usage, while the opposite is true for variables. Who can now say that Bash doesn’t find balance in all things?

Auditory sensations

Like I mentioned before, thanks to those and similar tricks I had managed to carve out about a hundred or so bytes of free space.

Now, what could you possibly do with such a staggering amount?

Two tweets at the same time!
…no, that won’t even be one tweet.

Well, let’s add some sound effects, shall we?

Before you think that’s preposterous, remember the terminal bell. Sounding the bell is as simple as printing the "\a" character (ASCII 7), which for this reason is also known as BEL:

echo -e "\a"

Unfortunately, most terminal emulators silence the actual sound, and replace it with a visual indicator — typically a bell icon. If we want to make speakers reliably emit audible phenomena, we sadly have to look elsewhere.

Fortunately, modern Linux systems handle the sound card somewhat better than you may have remembered from a few years ago. This is usually thanks to ALSA, a dedicated subsystem in the Linux kernel, and its numerous userspace complements.

One of them is the inconspicuous speaker-test binary which, well, does exactly what it says on the can:

speaker-test  # play some noise through the speakers

You can make it play a WAV file, too, but the most interesting option is to synthesize a sine wave. By adjusting its frequency, it’s easy to play higher and lower tones, forming the building blocks for more complex sounds.

What you cannot control is the tone’s duration. That’s not a big problem, though, since we can run speaker-test in a separate process and then just kill it dead:

# play a sine wave (requires ALSA): s $frequency $duration
s() { ( speaker-test >$n -t sine -f $1 )& _p=$!; sleep $2; kill -9 $_p; }

I’ve used this approach to play a simple, two-tone sound whenever the player successfully overcomes a pipe obstacle. And I would’ve probably taken it further if “speaker_test” wasn’t such a damn long string. Unfortunately, it was one identifier I couldn’t afford to shorten, and this had put a stop to my ambitious plan of improvising a sad trombone upon player’s failure :(

; done

It wouldn’t be right to say I wasn’t very happy with the results, though. All in all, it was the most fun I had with coding in quite some time, and definitely the most amusing Bash script I’ve ever written.

FLAPPY BASH

It also got me curious what other games people have implemented purely as shell scripts. To my disappointment, there hadn’t been all that many. Of those I could find, this Snake clone in about 7KB of (unobfuscated) Bash is probably the most polished one.

As you can see then, this is clearly an under-appreciated platform that evidently displays a lot of potential! If you want to create games that are both very portable and extremely space-efficient, Bash is definitely a technology you should have a closer look at ;-)


  1. Here’s the original announcement post in Polish and its somewhat understandable Google-translated version

  2. Yes, I’m ignoring the elephant in the room which is the web browser. It’s probably because a pile of minified JavaScript doesn’t strike me as very interesting anymore :) 

  3. Nowadays, though, the requestAnimationFrame function is closer to the actual continuous processing in the background. 

  4. Regular programs could simply call the alarm function instead of forking a subprocess. But then again, regular programs could just run a normal game loop. 

Continue reading

The brave “new” world of Python 3

Posted on Mon 15 August 2016 in Code • Tagged with Python, Python 3, Unicode, lazy evaluation, iterablesLeave a comment

I’ll blurt it straight up: I’m not a big fan of Python 3.

For a long time, I resisted the appeal of various incremental improvements that early 3.x releases offered. And the world agreed with me: a mere two years ago, Python 3 wasn’t even a blip on the PyPI radar.

Lately, however, things seem to be picking up some steam.

As if to compensate for years of “good enough”, Python 3 development team has given in to a steadily accelerating feature creep. Sure, some of it results in bad ideas (or even ideas you’d hope are jokes), but it nevertheless causes an increasingly wide functional gap between the 2.x and 3.x series.

Starting from around Python 3.5, this gap becomes really noticeable, even when partially bridged with many excellent backports. The ecosystem support is also mostly there, at least insofar as “not breaking horribly when a package is used in Python 3”.

And then, of course, there is the 2.7 EoL date looming ever closer.

Given all those portents, even old curmudg… ahem… seasoned developers cannot really ignore Python 3 anymore. For better or for worse, 3.x is how Python will look like in the coming years and decades. Might as well prepare for it.

In this post, I will discuss some important issues one should be aware of before trying to switch from Python 2 to 3. I won’t be talking about all the minute changes and additions, but cover the more significant, broader concepts that mark the divide between the 2.x and 3.x generations.

The two concepts I’ll be mentioning here are Unicode (obviously) and lazy vs. eager computation.

Unicode handling

You have probably heard it before. Python 3 was going to solve your Unicode problems once and for all. You haven’t believed it, of course, like you wouldn’t believe in any other silver bullet.

Still, it may be rather surprising to learn that in Python 3, you’ll actually see much more Unicode-related errors.

And strange as it may sound, it is a good thing.

In any case, either version of Python gets the most important thing about Unicode right. They both distinguish, at the type level, between strings (of Unicode codepoints) and their encodings (sequences of bytes). The type that holds the latter is called bytes in both versions, while strings are stored within the str type in Python 3 and unicode in Python 2.

It is from this crucial distinction — or rather, failing to account for it — where all the dreaded Unicode errors ultimately stem.

But where Python 2 does poorly is in the choice of defaults. You probably know all too well that bytes there is just an alias for str. That str is a fully functional string type, even though it can only contain ASCII characters. Moreover, it is also the default: quoted string literals, for example, will be of this type unless specially marked.

This poor choice of defaults is the primary source of latent Unicode bugs in Python 2 programs.

What Python 3 does here is to help expose those bugs sooner. If you already deal with Unicode correctly in your programs — maybe because you watched this excellent talk by Ned Batchelder — your main benefit will be not having to write that u"" quotes anymore. Otherwise, it’ll force you to consider the issue from the very beginning, rather than letting you write “working” programs that crash the moment they have to process some non-ASCII input.

Laziness by default

The second major change that Python 3 brings is of similar nature. It is also a change of defaults, but the impetus for it is much less evident.

What’s different in Python 3 is that many built-in functions and methods which used to return lists are now giving out bespoke objects that only mostly behave like lists. Included in these are functions like map or filter, as well as common dictionary methods such as keys or values.

This change is usually presented as removal of unnecessary cruft:

  • itertools.ifilter is now just filter
  • xrange is now just range
  • dict.iteritems is now just dict.items

and so on.

In some cases, this is exactly what happens. For example, there is virtually no downside to the new implementation of range, especially considering the way it is used most often.

But not every built-in managed to preserve all the functionality of lists. Indeed, many have downgraded their API guaratees to those of mere generators, i.e. the most simplistic and limited flavor of Python iterables. Working with them is trickier and more error-prone than with lists, which is due to various pitfalls that generators expose us to.

Navigating around those gotchas used to be something that Python code had to opt-in to, by explicitly importing the itertools module and using its functions in place of the built-ins. What you could gain in return was increased performance, and a lesser memory footprint. All those benefits came from making the computations lazy and refraining from storage of the intermediate results.

In Python 3, however, laziness is preordained. Even if we don’t need or care about the aforementioned perks, we have to devise some way of dealing with the pervasive generators.

One option is to embrace lazy evaluation fully, and adapt to handling unspecified iterables throughout our code bases.
The risk is an increased frequency of bugs stemming from generator misuse — including a common mistake of trying to iterate over lazy foos the second time, deeper down a long function, after it’s been already exhausted.

The alternative is to engage in a lot of “defensive listing”: wrapping of unknown (or known-but-lazy) iterables in list() calls in order to “sanitize” them for later (re)use.
Examples include immediate listification of a generator object:

primes = list(filter(is_prime, range(1000)))

or preemptive conversion of an incoming iterable argument:

def do_something(foos):
    foos = list(foos)
    # ...the rest of a long function...

Even if you choose the first path, and somehow use lazy generators everywhere, conversions are still required at the serialization boundaries:

d = {'foo': 42}
json.dumps({'keys': d.keys()})  # TypeError: dict_keys(['foo']) is not JSON serializable
json.dumps({'keys': list(d.keys())})  # works

At least in this case, the lazy iterable will vocally fail with an exception, rather than silently doing nothing (in case of repeated iteration) or always posing as truthy even when it’s empty (in if iterable: checks).

from __future__ import doubts

So, here they are: the highlights of Python 3. If you are disappointed they all turned out to be mixed blessings, don’t worry: you are in a good company.

The truth is that Python 3 is more finnicky, less forgiving, and much less beginner-friendly than its predecessor. Its various superficial simplifications are almost squarely balanced by many new concerns that are thrust upon an unsuspecting programmer from the very beginning.

In one possible view, this is simply a sign that the language has matured. Perhaps it’s not a coincidence that almost exactly 18 years has passed between the first public version of Python (0.9) and the release of Python 3.0. By no conceivable means it is a toy language anymore, and it’s adequately equipped to tackle challenges presented by the computing world of today.

But on the other hand, it’s clear something is being gradually lost in the process.

It’s becoming harder to claim the language favors simplicity over complexity.
It is no longer so easy to pick which way is the obvious way to do it.
It is increasingly often that ugly replaces beautiful and nested replaces flat.

Little by little, Python itself is becoming less and less pythonic. The pace isn’t breakneck, but it’s definitely noticeable. But who knows? Maybe after two decades, a wholesale redefinition of the language’s core principles really is in order.

…Well, certainly that’s necessary if some of the latest ideas are about to get in!

Continue reading

for loops in Rust

Posted on Tue 26 July 2016 in Code • Tagged with Rust, loops, iteratorsLeave a comment

In this post, I’m going to talk about the for loop construct in Rust, as well as the related concepts of iterators and “iterables”.

Depending on your programming language background, they may seem somewhat familiar in terms of syntax & semantics, or rather mysterious and surprising. Their closest analogues exist in Python, but programmers of Java, C#, or (modern) C++ should recognize many relevant features and ideas as well.

Basics

The syntax of a for loop is so modest it’s almost spartan1:

let v = vec!["1", "2", "3"];
for x in v {
    println!("{}", x);
}

As you would expect, this prints three lines with 1, 2, 3. What is probably not as obvious is that over the course of this loop the v vector was expended. Trying to use it after the iteration, we’ll get a borrow checker error:

<anon>:6:22: 6:23 error: use of moved value: `v` [E0382]
<anon>:4         println!("{}", x);
<anon>:5     }
<anon>:6     println!("{:?}", v);
                              ^

In Rust jargon, the vector has been moved into the loop. Its ownership — and that of its individual elements — has been transfered there permanently. While definitely surprising when compared to other languages, this behavior is consistent with Rust’s ubiquitous policy of moving values by default.

Still, you may not expect it here because moving ownership is mostly seen at the function call boundaries. For most intents and purposes, however, you can picture a for_each function like this to be the equivalent of the for loop above:

for_each(v, |x| println!("{}", x));

This also gives us a hint on how we could prevent the move from happening. Rather than taking the vector itself, the function could accept only a reference to it:

for_each_ref(&v, |x| println!("{}", x));

After we translate this back to the looping syntax:

 for x in &v {
    println!("{}", x);
}
println!("{:?}", v);

we won’t get any more objections from the compiler.

Iterators and “iterables” in Rust

It is important to emphasize that this new ampersand symbol (&) is by no means a part of the syntax of the for loop itself. We have actually changed what object we’re iterating here. It is no longer Vec<T> — a vector itself — but &Vec<T>, an immutable reference to it. As a consequence, x is not a T (the element type) anymore, but a &T — a reference to an element2.

So it seems that in Rust, both Vec<T> and &Vec<T> are what we would call “iterables”: collections (or other objects) that we can get iterate over. The usual way this is implemented in various programming languages is by introducing an iterator object.

The iterator keeps track of what element it’s currently pointing to and supports at least the following basic operations:

  • getting the current element
  • advancing to the next element
  • signaling when no more elements are available

Some languages provide separate iterator methods for each of those tasks, but Rust chooses to combine them all into one. You can see that when looking at the Iterator trait: next is the only method to be provided by its implementations.

Desugaring with into-iterators

How is the iterator object created, though?

In a typical Rust manner, this job is delegated to another trait. This one is called IntoIterator, and it roughly corresponds to the “iterable” concept I’ve alluded to earlier:

// (simplified)
trait IntoIterator {
    fn into_iter(self) -> Iterator;
}

What is uniquely Rusty is the fact that into_iter — the sole method of this trait — doesn’t just create a new iterator for the collection. Instead, it effectively consumes the whole thing, leaving the new iterator as its only remnant and the only way to access the items3.

This, of course, is a direct manifestation of the Rust’s move-by-default policy. In this case, it protects us from the common problem of iterator invalidation which is probably all-too-familiar to C++ programmers. Because the collection is essentially “converted” to an iterator here, it is impossible:

  • for more than one iterator to exist at a time
  • to modify the collection while any iterators are in scope

Doesn’t all this “moving” and “consuming” sound familiar, by the way? I’ve mentioned earlier that when we iterate over a vector with a for loop, we essentially move it “into the loop”.

As you can probably deduce by now, what really happens is that IntoIterator::into_iter is invoked on the vector. Its result — the iterator object — is then repeatedly nexted until it returns None.

In a way, a for loop like this:

for x in v {
    // body
}

is therefore nothing else but a syntactic sugar for the following expanded version:

let mut iter = IntoIterator::into_iter(v);
loop {
    match iter.next() {
        Some(x) => {
            // body
        },
        None => break,
    }
}

You can see quite clearly that v is unusable not only after the loop ends, but before it even begins. This is because it has been moved into iter — into an iterator — through an into_iter method… of IntoIterator!

Simple, huh? :)

for loop is just a syntactic sugar for an IntoIterator::into_iter invocation, followed by repeated calling of Iterator::next.

The ampersand

On a more serious note, this move isn’t something that we’d always want to happen. Fortunately, we know a way to prevent it. Rather than iterating over the vector itself, use a reference to it:

for x in &v {
    // body
}

The great thing about this syntax is that everything said above still applies, up to and including the desugaring procedure. The into_iter method is still being invoked, except that this time it is done on the reference to the collection&Vec<T> rather than Vec<T>:

// (simplified)
impl IntoIterator for &Vec<T> {
    fn into_iter(self) -> Iterator<Item=&T> { ... }
}

The result is therefore an iterator that yields references to the elements (&T), rather than elements themselves (T). And because self above is also a reference, the collection isn’t really moved anywhere, which is why we can freely access it after the loop ends.

The exact same thing happens when looping over a mutable reference:

for x in &mut v {
    // body
}

except that this time into_iter is called for &mut Vec<T>. Result is therefore of type Iterator<Item=&mut T>, enabling us to modify the elements as we go through them.

No further compiler machinery is required to support those two cases, because everything is already handled by the same trait.

The IntoIterator desugaring works the same way for collections and both immutable and mutable references to them.

What about the iter() method?

So far, we’ve talked about regular for loops, and the very imperative style of computation they represent.

If you are more inclined towards functional programming, though, you may have seen and written rather different constructs, combining various “fluent” methods into expressions such as this one:

let doubled_odds: Vec<_> = numbers.iter()
    .filter(|&x| x % 2 != 0).map(|&x| x * 2).collect();

Methods like map and filter here are called iterator adapters, and they are all defined on the Iterator trait. Not only are they very powerful and numerous, but they can also be supplemented through several third-party crates.

In order to take advantage of the adapters, however, we need to obtain an iterator for our collection first. We know that into_iter is the way loops normally do it, so in principle we could follow the same approach here:

let doubled_odds: Vec<_> = IntoIterator::into_iter(&numbers)
    .filter(|&x| x % 2 != 0).map(|&x| x * 2).collect();

To spare us the verbosity of this explicit syntax, collections normally offer an iter method which is exactly equivalent4. This method is what you will normally see in chained expressions like the one above.

v.iter() is just a shorthand for IntoIterator::into_iter(&v).

Why not both?

The last thing to note is that Rust mandates neither loops nor iterator adapters when writing code that operates on collections of elements. When optimizations are turned on in the release mode, both versions should compile to equally efficient machine code, with closures inlined and loops unrolled where necessary.

Choosing one style over the other is therefore a matter of convention and style. Sometimes the right choice may actually be a mix of both approaches, and Rust allows it without any complaints:

fn print_prime_numbers_upto(n: i32) {
    println!("Prime numbers lower than {}:", n);
    for x in (2..n).filter(|&i| is_prime(i)) {
        println!("{}", x);
    }
}

Like before, this is possible through the same for loop desugaring that involves the IntoIterator trait. In this case, Rust will simply use a no-op implementation of this trait, “converting” any existing Iterator into” itself.

Iterators themselves are also “iterables”, implementing IntoIterator::into_iter as a pass-through.

Looping around

If you want to know even more about iterators and loops in Rust, the best source at this point is probably just the official documentation. And although mastering all the iterator adapters is of course not necessary to write effective Rust code, taking a careful look at least at the collect method (and the associated FromIterator trait) is definitely helpful.


  1. The “two-semicolon” variant of the for loop doesn’t exist in Rust. Just like in Python, the equivalent is iterating over a range object, or using a regular while loop for more complex cases. 

  2. This shift is completely transparent in the loop’s body. The way it works is based on Rust’s special mechanism called the Deref coercions. Without going into too much detail (as it is way out of scope for this post), this feature allows us to treat references to objects (&T) as if they were the objects themselves (T). The compiler will perform the necessary derefencing where possible, or signal an error in case of a (rare) ambiguity. 

  3. How do we know that? It’s because into_iter takes self (rather than &self or &mut self) as its first parameter. It means that the entire object for which this method is called is moved into its body (hence the method’s name). 

  4. Curiously enough, this equivalence isn’t encoded in the type system in any way, making it technically just a convention. It is followed consistently at least in the standard library, though. 

Continue reading

str.startswith() with tuple argument

Posted on Tue 28 June 2016 in Code • Tagged with Python, strings, tuplesLeave a comment

Here’s a little known trick that’s applicable to Python’s startswith and endswith methods of str (and unicode).

Suppose you’re checking whether a string starts with some prefix:

if s.startswith('http://'):
    # totally an URL

You eventually add more possible prefixes (or suffixes) to your condition:

if s.startswith('http://') or s.startswith('https://'):
    # ...

Later on you notice the repetition and refactor it into something like this:

SCHEMES = ['http://', 'https://', 'ftp://', 'git://']
if any(s.startswith(p) for p in SCHEMES):
    # ...

or if you’re feeling extra functional:

if any(map(s.startswith, SCHEMES)):
    # ...

Turns out, however, that startswith (and endswith) support this use case natively. Rather than passing just a single string as the argument, you can provide a tuple of strings instead:

SCHEMES = ('http://', 'https://', 'ftp://', 'git://')
if s.startswith(SCHEMES):
    # ...

Either method will then check the original string against every element of the passed tuple. Both will only return True if at least one of the strings is recognized as prefix/suffix. As you can see, that’s exactly what we would previously do with any.

Somewhat surprisingly, however, the feature only works for actual tuples. Trying to pass a seemingly equivalent iterable — a list or set, for example — will be met with interpreter’s refusal:

>>> is_jpeg = filename.endswith(['.jpg', '.jpeg'])
TypeError: endswith first arg must be str, unicode, or tuple, not list

If you dig into it, there doesn’t seem to be a compelling reason for this behavior. The relevant feature request talks about consistency with the built-in isinstance function, but it’s quite difficult to see how those two are related.

In any case, this can be worked around without much difficulty:

PROTOCOLS = ('http', 'https', 'ftp', 'git')
if s.startswith(tuple(p + '://' for p in PROTOCOLS)):
    # ...

though ideally, you’d want to pack the prefixes in a tuple to begin with.

Continue reading

…or lambda?

Posted on Mon 20 June 2016 in Code • Tagged with Python, syntax, lambda, operatorsLeave a comment

a.k.a. Curious Facts about Python Syntax

In Python 3.3, a new method has been added to the str type: casefold. Its purpose is to return a “sanitized” version of the string that’s suitable for case-insensitive comparison. For older versions of Python, an alternative way that’s mostly compatible is to use the str.lower method, which simply changes all letters in the string to lowercase.

Syntax is hard

Easy enough for a compatibility shim, right? That’s exactly what I thought when I came up with this:

casefold = getattr(str, 'casefold', None) or lambda s: s.lower()

Let’s ignore for a moment the fact that for a correct handling of unicode objects in Python 2, a much more sophisticated approach is necessary. What’s rather more pertinent is that this simple code doesn’t parse:

  File "foo.py", line 42
    getattr(str, 'casefold', None) or lambda s: s.lower()
                                      ^
SyntaxError: invalid syntax

It’s not very often that you would produce a SyntaxError with code that looks perfectly valid to most pythonistas. The last time I had it happen, the explanation was rather surprising and not exactly trivial to come by.

Fortunately, there is always one place where we can definitively resolve any syntactic confusion. That place is the full grammar specification of the Python language.

It may be a little intimidating at first, especially if you’re not familiar with the ENBF notation it uses. All the Python’s language constructs are there, though, so the SyntaxError from above should be traceable to a some rule of the grammar1.

The culprit

And indeed, the offending bit is right here:

or_test: and_test ('or' and_test)*
and_test: not_test ('and' not_test)*
...

It says, essentially, that Python defines the or expression (or_test) as a sequence of and expressions (and_test). If you follow the syntax definition further, however, you will notice that and_test expands to comparisons (a < b, etc.), arithmetic expressions (x + y, etc.), list & dict constructors ([foo, bar], etc.), and finally to atoms such as literal strings and numbers.

What you won’t see along the way are lambda definitions:

lambdef: 'lambda' [varargslist] ':' test

In fact, the branch to allow them is directly above the or_test:

test: or_test ['if' or_test 'else' test] | lambdef

As you can see, the rule puts lambdas at the same syntactical level as conditional expressions (x if a else b), which is very high up. The only thing you can do with a lambda to make a larger expression is to add a yield keyword before it2, or follow it with a comma to create a tuple3.

You cannot, however, pass it as an argument to a binary operator, even if it otherwise makes sense and even looks unambiguous. This is also why the nonsensical expressions such as this one:

1 + lambda: None

will fail not with TypeError, but also with SyntaxError, as they won’t even be evaluated.

More parentheses

Savvy readers may have noticed that this phenomenon is very much reminiscent of the issue of operator precedence.

Indeed, in Python and in many other languages it is the grammar that ultimately specifies the order of operations. It does so simply by defining how expressions can be constructed.

Addition, for example, will be of lower priority than multiplication simply because a sum is said to comprise of terms that are products:

arith_expr: term (('+'|'-') term)*
term: factor (('*'|'/'|'%'|'//') factor)*

This makes operator precedence a syntactic feature, and its resolution is baked into the language parser and handled implicitly4.

We know, however, that precedence can be worked around where necessary by enclosing the operator and its arguments in a pair of parenthesis. On the syntax level, this means creating an entirely new, top-level expression:

atom: '(' [yield_expr|testlist_comp] ')' |  # parenthesized expression
       '[' [listmaker] ']' |
       '{' [dictorsetmaker] '}' |
       '`' testlist1 '`' |
       NAME | NUMBER | STRING+)

There, it is again possible to use even the highest-level constructs, including also the silly stuff such as trying to add a number to a function:

1 + (lambda: None)

This expression will now parse correctly, and produce TypeError as expected.

In the end, the resolution of our initial dilemma is therefore rather simple:

casefold = getattr(str, 'casefold', None) or (lambda s: s.lower())

  1. Such rules are sometimes called productions of the grammar, a term from computational linguistics. 

  2. Yes, yield foo is an expression. Its result is the value sent to the generator by outer code via the send method. Since most generators are used as iterables, typically no values are passed this way so the result of a yield expression is None

  3. There are also a legacy corner cases of lambdas in list/dict/etc. comprehensions, but those only apply under Python 2.x. 

  4. This saying, there are languages where the order is resolved at later stage, after the expressions have already been parsed. They usually allow the programmer to change the precedence of their own operators, as it’s the case in Haskell

Continue reading

& vs. ref in Rust patterns

Posted on Thu 02 June 2016 in Code • Tagged with Rust, pattern matching, borrowing, referencesLeave a comment

Rust is one of those nice languages with pattern matching. If you don’t know, it can be thought of as a generalization of the switch statement: comparing objects not just by value (or overloaded equality operator, etc.) but by structure:

match hashmap.get(&key) {
    Some(value) => do_something_with(value),
    None => { panic!("Oh noes!"); },
}

It doesn’t end here. As you can see above, objects can also be destructured during the match (Some(value)), their parts assigned to bindings (value), and those bindings can subsequently be used in the match branch.

Neat? Definitely. In Rust, pattern matching is bread-and-butter of not only the match statement, but also for, (if)let, and even ordinary function arguments.

Mixing in Rust semantics

For a long time, however, I was somewhat confused as to what happens when references and borrowing is involved in matching. The two “operators” that often occur there are & (ampersand) and ref.

You should readily recognize the first one, as it is used pervasively in Rust to create references (and reference types). The second one quite obviously hints towards references as well. Yet those two constructs serve very different purposes when used within a pattern.

To add to the confusion, they are quite often encountered together:

use hyper::Url;

// print query string params of some URL
let url = Url::parse(some_url).unwrap();
let query_params: Vec<(String, String)> = url.query_pairs().unwrap_or(vec![]);
for &(ref name, ref value) in &query_params {
    println!("{}={}", name, value);
}

Lack of one or the other will be (at best) pointed out to you by the compiler, along with a suggestion where to add it. But addressing problems in this manner can only go so far. So how about we delve deeper and see what it’s really about?

Part of the reference, part of the pattern

Rust is very flexible as to what value can be a subject of pattern matching. You would be very hard pressed to find anything that cannot be used within a match statement, really. Both actual objects and references to objects are perfectly allowed:

struct Foo(i32);
// ...
let foo = &Foo(42);
match foo {
    x => println!("Matched!"),
}

In the latter case, however, we aren’t typically interested in the reference itself (like above). Instead, we want to determine some facts about the object it points to:

match foo {
    &Foo(num) => println!("Matched with number {}", num),
}

As you can see, this is where the ampersand comes into play. Just like a type constructor (Some, Ok, or Foo above), the & operator informs the Rust compiler what kind of value we’re expecting from the match. When it sees the ampersand, it knows we’re looking for references to certain objects, and not for the objects themselves.

Why is the distinction between an object and its reference important, though? In many other places, Rust is perfectly happy to blur the gap between references and actual objects1 — for example when calling most of their methods.

Pattern matching, however, due to its ability to unpack values into their constituent parts, is a destructive operation. Anything we apply match (or similar construct) to will be moved into the block by default:

let maybe_name = Some(String::from("Alice"));
// ...
match maybe_name {
    Some(n) => println!("Hello, {}", n),
    _ => {},
}
do_something_with(maybe_name)

Following the typical ownership semantics, this will prevent any subsequent moves and essentially consume the value:

error: use of partially moved value: `maybe_name` [E0382]
    do_something_with(maybe_name);
                      ^~~~~~~~~~

So just like the aforementioned type constructors (Some, etc.), the ampersand operator is simply part of the pattern that we match against. And just like with Some and friends, there is an obvious symmetry here: if & was used to create the value, it needs to be used when unpacking it.

The syntax used in a pattern that destructures an object is analogous to one used by the expression which created it.

Preventing the move

Errors like the one above often contain helpful notes:

note: `(maybe_name:core::option::Option::Some).0` moved here because it has type `collections::string::String`, which is moved by default
         Some(n) => println!("Hello, {}", n),
              ^

as well as hints for resolving them:

help: if you would like to borrow the value instead, use a `ref` binding as shown:
        Some(ref n) => println!("Hello, {}", n),

Here’s where ref enters the scene.

The message tells us that if we add a ref keyword in the suggested spot, we will switch from moving to borrowing for the match binding that follows (here, n). It will still capture its value exactly as before, but it will no longer assume ownership of it.

This is the crucial difference.

Unlike the ampersand, ref is not something we match against. It doesn’t affect what values match the pattern it’s in, and what values don’t2.

The only thing it changes is how parts of the matched value are captured by the pattern’s bindings:

  • by default, without ref, they are moved into the match arms
  • with ref, they are borrowed instead and represented as references

Looking at our example, the n binding in Some(n) is of type String: the actual field type from the matched structure. By contrast, the other n in Some(ref n) is a &String — that is, a reference to the field.

One is a move, the other one is a borrow.

ref annotates pattern bindings to make them borrow rather than move. It is not a part of the pattern as far as matching is concerned.

Used together

To finish off, let’s untangle the confusing example from the beginning of this post:

for &(ref name, ref value) in &query_params {
    println!("{}={}", name, value);
}

Since we know ref doesn’t affect whether or not the pattern matches, we could just as well have something like &(a, b). And this should be quite a bit easier to read: it clearly denotes we expect a reference to a 2-tuple of simple objects. Not coincidentally, such tuples are items from the vector we’re iterating over.

Problem is, without the refs we will attempt to move those items into the loop scope. But due to the way the vector is iterated over (&query_params), we’re only borrowing each item, so this is actually impossible. In fact, it would be a classic attempt to move out of a borrowed context.

It is also wholly unnecessary. The only thing this loop does is printing the items out, so accessing them through references is perfectly fine.

And this is exactly what the ref operator gives us. Adding the keyword back, we will switch from moving the values to just borrowing them instead.

To sum up

  • & denotes that your pattern expects a reference to an object. Hence & is a part of said pattern: &Foo matches different objects than Foo does.

  • ref indicates that you want a reference to an unpacked value. It is not matched against: Foo(ref foo) matches the same objects as Foo(foo).


  1. The technical term for this is a Deref coercion

  2. We can say that it doesn’t affect the satisfiability (or conversely, refutability) of the pattern. 

Continue reading

Mock.configure_mock fix for Python

Posted on Sat 07 May 2016 in Code • Tagged with Python, mock, patchingLeave a comment

Python’s mocking library is rather uncomplicated. Most of what it does is creating mock objects: veritable sponges that absorb every interaction with any code that we pass them to.

This simplicity is also surfaced in the API, especially in the main part of it — the mock.Mock constructor:

some_mock = mock.Mock(url='http://example.com')
assert some_mock.url == 'http://example.com'

Any arguments that we pass there become attributes on the resulting mock object. This is really useful when patching, because it allows us to completely specify the replacement object within a @mock.patch decorator:

@patch.object(requests, 'get', new=Mock(return_value=Mock(status_code=400)))
def test_404(self):
    self.assertRaises(NotFoundError):
        do_stuff()

You have to keep in mind, however, that the mock.Mock class also has some constructor arguments of its own. For this reason, there exists some potential for name collision: some of the Mocks own arguments may have the same names as the attributes we’d like to set on the mock object:

some_mock = mock.Mock(name="John Doe")  # doesn't set the `name` attribute
assert some_mock.name == "John Doe"  # blows up!

Here, the name argument is inherent to the Mock class. Its constructor will interpret it in a special way, and so it won’t set a name attribute on the resulting mock. Other possible culprits include the spec and wraps parameters, both of which have relatively common names that we may want to use as object attributes1.

Collision avoidance

It’s trivial to fix the issue, of course:

some_mock = mock.Mock()
some_mock.name = "John Doe"
assert sock_mock.name == "John Doe"

but this approach has a downside. Creating and configuring a mock is no longer a single expression, which means we cannot use it with patchers as easily as before:

@patch.object(foo, 'bar')
def test_something(self, mock_bar):
    mock_bar.name = "John Doe"
    # (...rest of the test...)

We can either configure the mock after patching, like above, or perhaps introduce some utility functions to be called inside the @patch decorator.

The almost-there method

In any case, this is somewhat disappointing. And it is even more so when we discover that there is a method called configure_mock which looks like it was designed to solve this very issue. Its arguments are always interpreted as attributes of the mock: it has no “special” or “reserved” names. Indeed, this method is what allows us to actually write the mock setup as a single expression:

some_mock.configure_mock(name="John Doe")

Problem is, this expression returns None.

Yes, configure_mock returns nothing.
Or in other words, it doesn’t return anything.
In fact, it has no return statement whatsoever.

Most importantly, it doesn’t have the return self line that’d enable us to write this:

some_mock = mock.Mock().configure_mock(name="John Doe")

Well, that is quite a let-down.

Fixing it

But hey, this is Python! Shortcomings like that don’t necessarily mean we have to fork whole libraries. Let’s just add the missing return, shall we?

from mock import Mock as _Mock

class Mock(_Mock):
    def configure_mock(self, **kwargs):
        super(Mock, self).configure_mock(**kwargs)
        return self  # <-- there!

Whew, that was quick!

…Alright, that’s actually the whole fix, but it’s close. To complete it, we need to apply the same treatment to three more Mock classes: MagicMock, NonCallableMock, and NonCallableMagicMock.

A complete solution can be seen in this gist.


  1. Collision may also occur with mock.patch constructs. The most likely offender there is probably the new parameter

Continue reading

Source code of a Python lambda

Posted on Tue 19 April 2016 in Code • Tagged with Python, functions, AST, bytecodeLeave a comment

…or: The Most Hideous Hack I’ve (Almost) Done

In callee, the argument matcher library for Python that I released recently, there is this lovely TODO note for a seemingly simple feature. When using the Matching construct with a simple lambda predicate:

mock_foo.assert_called_with(Matching(lambda x: x % 2 == 0))

it would be great to see its code in the error message if the assertion fails. Right now it’s just going to say something like <Matching <function <lambda> at 0x7f5d8a06eb18>>. Provided you don’t possess a supernatural ability of dereferencing pointers in your head, this won’t give you any immediate hint as to what went wrong. Wouldn’t it be nice if it read as, say, <Matching \x: x % 2> instead?1

So I thought: why not try and implement such a mechanism? This is Python, after all — a language where you can spawn completely new classes at runtime, walk the stack backwards (or even forward) and read the local variables, or change the behavior of the import system itself. Surely it would be possible — nay, easy — to get the source code of a short lambda function, right?

Boy, was I wrong.

Make no mistake, though: the task turned out to be absolutely doable, at least in the scope I wanted it done. But what would you think of a solution that involves not just the usual Python hackery, but also AST inspection, transformations of the source code as text, and bytecode shenanigans?…

The code, all the code, and… much more than the code

Let’s start from the beginning, though. Here’s a short lambda function, the kind of which we’d like to obtain the source code of:

is_even = lambda x: x % 2 = 0

If the documentation for Python standard library is to be believed, this should be pretty easy. In the inspect module, there is a function called no different than getsource. For our purposes, however, getsourcelines is a little more convienient, because we can easily tell when the lambda is too long:

def get_short_lambda_source(lambda_func):
    try:
        source_lines, _ = inspect.getsourcelines(lambda_func)
    except IOError:
        return None
    if len(source_lines) > 1:
        return None
    return source_lines[0].strip()

Of course if you programmed in Python for any longer period of time, you know very well that the standard docs are not to be trusted. And it’s not just that the except clause should also include TypeError, because it will be thrown when you try to pass any of the Python builtins to getsourcelines.

More important is the ambiguity of what does “source lines for an object” actually mean. “Source lines containing the object definition” would be much more accurate, and this seemingly small distinction is rather crucial here. Passing a lambda function to either getsourcelines or getsource, we’ll get its source and everything else that the returned lines included.

That’s right. Say hello to the complete is_even = assignment, and the entire assert_called_with invocation! And in case you are wondering: yes, the result will also include any end-of-line comments. No token left behind!

Trim left

Clearly this is more than we’ve bargained for. Maybe there is a way to strip away the unnecessary cruft? Python does know how to parse itself, after all: the standard ast module is a manifestation of this knowledge. Perhaps we can use it to retrieve the lambda AST node in order to turn it — and just it — back into Python code?…

def get_short_lambda_ast_node(lambda_func):
    source_text = get_short_lambda_source(lambda_func)
    if source_text:
        source_ast = ast.parse(source_text)
        return next((node for node in ast.walk(source_ast)
                     if isinstance(node, ast.Lambda)), None)

But as it turns out, getting the source text back this way is only mostly possible.

See, every substantial AST node — which is either an expression (ast.expr) or a statement (ast.stmt) — has two common attributes: lineno and col_offset. When combined, they point to a place in the original source code where the node was parsed from. This is how we can find out where to look for the definition of our lambda function.

Looks promising, right? The only problem is we don’t know when to stop looking. That’s right: nodes created by ast.parse are annotated with their start offset, but not with length nor the end offset. As a result, the best we can do when it comes to carving out the lambda source from the very first example is this:

lambda x: x % 2 == 0))

So close! Those hanging parentheses are evidently just taunting us, but how can we remove them? lambda is basically just a Python expression, so in principle it can be followed by almost anything. This is doubly true for lambdas inside the Matching construct, as they may be a part of some larger mock assertion:

mock_foo.assert_called_with(Matching(lambda x: x % 2 == 0), Integer() & GreaterThan(42))

Here, the extraneous suffix is the entirety of ), Integer() & GreaterThan(42)), quite a lot of more than just )). And that’s of course nowhere near the limit of possiblities: for one, there may be more lambdas in there, too!

Back off, slowly

It seems, however, that there is one thing those troublesome tails have in common: they aren’t syntactically valid.

Intuitively, a lambda node nested within some other syntactical constructs will have their closing fragments (e.g. )) appear somewhere after its end. Without the corresponding openings (e.g. Matching(), those fragments won’t parse.

So here’s the crazy idea. What we have is invalid Python, but only because of some unspecified number of extra characters. How about we just try and remove them, one by one, until we get something that is syntactically correct? If we are not mistaken, this will finally be our lambda and nothing else.

The fortune favors the brave, so let’s go ahead and try it:

# ... continuing get_short_lambda_source() ...

source_text = source_lines[0].strip()
lambda_node = get_short_lambda_ast_node(lambda_func)

lambda_text = source_text[lambda_node.col_offset:]
min_length = len('lambda:_')  # shortest possible lambda expression
while len(lambda_text) > min_length:
    try:
        ast.parse(lambda_text)
        return lambda_text
    except SyntaxError:
        lambda_text = lambda_text[:-1]
return None

Considering that we’re basically taking lessons from the dusty old tomes in the Restricted Section of Hogwarts library, the magic here looks quite simple. As long as there is something that can pass for a lambda definition, we try to parse it and see if it succeeds. The line that says except SyntaxError: is obviously not something for the faint of heart, but at least we are specifying what exception we anticipate catching.

And the kicker? It works. By that I mean it doesn’t return garbage results for a few obvious and not so obvious test cases, which is already more than you would normally expect from hacks of this magnitude. All the lambdas defined until this paragraph, for example, can have their source code extracted without issue.

Just one more thing

So… victory? Not quite. Astute readers may recall my promise of some bytecode arcana, and now’s the time for it.

Despite the initial success of our gradual, character dropping approach, there are cases where it doesn’t produce the correct result. Consider, for example, a lambda definition that’s nestled within a tuple2:

>>> x = lambda _: True, 0
>>> get_short_lambda_source(x[0])
lambda _: True, 0

We would of course expect the result to be lambda _: True, without a comma or zero.

Unfortunately, here’s where our earlier assumption fails rather spectacularly. The line of code extracted from AST is syntactically valid even with the extra characters. As a result, ast.parse succeeds too early and returns an incorrect definition. It should have been of a lambda contained within a tuple, but tuple is apparently what the lambda returns.

You may say that this is the sharp end of a narrow edge case, and anyone who defines functions like that deserves all the trouble they get. And sure, I wouldn’t mind if we just threw hands in the air and told them we’re simply unable to retrieve the source here. But my opinion is that it doesn’t justify serving them obviously wrong results!

A halting problem

Not if we can help it, anyway. Have a look at the expected source code and the one we’ve extracted, side by side:

lambda _: True
lambda _: True, 0

The second line isn’t just longer: it is also doing more. It isn’t just defining a lambda; it defines it, conjures up a constant 0, and then packs them both into a tuple. That’s at least two additional steps compared to the original.

Those steps have a more precise name, too: they are the bytecode instructions. Every piece of Python source is compiled to a binary bytecode before it’s executed, because the interpreter can only work with this representation. Compilation typically happens when a Python module is first imported, producing a .pyc file corresponding to its .py file. Subsequent imports will simply reuse the cached bytecode.

Moreover, any function or class object has its bytecode accessible (read-only) at runtime. There is even a dedicated data type to hold it — called simply code — with a buffer of raw bytes under one of its attributes.

Finally, the bytecode compiler itself is also available to Python programs as a built-in compile function. You don’t see it used as often as its counterparts eval and exec (which hopefully are a rare sight themselves!), but it taps into the same internal machinery of Python.

So how does it all add up? The idea is, basically, to cross-check the alleged source code of the lambda with its own bytecode. Any junk that’s still left to trim — even if syntactically valid — will surface as a divergence after compilation. Thus we can simply continue dropping characters until the bytecodes match:

lambda_text = source_text[lambda_node.col_offset:]
lambda_body_text = source_text[lambda_node.body.col_offset:]
min_length = len('lambda:_')  # shortest possible lambda expression
while len(lambda_text) > min_length:
    try:
        code = compile(lambda_body_text, '<unused filename>', 'eval')
        if len(code.co_code) == len(lambda_func.__code__.co_code):
            return lambda_text
    except SyntaxError:
        pass
    lambda_text = lambda_text[:-1]
    lambda_body_text = lambda_body_text[:-1]
return None

Okay, maybe not the exact bytes3, but stopping at the identical bytecode length is good enough a strategy. As an obvious bonus, compile will also take care of detecting syntax errors in the candidate source code, so we don’t need the ast parsing anymore.

That escalated quickly!

Believe it or not, but there aren’t any more objections to this solution, You can view it in its glorious entirety by looking at this gist.

Does it mean it is also making its cameo in the callee library?…

No, I’m afraid not.

Normally, I’m not the one to shy away from, ahem, bold solutions to tough problems. But in this case, the magnitude of hackery required is just too great, the result not satisfactory enough, the feature’s priority isn’t really all that high, and the maintenance burden it’d introduce is most likely too large.

In the end, it was great fun figuring it out: yet another example of how you can fiddle with Python to do basically anything. Still, we must not get too preoccupied with whether or not we can as to forget if we should.


  1. Backslash (\) is how lambda functions are denoted in Haskell. We want to be short and sweet, so it feels like a natural choice. 

  2. This isn’t an actual snippet from a Python REPL, because inspect.getsourcelines requires the object to be defined in a .py file. 

  3. Why we won’t always get an identical bytecode? The short answer is that some instructions may be swapped for their approximate equivalents.
    The long answer is that with compile, we aren’t able to replicate the exact closure environment of the original lambda. When a function refers to an free variable (like foo in lambda x: x + foo), it is its closure where the value for that variable comes from. For ad-hoc lambdas, this is typically the local scope of its outer function.
    Code produced by compile, however, isn’t associated with any such local scope. All free names are thus assumed to refer to global variables. Because Python uses different bytecode instructions for referencing local and global names (LOAD_FAST vs LOAD_GLOBAL), the result of compile may differ from a piece of bytecode produced in the regular manner. 

Continue reading