Karol Kuczmarski's Blog

Taking string arguments in Rust

Posted on Tue 24 December 2019 in Code • Tagged with Rust, strings, arguments, borrowing, ownership

Strings of text seem to always be a complicated topic when it comes to programming. This counts double for low-level languages which expose the programmer to the full complexity of memory management and allocation.

Rust is, obviously, one of those languages. Strings in Rust are therefore represented using two distinct types: str (the string slice) and String (the owned/allocated string). Learning how to juggle those types is something you need to do very early if you want to be productive in the language.

But even after you’ve programmed in Rust for some time, you may still trip on some more subtle issues with string handling. In this post, I will concentrate on just one common task: writing a function that takes a string argument. We’ll see that even there, we can encounter a fair number of gotchas.

Just reading it

Let’s start with a simple case: a function which merely reads its string argument:

fn hello(name: &str) {
    println!("Hello, {}!", name);
}

As you’re probably well aware, using str rather than String is the idiomatic approach here. Because a &str reference is essentially an address + length, it can point to any string wheresoever: a 'static literal, a heap-allocated String, or any portion or substring thereof:

hello("world");
hello(&String::from("Alice"));
hello(&"Dennis Ritchie"[0..6]);

Contrast this with an argument of type &String:

fn hello(name: &String) {
    println!("Hello, {}!", name);
}

which mandates an actual, full-blown String object:

hello(&String::from("Bob"));
// (the other examples won't work)

There are virtually no circumstances when you would want to do this, as it potentially forces the caller to needlessly put the string on the heap. Even if you anticipate all function calls to involve actual String objects, the automatic Deref coercion from &String to &str should still allow you to use the more universal, str-based API.
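To make the coercion concrete, here is a minimal sketch. The `greeting` helper is a hypothetical variant of `hello` that returns the text instead of printing it, which makes the result easier to inspect; the `Rc` case is only there to show that coercions chain.

```rust
use std::rc::Rc;

// Hypothetical helper (not from the post): same signature idea as `hello`,
// but it returns the greeting instead of printing it.
fn greeting(name: &str) -> String {
    format!("Hello, {}!", name)
}

fn main() {
    let owned = String::from("Alice");
    let shared = Rc::new(String::from("Bob"));

    // &String coerces to &str through String's Deref impl.
    println!("{}", greeting(&owned));
    // Coercions chain: &Rc<String> -> &String -> &str.
    println!("{}", greeting(&shared));
    // A substring of an owned string needs no new allocation.
    println!("{}", greeting(&owned[..3]));
}
```

None of these calls requires the caller to build a new String just to satisfy the signature.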

Hiding the reference

If rustc can successfully turn a &String into &str, then perhaps it should also be possible to simply use String when that’s more convenient?

In general, this kind of “reverse Deref” doesn’t happen in Rust outside of method calls with &self. It seems, however, that it would sometimes be desirable; one reasonable use case involves chains of iterator adapters, most importantly map and for_each:

let strings: Vec<String> = vec!["Alice".into(), "Bob".into()];
strings.into_iter().for_each(hello);

Since the compiler doesn’t take advantage of Deref coercions when inferring closure types, their argument types have to match exactly. As a result, we often need explicit |x| foo(x) closures which suffer from poorer readability in long Iterator or Stream-based expressions.
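As a rough illustration of that workaround, the sketch below uses a hypothetical `greeting` function (a value-returning stand-in for `hello`) and a hypothetical `greet_all` wrapper. The explicit closure is what gives the compiler a call site where the coercion can happen.

```rust
// Hypothetical stand-in for `hello` that returns its output.
fn greeting(name: &str) -> String {
    format!("Hello, {}!", name)
}

// Hypothetical wrapper, used only to demonstrate the closure.
fn greet_all(names: &[String]) -> Vec<String> {
    // `names.iter().map(greeting)` would not compile: the iterator yields
    // `&String`, and a bare function name gets no Deref coercion.
    // Inside the closure body, the call site coerces `&String` to `&str`.
    names.iter().map(|name| greeting(name)).collect()
}

fn main() {
    let names = vec!["Alice".to_string(), "Bob".to_string()];
    for line in greet_all(&names) {
        println!("{}", line);
    }
}
```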

We can make the above code work — and also retain the ability to make calls like hello("Charlie"); — by using one of the built-in traits that generalize over the borrowing relationships. The one that works best for accepting string arguments is called AsRef¹:

fn hello<N: AsRef<str>>(name: N) {
    println!("Hello, {}!", name.as_ref());
}

Its sole method, AsRef::as_ref, returns a reference to the trait’s type parameter. In the case above, that reference will obviously be of type &str, which circles back to our initial example, one with a direct &str argument.

The difference is, however, that AsRef<str> is implemented for all interesting string types — both in their owned and borrowed versions. This obviates the need for Deref coercions and makes the API more convenient.
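A small sketch of what this buys us, again with a hypothetical value-returning `greeting` in place of `hello`: the same generic function can be passed by name to an iterator adapter and still be called with literals or borrowed strings.

```rust
// Hypothetical variant of the generic `hello` that returns its output.
fn greeting<N: AsRef<str>>(name: N) -> String {
    format!("Hello, {}!", name.as_ref())
}

fn main() {
    let names: Vec<String> = vec!["Alice".into(), "Bob".into()];

    // The bare function name works here: N is inferred as String.
    let greetings: Vec<String> = names.into_iter().map(greeting).collect();
    println!("{:?}", greetings);

    // Literals and borrowed Strings are accepted as well.
    println!("{}", greeting("Charlie"));
    let dana = String::from("Dana");
    println!("{}", greeting(&dana));
}
```

The trade-off is that the function is now generic, so it gets monomorphized once per argument type it is called with.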

Own it

Things get a little more complicated when the string parameter is needed for more than just reading. For storage and potential mutation, a &str reference is not enough: you need an actual, full-blown String object.

Now, you may think this is not a huge obstacle. After all, it’s pretty easy to “turn” &str into a String:

struct Greetings {
    names: Vec<String>,
}

impl Greetings {
    // Don't do this!
    pub fn hello(&mut self, name: &str) {
        // The hidden copy: a new String is allocated behind the caller's back.
        self.names.push(name.to_owned());
    }
}

But I strongly advise against this practice, at least in public APIs. If you expose such a function to your users, you are essentially tricking them into thinking their input will only ever be read, not copied, which has implications for both performance and memory usage.

Instead, if you need to take ownership of the resulting String, it is much better to indicate this in the function signature directly:

pub fn hello(&mut self, name: String) {
    self.names.push(name);
}

This shifts the burden of creating the String onto the caller, but that’s not necessarily a bad thing. On their side, the added boilerplate can be pretty minimal:

let mut greetings = Greetings::new();
greetings.hello(String::from("Dylan"));  // uhm...
greetings.hello("Eva".to_string());      // somewhat better...
greetings.hello("Frank".to_owned());     // not too bad
greetings.hello("Gene".into());          // good enough

while clearly indicating where the memory allocation happens.
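Put together, the owning version might look like the sketch below. The `new` constructor and the `names` accessor are assumptions added for illustration; the post only shows the `hello` method.

```rust
struct Greetings {
    names: Vec<String>,
}

impl Greetings {
    // Hypothetical constructor, assumed by the calling code above.
    pub fn new() -> Self {
        Greetings { names: Vec::new() }
    }

    // Taking `String` by value makes the ownership transfer visible
    // in the signature.
    pub fn hello(&mut self, name: String) {
        self.names.push(name);
    }

    // Hypothetical accessor, added only so the result can be inspected.
    pub fn names(&self) -> &[String] {
        &self.names
    }
}

fn main() {
    let mut greetings = Greetings::new();
    greetings.hello(String::from("Dylan"));
    greetings.hello("Gene".into());

    // A caller that already owns a String just moves it in:
    // no additional allocation or copy takes place.
    let hank = String::from("Hank");
    greetings.hello(hank);

    println!("{:?}", greetings.names());
}
```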

¹ It is also idiomatically used for functions taking Path parameters, i.e. AsRef<Path>.

Yes, the npm ecosystem is at fault

Posted on Tue 27 November 2018 in Programming • Tagged with npm, Javascript, open source, package manager, security

Even if you are like me and don’t use any (server-side) Javascript, or node.js, or npm, it may feel like every other week there is news about yet another npm snafu.

The latest incident could be the most severe one yet, at least from the security standpoint. In a nutshell, the owner of event-stream — a popular package even by npm standards — had transferred ownership of it to another person. The new owner would then publish a new version of the package which contained a targeted backdoor, intended to steal large amounts of Bitcoin from the users of one particular wallet provider.

Quite a step up from the left-pad fiasco, isn’t it?…

As usually happens with any major incident, lots of people are eager to put the blame on someone, or something, as quickly as possible. Rather unsurprisingly, the original owner of event-stream came up as a pretty obvious target. Without his abdication of ownership rights, the argument goes, the entire ordeal wouldn’t have happened in the first place. More broadly, as a maintainer of a popular package, he owes it to the community to make sure it remains available and safe to use, now and for the foreseeable future.

But if those are the responsibilities, then what are the rewards?… Well, in the author’s own words, “you get literally nothing from maintaing a popular package”. Looks like once the fun of it wears off, it’s just all work and no play, for little to no tangible benefit.

This problem is of course not specific to Javascript or npm. Here’s, for example, a good take on the issue from the creator of Clojure.

However, this doesn’t mean every other package manager is equally susceptible to the sort of issues that have shaken npm time and time again. To say the same thing could’ve happened to Python/Ruby/Go/Haskell/etc. is vacuous. While factually true, it’s an instance of the Fallacy of Grey: a claim that because nothing is ideal, everything must be equally flawed.

In reality, the Javascript ecosystem facilitated by npm is singularly vulnerable to problems of this kind. It follows directly from how granular the npm packages are, and how wide and deep their dependency trees get.

Indeed, it would be quite a fascinating exercise to quantify the difference numerically, by comparing the average size of a dependency graph between npm, PyPI, RubyGems, Hackage, and so on. It’s rather obvious npm would come out as the “winner”, likely with Rust’s Cargo not very far behind.

Apparently, this is not unintentional either: the node.js community seems to like it this way. But as yesterday’s incident has exposed, this state of affairs relies on the distributed conscientiousness of very many parties (in this case, npm package authors). When one inevitably falters, the impact reaches far and wide, rippling through the massively intertwined npm registry.

While it may be satisfying, then, to blame the package owner as the immediate culprit, it’s way more productive to consider how the damage could have been mitigated.

We can, for example, look at the unimaginably dense dependency graph of npm (if anyone ever dared to visualize it in its entirety) and realize it shouldn’t really have so many edges. Had it been a little sparser, more similar to the graphs of PyPI or RubyGems, then removing (i.e. compromising) a single leaf node would’ve had a proportionally smaller impact.

So yes, it’s true that all package managers are exposed to the risk of compromised packages. It is, however, something that can be estimated and compared — and unfortunately, it is clear that for npm, this risk is currently the largest.

A Haskell retrospective

Posted on Sat 18 August 2018 in Programming • Tagged with Haskell, functional programming, type systems, Facebook

Approximately a year ago, I had the opportunity to work on Sigma — a large, distributed system that protects Facebook users from spam and other kinds of abuse.

One reason it was a pretty unique experience is that Sigma is almost entirely a Haskell codebase. It was the first time I got to work with the language in a professional setting, so I was eager to see how it performs in a real-world, production-grade application.

In this (rather long) post, I’ll draw on this experience and highlight Haskell’s notable features from a practical, engineering standpoint. In other words, I’ll be interested in how much it helps with solving actual problems that arise in the field of software development & maintenance.

Haskell Who?

Before we start, however, it seems necessary to clarify what “Haskell” we are actually talking about.

Granted, this may be a little surprising. From a far-away vantage point, Haskell is typically discussed as a rather uniform language, and it is often treated as synonymous with functional programming in general.

But if you look closer, that turns out to be a bit of a misrepresentation. In reality, Haskell is a complex manifold of different components, some of which can be thought of as their own sublanguages. Roughly speaking, Haskell, as it’s used in the industry and in the OSS world today, should be thought of as a cake with at least the following layers:

The base Haskell language, as defined by the Haskell ‘98 and 2010 reports. At least in theory, this is the portable version of the language that any conforming compiler is supposed to accept. In practice, given the absolute monopoly of GHC, it is merely a theoretical base that needs to be significantly augmented in order to reach some level of practical usability.
A bunch of GHC extensions that are widely considered mandatory for any real-world project. Some, like TupleSections or MultiParamTypeClasses, are mostly there to fix some surprising feature gaps that would be more confusing if you had worked around them instead. Others, like GADTs or DataKinds, open up completely new avenues for type-level abstractions.
A repertoire of common third-party libraries with unique DSLs, like conduit, pipes, or lens. Unlike many “regular” packages that merely bring in some domain-specific API, these fundamental libraries shape both the deeper architecture and the surface-level look & feel of any Haskell codebase that uses them.
A selection of less common extensions which are nevertheless encountered in Haskell code with some regularity.
Template Haskell, the language for compile-time metaprogramming whose main application is probably generics.
To be clear, neither “template” nor “generics” have anything to do with the usual meanings of those terms in C++ and Java/C#/Go¹. Rather, it refers to a kind of AST-based “preprocessing” that allows Haskell code to operate on the generic structure of user-defined types: their constructors, parameters, and record fields².
Direct use of TH in application code is extremely rare, but many projects rely on libraries which utilize it behind the scenes. A great example would be Persistent, a database interface library where the ORM uses Template Haskell to construct record types from a DB schema at compile time.

There is a language in my type system

What’s striking about this ensemble of features and ideas is that most of them don’t seem to follow from the ostensible premise of the language: that it is functional, pure / referentially transparent, and non-strict / lazily evaluated. Instead, they are mostly a collection of progressively more sophisticated refinements and applications of Haskell’s type system.

This singular focus on type theory, especially in recent years³, is probably why many people in the wider programming world think it is necessary to grok advanced type system concepts if you even want to dabble in functional programming.

That is, of course, patently untrue⁴. Some features of a strong static type system are definitely useful to have in a functional language. You can look at Elm to see how awkward things become when you deprive an FP language of its typeclasses and composition sugar.

But when the focus on type systems becomes too heavy, the concepts keep piling up and the language becomes increasingly impenetrable. Eventually, you may end up with an ecosystem where the recommended way to implement an HTTP API is to call upon half a dozen compiler extensions in order to specify it as one humongous type.

But hey, isn’t it desirable to have this kind of increased type safety?

In principle, the answer would of course be yes. However, the price we pay here is in the precious currency of complexity, and it often turns out to be way too high. When libraries, frameworks, and languages get complicated and abstract, it’s not just safety and/or productivity that can (hopefully) increase — it is also the burden on developers’ thought processes. While the exact threshold of diminishing or even negative returns is hard to pinpoint, it can definitely be reached even by the smartest and most talented teams. Add in the usual obstacles of software engineering — shifting requirements, deadlines, turnover — and you may encounter it much sooner than you think.

For some, this is a sufficient justification to basically give up on type systems altogether. And while I’d say such a knee-jerk reaction is rather excessive and unwarranted, letting your typing regime grow in boundless complexity is at least equally harmful. Both approaches are just too extreme to stand the test of practicality.

The legacy of bleeding edge

In other words, Haskell is hard and this does count as one of its serious problems. This conclusion isn’t exactly novel or surprising, even if some people would still argue with it.

Suppose, however, that we have somehow caused this issue to disappear completely. Let’s say that through some kind of divine intervention, it was made so that the learning curve of Haskell is no longer a problem for the majority of programmers. Maybe we found a magic lamp and, for lack of better ideas, we wished that everyone be as proficient in applicative parsers as they are in inheritance hierarchies.

Even in this hypothetical scenario, I posit that the value proposition of Haskell would still be a tough sell.

There is this old quote from Bjarne Stroustrup (creator of C++) where he says that programming languages divide into those everyone complains about, and those that no one uses.
The first group consists of old, established technologies that managed to accrue significant complexity debt through years and decades of evolution. All the while, they’ve been adapting to the constantly shifting perspectives on what the best industry practices are. Traces of those adaptations can still be found today, sticking out like a leftover appendix or residual tail bone, or like the built-in support for XML in Java.

Languages that “no one uses”, on the other hand, haven’t yet passed the industry threshold of sufficient maturity and stability. Their ecosystems are still cutting edge, and their future is uncertain, but they sometimes champion some really compelling paradigm shifts. As long as you can bear with things that are rough around the edges, you can take advantage of their novel ideas.

Unfortunately for Haskell, it manages to combine the worst parts of both of these worlds.

On one hand, it is a surprisingly old language, clocking more than two decades of fruitful research around many innovative concepts. Yet on the other hand, it bears the signs of a fresh new technology, with relatively few production-grade libraries, scarce coverage of some domains (e.g. GUI programming), and not too many stories of commercial successes.

There are many ways to do it

Nothing shows the problems of Haskell’s evolution over the years better than the various approaches to handling strings and errors that it now has.⁵

String theory

Historically, String has been defined as a list of Characters, which is normally denoted as the [Char] type. The good thing about this representation is that many string-based algorithms can simply be written using just the list functions.

The bad thing is that Haskell lists are the so-called cons lists. They consist of a single element (called the head), followed by another list of the remaining elements (called the tail). This makes them roughly equivalent to what data structure theory calls a singly-linked list, a rarely used construct that has a number of undesirable characteristics:

linear time (O(n)) for finding a specific element in the list
linear time for finding an element with a specific index in the list
linear time for insertion in the middle of the list
poor cache locality due to scattered allocations of list nodes⁶

On top of that, keeping only a single character inside each node results in a significant waste of memory.
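To see where those costs come from, here is a rough sketch of a character cons list. It is written in Rust rather than Haskell and is only an approximation: the `CharList` type and its methods are hypothetical, and the sketch mirrors the shape of [Char], not its laziness.

```rust
// Hypothetical cons list of characters: each node holds one character
// and a pointer to the rest of the list.
enum CharList {
    Nil,
    Cons(char, Box<CharList>),
}

impl CharList {
    // Builds the list back to front, allocating one heap node per character.
    fn from_text(text: &str) -> CharList {
        text.chars()
            .rev()
            .fold(CharList::Nil, |tail, c| CharList::Cons(c, Box::new(tail)))
    }

    // Indexing has to walk the chain node by node, hence linear time.
    fn nth(&self, index: usize) -> Option<char> {
        let mut node = self;
        let mut remaining = index;
        loop {
            match node {
                CharList::Nil => return None,
                CharList::Cons(c, tail) => {
                    if remaining == 0 {
                        return Some(*c);
                    }
                    remaining -= 1;
                    node = &**tail;
                }
            }
        }
    }
}

fn main() {
    let list = CharList::from_text("Haskell");
    println!("{:?}", list.nth(3));
    println!("{:?}", list.nth(42));
    // A packed string keeps the same text in one contiguous buffer.
    println!("{} bytes", "Haskell".len());
}
```

Every character costs a separate allocation plus a pointer, and reaching the n-th one means following n pointers that may be scattered across the heap.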

Given those downsides, it isn’t very surprising that virtually no serious Haskell program uses Strings for any meaningful text processing. The community-accepted replacement is the text package, whose implementation stores strings inside packed arrays, i.e. just as you would expect. As a result, Haskell has at least two main types of “strings” — or even three, since Text has both lazy and strict variants.

That’s not all, however: there is also the bytestring package. Although technically it implements generic byte buffers, its API has been pretty rich and enticing. As a result, many other packages would rather use ByteStrings directly in their interfaces than incur the conversions to and from Text.
And just like in the case of Text, separate lazy and strict variants of ByteString are also available. But unlike Text, byte strings also have Word8 and Char8 versions, where the latter is designed to handle legacy cases of ASCII-exclusive text support.

Well, I hope you kept count of all these types! I also hope you can memorize the correct way of converting between them, because it’s commonplace to see them used simultaneously. This may sometimes happen even within the same library, but it definitely occurs in application code that utilizes many different dependencies. What it usually results in are numerous occurrences of something like Text.pack . foo . Text.unpack, with conversion functions copiously sprinkled in to help win in the Type Tetris.

Errors and how to handle them

A somewhat similar issue applies to error handling. Over the years, Haskell has tried many approaches to this problem, often mixing techniques that are very rarely found in a single language, like exceptions combined with result types.

Nowadays, there is some consensus about those mistakes of the past, but the best we’ve got is their deprecation: the current version of GHC still supports them all.

What are all those techniques? Here’s an abridged list:

the error function, terminating the program with a message (which is obviously discouraged)
the fail method of the Monad typeclass (which is now deprecated and moved to MonadFail)
the MonadError class with the associated ErrorT transformer, now deprecated in favor of…
a different MonadError class, with ExceptT as the new transformer
exceptions in the IO monad, normally raised by the standard I/O calls to signal abnormal conditions and errors; however, libraries and application code are free to also throw them and use them for their own error handling
the Either sum type / monad, which is essentially a type-safe version of the venerable return codes

If you really stretched the definition of error handling, I could also imagine counting Maybe/MaybeT as yet another method. But even without it, that’s half a dozen distinct approaches which you are likely to encounter in the wild in one form or another.

Implicit is better than explicit

The other kind of troublesome legacy of Haskell relates to the various design choices in the language itself. They reflect ideas straight from the time they were conceived in, which doesn’t necessarily agree with the best engineering practices as we understand them now.

Leaky modules

Take the module system, for example.

Today, it is rather uncontroversial that the purpose of splitting code into multiple submodules is to isolate it as much as possible and prevent accidental dependencies. The benefit of such isolation is better internal cohesion for each module. This can simplify testing, improve readability, foster simplicity, and reduce cognitive burden on the maintainers.

Contemporary languages help achieve this goal by making inter-module dependencies explicit. If you want to use a symbol (a function, a class) from module A inside another module B, you typically have to both:

declare it public in module A
explicitly import its name in module B

The first step helps to ensure that the API of module A is limited and tractable. The second step does the same to the external dependencies of module B.
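For comparison, this is roughly what the two steps look like in Rust, one of the contemporary languages in question. The `geometry` module and its functions are made up for the example.

```rust
mod geometry {
    // Step 1: only items marked `pub` are visible outside the module.
    pub fn area(width: u32, height: u32) -> u32 {
        clamp_side(width) * clamp_side(height)
    }

    // Private helper: an implementation detail, not part of the module's API.
    fn clamp_side(side: u32) -> u32 {
        side.min(1_000)
    }
}

// Step 2: the dependency is named explicitly at the import site.
use geometry::area;

fn main() {
    println!("{}", area(3, 4));
    // geometry::clamp_side(3); // would not compile: `clamp_side` is private
}
```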

Unfortunately, Haskell requires neither of those steps. In fact, it encourages precisely the opposite of well-defined, self-contained modules, all by virtue of its default behaviors:

the default module declaration (module Foo where ...) implicitly declares every symbol defined in the module Foo as public and importable by others
the default import statement (import Foo) brings in every public symbol from the module Foo into the global namespace of the current module

In essence, this is like putting public on each and every class or method that you’d define in a Java project, while simultaneously using nothing but wildcard (star) imports. In very short order, you will end up with a project where everything depends on everything else, and nothing can be developed in isolation.

Namespaces are apparently a bad idea

Thankfully, it is possible to avoid this pitfall by explicitly declaring both your exported and imported symbols:

-- Foo.hs --
module Foo ( foo, bar ) where

foo = ...
bar = ...
baz = ...  -- not exported

-- Bar.hs --
import Foo (foo)
-- `bar` is inaccessible here, but `foo` is available

But while this helps fight the tangle of dependencies, it still results in cluttering the namespace of any non-trivial module with a significant number of imported symbols.

In many other languages, you can instead import the module as a whole and only refer to its members using qualified names. This is possible in Haskell as well, though it requires yet another variant of the import statement:

import qualified Data.Text as Text

duplicateWords :: Text.Text -> Text.Text
duplicateWords = Text.unwords . map (Text.unwords . replicate 2) . Text.words

What if you want both, though? In the above code, for example, the qualified name Text.Text looks a little silly, especially when it’s such a common type. It would be nice to import it directly, so that we can use it simply as Text.

Unfortunately, this is only possible when using two import statements:

import Data.Text (Text)
import qualified Data.Text as Text

duplicateWords :: Text -> Text
duplicateWords = Text.unwords . map (Text.unwords . replicate 2) . Text.words

You will find this duplication pervasive throughout Haskell codebases. Given how it affects the most important third-party packages (like text and bytestring), there have been a few proposals to improve the situation⁷, but it seems that none can go through the syntax bikeshedding phase.

Contrast this with Rust, for example, where it’s common to see imports such as this:

use std::io::{self, Read};
use std::path::Path;

fn read_first_half(path: &Path) -> io::Result<String> {
    // (omitted)
}

where self conveniently stands for the module as a whole.

Wild records

Another aspect of the difficulties with keeping your namespaces in check relates to Haskell record types — its rough equivalent of structs from C and others.

When you define a record type:

data User = User { usrFirstName :: String
                 , usrLastName :: String
                 , usrEmail :: String
                 } deriving (Show)

you are declaring not one but multiple different names, and dumping them all straight into the global namespace. These names include:

the record type (here, User)
its data constructor (also User, the second one above)
all of its fields (usrFirstName, usrLastName, usrEmail)

Yep, that’s right. Because Haskell has no special syntax for accessing record fields, each field declaration creates an unqualified getter function. Combined with the lack of function overloading, this creates many opportunities for name collisions.

This is why in the above example, Hungarian notation is used to prevent those clashes. Despite its age and almost complete disuse in basically every other language, it is still a widely accepted practice in Haskell⁸.

Purity beats practicality

We have previously discussed the multiple ways of working with strings and handling errors in Haskell. While somewhat confusing at times, there at least appears to be an agreement in the community as to which one should generally be preferred.

This is not the case for some subtler and more abstract topics.

Haskell is, famously, a purely functional programming language. Evaluating functions, in a mathematical sense, is all a Haskell program is supposed to be doing. But the obvious problem is that such a program wouldn’t be able to do anything actually useful; there needs to be some way for it to affect the environment it runs in, if only to print the results it computed.

How to reconcile functional purity with real-world applications is probably the most important problem that the Haskell language designers have to contend with. After a couple of decades of research and industrial use, it still doesn’t have a satisfactory answer.

Yes, there is the IO monad, but it is a very blunt instrument. It offers a distinction between pure code and “effectful” code, but allows for no granularity or structure for the latter. An IO-returning function can do literally anything, while a pure function can only compute some value based on its arguments. Most code, however, is best placed somewhere between those two extremes.

How to represent different varieties of effects (filesystem, logging, network, etc.)?
How to express them as function constraints that can be verified by the compiler?
How to compose them? How to extend them?

These (and others) are still very much open questions in the Haskell community. The traditional way of dealing with them is monad transformers, but they suffer from many shortcomings⁹. More recent solutions like effects or free monads are promising, but exhibit performance issues that likely won’t be solvable without full compiler support. And even so, you can convincingly argue against those new approaches, which suggests that we may ultimately need something else entirely.

Of course, this state of affairs doesn’t really prevent anyone from writing useful applications in Haskell. “Regular” monads are still a fine choice. Indeed, even if you end up stuffing most of your code inside plain IO, it will already be a step up compared to most other languages.

Good Enough™

Incidentally, something similar could probably be said about the language as a whole.

Yes, it has numerous glaring flaws and some not-so-obvious shortcomings.
Yes, it requires disciplined coding style and attention to readability.
Yes, it will force you to courageously tackle problems that are completely unknown to programmers using other languages.
In the end, however, you will probably find it better than most alternatives.

Basically, Haskell is like pizza: even when it’s bad, it is still pretty good.

But what’s possibly the best thing about it is that you don’t even really need to adopt Haskell in order to benefit from its innovations (and avoid the legacy warts).

There is already a breed of mainstream languages that can aptly be characterized as “Haskell-lite”: heavily influenced by FP paradigms but without subscribing to them completely. The closest example in this category is of course Scala, while the newest one would be Rust.
In many aspects, they offer a great compromise that provides some important functional features while sparing you most of the teething issues that Haskell still has after almost 30 years. Functional purists may not be completely satisfied, but at least they’ll get to keep their typeclasses and monoids.

And what if you don’t want to hear about this FP nonsense at all?… Well, I’m afraid it will get harder and harder to avoid. These days, it’s evidently fine for a language to omit generics but it seems inconceivable to ditch first-class functions. Even a traditional OOP powerhouse like Java cannot do without support for anonymous (“lambda”) functions anymore. And let’s not forget all the numerous examples of monadic constructs that pervade many of the mature APIs, libraries, and languages.

So even if you, understandably, don’t really want to come to Haskell, it’s looking more and more likely that Haskell will soon come to you :)

¹ In case of Go, I’m of course referring to a feature that’s notoriously missing from the language.
² For a close analogue in languages other than Haskell, you can look at the current state of procedural macros in Rust (commonly known as “custom derives”).
³ What seems to excite the Haskell community in 2018, for example, are things like linear types and dependent types.
⁴ The obvious counterexample is Clojure and its cousins in the Lisp family of languages.
⁵ Although the abundance of pretty-printing libraries is high up there, too :)
⁶ This can be mitigated somewhat by using a contiguous chunk of memory through a dedicated arena allocator, or implementing the list as an array.
⁷ See for example this project.
⁸ Some GHC extensions like DisambiguateRecordFields allow for correct type inference even in case of “overloaded” field names, though.
⁹ To name a few: they don’t compose well (e.g. can only have one instance of a particular monad in the stack); they can cause some extremely tricky bugs; they don’t really cooperate with the standard library which uses IO everywhere (often requiring tricks like this).

Add examples to your Rust libraries

Posted on Wed 28 February 2018 in Code • Tagged with Rust, Cargo, examples, documentation, packaging

When you’re writing a library for other programs to depend on, it is paramount to think about how developers are going to use it in their code.

The best way to ensure they have a pleasant experience is to put yourself in their shoes. Forget the internal details of your package, and consider only its outward interface. Then, come up with a realistic use case and just implement it.

In other words, you should create complete, end-to-end, and (somewhat) usable example applications.

Examples are trouble

You may think this is asking a lot, and I wouldn’t really disagree here.

In most languages and programming platforms, it is indeed quite cumbersome to create example apps. This happens for several different reasons:

It typically requires bootstrapping an entire project from scratch. If you are lucky, you will have something like create-react-app to get you going relatively quickly. Still, you need to wire up the new project so that it depends on the source code of your library rather than its published version, and this tends to be a non-standard option — if it is available at all.
It’s unclear where the example code should live. Should you just throw it away, once it has served its immediate purpose? I’m sure this would discourage many people from creating examples in the first place. It’s certainly better to keep them in version control, allowing their code to serve as additional documentation.

But if you intend to do this, you need to be careful not to deploy the example along with your library when you upload it to the package registry for your language. This may require maintaining an explicit blacklist and/or whitelist, in the vein of MANIFEST files in Python.
Examples may break as the library changes. Although example apps aren’t integration tests that have a clear, expected outcome, they should at the very least compile correctly.

The only way to ensure that is to include them in the build/test pipeline of your library. To accomplish this, however, you may need to complicate your CI setup, perhaps by introducing additional languages like Bash or Python.
It’s harder to maintain quality of example code. Any linters and static analyzers that you’re normally running will likely need to be configured to also apply to the examples. On the other hand, however, you probably don’t want those checkers to be too strict (it’s just example code, after all), so you may want to turn off some of the warnings, adjust the level of others, and so on.

So essentially, writing examples involves quite a lot of hassle. It would be great if the default tooling of your language helped to lessen the burden at least a little bit.

Well, good news! If you’re a Rust programmer, the language has basically got you covered.

Cargo — the standard build tool and package manager for Rust — has some dedicated features to support examples as a first-class concept. While it doesn’t completely address all the pain points outlined above, it goes a long way towards minimizing them.

What are Cargo examples?

In Cargo’s parlance, an example is simply the Rust source code of a standalone executable¹, typically residing in a single .rs file. All such files should be placed in the examples/ directory, at the same level as src/ and the Cargo.toml manifest itself².

Here’s the simplest example of, ahem, an example:

// examples/hello.rs
fn main() {
    println!("Hello from an example!");
}

You can run it through the typical cargo run command; simply pass the example name after the --example flag:

$ cargo run --example hello
Hello from an example!

It is also possible to run the example with some additional arguments:

$ cargo run --example hello2 -- Alice
Hello, Alice!

which are relayed directly to the underlying binary:

// examples/hello2.rs
use std::env;

fn main() {
    let name = env::args().nth(1);
    println!("Hello, {}!", name.unwrap_or_else(|| "world".into()));
}

As you can see, the way we run examples is very similar to how we’d run the src/bin binaries, which some people use as normal entry points to their Rust programs.

The important thing is that you don’t have to worry what to do with your example code anymore. All you need to do is drop it in the examples/ directory, and let Cargo do the rest.

Dependency included

Of course in reality, your examples will be at least a little more complicated than that. For one, they will surely call into your library to use its API, which means they need to depend on it & import its symbols.

Fortunately, this doesn’t complicate things even one bit.

The library crate itself is already an implied dependency of any code inside the examples/ directory. This is automatically handled by Cargo, so you don’t have to modify Cargo.toml (or do anything else really) to make it happen.

So without any additional effort, you can just link to your library crate in the usual manner, i.e. by putting extern crate on top of the Rust file:

// examples/real.rs
extern crate mylib;

fn main() {
    let thing = mylib::make_a_thing();
    println!("I made a thing: {:?}", thing);
}

This goes even further, and extends to any dependency of the library itself. All such third-party crates are automatically available to the example code, which proves handy in common cases such as Tokio-based asynchronous APIs:

// examples/async.rs
extern crate mylib;
extern crate tokio_core;  // assuming it's in mylib's [dependencies]

fn main() {
    let mut core = tokio_core::reactor::Core::new().unwrap();
    let thing = core.run(mylib::make_a_thing_asynchronously()).unwrap();
    println!("I made a thing: {:?}", thing);
}

More deps

Sometimes, however, it is very useful to pull in an additional package or two, just for the example code.

A typical case may involve logging.

If your library uses the usual log crate to output debug messages, you probably want to see them printed out when you run your examples. Since the log crate is just a facade, it doesn’t offer any built-in way to pipe log messages to standard output. To handle this part, you need something like the env_logger package:

// examples/with_logging.rs
extern crate env_logger;
extern crate mylib;

fn main() {
    env_logger::init();
    println!("{:?}", mylib::make_a_thing());
}

To be able to import env_logger like this, it naturally has to be declared as a dependency in our Cargo.toml.

We won’t put it in the [dependencies] section of the manifest, however, as it’s not needed by the library code. Instead, we should place it in a separate section called [dev-dependencies]:

[dev-dependencies]
env_logger = "0.5"

Packages listed there are shared by tests, benchmarks, and — yes, examples. They are not, however, linked into regular builds of your library, so you don’t have to worry about bloating it with unnecessary code.

Growing bigger

So far, we have seen examples that span just a single Rust file. Practical applications tend to be bigger than that, so it’d be nice if we could provide some multi-file examples as well.

This is easily done, although for some reason it doesn’t seem to be mentioned in the official docs.

In any case, the approach is identical to executables inside src/bin/. Basically, if we have a single foo.rs file with executable code, we can expand it to a foo/ subdirectory with foo/main.rs as the entry point. Then, we can add whatever other submodules we want — just like we would do for a regular Rust binary crate:

// examples/multifile/main.rs
extern crate env_logger;
extern crate mylib;

mod util;

fn main() {
    env_logger::init();
    let ingredient = util::create_ingredient();
    let thing = mylib::make_a_thing_with(ingredient);
    println!("{:?}", thing);
}

// examples/multifile/util.rs

pub fn create_ingredient() -> u64 {
    42
}

Of course, it won’t be often that examples this large are necessary. Showing how a library can scale to bigger applications can, however, be very encouraging to potential users.

Maintaining maintainability

Thus far, we have discussed how to create small and larger examples, how to use additional third-party crates in example programs, and how to easily build & run them using built-in Cargo commands.

All this effort spent on writing examples would be of little use if we couldn’t ensure that they work.

Like every type of code, examples are prone to breakage whenever the underlying API changes. If the library is actively developed, its interface represents a moving target. It is quite expected that changes may sometimes cause old examples to stop compiling.

Thankfully, Cargo is very diligent in reporting such breakages. Whenever you run:

$ cargo test

all examples are built simultaneously with the execution of your regular test suite³. You get the compilation guarantee for your examples essentially for free — there is no need to even edit your .travis.yml, or to adjust your continuous integration setup in any other way!

Pretty neat, right?

That said, you should keep in mind that simply compiling your examples on a regular basis is not a foolproof guarantee that their code never becomes outdated. Examples are not integration tests, and they won’t catch important changes in your implementation that don’t break the interface.

Examples-Driven Development?

You may be wondering, then, what exactly is the point of writing examples? If you’ve got tests on one hand to verify correctness, and documentation on the other hand to inform your users, then having a bunch of dedicated executable examples may seem superfluous.

To me, however, an impeccable test suite and amazing docs — which also remain comprehensive and awesome for an entire lifetime of the library! — sound a bit too much like a perfect world :) Adding examples to the mix can almost always improve things, and their maintenance burden should, in most cases, be very minimal.

But I have also found out that starting off with examples early on is a great way to validate the interface design.

Once the friction of creating small test programs has been eliminated, they become indispensable for prototyping new features. Wanna try out that new thing you’ve just added? Simple: just make a quick example for it, run it, and see what happens!

In many ways, doing this feels similar to trying out things in a REPL, something that’s almost exclusive to dynamic/interpreted languages. But unlike mucking around in a Python shell, examples are not throwaway code: they become part of your project, and remain useful for both you & your users.

It is also possible to create examples which are themselves just libraries. I don’t think this is particularly useful, though, since all you can do with such examples is build them, so they don’t provide any additional value over normal tests (and especially doc tests). ↩
Because they are outside of the src/ directory, examples do not become a part of your library’s code, and are not deployed to crates.io. ↩
You can also run cargo build --examples to only compile the examples, without running any kind of tests. ↩

Unfolding a Stream of paginated items

Posted on Wed 24 January 2018 in Code • Tagged with Rust, Tokio, streams, HTTP • Leave a comment

My most recent Rust crate is an API client for the Path of Exile’s public stash tabs. One problem that I had to solve while writing it was to turn a sequence of paginated items (in this case, player stash tabs) into a single, asynchronous Stream.

In this post, I’ll explain how to use the Stream interface, along with functions from the futures crate, to create a single Stream from multiple batches of entities.

Pagination 101

To divide a long list of items into pages is a very common pattern in many HTTP-based APIs.

If the client requests a sequence of entities that would be too large to serve as a single response, there has to be some way to split it over multiple HTTP roundtrips. To accomplish that, API servers will often return a constant number of items at first (like 50), followed by some form of continuation token:

$ curl http://api.example.com/items
{
    "items": [
        {...},
        {...},
        {...}
    ],
    "continuationToken": "e53c68db0ee412ac239173db147a02a0"
}

Such a token is preferably an opaque sequence of bytes, though sometimes it can be an explicit offset (index) into the list of results¹. Regardless of its exact nature, clients need to pass the token with their next request in order to obtain another batch of results:

$ curl 'http://api.example.com/items?after=e53c68db0ee412ac239173db147a02a0'
{
    "items": [
        {...},
        {...}
    ],
    "continuationToken": "4e3986e4c7f591b8cb17cf14addd40a6"
}

Repeat this procedure for as long as the response contains a continuation token, and you will eventually go through the entire sequence. If it’s really, really long (e.g. it’s a Twitter firehose for a popular hashtag), then you may of course hit some problems due to the sheer number of requests. For many datasets, however, this pagination scheme is absolutely sufficient while remaining relatively simple for clients to implement.

Stream it in Rust

What the client code would typically do, however, is to hide the pagination details completely and present only the final, unified sequence of items. Such abstraction is useful even for end-user applications, but it’s definitely expected from any shared library that wraps the third-party API.

Depending on your programming language of choice, this abstraction layer may be very simple to implement. Here’s how it could be done in Python, whose concepts of iterables and generators are a perfect fit for this task²:

import requests

def iter_items(after=None):
    """Yield items from an example API.
    :param after: Optional continuation token
    """
    while True:
        url = "http://api.example.com/items"
        if after is not None:
            url += "?after=%s" % after
        response = requests.get(url)
        response.raise_for_status()
        for item in response.json()['items']:
            yield item
        after = response.json().get("continuationToken")
        if after is None:
            break

# consumer
for item in iter_items():
    print(item)

In Rust, you can find their analogues in the Iterator and Stream traits, so we’re off to a pretty good start. What’s missing, however, is the equivalent of yield: something to tell the consumer “Here, have the next item!”, and then go back to the exact same place in the producer function.

This ability to jump back and forth between two (or more) functions requires language support for coroutines. Not many mainstream languages meet this requirement, although Python and C# readily come to mind. In the case of Rust, there have been some nightly proposals and experiments, but nothing seems to be stabilizing anytime soon.

DIY streaming

But of course, if you do want a Stream of paginated items, there is at least one straightforward solution: just implement the Stream trait directly.

This is actually quite a viable approach, very similar to rolling out a custom Iterator. Some minor differences stem mostly from a more complicated state management in Stream::poll compared to Iterator::next. While an iterator is either exhausted or not, a stream can also be waiting for the next item to “arrive” (Ok(Async::NotReady)), or have errored out permanently (Err(e)). As a consequence, the return value of Stream::poll is slightly more complex than just plain Option, but nevertheless quite manageable.

Irrespective of difficulty, writing a custom Stream from scratch would inevitably involve a lot of boilerplate. You may find it necessary in more complicated applications, of course, but for something that’s basically a glorified while loop, it doesn’t seem like a big ask to have a more concise solution.
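To give a feel for the bookkeeping involved, here is a rough sketch of the simpler, synchronous cousin: a hand-rolled Iterator over paginated items. It uses only the standard library. The Page struct, the Items iterator, and the fetch_page function (which fakes three pages in memory instead of talking to an HTTP server) are all made up for illustration. A real Stream would additionally have to handle the "not ready yet" and error states described above.

```rust
/// One page of results, mirroring the JSON shape shown earlier.
struct Page {
    items: Vec<u32>,
    continuation_token: Option<String>,
}

/// Hypothetical stand-in for the HTTP request: serves three fixed pages.
fn fetch_page(after: Option<&str>) -> Page {
    match after {
        None => Page { items: vec![1, 2, 3], continuation_token: Some("a".into()) },
        Some("a") => Page { items: vec![4, 5], continuation_token: Some("b".into()) },
        _ => Page { items: vec![6], continuation_token: None },
    }
}

/// Iterator that hides the pagination from its consumer.
struct Items {
    buffer: std::vec::IntoIter<u32>, // items of the current page not yet handed out
    token: Option<String>,           // continuation token for the next request
    done: bool,                      // set once a page arrives without a token
}

impl Items {
    fn new() -> Self {
        Items { buffer: Vec::new().into_iter(), token: None, done: false }
    }
}

impl Iterator for Items {
    type Item = u32;

    fn next(&mut self) -> Option<u32> {
        loop {
            // Serve from the current page while it still has items.
            if let Some(item) = self.buffer.next() {
                return Some(item);
            }
            // The last page has been consumed: the sequence is exhausted.
            if self.done {
                return None;
            }
            // Otherwise fetch the next page and remember where to continue.
            let page = fetch_page(self.token.as_deref());
            self.buffer = page.items.into_iter();
            self.token = page.continuation_token;
            self.done = self.token.is_none();
        }
    }
}
```

Collecting Items::new() into a Vec walks through all three fake pages in order. Even in this simplified form, most of the code is state management rather than actual logic.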

The stream unfolds

Fortunately there is one! Its crucial element is the standalone stream::unfold function from the futures crate:

pub fn unfold<T, F, Fut, It>(init: T, f: F) -> Unfold<T, F, Fut> where
    F: FnMut(T) -> Option<Fut>,
    Fut: IntoFuture<Item = (It, T)>,

Reading through the signature of this function can be a little intimidating at first. Part of it is Rust’s verbose syntax for anything that involves both trait bounds and closures³, making stream::unfold seem more complicated than it actually is. Indeed, if you’ve ever used Iterator adapters like .filter_map or .fold, the unfold function will be pretty easy to understand. (And if you haven’t, don’t worry! It’s really quite simple :))

If you look closely, you’ll see that stream::unfold takes the following two arguments:

first one is essentially an arbitrary initial value, called a seed
second one is a closure that receives the seed and returns an optional pair of values

What are those values?… Well, the entire purpose of the unfold function is to create a Stream, and a stream should inevitably produce some items. Consequently, the first value in the returned pair will be the next item in the stream.

And what about the second value? That’s just the next state of the seed! It will be received by the very same closure when someone asks the Stream to produce its next item. By passing around a useful value — say, a continuation token — you can create something that’s effectively a while loop from the Python example above.

The last important bit about this pair of values is the wrapping.

First, it is actually a Future, allowing your stream to yield objects that it doesn’t quite have yet — for example, those which ultimately come from an HTTP response.

Secondly, its outermost layer is an Option. This enables you to terminate the stream when the underlying source is exhausted by simply returning None. Until then, however, you should return Some with the (future of) aforementioned pair of values.
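If the seed-and-closure mechanics still feel abstract, it may help to see them without any asynchronicity. Below is a minimal sketch of a synchronous unfold for plain iterators, written from scratch with the standard library only. It is not the code from the futures crate: the Future layer is dropped entirely, so the closure returns the Option of an (item, next seed) pair directly. The powers_of_two function is just an illustrative use.

```rust
/// Synchronous analogue of `stream::unfold`: an iterator driven by a seed.
struct Unfold<T, F> {
    state: Option<T>, // the current seed; `None` once the sequence has ended
    f: F,
}

fn unfold<T, It, F>(init: T, f: F) -> Unfold<T, F>
where
    F: FnMut(T) -> Option<(It, T)>,
{
    Unfold { state: Some(init), f }
}

impl<T, It, F> Iterator for Unfold<T, F>
where
    F: FnMut(T) -> Option<(It, T)>,
{
    type Item = It;

    fn next(&mut self) -> Option<It> {
        // Hand the current seed to the closure; a `None` result ends the sequence.
        let seed = self.state.take()?;
        let (item, next_seed) = (self.f)(seed)?;
        // Keep the new seed for the following call and yield the item.
        self.state = Some(next_seed);
        Some(item)
    }
}

/// Illustrative use: powers of two below `limit`, with the current power as the seed.
fn powers_of_two(limit: u32) -> Vec<u32> {
    unfold(1u32, |n| if n < limit { Some((n, n * 2)) } else { None }).collect()
}
```

Here the seed plays the role that the continuation token plays in the pagination scenario: each call produces one item and decides what state the next call will start from.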

Paginate! Paginate!

If you have doubts about how all those pieces of stream::unfold fit in, then looking at the usage example in the docs may give you some idea of what it enables you to do. It’s a very artificial example, though: the resulting Stream isn’t waiting for any asynchronous Futures, which is the very reason you’d use a Stream over an Iterator in the first place⁴.

We can find a more natural application for unfold if we go back to our original problem. To reiterate, we want to repeatedly query an HTTP API for a long list of items, giving our callers a Stream of such items they can process at their leisure. At the same time, all the details about pagination and handling of continuation tokens or offsets should be completely hidden from the caller.

To employ stream::unfold for this task, we need two things: the initial seed, and an appropriate closure.

I have already hinted at using the continuation token as our seed, or the state that we pass around from one closure invocation to another. What remains is mostly making the actual HTTP request and interpreting the JSON response, for which we’ll use the de facto standard Rust crates: hyper, Serde, and serde_json:

use std::error::Error;

use futures::{future, Future, stream, Stream};
use hyper::{Client, Method};
use hyper::client::Request;
use serde_json;
use tokio_core::reactor::Handle;

const URL: &str = "http://api.example.com/items";

fn items(
    handle: &Handle, after: Option<String>
) -> Box<Stream<Item=Item, Error=Box<Error>>>
{
    let client = Client::new(handle);
    Box::new(stream::unfold(after, move |cont_token| {
        let url = match cont_token {
            Some(ct) => format!("{}?after={}", URL, ct),
            None => return None,
        };
        let req = Request::new(Method::Get, url.parse().unwrap());
        Some(client.request(req).from_err().and_then(move |resp| {
            let status = resp.status();
            resp.body().concat2().from_err().and_then(move |body| {
                if status.is_success() {
                    serde_json::from_slice::<ItemsResponse>(&body)
                        .map_err(Box::<Error>::from)
                } else {
                    Err(format!("HTTP status: {}", status).into())
                }
            })
            .map(move |items_resp| {
                (stream::iter_ok(items_resp.items), items_resp.continuation_token)
            })
        }))
    })
    .flatten())
}

#[derive(Deserialize)]
struct ItemsResponse {
    items: Vec<Item>,
    #[serde(rename = "continuationToken")]
    continuation_token: Option<String>,
}

While this code may be a little challenging to decipher at first, it’s not out of line compared to what working with Futures and Streams looks like in general. In either case, you can expect a lot of .and_then callbacks :)

There is one detail here that I haven’t mentioned previously, though. It relates to the stream::iter_ok and Stream::flatten calls which you may have already noticed.

The issue with stream::unfold is that it only allows yielding a single item per closure invocation. For us, this is too limiting: a single batch response from the API will contain many such items, but we have no way of “splitting” them.

What we can do instead is to produce a Stream of entire batches of items, at least at first, and then flatten it. What Stream::flatten does here is to turn a nested Stream<Stream<Item>> into a flat Stream<Item>. The latter is what we eventually want to return, so all we need now is to create this nested stream of streams.

How? Well, that’s actually pretty easy.

We can already deserialize a Vec<Item> from the JSON response (that’s our item batch!), which is essentially an iterable of Items⁵. Another utility function from the stream module, namely stream::iter_ok, can readily turn such an iterable into an “immediate” Stream. Such a Stream won’t be asynchronous at all (its items will have been ready from the very beginning), but it will still conform to the Stream interface, enabling it to be flattened as we request.
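The same batches-then-flatten idea can be tried out on ordinary iterators, where Iterator::flatten plays the part of Stream::flatten. This is only a synchronous sketch using the standard library. The batches function and its contents are invented stand-ins for the items arrays of successive API responses.

```rust
/// Hypothetical batches, standing in for the `items` of successive responses.
fn batches() -> Vec<Vec<&'static str>> {
    vec![vec!["a", "b", "c"], vec!["d", "e"], vec![]]
}

/// Turns the sequence of batches into one flat sequence of items.
fn flat_items() -> Vec<&'static str> {
    batches()
        .into_iter() // an iterator whose items are whole batches
        .flatten()   // Vec implements IntoIterator, so each batch can be unrolled
        .collect()
}
```

The consumer of flat_items never sees where one batch ends and the next begins, and an empty batch simply contributes nothing.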

But wait! There is a bug!

So in the end, is this the solution we’re looking for?…

Well, almost. First, here’s the expected usage of the function we just wrote:

let mut core = tokio_core::reactor::Core::new().unwrap();
core.run({
    let continuation_token = None;  // start from the beginning
    items(&core.handle(), continuation_token).for_each(|item| {
        println!("{:?}", item);
        Ok(())
    })
}).unwrap();

While this is more complicated than the plain for loop in Python, most of it is just Tokio boilerplate. The notable part is the invocation of items(), where we pass None as a continuation token to indicate that we want the entire sequence, right from its beginning.

And since we’re talking about fetching long sequences, we would indeed expect a lot of items. So it is probably quite surprising to hear that the stream we’ve created here will be completely empty.

…What? How?!

If you look again at the source code of items(), the direct reason should be pretty easy to find. The culprit lies in the return None branch of the first match. If we don’t pass Some(continuation_token) as a parameter to items(), this branch will be hit immediately, terminating the stream before it has had a chance to produce anything.

It may not be very clear how to fix this problem. After all, the purpose of the match was to detect the end of the sequence, but it apparently prevents us from starting it in the first place!

Looking at the problem from another angle, we can see we’ve conflated two distinct states of our stream — “before it has started” and “after it’s ended” — into a single one (“no continuation token”). Since we obviously don’t want to make the after parameter mandatory — users should be able to say “Give me everything!” — we need another way of telling those two states apart.

In terms of Rust types, it seems that Option<String> is no longer sufficient for encoding all possible states of our Stream. Although we could try to fix that in some ad-hoc way (e.g. by adding another bool flag), it feels cleaner to just define a new, dedicated type. For one, this allows us to designate a name for each of the states in question, improving the readability and maintainability of our code:

enum State {
    Start(Option<String>),
    Next(String),
    End,
}

Note that we can put this definition directly inside the items() function, without cluttering the module namespace. All the relevant details of our Stream are thus nicely contained within a single function:

fn items(
    handle: &Handle, after: Option<String>
) -> Box<Stream<Item=Item, Error=Box<Error>>>
{
    // (definition of State enum can go here)

    let client = Client::new(handle);
    Box::new(stream::unfold(State::Start(after), move |state| {
        let cont_token = match state {
            State::Start(opt_ct) => opt_ct,
            State::Next(ct) => Some(ct),
            State::End => return None,
        };
        let url = match cont_token {
            Some(ct) => format!("{}?after={}", URL, ct),
            None => URL.into(),
        };
        let req = Request::new(Method::Get, url.parse().unwrap());
        Some(client.request(req).from_err().and_then(move |resp| {
            let status = resp.status();
            resp.body().concat2().from_err().and_then(move |body| {
                if status.is_success() {
                    serde_json::from_slice::<ItemsResponse>(&body)
                        .map_err(Box::<Error>::from)
                } else {
                    Err(format!("HTTP status: {}", status).into())
                }
            })
            .map(move |items_resp| {
                let next_state = match items_resp.continuation_token {
                    Some(ct) => State::Next(ct),
                    None => State::End,
                };
                (stream::iter_ok(items_resp.items), next_state)
            })
        }))
    })
    .flatten())
}

Sure, there is a little more bookkeeping required now, but at least all the items are being emitted by the Stream as intended.

You can see the complete source in the playground here.

Furthermore, the token doesn’t have to come as part of the HTTP response body. Some API providers (such as GitHub) may use the Link: header to point directly to the next URL to query. ↩
This example uses “traditional”, synchronous Python code. However, it should be easy to convert it to the asynchronous equivalent that works in Python 3.5 and above, provided you can replace requests with some async HTTP library. ↩
If you are curious whether other languages could express it better, you can check the Data.Conduit.List.unfold function from Haskell’s conduit package. For most intents and purposes, it is their equivalent of stream::unfold. ↩
Coincidentally, you can create iterators in the very same manner through the itertools::unfold function from the itertools crate. ↩
In more technical Rust terms, it means Vec implements the IntoIterator trait, allowing anyone to get an Iterator from it. ↩

Terminating a Stream in Rust

Posted on Sat 16 December 2017 in Code • Tagged with Rust, streams, Tokio, async • Leave a comment

Here’s a little trick that may be useful in dealing with asynchronous Streams in Rust.

When you consume a Stream using the for_each method, its default behavior is to finish early should an error be produced by the stream:

use futures::prelude::*;
use futures::{future, stream};
use tokio_core::reactor::Core;

let s = stream::iter_result(vec![Ok(1), Ok(2), Err(false), Ok(3)]);
let fut = s.for_each(|n| {
    println!("{}", n);
    Ok(())
});

In more precise terms, it means that the Future returned by for_each will resolve with the first error from the underlying stream:

// Prints 1, 2, and then panics with "false".
Core::new().unwrap().run(fut).unwrap();

For most purposes, this is perfectly alright; errors are generally meant to propagate, after all.

Certain kinds of errors, however, are better off silenced. Perhaps they are expected to pop up during normal program operation, or maybe their occurrence should merely affect program execution in a particular way, and not halt it outright. In a simple case like above, you can of course check what for_each itself has returned, but that doesn’t scale to building larger Stream pipelines.

I encountered a situation like this myself when using the hubcaps library. The code I was writing was meant to search for GitHub issues within a specific repository. In GitHub API, this is accomplished by sending a search query like repo:$OWNER/$NAME, which may result in a rather obscure HTTP error (422 Unprocessable Entity) if the given repository doesn’t actually exist. But I didn’t care about this error; should it occur, I’d simply return an empty stream, because doing so was more convenient for the larger bit of logic that was consuming it.

Unfortunately, the Stream trait offers no interface that’d target this use case. There are only a few methods that even allow looking at errors mid-stream, and even fewer that can end it prematurely. On the flip side, at least we don’t have to consider too many combinations when looking for the solution ;)

Indeed, it seems there are only two Stream methods that are worthy of our attention:

Stream::then, because it allows for a closure to receive all stream values (items and errors)
Stream::take_while, because it accepts a closure that can end the stream early (but only based on items, not errors)

Combining them both, we arrive at the following recipe:

Inside a .then closure, look for Errors that you consider non-fatal and replace them with a special item value. The natural choice for such a value is None. As a side effect, this forces us to convert the regular (“successful”) items into Some(item), effectively transforming a Stream<Item=T> into Stream<Item=Option<T>>.
Look for the special value (i.e. None) in the .take_while closure and terminate the stream when it’s been found.
Finally, convert the wrapped items back into their original form using .map, thus giving us back a Stream of T‘s.

Applying this technique to our initial example, we get something that looks like this:

let s = stream::iter_result(vec![Ok(1), Ok(2), Err(false), Ok(3)])
    .then(|r| match r {
        Ok(r) => Ok(Some(r)),  // no-op passthrough of items
        Err(false) => Ok(None), // non-fatal error, terminate the stream
        Err(e) => Err(e),      // no-op passthrough of other errors
    })
    .take_while(|x| future::ok(x.is_some()))
    .map(Option::unwrap);

If we now try to consume this stream like before:

Core::new().unwrap().run(
    s.for_each(|n| { println!("{}", n); Ok(()) })
).unwrap();

it will still end after the first two items, but without producing any errors afterwards.
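If you’d like to play with the recipe without pulling in Tokio, the same three steps can be mirrored on a plain Iterator of Results. This is only a synchronous analogue built on the standard library, not the Stream code itself, and the until_non_fatal name is made up. One difference worth noting: an iterator has no separate error channel, so the "other" errors are simply passed along as items instead of ending the sequence.

```rust
/// Ends the sequence quietly at the first non-fatal error (here: `Err(false)`),
/// passing successful items and all other errors through unchanged.
fn until_non_fatal<I>(iter: I) -> impl Iterator<Item = Result<i32, bool>>
where
    I: Iterator<Item = Result<i32, bool>>,
{
    iter
        // Step 1: wrap everything in Some, except the non-fatal error which becomes None.
        .map(|r| match r {
            Ok(n) => Some(Ok(n)),
            Err(false) => None,
            Err(e) => Some(Err(e)),
        })
        // Step 2: stop as soon as the special None value shows up.
        .take_while(|x| x.is_some())
        // Step 3: unwrap the remaining values back to their original form.
        .map(Option::unwrap)
}
```

Feeding it the vector from the initial example yields just Ok(1) and Ok(2); everything from Err(false) onwards is dropped.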

For a more reusable version of the trick, you can check this gist; it adds a Stream::take_while_err method through an extension trait.

This isn’t a perfect solution, however, because it requires Boxing even on nightly Rust¹. We can fix that by introducing a dedicated TakeWhileErr stream type, similarly to what native Stream methods do. I leave that as an exercise for the reader ;-)

This is due to a limitation in the impl Trait feature which prevents it from being used as a return type of trait methods. ↩

Recap of the gisht project

Posted on Fri 24 November 2017 in Programming • Tagged with Rust, gisht, CLI, GitHub, Python, testing • Leave a comment

In this post, I want to discuss some of the experiences I had with a project that I recently finished, gisht. By “finished” I mean that I don’t anticipate developing any new major features for it, though smaller things, bug fixes, or non-code stuff are of course still very possible.

I’m thinking this is as much “done” as most software projects can ever hope to be. Thus, it is probably the best time for a recap / summary / postmortem / etc. — something to recount the lessons learned, and assess the choices made.

Some context

The original purpose of gisht was to facilitate download & execution of GitHub gists straight from the command line:

$ gisht Xion/git-outgoing  # run the https://gist.github.com/Xion/git-outgoing gist

I initially wrote its first version in Python because I’d accumulated a sizable number of small & useful scripts (for Git, Unix, Python, etc.) which were all posted as gists. Sure, I could download them manually to ~/bin every time I used a new machine but that’s rather cumbersome, and I’m quite lazy.

Well, lazy and impatient :) I noticed pretty fast that the speed tax of Python is basically unacceptable for a program like gisht.

What I’m referring to here is not the speed of code execution, however, but only the startup time of Python interpreter. Irrespective of the machine, operating system, or language version, it doesn’t seem to go lower than about one hundred milliseconds; empirically, it’s often 2 or 3 times higher than that. For the common case of finding a cached gist (no downloads) and doing a simple fork+exec, this startup time was very noticeable and extremely jarring. It also precluded some more sophisticated uses for gisht, like putting its invocation into the shell’s $PROMPT¹.

Speed: delivered

And so the obvious solution emerged: let’s rewrite it in Rust!…

Because if I’m executing code straight from the internet, I should at least do it in a safe language.

But jokes aside, it is obvious that a language compiling to native code is likely a good pick if you want to optimize for startup speed. So while the choice of Rust was in large part educational (gisht was one of my first projects to be written in it), it definitely hasn’t disappointed there.

Even without any intentional optimization efforts, the app still runs instantaneously. I tried to take some measurements using the time command, but it never ticked into more than 0.001s. Perceptually, it is at least on par with git, so that’s acceptable for me :)

Can’t segfault if your code doesn’t build

Achieving the performance objective wouldn’t do us much good, however, if the road to get there involved excessive penalties on productivity. Such negative impact could manifest in many ways, including troublesome debugging due to a tricky runtime², or difficulty in getting the code to compile in the first place.

If you had even a passing contact with Rust, you’d expect the latter to be much more likely than the former.

Indeed, Rust’s very design eschews runtime flexibility to a ridiculous degree (in its “safe” mode, at least), while also forcing you to absorb subtle & complex ideas to even get your code past the compiler. The reward is increased likelihood your program will behave as intended — although it’s definitely not on the level of “if it compiles, it works” that can be offered by Haskell or Idris.

But since gisht is hardly mission critical, I didn’t actually care too much about this increased reliability. I don’t think it’s likely that Rust would buy me much over something like modern C++. And if I were to really do some kind of cost-benefit analysis of several languages — rather than going with Rust simply to learn it better — then it would be hard to justify it over something like Go.

It scales

So the real question is: has Rust not hampered my productivity too much? Having the benefit of hindsight, I’m happy to say that the trade-off was definitely acceptable :)

One thing I was particularly satisfied with was the language’s scalability. What I mean here is the ability to adapt as the project grows, but also to start quickly and remain nimble while the codebase is still pretty small.

Many languages (most, perhaps) are naturally tailored towards the large end, doing their best to make it more bearable to work with big codebases. In turn, they often forget about helping projects take off in the first place. Between complicated build systems and dependency managers (Java), or a virtual lack of either (C++), it can be really hard to get going in a “serious” language like this.

On the other hand, languages like Python make it very easy to start up and achieve relatively impressive results. Some people, however, report having encountered problems once the code evolves past a certain size. While I’m actually very unsympathetic to those claims, I realize perception plays a significant role here, making those anecdotal experiences into a sort of self-fulfilling prophecy.

This perception problem should almost certainly spare Rust, as it’s a natively compiled and statically typed language, with a respectable type system to boot. There is also some evidence that the language works well in large projects already. So the only question that we might want to ask is: how easy is it to actually start a project in Rust, and carry it towards some kind of MVP?

Based on my experiences with gisht, I can say that it is, in fact, quite easy. Thanks mostly to the impressive Swiss army knife of cargo — acting as both package manager and a rudimentary build system — it was almost Python-trivial to cook a “Hello World” program that does something tangible, like talk to a JSON API. From there, it only took a few coding sessions to grow it into a functioning prototype.

Abstractions galore

As part of rewriting gisht from Python to Rust, I also wanted to fix some longstanding issues that limited its capabilities.

The most important one was the hopeless coupling to GitHub and their particular flavor of gists. Sure, this is where the project even got its name from, but people use a dozen different services to share code snippets and it should be very possible to support them all.

Here’s where it became necessary to utilize the abstraction capabilities that Rust has to offer. It was somewhat obvious to define a Host trait but of course its exact form had to be shaped over numerous iterations. Along the way, it even turned out that Result<Option<T>> and Option<Result<T>> are sometimes both necessary as return types :)
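To make the difference between those two return types concrete, here is a minimal sketch. The resolve_url signature matches the one shown later in this post; the lookup method, the ToyHost type, and the toy.example URL are hypothetical and exist only for illustration.

```rust
use std::io;

/// Simplified stand-in for a gist; the real type carries more data.
#[derive(Debug, PartialEq)]
pub struct Gist {
    pub id: String,
}

pub trait Host {
    /// `None`: the URL doesn't belong to this host at all.
    /// `Some(Err(_))`: it does, but resolving it failed.
    fn resolve_url(&self, url: &str) -> Option<io::Result<Gist>>;

    /// Hypothetical method. `Err(_)`: the lookup itself failed.
    /// `Ok(None)`: the lookup worked, but found nothing.
    fn lookup(&self, id: &str) -> io::Result<Option<Gist>>;
}

/// Toy host (an assumption) that recognizes a single URL prefix.
pub struct ToyHost;

impl Host for ToyHost {
    fn resolve_url(&self, url: &str) -> Option<io::Result<Gist>> {
        // Not our URL: bail out with `None` so another host can try.
        let id = url.strip_prefix("http://toy.example/")?;
        if id.is_empty() {
            let err = io::Error::new(io::ErrorKind::InvalidInput, "missing gist ID");
            return Some(Err(err));
        }
        Some(Ok(Gist { id: id.to_string() }))
    }

    fn lookup(&self, id: &str) -> io::Result<Option<Gist>> {
        if id.is_empty() {
            return Err(io::Error::new(io::ErrorKind::InvalidInput, "empty gist ID"));
        }
        // Pretend only one gist exists.
        Ok(if id == "known" { Some(Gist { id: id.to_string() }) } else { None })
    }
}
```

The outer type says which question gets answered first: "is this mine?" for resolve_url, "did the operation succeed?" for lookup. When a caller needs the other shape, the standard library's transpose methods convert between the two.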

Besides cleaner architecture, another neat thing about an explicit abstraction is the ability to slice a concept into smaller pieces — and then put some of them back together. While the Host trait could support a very diverse set of gist services and pastebins, many of them turned out to be just a slight variation of one central theme. Because of this similarity, it was possible to introduce a single Basic implementation which handles multiple services through varying sets of URL patterns.

Devices like these aren’t of course specific to Rust: interfaces (traits) and classes are a staple of OO languages in general. But some other techniques were more idiomatic; the concept of iterators, for example, is flexible enough to accommodate looping over GitHub user’s gists, even as they read directly from HTTP responses.
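As a rough illustration of that last point, here is a sketch of an iterator that pulls pages lazily. It is not gisht's actual code: the Paginated type is made up, and the fetch_page closure stands in for an HTTP request that returns the items on a given page (or nothing once the pages run out).

```rust
/// Lazily yields items from a paginated source.
/// `fetch_page` is a hypothetical stand-in for an HTTP call: given a
/// 1-based page number, it returns that page's items, or an empty Vec
/// when there are no more pages.
pub struct Paginated<T, F: FnMut(usize) -> Vec<T>> {
    fetch_page: F,
    next_page: usize,
    buffer: std::vec::IntoIter<T>,
    done: bool,
}

impl<T, F: FnMut(usize) -> Vec<T>> Paginated<T, F> {
    pub fn new(fetch_page: F) -> Self {
        Paginated {
            fetch_page,
            next_page: 1,
            buffer: Vec::new().into_iter(),
            done: false,
        }
    }
}

impl<T, F: FnMut(usize) -> Vec<T>> Iterator for Paginated<T, F> {
    type Item = T;

    fn next(&mut self) -> Option<T> {
        loop {
            // Serve from the current page while it lasts.
            if let Some(item) = self.buffer.next() {
                return Some(item);
            }
            if self.done {
                return None;
            }
            // The current page is exhausted: fetch the next one on demand.
            let page = (self.fetch_page)(self.next_page);
            self.next_page += 1;
            if page.is_empty() {
                self.done = true;
                return None;
            }
            self.buffer = page.into_iter();
        }
    }
}
```

Because a page is requested only when the previous one runs dry, a caller who stops after the first few items never triggers the remaining requests.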

Hacking time

Not everything was sunshine and rainbows, though.

Take clap, for example. It’s mostly a very good crate for parsing command line arguments, but it couldn’t quite cope with the unusual requirements that gisht had. To make gisht Foo/bar work alongside gisht run Foo/bar, it was necessary to analyze argv before even handing it over to clap. This turned out to be surprisingly tricky to get right. Like, really tricky, with edge cases and stuff. But as is often the case in software, the answer turned out to be yet another layer of indirection plus a copious amount of tests.
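The gist of such pre-processing can be sketched as follows. This is a deliberately naive version, not what gisht does: the normalize_args function and the COMMANDS list are assumptions (only run is mentioned in this post), and flags that take a separate value are not handled at all.

```rust
/// Subcommands of a hypothetical CLI. Only `run` comes from the post;
/// `help` is an assumption added for illustration.
const COMMANDS: &[&str] = &["run", "help"];

/// Inserts the default `run` subcommand in front of the first positional
/// argument, unless that argument already names a known subcommand.
pub fn normalize_args(args: Vec<String>) -> Vec<String> {
    let mut result = Vec::with_capacity(args.len() + 1);
    let mut iter = args.into_iter();

    // argv[0] is the program name and is passed through untouched.
    if let Some(program) = iter.next() {
        result.push(program);
    }
    let rest: Vec<String> = iter.collect();

    // Naive heuristic: anything not starting with '-' is positional.
    let first_positional = rest.iter().position(|arg| !arg.starts_with('-'));
    match first_positional {
        Some(idx) if !COMMANDS.contains(&rest[idx].as_str()) => {
            result.extend(rest[..idx].iter().cloned());
            result.push("run".to_string());
            result.extend(rest[idx..].iter().cloned());
        }
        // Either an explicit subcommand or no positional arguments at all.
        _ => result.extend(rest),
    }
    result
}
```

A flag with a separate value (say, a hypothetical -o file) would be misread here as the gist name, which is exactly the kind of edge case that makes the real thing harder than it looks.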

In another instance, however, a direct library support was crucial.

It so happened that hyper, the crate I’ve been using for HTTP requests, didn’t handle the Link: response header out of the box³. This was a stumbling block that prevented the gist iterator (mentioned earlier) from correctly handling pagination in the responses from the GitHub API. Thankfully, having the Header abstraction in hyper meant it was possible to add the missing support in a relatively straightforward manner. Yes, it’s not a universal implementation that’d be suitable for every HTTP client, but it does the job for gisht just fine.

Test-Reluctant Development

And so the program kept growing steadily over the months, most notably through more and more gist hosts it could now support.

Eventually, some of them would fall into a sort of twilight zone. They weren’t as complicated as GitHub to warrant writing a completely new Host instance, but they also couldn’t be handled via the Basic structure alone. A good example would be sprunge.us: mostly an ordinary pastebin, except for its optional syntax highlighting which may add some “junk” to the otherwise regular URLs.

In order to handle those odd cases, I went for a classic wrapper/decorator pattern which, in its essence, boils down to something like this:

pub struct Sprunge {
    inner: Basic,
}

impl Sprunge {
    pub fn new() -> Self {
        Sprunge{inner: Basic::new(ID, "sprunge.us",
                                  "http://sprunge.us/${id}", ...)}
    }
}

impl Host for Sprunge {
    // override & wrap methods that require custom logic:
    fn resolve_url(&self, url: &str) -> Option<io::Result<Gist>> {
        let mut url_obj = try_opt!(Url::parse(url).ok());
        url_obj.set_query(None);
        self.inner.resolve_url(url_obj.to_string().as_str())
    }

    // passthrough to the `Basic` struct for others:
    fn fetch_gist(&self, gist: &Gist, mode: FetchMode) -> io::Result<()> {
        self.inner.fetch_gist(gist, mode)
    }
    // (etc.)
}

Despite the noticeable boilerplate of a few pass-through methods, I was pretty happy with this solution, at least initially. After a few more unusual hosts, however, it became cumbersome to fix all the edge cases by looking only at the final output of the inner Basic implementation. The code was evidently asking for some tests, if only to check how the inner structure is being called.

Shouldn’t be too hard, right?… Yeah, that’s what I thought, too.

The reality, unfortunately, fell very short of those expectations. Stubs, mocks, fakes — test doubles in general — are a dark and forgotten corner of Rust that almost no one seems to pay any attention to. Absent a proper library support — much less a language one — the only way forward was to roll up my sleeves and implement a fake Host from scratch.

But that was just the beginning. How do you seamlessly inject this fake implementation into the wrapper so that it replaces the Basic struct for testing? If you are not careful and go for the “obvious” solution — a trait object:

pub struct Sprunge {
    inner: Box<Host>,
}

you’ll soon realize that you need not just a Box, but at least an Rc (or maybe even Arc). Without this kind of shared ownership, you’ll lose your chance to interrogate the test double once you hand it over to the wrapper. This, in turn, will heavily limit your ability to write effective tests.

What’s the non-obvious approach, then? The full rationale would probably warrant a separate post, but the working recipe looks more or less like this:

First, parametrize the wrapper with its inner type: pub struct Sprunge<T: Host> { inner: T }.

Put that in an internal module with the correct visibility setup:

mod internal {
    pub struct Sprunge<T: Host> {
        pub(super) inner: T,
    }
}

Make the regular (“production”) version of the wrapper into an alias, giving it the type parameter that you’ve been using directly⁴:
```
pub type Sprunge = internal::Sprunge<Basic>;
```
Change the new constructor to instantiate the internal type.
In tests, create the wrapper with a fake inner object inside.

As you can see in the real example, this convoluted technique removes the need for any pointer indirection. It also permits you to access the out-of-band interface that a fake object would normally expose.
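Put together, the recipe could look roughly like the sketch below. Everything in it is simplified: the Host trait is cut down to a single method returning a plain string, Basic is an empty struct, and FakeHost, with its seen field, is a hypothetical hand-rolled test double.

```rust
use std::cell::RefCell;

/// Heavily simplified `Host` trait (an assumption; the real one has more
/// methods and richer return types).
pub trait Host {
    fn resolve_url(&self, url: &str) -> Option<String>;
}

/// Stand-in for the real `Basic` host.
pub struct Basic;

impl Host for Basic {
    fn resolve_url(&self, url: &str) -> Option<String> {
        Some(url.to_string())
    }
}

mod internal {
    use super::Host;

    /// The wrapper, parametrized with the type of its inner host.
    pub struct Sprunge<T: Host> {
        pub(super) inner: T,
    }

    impl<T: Host> Host for Sprunge<T> {
        fn resolve_url(&self, url: &str) -> Option<String> {
            // Drop the query string (the syntax highlighting "junk").
            let stripped = url.split('?').next().unwrap_or(url);
            self.inner.resolve_url(stripped)
        }
    }
}

/// The "production" wrapper is just an alias with `Basic` plugged in.
pub type Sprunge = internal::Sprunge<Basic>;

impl Sprunge {
    pub fn new() -> Self {
        internal::Sprunge { inner: Basic }
    }
}

/// Hypothetical fake that records every URL it is asked to resolve.
#[derive(Default)]
pub struct FakeHost {
    pub seen: RefCell<Vec<String>>,
}

impl Host for FakeHost {
    fn resolve_url(&self, url: &str) -> Option<String> {
        self.seen.borrow_mut().push(url.to_string());
        Some(url.to_string())
    }
}
```

A test can then build internal::Sprunge { inner: FakeHost::default() } directly, call the wrapper, and inspect wrapper.inner.seen afterwards, with no Box or Rc in sight.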

It’s a shame, though, that so much work is required for something that should be very simple. As it appears, testing is still a neglected topic in Rust.

Packing up

It wasn’t just Rust that played a notable role in the development of gisht.

Pretty soon after getting the app to a presentable state, it became clear that a mere cargo build won’t do everything that’s necessary to carry out a complete build. It could do more, admittedly, if I had the foresight to explore Cargo build scripts a little more thoroughly. But overall, I don’t regret dropping back to my trusty ol’ pick: Python.

Like in a few previous projects, I used the Invoke task runner for both the crucial and the auxiliary automation tasks. It is a relatively powerful tool — and probably the best in its class in Python that I know of — though it can be a bit capricious if you want to really fine-tune it. But it does make it much easier to organize your automation code, to reuse it between tasks, and to (ahem) invoke those tasks in a convenient manner.

In any case, it certainly beats a collection of disconnected Bash scripts ;)

What have I automated in this way, you may ask? Well, a couple of small things; those include:

embedding of the current Git commit hash into the binary, to help identify the exact revision in the logs of any potential bug reports⁵
after a successful build, replacing the Usage section in README with the program’s --help output
generating completion scripts for popular shells by invoking the binary with a magic hidden flag (courtesy of clap)

Undoubtedly the biggest task that I relegated to Python/Invoke was the preparation of release packages. When it comes to the various Linuxes (currently Debian and Red Hat flavors), this wasn’t particularly complicated. Major thanks are due to the amazing fpm tool here, which I recommend to anyone who needs to package their software in a distro-compatible manner.

Homebrew, however — or more precisely, OS X itself — was quite a different story. Many, many failed attempts were needed to even get it to build on Travis, and the additional dependency on Python was partially to blame. To be fair, however, most of the pain was exclusively due to OpenSSL; getting that thing to build is always loads of “fun”, especially in such an opaque and poorly debuggable environment as Travis.

The wrap

There’s probably a lot of minor things and tidbits I could’ve mentioned along the way, but the story so far has most likely covered all the important topics. Let’s wrap it up then, and highlight some interesting points in the classic Yay/Meh/Nay manner.

Yay

It was definitely a good choice to rewrite gisht specifically in Rust. Besides all the advantages I’ve mentioned already, it is also worth noting that the language went through about 10 minor version bumps while I was working on this project. Of all those new releases, I don’t recall a single one that would introduce a breaking change.
Most of the Rust ecosystem (third-party libraries) was a joy to use, and very easy to get started with. Honorable mention goes to serde_json and how easy it was to transition the code from rustc_serialize that I had used at first.
With a possible exception of sucking in node.js as a huge dependency of your project and using Grunt, there is probably no better way of writing automation & support code than Python. There may eventually be some Rust-based task runners that could try to compete, but I’m not very convinced about using a compiled language for this purpose (and especially one that takes so long to build).

Meh

While the clap crate is quite configurable and pretty straightforward to use, it does lack at least one feature that’d be very nice for gisht. Additionally, working with raw clap is often a little tedious, as it doesn’t assist you in translating parsed flags into your own configuration types, and thus requires shuffling those bits manually⁶.
Being a de facto standard for continuous integration in open-source projects, Travis CI could be a little less finicky. In almost every project I decide to use it for, I end up with about half a dozen commits that frantically try to fix silly configuration issues, all before even a simple .travis.yml works as intended. Providing a way to test CI builds locally would be an obvious way to avoid this churn.

Nay

Testing in Rust is such a weird animal. On one hand, there is a first-class, out-of-the-box support for unit tests (and even integration tests) right in the toolchain. On the other hand, the relevant parts of the ecosystem are immature or lacking, as evidenced by the dreary story of mocking and stubbing. It’s no surprise that there is a long way to catch up to languages with the strongest testing culture (Java and C#/.NET⁷), but it’s disappointing to see Rust outclassed even by C++.
Getting anything to build reliably on OS X in a CI environment is already a tall order. But if it involves things such as OpenSSL, then it quickly goes from bad to terrible. I’m really not amused anymore by how this “Just Works” system often turns out to hardly work at all.

Since I don’t want to end on such a negative note, I feel compelled to state the obvious fact: every technology choice is a trade-off. In case of this project, however, the drawbacks were heavily outweighed by the benefits.

For this reason, I can definitely recommend the software stack I’ve just described to anyone developing non-trivial, cross-platform command line tools.

This is not an isolated complaint, by the way, as the interpreter startup time has recently emerged as an important issue to many developers of the Python language. ↩
Which may also include a practical lack thereof. ↩
It does handle it now, fortunately. ↩
Observant readers may notice that we’re exposing a technically private type (internal::Sprunge) through a publicly visible type alias. If that type was actually private, this would trigger a compiler warning which is slated to become a hard error at some point in the future. But, amusingly, we can fool the compiler by making it a public type inside a private module, which is exactly what we’re doing here. ↩
This has since been rewritten and is now done in build.rs — but that’s only because I implemented the relevant Cargo feature myself :) ↩
For an alternative approach that doesn’t seem to have this problem, check the structopt crate. ↩
Dynamically typed languages, due to their rich runtime, are basically a class of their own when it comes to testing ease, so it wouldn’t really be fair to hold them up for comparison. ↩

Currying and API design

Posted on Sun 12 November 2017 in Programming • Tagged with functional programming, currying, partial application, Haskell, API, abstraction • Leave a comment

In functional programming, currying is one of the concepts that contribute greatly to its expressive power. Its importance could be compared to something as ubiquitous as chaining method calls (foo.bar().baz()) in imperative, object-oriented languages.

Although a simple idea on the surface, it has significant consequences for the way functional APIs are designed. This post is an overview of various techniques that help utilize currying effectively when writing your functions. While the examples are written in Haskell syntax, I believe it should be useful for developers working in other functional languages, too.

The basics

Let’s start with a short recap.

Intuitively, we say that an N-argument function is curried if you can invoke it with a single argument and get back an (N-1)-argument function. Repeat this N times, and it’ll be equivalent to supplying all N arguments at once.

Here’s an example: the Data.Text module in Haskell contains the following function called splitOn:

splitOn :: Text -> Text -> [Text]
splitOn sep text = ...

It’s a fairly standard string splitting function, taking a separator as its first argument, with the second one being a string to perform the splitting on:

splitOn "," "1,2,3"  -- produces ["1", "2", "3"]

Both arguments are of type Text (Haskell strings), while the return type is [Text], a list of strings. This adds up to the signature (type) of splitOn, written above as Text -> Text -> [Text].

Like all functions in Haskell, however, splitOn is curried. We don’t have to provide it with both arguments at once; instead, we can stop at one in order to obtain another function:

splitOnComma :: Text -> [Text]
splitOnComma = splitOn ","

This new function is a partially applied version of splitOn, with its first argument (the separator) already filled in. To complete the call, all you need to do now is provide the text to split:

splitOnComma "1,2,3"  -- also produces ["1", "2", "3"]

and, unsurprisingly, you’ll get the exact same result.

Compare now the type signatures of both splitOn and splitOnComma:

splitOn :: Text -> Text -> [Text]
splitOnComma :: Text -> [Text]

It may be puzzling at first why the same arrow symbol (->) is used for what seems like two distinct meanings: the “argument separator”, and the return type indicator.

But for curried functions, both of those meanings are in fact identical!

Indeed, we can make it more explicit by defining splitOn as:

splitOn :: Text -> (Text -> [Text])

or even:

splitOn :: Text -> TypeOf splitOnComma -- (not a real Haskell syntax)

From this perspective, what splitOn actually returns is not [Text] but a function from Text to [Text] (Text -> [Text]). And conversely, a call with two arguments:

splitOn "," "1,2,3"

is instead two function calls, each taking just one argument:

(splitOn ",") "1,2,3"

This is why the -> arrow isn’t actually ambiguous: it always signifies the mapping of an argument type to a result type. And it’s always just one argument, too, because:

Currying makes all functions take only one argument.

It’s just that sometimes, what those single-argument functions return will be yet another function.
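For comparison, in a language without built-in currying the same idea has to be spelled out by hand. Here is a minimal sketch in Rust, where a one-argument function returns a closure awaiting the second argument; the split_on name merely mirrors the Haskell function and is not a standard library API.

```rust
/// Curried by hand: takes only the separator and returns a function
/// that is still waiting for the text to split.
pub fn split_on(sep: &str) -> impl Fn(&str) -> Vec<String> + '_ {
    move |text: &str| text.split(sep).map(String::from).collect()
}
```

Both calling styles from above carry over: split_on(",")("1,2,3") supplies the arguments one call at a time, while let split_on_comma = split_on(","); stops after the first one and keeps the partially applied function around for later.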

Least used arguments go first

Now that we have a firmer grasp on the idea of currying, we can see how it influences API design.

There is one thing in particular you will notice almost immediately, especially if you are coming from imperative languages that support default argument values and/or function overloading. It’s the particular order of arguments that a well designed, functional API will almost certainly follow.

See the splitOn function again:

splitOn :: Text -> Text -> [Text]
splitOn sep text = ...

It is no accident that it puts the separator as its first argument. This choice — as opposed to the alternative where text goes first — produces much more useful results when the function is applied partially through currying.

Say, for instance, that you want to splice a list of strings where the individual pieces can be comma-separated:

spliceOnComma :: [Text] -> [Text]
spliceOnComma ["1", "2,3", "4,5,6", "7"]
-- ^ This should produce ["1", "2", "3", "4", "5", "6", "7"]

Because the separator appears first in a splitOn call, you can do it easily through a direct use of currying:

spliceOnComma xs = concat $ map (splitOn ",") xs

-- or equivalently, in a terser point-free style:
-- spliceOnComma = concatMap $ splitOn ","

What we do here is apply the split to every string in the list xs (with map), followed by flattening the result — a list of lists, [[Text]] — back to a regular [Text] with concat.

If we had the alternative version of splitOn, one where the order of arguments is reversed:

splitOn' text sep = ...

we’d have no choice but to “fix it”, with either a lambda function or the flip combinator:

spliceOnComma' xs = concat $ map (\x -> splitOn' x ",") xs
spliceOnComma' xs = concat $ map (flip splitOn' ",") xs

Putting the delimiter first is simply more convenient. It is much more likely you’ll be splitting multiple strings on the same separator, as opposed to a single string and multiple separators. The argument order of splitOn is making the common use case slightly easier by moving the more “stable” parameter to the front.

This practice generalizes to all curried functions, forming a simple rule:

The more likely it is for an argument to remain constant between calls, the sooner it should appear in the function signature.

Note how this is different compared to any language where functions may take a variable number of arguments. In Python, for example, the equivalent of splitOn is defined as:

str.split(text, sep)

and the implicit default value for sep is essentially “any whitespace character”. In many cases, this is exactly what we want, making the following calls possible¹:

>>> str.split("Alice has a cat")
["Alice", "has", "a", "cat"]

So, as a less-used argument, sep actually goes last in str.split, as it is often desirable to omit it altogether. Under the currying regime, however, we put it first, so that we can fix it to a chosen value and obtain a more specialized version of the function.

The fewer arguments, the better

Another thing you’d encounter in languages with flexible function definitions is the proliferation of optional arguments:

response = requests.get("http://example.com/foo",
                        params={'arg': 42},
                        data={'field': 'value'},
                        auth=('user', 'pass'),
                        headers={'User-Agent': "My Amazing App"},
                        cookies={'c_is': 'for_cookie'},
                        files={'attachment.txt': open('file.txt', 'rb')},
                        allow_redirects=False,
                        timeout=5.0)

Trying to translate this directly to a functional paradigm would result in extremely unreadable function calls — doubly so when you don’t actually need all those arguments and have to provide some canned defaults:

response <- Requests.get
    "http://example.com/foo" [("arg", 42)]
    [] Nothing [] [] [] True Nothing

What does that True mean, for example? Or what exactly does each empty list signify? It’s impossible to know just by looking at the function call alone.

Long argument lists are thus detrimental to the quality of functional APIs. It’s much harder to correctly apply the previous rule (least used arguments first) when there are so many possible permutations.

What should we do then?… In some cases, including the above example of an HTTP library, we cannot simply cut out features in the name of elegance. The necessary information needs to go somewhere, meaning we need to find an at least somewhat acceptable place for it.

Fortunately, we have a couple of options that should help us with solving this problem.

Combinators / builders

Looking back at the last example in Python, we can see why the function call remains readable even if it sprouts a dozen or so additional arguments.

The obvious reason is that each one has been uniquely identified by a name.

In order to emulate some form of what’s called keyword arguments, we can split the single function call into multiple stages. Each one would then supply one piece of data, with a matching function name serving as a readability cue:

response <- sendRequest $
            withHeaders [("User-Agent", "My Amazing App")] $
            withBasicAuth "user" "pass" $
            withData [("field", "value")] $
                get "http://example.com/foo"

If we follow this approach, the caller would only invoke those intermediate functions that fit his particular use case. The API above could still offer withCookies, withFiles, or any of the other combinators, but their usage shall be completely optional.

Pretty neat, right?

Thing is, the implementation would be a little involved here. We would clearly need to carry some data between the various withFoo calls, which requires some additional data types in addition to plain functions. At minimum, we need something to represent the Request, as it is created by the get function:

get :: Text -> Request

and then “piped” through withFoo transformers like this one:

withBasicAuth :: Text -> Text -> (Request -> Request)

so that we can finally send it:

sendRequest :: Request -> IO Response

Such a Request type needs to keep track of all the additional parameters that may have been tacked onto it:

type Request = (Text, [Param])  -- Text is the URL

data Param = Header Text Text
           | BasicAuth Text Text
           | Data [(Text, Text)]
           -- and so on

-- example
withBasicAuth user pass (url, params) =
    (url, params ++ [BasicAuth user pass])

All of a sudden, what would be a single function explodes into a collection of data types and associated combinators.

In Haskell at least, we can forgo some of the boilerplate by automatically deriving an instance of Monoid (or perhaps a Semigroup). Rather than invoking a series of combinators, clients would then build their requests through repeated mappends²:

response <- sendRequest $ get "http://example.com/foo"
                          <> header "User-Agent" "My Awesome App"
                          <> basicAuth "user" "pass"
                          <> body [("field", "value")]

This mini-DSL looks very similar to keyword arguments in Python, as well as the equivalent Builder pattern from Java, Rust, and others. What’s disappointing, however, is that it doesn’t easily beat those solutions in terms of compile-time safety. Unless you invest into some tricky type-level hacks, there is nothing to prevent the users from building invalid requests at runtime:

let reqParams = get "http://example.com/foo"
--
-- ... lots of code in between ...
--
response <- sendRequest $
            reqParams <> get "http://example.com/bar" -- woops!

Compared to a plain function (with however many arguments), we have actually lost some measure of correctness here.

Record types

In many cases, fortunately, there is another way to keep our calls both flexible and safe against runtime errors. We just need to change the representation of the input type (here, Request) into a record.

A record is simply a user-defined type that’s a collection of named fields.

Most languages (especially imperative ones: C, C++, Go, Rust, …) call those structures, and use the struct keyword to signify a record definition. In functional programming parlance, they are also referred to as product types; this is because the joint record type is a Cartesian product of its individual field types³.

Going back to our example, it shouldn’t be difficult to define a record representing an HTTP Request:

data Request = Request { reqURL :: URL
                       , reqMethod :: Method
                       , reqHeaders :: [(Header, Text)]
                       , reqPostData :: [(Text, Text)]
                       }

In fact, I suspect most programmers would naturally reach for this notation first.

Having this definition, calls to sendRequest can be rewritten to take a record instance that we construct on the spot⁴:

response <- sendRequest $
    Request { reqURL = "http://example.com/bar"
            , reqMethod = GET
            , reqHeaders = [("User-Agent", "My Awesome App")]
            , reqPostData = []
            }

Compare this snippet to the Python example from the beginning of this section. It comes remarkably close, right? The Request record and its fields can indeed work quite nicely as substitutes for keyword arguments.

But besides the readability boon of having “argument” names at the call site, we’ve also gained stronger correctness checks. For example, there is no way anymore to accidentally supply the URL field twice.

Different functions for different things

Astute readers may have noticed at least two things about the previous solutions.

First, they are not mutually incompatible. Quite the opposite, actually: they compose very neatly, allowing us to combine builder functions with the record update syntax in the final API:

response <- sendRequest $
    (get "http://example.com/baz")
    { reqHeaders = [("User-Agent", "My Awesome App")] }

This cuts out basically all the boilerplate of record-based calls, leaving only the parts that actually differ from the defaults⁵.
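The same combination of a builder function with a record update exists outside of Haskell, too. As a sketch, here is roughly how it could look with Rust’s struct update syntax; the Request and Method types and the get function are made up to mirror the example above and don’t come from any real HTTP library.

```rust
#[derive(Debug, Clone, PartialEq)]
pub enum Method {
    Get,
    Post,
}

/// Hypothetical counterpart of the `Request` record.
#[derive(Debug, Clone, PartialEq)]
pub struct Request {
    pub url: String,
    pub method: Method,
    pub headers: Vec<(String, String)>,
    pub post_data: Vec<(String, String)>,
}

/// Builder function: fills in defaults for everything except the URL.
pub fn get(url: &str) -> Request {
    Request {
        url: url.to_string(),
        method: Method::Get,
        headers: Vec::new(),
        post_data: Vec::new(),
    }
}

/// Only the field that differs from the defaults is spelled out;
/// the rest is taken from the value produced by `get`.
pub fn example_request() -> Request {
    Request {
        headers: vec![("User-Agent".to_string(), "My Awesome App".to_string())],
        ..get("http://example.com/baz")
    }
}
```

Just like in the Haskell version, a field can appear at most once in the literal, so supplying the URL twice is a compile-time error rather than a runtime surprise.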

But on the second and more important note: we don’t seem to be talking about currying anymore. Does it mean it loses its usefulness once we go beyond a certain threshold of complexity?…

Thankfully, the answer is no. While some APIs may require more advanced techniques to access the full breadth of their functionality, it is always possible to expose some carefully constructed facade that is conducive to partial application.

Consider, for example, the functionality exposed by this set of HTTP wrappers:

head :: URL -> Request
headWith :: [(Header, Text)] -> URL -> Request
get :: URL -> Request
getWith :: [(Header, Text)] -> URL -> Request
postForm :: [(Text, Text)] -> URL -> Request
postFormWith :: [(Header, Text)] -> [(Text, Text)] -> URL -> Request
toURL :: Method -> URL -> Request

Each one is obviously curry-friendly⁶. Combined, they also offer a pretty comprehensive API surface. And should they prove insufficient, you’d still have the builder pattern and/or record updates to fall back on — either for specialized one-off cases, or for writing your own wrappers.

Naturally, this technique of layered API design, with simple wrappers hiding a progressively more advanced core, isn’t limited to just functional programming. In some way, it is what good API design looks like in general. But in FP languages, it becomes especially important, because the expressive benefits of partial application are so paramount there.

Fortunately, these principles seem to be followed pretty consistently, at least within the Haskell ecosystem. You can see it in the design of the http-client package, which is the real-world extension of the HTTP interface outlined here. More evidently, it can be observed in any of the numerous packages that expose both a basic foo function and a more customizable fooWith variant; popular examples include the async package, the zlib library, and the Text.Regex module.

It’d be more common in Python to write this as "Alice has a cat".split(), but this form would make it less obvious how the arguments are passed. ↩
A great example of this pattern can be found in the optparse-applicative package. ↩
Tuples (like (Int, String)) are also product types. They can be thought of as ad-hoc records where field indices serve as rudimentary “names”. In fact, some languages even use the dotted notation to access fields of both records/structs (x.foo) and tuples (y.0). ↩
For simplicity, I’m gonna assume the URL and Header types can be “magically” constructed from string literals through the GHC’s OverloadedStrings extension. ↩
In many languages, we can specify more formally what the “default” means for a compound-type like Request, and sometimes even derive it automatically. Examples include the Default typeclass in Haskell, the Default trait in Rust, and the default/argumentless/trivial constructors in C++ et al. ↩
Haskell programmers may especially notice how the last function is designed specifically for infix application: response <- sendRequest $ POST `toURL` url. ↩
