Currying and API design

Posted on Sun 12 November 2017 in Programming

In functional programming, currying is one of the concepts that contribute greatly to its expressive power. Its importance could be compared to something as ubiquitous as chaining method calls (foo.bar().baz()) in imperative, object-oriented languages.

Although a simple idea on the surface, it has significant consequences for the way functional APIs are designed. This post is an overview of various techniques that help utilize currying effectively when writing your functions. While the examples are written in Haskell syntax, I believe it should be useful for developers working in other functional languages, too.

The basics

Let’s start with a short recap.

Intuitively, we say that an N-argument function is curried if you can invoke it with a single argument and get back an (N-1)-argument function. Repeat this N times, and it’ll be equivalent to supplying all N arguments at once.

Here’s an example: the Data.Text module in Haskell contains the following function called splitOn:

splitOn :: Text -> Text -> [Text]
splitOn sep text = ...

It’s a fairly standard string splitting function, taking a separator as its first argument, with the second one being a string to perform the splitting on:

splitOn "," "1,2,3"  -- produces ["1", "2", "3"]

Both arguments are of type Text (Haskell strings), while the return type is [Text] — a list of strings. This add up to the signature (type) of splitOn, written above as Text -> Text -> [Text].

Like all functions in Haskell, however, splitOn is curried. We don’t have to provide it with both arguments at once; instead, we can stop at one in order to obtain another function:

splitOnComma :: Text -> [Text]
splitOnComma = splitOn ","

This new function is a partially applied version of splitOn, with its first argument (the separator) already filled in. To complete the call, all you need to do now is provide the text to split:

splitOnComma "1,2,3"  -- also produces ["1", "2", "3"]

and, unsurprisingly, you’ll get the exact same result.

Compare now the type signatures of both splitOn and splitOnComma:

splitOn :: Text -> Text -> [Text]
splitOnComma :: Text -> [Text]

It may be puzzling at first why the same arrow symbol (->) is used for what seems like two distinct meanings: the “argument separator”, and the return type indicator.

But for curried functions, both of those meanings are in fact identical!

Indeed, we can make it more explicit by defining splitOn as:

splitOn :: Text -> (Text -> [Text])

or even:

splitOn :: Text -> TypeOf splitOnComma -- (not a real Haskell syntax)

From this perspective, what splitOn actually returns is not [Text] but a function from Text to [Text] (Text -> [Text]). And conversely, a call with two arguments:

splitOn "," "1,2,3"

is instead two function calls, each taking just one argument:

(splitOn ",") "1,2,3"

This is why the -> arrow isn’t actually ambiguous: it always signifies the mapping of an argument type to a result type. And it’s always just one argument, too, because:

Currying makes all functions take only one argument.

It’s just that sometimes, what those single-argument functions return will be yet another function.

Least used arguments go first

Now that we have a firmer grasp on the idea of currying, we can see how it influences API design.

There is one thing in particular you will notice almost immediately, especially if you are coming from imperative languages that support default argument values and/or function overloading. It’s the particular order of arguments that a well designed, functional API will almost certainly follow.

See the splitOn function again:

splitOn :: Text -> Text -> [Text]
splitOn sep text = ...

It is no accident that it puts the separator as its first argument. This choice — as opposed to the alternative where text goes first — produces much more useful results when the function is applied partially through currying.

Say, for instance, that you want to splice a list of strings where the individual pieces can be comma-separated:

spliceOnComma :: [Text] -> [Text]
spliceOnComma ["1", "2,3", "4,5,6", "7"]
-- ^ This should produce ["1", "2", "3", "4", "5", "6", "7"]

Because the separator appears first in a splitOn call, you can do it easily through a direct use of currying:

spliceOnComma xs = concat $ map (splitOn ",") xs

-- or equivalently, in a terser point-free style:
-- spliceOnComma = concatMap $ splitOn ","

What we do here is apply the split to every string in the list xs (with map), followed by flattening the result — a list of lists, [[Text]] — back to a regular [Text] with concat.

If we had the alternative version of splitOn, one where the order of arguments is reversed:

splitOn' text sep = ...

we’d have no choice but to “fix it”, with either a lambda function or the flip combinator:

spliceOnComma' xs = concat $ map (\x -> splitOn' x ",") xs
spliceOnComma' xs = concat $ map (flip splitOn' ",") xs

Putting the delimiter first is simply more convenient. It is much more likely you’ll be splitting multiple strings on the same separator, as opposed to a single string and multiple separators. The argument order of splitOn is making the common use case slightly easier by moving the more “stable” parameter to the front.

This practice generalizes to all curried functions, forming a simple rule:

The more likely it is for an argument to remain constant between calls, the sooner it should appear in the function signature.

Note how this is different compared to any language where functions may take variable number of arguments. In Python, for example, the equivalent of splitOn is defined as:

str.split(text, sep)

and the implicit default value for sep is essentially “any whitespace character”. In many cases, this is exactly what we want, making the following calls possible¹:

>>> str.split("Alice has a cat")
["Alice", "has", "a", "cat"]

So, as a less-used argument, sep actually goes last in str.split, as it is often desirable to omit it altogether. Under the currying regime, however, we put it first, so that we can fix it to a chosen value and obtain a more specialized version of the function.

The fewer arguments, the better

Another thing you’d encounter in languages with flexible function definitions is the proliferation of optional arguments:

response = requests.get("http://example.com/foo",
                        params={'arg': 42},
                        data={'field': 'value'},
                        auth=('user', 'pass'),
                        headers={'User-Agent': "My Amazing App"},
                        cookies={'c_is': 'for_cookie'},
                        files={'attachment.txt': open('file.txt', 'rb')},
                        allow_redirects=False,
                        timeout=5.0)

Trying to translate this directly to a functional paradigm would result in extremely unreadable function calls — doubly so when you don’t actually need all those arguments and have to provide some canned defaults:

response <- Requests.get
    "http://example.com/foo" [('arg', 42)]
    [] Nothing [] [] [] True Nothing

What does that True mean, for example? Or what exactly does each empty list signify? It’s impossible to know just by looking at the function call alone.

Long argument lists are thus detrimental to the quality of functional APIs. It’s much harder to correctly apply the previous rule (least used arguments first) when there are so many possible permutations.

What should we do then?… In some cases, including the above example of an HTTP library, we cannot simply cut out features in the name of elegance. The necessary information needs to go somewhere, meaning we need to find at least somewhat acceptable place for it.

Fortunately, we have a couple of options that should help us with solving this problem.

Combinators / builders

Looking back at the last example in Python, we can see why the function call remains readable even if it sprouts a dozen or so additional arguments.

The obvious reason is that each one has been uniquely identified by a name.

In order to emulate some form of what’s called keyword arguments, we can split the single function call into multiple stages. Each one would then supply one piece of data, with a matching function name serving as a readability cue:

response <- sendRequest $
            withHeaders [("User-Agent", "My Amazing App")] $
            withBasicAuth "user" "pass" $
            withData [("field", "value")] $
                get "http://example.com/foo"

If we follow this approach, the caller would only invoke those intermediate functions that fit his particular use case. The API above could still offer withCookies, withFiles, or any of the other combinators, but their usage shall be completely optional.

Pretty neat, right?

Thing is, the implementation would be a little involved here. We would clearly need to carry some data between the various withFoo calls, which requires some additional data types in addition to plain functions. At minimum, we need something to represent the Request, as it is created by the get function:

get :: Text -> Request

and then “piped” through withFoo transformers like this one:

withBasicAuth :: Text -> Text -> (Request -> Request)

so that it can we can finally send it:

sendRequest :: Request -> IO Response

Such Request type needs to keep track of all the additional parameters that may have been tacked onto it:

type Request = (Text, [Param])  -- Text is the URL

data Param = Header Text Text
           | BasicAuth Text Text
           | Data [(Text, Text)]
           -- and so on

-- example
withBasicAuth user pass (url, params) =
    (url, params ++ [BasicAuth user pass])

All of a sudden, what would be a single function explodes into a collection of data types and associated combinators.

In Haskell at least, we can forgo some of the boilerplate by automatically deriving an instance of Monoid (or perhaps a Semigroup). Rather than invoking a series of combinators, clients would then build their requests through repeated mappends²:

response <- sendRequest $ get "http://example.com/foo"
                          <> header "User-Agent" "My Awesome App"
                          <> basicAuth "user" "pass"
                          <> body [("field", "value")]

This mini-DSL looks very similar to keyword arguments in Python, as well as the equivalent Builder pattern from Java, Rust, and others. What’s disappointing, however, is that it doesn’t easily beat those solutions in terms of compile-time safety. Unless you invest into some tricky type-level hacks, there is nothing to prevent the users from building invalid requests at runtime:

let reqParams = get "http://example.com/foo"
--
-- ... lots of code in between ...
--
response <- sendRequest $
            reqParams <> get "http://example.com/bar" -- woops!

Compared to a plain function (with however many arguments), we have actually lost some measure of correctness here.

Record types

In many cases, fortunately, there is another way to keep our calls both flexible and safe against runtime errors. We just need to change the representation of the input type (here, Request) into a record.

Record is simply a user-defined type that’s a collection of named fields.

Most languages (especially imperative ones: C, C++, Go, Rust, …) call those structures, and use the struct keyword to signify a record definition. In functional programming parlance, they are also referred to as product types; this is because the joint record type is a Cartesian product of its individual field types³.

Going back to our example, it shouldn’t be difficult to define a record representing an HTTP Request:

data Request = Request { reqURL :: URL
                       , reqMethod :: Method
                       , reqHeaders [(Header, Text)]
                       , reqPostData [(Text, Text)]
                       }

In fact, I suspect most programmers would naturally reach for this notation first.

Having this definition, calls to sendRequest can be rewritten to take a record instance that we construct on the spot⁴:

response <- sendRequest $
    Request { reqURL = "http://example.com/bar"
            , reqMethod = GET
            , reqHeaders = [("User-Agent", "My Awesome App")]
            , reqPostData = []
            }

Compare this snippet to the Python example from the beginning of this section. It comes remarkably close, right? The Request record and its fields can indeed work quite nicely as substitutes for keyword arguments.

But besides the readability boon of having “argument” names at the call site. we’ve also gained stronger correctness checks. For example, there is no way anymore to accidentally supply the URL field twice.

Different functions for different things

Astute readers may have noticed at least two things about the previous solutions.

First, they are not mutually incompatible. Quite the opposite, actually: they compose very neatly, allowing us to combine builder functions with the record update syntax in the final API:

response <- sendRequest $
    (get "http://example.com/baz")
    { reqHeaders = [("User-Agent", "My Awesome App")] }

This cuts out basically all the boilerplate of record-based calls, leaving only the parts that actually differ from the defaults⁵.

But on the second and more important note: we don’t seem to be talking about currying anymore. Does it mean it loses its usefulness once we go beyond certain threshold of complexity?…

Thankfully, the answer is no. While some APIs may require more advanced techniques to access the full breadth of their functionality, it is always possible to expose some carefully constructed facade that is conducive to partial application.

Consider, for example, the functionality exposed by this set of HTTP wrappers:

head :: URL -> Request
headWith :: [(Header, Text)] -> URL -> Request
get :: URL -> Request
getWith :: [(Header, Text)] -> URL -> Request
postForm :: [(Text, Text)] -> URL -> Request
postFormWith :: [(Header, Text)] -> [(Text, Text)] -> URL -> Request
toURL :: Method -> URL -> Request

Each one is obviously curry-friendly⁶. Combined, they also offer a pretty comprehensive API surface. And should they prove insufficient, you’d still have the builder pattern and/or record updates to fall back on — either for specialized one-off cases, or for writing your own wrappers.

Naturally, this technique of layered API design — with simple wrappers hiding a progressively more advanced core — isn’t limited to just functional programming. In some way, it is what good API design looks like in general. But in FP languages, it becomes especially important, because the expressive benefits of partial application are so paramount there

Fortunately, these principles seem to be followed pretty consistently, at least within the Haskell ecosystem. You can see it in the design of the http-client package, which is the real world extension of the HTTP interface outlined here. More evidently, it can be observed in any of the numerous packages the expose both a basic foo and a more customizable fooWith functions; popular examples include the async package, the zlib library, and the Text.Regex module.

It’d be more common in Python to write this as "Alice has a cat".split(), but this form would make it less obvious how the arguments are passed. ↩
A great example of this pattern can be found in the optparse-applicative package. ↩
Tuples (like (Int, String)) are also product types. They can be thought of as ad-hoc records where field indices serve as rudimentary “names”. In fact, some languages even use the dotted notation to access fields of both records/structs (x.foo) and tuples (y.0). ↩
For simplicity, I’m gonna assume the URL and Header types can be “magically” constructed from string literals through the GHC’s OverloadedStrings extension. ↩
In many languages, we can specify more formally what the “default” means for a compound-type like Request, and sometimes even derive it automatically. Examples include the Default typeclass in Haskell, the Default trait in Rust, and the default/argumentless/trivial constructors in C++ et al. ↩
Haskell programmers may especially notice how the last function is designed specifically for infix application: response <- sendRequest $ POST `toUrl` url. ↩