Currying and API design
Posted on Sun 12 November 2017 in Programming
In functional programming, currying is one of the concepts that contribute greatly to its expressive power. Its importance could be compared to something as ubiquitous as chaining method calls (foo.bar().baz()) in imperative, object-oriented languages.
Although a simple idea on the surface, it has significant consequences for the way functional APIs are designed. This post is an overview of various techniques that help utilize currying effectively when writing your functions. While the examples are written in Haskell syntax, I believe it should be useful for developers working in other functional languages, too.
The basics
Let’s start with a short recap.
Intuitively, we say that an N-argument function is curried if you can invoke it with a single argument and get back an (N-1)-argument function. Repeat this N times, and it’ll be equivalent to supplying all N arguments at once.
Here’s an example: the Data.Text module in Haskell contains the following function called splitOn:
splitOn :: Text -> Text -> [Text]
splitOn sep text = ...
It’s a fairly standard string splitting function, taking a separator as its first argument, with the second one being a string to perform the splitting on:
splitOn "," "1,2,3" -- produces ["1", "2", "3"]
Both arguments are of type Text (Haskell strings), while the return type is [Text], a list of strings. These add up to the signature (type) of splitOn, written above as Text -> Text -> [Text].
Like all functions in Haskell, however, splitOn is curried.
We don’t have to provide it with both arguments at once;
instead, we can stop at one in order to obtain another function:
splitOnComma :: Text -> [Text]
splitOnComma = splitOn ","
This new function is a partially applied version of splitOn, with its first argument (the separator) already filled in. To complete the call, all you need to do now is provide the text to split:
splitOnComma "1,2,3" -- also produces ["1", "2", "3"]
and, unsurprisingly, you’ll get the exact same result.
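To try this outside of a REPL, here is a minimal, self-contained sketch using the real Data.Text.splitOn. It assumes the text package (which ships with GHC) and the OverloadedStrings extension, so that string literals can be used as Text; the names fullCall and partialCall are only illustrative:

```haskell
{-# LANGUAGE OverloadedStrings #-}
import Data.Text (Text, splitOn)

-- Partially applied splitOn: the separator is fixed, the text is still open.
splitOnComma :: Text -> [Text]
splitOnComma = splitOn ","

-- Supplying both arguments at once.
fullCall :: [Text]
fullCall = splitOn "," "1,2,3"      -- ["1","2","3"]

-- Supplying the remaining argument to the specialized function.
partialCall :: [Text]
partialCall = splitOnComma "1,2,3"  -- ["1","2","3"]
```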
Now compare the type signatures of splitOn and splitOnComma:
splitOn :: Text -> Text -> [Text]
splitOnComma :: Text -> [Text]
It may be puzzling at first why the same arrow symbol (->) is used for what seems like two distinct meanings: the “argument separator” and the return type indicator. But for curried functions, both of those meanings are in fact identical!
Indeed, we can make this more explicit by writing the type of splitOn as:
splitOn :: Text -> (Text -> [Text])
or even:
splitOn :: Text -> TypeOf splitOnComma -- (not real Haskell syntax)
From this perspective, what splitOn actually returns is not [Text] but a function from Text to [Text] (Text -> [Text]).
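Haskell has no TypeOf, but an ordinary type synonym gets close to the same idea. In the sketch below, Splitter is a hypothetical name for Text -> [Text], and the local splitOn simply re-exports Data.Text.splitOn under the more explicit signature:

```haskell
{-# LANGUAGE OverloadedStrings #-}
import Data.Text (Text)
import qualified Data.Text as T

-- A type synonym naming the type of splitOnComma (hypothetical name).
type Splitter = Text -> [Text]

-- Same type as Text -> Text -> [Text], spelled to show what is returned.
splitOn :: Text -> Splitter
splitOn = T.splitOn

splitOnComma :: Splitter
splitOnComma = splitOn ","
```

Because a type synonym is only an abbreviation, GHC accepts this splitOn anywhere a Text -> Text -> [Text] is expected.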
And conversely, a call with two arguments:
splitOn "," "1,2,3"
is instead two function calls, each taking just one argument:
(splitOn ",") "1,2,3"
This is why the -> arrow isn’t actually ambiguous: it always signifies the mapping of an argument type to a result type.
And it’s always just one argument, too, because:
Currying makes all functions take only one argument.
It’s just that sometimes, what those single-argument functions return will be yet another function.
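The one-argument view can be checked directly. This sketch (the names splitOnLambdas, twoArgs, nested, and splitOnPair are made up for illustration) writes splitOn as nested lambdas, compares the two call forms, and uses the Prelude’s uncurry and curry to convert between the curried form and a function of a single tuple:

```haskell
{-# LANGUAGE OverloadedStrings #-}
import Data.Text (Text, splitOn)

-- splitOn spelled out as a chain of one-argument lambdas.
splitOnLambdas :: Text -> (Text -> [Text])
splitOnLambdas = \sep -> \text -> splitOn sep text

-- Function application associates to the left, so these are the same call.
twoArgs, nested :: [Text]
twoArgs = splitOn "," "1,2,3"
nested  = (splitOn ",") "1,2,3"

-- uncurry packs both arguments into one tuple; curry undoes that.
splitOnPair :: (Text, Text) -> [Text]
splitOnPair = uncurry splitOn
```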
Least used arguments go first
Now that we have a firmer grasp on the idea of currying, we can see how it influences API design.
There is one thing in particular you will notice almost immediately, especially if you are coming from imperative languages that support default argument values and/or function overloading. It’s the particular order of arguments that a well-designed functional API will almost certainly follow.
Consider the splitOn function again:
splitOn :: Text -> Text -> [Text]
splitOn sep text = ...
It is no accident that it puts the separator (sep) as its first argument. This choice, as opposed to the alternative where text goes first, produces much more useful results when the function is applied partially through currying.
Say, for instance, that you want to splice a list of strings where the individual pieces can be comma-separated:
spliceOnComma :: [Text] -> [Text]
spliceOnComma ["1", "2,3", "4,5,6", "7"]
-- ^ This should produce ["1", "2", "3", "4", "5", "6", "7"]
Because the separator appears first in a splitOn call, you can do it easily through a direct use of currying:
spliceOnComma xs = concat $ map (splitOn ",") xs
-- or equivalently, in a terser point-free style:
-- spliceOnComma = concatMap $ splitOn ","
What we do here is apply the split to every string in the list xs (with map), followed by flattening the result (a list of lists, [[Text]]) back to a regular [Text] with concat.
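Here is the same spliceOnComma as a complete module, assuming Data.Text and OverloadedStrings; spliceOnCommaPF is a hypothetical name for the point-free variant:

```haskell
{-# LANGUAGE OverloadedStrings #-}
import Data.Text (Text, splitOn)

-- Split each element on commas, then flatten [[Text]] back into [Text].
spliceOnComma :: [Text] -> [Text]
spliceOnComma xs = concat $ map (splitOn ",") xs

-- Point-free equivalent: concatMap f is concat . map f.
spliceOnCommaPF :: [Text] -> [Text]
spliceOnCommaPF = concatMap (splitOn ",")

-- spliceOnComma ["1", "2,3", "4,5,6", "7"]
--   == ["1", "2", "3", "4", "5", "6", "7"]
```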
If we had the alternative version of splitOn, one where the order of arguments is reversed:
splitOn' text sep = ...
we’d have no choice but to “fix it” with either a lambda function or the flip combinator:
spliceOnComma' xs = concat $ map (\x -> splitOn' x ",") xs
spliceOnComma' xs = concat $ map (flip splitOn' ",") xs
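A runnable version of both workarounds might look like the following. Here splitOn' is a hypothetical text-first wrapper around Data.Text.splitOn (no such function exists in the library), and viaLambda and viaFlip are illustrative names:

```haskell
{-# LANGUAGE OverloadedStrings #-}
import Data.Text (Text, splitOn)

-- Hypothetical variant with the arguments reversed (text first).
splitOn' :: Text -> Text -> [Text]
splitOn' text sep = splitOn sep text

-- Workaround 1: a lambda moves the element into the first slot.
viaLambda :: [Text] -> [Text]
viaLambda xs = concat $ map (\x -> splitOn' x ",") xs

-- Workaround 2: flip swaps the first two arguments back.
viaFlip :: [Text] -> [Text]
viaFlip xs = concat $ map (flip splitOn' ",") xs
```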
Putting the delimiter first is simply more convenient.
It is much more likely you’ll be splitting multiple strings on the same separator,
as opposed to a single string and multiple separators.
The argument order of splitOn makes the common use case slightly easier by moving the more “stable” parameter to the front.
This practice generalizes to all curried functions, forming a simple rule:
The more likely it is for an argument to remain constant between calls, the sooner it should appear in the function signature.
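The same rule shows up elsewhere in Data.Text. Its replace function takes the needle, then the replacement, and only then the text to operate on, so the two arguments most likely to stay fixed can be supplied up front. The names expandTabs and expandAll below are illustrative, not part of any library:

```haskell
{-# LANGUAGE OverloadedStrings #-}
import Data.Text (Text)
import qualified Data.Text as T

-- T.replace needle replacement haystack: the two likely-constant
-- arguments come first, so they can be fixed by partial application.
-- (T.replace raises an error if the needle is empty.)
expandTabs :: Text -> Text
expandTabs = T.replace "\t" "    "

-- The specialized function slots straight into map.
expandAll :: [Text] -> [Text]
expandAll = map expandTabs
```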
Note how this differs from languages where functions may take a variable number of arguments. In Python, for example, the equivalent of splitOn is defined as:
str.split(text, sep=None)
and the default value for sep essentially means “any run of whitespace”.
In many cases, this is exactly what we want, making the following calls possible[1]:
>>> str.split("Alice has a cat")
['Alice', 'has', 'a', 'cat']
So, as a less-used argument, sep actually goes last in str.split, as it is often desirable to omit it altogether.
Under the currying regime, however, we put it first,
so that we can fix it to a chosen value and obtain a more specialized version of the function.
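In Haskell, the Python-style default is usually covered by a separate function rather than an omitted argument: Data.Text.words splits on runs of whitespace, while splitOn keeps its separator first. A small sketch, where wordsExample and splitOnSemicolon are hypothetical names:

```haskell
{-# LANGUAGE OverloadedStrings #-}
import Data.Text (Text)
import qualified Data.Text as T

-- The "no separator given" case has a dedicated function:
-- T.words splits on runs of whitespace and drops empty pieces.
wordsExample :: [Text]
wordsExample = T.words "Alice has a   cat"  -- ["Alice","has","a","cat"]

-- The general function keeps the separator first, so it can be fixed.
splitOnSemicolon :: Text -> [Text]
splitOnSemicolon = T.splitOn ";"
```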
The fewer arguments, the better
Another thing you’d encounter in languages with flexible function definitions is the proliferation of optional arguments:
response = requests.get("http://example.com/foo",
                        params={'arg': 42},
                        data={'field': 'value'},
                        auth=('user', 'pass'),
                        headers={'User-Agent': "My Amazing App"},
                        cookies={'c_is': 'for_cookie'},
                        files={'attachment.txt': open('file.txt', 'rb')},
                        allow_redirects=False,
                        timeout=5.0)
Trying to translate this directly to a functional paradigm would result in extremely unreadable function calls — doubly so when you don’t actually need all those arguments and have to provide some canned defaults:
response <- Requests.get
    "http://example.com/foo" [("arg", "42")]
    [] Nothing [] [] [] True Nothing
What does that True mean, for example?
Or what exactly does each empty list signify?
It’s impossible to know just by looking at the function call alone.
Long argument lists are thus detrimental to the quality of functional APIs. It’s much harder to correctly apply the previous rule (least used arguments first) when there are so many possible permutations.
What should we do, then? In some cases, including the above example of an HTTP library, we cannot simply cut out features in the name of elegance. The necessary information needs to go somewhere, meaning we need to find an at least somewhat acceptable place for it.
Fortunately, we have a couple of options that should help us with solving this problem.
Combinators / builders
Looking back at the last example in Python, we can see why the function call remains readable even if it sprouts a dozen or so additional arguments.
The obvious reason is that each one has been uniquely identified by a name.
In order to emulate some form of what’s called keyword arguments, we can split the single function call into multiple stages. Each one would then supply one piece of data, with a matching function name serving as a readability cue:
response <- sendRequest $
    withHeaders [("User-Agent", "My Amazing App")] $
    withBasicAuth "user" "pass" $
    withData [("field", "value")] $
    get "http://example.com/foo"
If we follow this approach, the caller would only invoke those intermediate functions that fit their particular use case. The API above could still offer withCookies, withFiles, or any of the other combinators, but using them would be entirely optional.
Pretty neat, right?
The thing is, the implementation would be a little involved here. We would clearly need to carry some data between the various withFoo calls, which requires some data types in addition to plain functions.
At minimum, we need something to represent the Request, as it is created by the get function:
get :: Text -> Request
and then “piped” through withFoo transformers like this one:
withBasicAuth :: Text -> Text -> (Request -> Request)
so that we can finally send it:
sendRequest :: Request -> IO Response
Such a Request type needs to keep track of all the additional parameters that may have been tacked onto it:
type Request = (Text, [Param])  -- Text is the URL
data Param = Header Text Text
           | BasicAuth Text Text
           | Data [(Text, Text)]
           -- and so on
-- example
withBasicAuth user pass (url, params) =
    (url, params ++ [BasicAuth user pass])
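Put together, the combinator approach might be sketched as follows. Everything here is hypothetical, including get, withHeaders, withBasicAuth, withData, and the Param constructors; sendRequest is left out so the example stays pure and needs no networking library:

```haskell
{-# LANGUAGE OverloadedStrings #-}
import Data.Text (Text)

-- Hypothetical request representation: the URL plus accumulated parameters.
type Request = (Text, [Param])

data Param
  = Header Text Text
  | BasicAuth Text Text
  | Data [(Text, Text)]
  deriving (Eq, Show)

-- Starts a request with no extra parameters.
get :: Text -> Request
get url = (url, [])

-- Each combinator appends one or more parameters and returns the request.
withHeaders :: [(Text, Text)] -> Request -> Request
withHeaders hs (url, params) = (url, params ++ map (uncurry Header) hs)

withBasicAuth :: Text -> Text -> Request -> Request
withBasicAuth user pass (url, params) = (url, params ++ [BasicAuth user pass])

withData :: [(Text, Text)] -> Request -> Request
withData fields (url, params) = (url, params ++ [Data fields])

-- What would be handed to sendRequest.
example :: Request
example =
  withHeaders [("User-Agent", "My Amazing App")] $
  withBasicAuth "user" "pass" $
  withData [("field", "value")] $
  get "http://example.com/foo"
```

Because each combinator wraps the one below it, the parameters end up in bottom-up order: the Data entry first, the header last.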
All of a sudden, what would be a single function explodes into a collection of data types and associated combinators.
In Haskell at least, we can forgo some of the boilerplate by automatically deriving an instance of Monoid (or perhaps a Semigroup). Rather than invoking a series of combinators, clients would then build their requests through repeated mappends[2]:
response <- sendRequest $ get "http://example.com/foo"
                       <> header "User-Agent" "My Awesome App"
                       <> basicAuth "user" "pass"
                       <> body [("field", "value")]
This mini-DSL looks very similar to keyword arguments in Python, as well as the equivalent Builder pattern from Java, Rust, and others. What’s disappointing, however, is that it doesn’t easily beat those solutions in terms of compile-time safety. Unless you invest in some tricky type-level hacks, there is nothing to prevent the users from building invalid requests at runtime:
let reqParams = get "http://example.com/foo"
--
-- ... lots of code in between ...
--
response <- sendRequest $
    reqParams <> get "http://example.com/bar"  -- woops!
Compared to a plain function (with however many arguments), we have actually lost some measure of correctness here.
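A minimal sketch of the Monoid-based variant, assuming GeneralizedNewtypeDeriving to obtain Semigroup and Monoid from an underlying list of parameters. The Request newtype, the URL constructor, header, basicAuth, body, and validate are all hypothetical; validate illustrates the kind of runtime check that the types alone cannot replace:

```haskell
{-# LANGUAGE GeneralizedNewtypeDeriving #-}
{-# LANGUAGE OverloadedStrings #-}
import Data.Text (Text)

-- Hypothetical request parameters, including the URL itself.
data Param
  = URL Text
  | Header Text Text
  | BasicAuth Text Text
  | Body [(Text, Text)]
  deriving (Eq, Show)

-- Semigroup/Monoid are derived from the wrapped list: (<>) concatenates.
newtype Request = Request [Param]
  deriving (Eq, Show, Semigroup, Monoid)

get :: Text -> Request
get url = Request [URL url]

header :: Text -> Text -> Request
header name value = Request [Header name value]

basicAuth :: Text -> Text -> Request
basicAuth user pass = Request [BasicAuth user pass]

body :: [(Text, Text)] -> Request
body fields = Request [Body fields]

-- A well-formed request.
good :: Request
good = get "http://example.com/foo"
    <> header "User-Agent" "My Awesome App"
    <> basicAuth "user" "pass"
    <> body [("field", "value")]

-- Type-checks just as well, yet carries two URLs.
bad :: Request
bad = get "http://example.com/foo" <> get "http://example.com/bar"

-- The check has to happen at runtime, e.g. right before sending.
urls :: Request -> [Text]
urls (Request ps) = [u | URL u <- ps]

validate :: Request -> Either String Request
validate req = case urls req of
  [_] -> Right req
  us  -> Left ("expected exactly one URL, got " ++ show (length us))
```

Here good passes validate, while bad compiles without complaint and is only rejected once validate counts its URLs.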
Record types
In many cases, fortunately, there is another way to keep our calls both flexible and safe against runtime errors. We just need to change the representation of the input type (here, Request) into a record.
A record is simply a user-defined type that’s a collection of named fields. Most languages (especially imperative ones: C, C++, Go, Rust, …) call those structures and use the struct keyword to signify a record definition. In functional programming parlance, they are also referred to as product types; this is because the record type corresponds to the Cartesian product of its individual field types[3].
Going back to our example, it shouldn’t be difficult to define a record representing an HTTP Request:
data Request = Request { reqURL      :: URL
                       , reqMethod   :: Method
                       , reqHeaders  :: [(Header, Text)]
                       , reqPostData :: [(Text, Text)]
                       }
In fact, I suspect most programmers would naturally reach for this notation first.
With this definition, calls to sendRequest can be rewritten to take a record instance that we construct on the spot[4]:
response <- sendRequest $
    Request { reqURL      = "http://example.com/bar"
            , reqMethod   = GET
            , reqHeaders  = [("User-Agent", "My Awesome App")]
            , reqPostData = []
            }
Compare this snippet to the Python example from the beginning of this section.
It comes remarkably close, right?
The Request record and its fields can indeed work quite nicely as substitutes for keyword arguments.
But besides the readability boon of having “argument” names at the call site, we’ve also gained stronger correctness checks. For example, there is no longer any way to accidentally supply the URL field twice.
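As a self-contained sketch, the record compiles once URL, Header, and Method are given concrete definitions. Here they are simplified, hypothetical stand-ins (plain Text synonyms and a small enumeration) rather than the richer types a real HTTP library would use:

```haskell
{-# LANGUAGE OverloadedStrings #-}
import Data.Text (Text)

-- Simplified stand-ins for the types the post leaves abstract.
type URL = Text
type Header = Text

data Method = GET | POST
  deriving (Eq, Show)

data Request = Request
  { reqURL      :: URL
  , reqMethod   :: Method
  , reqHeaders  :: [(Header, Text)]
  , reqPostData :: [(Text, Text)]
  } deriving (Eq, Show)

-- Every "argument" is named at the construction site.
example :: Request
example = Request
  { reqURL      = "http://example.com/bar"
  , reqMethod   = GET
  , reqHeaders  = [("User-Agent", "My Awesome App")]
  , reqPostData = []
  }
```

GHC rejects a record construction that mentions the same field twice. Omitting a field, however, is only a warning by default (-Wmissing-fields), so compiling with -Werror=missing-fields is a reasonable extra safeguard.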
Different functions for different things
Astute readers may have noticed at least two things about the previous solutions.
First, they are not mutually exclusive. Quite the opposite, actually: they compose very neatly, allowing us to combine builder functions with the record update syntax in the final API:
response <- sendRequest $
    (get "http://example.com/baz")
        { reqHeaders = [("User-Agent", "My Awesome App")] }
This cuts out basically all the boilerplate of record-based calls, leaving only the parts that actually differ from the defaults[5].
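The combination might be sketched like this, with get as a hypothetical builder that fills in default values and the caller overriding only reqHeaders through record update syntax. URL and Header are simplified, hypothetical Text synonyms, defined here so the block compiles on its own:

```haskell
{-# LANGUAGE OverloadedStrings #-}
import Data.Text (Text)

type URL = Text
type Header = Text

data Method = GET | POST
  deriving (Eq, Show)

data Request = Request
  { reqURL      :: URL
  , reqMethod   :: Method
  , reqHeaders  :: [(Header, Text)]
  , reqPostData :: [(Text, Text)]
  } deriving (Eq, Show)

-- Builder-style constructor that supplies the defaults.
get :: URL -> Request
get url = Request { reqURL = url, reqMethod = GET, reqHeaders = [], reqPostData = [] }

-- Record update overrides only the field that differs. The parentheses
-- matter: record update binds tighter than function application.
example :: Request
example = (get "http://example.com/baz")
  { reqHeaders = [("User-Agent", "My Awesome App")] }
```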
But on the second and more important note: we don’t seem to be talking about currying anymore. Does that mean it loses its usefulness once we go beyond a certain threshold of complexity?
Thankfully, the answer is no. While some APIs may require more advanced techniques to access the full breadth of their functionality, it is always possible to expose some carefully constructed facade that is conducive to partial application.
Consider, for example, the functionality exposed by this set of HTTP wrappers:
head :: URL -> Request
headWith :: [(Header, Text)] -> URL -> Request
get :: URL -> Request
getWith :: [(Header, Text)] -> URL -> Request
postForm :: [(Text, Text)] -> URL -> Request
postFormWith :: [(Header, Text)] -> [(Text, Text)] -> URL -> Request
toURL :: Method -> URL -> Request
Each one is obviously curry-friendly[6]. Combined, they also offer a pretty comprehensive API surface. And should they prove insufficient, you’d still have the builder pattern and/or record updates to fall back on — either for specialized one-off cases, or for writing your own wrappers.
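One way to implement these wrappers is to layer them over a single general constructor. The sketch below assumes simplified stand-in types (URL and Header as Text, a three-constructor Method) and builds every function on top of toURL using record updates; the Prelude’s head is hidden to avoid a name clash. The withAgent and batch definitions are hypothetical and only show the wrappers being partially applied:

```haskell
{-# LANGUAGE OverloadedStrings #-}
import Prelude hiding (head)
import Data.Text (Text)

type URL = Text
type Header = Text

data Method = HEAD | GET | POST
  deriving (Eq, Show)

data Request = Request
  { reqURL      :: URL
  , reqMethod   :: Method
  , reqHeaders  :: [(Header, Text)]
  , reqPostData :: [(Text, Text)]
  } deriving (Eq, Show)

-- The general core: any method, no headers, no body.
toURL :: Method -> URL -> Request
toURL method url =
  Request { reqURL = url, reqMethod = method, reqHeaders = [], reqPostData = [] }

headWith :: [(Header, Text)] -> URL -> Request
headWith hs url = (toURL HEAD url) { reqHeaders = hs }

head :: URL -> Request
head = headWith []

getWith :: [(Header, Text)] -> URL -> Request
getWith hs url = (toURL GET url) { reqHeaders = hs }

get :: URL -> Request
get = getWith []

postFormWith :: [(Header, Text)] -> [(Text, Text)] -> URL -> Request
postFormWith hs form url = (toURL POST url) { reqHeaders = hs, reqPostData = form }

postForm :: [(Text, Text)] -> URL -> Request
postForm = postFormWith []

-- Fix the headers once, then reuse the result for many URLs.
withAgent :: URL -> Request
withAgent = getWith [("User-Agent", "My Awesome App")]

batch :: [Request]
batch = map withAgent ["http://example.com/a", "http://example.com/b"]
```

Because the headers precede the URL in getWith, fixing them yields a URL -> Request function that maps cleanly over a list, which is the argument-order rule from earlier applied to a concrete wrapper.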
Naturally, this technique of layered API design, with simple wrappers hiding a progressively more advanced core, isn’t limited to just functional programming. In some way, it is what good API design looks like in general. But in FP languages, it becomes especially important, because the expressive benefits of partial application are so significant there.
Fortunately, these principles seem to be followed pretty consistently,
at least within the Haskell ecosystem.
You can see it in the design of the http-client package, which is the real-world extension of the HTTP interface outlined here.
More evidently, it can be observed in any of the numerous packages that expose both a basic foo function and a more customizable fooWith variant; popular examples include the async package, the zlib library, and the Text.Regex module.
[1] It’d be more common in Python to write this as "Alice has a cat".split(), but this form would make it less obvious how the arguments are passed.
[2] A great example of this pattern can be found in the optparse-applicative package.
[3] Tuples (like (Int, String)) are also product types. They can be thought of as ad-hoc records where field indices serve as rudimentary “names”. In fact, some languages even use the dotted notation to access fields of both records/structs (x.foo) and tuples (y.0).
[4] For simplicity, I’m gonna assume the URL and Header types can be “magically” constructed from string literals through GHC’s OverloadedStrings extension.
[5] In many languages, we can specify more formally what the “default” means for a compound type like Request, and sometimes even derive it automatically. Examples include the Default typeclass in Haskell (from the data-default package), the Default trait in Rust, and the default/argumentless/trivial constructors in C++ et al.
[6] Haskell programmers may especially notice how the last function is designed specifically for infix application: response <- sendRequest $ POST `toURL` url.