A Haskell retrospective

Posted on Sat 18 August 2018 in Programming • Tagged with Haskell, functional programming, type systems, FacebookLeave a comment

Approximately a year ago, I had the opportunity to work on Sigma — a large, distributed system that protects Facebook users from spam and other kinds of abuse.

One reason it was a pretty unique experience is that Sigma is almost entirely a Haskell codebase. It was the first time I got to work with the language in a professional setting, so I was eager to see how it performs in a real-world, production-grade application.

In this (rather long) post, I’ll draw on this experience and highlight Haskell’s notable features from a practical, engineering standpoint. In other words, I’ll be interested in how much does it help with solving actual problems that arise in the field of software development & maintenance.

Haskell Who?

Before we start, however, it seems necessary to clarify what “Haskell” are we actually talking about.

Granted, this may be a little surprising. From a far-away vantage point, Haskell is typically discussed as a rather uniform language, and it is often treated as synonymous with functional programming in general.

But if you look closer, that turns out to be a bit of a misrepresentation. In reality, Haskell is a complex manifold of different components, some of which can be thought as their own sublanguages. Roughly speaking, Haskell — as it’s used in the industry and in the OSS world today — should be thought of as a cake with at least the following layers:

  • The base Haskell language, as defined by the Haskell ‘98 and 2010 reports. At least in theory, this is the portable version of the language that any conforming compiler is supposed to accept. In practice, given the absolute monopoly of GHC, it is merely a theoretical base that needs to be significantly augmented in order to reach some level of practical usability.

  • A bunch of GHC extensions that are widely considered mandatory for any real-world project. Some, like TupleSections or MultiParamTypeClasses, are mostly there to fix some surprising feature gaps that would be more confusing if you had worked around them instead. Others, like GADTs or DataKinds, open up completely new avenues for type-level abstractions.

  • A repertoire of common third-party libraries with unique DSLs, like conduit, pipes, or lens. Unlike many “regular” packages that merely bring in some domain-specific API, these fundamental libraries shape both the deeper architecture and the surface-level look & feel of any Haskell codebase that uses them.

  • A selection of less common extensions which are nevertheless encountered in Haskell code with some regularity.

  • Template Haskell, the language for compile-time metaprogramming whose main application is probably generics.
    To be clear, neither “template” nor “generics” have anything to do with the usual meanings of those terms in C++ and Java/C#/Go1. Rather, it refers to a kind of AST-based “preprocessing” that allows Haskell code to operate on the generic structure of user-defined types: their constructors, parameters, and record fields2.
    Direct use of TH in application code is extremely rare, but many projects rely on libraries which utilize it behind the scenes. A great example would be Persistent, a database interface library where the ORM uses Template Haskell to construct record types from a DB schema at compile time.

There is a language in my type system

What’s striking about this ensemble of features and ideas is that most of them don’t seem to follow from the ostensible premise of the language: that it is functional, pure / referentially transparent, and non-strict / lazily evaluated. Instead, they are mostly a collection of progressively more sophisticated refinements and applications of Haskell’s type system.

This singular focus on type theory — especially in the recent years3 — is probably why many people in the wider programming world think it is necessary to grok advanced type system concepts if you even want to dabble in functional programming

That is, of course, patently untrue4. Some features of a strong static type system are definitely useful to have in a functional language. You can look at Elm to see how awkward things become when you deprive an FP language of its typeclasses and composition sugar.

But when the focus on type systems becomes too heavy, the concepts keep piling up and the language becomes increasingly impenetrable. Eventually, you may end up with an ecosystem where the recommended way to implement an HTTP API is to call upon half a dozen compiler extensions in order to specify it as one humongous type.

But hey, isn’t it desirable to have this kind of increased type safety?

In principle, the answer would of course be yes. However, the price we pay here is in the precious currency of complexity, and it often turns out to be way too high. When libraries, frameworks, and languages get complicated and abstract, it’s not just safety and/or productivity that can (hopefully) increase — it is also the burden on developers’ thought processes. While the exact threshold of diminishing or even negative returns is hard to pinpoint, it can definitely be reached even by the smartest and most talented teams. Add in the usual obstacles of software engineering — shifting requirements, deadlines, turnover — and you may encounter it much sooner than you think.

For some, this is a sufficient justification to basically give up on type systems altogether. And while I’d say such a knee-jerk reaction is rather excessive and unwarranted, it is at least equally harmful to letting your typing regime grow in boundless complexity. Both approaches are just too extreme to stand the test of practicality.

The legacy of bleeding edge

In other words, Haskell is hard and this does count as one of its serious problems. This conclusion isn’t exactly novel or surprising, even if some people would still argue with it.

Suppose, however, that we have somehow caused this issue to disappear completely. Let’s say that through some kind of divine intervention, it was made so that the learning curve of Haskell is no longer a problem for the majority of programmers. Maybe we found a magic lamp and — for the lack of better ideas — we wished that everyone be as proficient in applicative parsers as they are in inheritance hierarchies.

Even in this hypothetical scenario, I posit that the value proposition of Haskell would still be a tough sell.

There is this old quote from Bjarne Stroustrup (creator of C++) where he says that programming languages divide into those everyone complains about, and those that no one uses.
The first group consists of old, established technologies that managed to accrue significant complexity debt through years and decades of evolution. All the while, they’ve been adapting to the constantly shifting perspectives on what are the best industry practices. Traces of those adaptations can still be found today, sticking out like a leftover appendix or residual tail bone — or like the built-in support for XML in Java.

Languages that “no one uses”, on the other hand, haven’t yet passed the industry threshold of sufficient maturity and stability. Their ecosystems are still cutting edge, and their future is uncertain, but they sometimes champion some really compelling paradigm shifts. As long as you can bear with things that are rough around the edges, you can take advantage of their novel ideas.

Unfortunately for Haskell, it manages to combine the worst parts of both of these worlds.

On one hand, it is a surprisingly old language, clocking more than two decades of fruitful research around many innovative concepts. Yet on the other hand, it bears the signs of a fresh new technology, with relatively few production-grade libraries, scarce coverage of some domains (e.g. GUI programming), and not too many stories of commercial successes.

There are many ways to do it

Nothing shows better the problems of Haskell’s evolution over the years than the various approaches to handling strings and errors that it now has.5

String theory

Historically, String has been defined as a list of Characters, which is normally denoted as the [Char] type. The good thing about this representation is that many string-based algorithms can simply be written using just the list functions.

The bad thing is that Haskell lists are the so-called cons lists. They consist of the single element (called head), followed by another list of the remaining elements (called tail). This makes them roughly equivalent to what the data structures theory calls a singly-linked list — a rarely used construct that has a number of undesirable characteristics:

  • linear time (O(n)) for finding a specific element in the list
  • linear time for finding an element with a specific index in the list
  • linear time for insertion in the middle of the list
  • poor cache coherency due to scattered allocations of list nodes6

On top of that, keeping only a single character inside each node results in a significant waste of memory.

Given those downsides, it isn’t very surprising that virtually no serious Haskell program uses Strings for any meaningful text processing. The community-accepted replacement is the text package, whose implementation stores strings inside packed arrays, i.e. just as you would expect. As a result, Haskell has at least two main types of “strings” — or even three, since Text has both lazy and strict variants.

That’s not all, however: there is also the bytestring package. Although technically it implements generic byte buffers, its API has been pretty rich and enticing. As a result, many other packages would rather use ByteStrings directly in their interfaces than to incur the conversions to and from Text.
And just like in case of Text, separate lazy and strict variants of ByteString are also available. But unlike Text, byte strings also have Word8 and Char8 versions, where the latter is designed to handle legacy cases of ASCII-exclusive text support.

Well, I hope you kept count of all these types! I also hope you can memorize the correct way of converting between them, because it’s commonplace to see them used simultaneously. This may sometimes happen even within the same library, but it definitely occurs in application code that utilizes many different dependencies. What it usually results in are numerous occurrences of something like Text.pack . foo . Text.unpack, with conversion functions copiously sprinkled in to help win in the Type Tetris.

Errors and how to handle them

A somewhat similar issue applies to error handling. Over the years, Haskell has tried many approaches to this problem, often mixing techniques that are very rarely found in a single language, like exceptions combined with result types.

Nowadays, there is some consensus about those mistakes of the past, but the best we got is their deprecation: the current version of GHC still supports them all.

What are all those techniques? Here’s an abridged list:

  • the error function, terminating the program with a message (which is obviously discouraged)
  • the fail method of the Monad typeclass (which is now deprecated and moved to MonadFail)
  • the MonadError class with the associated ErrorT transformer, now deprecated in favor of…
  • a different MonadError class, with ExceptT as the new transformer
  • exceptions in the IO monad, normally raised by the standard I/O calls to signal abnormal conditions and error; however, libraries and application code are free to also throw them and use for their own error handling
  • the Either sum type / monad, which is essentially a type-safe version of the venerable return codes

If you really stretched the definition of error handling, I could also imagine counting Maybe/MaybeT as yet another method. But even without it, that’s half a dozen distinct approaches which you are likely to encounter in the wild in one form or another.

Implicit is better than explicit

The other kind of troublesome legacy of Haskell relates to the various design choices in the language itself. They reflect ideas straight from the time they were conceived in, which doesn’t necessarily agree with the best engineering practices as we understand them now.

Leaky modules

Take the module system, for example.

Today, it is rather uncontroversial that the purpose of splitting code into multiple submodules is to isolate it as much as possible and prevent accidental dependencies. The benefit of such isolation is better internal cohesion for each module. This can simplify testing, improve readability, foster simplicity, and reduce cognitive burden on the maintainers.

Contemporary languages help achieving this goal by making inter-module dependencies explicit. If you want to use a symbol (functions, class) from module A inside another module B, you typically have to both:

  • declare it public in module A
  • explicitly import its name in module B

The first step helps to ensure that the API of module A is limited and tractable. The second step does the same to the external dependencies of module B.

Unfortunately, Haskell requires neither of those steps. In fact, it encourages precisely the opposite of well-defined, self-contained modules, all by the virtue of its default behaviors:

  • the default module declaration (module Foo where ...) implicitly declares every symbol defined in the module Foo as public and importable by others
  • the default import statement (import Foo) brings in every public symbol from the module Foo into the global namespace of the current module

In essence, this is like putting public on each and every class or method that you’d define in a Java project, while simultaneously using nothing but wildcard (star) imports. In a very short order, you will end up with project where everything depends on everything else, and nothing can be developed in isolation.

Namespaces are apparently a bad idea

Thankfully, it is possible to avoid this pitfall by explicitly declaring both your exported and imported symbols:

-- Foo.hs --
module Foo ( foo, bar ) where

foo = ...
bar = ...
baz = ...  -- not exported

-- Bar.hs --
import Foo (foo)
-- `bar` is inaccessible here, but `foo` is available

But while this helps fighting the tangle of dependencies, it still results in cluttering the namespace of any non-trivial module with a significant number of imported symbols.

In many other languages, you can instead import the module as a whole and only refer to its members using qualified names. This is possible in Haskell as well, though it requires yet another variant of the import statement:

import qualified Data.Text as Text

duplicateWords :: Text.Text -> Text.Text
duplicateWords = Text.unwords . map (Text.unwords . replicate 2) . Text.words

What if you want both, though? In the above code, for example, the qualified name Text.Text looks a little silly, especially when it’s such a common type. It would be nice to import it directly, so that we can use it simply as Text.

Unfortunately, this is only possible when using two import statements:

import Data.Text (Text)
import qualified Data.Text as Text

duplicateWords :: Text -> Text
duplicateWords = Text.unwords . map (Text.unwords . replicate 2) . Text.words

You will find this duplication pervasive throughout Haskell codebases. Given how it affects the most important third-party packages (like text and bytestring), there have been a few proposals to improve the situation7, but it seems that none can go through the syntax bikeshedding phase.

Contrast this with Rust, for example, where it’s common to see imports such as this:

use std::io::{self, Read};

fn read_first_half(path: &Path) -> io::Result<String> {
    // (omitted)

where self conveniently stands for the module as a whole.

Wild records

Another aspect of the difficulties with keeping your namespaces in check relates to Haskell record types — its rough equivalent of structs from C and others.

When you define a record type:

data User = User { usrFirstName :: String
                 , usrLastName :: String
                 , usrEmail :: String
                 } deriving (Show)

you are declaring not one but multiple different names, and dumping them all straight into the global namespace. These names include:

  • the record type (here, User)
  • its type constructor (also User, second one above)
  • all of its fields (usrFirstName, usrLastName, usrEmail)

Yep, thats right. Because Haskell has no special syntax for accessing record fields, each field declaration creates an unqualified getter function. Combined with the lack of function overloading, this creates many opportunities for name collisions.

This is why in the above example, Hungarian notation is used to prevent those clashes. Despite its age and almost complete disuse in basically every other language, it is still a widely accepted practice in Haskell8.

Purity beats practicality

We have previously discussed the multiple ways of working with strings and handling errors in Haskell. While somewhat confusing at times, there at least appears to be an agreement in the community as to which one should generally be preferred.

This is not the case for some subtler and more abstract topics.

Haskell is, famously, a purely functional programming language. Evaluating functions, in a mathematical sense, is all a Haskell program is supposed to be doing. But the obvious problem is that such a program wouldn’t be able to do anything actually useful; there needs to be some way for it to effect the environment it runs in, if only to print the results it computed.

How to reconcile the functional purity with real-world applications is probably the most important problem that the Haskell language designers have to contend with. After a couple of decades of research and industrial use it still doesn’t have a satisfactory answer.

Yes, there is the IO monad, but it is a very blunt instrument. It offers a distinction between pure code and “effectful” code, but allows for no granularity or structure for the latter. An IO-returning function can do literally anything, while a pure function can only compute some value based on its arguments. Most code, however, is best placed somewhere between those two extremes.

How to represent different varieties of effects (filesystem, logging, network, etc.)?
How to express them as function constraints that can be verified by the compiler?
How to compose them? How to extend them?

These (and others) are still very much open questions in the Haskell community. The traditional way of dealing with them are monad transformers, but they suffer from many shortcomings9. More recent solutions like effects or free monads are promising, but exhibit performance issues that likely won’t be solvable without full compiler support. And even so, you can convincingly argue against those new approaches, which suggests that we may ultimately need something else entirely.

Of course, this state of affairs doesn’t really prevent anyone from writing useful applications in Haskell. “Regular” monads are still a fine choice. Indeed, even if you end up stuffing most of your code inside plain IO, it will already be a step up compared to most other languages.

Good Enoughâ„¢

Incidentally, something similar could probably be said about the language as a whole.

Yes, it has numerous glaring flaws and some not-so-obvious shortcomings.
Yes, it requires disciplined coding style and attention to readability.
Yes, it will force you to courageously tackle problems that are completely unknown to programmers using other languages.
In the end, however, you will probably find it better than most alternatives.

Basically, Haskell is like pizza: even when it’s bad, it is still pretty good.

But what’s possibly the best thing about it is that you don’t even really need to adopt Haskell in order to benefit from its innovations (and avoid the legacy warts).

There is already a breed of mainstream languages that can aptly be characterized as “Haskell-lite”: heavily influenced by FP paradigms but without subscribing to them completely. The closest example in this category is of course Scala, while the newest one would be Rust.
In many aspects, they offer a great compromise that provides some important functional features while sparing you most of the teething issues that Haskell still has after almost 30 years. Functional purists may not be completely satisfied, but at least they’ll get to keep their typeclasses and monoids.

And what if you don’t want to hear about this FP nonsense at all?… Well, I’m afraid it will get harder and harder to avoid. These days, it’s evidently fine for a language to omit generics but it seems inconceivable to ditch first-class functions. Even the traditional OOP powerhouse like Java cannot do without support for anonymous (“lambda”) functions anymore. And let’s not forget all the numerous examples of monadic constructs that pervade many of the mature APIs, libraries, and languages.

So even if you, understandably, don’t really want to come to Haskell, it’s looking more and more likely that Haskell will soon come to you :)

  1. In case of Go, I’m of course referring to a feature that’s notoriously missing from the language. 

  2. For a close analogue in languages other than Haskell, you can look at the current state of procedural macros in Rust (commonly known as “custom derives”). 

  3. What seems to excite the Haskell community in 2018, for example, are things like linear types and dependent types

  4. The obvious counterexample is Clojure and its cousins in the Lisp family of languages. 

  5. Although the abundance of pretty-printing libraries is high up there, too :) 

  6. This can be mitigated somewhat by using a contiguous chunk of memory through a dedicated arena allocator, or implementing the list as an array. 

  7. See for example this project

  8. Some GHC extensions like DisambiguateRecordFields allow for correct type inference even in case of “overloaded” field names, though. 

  9. To name a few: they don’t compose well (e.g. can only have one instance of a particular monad in the stack); they can cause some extremely tricky bugs; they don’t really cooperate with the standard library which uses IO everywhere (often requiring tricks like this). 

Continue reading

Currying and API design

Posted on Sun 12 November 2017 in Programming • Tagged with functional programming, currying, partial application, Haskell, API, abstractionLeave a comment

In functional programming, currying is one of the concepts that contribute greatly to its expressive power. Its importance could be compared to something as ubiquitous as chaining method calls (foo.bar().baz()) in imperative, object-oriented languages.

Although a simple idea on the surface, it has significant consequences for the way functional APIs are designed. This post is an overview of various techniques that help utilize currying effectively when writing your functions. While the examples are written in Haskell syntax, I believe it should be useful for developers working in other functional languages, too.

The basics

Let’s start with a short recap.

Intuitively, we say that an N-argument function is curried if you can invoke it with a single argument and get back an (N-1)-argument function. Repeat this N times, and it’ll be equivalent to supplying all N arguments at once.

Here’s an example: the Data.Text module in Haskell contains the following function called splitOn:

splitOn :: Text -> Text -> [Text]
splitOn sep text = ...

It’s a fairly standard string splitting function, taking a separator as its first argument, with the second one being a string to perform the splitting on:

splitOn "," "1,2,3"  -- produces ["1", "2", "3"]

Both arguments are of type Text (Haskell strings), while the return type is [Text] — a list of strings. This add up to the signature (type) of splitOn, written above as Text -> Text -> [Text].

Like all functions in Haskell, however, splitOn is curried. We don’t have to provide it with both arguments at once; instead, we can stop at one in order to obtain another function:

splitOnComma :: Text -> [Text]
splitOnComma = splitOn ","

This new function is a partially applied version of splitOn, with its first argument (the separator) already filled in. To complete the call, all you need to do now is provide the text to split:

splitOnComma "1,2,3"  -- also produces ["1", "2", "3"]

and, unsurprisingly, you’ll get the exact same result.

Compare now the type signatures of both splitOn and splitOnComma:

splitOn :: Text -> Text -> [Text]
splitOnComma :: Text -> [Text]

It may be puzzling at first why the same arrow symbol (->) is used for what seems like two distinct meanings: the “argument separator”, and the return type indicator.

But for curried functions, both of those meanings are in fact identical!

Indeed, we can make it more explicit by defining splitOn as:

splitOn :: Text -> (Text -> [Text])

or even:

splitOn :: Text -> TypeOf splitOnComma -- (not a real Haskell syntax)

From this perspective, what splitOn actually returns is not [Text] but a function from Text to [Text] (Text -> [Text]). And conversely, a call with two arguments:

splitOn "," "1,2,3"

is instead two function calls, each taking just one argument:

(splitOn ",") "1,2,3"

This is why the -> arrow isn’t actually ambiguous: it always signifies the mapping of an argument type to a result type. And it’s always just one argument, too, because:

Currying makes all functions take only one argument.

It’s just that sometimes, what those single-argument functions return will be yet another function.

Least used arguments go first

Now that we have a firmer grasp on the idea of currying, we can see how it influences API design.

There is one thing in particular you will notice almost immediately, especially if you are coming from imperative languages that support default argument values and/or function overloading. It’s the particular order of arguments that a well designed, functional API will almost certainly follow.

See the splitOn function again:

splitOn :: Text -> Text -> [Text]
splitOn sep text = ...

It is no accident that it puts the separator as its first argument. This choice — as opposed to the alternative where text goes first — produces much more useful results when the function is applied partially through currying.

Say, for instance, that you want to splice a list of strings where the individual pieces can be comma-separated:

spliceOnComma :: [Text] -> [Text]
spliceOnComma ["1", "2,3", "4,5,6", "7"]
-- ^ This should produce ["1", "2", "3", "4", "5", "6", "7"]

Because the separator appears first in a splitOn call, you can do it easily through a direct use of currying:

spliceOnComma xs = concat $ map (splitOn ",") xs

-- or equivalently, in a terser point-free style:
-- spliceOnComma = concatMap $ splitOn ","

What we do here is apply the split to every string in the list xs (with map), followed by flattening the result — a list of lists, [[Text]] — back to a regular [Text] with concat.

If we had the alternative version of splitOn, one where the order of arguments is reversed:

splitOn' text sep = ...

we’d have no choice but to “fix it”, with either a lambda function or the flip combinator:

spliceOnComma' xs = concat $ map (\x -> splitOn' x ",") xs
spliceOnComma' xs = concat $ map (flip splitOn' ",") xs

Putting the delimiter first is simply more convenient. It is much more likely you’ll be splitting multiple strings on the same separator, as opposed to a single string and multiple separators. The argument order of splitOn is making the common use case slightly easier by moving the more “stable” parameter to the front.

This practice generalizes to all curried functions, forming a simple rule:

The more likely it is for an argument to remain constant between calls, the sooner it should appear in the function signature.

Note how this is different compared to any language where functions may take variable number of arguments. In Python, for example, the equivalent of splitOn is defined as:

str.split(text, sep)

and the implicit default value for sep is essentially “any whitespace character”. In many cases, this is exactly what we want, making the following calls possible1:

>>> str.split("Alice has a cat")
["Alice", "has", "a", "cat"]

So, as a less-used argument, sep actually goes last in str.split, as it is often desirable to omit it altogether. Under the currying regime, however, we put it first, so that we can fix it to a chosen value and obtain a more specialized version of the function.

The fewer arguments, the better

Another thing you’d encounter in languages with flexible function definitions is the proliferation of optional arguments:

response = requests.get("http://example.com/foo",
                        params={'arg': 42},
                        data={'field': 'value'},
                        auth=('user', 'pass'),
                        headers={'User-Agent': "My Amazing App"},
                        cookies={'c_is': 'for_cookie'},
                        files={'attachment.txt': open('file.txt', 'rb')},

Trying to translate this directly to a functional paradigm would result in extremely unreadable function calls — doubly so when you don’t actually need all those arguments and have to provide some canned defaults:

response <- Requests.get
    "http://example.com/foo" [('arg', 42)]
    [] Nothing [] [] [] True Nothing

What does that True mean, for example? Or what exactly does each empty list signify? It’s impossible to know just by looking at the function call alone.

Long argument lists are thus detrimental to the quality of functional APIs. It’s much harder to correctly apply the previous rule (least used arguments first) when there are so many possible permutations.

What should we do then?… In some cases, including the above example of an HTTP library, we cannot simply cut out features in the name of elegance. The necessary information needs to go somewhere, meaning we need to find at least somewhat acceptable place for it.

Fortunately, we have a couple of options that should help us with solving this problem.

Combinators / builders

Looking back at the last example in Python, we can see why the function call remains readable even if it sprouts a dozen or so additional arguments.

The obvious reason is that each one has been uniquely identified by a name.

In order to emulate some form of what’s called keyword arguments, we can split the single function call into multiple stages. Each one would then supply one piece of data, with a matching function name serving as a readability cue:

response <- sendRequest $
            withHeaders [("User-Agent", "My Amazing App")] $
            withBasicAuth "user" "pass" $
            withData [("field", "value")] $
                get "http://example.com/foo"

If we follow this approach, the caller would only invoke those intermediate functions that fit his particular use case. The API above could still offer withCookies, withFiles, or any of the other combinators, but their usage shall be completely optional.

Pretty neat, right?

Thing is, the implementation would be a little involved here. We would clearly need to carry some data between the various withFoo calls, which requires some additional data types in addition to plain functions. At minimum, we need something to represent the Request, as it is created by the get function:

get :: Text -> Request

and then “piped” through withFoo transformers like this one:

withBasicAuth :: Text -> Text -> (Request -> Request)

so that it can we can finally send it:

sendRequest :: Request -> IO Response

Such Request type needs to keep track of all the additional parameters that may have been tacked onto it:

type Request = (Text, [Param])  -- Text is the URL

data Param = Header Text Text
           | BasicAuth Text Text
           | Data [(Text, Text)]
           -- and so on

-- example
withBasicAuth user pass (url, params) =
    (url, params ++ [BasicAuth user pass])

All of a sudden, what would be a single function explodes into a collection of data types and associated combinators.

In Haskell at least, we can forgo some of the boilerplate by automatically deriving an instance of Monoid (or perhaps a Semigroup). Rather than invoking a series of combinators, clients would then build their requests through repeated mappends2:

response <- sendRequest $ get "http://example.com/foo"
                          <> header "User-Agent" "My Awesome App"
                          <> basicAuth "user" "pass"
                          <> body [("field", "value")]

This mini-DSL looks very similar to keyword arguments in Python, as well as the equivalent Builder pattern from Java, Rust, and others. What’s disappointing, however, is that it doesn’t easily beat those solutions in terms of compile-time safety. Unless you invest into some tricky type-level hacks, there is nothing to prevent the users from building invalid requests at runtime:

let reqParams = get "http://example.com/foo"
-- ... lots of code in between ...
response <- sendRequest $
            reqParams <> get "http://example.com/bar" -- woops!

Compared to a plain function (with however many arguments), we have actually lost some measure of correctness here.

Record types

In many cases, fortunately, there is another way to keep our calls both flexible and safe against runtime errors. We just need to change the representation of the input type (here, Request) into a record.

Record is simply a user-defined type that’s a collection of named fields.

Most languages (especially imperative ones: C, C++, Go, Rust, …) call those structures, and use the struct keyword to signify a record definition. In functional programming parlance, they are also referred to as product types; this is because the joint record type is a Cartesian product of its individual field types3.

Going back to our example, it shouldn’t be difficult to define a record representing an HTTP Request:

data Request = Request { reqURL :: URL
                       , reqMethod :: Method
                       , reqHeaders [(Header, Text)]
                       , reqPostData [(Text, Text)]

In fact, I suspect most programmers would naturally reach for this notation first.

Having this definition, calls to sendRequest can be rewritten to take a record instance that we construct on the spot4:

response <- sendRequest $
    Request { reqURL = "http://example.com/bar"
            , reqMethod = GET
            , reqHeaders = [("User-Agent", "My Awesome App")]
            , reqPostData = []

Compare this snippet to the Python example from the beginning of this section. It comes remarkably close, right? The Request record and its fields can indeed work quite nicely as substitutes for keyword arguments.

But besides the readability boon of having “argument” names at the call site. we’ve also gained stronger correctness checks. For example, there is no way anymore to accidentally supply the URL field twice.

Different functions for different things

Astute readers may have noticed at least two things about the previous solutions.

First, they are not mutually incompatible. Quite the opposite, actually: they compose very neatly, allowing us to combine builder functions with the record update syntax in the final API:

response <- sendRequest $
    (get "http://example.com/baz")
    { reqHeaders = [("User-Agent", "My Awesome App")] }

This cuts out basically all the boilerplate of record-based calls, leaving only the parts that actually differ from the defaults5.

But on the second and more important note: we don’t seem to be talking about currying anymore. Does it mean it loses its usefulness once we go beyond certain threshold of complexity?…

Thankfully, the answer is no. While some APIs may require more advanced techniques to access the full breadth of their functionality, it is always possible to expose some carefully constructed facade that is conducive to partial application.

Consider, for example, the functionality exposed by this set of HTTP wrappers:

head :: URL -> Request
headWith :: [(Header, Text)] -> URL -> Request
get :: URL -> Request
getWith :: [(Header, Text)] -> URL -> Request
postForm :: [(Text, Text)] -> URL -> Request
postFormWith :: [(Header, Text)] -> [(Text, Text)] -> URL -> Request
toURL :: Method -> URL -> Request

Each one is obviously curry-friendly6. Combined, they also offer a pretty comprehensive API surface. And should they prove insufficient, you’d still have the builder pattern and/or record updates to fall back on — either for specialized one-off cases, or for writing your own wrappers.

Naturally, this technique of layered API design — with simple wrappers hiding a progressively more advanced core — isn’t limited to just functional programming. In some way, it is what good API design looks like in general. But in FP languages, it becomes especially important, because the expressive benefits of partial application are so paramount there

Fortunately, these principles seem to be followed pretty consistently, at least within the Haskell ecosystem. You can see it in the design of the http-client package, which is the real world extension of the HTTP interface outlined here. More evidently, it can be observed in any of the numerous packages the expose both a basic foo and a more customizable fooWith functions; popular examples include the async package, the zlib library, and the Text.Regex module.

  1. It’d be more common in Python to write this as "Alice has a cat".split(), but this form would make it less obvious how the arguments are passed. 

  2. A great example of this pattern can be found in the optparse-applicative package

  3. Tuples (like (Int, String)) are also product types. They can be thought of as ad-hoc records where field indices serve as rudimentary “names”. In fact, some languages even use the dotted notation to access fields of both records/structs (x.foo) and tuples (y.0). 

  4. For simplicity, I’m gonna assume the URL and Header types can be “magically” constructed from string literals through the GHC’s OverloadedStrings extension. 

  5. In many languages, we can specify more formally what the “default” means for a compound-type like Request, and sometimes even derive it automatically. Examples include the Default typeclass in Haskell, the Default trait in Rust, and the default/argumentless/trivial constructors in C++ et al

  6. Haskell programmers may especially notice how the last function is designed specifically for infix application: response <- sendRequest $ POST `toUrl` url

Continue reading