URL library for Python

Posted on Fri 27 November 2015 in Code • Tagged with Python, URL, furlLeave a comment

Python has many batteries included, but a few things are still conspicuously missing.

One of them is a standardized and convenient approach to URL manipulation, akin to the URI class in Java. There are some functions in urllib, of course (or urllib.parse in Python 3), but much like their HTTP-related comrades, they prove rather verbose and somewhat clunky.

HTTP, however, is solved by the Requests package, so you may wonder if there is some analogous package for URL operations. The answer is affirmative, and the library in question is, quite whimsically, called furl.

URL in a wrap

The sole interesting part of the furl interface is the furl class. It represents a single URL, broken down to its constituents, with properties and methods for both reading them out and replacing with new values.

Thanks to this handy (and quite obvious) abstraction, common URL operations become quite simple and self-documenting:

from furl import furl


def to_absolute(url, base):
    """If given ``url`` is a relative path,
    make it relative to the ``base``.
    """
    furled = furl(url)
    if not furled.scheme:
        return furl(base).join(url).url
    return url


def is_same_origin(*urls):
    """Check whether URLs are from the same origin (host:port)."""
    origins = set(url.netloc for url in map(furl, urls))
    return len(origins) <= 1


def get_facebook_username(profile_url):
    """Get Facebook user ID from their profile URL."""
    furled = furl(profile_url)
    if not (furled.host == 'facebook.com' or
            furled.host.endswith('.facebook.com')):
        raise ValueError("not a Facebook URL: %s" % (profile_url,))
    return furled.path.segments[-1]

# etc.

This includes the extremely prevalent, yet very harmful pattern of bulding URLs through string interpolation:

url = '%s?%s' % (BASE_URL, urlencode(query_params))

Besides looking unpythonically ugly, it’s also inflexible and error-prone. If BASE_URL gains some innate query string params ('http://example.com/?a=b'), this method will start producing completely invalid urls (with two question marks, e.g. 'http://example.com/?a=b?foo=bar').

The equivalent in furl has none of these flaws:

url = furl(BASE_URL).add(query_params).url

The full package

To see the full power of furl, I recommend having a look at its API documentation. It’s quite clear and should be very easy to use.

Continue reading