URL library for Python
Posted on Fri 27 November 2015 in Code
Python has many batteries included, but a few things are still conspicuously missing. One of them is a standardized and convenient approach to URL manipulation, akin to the URI class in Java. There are some functions in urllib, of course (or urllib.parse in Python 3), but much like their HTTP-related comrades, they prove rather verbose and somewhat clunky.
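To give a rough idea of that verbosity, here is a minimal sketch of adding query parameters to a URL using only the standard library. It assumes Python 3, and add_query_params is a hypothetical helper name of my own, not something urllib provides:

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def add_query_params(url, params):
    """Return ``url`` with ``params`` appended to its query string.

    Hypothetical helper, written only to illustrate the stdlib approach.
    """
    parts = urlsplit(url)
    # Keep the existing parameters (including blank ones), then append new ones.
    query = parse_qsl(parts.query, keep_blank_values=True)
    query.extend(params.items())
    # SplitResult is a namedtuple, so _replace() swaps a single component.
    return urlunsplit(parts._replace(query=urlencode(query)))


print(add_query_params('http://example.com/?a=b', {'foo': 'bar'}))
# http://example.com/?a=b&foo=bar
```

Four imports and a split/modify/unsplit dance for what is conceptually a one-liner.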
HTTP, however, is solved by the Requests package, so you may wonder if there is some analogous package for URL operations. The answer is affirmative, and the library in question is, quite whimsically, called furl.
URL in a wrap
The sole interesting part of the furl interface is the furl class. It represents a single URL, broken down into its constituents, with properties and methods for both reading them out and replacing them with new values.
Thanks to this handy (and quite obvious) abstraction, common URL operations become quite simple and self-documenting:
from furl import furl


def to_absolute(url, base):
    """If given ``url`` is a relative path,
    make it relative to the ``base``.
    """
    furled = furl(url)
    if not furled.scheme:
        return furl(base).join(url).url
    return url


def is_same_origin(*urls):
    """Check whether URLs are from the same origin (host:port)."""
    origins = set(url.netloc for url in map(furl, urls))
    return len(origins) <= 1


def get_facebook_username(profile_url):
    """Get Facebook user ID from their profile URL."""
    furled = furl(profile_url)
    if not (furled.host == 'facebook.com' or
            furled.host.endswith('.facebook.com')):
        raise ValueError("not a Facebook URL: %s" % (profile_url,))
    return furled.path.segments[-1]

# etc.
The same goes for URL construction, where furl can replace the extremely prevalent, yet very harmful, pattern of building URLs through string interpolation:
url = '%s?%s' % (BASE_URL, urlencode(query_params))
Besides looking unpythonically ugly, it’s also inflexible and error-prone. If BASE_URL gains some innate query string params ('http://example.com/?a=b'), this method will start producing broken URLs with two question marks, e.g. 'http://example.com/?a=b?foo=bar'.
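A quick way to see the damage is to parse the result back. The snippet below is only a sketch, assuming Python 3; the values of BASE_URL and query_params are made up for the demonstration:

```python
from urllib.parse import parse_qs, urlencode, urlsplit

BASE_URL = 'http://example.com/?a=b'  # hypothetical base that already has a query
query_params = {'foo': 'bar'}         # hypothetical parameters to add

url = '%s?%s' % (BASE_URL, urlencode(query_params))
print(url)  # http://example.com/?a=b?foo=bar

# The second "?" is swallowed into the value of "a", and "foo" never
# shows up as a parameter of its own.
print(parse_qs(urlsplit(url).query))  # {'a': ['b?foo=bar']}
```

So the server would not see foo at all, and it would get a mangled value for a.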
The equivalent in furl has none of these flaws:
url = furl(BASE_URL).add(query_params).url
The full package
To see the full power of furl, I recommend having a look at its API documentation. The documentation is quite clear, and the library itself should be very easy to use.