Python has many batteries included, but a few things are still conspicuously missing.
One of them is a standardized and convenient approach to URL manipulation, akin to the
URI class in Java.
There are some functions in urllib, of course (urllib.parse in Python 3), but much like their HTTP-related comrades, they prove rather verbose and somewhat clunky.
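To give a feel for that verbosity, here is a minimal sketch of adding query parameters to a URL using only the Python 3 standard library. The helper name add_query_params and the example URL are made up for illustration; this is one possible way to do it, not a canonical recipe.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def add_query_params(url, params):
    """Return ``url`` with ``params`` appended to its query string."""
    parts = urlsplit(url)
    # Keep whatever query string the URL already had.
    query = parse_qsl(parts.query, keep_blank_values=True)
    query.extend(params.items())
    # SplitResult is a namedtuple, so _replace() swaps in the new query.
    return urlunsplit(parts._replace(query=urlencode(query)))


print(add_query_params('http://example.com/?a=b', {'c': 'd'}))
# http://example.com/?a=b&c=d
```

Four imports and a split/modify/reassemble dance for what is conceptually a single operation.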
HTTP, however, is solved by the Requests package, so you may wonder if there is some analogous package for URL operations. The answer is affirmative, and the library in question is, quite whimsically, called furl.
URL in a wrap
The sole interesting part of the furl interface is the furl class. It represents a single URL, broken down into its constituents, with properties and methods for both reading them and replacing them with new values.
Thanks to this handy (and quite obvious) abstraction, common URL operations become quite simple and self-documenting:
from furl import furl


def to_absolute(url, base):
    """If given ``url`` is a relative path,
    make it relative to the ``base``.
    """
    furled = furl(url)
    if not furled.scheme:
        return furl(base).join(url).url
    return url


def is_same_origin(*urls):
    """Check whether URLs are from the same origin (host:port)."""
    origins = set(url.netloc for url in map(furl, urls))
    return len(origins) <= 1


def get_facebook_username(profile_url):
    """Get Facebook user ID from their profile URL."""
    furled = furl(profile_url)
    host = furled.host or ''  # the host may be missing, e.g. for a relative URL
    if not (host == 'facebook.com' or host.endswith('.facebook.com')):
        raise ValueError("not a Facebook URL: %s" % (profile_url,))
    return furled.path.segments[-1]

# etc.
This includes the extremely prevalent, yet very harmful pattern of building URLs through string interpolation:
url = '%s?%s' % (BASE_URL, urlencode(query_params))
Besides looking unpythonically ugly, it’s also inflexible and error-prone. If BASE_URL gains some innate query string (e.g. 'http://example.com/?a=b'), this method will start producing completely invalid URLs (with two question marks).
The equivalent in furl has none of these flaws:
url = furl(BASE_URL).add(query_params).url
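To make the failure mode of the interpolation pattern concrete, the following sketch runs it against a base URL that already carries a query string and then parses the result back using only the standard library. The BASE_URL and query_params values are hypothetical.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit

# Hypothetical base URL that already has a query string of its own.
BASE_URL = 'http://example.com/?a=b'
query_params = {'c': 'd'}

# The string interpolation pattern blindly adds a second question mark.
url = '%s?%s' % (BASE_URL, urlencode(query_params))
print(url)  # http://example.com/?a=b?c=d

# Parsed back, the new parameter has been swallowed into the value of "a".
parsed = parse_qsl(urlsplit(url).query)
print(parsed)  # [('a', 'b?c=d')]
```

The damage is silent: nothing raises an error, and the receiving end simply never sees a parameter named c.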
The full package
To see the full power of furl, I recommend having a look at its API documentation. It’s quite clear and should be very easy to use.