URL library for Python

Posted on Fri 27 November 2015 in Code • Tagged with Python, URL, furlLeave a comment

Python has many batteries included, but a few things are still conspicuously missing.

One of them is a standardized and convenient approach to URL manipulation, akin to the URI class in Java. There are some functions in urllib, of course (or urllib.parse in Python 3), but much like their HTTP-related comrades, they prove rather verbose and somewhat clunky.

HTTP, however, is solved by the Requests package, so you may wonder if there is some analogous package for URL operations. The answer is affirmative, and the library in question is, quite whimsically, called furl.

URL in a wrap

The sole interesting part of the furl interface is the furl class. It represents a single URL, broken down to its constituents, with properties and methods for both reading them out and replacing with new values.

Thanks to this handy (and quite obvious) abstraction, common URL operations become quite simple and self-documenting:

from furl import furl


def to_absolute(url, base):
    """If given ``url`` is a relative path,
    make it relative to the ``base``.
    """
    furled = furl(url)
    if not furled.scheme:
        return furl(base).join(url).url
    return url


def is_same_origin(*urls):
    """Check whether URLs are from the same origin (host:port)."""
    origins = set(url.netloc for url in map(furl, urls))
    return len(origins) <= 1


def get_facebook_username(profile_url):
    """Get Facebook user ID from their profile URL."""
    furled = furl(profile_url)
    if not (furled.host == 'facebook.com' or
            furled.host.endswith('.facebook.com')):
        raise ValueError("not a Facebook URL: %s" % (profile_url,))
    return furled.path.segments[-1]

# etc.

This includes the extremely prevalent, yet very harmful pattern of bulding URLs through string interpolation:

url = '%s?%s' % (BASE_URL, urlencode(query_params))

Besides looking unpythonically ugly, it’s also inflexible and error-prone. If BASE_URL gains some innate query string params ('http://example.com/?a=b'), this method will start producing completely invalid urls (with two question marks, e.g. 'http://example.com/?a=b?foo=bar').

The equivalent in furl has none of these flaws:

url = furl(BASE_URL).add(query_params).url

The full package

To see the full power of furl, I recommend having a look at its API documentation. It’s quite clear and should be very easy to use.

Continue reading

mailto: URLs in JavaScript

Posted on Tue 15 September 2015 in Code • Tagged with JavaScript, email, URLLeave a comment

Though not as popular as back in the days, mailto: URLs are sometimes still the best way — and most certainly the easiest — to enable users to send emails from a web application. Typically, they are used in plain regular links that are created with the <a> tag:

<a href="mailto:john.doe@example.com">Send email</a>

Depending on the operating system and browser settings, clicking on such a link could invoke a different mail client. In the past, this could cause much confusion if the user has never used its local email application (e.g. Outlook), but nowadays browsers are increasingly doing the right thing and presenting a choice of possible options. This includes the various web applications from popular email providers. User can thus choose — say — Gmail, and have all subsequent mailto: links take them to a Compose Mail view there.

Given those developments, it’s possible that soon those URLs won’t be as quaint as they used to be ;) In any case, I think it’s worthwhile to have a few mailto: tricks up our sleeve for when they turn out to be an appropriate solution for the task at hand.

Triggering them in JavaScript

In the example above, I’ve used a mailto: URL directly in some HTML markup. For the most part, however, those URLs behave like any other and support a lot of the same functionality that http:// URLs do:

function sendEmail(address) {
    window.location.href = 'mailto:' + address;
}

Predictably, calling this sendEmail function is equivalent to clicking on a mailto: link. This additional level of indirection, however, may be enough to fool some web crawlers that look for email addresses to spam. Its effectiveness depends largely on how radically we’re going to obfuscate the address argument in the actual invocation.

More importantly, though, a function like that can be part of some more complicated logic. The only limitation is the same one that restricts where and when it’s possible to window.open another browser window. Namely, it has to be called in a direct response to a UI action triggered by the user (a button click would be a good example).

Customization

In their simplest form, mailto: URLs aren’t terribly interesting, since their only parameter is the address of a sole recipient. Fortunately, this isn’t where their capabilities end: it’s possible to customize other email fields as well, such as the subject or even body1.

Somewhat surprisingly, all this can be done in a manner that’s more typical for http:// URL: with query string parameters. Right after the recipient’s address, we can drop a question mark and add some ampersand-separated key-value pairs:

<a href="mailto:john.doe@example.com?subject=Hi&amp;body=Hello%20John!">
  Say hello to John!
</a>

A more programmatic approach is of course also possible. One thing we need to remember, though, is the proper escaping of input values in order to make them safe to put inside an URL2. The encodeURIComponent function can be used for this purpose:

function getMailtoUrl(to, subject, body) {
    var args = [];
    if (typeof subject !== 'undefined') {
        args.push('subject=' + encodeURIComponent(subject));
    }
    if (typeof body !== 'undefined') {
        args.push('body=' + encodeURIComponent(body))
    }

    var url = 'mailto:' + encodeURIComponent(to);
    if (args.length > 0) {
        url += '?' + args.join('&');
    }
    return url;
}

In theory, almost any of the numerous email headers can be specified this way. However, support may vary across browsers, especially since some of the headers (like the Bcc: one) are subject to certain security or privacy considerations. Personally, I wouldn’t recommend relying on anything besides subject= and body=, with a possible addition of cc= for the Cc:header .


  1. By RFC 2368, only text/plain email body can be specified this way. 

  2. Note that this is unrelated to the HTML escaping of & character as &amp; entity in the previous example. 

Continue reading