str.startswith() with tuple argument

Posted on Tue 28 June 2016 in Code

Here’s a little known trick that’s applicable to Python’s startswith and endswith methods of str (and unicode).

Suppose you’re checking whether a string starts with some prefix:

if s.startswith('http://'):
    # totally an URL

You eventually add more possible prefixes (or suffixes) to your condition:

if s.startswith('http://') or s.startswith('https://'):
    # ...

Later on you notice the repetition and refactor it into something like this:

SCHEMES = ['http://', 'https://', 'ftp://', 'git://']
if any(s.startswith(p) for p in SCHEMES):
    # ...

or if you’re feeling extra functional:

if any(map(s.startswith, SCHEMES)):
    # ...

Turns out, however, that startswith (and endswith) support this use case natively. Rather than passing just a single string as the argument, you can provide a tuple of strings instead:

SCHEMES = ('http://', 'https://', 'ftp://', 'git://')
if s.startswith(SCHEMES):
    # ...

Either method will then check the original string against every element of the passed tuple. Both will only return True if at least one of the strings is recognized as prefix/suffix. As you can see, that’s exactly what we would previously do with any.

Somewhat surprisingly, however, the feature only works for actual tuples. Trying to pass a seemingly equivalent iterable — a list or set, for example — will be met with interpreter’s refusal:

>>> is_jpeg = filename.endswith(['.jpg', '.jpeg'])
TypeError: endswith first arg must be str, unicode, or tuple, not list

If you dig into it, there doesn’t seem to be a compelling reason for this behavior. The relevant feature request talks about consistency with the built-in isinstance function, but it’s quite difficult to see how those two are related.

In any case, this can be worked around without much difficulty:

PROTOCOLS = ('http', 'https', 'ftp', 'git')
if s.startswith(tuple(p + '://' for p in PROTOCOLS)):
    # ...

though ideally, you’d want to pack the prefixes in a tuple to begin with.