Pipe `xargs` into `find`

Posted on Sun 03 April 2016 in Code

Here’s a trick that’s hardly new, but if you haven’t heard about, it will save you a trip to a man page or two.

Assuming you’re a person who mostly prefers the terminal over some fancy GUI, you’ve probably used the find command along with xargs at least a few times. It’s very common, for example, to use the results of find as arguments to some other program. It could something as simple as figuring out which modules in your project have grown slightly too large:

$ find . -name '*.py' | xargs wc -l | sort -hr
1467 total
 322 callee/base.py
 261 callee/general.py
 251 callee/collections.py
# etc.

We find them all first, and then use xargs to build a long wc invocation, and we finally display results in the reverse order. Pretty easy stuff: I don’t usually have to try more than a dozen times to get it right!1

But how about the opposite situation? Let’s say you have a list of directories you want to search through with find. Doing so may seem easy enough2:

$ cat packagedirs.txt | xargs find -name '__init__.py'

Except it’s not going to work. Like a few other Unix commands, find is very particular about the order of arguments it receives. Not only are the predicate flags (like -name) considered in sequence, but they also have to appear after the directories we want to search through.

But in the xargs invocation above, essentially the opposite is going to happen.

The replacement flag

So how to remedy this? Enter the -I flag to xargs:

$ cat packagedirs.txt | xargs -I{} find {} -name '__init__.py'

This flag will tell xargs quite a few things.

The most important one is to stop putting the arguments at the end of the command invocation. Instead, it shall place them wherever it sees the replacement string — here, pair of braces3: {}. And because we placed the braces where find is normally expecting the list of directories to search through, the command will now get us exactly the results we wanted.

What’s almost impossible to see, however, is that it may not use the exact way we intended to obtain those results. The difference is easier to spot when we replace find with echo:

$ cat >/tmp/list
foo
bar
$ cat /tmp/list | xargs echo
foo bar
$ cat /tmp/list | xargs -I{} echo {}
foo
bar

or, better yet, use xargs with the -t flag to print the commands on stderr before executing them:

$ cat packagedirs.txt | xargs -I{} -t find {} -name '__init__.py' >/dev/null
find callee -name '__init__.py'
find tests -name '__init__.py'

As you can see, we actually have more than one find invocation here!

This is the second effect of -I: it causes xargs to execute given command line for each argument separately. It so happens that it doesn’t really make any difference for our usage of find, which is why it wasn’t at all obvious we were running it multiple times.

To avoid problems, though, you should definitely be cognizant of this fact when calling other programs with xargs -I.

Make arguments spaced again

Incidentally, I’m not aware of any method that’d actually make xargs produce find foo bar -name ... calls. If you need this exact form, probably the easiest way is to use plain old shell variables:

$ (d=$(cat packagedirs.txt); find $d -name '*.py')

This takes advantage of the word splitting feature of Bash and a few other compatible shells. Caveat is, you may be using a shell where this behavior is disabled by default. The result would be making find interpret the content of $d as a single directory name: foo bar rather than foo and bar.

zsh is one such shell. Although probably a good thing overall, in times like these you’d want to bring the “normal” behavior back. In zsh, it’s fortunately pretty simple:

$ (d=$(cat packagedirs.txt); find ${=d} -name '*.py')

What about a portable solution? As far as I can tell, the only certain way you can ensure word splitting occurs is to use eval. Here, the xargs command can actually come in handy again, albeit only as a prop:

$ (d=$(cat packagedirs.txt | xargs echo); eval "find $d -name '*.py'")

One would hope such hacks aren’t needed very often.


  1. A completely kosher version would also use the -print0 flag to find and the -0 flag to xargs. It’s not necessary here because Python module files cannot contain spaces. 

  2. Purists shall excuse my use of cat here, it’s merely for illustrative purposes. 

  3. This use of braces in find has of course nothing to do with the other possible occurrences of {} there, like in the -exec flag. Since you cannot force find to expect a different placeholder, you should use something else for xargs in those cases, .e.g: xargs -I^ find ^ -name '__main__.py' -exec 'python {}' \;