Pipe `xargs` into `find`
Posted on Sun 03 April 2016 in Code • Tagged with shell scripting, Bash, xargs, find, zsh
Here’s a trick that’s hardly new, but if you haven’t heard about it, it will save you a trip to a man page or two.
Assuming you’re a person who mostly prefers the terminal over some fancy GUI, you’ve probably used the `find` command along with `xargs` at least a few times. It’s very common, for example, to use the results of `find` as arguments to some other program. It could be something as simple as figuring out which modules in your project have grown slightly too large:
$ find . -name '*.py' | xargs wc -l | sort -hr
1467 total
322 callee/base.py
261 callee/general.py
251 callee/collections.py
# etc.
We find them all first, then use `xargs` to build a long `wc` invocation, and finally display the results in reverse order, largest first. Pretty easy stuff: I don’t usually have to try more than a dozen times to get it right![1]
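For file names that may contain spaces, a null-delimited variant of the same pipeline is safer. The following is a minimal sketch, assuming a `find` that supports `-print0` and an `xargs` that supports `-0` (the GNU and BSD versions both do); the temporary directory and the file names are made up for illustration.

```shell
# Hypothetical playground: a scratch directory with one awkwardly named module.
workdir=$(mktemp -d)
cd "$workdir" || exit 1
printf 'a\nb\nc\n' > 'big module.py'
printf 'a\n' > small.py

# -print0 and -0 make both sides use NUL as the separator,
# so the space in 'big module.py' no longer splits the name in two.
counts=$(find . -name '*.py' -print0 | xargs -0 wc -l | sort -nr)
echo "$counts"
```

Dropping `-print0` and `-0` from this sketch should make `wc` complain about two nonexistent files, `./big` and `module.py`.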
But how about the opposite situation? Let’s say you have a list of directories you want to search through with `find`. Doing so may seem easy enough[2]:
$ cat packagedirs.txt | xargs find -name '__init__.py'
Except it’s not going to work. Like a few other Unix commands, `find` is very particular about the order of arguments it receives. Not only are the predicate flags (like `-name`) considered in sequence, but they also have to appear after the directories we want to search through.
But in the `xargs` invocation above, essentially the opposite is going to happen: `xargs` appends the directories at the very end, after `-name '__init__.py'`.
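To see the problem without running a failing search, we can put `echo` in front of the command so that `xargs` prints what it would have executed. This is just a small sketch: the directory names are placeholders, fed in with `printf` rather than read from a real `packagedirs.txt`.

```shell
# Prefix the command with echo to reveal the argument order xargs builds.
# The directory names are placeholders standing in for packagedirs.txt.
built=$(printf '%s\n' callee tests | xargs echo find -name '__init__.py')
echo "$built"
# prints: find -name __init__.py callee tests
```

The directories end up after the predicate, which is exactly the order `find` refuses to accept.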
The replacement flag
So how to remedy this? Enter the `-I` flag to `xargs`:
$ cat packagedirs.txt | xargs -I{} find {} -name '__init__.py'
This flag will tell `xargs` quite a few things. The most important one is to stop putting the arguments at the end of the command invocation. Instead, it shall place them wherever it sees the replacement string, which here is a pair of braces[3]: `{}`. And because we placed the braces where `find` normally expects the list of directories to search through, the command will now get us exactly the results we wanted.
What’s almost impossible to see, however, is that it may not obtain those results in exactly the way we intended. The difference is easier to spot when we replace `find` with `echo`:
$ cat >/tmp/list
foo
bar
$ cat /tmp/list | xargs echo
foo bar
$ cat /tmp/list | xargs -I{} echo {}
foo
bar
Or, better yet, use `xargs` with the `-t` flag to print the commands on stderr before executing them:
$ cat packagedirs.txt | xargs -I{} -t find {} -name '__init__.py' >/dev/null
find callee -name '__init__.py'
find tests -name '__init__.py'
As you can see, we actually have more than one `find` invocation here! This is the second effect of `-I`: it causes `xargs` to execute the given command line once for each input line, rather than once for the whole batch.
It so happens that it doesn’t really make any difference for our usage of `find`, which is why it wasn’t at all obvious we were running it multiple times. To avoid problems, though, you should definitely be cognizant of this fact when calling other programs with `xargs -I`.
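A program where the difference does show is `wc`, since a single invocation over several files ends with a `total` line, while separate invocations never print one. Here is a minimal sketch, with made-up file names in a temporary directory:

```shell
# Hypothetical scratch files, used only for this demonstration.
workdir=$(mktemp -d)
cd "$workdir" || exit 1
printf 'x\ny\n' > a.txt
printf 'z\n' > b.txt

# One wc call for both files: two per-file counts plus a "total" line.
batched=$(printf '%s\n' a.txt b.txt | xargs wc -l)

# One wc call per input line: per-file counts only, no total.
separate=$(printf '%s\n' a.txt b.txt | xargs -I{} wc -l {})

echo "$batched"
echo "---"
echo "$separate"
```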
Make arguments spaced again
Incidentally, I’m not aware of any method that’d actually make `xargs` produce `find foo bar -name ...` calls. If you need this exact form, probably the easiest way is to use plain old shell variables:
$ (d=$(cat packagedirs.txt); find $d -name '*.py')
This takes advantage of the word splitting feature of Bash and a few other compatible shells. The caveat is that you may be using a shell where this behavior is disabled by default. The result would be `find` interpreting the content of `$d` as a single directory name, `foo bar`, rather than `foo` and `bar`.
zsh is one such shell. Although that’s probably a good thing overall, in times like these you’d want to bring the “normal” behavior back. In zsh, it’s fortunately pretty simple:
$ (d=$(cat packagedirs.txt); find ${=d} -name '*.py')
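The effect of word splitting is easy to observe by counting how many arguments a command receives. In the sketch below, which assumes `sh` or Bash, the quoted expansion stands in for what a shell without automatic word splitting would do; the `count_args` helper is hypothetical and exists only for this demonstration.

```shell
# Hypothetical helper: report how many arguments it was given.
count_args() { echo "$#"; }

d="foo bar"

# Unquoted expansion: a POSIX-style shell splits the value on whitespace.
split=$(count_args $d)

# Quoted expansion: the value stays one word, which is what zsh
# does by default even for the unquoted form.
unsplit=$(count_args "$d")

echo "$split $unsplit"
# prints: 2 1   (in sh or Bash)
```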
What about a portable solution? As far as I can tell, the only certain way you can ensure word splitting occurs is to use `eval`. Here, the `xargs` command can actually come in handy again, albeit only as a prop:
$ (d=$(cat packagedirs.txt | xargs echo); eval "find $d -name '*.py'")
One would hope such hacks aren’t needed very often.
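If `eval` feels too heavy-handed, another pattern worth considering is to have `xargs` launch a tiny inline shell and let that shell place the arguments. `xargs` still appends the directories at the end of its own command, but inside `sh -c` they show up as `"$@"`, which we are free to put before `-name`. This is a sketch assuming a POSIX `sh`; the directories and the temporary working directory are invented for the example.

```shell
# Hypothetical layout: two package directories listed in packagedirs.txt.
workdir=$(mktemp -d)
cd "$workdir" || exit 1
mkdir callee tests
touch callee/__init__.py tests/__init__.py
printf '%s\n' callee tests > packagedirs.txt

# The trailing "sh" fills $0 of the inline shell; the directories
# appended by xargs become "$@" and land ahead of -name.
found=$(xargs sh -c 'find "$@" -name "__init__.py"' sh < packagedirs.txt | sort)
echo "$found"
```

With a very long list, `xargs` may still split the work across several invocations, as it does for any command.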
1. A completely kosher version would also use the `-print0` flag to `find` and the `-0` flag to `xargs`. It’s not necessary here because Python module files cannot contain spaces.
2. Purists shall excuse my use of `cat` here; it’s merely for illustrative purposes.
3. This use of braces in the `find` invocation has of course nothing to do with the other possible occurrences of `{}` there, like in the `-exec` flag. Since you cannot force `find` to expect a different placeholder, you should use something else for `xargs` in those cases, e.g.: `xargs -I^ find ^ -name '__main__.py' -exec python {} \;`.