simplified bash loops (or, xargs -I on steroids)

You may think that wrld is some abbreviated form of “world”. This is not the case. The world is lame. What isn’t lame is iterating on stdin. Probably my favorite thing to do. In the shell, the sanest way to do this is with a while read line; do loop. Forget the world. wrld is the future of iteration.

Raise your hand if you have ever written this loop:

find -name '*foo.bar' -type f|while read line; do
  mv "$line" "$(echo "$line"|sed 's/pat/rep/')"
done

Or the related loop:

for i in *foo.bar; do
  cp ... # I'm too lazy even to finish this example.
done

Note:

If you have ever written a loop that starts with the words for i in $(ls ..., you’re doing it wrong. Do one of the above instead. (The while read line; do version can also fail if there are filenames with newlines, which you might have if you’re iterating on filenames generated by an idiot.)

With wrld, you can write this instead: find -name '*foo.bar' -type f | wrld mv {} '@sed "s/pat/rep/"'. You can do something similar with globs as well: wrld mv {} '@sed "s/pat/rep/"' -f *foo.bar. This is manifestly better for one-liners in the shell.

You could also think of it as xargs -I{} or the -exec flag from find on steroids, because it iterates on stdin, but it also allows inlining arbitrary shell commands.

$ ls|wrld mv {} '@awk "{print $2, $1}"'
mv 'Arnold Palmer' 'Palmer Arnold'
mv 'Jane Doe' 'Doe Jane'
mv 'John Doe' 'Doe John'
mv 'John Wayne' 'Wayne John'
mv 'Lucy Lawless' 'Lawless Lucy'
mv 'Ricki Lake' 'Lake Ricki'

As you can see, inlined commands have the current line piped to their stdin. If you want to use some poorly-designed command that doesn’t read from stdin as the filter, you can also write {} in the filter and it will be replaced with the current line. Use \{} if you need a literal ‘{}’. However, if you can’t do it with sed or awk, there’s always perl -pe, and if you can’t do it with perl -pe, I don’t want to know about it. You can also see that wrld echoes back the commands it constructs. You can shut it up with -q/--no-echo. You can also do a “test run” to see what the generated commands will be without actually running them, using the -t/--test flag.
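To make the placeholder and echo behaviour concrete, here is a minimal Python sketch of how a command template could be rendered for one input line. This is not wrld's actual source: the names `render` and `echo` are hypothetical, and quoting with `shlex` is only an assumption about how the echoed commands get their quotes.

```python
import re
import shlex

# Matches either an escaped placeholder (\{}) or a bare one ({}).
_PLACEHOLDER = re.compile(r"\\\{\}|\{\}")


def render(template, line):
    r"""Return the argument list for one input line.

    A bare {} becomes the current line; \{} becomes a literal {}.
    (Hypothetical helper, not part of wrld.)
    """
    def substitute(match):
        return "{}" if match.group().startswith("\\") else line

    return [_PLACEHOLDER.sub(substitute, arg) for arg in template]


def echo(argv):
    """Format an argument list as a shell-quoted command string."""
    return shlex.join(argv)
```

With this sketch, `echo(render(["mv", "{}", "backup of {}"], "Jane Doe"))` produces a quoted command in the same style as the echoed lines above.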

Because POSIX stupidly allows newlines in file names, the ls example above is actually “dangerous” unless you can guarantee there are no idiot newlines in the file names. For this reason, you may instead specify a list of file names to iterate over (preferably with a glob) with the -f/--file-list flag:

$ wrld mv {} '@awk "{print $2, $1}"' -f *
mv 'Doe Jane' 'Jane Doe'
mv 'Doe John' 'John Doe'
mv 'Lake Ricki' 'Ricki Lake'
mv 'Lawless Lucy' 'Lucy Lawless'
mv 'Palmer Arnold' 'Arnold Palmer'
mv 'Wayne John' 'John Wayne'

If you’re using a proper shell like fish or zsh, you can do recursive globbing and get quite a lot done this way.

One day, in the far distant future, wrld may support splitting stdin on the null byte for compatibility with find -print0. It is a little-known fact that any task which a computer is capable of performing may be performed with the find command, so compatibility is key.
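For the curious, null-byte splitting is not much code. The sketch below is an illustration of the idea only, since wrld does not support it yet; the function name `split_records` and its `null` parameter are made up for this example.

```python
def split_records(data, null=False):
    """Split raw bytes (e.g. from stdin) into one record per file name.

    With null=True the separator is the NUL byte, as written by
    `find -print0`; otherwise it is a newline.
    """
    separator = b"\0" if null else b"\n"
    records = data.split(separator)
    # A trailing separator leaves one empty record at the end; drop it.
    if records and records[-1] == b"":
        records.pop()
    return records


# Two file names, one of which contains a newline.
raw = b"plain.txt\0odd\nname.txt\0"
by_null = split_records(raw, null=True)
by_newline = split_records(raw.replace(b"\0", b"\n"))
```

Splitting on NUL keeps the awkward name in one piece, while splitting the same names on newlines cuts it in two. That is the whole reason find -print0 exists.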

flags

wrld is stupid about flags with the command it wraps. If you want to send a flag through to whatever binary you use in your loop, it needs a backslash in front of it. This means you actually have to use a double backslash \\ in most shells to get it through.

optimize

Note:

I/O bound tasks will not benefit much from these optimizations.

As you may note, wrld is capable of spawning a lot of processes. If it’s some quick thing, who cares? If you’re iterating over a million files, it might be bad. wrld offers some internal goodies to speed things along, but they are written in python, so don’t expect any miracles! (Kind of kidding. A few lines of python is way faster than spawning a new process, but it would be much slower than piping a million lines straight through sed or whatever optimized C utility.)

These builtins are for certain common file operations: they have names like “move”, “copy”, “hlink” and “slink”.

  • move moves files recursively. It’s like mv without any options.

  • copy copies files recursively. It’s like cp -R.

  • hlink creates hard links. Hard links basically give the same chunk of data more than one name on the filesystem. It’s called a “hard” link because of the physiological response many people experience when they realize how powerful this idea can be.

  • slink creates soft links. These are about like shortcuts on the great and glorious Windows operating system. They are called “soft” links because of what happens to you when you realize the original file has moved and all your links are broken. You never have this problem with “hard” links, but you can’t use them across different partitions/devices or on directories, so, eh.

  • srlink expands relative paths to absolute paths when soft linking. Like ln -sr.

  • remove removes stuff. Recursively. Take care.

  • makedir makes directories. It works like mkdir -p.

Other builtins may be added as they occur to me or users ask for them. mv, cp and ln are commands I frequently find myself needing in these kinds of loops.
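As a rough idea of why builtins like these are cheaper than spawning mv or cp for every line, here is a sketch of how they could map onto the Python standard library. It is a guess at the approach and not wrld's real implementation: the `BUILTINS` table and the helper functions are hypothetical, the behaviour only approximates the shell commands, and srlink is left out.

```python
import os
import shutil


def copy(src, dst):
    """Roughly `cp -R`: copy a single file, or a whole directory tree."""
    if os.path.isdir(src):
        shutil.copytree(src, dst)
    else:
        shutil.copy(src, dst)


def remove(path):
    """Remove a file, or a directory tree. Take care."""
    if os.path.isdir(path) and not os.path.islink(path):
        shutil.rmtree(path)
    else:
        os.remove(path)


def makedir(path):
    """Roughly `mkdir -p`: create parents, ignore existing directories."""
    os.makedirs(path, exist_ok=True)


# Name -> callable, so a main loop can dispatch without forking a process.
BUILTINS = {
    "move": shutil.move,
    "copy": copy,
    "hlink": os.link,
    "slink": os.symlink,
    "remove": remove,
    "makedir": makedir,
}
```

A loop would then call something like `BUILTINS["move"](old_name, new_name)` once per line, all inside a single Python process.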

Another way to optimize is by using | as a prefix to your filters, rather than @; i.e. wrld move {} '|awk "{print $2, $1}"' -f *. This opens a single process of awk, filters stdin through that, and then zips the results together with the main loop. This will create problems if the filter produces no output for certain lines of input (like grep would, though I don’t know why you’d use grep in a context like this…), or if you have filenames with newlines, like a freak. So, it will work in most cases. One day I may implement this properly with asynchronous piping, so this won’t be a problem.

Note that, until this becomes an asynchronous pipe, this is a speed enhancement, but piping in this way consumes additional memory, which may make it infeasible for very large tasks in a low memory environment.
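The single-process approach can be sketched in a few lines of Python. This is an illustration under assumptions, not wrld's code: the function name `filter_once` is hypothetical, and the length check is an addition that makes the "filter dropped a line" failure loud instead of silently pairing the wrong names.

```python
import subprocess


def filter_once(argv, lines):
    """Run one filter process over all lines and pair input with output.

    The whole input and the whole output are held in memory at once,
    which is where the extra memory use comes from.
    """
    joined = "".join(line + "\n" for line in lines)
    proc = subprocess.run(argv, input=joined, capture_output=True,
                          text=True, check=True)
    results = proc.stdout.split("\n")
    # Output normally ends with a newline, leaving one empty trailing item.
    if results and results[-1] == "":
        results.pop()
    # If the filter dropped or added lines, pairing by position is wrong.
    if len(results) != len(lines):
        raise ValueError("filter output does not line up with its input")
    return list(zip(lines, results))
```

Called as `filter_once(["awk", "{print $2, $1}"], names)`, this starts awk once for the whole list instead of once per name. A grep-style filter that swallows lines would trip the length check.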

There are also two builtin filters. @py allows you to use arbitrary python expressions as a filter. The current line or filename is available in the execution context as i.

$ wrld move {} '@py i.upper()' -f *
move 'Arnold Palmer' 'ARNOLD PALMER'
move 'Jane Doe' 'JANE DOE'
move 'John Doe' 'JOHN DOE'
move 'John Wayne' 'JOHN WAYNE'
move 'Lucy Lawless' 'LUCY LAWLESS'
move 'Ricki Lake' 'RICKI LAKE'

@py uses a little namespace magic that will import any module you happen to use in your expression on demand. Note that only expressions and not statements are supported. @py combined with -f should also do the right thing with newlines in file names.
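One way to get that kind of import-on-demand behaviour is a dict subclass used as the local namespace for eval. The sketch below shows the general trick and is an assumption about the approach, not a copy of wrld's implementation; `AutoImport` and `py_filter` are hypothetical names.

```python
import importlib


class AutoImport(dict):
    """Namespace that imports a module the first time its name is looked up."""

    def __missing__(self, name):
        try:
            module = importlib.import_module(name)
        except ImportError:
            # Not a module: raise KeyError so the lookup falls through
            # to builtins such as len().
            raise KeyError(name) from None
        self[name] = module
        return module


def py_filter(expression, line):
    """Evaluate an expression with the current line bound to i."""
    namespace = AutoImport(i=line)
    return str(eval(expression, {}, namespace))
```

With this, `py_filter("os.path.basename(i)", "dir/Jane Doe")` works without an explicit import of os. One limit of the sketch: names used inside a lambda are looked up in the globals rather than this namespace, so on-demand import does not reach them.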

The other builtin filter is s. The syntax looks a bit like sed, but it’s python regex, so refer to the relevant docs if you’re not already familiar with it. It’s based on Perl, like the regex in most popular programming languages (and unlike sed), but it has a few of its own quirks.

$ wrld move {} 's/[aeiou]/λ/g' -f *
move 'Arnold Palmer' 'Arnλld Pλlmλr'
move 'Jane Doe' 'Jλnλ Dλλ'
move 'John Doe' 'Jλhn Dλλ'
move 'John Wayne' 'Jλhn Wλynλ'
move 'Lucy Lawless' 'Lλcy Lλwlλss'
move 'Ricki Lake' 'Rλckλ Lλkλ'

It accepts any flags that can be used in a python regex in the context of (?[flags]), so, aiLmsux. In addition, the g flag is supported, to make it more similar to sed and Perl. While / is used as the delimiter by convention, any non-alphanumeric character may be used.

If the replacement is prefixed with \e, a python expression can be used, where m is the match object (re.Match) for each match, so that offers some interesting possibilities.
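Here is a small Python sketch of how an s filter with these rules could be built on top of re.sub. It is an approximation written from the description above, not wrld's parser: `s_filter` is a hypothetical name, escaped delimiters inside the pattern are not handled, and the exact handling of the \e prefix is assumed.

```python
import re


def s_filter(expression, line):
    """Apply an s/pattern/replacement/flags expression to one line."""
    if not expression.startswith("s") or len(expression) < 2:
        raise ValueError("expected something like s/pattern/replacement/flags")
    delimiter = expression[1]
    # Simplification: the delimiter may not appear inside any of the parts.
    pattern, replacement, flags = expression[2:].split(delimiter)
    # re.sub treats count=0 as "replace every match".
    count = 0 if "g" in flags else 1
    inline = flags.replace("g", "")
    if inline:
        # Remaining flags become an inline group such as (?i).
        pattern = f"(?{inline})" + pattern
    if replacement.startswith("\\e"):
        source = replacement[2:]
        # Evaluate the expression once per match, with the match bound to m.
        return re.sub(pattern, lambda m: str(eval(source, {}, {"m": m})),
                      line, count=count)
    return re.sub(pattern, replacement, line, count=count)
```

For example, `s_filter("s/[aeiou]/λ/g", "Ricki Lake")` reproduces the last line of the listing above, and dropping the g flag replaces only the first vowel.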

I can neither confirm nor deny that there may be another filter in my mind for doing awk-like things based on python’s str.filter method.

