Mercurial: The Definitive Guide - Bryan O'Sullivan [53]
Mercurial supports two kinds of pattern syntax. The most frequently used is called glob; this is the same kind of pattern matching used by the Unix shell, and should be familiar to Windows command prompt users, too.
When Mercurial does automatic pattern matching on Windows, it uses glob syntax. You can thus omit the glob: prefix on Windows, but it’s safe to use it, too.
The re syntax is more powerful; it lets you specify patterns using regular expressions, also known as regexps.
By the way, in the examples that follow, notice that I’m careful to wrap all of my patterns in quote characters, so that they won’t get expanded by the shell before Mercurial sees them.
Shell-Style Glob Patterns
This is an overview of the tokens you can use when you’re writing glob patterns.
The * character matches any string within a single directory.
$ hg add 'glob:*.py'
adding main.py
The ** pattern matches any string and crosses directory boundaries. It’s not a standard Unix glob token, but it’s accepted by several popular Unix shells, and is very useful.
$ cd ..
$ hg status 'glob:**.py'
A examples/simple.py
A src/main.py
? examples/performant.py
? setup.py
? src/watcher/watcher.py
The ? pattern matches any single character.
$ hg status 'glob:**.?'
? src/watcher/_watcher.c
The [ character begins a character class. This matches any single character within the class. The class ends with a ] character. A class may contain multiple ranges of the form a-f, which is shorthand for abcdef.
$ hg status 'glob:**[nr-t]'
? MANIFEST.in
? src/xyzzy.txt
If the first character after the [ in a character class is a !, it negates the class, making it match any single character not in the class.
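To experiment with character classes outside a repository, here is a minimal sketch using Python’s standard fnmatch module. It is only an approximation of Mercurial’s matching: fnmatch’s * matches across directory separators, so it plays the role of ** here, and the list of names is made up from the examples above.

```python
from fnmatch import fnmatchcase

# Sample names borrowed from the examples in this section.
names = ["MANIFEST.in", "src/xyzzy.txt", "setup.py", "src/watcher/_watcher.c"]

# [nr-t] matches one character that is n, r, s, or t.
# Note: fnmatch's * also matches "/", unlike Mercurial's single *.
ends_in_class = [n for n in names if fnmatchcase(n, "*[nr-t]")]

# A leading ! negates the class: one character that is NOT n, r, s, or t.
ends_outside_class = [n for n in names if fnmatchcase(n, "*[!nr-t]")]

print(ends_in_class)
print(ends_outside_class)
```

Every name lands in exactly one of the two lists, since its final character is either inside the class or outside it.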
A { begins a group of subpatterns, where the whole group matches if any subpattern in the group matches. The , character separates subpatterns, and } ends the group.
$ hg status 'glob:*.{in,py}'
? MANIFEST.in
? setup.py
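One way to think about a brace group is as shorthand for several separate patterns. The sketch below expands a group into its alternatives and then matches each one. The helper expand_braces is hypothetical, handles only un-nested groups, and is not how Mercurial itself processes the pattern; the names being matched are all in one directory, so fnmatch’s looser * does not matter here.

```python
import re
from fnmatch import fnmatchcase

def expand_braces(pattern):
    """Expand {a,b,...} groups into one pattern per alternative.

    Hypothetical helper for illustration; nested groups are not supported.
    """
    m = re.search(r"\{([^{}]*)\}", pattern)
    if m is None:
        return [pattern]          # no group left to expand
    head, tail = pattern[:m.start()], pattern[m.end():]
    result = []
    for alt in m.group(1).split(","):
        # Recurse so that any later group in the tail is expanded too.
        result.extend(expand_braces(head + alt + tail))
    return result

alternatives = expand_braces("*.{in,py}")

# The whole group matches if any one of the alternatives matches.
names = ["MANIFEST.in", "setup.py", "README"]
matched = [n for n in names if any(fnmatchcase(n, p) for p in alternatives)]

print(alternatives)
print(matched)
```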
Watch out!
Don’t forget that if you want to match a pattern in any directory, you should not use the * match-any token, as this only matches within one directory. Instead, use the ** token. This small example illustrates the difference between the two.
$ hg status 'glob:*.py'
? setup.py
$ hg status 'glob:**.py'
A examples/simple.py
A src/main.py
? examples/performant.py
? setup.py
? src/watcher/watcher.py
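The difference is easier to see if you translate each token into a regular expression by hand. The following is a minimal sketch under stated assumptions: glob_to_regex and glob_match are hypothetical helpers that cover only the *, **, and ? tokens, and they are a simplification rather than Mercurial’s own translation.

```python
import re

def glob_to_regex(pattern):
    """Translate a tiny glob subset (*, **, ?) into a regex string.

    Hypothetical helper for illustration; not Mercurial's own code.
    """
    out = []
    i = 0
    while i < len(pattern):
        if pattern.startswith("**", i):
            out.append(".*")        # ** may cross directory boundaries
            i += 2
        elif pattern[i] == "*":
            out.append("[^/]*")     # * stops at a directory separator
            i += 1
        elif pattern[i] == "?":
            out.append(".")         # ? is any single character
            i += 1
        else:
            out.append(re.escape(pattern[i]))  # literal character
            i += 1
    return "".join(out)

def glob_match(pattern, path):
    # The translated pattern must account for the whole path.
    return re.fullmatch(glob_to_regex(pattern), path) is not None

FILES = [
    "examples/simple.py",
    "src/main.py",
    "examples/performant.py",
    "setup.py",
    "src/watcher/watcher.py",
]

single_star = [f for f in FILES if glob_match("*.py", f)]
double_star = [f for f in FILES if glob_match("**.py", f)]

print(single_star)
print(double_star)
```

With the single * only the file at the top level is selected, while ** selects every .py file in the list, which mirrors the two hg status runs above.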
Regular Expression Matching with Re Patterns
Mercurial accepts the same regular expression syntax as the Python programming language (it uses Python’s regexp engine internally). This is based on the Perl language’s regexp syntax, which is the most popular dialect in use (it’s also used in Java, for example).
I won’t discuss Mercurial’s regexp dialect in any detail here, as regexps are not often used. Perl-style regexps are in any case already exhaustively documented on a multitude of websites, and in many books. Instead, I will focus here on a few things you should know if you find yourself needing to use regexps with Mercurial.
A regexp is matched against a file’s entire path, relative to the root of the repository. In other words, even if you’re already in subdirectory foo, if you want to match files under this directory, your pattern must start with foo/.
One thing to note, if you’re familiar with Perl-style regexps, is that Mercurial’s are rooted. That is, a regexp starts matching against the beginning of a string; it doesn’t look for a match anywhere within the string. To match anywhere in a string, start your pattern with .*.
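Since Mercurial uses Python’s regexp engine, you can try out rooted matching in Python directly. In this sketch I am assuming that re.match, which only matches at the start of a string, is a fair stand-in for Mercurial’s rooted behavior, and that re.search shows the unrooted behavior you may be used to from Perl. The path is made up for the example.

```python
import re

# A path relative to the repository root.
path = "src/watcher/watcher.py"

# Rooted: matching starts at the beginning of the path, so this fails,
# because the path begins with "src", not "watcher".
rooted_miss = re.match(r"watcher", path)

# Starting the pattern with .* lets it match anywhere in the path.
rooted_hit = re.match(r".*watcher", path)

# Unrooted search, for comparison: finds "watcher" anywhere.
unrooted = re.search(r"watcher", path)

# To match files under a subdirectory, spell out the leading directory.
under_src = re.match(r"src/.*\.py", path)

print(rooted_miss is None, rooted_hit is not None,
      unrooted is not None, under_src is not None)
```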
Filtering Files
Not only does Mercurial give you a variety of ways to specify files, it lets you further winnow those files using filters. Commands that work with filenames accept two filtering options:
-I, or --include, lets you specify a pattern that filenames must match in order to be processed.
-X, or --exclude, gives you a way to avoid processing files if they match this pattern.
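As a rough mental model of these two options, here is a minimal sketch with one include pattern and one exclude pattern. The function is_processed is hypothetical, it uses rooted regexps for both patterns, and it is my own illustration of the descriptions above rather than Mercurial’s implementation.

```python
import re

def is_processed(path, include, exclude):
    """Return True if path matches include and does not match exclude.

    Hypothetical helper; both patterns are treated as rooted regexps.
    """
    included = re.match(include, path) is not None
    excluded = re.match(exclude, path) is not None
    return included and not excluded

FILES = [
    "examples/simple.py",
    "src/main.py",
    "examples/performant.py",
    "setup.py",
    "src/watcher/watcher.py",
]

# Include every .py file, but leave out anything under examples/.
kept = [f for f in FILES if is_processed(f, r".*\.py", r"examples/")]

print(kept)
```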
You can provide multiple -I and -X options on the command line, and intermix them as you please. Mercurial interprets