Online Book Reader

Mercurial: The Definitive Guide - Bryan O'Sullivan

combination of this algorithm and compression of the entire stream (instead of a revision at a time) substantially reduces the number of bytes to be transferred, yielding better network performance over most kinds of networks.
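Why compressing the whole stream beats compressing each revision separately can be seen with Python's standard zlib module. This is a toy sketch: the revision contents and the helper names (make_revisions, per_revision_size, whole_stream_size) are hypothetical, and it does not model Mercurial's wire format. Because successive revisions share most of their bytes, a single compressor can match text across revision boundaries, while per-revision compression starts from scratch every time.

```python
import zlib


def make_revisions(n: int) -> list[bytes]:
    """Build n hypothetical file revisions, each differing slightly from the last."""
    base = b"".join(b"line %d of a tracked file\n" % i for i in range(200))
    # Each revision is the shared base plus one line unique to that revision.
    return [base + b"change number %d\n" % r for r in range(n)]


def per_revision_size(revs: list[bytes]) -> int:
    """Total compressed bytes when every revision is compressed on its own."""
    return sum(len(zlib.compress(rev)) for rev in revs)


def whole_stream_size(revs: list[bytes]) -> int:
    """Total compressed bytes when all revisions go through one compressor."""
    comp = zlib.compressobj()
    out = b"".join(comp.compress(rev) for rev in revs)
    out += comp.flush()  # emit whatever the compressor is still buffering
    return len(out)


revs = make_revisions(20)
print("per revision:", per_revision_size(revs))
print("whole stream:", whole_stream_size(revs))
```

The stream version wins here because zlib's 32 KB window still contains the previous revision when the next one arrives, so most of each revision is encoded as a short back-reference.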

If the connection is over ssh, Mercurial doesn’t recompress the stream, because ssh can already do this itself. You can tell Mercurial to always use ssh’s compression feature by editing the .hgrc file in your home directory as follows:

[ui]
ssh = ssh -C

Read/Write Ordering and Atomicity

Appending to files isn’t the whole story when it comes to guaranteeing that a reader won’t see a partial write. If you recall Figure 4-2, revisions in the changelog point to revisions in the manifest, and revisions in the manifest point to revisions in filelogs. This hierarchy is deliberate.

A writer starts a transaction by writing filelog and manifest data, and doesn’t write any changelog data until those are finished. A reader starts by reading changelog data, then manifest data, followed by filelog data.

Since the writer has always finished writing filelog and manifest data before it writes to the changelog, a reader will never read a pointer to a partially written manifest revision from the changelog, and it will never read a pointer to a partially written filelog revision from the manifest.

Concurrent Access

The read/write ordering and atomicity guarantees mean that Mercurial never needs to lock a repository when it’s reading data, even if the repository is being written to while the read is occurring. This has a big effect on scalability; you can have an arbitrary number of Mercurial processes safely reading data from a repository all at once, no matter whether it’s being written to or not.

The lockless nature of reading means that if you’re sharing a repository on a multi-user system, you don’t need to grant other local users permission to write to your repository in order for them to be able to clone it or pull changes from it; they only need read permission. (This is not a common feature among revision control systems, so don’t take it for granted! Most require readers to be able to lock a repository to access it safely, and this requires write permission on at least one directory, which of course makes for all kinds of nasty and annoying security and administrative problems.)

Mercurial uses locks to ensure that only one process can write to a repository at a time (the locking mechanism is safe even over filesystems that are notoriously hostile to locking, such as NFS). If a repository is locked, a writer will wait, retrying in case the repository becomes unlocked, but if the repository remains locked for too long, the writer gives up and times out. This means that your daily automated scripts won’t get stuck forever and pile up if a system crashes unnoticed, for example. (Yes, the timeout is configurable, from zero to infinity.)
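The wait, retry, and time-out behavior can be sketched with an exclusively created lock file. This is a minimal sketch, not Mercurial's implementation: acquire_lock, release_lock, and LockTimeout are hypothetical names, and the sketch relies on O_EXCL, which has historically been unreliable on older NFS versions, so it does not reproduce the NFS safety the text describes. A timeout of 0 means "try once" and float("inf") means "wait forever", mirroring the zero-to-infinity range.

```python
import os
import time


class LockTimeout(Exception):
    """Raised when the lock stays held longer than the allowed timeout."""


def acquire_lock(path: str, timeout: float, poll: float = 0.1) -> int:
    """Create `path` exclusively, retrying every `poll` seconds until `timeout` expires.

    Returns an open file descriptor for the lock file.
    """
    deadline = time.monotonic() + timeout  # inf + finite is still inf
    while True:
        try:
            # O_EXCL makes creation fail if another writer already holds the lock.
            fd = os.open(path, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
            os.write(fd, str(os.getpid()).encode())  # record the holder for diagnostics
            return fd
        except FileExistsError:
            if time.monotonic() >= deadline:
                raise LockTimeout(f"lock {path!r} held for more than {timeout}s")
            time.sleep(poll)


def release_lock(path: str, fd: int) -> None:
    """Close and delete the lock file so the next writer can proceed."""
    os.close(fd)
    os.unlink(path)
```

A scheduled job that calls acquire_lock with a finite timeout fails with LockTimeout instead of waiting forever behind a lock left by a crashed process.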

Safe dirstate access

As with revision data, Mercurial doesn’t take a lock to read the dirstate file; it does acquire a lock to write it. To avoid the possibility of reading a partially written copy of the dirstate file, Mercurial writes to a file with a unique name in the same directory as the dirstate file, then renames the temporary file atomically to dirstate. The file named dirstate is thus guaranteed to be complete, not partially written.
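The write-to-a-temporary-file-then-rename pattern looks like this in Python. This is a sketch under stated assumptions: atomic_write and the ".dirstate-" prefix are hypothetical, the caller is assumed to already hold the repository's write lock, and the fsync call is an extra durability step this sketch adds rather than something the text describes. os.replace is atomic on POSIX systems when source and destination are on the same filesystem, which is why the temporary file is created in the target's own directory.

```python
import os
import tempfile


def atomic_write(path: str, data: bytes) -> None:
    """Replace `path` with `data` so readers see the old file or the new one, never a mix."""
    directory = os.path.dirname(os.path.abspath(path))
    # Same directory means same filesystem, which the atomic rename requires.
    fd, tmp_path = tempfile.mkstemp(dir=directory, prefix=".dirstate-")
    try:
        with os.fdopen(fd, "wb") as f:
            f.write(data)
            f.flush()
            os.fsync(f.fileno())  # push the bytes to disk before the rename
        os.replace(tmp_path, path)  # atomically swap the new file into place
    except BaseException:
        # Remove the half-written temporary file if anything failed before the rename.
        if os.path.exists(tmp_path):
            os.unlink(tmp_path)
        raise
```

A reader that opens the target path at any moment gets either the complete old contents or the complete new contents, because the name only ever points at a fully written file.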

Avoiding Seeks

Critical to Mercurial’s performance is the avoidance of seeks of the disk head, since any seek is far more expensive than even a comparatively large read operation.

This is why, for example, the dirstate is stored in a single file. If there were a dirstate file per directory that Mercurial tracked, the disk would seek once per directory. Instead, Mercurial reads the entire single dirstate file in one step.
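Reading one file in a single call and parsing it in memory might look like the following. The record layout (a one-byte state, then mode, size, mtime, and name length as big-endian 32-bit integers, then the name bytes) is an assumption made for illustration, not a description of Mercurial's on-disk format, and pack_entries and load_dirstate are hypothetical helpers.

```python
import struct

# Hypothetical fixed-size record header: state (1 byte), then four signed 32-bit ints.
HEADER = struct.Struct(">cllll")

Entry = tuple[bytes, int, int, int]  # (state, mode, size, mtime)


def pack_entries(entries: dict[str, Entry]) -> bytes:
    """Serialize every tracked file into one contiguous blob."""
    out = bytearray()
    for name, (state, mode, size, mtime) in entries.items():
        encoded = name.encode("utf-8")
        out += HEADER.pack(state, mode, size, mtime, len(encoded))
        out += encoded
    return bytes(out)


def load_dirstate(path: str) -> dict[str, Entry]:
    """Read the whole file with one read() call, then parse it entirely in memory."""
    with open(path, "rb") as f:
        blob = f.read()  # a single sequential read instead of one file per directory
    entries: dict[str, Entry] = {}
    offset = 0
    while offset < len(blob):
        state, mode, size, mtime, name_len = HEADER.unpack_from(blob, offset)
        offset += HEADER.size
        name = blob[offset:offset + name_len].decode("utf-8")
        offset += name_len
        entries[name] = (state, mode, size, mtime)
    return entries
```

Files from every directory share the one blob, so the number of reads stays the same no matter how many directories the working copy contains.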

Mercurial also uses a “copy on write” scheme when cloning a repository on local storage. Instead of copying every revlog file from the old repository into the new repository, it makes a “hard link,” which is a shorthand way to say “these two names point
