Managing NFS and NIS, 2nd Edition - Mike Eisler [136]
Let's suppose a process with ID 1867 issues an fcntl exclusive lock call on the entire range of a local file that has mandatory lock permissions set. The lock this fcntl call obtains is an ordinary advisory lock. Now the process attempts to write the file. The operating system can tell that process 1867 holds the lock, so it allows the write to proceed instead of trying to acquire a lock on behalf of process 1867 for the duration of the write.
Now suppose process 1867 does the same sequence on another file with mandatory lock permissions, but this file is on an NFS filesystem. Process 1867 issues an fcntl exclusive lock call on the entire range of the file and then attempts to write it. While the NLM protocol has fields in its lock requests that uniquely identify the process on the client that locked the file, the NFS protocol has no fields to identify the processes that are doing reads or writes. The file is locked and has the mandatory lock permissions set, yet the NFS server has no way of knowing whether the process that sent the write request is the same one that obtained the lock. Thus, the NFS server can neither safely allow the write nor lock the file on behalf of the writing process. For this reason, some NFS servers, including Solaris servers, refuse any read or write to a file with the mandatory lock permissions set.
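The decision these two scenarios force on the system can be sketched as a toy model. The function names are hypothetical, and the model ignores reads, non-blocking I/O, and partial-range locks. It encodes only two things: the System V convention (used by Solaris) that mandatory locking is requested by setting the set-group-ID bit while clearing group execute permission, and the rule that a conservative NFS server refuses I/O on such files because it cannot identify the writing process.

```python
import stat
from typing import Optional


def has_mandatory_lock_mode(mode: int) -> bool:
    # Mandatory locking is requested by turning on set-group-ID and
    # turning off group execute (for example, mode 0o2644).
    return bool(mode & stat.S_ISGID) and not (mode & stat.S_IXGRP)


def local_write_allowed(mode: int, lock_owner: Optional[int], writer_pid: int) -> bool:
    # The local kernel knows the process ID behind every write call.
    if not has_mandatory_lock_mode(mode) or lock_owner is None:
        return True  # nothing to enforce, or no lock held by anyone
    return lock_owner == writer_pid  # another process's lock blocks the write


def nfs_write_allowed(mode: int) -> bool:
    # An NFS WRITE request identifies the file but not the client process,
    # so the server cannot compare the writer with the NLM lock owner.
    # Like a Solaris server, this model refuses I/O on such files outright.
    return not has_mandatory_lock_mode(mode)
```

Note that `nfs_write_allowed` takes no process ID at all. That missing parameter is the whole problem the text describes.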
NFS and Windows lock semantics
The NLM protocol supports byte range locking and share reservations.
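As a rough illustration of byte range locking, two locks on the same file conflict when their ranges overlap and at least one of them is exclusive. The sketch below models that rule. The `ByteRangeLock` class and its fields are hypothetical, but they loosely follow an NLM lock request. Such a request carries an owner identity, an offset, a length, and an exclusive flag. As with fcntl, a length of 0 is treated here as "through end of file and beyond".

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ByteRangeLock:
    owner: str       # hypothetical label standing in for the NLM owner identity
    offset: int
    length: int      # 0 means the lock extends to end of file and beyond
    exclusive: bool  # True for a write (exclusive) lock, False for shared


def _end(lock: ByteRangeLock) -> Optional[int]:
    # Exclusive end offset of the range, or None if it is unbounded.
    return None if lock.length == 0 else lock.offset + lock.length


def ranges_overlap(a: ByteRangeLock, b: ByteRangeLock) -> bool:
    a_end, b_end = _end(a), _end(b)
    a_before_b = a_end is not None and a_end <= b.offset
    b_before_a = b_end is not None and b_end <= a.offset
    return not (a_before_b or b_before_a)


def ranges_conflict(held: ByteRangeLock, req: ByteRangeLock) -> bool:
    if held.owner == req.owner:
        return False  # an owner never conflicts with its own locks
    return (held.exclusive or req.exclusive) and ranges_overlap(held, req)
```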
Windows byte range locking is mandatory, but on Unix NLM servers the same locks are only advisory. To the dismay of Windows software developers, this means that non-PC/NFS clients might step on data that PC/NFS clients have locked, because a non-PC/NFS client does not try to acquire a lock before it reads or writes. It also means that servers that support both NFS/NLM and SMB might not correctly handle the case where an NFS client reads or writes a file on which an SMB client has established a mandatory lock.
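A short experiment shows what "advisory" means in practice. In this sketch, a local-file model with a made-up function name, the parent process takes an exclusive lock with `os.lockf`. A forked child that asks for the lock with `F_TLOCK` is refused. The same child can still overwrite the file simply by not asking. That mirrors a non-PC/NFS client writing a file a PC/NFS client has locked. The sketch assumes a Unix system where mandatory locking is not in effect for the file; the temporary file is created without the set-group-ID bit.

```python
import os
import tempfile
from typing import Tuple


def advisory_lock_demo() -> Tuple[bool, bool, bytes]:
    """Return (lock_refused, write_succeeded, final_contents)."""
    fd, path = tempfile.mkstemp()
    try:
        os.write(fd, b"locked data")
        os.lseek(fd, 0, os.SEEK_SET)
        os.lockf(fd, os.F_LOCK, 0)  # exclusive lock from offset 0 to EOF and beyond
        pid = os.fork()
        if pid == 0:
            status = 0
            try:
                cfd = os.open(path, os.O_RDWR)
                try:
                    os.lockf(cfd, os.F_TLOCK, 0)  # a well-behaved client asks first...
                except OSError:
                    status |= 1                   # ...and is told the file is locked
                os.pwrite(cfd, b"clobbered", 0)   # a client that never asks just writes
                status |= 2
                os.close(cfd)
            finally:
                os._exit(status)
        _, wait_status = os.waitpid(pid, 0)
        code = os.WEXITSTATUS(wait_status)
        with open(path, "rb") as f:
            contents = f.read()
        return bool(code & 1), bool(code & 2), contents
    finally:
        os.close(fd)  # closing the descriptor also releases the parent's lock
        os.unlink(path)
```

Both flags come back true: the lock is real, but only processes that ask for it ever see it.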
PC/NFS clients emulate share reservation semantics by issuing share reservation remote procedure calls to the NLM server. However, most non-PC/NFS clients, and even local processes on Unix NLM servers, will not honor the deny semantics of a PC/NFS client's share reservation. Another problem with the emulation is that Windows semantics expect the share reservation and an exclusive file creation to be atomic. Because the file creation (an NFS request) and the share reservation (an NLM request) go out as separate operations, there is no atomicity. This leaves a window of vulnerability in which a client can succeed in its exclusive create but then fail to get the share reservation.
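A share reservation pairs an access mode (what the opener will do) with a deny mode (what it forbids everyone else to do). The sketch below checks compatibility the way Windows share modes behave. A request conflicts if it wants an access that some holder denies, or if it denies an access that some holder already has. The constant values are modeled on the NLM `fsh_access` (`fsa_NONE`, `fsa_R`, `fsa_W`, `fsa_RW`) and `fsh_mode` (`fsm_DN`, `fsm_DR`, `fsm_DW`, `fsm_DRW`) enumerations. The `ShareTable` class is hypothetical.

```python
from dataclasses import dataclass
from typing import List

# Access bits and deny bits share a layout: bit 1 is read, bit 2 is write.
FSA_NONE, FSA_R, FSA_W, FSA_RW = 0, 1, 2, 3
FSM_DN, FSM_DR, FSM_DW, FSM_DRW = 0, 1, 2, 3


@dataclass(frozen=True)
class Share:
    owner: str   # hypothetical label for the client holding the share
    access: int  # FSA_* value: what this opener wants to do
    deny: int    # FSM_* value: what this opener forbids others to do


def share_conflicts(held: Share, req: Share) -> bool:
    # The newcomer wants an access the holder denies, or wants to deny
    # an access the holder already has.
    return bool(req.access & held.deny) or bool(req.deny & held.access)


class ShareTable:
    """Toy server-side table of share reservations for one file."""

    def __init__(self) -> None:
        self.held: List[Share] = []

    def share(self, req: Share) -> bool:
        if any(share_conflicts(h, req) for h in self.held):
            return False
        self.held.append(req)
        return True

    def unshare(self, owner: str) -> None:
        self.held = [h for h in self.held if h.owner != owner]
```

The table constrains only callers that consult it. A non-PC/NFS client doing ordinary NFS reads and writes never calls `share()`, which is why the deny semantics go unenforced. Likewise, an exclusive create followed by a separate `share()` call leaves room for another client to claim a conflicting reservation in between.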
Troubleshooting locking problems
Lock problems become evident when an NFS client tries to lock a file and cannot, because someone else has it locked. For applications that share access to files, the expectation is that locks will be short-lived. Thus, the pattern your users will notice when something is awry is that yesterday an application started up quite quickly, but today it hangs. Usually the cause is that an NFS/NLM client holds a lock on a file that your application needs to lock, and the holding client has crashed.
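When a user reports such a hang, you may want to check whether a file is locked without risking a hang yourself. A blocking call such as `lockf(fd, F_LOCK, 0)` waits indefinitely, whereas the `F_TEST` command only reports whether another process holds a conflicting lock. The `lock_is_free` helper below is a hypothetical Python sketch built on `os.lockf`. On an NFS mount the test should be passed on to the server's lock manager. It tells you only that the file is locked, not which client holds the lock.

```python
import errno
import os


def lock_is_free(path: str) -> bool:
    """Return True if no other process holds a lock on path from offset 0 onward."""
    # Opened read/write because some systems require write access for lockf().
    fd = os.open(path, os.O_RDWR)
    try:
        # F_TEST with length 0 checks from the current offset (0 here)
        # to end of file and beyond, and never blocks.
        os.lockf(fd, os.F_TEST, 0)
        return True
    except OSError as err:
        if err.errno in (errno.EACCES, errno.EAGAIN):
            return False  # another process holds a conflicting lock
        raise
    finally:
        # Caution: closing any descriptor for the file releases all of
        # this process's fcntl locks on it.
        os.close(fd)
```

Run the probe from a separate process, not from the process that holds the lock. POSIX releases all of a process's fcntl locks on a file when that process closes any descriptor for the file, so calling the helper from the lock holder would silently drop its lock.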
Diagnosing NFS lock hangs
On Solaris, you can use tools like pstack and truss to verify that processes are hanging in a lock request:
client1% ps -eaf | grep SuperApp
mre 23796 10031 0 11:13:22 pts/6 0:00 SuperApp
client1% pstack 23796
23796: SuperApp
ff313134 fcntl (1, 7, ffbef9dc)
ff30de48 fcntl (1, 7, ffbef9dc, 0, 0, 0) + 1c8
ff30e254 lockf (1, 1, 0, 2, ff332584, ff2a0140) + 98
0001086c main (1, ffbefac4, ffbefacc, 20800, 0, 0) + 1c
00010824 _start (0, 0, 0, 0, 0, 0) + dc
client1% truss -p 23796
fcntl(1, F_SETLKW, 0xFFBEF9DC) (sleeping...)
F_SETLKW is the blocking ("set lock and wait") form of the fcntl lock request, so this verifies that the application is stuck waiting for a lock. We can use pfiles to see what is going on with the files of process 23796:
client1% pfiles 23796
23796: SuperApp
Current rlimit: 256 file descriptors
0: S_IFCHR mode:0620 dev:136,0