Classic Shell Scripting - Arnold Robbins [231]
The designers of the original Unix filesystem chose to permit all but two characters from a 256-element set in filenames. The forbidden ones are the control character NUL (the character with all bits set to zero), which is used to mark end-of-string in several programming languages, including the ones used to write most of Unix, and forward slash (/), which is reserved for an important purpose that we describe shortly.
This choice is quite permissive, but you are strongly advised to impose further restrictions, for at least these good reasons:
Since filenames are used by people, the names should require only visible characters: invisible control characters are not candidates.
Filenames get used by both humans and computers: a human might well recognize a string of characters as a filename from its surrounding context, but a computer program needs more precise rules.
Shell metacharacters (i.e., most punctuation characters) in filenames require special handling, and are therefore best avoided altogether.
Initial hyphens make filenames look like Unix command options.
Some non-Unix filesystems permit both uppercase and lowercase characters to be used in filenames, but ignore lettercase differences when comparing names. Unix native filesystems do not: readme, Readme, and README are distinct filenames.[7]
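The point about initial hyphens can be seen in a small experiment. The following is only a sketch: it assumes a mktemp command that supports -d (common, but not required by POSIX), and the filename -n is a hypothetical example. It works in a scratch directory, creates a file whose name begins with a hyphen, and removes it again by using -- to end option processing.

```shell
# Scratch directory, so that no existing files are touched
# (assumes mktemp -d is available).
tmpdir=$(mktemp -d)

# Create an empty file named "-n"; the directory prefix keeps the
# name from being read as an option here.
: > "$tmpdir/-n"

# "rm -n" would treat -n as an option. "rm -- -n" ends option
# processing first; "rm ./-n" would work equally well.
leftover=$(cd "$tmpdir" && rm -- -n && ls -A | wc -l | tr -d ' ')

rmdir "$tmpdir"
echo "entries left in scratch directory: $leftover"
```

Both workarounds have to be remembered on every command that touches the file, which is why avoiding such names in the first place is the easier course.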
Unix filenames are conventionally written entirely in lowercase, since that is both easier to read and easier to type. Certain common important filenames, such as AUTHORS, BUGS, ChangeLog, COPYRIGHT, INSTALL, LICENSE, Makefile, NEWS, README, and TODO, are conventionally spelled in uppercase, or occasionally, in mixed case. Because uppercase precedes lowercase in the ASCII character set, these files occur at the beginning of a directory listing, making them even more visible. However, in modern Unix systems, the sort order depends on the locale; set the environment variable LC_ALL to C to get the traditional ASCII sort order.
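As a brief illustration of the effect of LC_ALL, the sketch below sorts a hypothetical list of four names with the sort command rather than ls, so that it does not depend on any real directory. With LC_ALL set to C for that one command, the names that begin with an uppercase letter come first.

```shell
# Hypothetical directory contents, one name per line.
names='readme
Makefile
src
README'

# Force the traditional byte-order collation for this one command only.
c_sorted=$(printf '%s\n' "$names" | LC_ALL=C sort)
printf '%s\n' "$c_sorted"
```

Without the LC_ALL=C prefix, the order depends on the locale in effect, so it may differ from one system or user to another.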
For portability to other operating systems, it is a good idea to limit characters in filenames to Latin letters, digits, hyphen, underscore, and at most, a single dot.
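One way to apply this advice in a script is a small checking function. The sketch below is an assumption of ours, not a standard utility: the name is_portable_name is hypothetical, and in addition to the rule just stated it also rejects an initial hyphen and the special name consisting of a single dot.

```shell
# is_portable_name NAME
# Succeed only if NAME is nonempty, is not ".", does not begin with a
# hyphen, contains only ASCII letters, digits, dot, underscore, and
# hyphen, and has at most one dot.
# The body runs in a subshell, so the LC_ALL setting (which keeps the
# A-Z and a-z ranges to plain ASCII) does not leak into the caller.
is_portable_name() (
    LC_ALL=C
    case $1 in
        '' | . | -* | *[!A-Za-z0-9._-]* | *.*.*) exit 1 ;;
    esac
    exit 0
)

for name in report-2024_v1.txt 'my file.txt' archive.tar.gz; do
    if is_portable_name "$name"; then
        echo "ok:       $name"
    else
        echo "rejected: $name"
    fi
done
```

The rule is deliberately strict: a perfectly usable Unix name such as archive.tar.gz is rejected because it contains two dots.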
How long can a filename be? That depends on the filesystem, and on lots of software that contains fixed-size buffers that are expected to be big enough to hold filenames. Early Unix systems imposed a 14-character limit. However, Unix systems designed since the mid-1980s have generally permitted up to 255 characters. POSIX defines the constant NAME_MAX to be the maximum filename length, excluding the terminating NUL character, and requires a minimum value of 14. The X/Open Portability Guide requires a minimum of 255. You can use the getconf[8] command to find out the limit on your system. Here is what most Unix systems report:
$ getconf NAME_MAX .                    What is longest filename in current filesystem?
255
The full specification of file locations has another, and larger, limit discussed in Section B.4.1 later in this Appendix.
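A script can make the same query before it creates a file. The following sketch compares the length of a candidate name with the limit reported for the current directory; the candidate name is hypothetical, and the comparison counts characters, which matches the byte count only for ASCII names.

```shell
# Ask the filesystem that holds the current directory for its limit.
limit=$(getconf NAME_MAX .)

name=some-proposed-filename.txt     # hypothetical candidate name

# ${#name} is the length of the name in characters.
if [ "${#name}" -le "$limit" ]; then
    echo "$name fits (limit is $limit)"
else
    echo "$name is too long (limit is $limit)"
fi
```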
* * *
Warning
We offer a warning here about spaces in filenames. Some window-based desktop operating systems, where filenames are selected from scrolling menus, or typed into dialog boxes, have led their users to believe that spaces in filenames are just fine. They are not! Filenames get used in many other contexts outside of little boxes, and the only sensible way to recognize a filename is that it is a word chosen from a restricted character set. Unix shells, in particular, assume that commands can be parsed into words separated by spaces.
* * *
* * *
Note
Because of the possibility of whitespace and other special characters in filenames, in shell scripts you should always quote the expansion of any shell variable that might contain a filename: write "$file", not $file.
* * *
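The difference that quoting makes can be shown without creating any files. In this sketch, the variable name file, the filename it holds, and the helper function count_args are all hypothetical; the helper simply reports how many arguments it received. The default value of IFS is assumed.

```shell
file='my notes.txt'              # hypothetical name containing a space

# Report the number of arguments received.
count_args() { echo "$#"; }

unquoted=$(count_args $file)     # the shell splits the value at the space
quoted=$(count_args "$file")     # the value stays a single word

echo "unquoted: $unquoted arguments, quoted: $quoted argument"
```

A command such as rm given the unquoted form would try to remove two files, my and notes.txt, neither of which is the one intended.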