AppleScript_ The Definitive Guide - Matt Neuburg [124]
The value of the text item delimiters persists as long as this instance of the AppleScript scripting component does. Because you might run more than one script in the presence of this scripting component, any of which might set the text item delimiters, it is wise to make no assumptions as to the value of the text item delimiters. In other words, don't use it without setting it first. Apple's documentation makes a big deal of this, but it's really no different from any of the other AppleScript global properties, such as pi (see Chapter 16).
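A minimal sketch of that advice, assuming a script that wants to split a comma-separated string. The variable names (oldDelims, theItems) and the sample string are purely illustrative. The script sets the delimiters explicitly before using them, and, as a courtesy rather than a requirement, puts back whatever value it found:

```applescript
-- Remember the current value; an earlier script may have changed it.
set oldDelims to AppleScript's text item delimiters
-- Set the delimiters explicitly before relying on them.
set AppleScript's text item delimiters to ","
set theItems to text items of "red,green,blue"
-- Put the previous value back.
set AppleScript's text item delimiters to oldDelims
theItems -- {"red", "green", "blue"}
```

Restoring the old value doesn't excuse the next script from setting the delimiters itself; it merely avoids leaving a surprise behind.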
Observe that other string elements may equally be used to split a string, often more conveniently: characters splits a string into individual characters, words splits a string at its word boundaries, and paragraphs splits a string at its line breaks.
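A short illustration of these three elements; the sample strings are made up for the purpose, and the comments show the value of each expression:

```applescript
-- Two words, a line break, then a third word.
set s to "one two" & return & "three"
characters of "abc" -- {"a", "b", "c"}
words of s -- {"one", "two", "three"}
paragraphs of s -- {"one two", "three"}
```

None of these depends on the text item delimiters, which is part of what makes them convenient.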
Unicode Text
Like the Macintosh itself, the AppleScript string class has long been bedeviled by the existence of text encodings representing characters outside its own native encoding, which is MacRoman. With the coming of Mac OS X, this problem is essentially solved at system level: text is now Unicode. Unicode expresses tens of thousands of characters in a single massive encoding, and in its fullest form will express about a million characters, embracing every character of every written language in history. Unfortunately, AppleScript precedes Mac OS X, and the string class is still its primary text class. Over the years, various secondary classes have been fudged into AppleScript in an attempt to increase a string's representational power and to improve AppleScript's compatibility with text in the world around it. At the moment, the most important of these is the Unicode text class, which has the UTF-16 encoding.
Text supplied by the system is often Unicode text rather than a string. For example:
tell application "Finder" to set x to (get name of disk 1)
class of x -- Unicode text
Similarly, some Mac OS X-native applications, such as TextEdit, return text values as Unicode text.
The trouble is that Unicode text remains very much a second-class citizen within AppleScript. Perhaps someday all AppleScript text will be Unicode text, but that day has not yet come. A literal string (the stuff between quotes in your code) is still a string, not Unicode text. Thus, you can't even enter a Unicode string directly; you can try, but non-MacRoman characters are lost at compile time. AppleScript's supplied string manipulation commands, such as the scripting addition command ASCII character, don't work outside the MacRoman range. The character string element knows nothing of composed characters. Unicode text display (in a result, for example) isn't particularly good either; many non-MacRoman characters are not displayed properly. Unicode text communication between a script and a Unicode-savvy application works, but problems can arise.
Then there's the business of how a Unicode text value will interact with a string value, or with a command that expects a string. The good news is that in Tiger such interaction is much improved over previous versions of AppleScript. Whatever you can do to a string, you can do to Unicode text. If you get an element of a Unicode text value, the result is Unicode text. If you concatenate Unicode text and a string, the result is Unicode text (in earlier versions of AppleScript this was not true, which was a big source of trouble). You can explicitly coerce between a string and Unicode text; AppleScript also implicitly coerces for you as appropriate. And scripting addition commands have now mostly been revised to accept Unicode text parameters.
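The rules just described can be sketched as follows. The variable names are illustrative, and the classes shown in the comments are what those rules predict for Tiger; I'm assuming that behavior here, and other versions of AppleScript may well report different classes:

```applescript
set s to "hello" -- a literal, so a string
set u to s as Unicode text -- explicit coercion to Unicode text
class of u -- Unicode text
class of (character 1 of u) -- Unicode text: an element of Unicode text
class of (u & s) -- Unicode text: concatenation with a string
class of (u as string) -- string: explicit coercion back
```

Only the last expression's value appears as the script's result; run the lines one at a time to see each class.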
Forming Unicode Text
As I mentioned earlier, you can't type a non-MacRoman literal directly. This section provides some workarounds, all of them more or less horrible.
Behind the scenes, a Unicode text string is a 'utxt' resource consisting of a stream of UTF-16 hex bytes. This suggests that you can form such a resource directly as raw data (see "Data," earlier in this chapter) and coerce it to Unicode text. For example:
set