Online Book Reader

Professional C++ - Marc Gregoire [252]

UTF-32 encoded Unicode characters as one char32_t.

wchar_t: Stores a wide character of a compiler-specific size and encoding.

The benefit of using char16_t and char32_t instead of wchar_t is that the sizes of char16_t and char32_t are compiler-independent, whereas the size of wchar_t depends on your compiler.
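The size difference can be checked at compile time. The following is a minimal sketch, not code from the book; the constant names (kChar16Bits and so on) are made up for illustration. It relies only on the guarantee that char16_t and char32_t hold at least 16 and 32 bits, and makes no assumption about wchar_t:

```cpp
#include <climits>
#include <cstddef>

// Width in bits of each character type on the current compiler.
// The names below are hypothetical, chosen for this sketch only.
constexpr std::size_t kChar16Bits = sizeof(char16_t) * CHAR_BIT;
constexpr std::size_t kChar32Bits = sizeof(char32_t) * CHAR_BIT;
constexpr std::size_t kWCharBits  = sizeof(wchar_t)  * CHAR_BIT;

// These hold on every conforming compiler.
static_assert(kChar16Bits >= 16, "char16_t must hold a UTF-16 code unit");
static_assert(kChar32Bits >= 32, "char32_t must hold a UTF-32 code unit");
// No comparable check is written for wchar_t: its width varies by compiler.
```

Printing kWCharBits on different platforms is a quick way to see the compiler dependence of wchar_t for yourself.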

The standard also defines the following two macros:

__STDC_UTF_32__: If this is defined, the type char32_t represents a UTF-32 encoding. If it is not defined, the type char32_t has a compiler-dependent encoding.

__STDC_UTF_16__: If this is defined, the type char16_t represents a UTF-16 encoding. If it is not defined, the type char16_t has a compiler-dependent encoding.

C++11 defines three new string prefixes in addition to the existing L prefix. The complete set of supported string prefixes is as follows:

u8: A char string literal with UTF-8 encoding.

u: A char16_t string literal, which can be UTF-16 if __STDC_UTF_16__ is defined.

U: A char32_t string literal, which can be UTF-32 if __STDC_UTF_32__ is defined.

L: A wchar_t string literal with a compiler-dependent encoding.

All of these string literals can also be combined with the raw string literal seen earlier in this chapter. For example:

const char* s1 = u8R"(Raw UTF-8 encoded string literal)";

const wchar_t* s2 = LR"(Raw wide string literal)";

const char16_t* s3 = uR"(Raw char16_t string literal)";

const char32_t* s4 = UR"(Raw char32_t string literal)";

Code snippet from CharTypes\CharTypes.cpp

If you are using a Unicode encoding, for example by using u8 UTF-8 string literals, or because your compiler defines __STDC_UTF_16__ or __STDC_UTF_32__, you can insert a specific Unicode code point in your non-raw string literal by using the \uABCD notation. For example, \u03C0 represents the π character, and \u00B2 represents the ² character. The following code prints "π r²":

const char* formula = u8"\u03C0 r\u00B2";

cout << formula << endl;

Code snippet from CharTypes\CharTypes.cpp
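The same code point occupies a different number of code units depending on the literal prefix. The following sketch, which is not from the book, counts code units at compile time. The helper name codeUnits is hypothetical, and the char16_t and char32_t checks assume the compiler encodes those literals as UTF-16 and UTF-32:

```cpp
#include <cstddef>

// Number of code units in a string literal, excluding the terminating null.
// (Hypothetical helper for illustration.)
template <typename CharT, std::size_t N>
constexpr std::size_t codeUnits(const CharT (&)[N]) {
    return N - 1;
}

// U+03C0 (π): two bytes in UTF-8, one unit in UTF-16 and in UTF-32.
static_assert(codeUnits(u8"\u03C0") == 2, "pi is 2 UTF-8 code units");
static_assert(codeUnits(u"\u03C0") == 1, "pi is 1 UTF-16 code unit");
static_assert(codeUnits(U"\u03C0") == 1, "pi is 1 UTF-32 code unit");

// U+1F600 lies outside the Basic Multilingual Plane, so it needs the
// eight-digit \U notation and a surrogate pair in UTF-16.
static_assert(codeUnits(u8"\U0001F600") == 4, "4 UTF-8 code units");
static_assert(codeUnits(u"\U0001F600") == 2, "2 UTF-16 code units");
static_assert(codeUnits(U"\U0001F600") == 1, "1 UTF-32 code unit");
```

This is why the length of a u8 or u string is not, in general, the number of characters a reader would see.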

The C++ string library has also been extended to include two new typedefs to work with the new character types:

typedef basic_string<char16_t> u16string;

typedef basic_string<char32_t> u32string;
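These types behave like std::string, but with wider code units. As a small illustration (a sketch, assuming the input is well-formed UTF-16; the function name countCodePoints is made up), the following counts Unicode code points in a u16string by skipping the second half of each surrogate pair:

```cpp
#include <cstddef>
#include <string>

// Count Unicode code points in a UTF-16 string.
// Assumes well-formed input: every low surrogate (0xDC00..0xDFFF)
// is the second half of a pair and is therefore not counted.
std::size_t countCodePoints(const std::u16string& text) {
    std::size_t count = 0;
    for (char16_t unit : text) {
        const bool isLowSurrogate = (unit >= 0xDC00 && unit <= 0xDFFF);
        if (!isLowSurrogate) {
            ++count;
        }
    }
    return count;
}
```

For a string such as u"\U0001F600", size() reports two code units, while countCodePoints reports one code point.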

Additionally, the following four new conversion functions related to char16_t and char32_t are included: mbrtoc16, c16rtomb, mbrtoc32 and c32rtomb.
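As an illustration of one of these functions, the sketch below uses c32rtomb to convert a u32string to a multibyte string in the current C locale. The wrapper name toMultibyte is hypothetical, and the example assumes your standard library ships the <cuchar> header, which not every implementation does. In the default "C" locale, only characters representable in that locale convert successfully:

```cpp
#include <climits>
#include <cstddef>
#include <cuchar>
#include <stdexcept>
#include <string>

// Convert a UTF-32 string to a multibyte string using the current C locale.
// Throws if a character cannot be represented in that locale.
// (Hypothetical wrapper for illustration.)
std::string toMultibyte(const std::u32string& input) {
    std::mbstate_t state{};   // zero-initialized: initial conversion state
    char buffer[MB_LEN_MAX];  // large enough for any single character
    std::string output;
    for (char32_t c : input) {
        const std::size_t written = std::c32rtomb(buffer, c, &state);
        if (written == static_cast<std::size_t>(-1)) {
            throw std::runtime_error("character not representable in locale");
        }
        output.append(buffer, written);
    }
    return output;
}
```

Calling std::setlocale with a UTF-8 locale name before the conversion widens the set of characters that succeed, but the available locale names are platform-specific.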

Unfortunately, the support for char16_t and char32_t stops there. For example, the I/O stream classes in the C++11 standard library do not include support for these new character types. This means that there is nothing like a version of cout or cin that supports char16_t and char32_t, making it difficult to print such strings to a console or to read them from user input. If you want to do more with char16_t and char32_t strings, you will have to resort to third-party libraries.
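For simple output, one workaround is to encode the string as UTF-8 by hand and write the resulting bytes to cout. The following is a minimal sketch under stated assumptions, not a replacement for a full Unicode library: the function name toUtf8 is made up, the input is assumed to contain only valid code points (no surrogates, nothing above U+10FFFF), and the console must itself interpret the bytes as UTF-8:

```cpp
#include <string>

// Encode a UTF-32 string as UTF-8 bytes.
// No validation is performed; input is assumed to be valid code points.
std::string toUtf8(const std::u32string& input) {
    std::string out;
    for (char32_t c : input) {
        if (c < 0x80) {             // 1 byte: 0xxxxxxx
            out += static_cast<char>(c);
        } else if (c < 0x800) {     // 2 bytes: 110xxxxx 10xxxxxx
            out += static_cast<char>(0xC0 | (c >> 6));
            out += static_cast<char>(0x80 | (c & 0x3F));
        } else if (c < 0x10000) {   // 3 bytes: 1110xxxx 10xxxxxx 10xxxxxx
            out += static_cast<char>(0xE0 | (c >> 12));
            out += static_cast<char>(0x80 | ((c >> 6) & 0x3F));
            out += static_cast<char>(0x80 | (c & 0x3F));
        } else {                    // 4 bytes: 11110xxx followed by three 10xxxxxx
            out += static_cast<char>(0xF0 | (c >> 18));
            out += static_cast<char>(0x80 | ((c >> 12) & 0x3F));
            out += static_cast<char>(0x80 | ((c >> 6) & 0x3F));
            out += static_cast<char>(0x80 | (c & 0x3F));
        }
    }
    return out;
}
```

With this helper, a statement such as cout << toUtf8(U"\u03C0 r\u00B2") << endl; writes the UTF-8 bytes for the formula to the console.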

Locales and Facets

Character sets are only one of the differences in data representation between countries. Even countries that use similar character sets, such as Great Britain and the United States, still differ in how they represent data such as dates and money.

The standard C++ mechanism that groups specific data about a particular set of cultural parameters is called a locale. An individual component of a locale, such as date format, time format, number format, etc. is called a facet. An example of a locale is U.S. English. An example of a facet is the format used to display a date. There are several built-in facets that are common to all locales. The language also provides a way to customize or add facets.
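The relationship between a locale and its facets can be seen in code. The sketch below is an illustration, with the hypothetical function name decimalPointOf. It asks a locale for its numpunct facet, which is the standard facet describing number punctuation, and reads the decimal point character from it:

```cpp
#include <locale>

// Return the decimal point character used by the given locale.
// std::use_facet retrieves one facet (here numpunct<char>) from the locale.
char decimalPointOf(const std::locale& loc) {
    return std::use_facet<std::numpunct<char>>(loc).decimal_point();
}
```

For the classic "C" locale, obtained with std::locale::classic(), the function returns '.'. Other locales may return a different character, such as ',', depending on which locales are installed on your system.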

Using Locales

When using I/O streams, data is formatted according to a particular locale. Locales are objects that can be attached to a stream. They are defined in the <locale> header file. Locale names can be implementation-specific. One common convention is to write the language and the area as two-letter codes, optionally followed by an encoding. For example, the locale for the English language as spoken in the U.S. is en_US, while the locale for the English language as spoken in Great Britain is en_GB. The locale for Japanese spoken in Japan