— A character sequence is an array object (8.3.4) A that can be declared as T A [N], where T is any of the types char, unsigned char, or signed char (3.9.1), optionally qualified by any combination of const or volatile. The initial elements of the array have defined contents up to and including an element determined by some predicate. A character sequence can be designated by a pointer value S that points to its first element.
166) Note that this definition differs from the definition in ISO C 7.1.1.
167) declared in <clocale> (22.6).
On this basis, the more specific NTBS (null-terminated byte string) can be defined:
ISO C++11 17.5.2.1.4.1 Byte strings [byte.strings]
1 A null-terminated byte string, or NTBS, is a character sequence whose highest-addressed element with defined content has the value zero (the terminating null character); no other element in the sequence has the value zero.[168]
2 The length of an NTBS is the number of elements that precede the terminating null character. An empty ntbs has a length of zero.
3 The value of an NTBS is the sequence of values of the elements up to and including the terminating null character.
4 A static NTBS is an ntbs with static storage duration.[169]
168) Many of the objects manipulated by function signatures declared in <cstring> (21.7) are character sequences or NTBSs. The size of some of these character sequences is limited by a length value, maintained separately from the character sequence.
169) A string literal, such as "abc", is a static ntbs.
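The elements of an NTBS are usually represented by objects or values of type char.
Building on NTCTS and the character sequence, NTBS additionally pins down the storage. Moreover, one important reason for defining NTBS separately from NTCTS is to delimit the scope of the multibyte string, NTMBS (which permits variable-length encodings). Note that from here on, the various notions of "length" begin to differ noticeably.
First, the definition: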
ISO C++11 17.5.2.1.4.2 Multibyte strings [multibyte.strings]
1 A null-terminated multibyte string, or NTMBS, is an NTBS that constitutes a sequence of valid multibyte characters, beginning and ending in the initial shift state.170
2 A static NTMBS is an NTMBS with static storage duration.
170) An NTBS that contains characters only from the basic execution character set is also an NTMBS. Each multibyte character then consists of a single byte.
So NTMBS is a subset of NTBS, and an NTMBS may contain characters made up of several (contiguous) bytes.
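By 17.5.2.1.4.1/2, the length of an NTMBS, being an NTBS, is the number of elements that precede the terminating null character. The notion of "element" here differs from that of a plain NTBS, though: for an NTMBS the emphasis is that the length counts multibyte characters rather than chars (bytes). Clearly, for an NTMBS in general, the length and the number of bytes occupied can differ even when the terminating null character is left out.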
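ISO C, however, differs from this on "length" in a key respect. In short, the string used by the ISO C standard library corresponds to an NTBS, and even in its NTMBS-like notion (the multibyte string), length is still counted in bytes: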
ISO C99/C11 (N1570)
7.1.1/1 A string is a contiguous sequence of characters terminated by and including the first null character. The term multibyte string is sometimes used instead to emphasize special processing given to multibyte characters contained in the string or to avoid confusion with a wide string. A pointer to a string is a pointer to its initial (lowest addressed) character. The length of a string is the number of bytes preceding the null character and the value of a string is the sequence of the values of the contained characters, in order.