#include <textEncoder.h>
Inheritance diagram for TextEncoder:
Public Types | |
enum | Encoding { E_iso8859, E_utf8, E_unicode } |
Public Member Functions | |
TextEncoder () | |
void | set_encoding (Encoding encoding) |
Specifies how the string set via set_text() is to be interpreted. | |
Encoding | get_encoding () const |
Returns the encoding by which the string set via set_text() is to be interpreted. | |
void | set_text (const string &text) |
Changes the text that is stored in the encoder. | |
void | set_text (const string &text, Encoding encoding) |
The two-parameter version of set_text() accepts an explicit encoding; the text is immediately decoded and stored as a wide-character string. | |
void | clear_text () |
Removes the text from the TextEncoder. | |
bool | has_text () const |
void | make_upper () |
Adjusts the text stored within the encoder to all uppercase letters (preserving accent marks correctly). | |
void | make_lower () |
Adjusts the text stored within the encoder to all lowercase letters (preserving accent marks correctly). | |
string | get_text () const |
Returns the current text, as encoded via the current encoding system. | |
string | get_text (Encoding encoding) const |
Returns the current text, as encoded via the indicated encoding system. | |
void | append_text (const string &text) |
Appends the indicated string to the end of the stored text. | |
void | append_unicode_char (int character) |
Appends a single character to the end of the stored text. | |
int | get_num_chars () const |
Returns the number of characters in the stored text. | |
int | get_unicode_char (int index) const |
Returns the Unicode value of the nth character in the stored text. | |
void | set_unicode_char (int index, int character) |
Sets the Unicode value of the nth character in the stored text. | |
string | get_encoded_char (int index) const |
Returns the nth char of the stored text, as a one-, two-, or three-byte encoded string. | |
string | get_encoded_char (int index, Encoding encoding) const |
Returns the nth char of the stored text, as a one-, two-, or three-byte encoded string. | |
string | get_text_as_ascii () const |
Returns the text associated with the node, converted as nearly as possible to a fully-ASCII representation. | |
void | set_wtext (const wstring &wtext) |
Changes the text that is stored in the encoder. | |
const wstring & | get_wtext () const |
Returns the text associated with the TextEncoder, as a wide-character string. | |
void | append_wtext (const wstring &text) |
Appends the indicated string to the end of the stored wide-character text. | |
wstring | get_wtext_as_ascii () const |
Returns the text associated with the node, converted as nearly as possible to a fully-ASCII representation. | |
string | encode_wtext (const wstring &wtext) const |
Encodes a wide-text string into a single-char string, according to the current encoding. | |
wstring | decode_text (const string &text) const |
Returns the given string decoded to a wide-character string, via the current encoding system. | |
Static Public Member Functions | |
void | set_default_encoding (Encoding encoding) |
Specifies the default encoding to be used for all subsequently created TextEncoder objects. | |
Encoding | get_default_encoding () |
Returns the default encoding to be used for all subsequently created TextEncoder objects. | |
string | reencode_text (const string &text, Encoding from, Encoding to) |
Given the indicated text string, which is assumed to be encoded via the encoding "from", decodes it and then reencodes it into the encoding "to", and returns the newly encoded string. | |
bool | unicode_isalpha (int character) |
Returns true if the indicated character is an alphabetic letter, false otherwise. | |
bool | unicode_isdigit (int character) |
Returns true if the indicated character is a numeric digit, false otherwise. | |
bool | unicode_ispunct (int character) |
Returns true if the indicated character is a punctuation mark, false otherwise. | |
bool | unicode_islower (int character) |
Returns true if the indicated character is a lowercase letter, false otherwise. | |
bool | unicode_isupper (int character) |
Returns true if the indicated character is an uppercase letter, false otherwise. | |
int | unicode_toupper (int character) |
Returns the uppercase equivalent of the given Unicode character. | |
int | unicode_tolower (int character) |
Returns the lowercase equivalent of the given Unicode character. | |
string | upper (const string &source) |
Converts the string to uppercase, assuming the string is encoded in the default encoding. | |
string | upper (const string &source, Encoding encoding) |
Converts the string to uppercase, assuming the string is encoded in the indicated encoding. | |
string | lower (const string &source) |
Converts the string to lowercase, assuming the string is encoded in the default encoding. | |
string | lower (const string &source, Encoding encoding) |
Converts the string to lowercase, assuming the string is encoded in the indicated encoding. | |
string | encode_wchar (wchar_t ch, Encoding encoding) |
Encodes a single wide char into a one-, two-, or three-byte string, according to the given encoding system. | |
string | encode_wtext (const wstring &wtext, Encoding encoding) |
Encodes a wide-text string into a single-char string, according to the given encoding. | |
wstring | decode_text (const string &text, Encoding encoding) |
Returns the given string decoded to a wide-character string, via the given encoding system. | |
TypeHandle | get_class_type () |
void | init_type () |
Private Types | |
enum | Flags { F_got_text = 0x0001, F_got_wtext = 0x0002 } |
Static Private Member Functions | |
wstring | decode_text_impl (StringDecoder &decoder) |
Decodes the eight-bit stream from the indicated decoder, returning the decoded wide-char string. | |
Private Attributes | |
int | _flags |
Encoding | _encoding |
string | _text |
wstring | _wtext |
Static Private Attributes | |
Encoding | _default_encoding |
TypeHandle | _type_handle |
This class converts text between representations, e.g. UTF-8 to Unicode. You may use it as a static class object, passing the encoding each time, or you may create an instance and use that object, which will record the current encoding and retain the current string.
This class is also a base class of TextNode, which inherits this functionality.
Definition at line 53 of file textEncoder.h.
|
Definition at line 55 of file textEncoder.h. Referenced by get_encoding(). |
|
Reimplemented in TextNode. Definition at line 118 of file textEncoder.h. |
|
Definition at line 31 of file textEncoder.I. References _flags, F_got_text, and F_got_wtext. |
|
Appends the indicated string to the end of the stored text.
Reimplemented in TextNode. Definition at line 246 of file textEncoder.I. References get_encoded_char(), get_encoding(), and INLINE. |
|
Appends a single character to the end of the stored text. This may be a wide character, up to 16 bits in Unicode. Reimplemented in TextNode. Definition at line 264 of file textEncoder.I. |
|
Appends the indicated string to the end of the stored wide-character text.
Reimplemented in TextNode. Definition at line 694 of file textEncoder.I. Referenced by TextNode::set_draw_order(). |
|
Removes the text from the TextEncoder.
Reimplemented in TextNode. Definition at line 179 of file textEncoder.I. References _flags, _text, F_got_text, F_got_wtext, get_text(), and INLINE. |
|
Returns the given string decoded to a wide-character string, via the given encoding system.
Definition at line 226 of file textEncoder.cxx. |
|
Returns the given string decoded to a wide-character string, via the current encoding system.
Definition at line 725 of file textEncoder.I. Referenced by get_default_encoding(), get_unicode_char(), and TextNode::~TextNode(). |
|
Decodes the eight-bit stream from the indicated decoder, returning the decoded wide-char string.
Definition at line 260 of file textEncoder.cxx. |
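As an illustration of what decoding an eight-bit stream into wide characters can involve, here is a minimal stand-alone sketch for UTF-8 input. It is not the code in textEncoder.cxx and does not use StringDecoder; the function name `utf8_decode_bmp` is hypothetical, only sequences of up to three bytes are handled, and continuation bytes are not validated.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper (not part of TextEncoder): decodes UTF-8 sequences of
// one to three bytes into wide characters. Anything else becomes U+FFFD.
std::wstring utf8_decode_bmp(const std::string &text) {
  std::wstring out;
  std::size_t i = 0;
  while (i < text.size()) {
    unsigned char b0 = static_cast<unsigned char>(text[i]);
    if (b0 < 0x80) {
      // 0xxxxxxx: plain ASCII, one byte.
      out += static_cast<wchar_t>(b0);
      i += 1;
    } else if ((b0 & 0xE0) == 0xC0 && i + 1 < text.size()) {
      // 110xxxxx 10xxxxxx: two bytes carry 11 bits.
      unsigned char b1 = static_cast<unsigned char>(text[i + 1]);
      out += static_cast<wchar_t>(((b0 & 0x1F) << 6) | (b1 & 0x3F));
      i += 2;
    } else if ((b0 & 0xF0) == 0xE0 && i + 2 < text.size()) {
      // 1110xxxx 10xxxxxx 10xxxxxx: three bytes carry 16 bits.
      unsigned char b1 = static_cast<unsigned char>(text[i + 1]);
      unsigned char b2 = static_cast<unsigned char>(text[i + 2]);
      out += static_cast<wchar_t>(((b0 & 0x0F) << 12) |
                                  ((b1 & 0x3F) << 6) | (b2 & 0x3F));
      i += 3;
    } else {
      // Unsupported lead byte or truncated sequence: emit the replacement
      // character and resynchronize on the next byte.
      out += static_cast<wchar_t>(0xFFFD);
      i += 1;
    }
  }
  return out;
}
```

Passing the two bytes `0xC3 0xA9` yields a single wide character with value 0xE9.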
|
Encodes a single wide char into a one-, two-, or three-byte string, according to the given encoding system.
Definition at line 146 of file textEncoder.cxx. |
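To make the "one-, two-, or three-byte" wording concrete, the sketch below shows how a single character in the 16-bit range maps to UTF-8. It is a stand-alone illustration, not the body of encode_wchar(); the name `utf8_encode_bmp` is hypothetical, and code points above U+FFFF are assumed not to occur.

```cpp
#include <string>

// Hypothetical helper (not part of TextEncoder): encodes one code point from
// the Basic Multilingual Plane (U+0000..U+FFFF) as UTF-8.
std::string utf8_encode_bmp(unsigned int cp) {
  std::string out;
  if (cp < 0x80) {
    // Up to 7 bits: a single byte, identical to ASCII.
    out += static_cast<char>(cp);
  } else if (cp < 0x800) {
    // Up to 11 bits: 110xxxxx 10xxxxxx.
    out += static_cast<char>(0xC0 | (cp >> 6));
    out += static_cast<char>(0x80 | (cp & 0x3F));
  } else {
    // Up to 16 bits: 1110xxxx 10xxxxxx 10xxxxxx.
    out += static_cast<char>(0xE0 | ((cp >> 12) & 0x0F));
    out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
    out += static_cast<char>(0x80 | (cp & 0x3F));
  }
  return out;
}
```

For example, U+0041 stays one byte, U+00E9 becomes the two bytes `0xC3 0xA9`, and U+20AC becomes the three bytes `0xE2 0x82 0xAC`.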
|
Encodes a wide-text string into a single-char string, according to the given encoding.
Definition at line 205 of file textEncoder.cxx. References StringDecoder::get_next_character(), StringDecoder::is_eof(), and wstring. |
|
Encodes a wide-text string into a single-char string, according to the current encoding.
Definition at line 710 of file textEncoder.I. Referenced by get_num_chars(), get_unicode_char(), PGEntry::set_text(), set_text(), unicode_islower(), and TextNode::~TextNode(). |
|
Reimplemented in TextNode. Definition at line 132 of file textEncoder.h. |
|
Returns the default encoding to be used for all subsequently created TextEncoder objects. See set_encoding(). Definition at line 117 of file textEncoder.I. References decode_text(), INLINE, and set_wtext(). |
|
Returns the nth char of the stored text, as a one-, two-, or three-byte encoded string.
Definition at line 355 of file textEncoder.I. References UnicodeLatinMap::CT_punct. |
|
Returns the nth char of the stored text, as a one-, two-, or three-byte encoded string.
Definition at line 340 of file textEncoder.I. References INLINE, UnicodeLatinMap::look_up(), and NULL. Referenced by append_text(). |
|
Returns the encoding by which the string set via set_text() is to be interpreted. See set_encoding(). Definition at line 83 of file textEncoder.I. References _default_encoding, Encoding, and INLINE. Referenced by append_text(). |
|
Returns the number of characters in the stored text. This is a count of wide characters, after the string has been decoded according to set_encoding(). Definition at line 282 of file textEncoder.I. References encode_wtext(), get_wtext_as_ascii(), and INLINE. |
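The distinction between counting bytes and counting decoded characters can be shown with a short stand-alone sketch. It assumes UTF-8 input and does not call into TextEncoder; the name `utf8_count_chars` is hypothetical.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper (not part of TextEncoder): counts the characters in a
// UTF-8 string without fully decoding it.
std::size_t utf8_count_chars(const std::string &text) {
  std::size_t count = 0;
  for (unsigned char byte : text) {
    // Continuation bytes have the form 10xxxxxx; every other byte begins a
    // new character, so only those are counted.
    if ((byte & 0xC0) != 0x80) {
      ++count;
    }
  }
  return count;
}
```

A five-letter word containing one accented letter occupies six bytes in UTF-8 but is still counted as five characters.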
|
Returns the current text, as encoded via the indicated encoding system.
Definition at line 231 of file textEncoder.I. References _flags, _wtext, F_got_text, get_wtext(), INLINE, and nassertv. |
|
Returns the current text, as encoded via the current encoding system.
Definition at line 212 of file textEncoder.I. References _wtext, get_wtext(), INLINE, and nassertr. Referenced by clear_text(), unicode_isalpha(), and unicode_isdigit(). |
|
Returns the text associated with the node, converted as nearly as possible to a fully-ASCII representation. This means replacing accented letters with their unaccented ASCII equivalents. It is possible that some characters in the string cannot be converted to ASCII. (The string may involve symbols like the copyright symbol, for instance, or it might involve letters in some other alphabet such as Greek or Cyrillic, or even Latin letters like thorn or eth that are not part of the ASCII character set.) In this case, as much of the string as possible will be converted to ASCII, and the nonconvertible characters will remain encoded in the encoding specified by set_encoding(). Definition at line 397 of file textEncoder.I. References UnicodeLatinMap::Entry::_toupper_character, INLINE, UnicodeLatinMap::look_up(), and NULL. |
|
Returns the Unicode value of the nth character in the stored text. This may be a wide character (greater than 255), after the string has been decoded according to set_encoding(). Definition at line 301 of file textEncoder.I. References decode_text(), and encode_wtext(). |
|
Returns the text associated with the TextEncoder, as a wide-character string.
Definition at line 675 of file textEncoder.I. Referenced by get_text(), has_text(), make_lower(), make_upper(), and set_text(). |
|
Returns the text associated with the node, converted as nearly as possible to a fully-ASCII representation. This means replacing accented letters with their unaccented ASCII equivalents. It is possible that some characters in the string cannot be converted to ASCII. (The string may involve symbols like the copyright symbol, for instance, or it might involve letters in some other alphabet such as Greek or Cyrillic, or even Latin letters like thorn or eth that are not part of the ASCII character set.) In this case, as much of the string as possible will be converted to ASCII, and the nonconvertible characters will remain in their original form. Definition at line 110 of file textEncoder.cxx. References UnicodeLatinMap::Entry::_ascii_additional, UnicodeLatinMap::Entry::_ascii_equiv, E_iso8859, E_utf8, UnicodeLatinMap::look_up(), and NULL. Referenced by get_num_chars(). |
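The folding behaviour described above can be sketched with a small lookup table. This is only an illustration under stated assumptions: the table is a three-entry stand-in for a real mapping such as UnicodeLatinMap, and the name `fold_to_ascii` is hypothetical.

```cpp
#include <map>
#include <string>

// Hypothetical helper (not part of TextEncoder): replaces characters that
// have an ASCII equivalent and leaves every other character untouched.
std::wstring fold_to_ascii(const std::wstring &text) {
  // Tiny illustrative table; a real mapping covers far more characters.
  static const std::map<wchar_t, std::wstring> table = {
    {L'\xE9', L"e"},   // e with acute accent
    {L'\xF1', L"n"},   // n with tilde
    {L'\xC6', L"AE"},  // AE ligature expands to two letters
  };
  std::wstring out;
  for (wchar_t ch : text) {
    auto it = table.find(ch);
    if (it != table.end()) {
      out += it->second;
    } else {
      out += ch;  // no ASCII equivalent known: keep the original character
    }
  }
  return out;
}
```

With this table an accented "e" folds to a plain "e", while a Greek capital omega (U+03A9) passes through unchanged because it has no entry.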
|
Definition at line 193 of file textEncoder.I. References _flags, _wtext, F_got_text, F_got_wtext, get_wtext(), INLINE, and wstring. Referenced by set_default_encoding(), and unicode_ispunct(). |
|
Reimplemented in TextNode. Definition at line 135 of file textEncoder.h. Referenced by ConfigureFn(). |
|
Converts the string to lowercase, assuming the string is encoded in the indicated encoding.
Definition at line 634 of file textEncoder.I. |
|
Converts the string to lowercase, assuming the string is encoded in the default encoding.
Definition at line 619 of file textEncoder.I. |
|
Adjusts the text stored within the encoder to all lowercase letters (preserving accent marks correctly).
Definition at line 64 of file textEncoder.cxx. References UnicodeLatinMap::Entry::_ascii_additional, UnicodeLatinMap::Entry::_ascii_equiv, _wtext, get_wtext(), UnicodeLatinMap::look_up(), NULL, and wstring. Referenced by unicode_isdigit(). |
|
Adjusts the text stored within the encoder to all uppercase letters (preserving accent marks correctly).
Definition at line 42 of file textEncoder.cxx. References _flags, _wtext, F_got_text, get_wtext(), and unicode_tolower(). Referenced by unicode_isalpha(). |
|
Given the indicated text string, which is assumed to be encoded via the encoding "from", decodes it and then reencodes it into the encoding "to", and returns the newly encoded string. This does not change or affect any properties on the TextEncoder itself. Definition at line 418 of file textEncoder.I. |
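The decode-then-reencode idea can be illustrated for one specific pair of encodings. The sketch below assumes the source bytes are ISO 8859-1 (Latin-1), where each byte value equals its code point, and produces UTF-8. It is a stand-alone example with a hypothetical name, `latin1_to_utf8`, not the body of reencode_text().

```cpp
#include <string>

// Hypothetical helper (not part of TextEncoder): converts ISO 8859-1 bytes
// to UTF-8. Decoding is trivial because each byte is its own code point.
std::string latin1_to_utf8(const std::string &text) {
  std::string out;
  for (unsigned char byte : text) {
    if (byte < 0x80) {
      // ASCII range: identical in both encodings.
      out += static_cast<char>(byte);
    } else {
      // U+0080..U+00FF: two UTF-8 bytes, 110xxxxx 10xxxxxx.
      out += static_cast<char>(0xC0 | (byte >> 6));
      out += static_cast<char>(0x80 | (byte & 0x3F));
    }
  }
  return out;
}
```

Like reencode_text(), the function is stateless: it returns a new string and leaves its input unchanged.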
|
Specifies the default encoding to be used for all subsequently created TextEncoder objects. See set_encoding(). Definition at line 100 of file textEncoder.I. References _flags, _text, F_got_text, F_got_wtext, has_text(), and INLINE. |
|
Specifies how the string set via set_text() is to be interpreted. The default, E_iso8859, means a standard string with one-byte characters (i.e. ASCII). Other encodings are possible to take advantage of character sets with more than 256 characters. This affects only future calls to set_text(); it does not change text that was set previously. Definition at line 65 of file textEncoder.I. Referenced by unicode_isalpha(), and unicode_isdigit(). |
|
The two-parameter version of set_text() accepts an explicit encoding; the text is immediately decoded and stored as a wide-character string. Subsequent calls to get_text() will return the same text re-encoded using whichever encoding is specified by set_encoding(). Definition at line 166 of file textEncoder.I. References encode_wtext(), get_wtext(), and INLINE. |
|
Changes the text that is stored in the encoder. The text should be encoded according to the method indicated by set_encoding(). Subsequent calls to get_text() will return this same string, while get_wtext() will return the decoded version of the string. Reimplemented in TextNode. Definition at line 140 of file textEncoder.I. References _flags, _text, _wtext, F_got_wtext, and INLINE. Referenced by TextNode::get_card_as_set(), unicode_isalpha(), and unicode_isdigit(). |
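One way to keep an encoded string and a wide string in sync, with each computed only on demand, is a pair of validity flags like the F_got_text and F_got_wtext values listed above. The class below is a simplified sketch of that pattern, not TextEncoder's actual implementation: `LazyText` and its private `encode`/`decode` helpers are hypothetical, and Latin-1 stands in for the configurable encoding.

```cpp
#include <string>

// Hypothetical class illustrating flag-based lazy conversion between an
// encoded string and a wide string. Not TextEncoder itself.
class LazyText {
public:
  void set_text(const std::string &text) {
    _text = text;
    _flags = F_got_text;  // the wide form is now stale
  }
  void set_wtext(const std::wstring &wtext) {
    _wtext = wtext;
    _flags = F_got_wtext;  // the encoded form is now stale
  }
  const std::string &get_text() {
    if (!(_flags & F_got_text)) {
      _text = encode(_wtext);  // convert only when first requested
      _flags |= F_got_text;
    }
    return _text;
  }
  const std::wstring &get_wtext() {
    if (!(_flags & F_got_wtext)) {
      _wtext = decode(_text);
      _flags |= F_got_wtext;
    }
    return _wtext;
  }
  bool has_text() const {
    return (_flags & F_got_wtext) ? !_wtext.empty() : !_text.empty();
  }

private:
  enum Flags { F_got_text = 0x0001, F_got_wtext = 0x0002 };

  // Latin-1 stand-ins for the real, encoding-dependent conversions.
  static std::string encode(const std::wstring &wtext) {
    std::string out;
    for (wchar_t ch : wtext) {
      bool fits = static_cast<unsigned long>(ch) <= 0xFF;
      out += fits ? static_cast<char>(ch) : '?';
    }
    return out;
  }
  static std::wstring decode(const std::string &text) {
    std::wstring out;
    for (unsigned char byte : text) {
      out += static_cast<wchar_t>(byte);
    }
    return out;
  }

  int _flags = 0;
  std::string _text;
  std::wstring _wtext;
};
```

Each setter invalidates the other representation, so whichever getter is called next performs exactly one conversion and caches the result.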
|
Sets the Unicode value of the nth character in the stored text. This may be a wide character (greater than 255), after the string has been decoded according to set_encoding(). Definition at line 322 of file textEncoder.I. References UnicodeLatinMap::Entry::_ascii_equiv, INLINE, UnicodeLatinMap::look_up(), and NULL. |
|
Changes the text that is stored in the encoder. Subsequent calls to get_wtext() will return this same string, while get_text() will return the encoded version of the string. Reimplemented in TextNode. Definition at line 657 of file textEncoder.I. Referenced by get_default_encoding(). |
|
Returns true if the indicated character is an alphabetic letter, false otherwise. This is akin to ctype's isalpha(), extended to Unicode. Definition at line 436 of file textEncoder.I. References get_text(), INLINE, make_upper(), set_encoding(), and set_text(). |
|
Returns true if the indicated character is a numeric digit, false otherwise. This is akin to ctype's isdigit(), extended to Unicode. Definition at line 458 of file textEncoder.I. References get_text(), INLINE, make_lower(), set_encoding(), and set_text(). |
|
Returns true if the indicated character is a lowercase letter, false otherwise. This is akin to ctype's islower(), extended to Unicode. Definition at line 524 of file textEncoder.I. References _encoding, and encode_wtext(). |
|
Returns true if the indicated character is a punctuation mark, false otherwise. This is akin to ctype's ispunct(), extended to Unicode. Definition at line 481 of file textEncoder.I. References _flags, _wtext, F_got_text, F_got_wtext, has_text(), and INLINE. |
|
Returns true if the indicated character is an uppercase letter, false otherwise. This is akin to ctype's isupper(), extended to Unicode. Definition at line 503 of file textEncoder.I. |
|
Returns the lowercase equivalent of the given Unicode character. This is akin to ctype's tolower(), extended to Unicode. Definition at line 566 of file textEncoder.I. Referenced by make_upper(). |
|
Returns the uppercase equivalent of the given Unicode character. This is akin to ctype's toupper(), extended to Unicode. Definition at line 545 of file textEncoder.I. |
|
Converts the string to uppercase, assuming the string is encoded in the indicated encoding.
Definition at line 600 of file textEncoder.I. |
|
Converts the string to uppercase, assuming the string is encoded in the default encoding.
Definition at line 585 of file textEncoder.I. |
|
Definition at line 27 of file textEncoder.cxx. Referenced by get_encoding(). |
|
Definition at line 125 of file textEncoder.h. Referenced by unicode_islower(). |
|
Reimplemented in TextNode. Definition at line 124 of file textEncoder.h. Referenced by clear_text(), get_text(), has_text(), make_upper(), set_default_encoding(), set_text(), TextEncoder(), and unicode_ispunct(). |
|
Definition at line 126 of file textEncoder.h. Referenced by clear_text(), set_default_encoding(), and set_text(). |
|
Reimplemented in TextNode. Definition at line 26 of file textEncoder.cxx. |
|
Definition at line 127 of file textEncoder.h. Referenced by get_text(), has_text(), make_lower(), make_upper(), set_text(), and unicode_ispunct(). |