TextEncoder Class Reference

This class can be used to convert text between multiple representations, e.g. utf-8 to Unicode.

#include <textEncoder.h>

Inherited by TextNode.

Public Types

enum  Encoding { E_iso8859, E_utf8, E_unicode }

Public Member Functions

 TextEncoder ()
void set_encoding (Encoding encoding)
 Specifies how the string set via set_text() is to be interpreted.

Encoding get_encoding () const
 Returns the encoding by which the string set via set_text() is to be interpreted.

void set_text (const string &text)
 Changes the text that is stored in the encoder.

void set_text (const string &text, Encoding encoding)
 The two-parameter version of set_text() accepts an explicit encoding; the text is immediately decoded and stored as a wide-character string.

void clear_text ()
 Removes the text from the TextEncoder.

bool has_text () const
void make_upper ()
 Adjusts the text stored within the encoder to all uppercase letters (preserving accent marks correctly).

void make_lower ()
 Adjusts the text stored within the encoder to all lowercase letters (preserving accent marks correctly).

string get_text () const
 Returns the current text, as encoded via the current encoding system.

string get_text (Encoding encoding) const
 Returns the current text, as encoded via the indicated encoding system.

void append_text (const string &text)
 Appends the indicated string to the end of the stored text.

void append_unicode_char (int character)
 Appends a single character to the end of the stored text.

int get_num_chars () const
 Returns the number of characters in the stored text.

int get_unicode_char (int index) const
 Returns the Unicode value of the nth character in the stored text.

void set_unicode_char (int index, int character)
 Sets the Unicode value of the nth character in the stored text.

string get_encoded_char (int index) const
 Returns the nth char of the stored text, as a one-, two-, or three-byte encoded string.

string get_encoded_char (int index, Encoding encoding) const
 Returns the nth char of the stored text, as a one-, two-, or three-byte encoded string.

string get_text_as_ascii () const
 Returns the text associated with the node, converted as nearly as possible to a fully-ASCII representation.

void set_wtext (const wstring &wtext)
 Changes the text that is stored in the encoder.

const wstring & get_wtext () const
 Returns the text associated with the TextEncoder, as a wide-character string.

void append_wtext (const wstring &text)
 Appends the indicated string to the end of the stored wide-character text.

wstring get_wtext_as_ascii () const
 Returns the text associated with the node, converted as nearly as possible to a fully-ASCII representation.

string encode_wtext (const wstring &wtext) const
 Encodes a wide-text string into a single-char string, according to the current encoding.

wstring decode_text (const string &text) const
 Returns the given string decoded to a wide-character string, via the current encoding system.


Static Public Member Functions

void set_default_encoding (Encoding encoding)
 Specifies the default encoding to be used for all subsequently created TextEncoder objects.

Encoding get_default_encoding ()
 Returns the default encoding to be used for all subsequently created TextEncoder objects.

string reencode_text (const string &text, Encoding from, Encoding to)
 Given the indicated text string, which is assumed to be encoded via the encoding "from", decodes it and then reencodes it into the encoding "to", and returns the newly encoded string.

bool unicode_isalpha (int character)
 Returns true if the indicated character is an alphabetic letter, false otherwise.

bool unicode_isdigit (int character)
 Returns true if the indicated character is a numeric digit, false otherwise.

bool unicode_ispunct (int character)
 Returns true if the indicated character is a punctuation mark, false otherwise.

bool unicode_islower (int character)
 Returns true if the indicated character is a lowercase letter, false otherwise.

bool unicode_isupper (int character)
 Returns true if the indicated character is an uppercase letter, false otherwise.

int unicode_toupper (int character)
 Returns the uppercase equivalent of the given Unicode character.

int unicode_tolower (int character)
 Returns the lowercase equivalent of the given Unicode character.

string upper (const string &source)
 Converts the string to uppercase, assuming the string is encoded in the default encoding.

string upper (const string &source, Encoding encoding)
 Converts the string to uppercase, assuming the string is encoded in the indicated encoding.

string lower (const string &source)
 Converts the string to lowercase, assuming the string is encoded in the default encoding.

string lower (const string &source, Encoding encoding)
 Converts the string to lowercase, assuming the string is encoded in the indicated encoding.

string encode_wchar (wchar_t ch, Encoding encoding)
 Encodes a single wide char into a one-, two-, or three-byte string, according to the given encoding system.

string encode_wtext (const wstring &wtext, Encoding encoding)
 Encodes a wide-text string into a single-char string, according to the given encoding.

wstring decode_text (const string &text, Encoding encoding)
 Returns the given string decoded to a wide-character string, via the given encoding system.

TypeHandle get_class_type ()
void init_type ()

Private Types

enum  Flags { F_got_text = 0x0001, F_got_wtext = 0x0002 }

Static Private Member Functions

wstring decode_text_impl (StringDecoder &decoder)
 Decodes the eight-bit stream from the indicated decoder, returning the decoded wide-char string.


Private Attributes

int _flags
Encoding _encoding
string _text
wstring _wtext

Static Private Attributes

Encoding _default_encoding
TypeHandle _type_handle

Detailed Description

This class can be used to convert text between multiple representations, e.g. utf-8 to Unicode. You may use it as a static class object, passing the encoding each time, or you may create an instance and use that object, which will record the current encoding and retain the current string.

This class is also a base class of TextNode, which inherits this functionality.

Definition at line 53 of file textEncoder.h.


Member Enumeration Documentation

enum TextEncoder::Encoding
 

Enumeration values:
E_iso8859 
E_utf8 
E_unicode 

Definition at line 55 of file textEncoder.h.

Referenced by get_encoding().

enum TextEncoder::Flags [private]
 

Enumeration values:
F_got_text 
F_got_wtext 

Reimplemented in TextNode.

Definition at line 118 of file textEncoder.h.


Constructor & Destructor Documentation

TextEncoder::TextEncoder () [inline]
 

Definition at line 31 of file textEncoder.I.

References _flags, F_got_text, and F_got_wtext.


Member Function Documentation

void TextEncoder::append_text (const string & text) [inline]

Appends the indicated string to the end of the stored text.

Reimplemented in TextNode.

Definition at line 246 of file textEncoder.I.

References get_encoded_char(), get_encoding(), and INLINE.

void TextEncoder::append_unicode_char (int character) [inline]
 

Appends a single character to the end of the stored text.

This may be a wide character, up to 16 bits in Unicode.

Reimplemented in TextNode.

Definition at line 264 of file textEncoder.I.

void TextEncoder::append_wtext (const wstring & text) [inline]

Appends the indicated string to the end of the stored wide-character text.

Reimplemented in TextNode.

Definition at line 694 of file textEncoder.I.

Referenced by TextNode::set_draw_order().

void TextEncoder::clear_text () [inline]
 

Removes the text from the TextEncoder.

Reimplemented in TextNode.

Definition at line 179 of file textEncoder.I.

References _flags, _text, F_got_text, F_got_wtext, get_text(), and INLINE.

wstring TextEncoder::decode_text (const string & text, TextEncoder::Encoding encoding) [static]

Returns the given string decoded to a wide-character string, via the given encoding system.

Definition at line 226 of file textEncoder.cxx.
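
As an illustration of what decoding under E_utf8 involves, the following is a minimal sketch of a UTF-8 decoder limited to one-, two-, and three-byte sequences. The name decode_utf8_sketch and the choice to substitute '?' for malformed input are assumptions made for this example; they are not taken from textEncoder.cxx, and the real decode_text() also handles the other Encoding values.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper, not part of TextEncoder. Decodes UTF-8 sequences of
// one to three bytes; anything malformed or longer becomes '?'.
std::wstring decode_utf8_sketch(const std::string &text) {
  std::wstring result;
  std::size_t i = 0;
  while (i < text.size()) {
    const unsigned int lead = static_cast<unsigned char>(text[i]);
    unsigned int value = '?';
    std::size_t length = 1;
    if (lead < 0x80) {
      // 0xxxxxxx: a plain ASCII byte.
      value = lead;
    } else if ((lead & 0xE0) == 0xC0 && i + 1 < text.size()) {
      // 110xxxxx 10xxxxxx: two-byte sequence.
      const unsigned int b1 = static_cast<unsigned char>(text[i + 1]);
      if ((b1 & 0xC0) == 0x80) {
        value = ((lead & 0x1F) << 6) | (b1 & 0x3F);
        length = 2;
      }
    } else if ((lead & 0xF0) == 0xE0 && i + 2 < text.size()) {
      // 1110xxxx 10xxxxxx 10xxxxxx: three-byte sequence.
      const unsigned int b1 = static_cast<unsigned char>(text[i + 1]);
      const unsigned int b2 = static_cast<unsigned char>(text[i + 2]);
      if ((b1 & 0xC0) == 0x80 && (b2 & 0xC0) == 0x80) {
        value = ((lead & 0x0F) << 12) | ((b1 & 0x3F) << 6) | (b2 & 0x3F);
        length = 3;
      }
    }
    result += static_cast<wchar_t>(value);
    i += length;
  }
  return result;
}
```

Calling decode_utf8_sketch("h\xC3\xA9llo") yields a five-character wide string whose second character has the value 0xE9, even though the input is six bytes long.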

wstring TextEncoder::decode_text (const string & text) const [inline]

Returns the given string decoded to a wide-character string, via the current encoding system.

Definition at line 725 of file textEncoder.I.

Referenced by get_default_encoding(), get_unicode_char(), and TextNode::~TextNode().

wstring TextEncoder::decode_text_impl (StringDecoder & decoder) [static, private]
 

Decodes the eight-bit stream from the indicated decoder, returning the decoded wide-char string.

Definition at line 260 of file textEncoder.cxx.

string TextEncoder::encode_wchar (wchar_t ch, TextEncoder::Encoding encoding) [static]
 

Encodes a single wide char into a one-, two-, or three-byte string, according to the given encoding system.

Definition at line 146 of file textEncoder.cxx.
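
To make the "one-, two-, or three-byte" wording concrete, here is a minimal sketch of how a single character of up to 16 bits could be encoded as UTF-8. The function name encode_utf8_char_sketch is hypothetical; the code is not copied from textEncoder.cxx, and the real encode_wchar() also covers the other Encoding values.

```cpp
#include <cassert>
#include <string>

// Hypothetical stand-in for the UTF-8 case only; not Panda's code.
// Surrogate values are not rejected in this sketch.
std::string encode_utf8_char_sketch(unsigned int ch) {
  assert(ch <= 0xFFFF);  // larger values would need a four-byte sequence
  std::string out;
  if (ch < 0x80) {
    // 0xxxxxxx: plain ASCII, one byte.
    out += static_cast<char>(ch);
  } else if (ch < 0x800) {
    // 110xxxxx 10xxxxxx: two bytes.
    out += static_cast<char>(0xC0 | (ch >> 6));
    out += static_cast<char>(0x80 | (ch & 0x3F));
  } else {
    // 1110xxxx 10xxxxxx 10xxxxxx: three bytes.
    out += static_cast<char>(0xE0 | (ch >> 12));
    out += static_cast<char>(0x80 | ((ch >> 6) & 0x3F));
    out += static_cast<char>(0x80 | (ch & 0x3F));
  }
  return out;
}
```

For example, the value 0x41 produces the single byte "A", 0xE9 produces the two bytes C3 A9, and 0x20AC produces the three bytes E2 82 AC.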

string TextEncoder::encode_wtext (const wstring & wtext, TextEncoder::Encoding encoding) [static]
 

Encodes a wide-text string into a single-char string, according to the given encoding.

Definition at line 205 of file textEncoder.cxx.

References StringDecoder::get_next_character(), StringDecoder::is_eof(), and wstring.

string TextEncoder::encode_wtext (const wstring & wtext) const [inline]
 

Encodes a wide-text string into a single-char string, according to the current encoding.

Definition at line 710 of file textEncoder.I.

Referenced by get_num_chars(), get_unicode_char(), PGEntry::set_text(), set_text(), unicode_islower(), and TextNode::~TextNode().

TypeHandle TextEncoder::get_class_type (void) [inline, static]
 

Reimplemented in TextNode.

Definition at line 132 of file textEncoder.h.

TextEncoder::Encoding TextEncoder::get_default_encoding () [inline, static]

Returns the default encoding to be used for all subsequently created TextEncoder objects.

See set_encoding().

Definition at line 117 of file textEncoder.I.

References decode_text(), INLINE, and set_wtext().

string TextEncoder::get_encoded_char (int index, TextEncoder::Encoding encoding) const [inline]
 

Returns the nth char of the stored text, as a one-, two-, or three-byte encoded string.

Definition at line 355 of file textEncoder.I.

References UnicodeLatinMap::CT_punct.

string TextEncoder::get_encoded_char (int index) const [inline]
 

Returns the nth char of the stored text, as a one-, two-, or three-byte encoded string.

Definition at line 340 of file textEncoder.I.

References INLINE, UnicodeLatinMap::look_up(), and NULL.

Referenced by append_text().

TextEncoder::Encoding TextEncoder::get_encoding () const [inline]
 

Returns the encoding by which the string set via set_text() is to be interpreted.

See set_encoding().

Definition at line 83 of file textEncoder.I.

References _default_encoding, Encoding, and INLINE.

Referenced by append_text().

int TextEncoder::get_num_chars () const [inline]
 

Returns the number of characters in the stored text.

This is a count of wide characters, after the string has been decoded according to set_encoding().

Definition at line 282 of file textEncoder.I.

References encode_wtext(), get_wtext_as_ascii(), and INLINE.
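
The distinction between bytes and decoded characters can be seen in a short sketch. Assuming well-formed UTF-8 input, the number of characters equals the number of bytes that are not continuation bytes (those of the form 10xxxxxx). The helper name count_utf8_chars_sketch is hypothetical and is not how get_num_chars() is implemented; it only shows why the count can be smaller than the byte length of the encoded string.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: counts characters in a well-formed UTF-8 string by
// skipping continuation bytes, which all have the bit pattern 10xxxxxx.
std::size_t count_utf8_chars_sketch(const std::string &text) {
  std::size_t count = 0;
  for (unsigned char byte : text) {
    if ((byte & 0xC0) != 0x80) {
      ++count;
    }
  }
  return count;
}
```

For the six-byte string "h\xC3\xA9llo" the function returns 5, because the two bytes C3 A9 together encode one character.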

string TextEncoder::get_text (TextEncoder::Encoding encoding) const [inline]
 

Returns the current text, as encoded via the indicated encoding system.

Definition at line 231 of file textEncoder.I.

References _flags, _wtext, F_got_text, get_wtext(), INLINE, and nassertv.

string TextEncoder::get_text () const [inline]
 

Returns the current text, as encoded via the current encoding system.

Definition at line 212 of file textEncoder.I.

References _wtext, get_wtext(), INLINE, and nassertr.

Referenced by clear_text(), unicode_isalpha(), and unicode_isdigit().

string TextEncoder::get_text_as_ascii () const [inline]
 

Returns the text associated with the node, converted as nearly as possible to a fully-ASCII representation.

This means replacing accented letters with their unaccented ASCII equivalents.

It is possible that some characters in the string cannot be converted to ASCII. (The string may involve symbols like the copyright symbol, for instance, or it might involve letters in some other alphabet such as Greek or Cyrillic, or even Latin letters like thorn or eth that are not part of the ASCII character set.) In this case, as much of the string as possible will be converted to ASCII, and the nonconvertible characters will remain encoded in the encoding specified by set_encoding().

Definition at line 397 of file textEncoder.I.

References UnicodeLatinMap::Entry::_toupper_character, INLINE, UnicodeLatinMap::look_up(), and NULL.

int TextEncoder::get_unicode_char (int index) const [inline]
 

Returns the Unicode value of the nth character in the stored text.

This may be a wide character (greater than 255), after the string has been decoded according to set_encoding().

Definition at line 301 of file textEncoder.I.

References decode_text(), and encode_wtext().

const wstring & TextEncoder::get_wtext () const [inline]
 

Returns the text associated with the TextEncoder, as a wide-character string.

Definition at line 675 of file textEncoder.I.

Referenced by get_text(), has_text(), make_lower(), make_upper(), and set_text().

wstring TextEncoder::get_wtext_as_ascii () const
 

Returns the text associated with the node, converted as nearly as possible to a fully-ASCII representation.

This means replacing accented letters with their unaccented ASCII equivalents.

It is possible that some characters in the string cannot be converted to ASCII. (The string may involve symbols like the copyright symbol, for instance, or it might involve letters in some other alphabet such as Greek or Cyrillic, or even Latin letters like thorn or eth that are not part of the ASCII character set.) In this case, as much of the string as possible will be converted to ASCII, and the nonconvertible characters will remain in their original form.

Definition at line 110 of file textEncoder.cxx.

References UnicodeLatinMap::Entry::_ascii_additional, UnicodeLatinMap::Entry::_ascii_equiv, E_iso8859, E_utf8, UnicodeLatinMap::look_up(), and NULL.

Referenced by get_num_chars().
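
The conversion described above amounts to a table lookup per character, with unmapped characters passed through unchanged. The sketch below uses a tiny hand-written table in place of UnicodeLatinMap; the function name fold_to_ascii_sketch and the table contents are assumptions for illustration, and the "ss" entry is only a guess at the kind of case a second ASCII letter would be needed for.

```cpp
#include <map>
#include <string>

// Hypothetical sketch: replaces accented letters with ASCII equivalents using
// a small illustrative table. Characters not in the table are kept as they are.
std::wstring fold_to_ascii_sketch(const std::wstring &source) {
  static const std::map<wchar_t, std::wstring> table = {
    {L'\xC9', L"E"},   // E with acute
    {L'\xE9', L"e"},   // e with acute
    {L'\xFC', L"u"},   // u with diaeresis
    {L'\xF1', L"n"},   // n with tilde
    {L'\xDF', L"ss"},  // sharp s expands to two ASCII letters
  };
  std::wstring result;
  for (wchar_t ch : source) {
    const auto it = table.find(ch);
    if (it != table.end()) {
      result += it->second;
    } else {
      result += ch;  // already ASCII, or not convertible
    }
  }
  return result;
}
```

With this table, a string containing e with acute is returned with a plain "e" in its place, while a Greek letter such as U+03A9 comes back unchanged.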

bool TextEncoder::has_text () const [inline]
 

Definition at line 193 of file textEncoder.I.

References _flags, _wtext, F_got_text, F_got_wtext, get_wtext(), INLINE, and wstring.

Referenced by set_default_encoding(), and unicode_ispunct().

void TextEncoder::init_type (void) [inline, static]
 

Reimplemented in TextNode.

Definition at line 135 of file textEncoder.h.

Referenced by ConfigureFn().

string TextEncoder::lower (const string & source, TextEncoder::Encoding encoding) [inline, static]
 

Converts the string to lowercase, assuming the string is encoded in the indicated encoding.

Definition at line 634 of file textEncoder.I.

string TextEncoder::lower (const string & source) [inline, static]
 

Converts the string to lowercase, assuming the string is encoded in the default encoding.

Definition at line 619 of file textEncoder.I.

void TextEncoder::make_lower ()
 

Adjusts the text stored within the encoder to all lowercase letters (preserving accent marks correctly).

Definition at line 64 of file textEncoder.cxx.

References UnicodeLatinMap::Entry::_ascii_additional, UnicodeLatinMap::Entry::_ascii_equiv, _wtext, get_wtext(), UnicodeLatinMap::look_up(), NULL, and wstring.

Referenced by unicode_isdigit().

void TextEncoder::make_upper ()
 

Adjusts the text stored within the encoder to all uppercase letters (preserving accent marks correctly).

Definition at line 42 of file textEncoder.cxx.

References _flags, _wtext, F_got_text, get_wtext(), and unicode_tolower().

Referenced by unicode_isalpha().

string TextEncoder::reencode_text (const string & text, TextEncoder::Encoding from, TextEncoder::Encoding to) [inline, static]
 

Given the indicated text string, which is assumed to be encoded via the encoding "from", decodes it and then reencodes it into the encoding "to", and returns the newly encoded string.

This does not change or affect any properties on the TextEncoder itself.

Definition at line 418 of file textEncoder.I.
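
As a concrete case of decode-then-reencode, the sketch below converts a one-byte-per-character (ISO 8859-1) string to UTF-8. It is a hypothetical, self-contained helper named latin1_to_utf8_sketch, not the body of reencode_text(); it relies on the fact that each ISO 8859-1 byte value is also the Unicode value of the character, so the decode step is just a widening of each byte.

```cpp
#include <string>

// Hypothetical sketch of reencoding from ISO 8859-1 to UTF-8.
std::string latin1_to_utf8_sketch(const std::string &text) {
  std::string out;
  for (unsigned char byte : text) {
    // Decode: in ISO 8859-1 the byte value is the Unicode value.
    const unsigned int value = byte;
    // Encode: values below 0x80 stay one byte, the rest take two bytes.
    if (value < 0x80) {
      out += static_cast<char>(value);
    } else {
      out += static_cast<char>(0xC0 | (value >> 6));
      out += static_cast<char>(0x80 | (value & 0x3F));
    }
  }
  return out;
}
```

Pure ASCII input is returned unchanged, while the single byte E9 becomes the two bytes C3 A9.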

void TextEncoder::set_default_encoding (TextEncoder::Encoding encoding) [inline, static]
 

Specifies the default encoding to be used for all subsequently created TextEncoder objects.

See set_encoding().

Definition at line 100 of file textEncoder.I.

References _flags, _text, F_got_text, F_got_wtext, has_text(), and INLINE.

void TextEncoder::set_encoding (TextEncoder::Encoding encoding) [inline]
 

Specifies how the string set via set_text() is to be interpreted.

The default, E_iso8859, means a standard string with one-byte characters (i.e. ASCII). Other encodings are possible to take advantage of character sets with more than 256 characters.

This affects only future calls to set_text(); it does not change text that was set previously.

Definition at line 65 of file textEncoder.I.

Referenced by unicode_isalpha(), and unicode_isdigit().

void TextEncoder::set_text (const string & text, TextEncoder::Encoding encoding) [inline]
 

The two-parameter version of set_text() accepts an explicit encoding; the text is immediately decoded and stored as a wide-character string.

Subsequent calls to get_text() will return the same text re-encoded using whichever encoding is specified by set_encoding().

Definition at line 166 of file textEncoder.I.

References encode_wtext(), get_wtext(), and INLINE.

void TextEncoder::set_text (const string & text) [inline]
 

Changes the text that is stored in the encoder.

The text should be encoded according to the method indicated by set_encoding(). Subsequent calls to get_text() will return this same string, while get_wtext() will return the decoded version of the string.

Reimplemented in TextNode.

Definition at line 140 of file textEncoder.I.

References _flags, _text, _wtext, F_got_wtext, and INLINE.

Referenced by TextNode::get_card_as_set(), unicode_isalpha(), and unicode_isdigit().
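
The behaviour described for set_text(), get_text(), and get_wtext() suggests that the two representations are kept side by side and converted on demand, with the F_got_text and F_got_wtext flags recording which one is currently valid. The following is a minimal sketch of that idea under stated assumptions: the class name LazyTextSketch is hypothetical, the encoding is fixed to one byte per character, and unrepresentable characters become '?'. It is not the implementation in textEncoder.I.

```cpp
#include <string>

// Hypothetical sketch of lazily converting between encoded and wide text.
class LazyTextSketch {
private:
  enum Flags { F_got_text = 0x0001, F_got_wtext = 0x0002 };

public:
  // Stores the encoded form; the wide form is decoded only when asked for.
  void set_text(const std::string &text) {
    _text = text;
    _flags = F_got_text;
  }

  // Stores the wide form; the encoded form is produced only when asked for.
  void set_wtext(const std::wstring &wtext) {
    _wtext = wtext;
    _flags = F_got_wtext;
  }

  const std::string &get_text() {
    if ((_flags & F_got_text) == 0) {
      _text.clear();
      for (wchar_t ch : _wtext) {
        // Characters above 0xFF do not fit in one byte.
        const bool fits = static_cast<unsigned long>(ch) <= 0xFF;
        _text += fits ? static_cast<char>(ch) : '?';
      }
      _flags |= F_got_text;
    }
    return _text;
  }

  const std::wstring &get_wtext() {
    if ((_flags & F_got_wtext) == 0) {
      _wtext.clear();
      for (unsigned char byte : _text) {
        _wtext += static_cast<wchar_t>(byte);
      }
      _flags |= F_got_wtext;
    }
    return _wtext;
  }

private:
  int _flags = F_got_text | F_got_wtext;  // both (empty) forms start valid
  std::string _text;
  std::wstring _wtext;
};
```

After set_text("caf\xE9"), get_text() returns the same four bytes and get_wtext() returns four wide characters, the last with value 0xE9. Each conversion runs at most once per stored string, which is the point of keeping the flags.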

void TextEncoder::set_unicode_char (int index, int character) [inline]
 

Sets the Unicode value of the nth character in the stored text.

This may be a wide character (greater than 255), after the string has been decoded according to set_encoding().

Definition at line 322 of file textEncoder.I.

References UnicodeLatinMap::Entry::_ascii_equiv, INLINE, UnicodeLatinMap::look_up(), and NULL.

void TextEncoder::set_wtext (const wstring & wtext) [inline]
 

Changes the text that is stored in the encoder.

Subsequent calls to get_wtext() will return this same string, while get_text() will return the encoded version of the string.

Reimplemented in TextNode.

Definition at line 657 of file textEncoder.I.

Referenced by get_default_encoding().

bool TextEncoder::unicode_isalpha (int character) [inline, static]
 

Returns true if the indicated character is an alphabetic letter, false otherwise.

This is akin to ctype's isalpha(), extended to Unicode.

Definition at line 436 of file textEncoder.I.

References get_text(), INLINE, make_upper(), set_encoding(), and set_text().

bool TextEncoder::unicode_isdigit (int character) [inline, static]
 

Returns true if the indicated character is a numeric digit, false otherwise.

This is akin to ctype's isdigit(), extended to Unicode.

Definition at line 458 of file textEncoder.I.

References get_text(), INLINE, make_lower(), set_encoding(), and set_text().

bool TextEncoder::unicode_islower (int character) [inline, static]
 

Returns true if the indicated character is a lowercase letter, false otherwise.

This is akin to ctype's islower(), extended to Unicode.

Definition at line 524 of file textEncoder.I.

References _encoding, and encode_wtext().

bool TextEncoder::unicode_ispunct (int character) [inline, static]
 

Returns true if the indicated character is a punctuation mark, false otherwise.

This is akin to ctype's ispunct(), extended to Unicode.

Definition at line 481 of file textEncoder.I.

References _flags, _wtext, F_got_text, F_got_wtext, has_text(), and INLINE.

bool TextEncoder::unicode_isupper (int character) [inline, static]
 

Returns true if the indicated character is an uppercase letter, false otherwise.

This is akin to ctype's isupper(), extended to Unicode.

Definition at line 503 of file textEncoder.I.

int TextEncoder::unicode_tolower (int character) [inline, static]

Returns the lowercase equivalent of the given Unicode character.

This is akin to ctype's tolower(), extended to Unicode.

Definition at line 566 of file textEncoder.I.

Referenced by make_upper().

int TextEncoder::unicode_toupper (int character) [inline, static]
 

Returns the uppercase equivalent of the given Unicode character.

This is akin to ctype's toupper(), extended to Unicode.

Definition at line 545 of file textEncoder.I.
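
To show what "extended to Unicode" can mean beyond ASCII, here is a minimal sketch of an uppercase mapping that covers only ASCII and the ISO 8859-1 range. The function name latin1_toupper_sketch is hypothetical, and the real unicode_toupper() presumably consults a much larger table; this sketch leaves every character it does not know about unchanged, including sharp s (0xDF) and the micro sign (0xB5).

```cpp
// Hypothetical sketch: uppercase mapping for ASCII and ISO 8859-1 letters only.
int latin1_toupper_sketch(int character) {
  // ASCII a..z map to A..Z.
  if (character >= 0x61 && character <= 0x7A) {
    return character - 0x20;
  }
  // 0xE0..0xFE are lowercase letters, except 0xF7, the division sign.
  if (character >= 0xE0 && character <= 0xFE && character != 0xF7) {
    return character - 0x20;
  }
  // y with diaeresis: its uppercase form, U+0178, lies outside ISO 8859-1.
  if (character == 0xFF) {
    return 0x178;
  }
  return character;
}
```

The last case is why the functions in this family take and return int rather than char: the result of a case conversion does not always fit in one byte.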

string TextEncoder::upper (const string & source, TextEncoder::Encoding encoding) [inline, static]
 

Converts the string to uppercase, assuming the string is encoded in the indicated encoding.

Definition at line 600 of file textEncoder.I.

string TextEncoder::upper (const string & source) [inline, static]
 

Converts the string to uppercase, assuming the string is encoded in the default encoding.

Definition at line 585 of file textEncoder.I.


Member Data Documentation

TextEncoder::Encoding TextEncoder::_default_encoding [static, private]
 

Definition at line 27 of file textEncoder.cxx.

Referenced by get_encoding().

Encoding TextEncoder::_encoding [private]
 

Definition at line 125 of file textEncoder.h.

Referenced by unicode_islower().

int TextEncoder::_flags [private]
 

Reimplemented in TextNode.

Definition at line 124 of file textEncoder.h.

Referenced by clear_text(), get_text(), has_text(), make_upper(), set_default_encoding(), set_text(), TextEncoder(), and unicode_ispunct().

string TextEncoder::_text [private]
 

Definition at line 126 of file textEncoder.h.

Referenced by clear_text(), set_default_encoding(), and set_text().

TypeHandle TextEncoder::_type_handle [static, private]
 

Reimplemented in TextNode.

Definition at line 26 of file textEncoder.cxx.

wstring TextEncoder::_wtext [private]
 

Definition at line 127 of file textEncoder.h.

Referenced by get_text(), has_text(), make_lower(), make_upper(), set_text(), and unicode_ispunct().


The documentation for this class was generated from the following files:
textEncoder.h
textEncoder.I
textEncoder.cxx
Generated on Fri May 2 00:55:26 2003 for Panda by doxygen1.3