
TextEncoder Class Reference

This class can be used to convert text between multiple representations, e.g. UTF-8 to Unicode.

#include <textEncoder.h>


Public Types
enum	Encoding { E_iso8859, E_utf8, E_unicode }
Public Member Functions
	TextEncoder ()
void	set_encoding (Encoding encoding)
	Specifies how the string set via set_text() is to be interpreted.
Encoding	get_encoding () const
	Returns the encoding by which the string set via set_text() is to be interpreted.
void	set_text (const string &text)
	Changes the text that is stored in the encoder.
void	set_text (const string &text, Encoding encoding)
	The two-parameter version of set_text() accepts an explicit encoding; the text is immediately decoded and stored as a wide-character string.
void	clear_text ()
	Removes the text from the TextEncoder.
bool	has_text () const
void	make_upper ()
	Adjusts the text stored within the encoder to all uppercase letters (preserving accent marks correctly).
void	make_lower ()
	Adjusts the text stored within the encoder to all lowercase letters (preserving accent marks correctly).
string	get_text () const
	Returns the current text, as encoded via the current encoding system.
string	get_text (Encoding encoding) const
	Returns the current text, as encoded via the indicated encoding system.
void	append_text (const string &text)
	Appends the indicated string to the end of the stored text.
void	append_unicode_char (int character)
	Appends a single character to the end of the stored text.
int	get_num_chars () const
	Returns the number of characters in the stored text.
int	get_unicode_char (int index) const
	Returns the Unicode value of the nth character in the stored text.
void	set_unicode_char (int index, int character)
	Sets the Unicode value of the nth character in the stored text.
string	get_encoded_char (int index) const
	Returns the nth char of the stored text, as a one-, two-, or three-byte encoded string.
string	get_encoded_char (int index, Encoding encoding) const
	Returns the nth char of the stored text, as a one-, two-, or three-byte encoded string, according to the indicated encoding.
string	get_text_as_ascii () const
	Returns the text associated with the node, converted as nearly as possible to a fully-ASCII representation.
void	set_wtext (const wstring &wtext)
	Changes the text that is stored in the encoder.
const wstring &	get_wtext () const
	Returns the text associated with the TextEncoder, as a wide-character string.
void	append_wtext (const wstring &text)
	Appends the indicated string to the end of the stored wide-character text.
wstring	get_wtext_as_ascii () const
	Returns the text associated with the node, converted as nearly as possible to a fully-ASCII representation.
string	encode_wtext (const wstring &wtext) const
	Encodes a wide-text string into a single-char string, according to the current encoding.
wstring	decode_text (const string &text) const
	Returns the given single-byte string decoded to a wide-character string, via the current encoding system.
Static Public Member Functions
void	set_default_encoding (Encoding encoding)
	Specifies the default encoding to be used for all subsequently created TextEncoder objects.
Encoding	get_default_encoding ()
	Returns the default encoding used for all subsequently created TextEncoder objects.
string	reencode_text (const string &text, Encoding from, Encoding to)
	Given the indicated text string, which is assumed to be encoded via the encoding "from", decodes it and then reencodes it into the encoding "to", and returns the newly encoded string.
bool	unicode_isalpha (int character)
	Returns true if the indicated character is an alphabetic letter, false otherwise.
bool	unicode_isdigit (int character)
	Returns true if the indicated character is a numeric digit, false otherwise.
bool	unicode_ispunct (int character)
	Returns true if the indicated character is a punctuation mark, false otherwise.
bool	unicode_islower (int character)
	Returns true if the indicated character is a lowercase letter, false otherwise.
bool	unicode_isupper (int character)
	Returns true if the indicated character is an uppercase letter, false otherwise.
int	unicode_toupper (int character)
	Returns the uppercase equivalent of the given Unicode character.
int	unicode_tolower (int character)
	Returns the lowercase equivalent of the given Unicode character.
string	upper (const string &source)
	Converts the string to uppercase, assuming the string is encoded in the default encoding.
string	upper (const string &source, Encoding encoding)
	Converts the string to uppercase, assuming the string is encoded in the indicated encoding.
string	lower (const string &source)
	Converts the string to lowercase, assuming the string is encoded in the default encoding.
string	lower (const string &source, Encoding encoding)
	Converts the string to lowercase, assuming the string is encoded in the indicated encoding.
string	encode_wchar (wchar_t ch, Encoding encoding)
	Encodes a single wide char into a one-, two-, or three-byte string, according to the given encoding system.
string	encode_wtext (const wstring &wtext, Encoding encoding)
	Encodes a wide-text string into a single-char string, according to the given encoding.
wstring	decode_text (const string &text, Encoding encoding)
	Returns the given single-byte string decoded to a wide-character string, via the given encoding system.
TypeHandle	get_class_type ()
void	init_type ()
Private Types
enum	Flags { F_got_text = 0x0001, F_got_wtext = 0x0002 }
Static Private Member Functions
wstring	decode_text_impl (StringDecoder &decoder)
	Decodes the eight-bit stream from the indicated decoder, returning the decoded wide-char string.
Private Attributes
int	_flags
Encoding	_encoding
string	_text
wstring	_wtext
Static Private Attributes
Encoding	_default_encoding
TypeHandle	_type_handle

Detailed Description

This class can be used to convert text between multiple representations, e.g. UTF-8 to Unicode. You may use it as a static class object, passing the encoding each time, or you may create an instance and use that object, which will record the current encoding and retain the current string.

This class is also a base class of TextNode, which inherits this functionality.

Definition at line 53 of file textEncoder.h.

Member Enumeration Documentation

enum TextEncoder::Encoding

Enumeration values:

E_iso8859

E_utf8

E_unicode

Definition at line 55 of file textEncoder.h.
Referenced by get_encoding().

enum TextEncoder::Flags [private]

Enumeration values:

F_got_text

F_got_wtext

Reimplemented in TextNode.
Definition at line 118 of file textEncoder.h.

Constructor & Destructor Documentation

TextEncoder::TextEncoder ( ) [inline]

Definition at line 31 of file textEncoder.I.
References _flags, F_got_text, and F_got_wtext.

Member Function Documentation

void TextEncoder::append_text ( const string & text ) [inline]

Appends the indicated string to the end of the stored text.

Reimplemented in TextNode.
Definition at line 246 of file textEncoder.I.
References get_encoded_char(), get_encoding(), and INLINE.

void TextEncoder::append_unicode_char ( int character ) [inline]

Appends a single character to the end of the stored text.
This may be a wide character, up to 16 bits in Unicode.
Reimplemented in TextNode.
Definition at line 264 of file textEncoder.I.

void TextEncoder::append_wtext ( const wstring & text ) [inline]

Appends the indicated string to the end of the stored wide-character text.

Reimplemented in TextNode.
Definition at line 694 of file textEncoder.I.
Referenced by TextNode::set_draw_order().

void TextEncoder::clear_text ( ) [inline]

Removes the text from the TextEncoder.

Reimplemented in TextNode.
Definition at line 179 of file textEncoder.I.
References _flags, _text, F_got_text, F_got_wtext, get_text(), and INLINE.

wstring TextEncoder::decode_text ( const string & text,

TextEncoder::Encoding encoding

) [static]

Returns the given single-byte string decoded to a wide-character string, via the given encoding system.

Definition at line 226 of file textEncoder.cxx.
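
To illustrate what decoding under E_utf8 involves, here is a minimal, self-contained sketch. It does not use Panda's sources: the name utf8_decode_sketch is hypothetical, only one- to three-byte sequences are handled, continuation bytes are not validated, and substituting '?' for malformed input is an assumption that may differ from what TextEncoder does.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical stand-in for decode_text(text, E_utf8); not Panda's code.
std::wstring utf8_decode_sketch(const std::string &text) {
  std::wstring out;
  std::size_t i = 0;
  while (i < text.size()) {
    unsigned char b0 = static_cast<unsigned char>(text[i]);
    if (b0 < 0x80) {
      // One byte: 0xxxxxxx
      out += static_cast<wchar_t>(b0);
      i += 1;
    } else if ((b0 & 0xE0) == 0xC0 && i + 1 < text.size()) {
      // Two bytes: 110xxxxx 10xxxxxx
      unsigned char b1 = static_cast<unsigned char>(text[i + 1]);
      out += static_cast<wchar_t>(((b0 & 0x1F) << 6) | (b1 & 0x3F));
      i += 2;
    } else if ((b0 & 0xF0) == 0xE0 && i + 2 < text.size()) {
      // Three bytes: 1110xxxx 10xxxxxx 10xxxxxx
      unsigned char b1 = static_cast<unsigned char>(text[i + 1]);
      unsigned char b2 = static_cast<unsigned char>(text[i + 2]);
      out += static_cast<wchar_t>(((b0 & 0x0F) << 12) |
                                  ((b1 & 0x3F) << 6) | (b2 & 0x3F));
      i += 3;
    } else {
      // Truncated or unsupported sequence: assumed fallback for this sketch.
      out += L'?';
      i += 1;
    }
  }
  // Every iteration consumes at least one byte and emits one character.
  assert(out.size() <= text.size());
  return out;
}
```

With this sketch, the two bytes 0xC3 0xA9 decode to the single wide character 0xE9.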

wstring TextEncoder::decode_text ( const string & text ) const [inline]

Returns the given single-byte string decoded to a wide-character string, via the current encoding system.

Definition at line 725 of file textEncoder.I.
Referenced by get_default_encoding(), get_unicode_char(), and TextNode::~TextNode().

wstring TextEncoder::decode_text_impl ( StringDecoder & decoder ) [static, private]

Decodes the eight-bit stream from the indicated decoder, returning the decoded wide-char string.

Definition at line 260 of file textEncoder.cxx.

string TextEncoder::encode_wchar ( wchar_t ch,

TextEncoder::Encoding encoding

) [static]

Encodes a single wide char into a one-, two-, or three-byte string, according to the given encoding system.

Definition at line 146 of file textEncoder.cxx.
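
The "one-, two-, or three-byte string" wording matches how UTF-8 encodes code points up to 16 bits. The following self-contained sketch shows that mapping under E_utf8. The name utf8_encode_char_sketch is hypothetical and the code is not taken from textEncoder.cxx; surrogate values are not treated specially.

```cpp
#include <cassert>
#include <string>

// Hypothetical stand-in for encode_wchar(ch, E_utf8); not Panda's code.
std::string utf8_encode_char_sketch(unsigned int ch) {
  // This sketch covers only the 16-bit range.
  assert(ch <= 0xFFFF);
  std::string out;
  if (ch < 0x80) {
    // One byte: 0xxxxxxx
    out += static_cast<char>(ch);
  } else if (ch < 0x800) {
    // Two bytes: 110xxxxx 10xxxxxx
    out += static_cast<char>(0xC0 | (ch >> 6));
    out += static_cast<char>(0x80 | (ch & 0x3F));
  } else {
    // Three bytes: 1110xxxx 10xxxxxx 10xxxxxx
    out += static_cast<char>(0xE0 | (ch >> 12));
    out += static_cast<char>(0x80 | ((ch >> 6) & 0x3F));
    out += static_cast<char>(0x80 | (ch & 0x3F));
  }
  return out;
}
```

For example, 0x41 yields one byte, 0xE9 yields the two bytes 0xC3 0xA9, and 0x20AC yields the three bytes 0xE2 0x82 0xAC.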

string TextEncoder::encode_wtext ( const wstring & wtext,

TextEncoder::Encoding encoding

) [static]

Encodes a wide-text string into a single-char string, according to the given encoding.

Definition at line 205 of file textEncoder.cxx.
References StringDecoder::get_next_character(), StringDecoder::is_eof(), and wstring.

string TextEncoder::encode_wtext ( const wstring & wtext ) const [inline]

Encodes a wide-text string into a single-char string, according to the current encoding.

Definition at line 710 of file textEncoder.I.
Referenced by get_num_chars(), get_unicode_char(), PGEntry::set_text(), set_text(), unicode_islower(), and TextNode::~TextNode().

TypeHandle TextEncoder::get_class_type ( void ) [inline, static]

Reimplemented in TextNode.
Definition at line 132 of file textEncoder.h.

TextEncoder::Encoding TextEncoder::get_default_encoding ( ) [inline, static]

Returns the default encoding used for all subsequently created TextEncoder objects.
See set_encoding().
Definition at line 117 of file textEncoder.I.
References decode_text(), INLINE, and set_wtext().

string TextEncoder::get_encoded_char ( int index,

TextEncoder::Encoding encoding

) const [inline]

Returns the nth char of the stored text, as a one-, two-, or three-byte encoded string, according to the indicated encoding.

Definition at line 355 of file textEncoder.I.
References UnicodeLatinMap::CT_punct.

string TextEncoder::get_encoded_char ( int index ) const [inline]

Returns the nth char of the stored text, as a one-, two-, or three-byte encoded string.

Definition at line 340 of file textEncoder.I.
References INLINE, UnicodeLatinMap::look_up(), and NULL.
Referenced by append_text().

TextEncoder::Encoding TextEncoder::get_encoding ( ) const [inline]

Returns the encoding by which the string set via set_text() is to be interpreted.
See set_encoding().
Definition at line 83 of file textEncoder.I.
References _default_encoding, Encoding, and INLINE.
Referenced by append_text().

int TextEncoder::get_num_chars ( ) const [inline]

Returns the number of characters in the stored text.
This is a count of wide characters, after the string has been decoded according to set_encoding().
Definition at line 282 of file textEncoder.I.
References encode_wtext(), get_wtext_as_ascii(), and INLINE.
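
The distinction between wide characters and bytes matters most for UTF-8, where one character may occupy several bytes. As a rough, self-contained illustration (the name count_utf8_chars_sketch is hypothetical, and TextEncoder itself presumably counts the decoded wide string rather than scanning bytes), characters can be counted by skipping UTF-8 continuation bytes:

```cpp
#include <cassert>
#include <string>

// Counts code points in a UTF-8 string by counting every byte that is
// not a continuation byte (10xxxxxx). Assumes well-formed input.
int count_utf8_chars_sketch(const std::string &text) {
  int count = 0;
  for (unsigned char b : text) {
    if ((b & 0xC0) != 0x80) {
      ++count;
    }
  }
  // There can never be more characters than bytes.
  assert(count <= static_cast<int>(text.size()));
  return count;
}
```

For the five-byte UTF-8 string "caf\xC3\xA9" this returns 4, which is the kind of value get_num_chars() is documented to report.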

string TextEncoder::get_text ( TextEncoder::Encoding encoding ) const [inline]

Returns the current text, as encoded via the indicated encoding system.

Definition at line 231 of file textEncoder.I.
References _flags, _wtext, F_got_text, get_wtext(), INLINE, and nassertv.

string TextEncoder::get_text ( ) const [inline]

Returns the current text, as encoded via the current encoding system.

Definition at line 212 of file textEncoder.I.
References _wtext, get_wtext(), INLINE, and nassertr.
Referenced by clear_text(), unicode_isalpha(), and unicode_isdigit().

string TextEncoder::get_text_as_ascii ( ) const [inline]

Returns the text associated with the node, converted as nearly as possible to a fully-ASCII representation.
This means replacing accented letters with their unaccented ASCII equivalents.
It is possible that some characters in the string cannot be converted to ASCII. (The string may involve symbols like the copyright symbol, for instance, or it might involve letters in some other alphabet such as Greek or Cyrillic, or even Latin letters like thorn or eth that are not part of the ASCII character set.) In this case, as much of the string as possible will be converted to ASCII, and the nonconvertible characters will remain encoded in the encoding specified by set_encoding().
Definition at line 397 of file textEncoder.I.
References UnicodeLatinMap::Entry::_toupper_character, INLINE, UnicodeLatinMap::look_up(), and NULL.

int TextEncoder::get_unicode_char ( int index ) const [inline]

Returns the Unicode value of the nth character in the stored text.
This may be a wide character (greater than 255), after the string has been decoded according to set_encoding().
Definition at line 301 of file textEncoder.I.
References decode_text(), and encode_wtext().

const wstring & TextEncoder::get_wtext ( ) const [inline]

Returns the text associated with the TextEncoder, as a wide-character string.

Definition at line 675 of file textEncoder.I.
Referenced by get_text(), has_text(), make_lower(), make_upper(), and set_text().

wstring TextEncoder::get_wtext_as_ascii ( ) const

Returns the text associated with the node, converted as nearly as possible to a fully-ASCII representation.
This means replacing accented letters with their unaccented ASCII equivalents.
It is possible that some characters in the string cannot be converted to ASCII. (The string may involve symbols like the copyright symbol, for instance, or it might involve letters in some other alphabet such as Greek or Cyrillic, or even Latin letters like thorn or eth that are not part of the ASCII character set.) In this case, as much of the string as possible will be converted to ASCII, and the nonconvertible characters will remain in their original form.
Definition at line 110 of file textEncoder.cxx.
References UnicodeLatinMap::Entry::_ascii_additional, UnicodeLatinMap::Entry::_ascii_equiv, E_iso8859, E_utf8, UnicodeLatinMap::look_up(), and NULL.
Referenced by get_num_chars().
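
The behaviour described above can be sketched as a per-character table lookup. The example below is self-contained and only an approximation: the function names are hypothetical, the table holds a handful of entries rather than the full UnicodeLatinMap, and the idea that some letters expand to two ASCII characters is an assumption suggested by the _ascii_equiv and _ascii_additional fields in the reference list.

```cpp
#include <cassert>
#include <string>

// Hypothetical lookup: returns the ASCII replacement for a few Latin
// letters, or nullptr when this sketch has no entry for the character.
const char *ascii_equiv_sketch(wchar_t ch) {
  switch (ch) {
    case 0x00C9: return "E";   // E with acute
    case 0x00E0: return "a";   // a with grave
    case 0x00E9: return "e";   // e with acute
    case 0x00F1: return "n";   // n with tilde
    case 0x00DF: return "ss";  // sharp s (assumed two-letter expansion)
    case 0x00E6: return "ae";  // ae ligature (assumed two-letter expansion)
    default:     return nullptr;
  }
}

// Replaces every character that has a table entry; all other characters,
// such as Greek letters or symbols, are kept in their original form.
std::wstring to_ascii_sketch(const std::wstring &wtext) {
  std::wstring out;
  for (wchar_t ch : wtext) {
    const char *ascii = ascii_equiv_sketch(ch);
    if (ascii != nullptr) {
      for (const char *p = ascii; *p != '\0'; ++p) {
        out += static_cast<wchar_t>(*p);
      }
    } else {
      out += ch;
    }
  }
  // Replacements are one or two characters, so the text never shrinks.
  assert(out.size() >= wtext.size());
  return out;
}
```

A string containing "caf" followed by 0x00E9 becomes "cafe", while a Greek capital omega (0x03A9) passes through unchanged.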

bool TextEncoder::has_text ( ) const [inline]

Definition at line 193 of file textEncoder.I.
References _flags, _wtext, F_got_text, F_got_wtext, get_wtext(), INLINE, and wstring.
Referenced by set_default_encoding(), and unicode_ispunct().

void TextEncoder::init_type ( void ) [inline, static]

Reimplemented in TextNode.
Definition at line 135 of file textEncoder.h.
Referenced by ConfigureFn().

string TextEncoder::lower ( const string & source,

TextEncoder::Encoding encoding

) [inline, static]

Converts the string to lowercase, assuming the string is encoded in the indicated encoding.

Definition at line 634 of file textEncoder.I.

string TextEncoder::lower ( const string & source ) [inline, static]

Converts the string to lowercase, assuming the string is encoded in the default encoding.

Definition at line 619 of file textEncoder.I.

void TextEncoder::make_lower ( )

Adjusts the text stored within the encoder to all lowercase letters (preserving accent marks correctly).

Definition at line 64 of file textEncoder.cxx.
References UnicodeLatinMap::Entry::_ascii_additional, UnicodeLatinMap::Entry::_ascii_equiv, _wtext, get_wtext(), UnicodeLatinMap::look_up(), NULL, and wstring.
Referenced by unicode_isdigit().

void TextEncoder::make_upper ( )

Adjusts the text stored within the encoder to all uppercase letters (preserving accent marks correctly).

Definition at line 42 of file textEncoder.cxx.
References _flags, _wtext, F_got_text, get_wtext(), and unicode_tolower().
Referenced by unicode_isalpha().

string TextEncoder::reencode_text ( const string & text,

TextEncoder::Encoding from,

TextEncoder::Encoding to

) [inline, static]

Given the indicated text string, which is assumed to be encoded via the encoding "from", decodes it and then reencodes it into the encoding "to", and returns the newly encoded string.
This does not change or affect any properties on the TextEncoder itself.
Definition at line 418 of file textEncoder.I.
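
The decode-then-reencode sequence can be sketched without Panda as follows. This is a simplified illustration, not the library's code: the helper names are hypothetical, only E_iso8859 and E_utf8 are modelled, UTF-8 sequences longer than two bytes are not decoded, and emitting '?' for unrepresentable characters is an assumption.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Only two of the three encodings are modelled in this sketch.
enum Encoding { E_iso8859, E_utf8 };

// Step 1: decode bytes into wide characters.
std::wstring decode_sketch(const std::string &text, Encoding from) {
  std::wstring out;
  for (std::size_t i = 0; i < text.size(); ++i) {
    unsigned char b0 = static_cast<unsigned char>(text[i]);
    if (from == E_utf8 && (b0 & 0xE0) == 0xC0 && i + 1 < text.size()) {
      // Two-byte UTF-8 sequence: 110xxxxx 10xxxxxx
      unsigned char b1 = static_cast<unsigned char>(text[++i]);
      out += static_cast<wchar_t>(((b0 & 0x1F) << 6) | (b1 & 0x3F));
    } else {
      // The byte value is taken as the code point. This is exact for
      // ASCII and ISO 8859-1, and a simplification for other UTF-8 bytes.
      out += static_cast<wchar_t>(b0);
    }
  }
  return out;
}

// Step 2: encode wide characters into bytes.
std::string encode_sketch(const std::wstring &wtext, Encoding to) {
  std::string out;
  for (wchar_t ch : wtext) {
    unsigned int cp = static_cast<unsigned int>(ch);
    if (cp < 0x80) {
      out += static_cast<char>(cp);
    } else if (to == E_utf8 && cp < 0x800) {
      out += static_cast<char>(0xC0 | (cp >> 6));
      out += static_cast<char>(0x80 | (cp & 0x3F));
    } else if (to == E_iso8859 && cp < 0x100) {
      out += static_cast<char>(cp);
    } else {
      out += '?';  // not representable in this sketch
    }
  }
  // Every wide character produces at least one byte.
  assert(out.size() >= wtext.size());
  return out;
}

// Mirrors the shape of reencode_text(): no object state is involved.
std::string reencode_sketch(const std::string &text, Encoding from, Encoding to) {
  return encode_sketch(decode_sketch(text, from), to);
}
```

Reencoding the ISO 8859-1 bytes "caf\xE9" to E_utf8 with this sketch gives "caf\xC3\xA9", and the reverse call restores the original four bytes.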

void TextEncoder::set_default_encoding ( TextEncoder::Encoding encoding ) [inline, static]

Specifies the default encoding to be used for all subsequently created TextEncoder objects.
See set_encoding().
Definition at line 100 of file textEncoder.I.
References _flags, _text, F_got_text, F_got_wtext, has_text(), and INLINE.

void TextEncoder::set_encoding ( TextEncoder::Encoding encoding ) [inline]

Specifies how the string set via set_text() is to be interpreted.
The default, E_iso8859, means a standard string with one-byte characters (i.e. ASCII, or an ISO 8859 extension of it). Other encodings make it possible to use character sets with more than 256 characters.
This affects only future calls to set_text(); it does not change text that was set previously.
Definition at line 65 of file textEncoder.I.
Referenced by unicode_isalpha(), and unicode_isdigit().

void TextEncoder::set_text ( const string & text,

TextEncoder::Encoding encoding

) [inline]

The two-parameter version of set_text() accepts an explicit encoding; the text is immediately decoded and stored as a wide-character string.
Subsequent calls to get_text() will return the same text re-encoded using whichever encoding is specified by set_encoding().
Definition at line 166 of file textEncoder.I.
References encode_wtext(), get_wtext(), and INLINE.

void TextEncoder::set_text ( const string & text ) [inline]

Changes the text that is stored in the encoder.
The text should be encoded according to the method indicated by set_encoding(). Subsequent calls to get_text() will return this same string, while get_wtext() will return the decoded version of the string.
Reimplemented in TextNode.
Definition at line 140 of file textEncoder.I.
References _flags, _text, _wtext, F_got_wtext, and INLINE.
Referenced by TextNode::get_card_as_set(), unicode_isalpha(), and unicode_isdigit().
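
One plausible reading of the F_got_text and F_got_wtext flags is that the encoder keeps both an encoded and a wide copy of the text and converts between them on demand. The self-contained sketch below illustrates that idea only. MiniEncoder is a hypothetical class, the lazy conversion is an assumption rather than something this page states, and the encoding is fixed to ISO 8859-1 so that each byte maps directly to one code point.

```cpp
#include <cassert>
#include <string>

// Hypothetical miniature of the set_text()/get_wtext() relationship.
class MiniEncoder {
public:
  // Stores the encoded string; the wide copy becomes stale.
  void set_text(const std::string &text) {
    _text = text;
    _flags = F_got_text;
  }

  // Stores the wide string; the encoded copy becomes stale.
  void set_wtext(const std::wstring &wtext) {
    _wtext = wtext;
    _flags = F_got_wtext;
  }

  // Returns the encoded string, encoding it first if it is stale.
  const std::string &get_text() {
    if (!(_flags & F_got_text)) {
      _text.clear();
      for (wchar_t ch : _wtext) {
        bool fits = (ch >= 0 && ch < 0x100);
        _text += fits ? static_cast<char>(ch) : '?';
      }
      _flags |= F_got_text;
    }
    assert(_flags & F_got_text);
    return _text;
  }

  // Returns the wide string, decoding it first if it is stale.
  const std::wstring &get_wtext() {
    if (!(_flags & F_got_wtext)) {
      _wtext.clear();
      for (char c : _text) {
        _wtext += static_cast<wchar_t>(static_cast<unsigned char>(c));
      }
      _flags |= F_got_wtext;
    }
    return _wtext;
  }

private:
  enum Flags { F_got_text = 0x0001, F_got_wtext = 0x0002 };
  // An empty encoder is valid in both representations.
  int _flags = F_got_text | F_got_wtext;
  std::string _text;
  std::wstring _wtext;
};
```

After set_text("caf\xE9"), get_text() returns the same four bytes and get_wtext() returns four wide characters ending in 0xE9. The real accessors are const, which this sketch drops for brevity.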

void TextEncoder::set_unicode_char ( int index,

int character

) [inline]

Sets the Unicode value of the nth character in the stored text.
This may be a wide character (greater than 255), after the string has been decoded according to set_encoding().
Definition at line 322 of file textEncoder.I.
References UnicodeLatinMap::Entry::_ascii_equiv, INLINE, UnicodeLatinMap::look_up(), and NULL.

void TextEncoder::set_wtext ( const wstring & wtext ) [inline]

Changes the text that is stored in the encoder.
Subsequent calls to get_wtext() will return this same string, while get_text() will return the encoded version of the string.
Reimplemented in TextNode.
Definition at line 657 of file textEncoder.I.
Referenced by get_default_encoding().

bool TextEncoder::unicode_isalpha ( int character ) [inline, static]

Returns true if the indicated character is an alphabetic letter, false otherwise.
This is akin to ctype's isalpha(), extended to Unicode.
Definition at line 436 of file textEncoder.I.
References get_text(), INLINE, make_upper(), set_encoding(), and set_text().

bool TextEncoder::unicode_isdigit ( int character ) [inline, static]

Returns true if the indicated character is a numeric digit, false otherwise.
This is akin to ctype's isdigit(), extended to Unicode.
Definition at line 458 of file textEncoder.I.
References get_text(), INLINE, make_lower(), set_encoding(), and set_text().

bool TextEncoder::unicode_islower ( int character ) [inline, static]

Returns true if the indicated character is a lowercase letter, false otherwise.
This is akin to ctype's islower(), extended to Unicode.
Definition at line 524 of file textEncoder.I.
References _encoding, and encode_wtext().

bool TextEncoder::unicode_ispunct ( int character ) [inline, static]

Returns true if the indicated character is a punctuation mark, false otherwise.
This is akin to ctype's ispunct(), extended to Unicode.
Definition at line 481 of file textEncoder.I.
References _flags, _wtext, F_got_text, F_got_wtext, has_text(), and INLINE.

bool TextEncoder::unicode_isupper ( int character ) [inline, static]

Returns true if the indicated character is an uppercase letter, false otherwise.
This is akin to ctype's isupper(), extended to Unicode.
Definition at line 503 of file textEncoder.I.

int TextEncoder::unicode_tolower ( int character ) [inline, static]

Returns the lowercase equivalent of the given Unicode character.
This is akin to ctype's tolower(), extended to Unicode.
Definition at line 566 of file textEncoder.I.
Referenced by make_upper().

int TextEncoder::unicode_toupper ( int character ) [inline, static]

Returns the uppercase equivalent of the given Unicode character.
This is akin to ctype's toupper(), extended to Unicode.
Definition at line 545 of file textEncoder.I.
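
To show what "ctype extended to Unicode" can mean in practice, the self-contained sketch below handles case conversion for ASCII and the Latin-1 block only. The function names are hypothetical and the range arithmetic is a simplification; Panda appears to use a lookup table (UnicodeLatinMap) covering more characters.

```cpp
#include <cassert>

// Uppercase for ASCII and Latin-1 letters; other values pass through.
int toupper_sketch(int ch) {
  assert(ch >= 0);
  if (ch >= 'a' && ch <= 'z') {
    return ch - 0x20;
  }
  // Latin-1 lowercase letters are 0xE0..0xFE, except 0xF7 (division sign).
  if (ch >= 0xE0 && ch <= 0xFE && ch != 0xF7) {
    return ch - 0x20;
  }
  return ch;
}

// Lowercase for ASCII and Latin-1 letters; other values pass through.
int tolower_sketch(int ch) {
  assert(ch >= 0);
  if (ch >= 'A' && ch <= 'Z') {
    return ch + 0x20;
  }
  // Latin-1 uppercase letters are 0xC0..0xDE, except 0xD7 (multiplication sign).
  if (ch >= 0xC0 && ch <= 0xDE && ch != 0xD7) {
    return ch + 0x20;
  }
  return ch;
}
```

Here 0xE9 (e with acute) maps to 0xC9 and back, while 0xDF (sharp s) and 0xFF (y with diaeresis) are left unchanged because Latin-1 has no uppercase form for them.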

string TextEncoder::upper ( const string & source,

TextEncoder::Encoding encoding

) [inline, static]

Converts the string to uppercase, assuming the string is encoded in the indicated encoding.

Definition at line 600 of file textEncoder.I.

string TextEncoder::upper ( const string & source ) [inline, static]

Converts the string to uppercase, assuming the string is encoded in the default encoding.

Definition at line 585 of file textEncoder.I.

Member Data Documentation

TextEncoder::Encoding TextEncoder::_default_encoding [static, private]

Definition at line 27 of file textEncoder.cxx.
Referenced by get_encoding().

Encoding TextEncoder::_encoding [private]

Definition at line 125 of file textEncoder.h.
Referenced by unicode_islower().

int TextEncoder::_flags [private]

Reimplemented in TextNode.
Definition at line 124 of file textEncoder.h.
Referenced by clear_text(), get_text(), has_text(), make_upper(), set_default_encoding(), set_text(), TextEncoder(), and unicode_ispunct().

string TextEncoder::_text [private]

Definition at line 126 of file textEncoder.h.
Referenced by clear_text(), set_default_encoding(), and set_text().

TypeHandle TextEncoder::_type_handle [static, private]

Reimplemented in TextNode.
Definition at line 26 of file textEncoder.cxx.

wstring TextEncoder::_wtext [private]

Definition at line 127 of file textEncoder.h.
Referenced by get_text(), has_text(), make_lower(), make_upper(), set_text(), and unicode_ispunct().

The documentation for this class was generated from the following files:

panda/src/express/textEncoder.h
panda/src/express/textEncoder.cxx
panda/src/express/textEncoder.I

Generated on Fri May 2 00:55:26 2003 for Panda by Doxygen 1.3
