libtranscript
Native transcript interface.

Data Structures

struct  transcript_name_t
 A structure holding a display name and availability information about a converter.

struct  transcript_t
 An opaque structure describing a converter and its state.
 

Macros

#define TRANSCRIPT_MIN_BUFFER_SIZE
 Minimum required size for an output buffer for either transcript_to_unicode or transcript_from_unicode, if M:N conversions are allowed.

#define TRANSCRIPT_MIN_CODEPAGE_BUFFER_SIZE
 Minimum required size for an output buffer for transcript_from_unicode, if M:N conversions are allowed.

#define TRANSCRIPT_MIN_UNICODE_BUFFER_SIZE
 Minimum required size for an output buffer for transcript_to_unicode, if M:N conversions are allowed.

#define TRANSCRIPT_SAVE_STATE_SIZE
 Required size of a buffer for saving converter state.

#define TRANSCRIPT_VERSION
 The version of libtranscript encoded as a single integer.
 

Enumerations

enum  transcript_error_t {
  TRANSCRIPT_SUCCESS, TRANSCRIPT_NO_SPACE, TRANSCRIPT_INCOMPLETE, TRANSCRIPT_FALLBACK,
  TRANSCRIPT_UNASSIGNED, TRANSCRIPT_ILLEGAL, TRANSCRIPT_ILLEGAL_END, TRANSCRIPT_INTERNAL_ERROR,
  TRANSCRIPT_PRIVATE_USE, TRANSCRIPT_ERRNO, TRANSCRIPT_BAD_ARG, TRANSCRIPT_OUT_OF_MEMORY,
  TRANSCRIPT_INVALID_FORMAT, TRANSCRIPT_TRUNCATED_MAP, TRANSCRIPT_WRONG_VERSION, TRANSCRIPT_INTERNAL_TABLE,
  TRANSCRIPT_DLOPEN_FAILURE, TRANSCRIPT_CONVERTER_DISABLED, TRANSCRIPT_PACKAGE_FILE, TRANSCRIPT_INIT_DLFCN,
  TRANSCRIPT_NOT_INITIALIZED, TRANSCRIPT_PART_SUCCESS_MAX
}
 Error values.
 
enum  transcript_flags_t {
  TRANSCRIPT_ALLOW_FALLBACK, TRANSCRIPT_SUBST_UNASSIGNED, TRANSCRIPT_SUBST_ILLEGAL, TRANSCRIPT_ALLOW_PRIVATE_USE,
  TRANSCRIPT_FILE_START, TRANSCRIPT_END_OF_TEXT, TRANSCRIPT_SINGLE_CONVERSION, TRANSCRIPT_NO_MN_CONVERSION,
  TRANSCRIPT_NO_1N_CONVERSION
}
 Flags for converters and conversions.
 

Functions

void transcript_close_converter (transcript_t *handle)
 Close a converter.

int transcript_equal (const char *name_a, const char *name_b)
 Check if two names describe the same converter.

void transcript_finalize (void)
 Finalize the library use.

transcript_error_t transcript_from_unicode (transcript_t *handle, const char **inbuf, const char *inbuflimit, char **outbuf, const char *outbuflimit, int flags)
 Convert a buffer from Unicode to a character set.

transcript_error_t transcript_from_unicode_flush (transcript_t *handle, char **outbuf, const char *outbuflimit)
 Write out any bytes required to create a legal output in a character set.

void transcript_from_unicode_reset (transcript_t *handle)
 Reset the from-Unicode conversion to its initial state.

transcript_error_t transcript_from_unicode_skip (transcript_t *handle, const char **inbuf, const char *inbuflimit)
 Skip the next character in Unicode encoding.

const char * transcript_get_codeset (void)
 Get a character string describing the current character set indicated by the environment.

const transcript_name_t * transcript_get_names (int *count)
 Retrieve the list of display names known to this instantiation of the library.

long transcript_get_version (void)
 Get the value of TRANSCRIPT_VERSION corresponding to the actually used library.

transcript_error_t transcript_handle_unassigned (transcript_t *handle, uint32_t codepoint, char **outbuf, const char *outbuflimit, int flags)
 Handle an unassigned codepoint in a from-Unicode conversion.

transcript_error_t transcript_init (void)
 Initialize the library.

void transcript_load_state (transcript_t *handle, void *state)
 Restore a converter's state.

void transcript_normalize_name (const char *name, char *normalized_name, size_t normalized_name_max)
 Normalize a character set name.

transcript_t * transcript_open_converter (const char *name, transcript_utf_t utf_type, int flags, transcript_error_t *error)
 Open a converter.

int transcript_probe_converter (const char *name)
 Check if a named converter is available.

void transcript_save_state (transcript_t *handle, void *state)
 Save a converter's state.

const char * transcript_strerror (transcript_error_t error)
 Get a localized descriptive string for an error code.

transcript_error_t transcript_to_unicode (transcript_t *handle, const char **inbuf, const char *inbuflimit, char **outbuf, const char *outbuflimit, int flags)
 Convert a buffer from a character set to Unicode.

void transcript_to_unicode_reset (transcript_t *handle)
 Reset the to-Unicode conversion to its initial state.

transcript_error_t transcript_to_unicode_skip (transcript_t *handle, const char **inbuf, const char *inbuflimit)
 Skip the next character in character set encoding.
 

Detailed Description

Macro Definition Documentation

#define TRANSCRIPT_MIN_BUFFER_SIZE

Minimum required size for an output buffer for either transcript_to_unicode or transcript_from_unicode, if M:N conversions are allowed.

#define TRANSCRIPT_MIN_CODEPAGE_BUFFER_SIZE

Minimum required size for an output buffer for transcript_from_unicode, if M:N conversions are allowed.

#define TRANSCRIPT_MIN_UNICODE_BUFFER_SIZE

Minimum required size for an output buffer for transcript_to_unicode, if M:N conversions are allowed.

#define TRANSCRIPT_SAVE_STATE_SIZE

Required size of a buffer for saving converter state.

#define TRANSCRIPT_VERSION

The version of libtranscript encoded as a single integer.

The least significant 8 bits represent the patch level. The second 8 bits represent the minor version. The third 8 bits represent the major version.

At runtime, the value of TRANSCRIPT_VERSION can be retrieved by calling transcript_get_version.
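A minimal sketch of unpacking that layout with shifts and masks. The helper names are hypothetical and not part of libtranscript; in a real program the value would come from TRANSCRIPT_VERSION at compile time or from transcript_get_version at run time.

```c
#include <stdio.h>

/* Hypothetical helpers, not part of libtranscript. They split a version
   integer laid out as described above: bits 0-7 hold the patch level,
   bits 8-15 the minor version and bits 16-23 the major version. */
static int version_major(long v) { return (int) ((v >> 16) & 0xff); }
static int version_minor(long v) { return (int) ((v >> 8) & 0xff); }
static int version_patch(long v) { return (int) (v & 0xff); }

/* Format a version integer as "major.minor.patch" into buf. */
static void format_version(long v, char *buf, size_t size)
{
    snprintf(buf, size, "%d.%d.%d", version_major(v), version_minor(v), version_patch(v));
}
```

With this layout, 0x000304 reads as 0.3.4. Comparing the compile-time and run-time values this way is one option for detecting a mismatch between the headers a program was built against and the library it actually runs with.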

Enumeration Type Documentation

Error values.

Enumerator
TRANSCRIPT_SUCCESS 

All OK.

TRANSCRIPT_NO_SPACE 

There was no space left in the output buffer.

TRANSCRIPT_INCOMPLETE 

The buffer ended with an incomplete sequence, or more data was needed to verify an M:N conversion.

TRANSCRIPT_FALLBACK 

The next character to convert is a fallback mapping.

TRANSCRIPT_UNASSIGNED 

The next character to convert is an unassigned sequence.

TRANSCRIPT_ILLEGAL 

The input is an illegal sequence.

TRANSCRIPT_ILLEGAL_END 

The end of the input does not form a valid sequence.

TRANSCRIPT_INTERNAL_ERROR 

An internal error occurred in the transcript library; no recovery is possible.

TRANSCRIPT_PRIVATE_USE 

The next character to convert maps to a private use codepoint.

TRANSCRIPT_ERRNO 

See errno for error code.

TRANSCRIPT_BAD_ARG 

Bad argument.

TRANSCRIPT_OUT_OF_MEMORY 

Out of memory.

TRANSCRIPT_INVALID_FORMAT 

Invalid format while reading conversion map.

TRANSCRIPT_TRUNCATED_MAP 

Tried to read a truncated conversion map.

TRANSCRIPT_WRONG_VERSION 

Conversion map is of an unsupported version.

TRANSCRIPT_INTERNAL_TABLE 

Tried to load a table that is for internal use only.

TRANSCRIPT_DLOPEN_FAILURE 

Opening of the plugin failed.

TRANSCRIPT_CONVERTER_DISABLED 

The converter has been explicitly disabled.

TRANSCRIPT_PACKAGE_FILE 

The converter name references a converter package file, not an actual converter.

TRANSCRIPT_INIT_DLFCN 

Could not initialize dynamic module loading functionality.

TRANSCRIPT_NOT_INITIALIZED 

transcript_init has not been called yet.

TRANSCRIPT_PART_SUCCESS_MAX 

Highest error code which indicates success or end-of-buffer.

Flags for converters and conversions.

Enumerator
TRANSCRIPT_ALLOW_FALLBACK 

Include fallback characters in the conversion.

This flag is only used by transcript_from_unicode.

TRANSCRIPT_SUBST_UNASSIGNED 

Automatically replace unmappable characters by substitute characters.

TRANSCRIPT_SUBST_ILLEGAL 

Automatically insert a substitution character on illegal input.

TRANSCRIPT_ALLOW_PRIVATE_USE 

Allow private-use mappings.

If not allowed, they are handled like unassigned sequences, except that a different error code is returned.

TRANSCRIPT_FILE_START 

The beginning of the input buffer is the beginning of a file, and a BOM should be expected/generated.

TRANSCRIPT_END_OF_TEXT 

The end of the input buffer is the end of the text.

This flag is only valid when passed to transcript_from_unicode or transcript_to_unicode.

Note
This flag is only used to determine whether an incomplete sequence at the end of the buffer is allowed or not. Clients still need to call transcript_from_unicode_flush to properly end the output buffer.
TRANSCRIPT_SINGLE_CONVERSION 

Only convert the next character, then return (useful for handling fallback/unassigned characters etc).

This flag is only valid when passed to transcript_from_unicode or transcript_to_unicode.

TRANSCRIPT_NO_MN_CONVERSION 

Do not use M:N conversions.

This flag is only valid when passed to transcript_from_unicode or transcript_to_unicode.

TRANSCRIPT_NO_1N_CONVERSION 

Do not use 1:N conversions.

Implies TRANSCRIPT_NO_MN_CONVERSION.

This flag is only valid when passed to transcript_from_unicode or transcript_to_unicode.

Function Documentation

void transcript_close_converter ( transcript_t *  handle)

Close a converter.

Parameters
handle: The converter to close.

This function releases all memory associated with handle. handle may be NULL.

int transcript_equal ( const char *  name_a,
const char *  name_b 
)

Check if two names describe the same converter.

Parameters
name_a: The first converter name.
name_b: The second converter name.
Returns
1 if name_a and name_b describe the same converter, 0 otherwise.

void transcript_finalize ( void  )

Finalize the library use.

This function releases all memory used by the library once it has been called as many times as transcript_init has been called. Calling this function is not necessary, but may be useful when trying to find memory leaks.

transcript_error_t transcript_from_unicode ( transcript_t *  handle,
const char **  inbuf,
const char *  inbuflimit,
char **  outbuf,
const char *  outbuflimit,
int  flags 
)

Convert a buffer from Unicode to a character set.

Parameters
handle: The converter to use.
inbuf: A double pointer to the start of the input buffer.
inbuflimit: A pointer to the end of the input buffer.
outbuf: A double pointer to the start of the output buffer.
outbuflimit: A pointer to the end of the output buffer.
flags: Flags for this conversion (see transcript_flags_t for possible values).
Return values
TRANSCRIPT_SUCCESS
TRANSCRIPT_NO_SPACE
TRANSCRIPT_INCOMPLETE
TRANSCRIPT_FALLBACK
TRANSCRIPT_UNASSIGNED
TRANSCRIPT_ILLEGAL
TRANSCRIPT_ILLEGAL_END
TRANSCRIPT_INTERNAL_ERROR
TRANSCRIPT_PRIVATE_USE 

This function uses the converter indicated by handle to convert data from Unicode to the character set that was named when handle was opened. The interface is designed to work with incomplete buffers: it may return TRANSCRIPT_INCOMPLETE if the bytes at the end of the input buffer do not form a complete sequence. If the output buffer is not large enough to store all the converted data, TRANSCRIPT_NO_SPACE is returned.

If M:N conversions are enabled, the output buffer must be able to hold at least 32 bytes (TRANSCRIPT_MIN_CODEPAGE_BUFFER_SIZE).
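The double-pointer convention can be illustrated without the library itself. The sketch below is hypothetical: mock_from_unicode only imitates the calling convention described above (on return, *inbuf and *outbuf point just past what was consumed and produced, and a NO_SPACE-style code is returned when the output fills up), converting UTF-8 to ISO-8859-1 for code points up to U+00FF. mock_error_t, mock_from_unicode and convert_all are illustrative names, not libtranscript API.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical status codes mirroring a subset of transcript_error_t. */
typedef enum { MOCK_SUCCESS, MOCK_NO_SPACE, MOCK_INCOMPLETE, MOCK_UNASSIGNED, MOCK_ILLEGAL } mock_error_t;

/* Mock from-Unicode converter: UTF-8 in, ISO-8859-1 out. Only 1- and 2-byte
   UTF-8 sequences are understood; other lead bytes are reported as illegal.
   On return, *inbuf and *outbuf point just past the consumed/produced data. */
static mock_error_t mock_from_unicode(const char **inbuf, const char *inbuflimit,
                                      char **outbuf, const char *outbuflimit)
{
    while (*inbuf < inbuflimit) {
        const unsigned char *in = (const unsigned char *) *inbuf;
        unsigned int cp;
        size_t len;

        if (in[0] < 0x80) {
            cp = in[0];
            len = 1;
        } else if (in[0] >= 0xC2 && in[0] <= 0xDF) {
            if (inbuflimit - *inbuf < 2)
                return MOCK_INCOMPLETE;   /* trail byte not yet available */
            if ((in[1] & 0xC0) != 0x80)
                return MOCK_ILLEGAL;
            cp = ((in[0] & 0x1Fu) << 6) | (in[1] & 0x3Fu);
            len = 2;
        } else {
            return MOCK_ILLEGAL;
        }
        if (cp > 0xFF)
            return MOCK_UNASSIGNED;       /* not representable in ISO-8859-1 */
        if (*outbuf >= outbuflimit)
            return MOCK_NO_SPACE;         /* nothing consumed for this character */
        *(*outbuf)++ = (char) cp;
        *inbuf += len;
    }
    return MOCK_SUCCESS;
}

/* Drain the whole input through a deliberately tiny output buffer, copying each
   filled chunk to dest. Returns the number of bytes written, or -1 on error. */
static long convert_all(const char *src, size_t srclen, char *dest, size_t destsize)
{
    const char *in = src;
    const char *inlimit = src + srclen;
    size_t total = 0;
    char chunk[4];

    for (;;) {
        char *out = chunk;
        mock_error_t rc = mock_from_unicode(&in, inlimit, &out, chunk + sizeof chunk);
        size_t produced = (size_t) (out - chunk);

        if (total + produced > destsize)
            return -1;
        memcpy(dest + total, chunk, produced);
        total += produced;
        if (rc == MOCK_SUCCESS)
            return (long) total;
        if (rc != MOCK_NO_SPACE)
            return -1;                    /* incomplete, unassigned or illegal input */
    }
}
```

Because the pointers are advanced in place, the caller can simply empty the output buffer and call again after a NO_SPACE result; nothing has to be re-read from the input.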

transcript_error_t transcript_from_unicode_flush ( transcript_t *  handle,
char **  outbuf,
const char *  outbuflimit 
)

Write out any bytes required to create a legal output in a character set.

Parameters
handle: The converter to use.
outbuf: A double pointer to the start of the output buffer.
outbuflimit: A pointer to the end of the output buffer.
Return values
TRANSCRIPT_SUCCESS
TRANSCRIPT_NO_SPACE
TRANSCRIPT_INTERNAL_ERROR 

Some stateful encoding converters need to write a shift sequence or some closing bytes at the end of the output, which can only be determined once it is known that there is no more input. For efficiency reasons, this is not done based on the TRANSCRIPT_END_OF_TEXT flag in transcript_from_unicode.

After calling this function, the from-Unicode conversion will be in the initial state.
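To show why a flush step is needed, here is a sketch of a made-up stateful encoding (not one of libtranscript's converters) that switches between ASCII and an "upper half" using SO/SI shift bytes. The converter struct, mock_encode and mock_flush are hypothetical; mock_flush plays the role of transcript_from_unicode_flush by emitting the closing shift and returning the state to its initial value.

```c
#include <stddef.h>

typedef enum { MOCK_SUCCESS, MOCK_NO_SPACE } mock_error_t;

#define MOCK_SO 0x0E /* shift out: switch to the upper half */
#define MOCK_SI 0x0F /* shift in: switch back to ASCII */

/* Hypothetical converter state: are we currently shifted out? */
typedef struct { int shifted; } mock_converter_t;

/* Encode 8-bit input into a 7-bit stream that uses SO/SI shifts. The shift
   byte and the character are written together, so a NO_SPACE return never
   leaves half a character in the output. */
static mock_error_t mock_encode(mock_converter_t *conv, const char **inbuf, const char *inbuflimit,
                                char **outbuf, const char *outbuflimit)
{
    while (*inbuf < inbuflimit) {
        unsigned char c = (unsigned char) **inbuf;
        int need_shift = (c >= 0x80) != conv->shifted;

        if (outbuflimit - *outbuf < 1 + need_shift)
            return MOCK_NO_SPACE;
        if (need_shift) {
            *(*outbuf)++ = conv->shifted ? MOCK_SI : MOCK_SO;
            conv->shifted = !conv->shifted;
        }
        *(*outbuf)++ = (char) (c & 0x7F);
        (*inbuf)++;
    }
    return MOCK_SUCCESS;
}

/* Flush step: emit the shift back to ASCII if needed, leaving the converter
   in its initial (unshifted) state. */
static mock_error_t mock_flush(mock_converter_t *conv, char **outbuf, const char *outbuflimit)
{
    if (conv->shifted) {
        if (*outbuf >= outbuflimit)
            return MOCK_NO_SPACE;
        *(*outbuf)++ = MOCK_SI;
        conv->shifted = 0;
    }
    return MOCK_SUCCESS;
}
```

Without the final mock_flush call, the output for the input bytes 'a', 0xE9 would end in the shifted state, and a decoder reading that output followed by more text would misinterpret the text that follows.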

void transcript_from_unicode_reset ( transcript_t *  handle)

Reset the from-Unicode conversion to its initial state.

Parameters
handle: The converter to reset.
Note
The to-Unicode and from-Unicode conversions are reset separately.

transcript_error_t transcript_from_unicode_skip ( transcript_t *  handle,
const char **  inbuf,
const char *  inbuflimit 
)

Skip the next character in Unicode encoding.

Parameters
handle: The converter to use.
inbuf: A double pointer to the start of the input buffer.
inbuflimit: A pointer to the end of the input buffer.
Return values
TRANSCRIPT_SUCCESS
TRANSCRIPT_INCOMPLETE
TRANSCRIPT_INTERNAL_ERROR 

This function can be used to recover a stopped from-Unicode conversion when the next input character cannot be converted (either because the input is corrupt, or because the conversion is not permitted by the flag settings).
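A sketch of the recovery loop this enables, assuming the caller wants to substitute '?' for each character the target character set cannot represent. Both converter functions are hypothetical mocks (UTF-8 in, ASCII out) that only imitate the documented behavior: the conversion stops before the offending character, and the skip function advances *inbuf past exactly one character. The names are illustrative, not libtranscript API.

```c
#include <stddef.h>

typedef enum { MOCK_SUCCESS, MOCK_NO_SPACE, MOCK_INCOMPLETE, MOCK_UNASSIGNED } mock_error_t;

/* Mock from-Unicode converter: UTF-8 in, ASCII out. It stops, without
   consuming anything, at the first character it cannot represent. */
static mock_error_t mock_from_unicode(const char **inbuf, const char *inbuflimit,
                                      char **outbuf, const char *outbuflimit)
{
    while (*inbuf < inbuflimit) {
        if ((unsigned char) **inbuf >= 0x80)
            return MOCK_UNASSIGNED;
        if (*outbuf >= outbuflimit)
            return MOCK_NO_SPACE;
        *(*outbuf)++ = *(*inbuf)++;
    }
    return MOCK_SUCCESS;
}

/* Mock skip: advance *inbuf past exactly one UTF-8 encoded character.
   The length is taken from the lead byte only (no further validation). */
static mock_error_t mock_from_unicode_skip(const char **inbuf, const char *inbuflimit)
{
    unsigned char lead;
    ptrdiff_t len;

    if (*inbuf >= inbuflimit)
        return MOCK_INCOMPLETE;
    lead = (unsigned char) **inbuf;
    len = lead < 0x80 ? 1 : lead < 0xE0 ? 2 : lead < 0xF0 ? 3 : 4;
    if (inbuflimit - *inbuf < len)
        return MOCK_INCOMPLETE;
    *inbuf += len;
    return MOCK_SUCCESS;
}

/* Convert src, replacing each unconvertible character by '?'. Returns the
   number of bytes written to dest, or -1 if dest is too small or the input
   ends in a truncated sequence. */
static long convert_with_substitution(const char *src, size_t srclen, char *dest, size_t destsize)
{
    const char *in = src, *inlimit = src + srclen;
    char *out = dest;
    const char *outlimit = dest + destsize;

    for (;;) {
        switch (mock_from_unicode(&in, inlimit, &out, outlimit)) {
        case MOCK_SUCCESS:
            return (long) (out - dest);
        case MOCK_UNASSIGNED:
            if (out >= outlimit)
                return -1;
            *out++ = '?';
            if (mock_from_unicode_skip(&in, inlimit) != MOCK_SUCCESS)
                return -1;
            break;                        /* resume the conversion after the skipped character */
        default:
            return -1;
        }
    }
}
```

With libtranscript itself, the TRANSCRIPT_SUBST_UNASSIGNED flag described above may make such a loop unnecessary when the converter's own substitution character is acceptable.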

const char * transcript_get_codeset ( void  )

Get a character string describing the current character set indicated by the environment.

Returns
A pointer to a string with the current character set. This string is allocated statically, and may be overwritten by subsequent calls to this function, setlocale or nl_langinfo.

Essentially this function does the same as nl_langinfo(CODESET). However, nl_langinfo may not be available. In that case, it uses setlocale to retrieve the current value for LC_CTYPE and tries to extract the character set from it. If all else fails, it returns a string representing the ASCII character set.
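A minimal sketch of the setlocale fallback described above, assuming locale names of the common language[_territory][.codeset][@modifier] form. It is not libtranscript's implementation, and the helper names are hypothetical. Calling setlocale(LC_CTYPE, NULL) queries the current value without changing it.

```c
#include <locale.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: extract the codeset part of a locale name of the form
   language[_territory][.codeset][@modifier]. Writes "ASCII" when the name
   carries no codeset (for example the "C" locale). */
static void codeset_from_locale_name(const char *locale, char *buf, size_t size)
{
    const char *dot = locale != NULL ? strchr(locale, '.') : NULL;
    size_t len;

    if (dot == NULL) {
        snprintf(buf, size, "%s", "ASCII");
        return;
    }
    dot++;
    len = strcspn(dot, "@");              /* stop before an optional @modifier */
    snprintf(buf, size, "%.*s", (int) len, dot);
}

/* Fallback path: query the current LC_CTYPE setting without modifying it. */
static void current_codeset_fallback(char *buf, size_t size)
{
    codeset_from_locale_name(setlocale(LC_CTYPE, NULL), buf, size);
}
```

For example, "de_DE.ISO-8859-15@euro" yields "ISO-8859-15". The result is only a name; matching it to a converter would still go through the normal name lookup.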

const transcript_name_t * transcript_get_names ( int *  count)

Retrieve the list of display names known to this instantiation of the library.

Parameters
count: A location to store the number of names returned.
Returns
An array of transcript_name_t structures listing the known converters.

long transcript_get_version ( void  )

Get the value of TRANSCRIPT_VERSION corresponding to the actually used library.

Returns
The value of TRANSCRIPT_VERSION.

This function can be useful to determine at runtime what version of the library was linked to the program. Although currently there are no known uses for this information, future library additions may prompt library users to want to operate differently depending on the available features.

transcript_error_t transcript_handle_unassigned ( transcript_t *  handle,
uint32_t  codepoint,
char **  outbuf,
const char *  outbuflimit,
int  flags 
)

Handle an unassigned codepoint in a from-Unicode conversion.

This function does a lookup in the generic fallback table. If no generic fallback is found, this function simply returns TRANSCRIPT_UNASSIGNED. Otherwise, it handles conversion of the generic fallback as if it were specified in the converter table.
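The lookup described above can be sketched as a binary search in a sorted table. Everything here is hypothetical: the table contents are invented for illustration and do not reproduce libtranscript's generic fallback table, the output is assumed to be an ASCII-compatible character set, and the handle and flags parameters of the real function are omitted.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

typedef enum { MOCK_SUCCESS, MOCK_NO_SPACE, MOCK_UNASSIGNED } mock_error_t;

/* Hypothetical generic fallback entry: a codepoint and an ASCII replacement. */
typedef struct {
    uint32_t codepoint;
    const char *replacement;
} fallback_t;

/* Illustrative table, sorted by codepoint so that bsearch can be used. */
static const fallback_t generic_fallbacks[] = {
    { 0x2018, "'" },   /* LEFT SINGLE QUOTATION MARK */
    { 0x2019, "'" },   /* RIGHT SINGLE QUOTATION MARK */
    { 0x201C, "\"" },  /* LEFT DOUBLE QUOTATION MARK */
    { 0x201D, "\"" },  /* RIGHT DOUBLE QUOTATION MARK */
    { 0x2026, "..." }, /* HORIZONTAL ELLIPSIS */
};

static int compare_fallback(const void *key, const void *elem)
{
    uint32_t cp = *(const uint32_t *) key;
    uint32_t other = ((const fallback_t *) elem)->codepoint;
    return (cp > other) - (cp < other);
}

/* Look up codepoint; if found, write its whole replacement to *outbuf,
   otherwise report it as unassigned. Nothing is written on NO_SPACE. */
static mock_error_t mock_handle_unassigned(uint32_t codepoint, char **outbuf, const char *outbuflimit)
{
    const fallback_t *entry = bsearch(&codepoint, generic_fallbacks,
                                      sizeof generic_fallbacks / sizeof generic_fallbacks[0],
                                      sizeof generic_fallbacks[0], compare_fallback);
    size_t len;

    if (entry == NULL)
        return MOCK_UNASSIGNED;
    len = strlen(entry->replacement);
    if ((size_t) (outbuflimit - *outbuf) < len)
        return MOCK_NO_SPACE;
    memcpy(*outbuf, entry->replacement, len);
    *outbuf += len;
    return MOCK_SUCCESS;
}
```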

transcript_error_t transcript_init ( void  )

Initialize the library.

This function must be called before calling any other function of the library. It is safe to call this function more than once.

void transcript_load_state ( transcript_t *  handle,
void *  state 
)

Restore a converter's state.

Parameters
handle: The converter to restore the state for.
state: A pointer to a buffer of at least TRANSCRIPT_SAVE_STATE_SIZE bytes.

void transcript_normalize_name ( const char *  name,
char *  normalized_name,
size_t  normalized_name_max 
)

Normalize a character set name.

Parameters
name: The name to normalize.
normalized_name: A pointer to a buffer to store the normalized name.
normalized_name_max: The size of normalized_name.

Any characters in name other than the letters 'a'-'z' (upper or lower case) and the digits '0'-'9' are ignored. Leading zeros in numbers are ignored as well. The stored result is NUL-terminated.
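A sketch of these rules, with a transcript_equal-style comparison built on top. The function names are hypothetical. Two details are assumptions rather than documented behavior: the sketch lowercases letters, and it treats a "number" as a run of digits delimited by any non-digit character in the original name.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Keep only ASCII letters and digits, drop leading zeros of each digit run
   (a lone zero is kept), and NUL-terminate, truncating the result to fit in
   normalized_name_max bytes. Lowercasing is an assumption of this sketch. */
static void normalize_name(const char *name, char *normalized_name, size_t normalized_name_max)
{
    size_t used = 0;
    int prev_was_digit = 0;

    if (normalized_name_max == 0)
        return;
    for (; *name != '\0'; name++) {
        unsigned char c = (unsigned char) *name;
        int is_digit = c >= '0' && c <= '9';
        int is_letter = (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z');

        if (!is_digit && !is_letter) {
            prev_was_digit = 0;           /* any other character ends a number */
            continue;
        }
        if (c == '0' && !prev_was_digit && name[1] >= '0' && name[1] <= '9')
            continue;                     /* leading zero of a number */
        if (used + 1 >= normalized_name_max)
            break;                        /* leave room for the terminating NUL */
        normalized_name[used++] = (char) tolower(c);
        prev_was_digit = is_digit;
    }
    normalized_name[used] = '\0';
}

/* Compare two names after normalization. The 80-byte buffers (79 characters
   plus NUL) mirror the limit mentioned for transcript_open_converter; tying
   them together is an assumption. */
static int names_equal(const char *name_a, const char *name_b)
{
    char a[80], b[80];

    normalize_name(name_a, a, sizeof a);
    normalize_name(name_b, b, sizeof b);
    return strcmp(a, b) == 0;
}
```

Under these rules "ISO-8859-01" and "iso_8859_1" both normalize to "iso88591", and "CP01252" becomes "cp1252".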

transcript_t * transcript_open_converter ( const char *  name,
transcript_utf_t  utf_type,
int  flags,
transcript_error_t *  error 
)

Open a converter.

Parameters
name: The name of the converter to open.
utf_type: The UTF type to use for representing Unicode codepoints.
flags: The default flags for the converter (see transcript_flags_t for possible values).
error: The location to store a possible error code.

The name of the converter is in principle free-form. A list of known names can be retrieved through transcript_get_names. The name argument is passed through transcript_normalize_name first, and at most 79 characters of the normalized name are considered.

int transcript_probe_converter ( const char *  name)

Check if a named converter is available.

Parameters
name: The name of the converter to check.
Returns
1 if the converter is available, 0 otherwise.

void transcript_save_state ( transcript_t *  handle,
void *  state 
)

Save a converter's state.

Parameters
handle: The converter to save the state for.
state: A pointer to a buffer of at least TRANSCRIPT_SAVE_STATE_SIZE bytes.

const char * transcript_strerror ( transcript_error_t  error)

Get a localized descriptive string for an error code.

Parameters
error: The error code to retrieve the descriptive string for.
Returns
A static string containing a localized descriptive string.

transcript_error_t transcript_to_unicode ( transcript_t *  handle,
const char **  inbuf,
const char *  inbuflimit,
char **  outbuf,
const char *  outbuflimit,
int  flags 
)

Convert a buffer from a character set to Unicode.

Parameters
handle: The converter to use.
inbuf: A double pointer to the start of the input buffer.
inbuflimit: A pointer to the end of the input buffer.
outbuf: A double pointer to the start of the output buffer.
outbuflimit: A pointer to the end of the output buffer.
flags: Flags for this conversion (see transcript_flags_t for possible values).
Return values
TRANSCRIPT_SUCCESS
TRANSCRIPT_NO_SPACE
TRANSCRIPT_INCOMPLETE
TRANSCRIPT_FALLBACK
TRANSCRIPT_UNASSIGNED
TRANSCRIPT_ILLEGAL
TRANSCRIPT_ILLEGAL_END
TRANSCRIPT_INTERNAL_ERROR
TRANSCRIPT_PRIVATE_USE 

This function uses the converter indicated by handle to convert data from the character set that was named when handle was opened to Unicode. The interface is designed to work with incomplete buffers: it may return TRANSCRIPT_INCOMPLETE if the bytes at the end of the input buffer do not form a complete sequence. If the output buffer is not large enough to store all the converted data, TRANSCRIPT_NO_SPACE is returned.

If M:N conversions are enabled, the output buffer must be able to hold at least 20 codepoints. This is guaranteed if the size of the output buffer is at least 80 (TRANSCRIPT_MIN_UNICODE_BUFFER_SIZE) bytes.
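The 80-byte figure corresponds to 20 codepoints at up to 4 bytes each, the largest size of a single codepoint in UTF-8, UTF-16 (as a surrogate pair) and UTF-32. The sketch below illustrates incomplete-buffer handling with a hypothetical mock for a made-up double-byte character set, producing UTF-32 in host byte order: a lead byte at the end of a piece yields an INCOMPLETE-style result unless an END_OF_TEXT-style flag is set, in which case it is reported as an illegal end. MOCK_END_OF_TEXT, mock_to_unicode and decode_pieces are illustrative names, and the byte-to-codepoint mapping is arbitrary.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

typedef enum { MOCK_SUCCESS, MOCK_NO_SPACE, MOCK_INCOMPLETE, MOCK_ILLEGAL, MOCK_ILLEGAL_END } mock_error_t;

#define MOCK_END_OF_TEXT 1 /* hypothetical stand-in for TRANSCRIPT_END_OF_TEXT */

/* Mock to-Unicode converter: bytes below 0x80 are ASCII, bytes 0x81-0xFE start
   a two-byte sequence (trail byte not validated). Output is UTF-32, 4 bytes
   per codepoint in host byte order. */
static mock_error_t mock_to_unicode(const char **inbuf, const char *inbuflimit,
                                    char **outbuf, const char *outbuflimit, int flags)
{
    while (*inbuf < inbuflimit) {
        const unsigned char *in = (const unsigned char *) *inbuf;
        uint32_t cp;
        ptrdiff_t len = 1;

        if (in[0] < 0x80) {
            cp = in[0];
        } else if (in[0] >= 0x81 && in[0] <= 0xFE) {
            if (inbuflimit - *inbuf < 2)
                return (flags & MOCK_END_OF_TEXT) ? MOCK_ILLEGAL_END : MOCK_INCOMPLETE;
            cp = 0x4E00u + ((uint32_t) (in[0] - 0x81) << 8) + in[1]; /* arbitrary mapping */
            len = 2;
        } else {
            return MOCK_ILLEGAL;
        }
        if (outbuflimit - *outbuf < (ptrdiff_t) sizeof cp)
            return MOCK_NO_SPACE;
        memcpy(*outbuf, &cp, sizeof cp);
        *outbuf += sizeof cp;
        *inbuf += len;
    }
    return MOCK_SUCCESS;
}

/* Decode input that arrives in pieces. Bytes left unconsumed after an
   INCOMPLETE result are carried over and prepended to the next piece.
   Returns the number of codepoints written to out, or -1 on error. */
static long decode_pieces(const char *const *pieces, const size_t *lengths, size_t count,
                          uint32_t *out, size_t out_max)
{
    char pending[64];
    size_t pending_len = 0;
    char *outp = (char *) out;
    const char *outlimit = (const char *) (out + out_max);

    for (size_t i = 0; i < count; i++) {
        const char *in = pending;
        int flags = i + 1 == count ? MOCK_END_OF_TEXT : 0; /* last piece ends the text */
        mock_error_t rc;

        if (pending_len + lengths[i] > sizeof pending)
            return -1;                    /* this sketch only handles small pieces */
        memcpy(pending + pending_len, pieces[i], lengths[i]);
        pending_len += lengths[i];
        rc = mock_to_unicode(&in, pending + pending_len, &outp, outlimit, flags);
        if (rc != MOCK_SUCCESS && rc != MOCK_INCOMPLETE)
            return -1;
        /* Keep whatever the converter did not consume for the next round. */
        pending_len -= (size_t) (in - pending);
        memmove(pending, in, pending_len);
    }
    return (long) ((size_t) (outp - (char *) out) / sizeof *out);
}
```

Splitting a double-byte character across two pieces therefore costs nothing: the lead byte simply waits in the carry-over buffer until its trail byte arrives.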

void transcript_to_unicode_reset ( transcript_t *  handle)

Reset the to-Unicode conversion to its initial state.

Parameters
handle: The converter to reset.
Note
The to-Unicode and from-Unicode conversions are reset separately.

transcript_error_t transcript_to_unicode_skip ( transcript_t *  handle,
const char **  inbuf,
const char *  inbuflimit 
)

Skip the next character in character set encoding.

Parameters
handle: The converter to use.
inbuf: A double pointer to the start of the input buffer.
inbuflimit: A pointer to the end of the input buffer.
Return values
TRANSCRIPT_SUCCESS
TRANSCRIPT_INCOMPLETE
TRANSCRIPT_INTERNAL_ERROR 

This function can be used to recover a stopped to-Unicode conversion when the next input character cannot be converted (either because the input is corrupt, or because the conversion is not permitted by the flag settings).