Index

C/C++ Functions

lxBidiApplyPlus

DT_SWORD lxBidiApplyPlus(DT_SLONG nr_of_chars, const DT_UBYTE* chars, DT_ID_SWORD chars_format, DT_ID_UBYTE types[], DT_ID_UBYTE levels[], DT_ID_UBYTE start_level, DT_SWORD flags)

Short Description: Unicode Bidirectional (BiDi) algorithm

This function applies the Unicode Bidirectional Text Algorithm (BiDi) to a run of text supplied as an array of Unicode characters. Depending on the flags parameter, it can also identify the Unicode Script of each character.

Parameters

Parameter    Description

nr_of_chars

The number of characters in the input chars array and the size of the types and levels arrays.

chars

A pointer to a memory buffer that holds the Unicode characters (input array).

chars_format

Format of the characters in the memory buffer. Can be one of the following:

  • 1 (LX_FORMAT_UBYTE) = indicates that the size of each character in the buffer is 1 byte. All characters have codepoints in the 0 - 255 range.

  • 2 (LX_FORMAT_UWORD_LE) = indicates that the size of each character in the buffer is 2 bytes (little endian byte ordering). All characters have codepoints in the 0 - 65,535 range.

  • -2 (LX_FORMAT_UWORD_BE) = indicates that the size of each character in the buffer is 2 bytes (big endian byte ordering). All characters have codepoints in the 0 - 65,535 range.

  • 3 (LX_FORMAT_UTRIO_LE) = indicates that the size of each character in the buffer is 3 bytes (little endian byte ordering). All characters have codepoints in the 0 - 16,777,215 range.

  • -3 (LX_FORMAT_UTRIO_BE) = indicates that the size of each character in the buffer is 3 bytes (big endian byte ordering). All characters have codepoints in the 0 - 16,777,215 range.

  • 4 (LX_FORMAT_ULONG_LE) = indicates that the size of each character in the buffer is 4 bytes (little endian byte ordering). All characters have codepoints in the 0 - 4,294,967,295 range.

  • -4 (LX_FORMAT_ULONG_BE) = indicates that the size of each character in the buffer is 4 bytes (big endian byte ordering). All characters have codepoints in the 0 - 4,294,967,295 range.

  • 102 (LX_FORMAT_UWORD) = indicates that the size of each character in the buffer is 2 bytes (CPU byte ordering). All characters have codepoints in the 0 - 65,535 range.

  • 104 (LX_FORMAT_ULONG) = indicates that the size of each character in the buffer is 4 bytes (CPU byte ordering). All characters have codepoints in the 0 - 4,294,967,295 range.

  • 124 (LX_FORMAT_UTF16) = indicates that the buffer is in UTF-16 format. Each character is encoded with one or two DT_ID_UWORD code units (using CPU-specific byte ordering) as per the UTF-16 encoding scheme. This variable-length encoding format is capable of encoding all 1,112,064 possible Unicode characters.
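The following is a minimal sketch of how a caller might read character number i back out of a buffer in each format, which makes the byte orderings concrete. The helper names read_fixed_char and next_utf16 are hypothetical and not part of the library. The sketch also assumes that standard <stdint.h> types can stand in for the DT_* types.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper (not a library function): returns the codepoint of
   character number i in a buffer using a fixed-width chars_format value,
   or -1 for a format it does not handle (including 124, UTF-16). */
static int64_t read_fixed_char(const uint8_t *buf, long i, int format)
{
    const uint8_t *p;
    uint16_t w;
    uint32_t d;

    switch (format) {
    case 1:   /* LX_FORMAT_UBYTE: 1 byte per character */
        return buf[i];
    case 2:   /* LX_FORMAT_UWORD_LE: low byte first */
        p = buf + 2 * i;
        return (uint32_t)p[0] | ((uint32_t)p[1] << 8);
    case -2:  /* LX_FORMAT_UWORD_BE: high byte first */
        p = buf + 2 * i;
        return ((uint32_t)p[0] << 8) | (uint32_t)p[1];
    case 3:   /* LX_FORMAT_UTRIO_LE */
        p = buf + 3 * i;
        return (uint32_t)p[0] | ((uint32_t)p[1] << 8) | ((uint32_t)p[2] << 16);
    case -3:  /* LX_FORMAT_UTRIO_BE */
        p = buf + 3 * i;
        return ((uint32_t)p[0] << 16) | ((uint32_t)p[1] << 8) | (uint32_t)p[2];
    case 4:   /* LX_FORMAT_ULONG_LE */
        p = buf + 4 * i;
        return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
               ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
    case -4:  /* LX_FORMAT_ULONG_BE */
        p = buf + 4 * i;
        return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
               ((uint32_t)p[2] << 8) | (uint32_t)p[3];
    case 102: /* LX_FORMAT_UWORD: 2 bytes in the CPU's own byte order */
        memcpy(&w, buf + 2 * i, sizeof w);
        return w;
    case 104: /* LX_FORMAT_ULONG: 4 bytes in the CPU's own byte order */
        memcpy(&d, buf + 4 * i, sizeof d);
        return d;
    default:
        return -1;
    }
}

/* Hypothetical helper: decodes the UTF-16 character starting at units[*pos]
   and advances *pos by one or two code units. An unpaired surrogate is
   returned unchanged. */
static uint32_t next_utf16(const uint16_t *units, long n_units, long *pos)
{
    uint32_t hi = units[*pos];
    (*pos)++;
    if (hi >= 0xD800 && hi <= 0xDBFF && *pos < n_units) {
        uint32_t lo = units[*pos];
        if (lo >= 0xDC00 && lo <= 0xDFFF) {
            (*pos)++;
            return 0x10000 + ((hi - 0xD800) << 10) + (lo - 0xDC00);
        }
    }
    return hi;
}
```

With LX_FORMAT_UTF16, the number of code units per character varies. A caller therefore has to walk the buffer sequentially, as next_utf16 does, instead of indexing it directly.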

types

The resulting array of character types (output array). Each element of this array identifies the Unicode Script of the corresponding Unicode character in the chars array.

levels

The resulting array of character directional levels (output array). Each element of this array describes the directional level of the corresponding Unicode character in the chars array.

start_level

Start embedding level for the run of text supplied via the chars array. Set it to an even value (0, 2, 4, ..., 60) for left-to-right text, or to an odd value (1, 3, ..., 61) for right-to-left text. The value 255 is special: it tells the function to divide the supplied text into paragraphs and determine each paragraph's embedding level from the first character in the paragraph that has a strong bidirectional category. If that character is strongly left-to-right, the paragraph's embedding level will be 0; otherwise (i.e. if the character is strongly right-to-left), it will be 1.
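The start_level rules can be summarized in a small classification helper. This is a hypothetical sketch: classify_start_level and the start_kind enum are not library names. Treating the values 62 to 254 as invalid is an assumption drawn from the documented 0 to 61 range.

```c
/* Hypothetical classification of a start_level argument. */
typedef enum { START_LTR, START_RTL, START_AUTO, START_INVALID } start_kind;

static start_kind classify_start_level(unsigned char start_level)
{
    if (start_level == 255)
        return START_AUTO;    /* per-paragraph level from first strong char */
    if (start_level > 61)
        return START_INVALID; /* outside the documented 0..61 range */
    return (start_level % 2 == 0) ? START_LTR : START_RTL; /* even = LTR */
}
```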

Review the Comments section below for more information on Unicode's bidirectional types and embedding levels.

flags

Flags to configure the behavior of the function.

  • If flags = 0, the function performs no analysis. In this case, all elements of the types and levels arrays are set to 0.
  • If flags = 1, the function applies BiDi. In this case, each element of the levels array is the directional level of the corresponding character in the chars array. All elements of the types array are set to 0.
  • If flags = 2, the function applies Unicode Script analysis. In this case, each element of the types array identifies the Unicode Script of the corresponding character in the chars array. All elements of the levels array are set to 0.
  • If flags = 3, the function applies BiDi and Unicode Script analysis. In this case, each element of the levels array is the directional level of the corresponding character in the chars array and each element of the types array identifies the Unicode Script of the corresponding character in the chars array.

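The caller allocates and frees the chars, types and levels arrays. The types and levels arrays must each hold nr_of_chars elements, and the chars buffer must hold nr_of_chars characters in the format given by chars_format.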
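The sketch below shows the caller-side allocation. It assumes that DT_ID_UBYTE is an 8-bit unsigned type, represented here as unsigned char. The helpers chars_buffer_bytes and alloc_bidi_outputs are hypothetical. The chars buffer size is computed only for the fixed-width formats, because the size of a UTF-16 buffer depends on how many characters need surrogate pairs.

```c
#include <stdlib.h>

/* Hypothetical helper: bytes needed for nr_of_chars characters in a
   fixed-width chars_format. Returns 0 for UTF-16 (124), whose size depends
   on the text, and for unknown formats. */
static size_t chars_buffer_bytes(long nr_of_chars, int chars_format)
{
    int width;
    switch (chars_format) {
    case 1:                    width = 1; break;
    case 2: case -2: case 102: width = 2; break;
    case 3: case -3:           width = 3; break;
    case 4: case -4: case 104: width = 4; break;
    default:                   return 0;
    }
    return (size_t)nr_of_chars * (size_t)width;
}

/* Hypothetical helper: allocates zeroed types[] and levels[] arrays of
   nr_of_chars one-byte elements each. Returns 1 on success, 0 on failure
   (in which case nothing is left allocated). The caller frees both. */
static int alloc_bidi_outputs(long nr_of_chars,
                              unsigned char **types, unsigned char **levels)
{
    *types = calloc((size_t)nr_of_chars, 1);
    *levels = calloc((size_t)nr_of_chars, 1);
    if (*types == NULL || *levels == NULL) {
        free(*types);
        free(*levels);
        *types = NULL;
        *levels = NULL;
        return 0;
    }
    return 1;
}
```

The two arrays would then be passed as the types and levels arguments, together with the same nr_of_chars, and released with free() once their contents have been used.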

Return Value

If successful, the function returns 1. If not successful (e.g. an error occurs or an invalid input parameter is supplied), the function returns 0.

Comments

This function implements the following rules of the Unicode Bidirectional Text Algorithm:

The function does not implement rules L1, L2, L3 and L4 (Reordering Resolved Levels), because these rules act on a per-line basis and can only be applied after the paragraph has been wrapped into lines. More details on the Unicode Bidirectional Text Algorithm can be found in the Unicode Standard and on the Unicode website (Unicode Standard Annex #9).
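Because the function stops before the L rules, a caller that needs visual order must reorder each line itself. The sketch below implements only rule L2 on the resolved levels of one line: it reverses runs from the highest level down to the lowest odd level. It deliberately skips L1 (resetting trailing whitespace and separators), L3 and L4. reorder_line_l2 is a hypothetical helper, not a library function.

```c
/* Sketch of UAX #9 rule L2 only (L1, L3 and L4 are not handled).
   levels: resolved levels of ONE line, in logical order.
   order:  output; order[k] is the logical index of the character
           shown at visual position k. */
static void reorder_line_l2(const unsigned char *levels, long n, long *order)
{
    int highest = 0, lowest = 255;
    long i;

    if (n <= 0)
        return;
    for (i = 0; i < n; i++) {
        order[i] = i;
        if (levels[i] > highest) highest = levels[i];
        if (levels[i] < lowest) lowest = levels[i];
    }

    /* Lowest odd level at or above the lowest level on the line. */
    int lowest_odd = lowest | 1;

    /* From the highest level down to the lowest odd level, reverse every
       maximal run of characters whose level is at least lvl. */
    for (int lvl = highest; lvl >= lowest_odd; lvl--) {
        i = 0;
        while (i < n) {
            if (levels[order[i]] < lvl) {
                i++;
                continue;
            }
            long start = i;
            while (i < n && levels[order[i]] >= lvl)
                i++;
            for (long a = start, b = i - 1; a < b; a++, b--) {
                long tmp = order[a];
                order[a] = order[b];
                order[b] = tmp;
            }
        }
    }
}
```

For levels {0, 1, 1, 2, 2, 1, 0}, order becomes {0, 5, 3, 4, 2, 1, 6}. The right-to-left run is reversed, while the level-2 pair inside it keeps its left-to-right order.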

Bidirectional Types

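Each Unicode character has a "bidirectional type" (for example, L for strong left-to-right letters, R and AL for strong right-to-left letters, EN for European digits, and WS for whitespace). There are many types, but they fall into three main categories: strong, weak, and neutral. Current versions of the Unicode Bidirectional Algorithm (UAX #9) also list the explicit formatting characters as a separate category.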
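For reference, the bidirectional types named in UAX #9 (Unicode 6.3 and later) and their categories can be written as an enum plus a classifier. The C names are illustrative only. They are not the values this function writes into its types array, which identifies Unicode Scripts instead.

```c
/* Bidirectional types from UAX #9, grouped by category. Illustrative names. */
typedef enum {
    BIDI_L, BIDI_R, BIDI_AL,                                        /* strong  */
    BIDI_EN, BIDI_ES, BIDI_ET, BIDI_AN, BIDI_CS, BIDI_NSM, BIDI_BN, /* weak    */
    BIDI_B, BIDI_S, BIDI_WS, BIDI_ON,                               /* neutral */
    BIDI_LRE, BIDI_LRO, BIDI_RLE, BIDI_RLO, BIDI_PDF,
    BIDI_LRI, BIDI_RLI, BIDI_FSI, BIDI_PDI                /* explicit formatting */
} bidi_type;

typedef enum { CAT_STRONG, CAT_WEAK, CAT_NEUTRAL, CAT_EXPLICIT } bidi_category;

/* Relies on the enum above listing the types in category order. */
static bidi_category bidi_category_of(bidi_type t)
{
    if (t <= BIDI_AL) return CAT_STRONG;
    if (t <= BIDI_BN) return CAT_WEAK;
    if (t <= BIDI_ON) return CAT_NEUTRAL;
    return CAT_EXPLICIT;
}
```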

Embedding Levels

The Unicode Bidirectional Algorithm works in terms of "levels" of right-to-left text embedded within left-to-right text, and vice versa.

Text at an even level is rendered left-to-right. Text at an odd level is rendered right-to-left.

The Unicode Bidirectional Algorithm works on paragraphs, so the first step is to divide the text into paragraphs. The paragraph embedding level is determined by finding the first character in the paragraph with a strong bidirectional category. If that character is strongly left-to-right, the paragraph embedding level is 0; otherwise (i.e. if the character is strongly right-to-left), the embedding level is 1. If the paragraph contains no strong character at all, the level defaults to 0.
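The next sketch shows that paragraph-level rule (UAX #9 rules P2 and P3). It uses a deliberately tiny, hypothetical classifier that treats only ASCII letters as strong left-to-right and the basic Hebrew and Arabic letters as strong right-to-left. A real implementation would look up the Bidi_Class property in the Unicode Character Database. It would also skip text between isolate initiators and their matching PDI, which this sketch does not do.

```c
#include <stdint.h>

typedef enum { STRONG_NONE, STRONG_LTR, STRONG_RTL } strong_dir;

/* Toy classifier for illustration only; everything not listed is treated
   as not strong. */
static strong_dir toy_strong_direction(uint32_t cp)
{
    if ((cp >= 'A' && cp <= 'Z') || (cp >= 'a' && cp <= 'z'))
        return STRONG_LTR;            /* Latin letters: type L */
    if (cp >= 0x05D0 && cp <= 0x05EA)
        return STRONG_RTL;            /* Hebrew letters: type R */
    if (cp >= 0x0621 && cp <= 0x064A)
        return STRONG_RTL;            /* Arabic letters: type AL */
    return STRONG_NONE;
}

/* Simplified P2/P3: the level comes from the first strong character. */
static int paragraph_level(const uint32_t *text, long n)
{
    for (long i = 0; i < n; i++) {
        strong_dir d = toy_strong_direction(text[i]);
        if (d == STRONG_LTR) return 0;
        if (d == STRONG_RTL) return 1;
    }
    return 0; /* no strong character found */
}
```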

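Embedding proceeds from there: contained text with the opposite directionality is at the next higher embedding level, and text with the original directionality that is contained by the text with the opposite directionality is one level higher again. For example, in an English paragraph (level 0) containing an explicitly embedded Arabic quotation (level 1) that itself quotes an English phrase, the inner English phrase is at level 2.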
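When nesting is marked with explicit directional controls, each new embedding takes the least greater level of the required parity, as in UAX #9 rules X2 to X5. This gives the 0, 1, 2 sequence in the example above. The helpers below are hypothetical and max_depth is passed in. The start_level range documented above (up to 61) appears to match the limit used by versions of UAX #9 before Unicode 6.3, while later versions use 125.

```c
/* Least odd level greater than 'level' (for RLE, RLO, RLI), or -1 if it
   would exceed max_depth (the embedding then counts as overflow). */
static int next_odd_level(int level, int max_depth)
{
    int next = (level + 1) | 1;
    return next <= max_depth ? next : -1;
}

/* Least even level greater than 'level' (for LRE, LRO, LRI), or -1 if it
   would exceed max_depth. */
static int next_even_level(int level, int max_depth)
{
    int next = (level + 2) & ~1;
    return next <= max_depth ? next : -1;
}
```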
