Package com.ibm.icu.impl
Class TrieBuilder
java.lang.Object
com.ibm.icu.impl.TrieBuilder
- Direct Known Subclasses:
IntTrieBuilder
Builder class to manipulate and generate a trie.
This is useful for ICU data in primitive types.
Provides a compact way to store information that is indexed by Unicode
values, such as character properties, types, keyboard values, etc. This is
very useful when you have a block of Unicode data that contains significant
values while the rest of the Unicode data is unused in the application or
when you have a lot of redundance, such as where all 21,000 Han ideographs
have the same value. However, lookup is much faster than a hash table.
A trie of any primitive data type serves two purposes:
- Fast access of the indexed values.
- Smaller memory footprint.
-
Nested Class Summary
Nested ClassesModifier and TypeClassDescriptionstatic interface
Character data in com.ibm.impl.Trie have different user-specified format for different purposes. -
Field Summary
FieldsModifier and TypeFieldDescriptionprotected static final int
Length of the BMP portion of the index (stage 1) array.static final int
Number of data values in a stage 2 (data array) block.protected static final int
The alignment size of a stage 2 data block.protected static final int
Shift size for shifting left the index array values.protected int
protected int
protected int[]
Index values at build-time are 32 bits wide for easier processing.protected int
protected boolean
protected boolean
protected int[]
Map of adjusted indexes, used in utrie_compact().protected static final int
Mask for getting the lower bits from the input index.private static final int
Maximum length of the build-time data (stage 2) array.protected static final int
Maximum length of the runtime data (stage 2) array.protected static final int
Length of the index (stage 1) array before folding.protected static final int
If set, then the data (stage 2) array is 32 bits wide.protected static final int
Shifting to position the index value in optionsprotected static final int
If set, then Latin-1 data (for U+0000..U+00ff) is stored in the data (stage 2) array as a simple, linear array at data + DATA_BLOCK_LENGTH.protected static final int
Shift size for shifting right the input index.protected static final int
Number of index (stage 1) entries per lead surrogate. -
Constructor Summary
Constructors -
Method Summary
Modifier and TypeMethodDescriptionprotected static final boolean
equal_int
(int[] array, int start1, int start2, int length) Compare two sections of an array for equality.protected static final int
findSameIndexBlock
(int[] index, int indexLength, int otherBlock) Finds the same index block as the otherBlockprotected void
Set a value in the trie index map to indicate which data block is referenced and which one is not.boolean
isInZeroBlock
(int ch) Checks if the character belongs to a zero block in the trie
-
Field Details
-
DATA_BLOCK_LENGTH
public static final int DATA_BLOCK_LENGTHNumber of data values in a stage 2 (data array) block. 2, 4, 8, .., 0x200- See Also:
-
m_index_
protected int[] m_index_Index values at build-time are 32 bits wide for easier processing. Bit 31 is set if the data block is used by multiple index values (from setRange()). -
m_indexLength_
protected int m_indexLength_ -
m_dataCapacity_
protected int m_dataCapacity_ -
m_dataLength_
protected int m_dataLength_ -
m_isLatin1Linear_
protected boolean m_isLatin1Linear_ -
m_isCompacted_
protected boolean m_isCompacted_ -
m_map_
protected int[] m_map_Map of adjusted indexes, used in utrie_compact(). Maps from original indexes to new ones. -
SHIFT_
protected static final int SHIFT_Shift size for shifting right the input index. 1..9- See Also:
-
MAX_INDEX_LENGTH_
protected static final int MAX_INDEX_LENGTH_Length of the index (stage 1) array before folding. Maximum number of Unicode code points (0x110000) shifted right by SHIFT.- See Also:
-
BMP_INDEX_LENGTH_
protected static final int BMP_INDEX_LENGTH_Length of the BMP portion of the index (stage 1) array.- See Also:
-
SURROGATE_BLOCK_COUNT_
protected static final int SURROGATE_BLOCK_COUNT_Number of index (stage 1) entries per lead surrogate. Same as number of indexe entries for 1024 trail surrogates, ==0x400>>UTRIE_SHIFT 10 - SHIFT == Number of bits of a trail surrogate that are used in index table lookups.- See Also:
-
MASK_
protected static final int MASK_Mask for getting the lower bits from the input index. DATA_BLOCK_LENGTH - 1.- See Also:
-
INDEX_SHIFT_
protected static final int INDEX_SHIFT_Shift size for shifting left the index array values. Increases possible data size with 16-bit index values at the cost of compactability. This requires blocks of stage 2 data to be aligned by UTRIE_DATA_GRANULARITY. 0..UTRIE_SHIFT- See Also:
-
MAX_DATA_LENGTH_
protected static final int MAX_DATA_LENGTH_Maximum length of the runtime data (stage 2) array. Limited by 16-bit index values that are left-shifted by INDEX_SHIFT_.- See Also:
-
OPTIONS_INDEX_SHIFT_
protected static final int OPTIONS_INDEX_SHIFT_Shifting to position the index value in options- See Also:
-
OPTIONS_DATA_IS_32_BIT_
protected static final int OPTIONS_DATA_IS_32_BIT_If set, then the data (stage 2) array is 32 bits wide.- See Also:
-
OPTIONS_LATIN1_IS_LINEAR_
protected static final int OPTIONS_LATIN1_IS_LINEAR_If set, then Latin-1 data (for U+0000..U+00ff) is stored in the data (stage 2) array as a simple, linear array at data + DATA_BLOCK_LENGTH.- See Also:
-
DATA_GRANULARITY_
protected static final int DATA_GRANULARITY_The alignment size of a stage 2 data block. Also the granularity for compaction.- See Also:
-
MAX_BUILD_TIME_DATA_LENGTH_
private static final int MAX_BUILD_TIME_DATA_LENGTH_Maximum length of the build-time data (stage 2) array. The maximum length is 0x110000 + DATA_BLOCK_LENGTH + 0x400. (Number of Unicode code points + one all-initial-value block + possible duplicate entries for 1024 lead surrogates.)- See Also:
-
-
Constructor Details
-
TrieBuilder
protected TrieBuilder() -
TrieBuilder
-
-
Method Details
-
isInZeroBlock
public boolean isInZeroBlock(int ch) Checks if the character belongs to a zero block in the trie- Parameters:
ch
- codepoint which data is to be retrieved- Returns:
- true if ch is in the zero block
-
equal_int
protected static final boolean equal_int(int[] array, int start1, int start2, int length) Compare two sections of an array for equality. -
findUnusedBlocks
protected void findUnusedBlocks()Set a value in the trie index map to indicate which data block is referenced and which one is not. utrie_compact() will remove data blocks that are not used at all. Set - 0 if it is used - -1 if it is not used -
findSameIndexBlock
protected static final int findSameIndexBlock(int[] index, int indexLength, int otherBlock) Finds the same index block as the otherBlock- Parameters:
index
- arrayindexLength
- size of indexotherBlock
-- Returns:
- same index block
-