Package com.ibm.icu.text
Class CharsetRecog_2022
java.lang.Object
com.ibm.icu.text.CharsetRecognizer
com.ibm.icu.text.CharsetRecog_2022
- Direct Known Subclasses:
CharsetRecog_2022.CharsetRecog_2022CN, CharsetRecog_2022.CharsetRecog_2022JP, CharsetRecog_2022.CharsetRecog_2022KR
Class CharsetRecog_2022 is part of the ICU charset detection implementation.
This is a superclass for the individual detectors for
each of the detectable members of the ISO 2022 family
of encodings.
The separate classes are nested within this class.
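The layout described above, a shared superclass with one nested static class per encoding, can be sketched as follows. This is an illustration only, not the ICU source: the names Iso2022FamilySketch, Jp, Kr, Cn and containsKnownEscape are hypothetical, the escape-sequence tables are deliberately partial, and the shared check is a simplification standing in for the real scoring logic.

```java
import java.util.Arrays;

// Hypothetical stand-in for the superclass/nested-class layout; not ICU source.
abstract class Iso2022FamilySketch {
    // Each nested detector reports the charset it recognizes ...
    abstract String getName();

    // ... and supplies the escape sequences that identify that charset.
    abstract byte[][] escapeSequences();

    // Shared logic lives in the superclass. This simplified check (an
    // assumption, not ICU's scoring) only reports whether any of the
    // subclass's escape sequences occurs somewhere in the input.
    boolean containsKnownEscape(byte[] text) {
        for (byte[] seq : escapeSequences()) {
            for (int i = 0; i + seq.length <= text.length; i++) {
                if (Arrays.equals(seq, Arrays.copyOfRange(text, i, i + seq.length))) {
                    return true;
                }
            }
        }
        return false;
    }

    static class Jp extends Iso2022FamilySketch {
        String getName() { return "ISO-2022-JP"; }
        byte[][] escapeSequences() {
            return new byte[][] {
                {0x1b, 0x24, 0x42},        // ESC $ B   (JIS X 0208)
                {0x1b, 0x28, 0x42},        // ESC ( B   (ASCII)
            };
        }
    }

    static class Kr extends Iso2022FamilySketch {
        String getName() { return "ISO-2022-KR"; }
        byte[][] escapeSequences() {
            return new byte[][] {
                {0x1b, 0x24, 0x29, 0x43},  // ESC $ ) C (KS X 1001)
            };
        }
    }

    static class Cn extends Iso2022FamilySketch {
        String getName() { return "ISO-2022-CN"; }
        byte[][] escapeSequences() {
            return new byte[][] {
                {0x1b, 0x24, 0x29, 0x41},  // ESC $ ) A (GB 2312)
            };
        }
    }
}
```

Keeping the per-encoding data in the nested classes and the scanning code in the superclass means a new family member would only need its own table and name.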
-
Nested Class Summary
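Nested Classes
Modifier and Type                 Class
(package private) static class    CharsetRecog_2022.CharsetRecog_2022CN
(package private) static class    CharsetRecog_2022.CharsetRecog_2022JP
(package private) static class    CharsetRecog_2022.CharsetRecog_2022KR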
-
Constructor Summary
Constructors
-
Method Summary
Modifier and Type: (package private) int
Method: match(byte[] text, int textLen, byte[][] escapeSequences)
Description: Matching function shared among the 2022 detectors JP, CN and KR. Counts the legal and unrecognized escape sequences in the sample of text, and computes a score based on the total number and the proportion that fit the encoding.
Methods inherited from class com.ibm.icu.text.CharsetRecognizer
getLanguage, getName, match
-
Constructor Details
-
CharsetRecog_2022
CharsetRecog_2022()
-
-
Method Details
-
match
int match(byte[] text, int textLen, byte[][] escapeSequences)
Matching function shared among the 2022 detectors JP, CN and KR. Counts the legal and unrecognized escape sequences in the sample of text, and computes a score based on the total number and the proportion that fit the encoding.
- Parameters:
text - the byte buffer containing the text to analyse
textLen - the number of bytes of text in the buffer
escapeSequences - the byte escape sequences to test for
- Returns:
match quality, in the range of 0-100.
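A minimal sketch of the counting approach described for match follows. It is not the ICU implementation: the class Iso2022ScoreSketch, the methods score and startsWithAt, the scoring formula and the penalty for small samples are all assumptions chosen to illustrate "total number and proportion that fit". Only the general idea (count recognized versus unrecognized ESC sequences, then map the counts to 0-100) comes from the description above.

```java
// Illustrative only; names and formula are hypothetical, not ICU's.
final class Iso2022ScoreSketch {
    private static final byte ESC = 0x1b;

    // True if seq occurs in text starting at offset pos (within textLen bytes).
    static boolean startsWithAt(byte[] text, int textLen, int pos, byte[] seq) {
        if (seq.length == 0 || textLen - pos < seq.length) {
            return false;
        }
        for (int j = 0; j < seq.length; j++) {
            if (text[pos + j] != seq[j]) {
                return false;
            }
        }
        return true;
    }

    static int score(byte[] text, int textLen, byte[][] escapeSequences) {
        int hits = 0;    // ESC sequences that belong to the candidate encoding
        int misses = 0;  // ESC bytes that start no known sequence
        int i = 0;
        while (i < textLen) {
            if (text[i] != ESC) {
                i++;
                continue;
            }
            boolean recognized = false;
            for (byte[] seq : escapeSequences) {
                if (startsWithAt(text, textLen, i, seq)) {
                    hits++;
                    i += seq.length;  // skip past the matched sequence
                    recognized = true;
                    break;
                }
            }
            if (!recognized) {
                misses++;
                i++;
            }
        }
        if (hits == 0) {
            return 0;  // no evidence for this encoding
        }
        // Assumed formula: the proportion of recognized sequences sets the base score ...
        int quality = (100 * hits - 100 * misses) / (hits + misses);
        // ... and a small total number of hits lowers the confidence (assumed penalty).
        if (hits < 5) {
            quality -= (5 - hits) * 10;
        }
        return Math.max(0, Math.min(100, quality));
    }

    public static void main(String[] args) {
        byte[][] jpEscapes = {
            {0x1b, 0x24, 0x42},  // ESC $ B
            {0x1b, 0x28, 0x42},  // ESC ( B
        };
        byte[] sample = {0x1b, 0x24, 0x42, 0x30, 0x21, 0x1b, 0x28, 0x42, 0x41};
        System.out.println(score(sample, sample.length, jpEscapes));  // prints 70
    }
}
```

In this sketch the sample contains two recognized sequences and no unrecognized ones, so the base score is 100 and the assumed small-sample penalty brings it to 70. Input with no recognized escape sequence scores 0.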
-