Class TransliteratorParser
java.lang.Object
com.ibm.icu.text.TransliteratorParser
Nested Class Summary
Nested Classes
Modifier and Type | Class | Description
private class | TransliteratorParser.ParseData | This class implements the SymbolTable interface.
private static class | TransliteratorParser.RuleArray | RuleBody subclass for a String[] array.
private static class | TransliteratorParser.RuleBody | A private abstract class representing the interface to rule source code that is broken up into lines.
private static class | TransliteratorParser.RuleHalf | A class representing one side of a rule.
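Field Summary
Fields
Modifier and Type | Field | Description
private static final char | ALT_FORWARD_RULE_OP
private static final char | ALT_FUNCTION
private static final char | ALT_FWDREV_RULE_OP
private static final char | ALT_REVERSE_RULE_OP
private static final char | ANCHOR_START
(type not shown) | compoundFilter | PUBLIC data member containing the parsed compound filter, if any.
private static final char | CONTEXT_ANTE
private static final char | CONTEXT_POST
private RuleBasedTransliterator.Data | curData | The current data object for which we are parsing rules.
private static final char | CURSOR_OFFSET
private static final char | CURSOR_POS
(type not shown) | dataVector | PUBLIC data member.
private int | direction
private static final char | DOT
private static final String | DOT_SET
private int | dotStandIn | The stand-in character for the 'dot' set, represented by '.' in patterns.
private static final char | END_OF_RULE
private static final char | ESCAPE
private static final char | FORWARD_RULE_OP
private static final char | FUNCTION
private static final char | FWDREV_RULE_OP
private static final String | HALF_ENDERS
private static final String | ID_TOKEN
private static final int | ID_TOKEN_LEN
(type not shown) | idBlockVector | PUBLIC data member.
private static UnicodeSet | ILLEGAL_FUNC
private static UnicodeSet | ILLEGAL_SEG
private static UnicodeSet | ILLEGAL_TOP
private static final char | KLEENE_STAR
private static final char | ONE_OR_MORE
private static final String | OPERATORS
private TransliteratorParser.ParseData | parseData | Temporary symbol table used during parsing.
private static final char | QUOTE
private static final char | REVERSE_RULE_OP
private static final char | RULE_COMMENT_CHAR
private static final char | SEGMENT_CLOSE
private static final char | SEGMENT_OPEN
private List<StringMatcher> | segmentObjects | Vector of StringMatcher objects for segments.
private StringBuilder | segmentStandins | String of standins for segments.
private String | undefinedVariableName | When we encounter an undefined variable, we do not immediately signal an error, in case we are defining this variable, e.g., "$a = [a-z];".
private static final char | VARIABLE_DEF_OP
private char | variableLimit | The last available stand-in for variables.
(type not shown) | variableNames | Temporary table of variable names.
private char | variableNext | The next available stand-in for variables.
(type not shown) | variablesVector | Temporary vector of set variables.
private static final char | ZERO_OR_ONE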
Constructor Summary
Constructors
Constructor | Description
TransliteratorParser() | Constructor.
Method Summary
Modifier and Type | Method | Description
private void | appendVariableDef(String name, StringBuilder buf) | Append the value of the given variable name to the given StringBuilder.
private void | checkVariableRange(int ch, String rule, int start) | Assert that the given character is NOT within the variable range.
(package private) char | generateStandInFor(Object obj) | Generate and return a stand-in for a new UnicodeMatcher or UnicodeReplacer.
(package private) char | getDotStandIn() | Return the stand-in for the dot set.
char | getSegmentStandin(int seg) | Return the standin for segment seg (1-based).
void | parse (signature not shown) | Parse a set of rules.
private int | parsePragma(String rule, int pos, int limit) | Parse a pragma.
private int | parseRule (signature not shown) | MAIN PARSER.
(package private) void | parseRules(TransliteratorParser.RuleBody ruleArray, int dir) | Parse an array of zero or more rules.
private final char | parseSet(String rule, ParsePosition pos) | Parse a UnicodeSet out, store it, and return the stand-in character used to represent it.
private void | pragmaMaximumBackup(int backup) | Set the maximum backup to 'backup', in response to a pragma statement.
private void | pragmaNormalizeRules (signature not shown) | Begin normalizing all rules using the given mode, in response to a pragma statement.
(package private) static boolean | resemblesPragma(String rule, int pos, int limit) | Return true if the given rule looks like a pragma.
(package private) static final int | ruleEnd (signature not shown)
void | setSegmentObject(int seg, StringMatcher obj) | Set the object for segment seg (1-based).
private void | setVariableRange(int start, int end) | Set the variable range to [start, end] (inclusive).
(package private) static final void | syntaxError(String msg, String rule, int start) | Throw an exception indicating a syntax error.
Field Details
dataVector
PUBLIC data member. A Vector of RuleBasedTransliterator.Data objects, one for each discrete group of rules in the rule set.

idBlockVector
PUBLIC data member.

curData
The current data object for which we are parsing rules.

compoundFilter
PUBLIC data member containing the parsed compound filter, if any.

direction
private int direction

parseData
Temporary symbol table used during parsing.

variablesVector
Temporary vector of set variables.

variableNames
Temporary table of variable names.
segmentStandins
String of standins for segments. Used during the parsing of a single rule. segmentStandins.charAt(0) is the standin for "$1" and corresponds to StringMatcher object segmentObjects.get(0), etc.

segmentObjects
Vector of StringMatcher objects for segments. Used during the parsing of a single rule. segmentStandins.charAt(0) is the standin for "$1" and corresponds to StringMatcher object segmentObjects.get(0), etc.
variableNext
private char variableNext
The next available stand-in for variables. This starts at some point in the private use area (discovered dynamically) and increments up toward variableLimit. At any point during parsing, the available variables are variableNext..variableLimit-1.

variableLimit
private char variableLimit
The exclusive upper limit of the stand-ins available for variables; the last available stand-in is variableLimit-1. This is discovered dynamically. At any point during parsing, the available variables are variableNext..variableLimit-1. During variable definition we use the special value variableLimit-1 as a placeholder.
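The range bookkeeping for variableNext and variableLimit can be sketched as follows. This is a simplified, hypothetical model rather than ICU's code. StandInRange, variablesBase, lookup, and available are illustrative names. The constructor follows the inclusive [start, end] convention that setVariableRange documents. Each generateStandInFor call consumes one character from the range and records the object, so the stand-in can later be mapped back to it.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical model of stand-in allocation from [variableNext, variableLimit) (not ICU code). */
class StandInRange {
    private final char variablesBase; // first stand-in of the range
    private char variableNext;        // next free stand-in
    private final char variableLimit; // exclusive limit
    // Index i holds the object represented by stand-in (variablesBase + i).
    private final List<Object> variablesVector = new ArrayList<>();

    /** Like setVariableRange(start, end): end is inclusive, so the exclusive limit is end + 1. */
    StandInRange(int start, int end) {
        if (start < 0 || end >= 0xFFFF || start > end) {
            throw new IllegalArgumentException("Invalid variable range");
        }
        variablesBase = (char) start;
        variableNext = (char) start;
        variableLimit = (char) (end + 1);
    }

    /** Hands out the next stand-in and stores obj under it. */
    char generateStandInFor(Object obj) {
        if (variableNext >= variableLimit) {
            throw new IllegalStateException("Variable range exhausted");
        }
        variablesVector.add(obj);
        return variableNext++;
    }

    /** Returns the object behind stand-in c, or null if c is not an allocated stand-in. */
    Object lookup(char c) {
        int i = c - variablesBase;
        return (i >= 0 && i < variablesVector.size()) ? variablesVector.get(i) : null;
    }

    /** Number of stand-ins still available. */
    int available() {
        return variableLimit - variableNext;
    }
}
```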
undefinedVariableName
When we encounter an undefined variable, we do not immediately signal an error, in case we are defining this variable, e.g., "$a = [a-z];". Instead, we save the name of the undefined variable, and substitute in the placeholder char variableLimit - 1, and decrement variableLimit.
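Below is a minimal sketch of this deferral, written in the style of appendVariableDef. It is hypothetical. Variable values are plain Strings, and the class name VariableRefs is invented. Treating a second undefined name as an immediate error, on the grounds that only one placeholder exists, is an assumption of this sketch.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical model of variable references with one deferred undefined name (not ICU code). */
class VariableRefs {
    private final Map<String, String> variableNames = new HashMap<>(); // name -> substituted text
    private char variableNext;
    private char variableLimit;           // exclusive limit of available stand-ins
    private String undefinedVariableName; // at most one pending undefined name

    VariableRefs(char next, char limit) {
        variableNext = next;
        variableLimit = limit;
    }

    void define(String name, String value) {
        variableNames.put(name, value);
    }

    /** Appends the value of $name, or the placeholder variableLimit - 1 for the first undefined name. */
    void appendVariableDef(String name, StringBuilder buf) {
        String value = variableNames.get(name);
        if (value != null) {
            buf.append(value);
            return;
        }
        if (undefinedVariableName != null) {
            // Only one placeholder exists, so a second undefined name is an error here.
            throw new IllegalArgumentException("Undefined variable $" + name);
        }
        if (variableNext >= variableLimit) {
            throw new IllegalStateException("Private use variables exhausted");
        }
        undefinedVariableName = name;
        // Placeholder is the old variableLimit - 1; the available range shrinks by one.
        buf.append(--variableLimit);
    }

    String undefinedVariableName() {
        return undefinedVariableName;
    }

    char variableLimit() {
        return variableLimit;
    }
}
```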
dotStandIn
private int dotStandIn
The stand-in character for the 'dot' set, represented by '.' in patterns. This is allocated the first time it is needed, and reused thereafter.
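The allocate-once behavior can be sketched as below. The text does not say why dotStandIn is declared int while the other stand-ins are char. This sketch assumes the wider type lets -1 mark "not yet allocated". The class DotStandInHolder, its range fields, and the allocation counter are hypothetical.

```java
/** Hypothetical sketch of a lazily allocated, reused stand-in for '.' (not ICU code). */
class DotStandInHolder {
    private int dotStandIn = -1;      // assumed sentinel: -1 means "not allocated yet"
    private char variableNext;        // next free stand-in
    private final char variableLimit; // exclusive limit
    private int allocations = 0;      // counts real allocations, for demonstration

    DotStandInHolder(char next, char limit) {
        variableNext = next;
        variableLimit = limit;
    }

    /** Allocates a stand-in on the first call and returns the same one afterwards. */
    char getDotStandIn() {
        if (dotStandIn == -1) {
            if (variableNext >= variableLimit) {
                throw new IllegalStateException("Variable range exhausted");
            }
            dotStandIn = variableNext++;
            allocations++;
        }
        return (char) dotStandIn;
    }

    int allocations() {
        return allocations;
    }
}
```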
ID_TOKEN
private static final String ID_TOKEN

ID_TOKEN_LEN
private static final int ID_TOKEN_LEN

VARIABLE_DEF_OP
private static final char VARIABLE_DEF_OP

FORWARD_RULE_OP
private static final char FORWARD_RULE_OP

REVERSE_RULE_OP
private static final char REVERSE_RULE_OP

FWDREV_RULE_OP
private static final char FWDREV_RULE_OP

OPERATORS
private static final String OPERATORS

HALF_ENDERS
private static final String HALF_ENDERS

QUOTE
private static final char QUOTE

ESCAPE
private static final char ESCAPE

END_OF_RULE
private static final char END_OF_RULE

RULE_COMMENT_CHAR
private static final char RULE_COMMENT_CHAR

CONTEXT_ANTE
private static final char CONTEXT_ANTE

CONTEXT_POST
private static final char CONTEXT_POST

CURSOR_POS
private static final char CURSOR_POS

CURSOR_OFFSET
private static final char CURSOR_OFFSET

ANCHOR_START
private static final char ANCHOR_START

KLEENE_STAR
private static final char KLEENE_STAR

ONE_OR_MORE
private static final char ONE_OR_MORE

ZERO_OR_ONE
private static final char ZERO_OR_ONE

DOT
private static final char DOT

DOT_SET
private static final String DOT_SET

SEGMENT_OPEN
private static final char SEGMENT_OPEN

SEGMENT_CLOSE
private static final char SEGMENT_CLOSE

FUNCTION
private static final char FUNCTION

ALT_REVERSE_RULE_OP
private static final char ALT_REVERSE_RULE_OP

ALT_FORWARD_RULE_OP
private static final char ALT_FORWARD_RULE_OP

ALT_FWDREV_RULE_OP
private static final char ALT_FWDREV_RULE_OP

ALT_FUNCTION
private static final char ALT_FUNCTION

ILLEGAL_TOP
private static UnicodeSet ILLEGAL_TOP

ILLEGAL_SEG
private static UnicodeSet ILLEGAL_SEG

ILLEGAL_FUNC
private static UnicodeSet ILLEGAL_FUNC

Constructor Details
TransliteratorParser
public TransliteratorParser()
Constructor.

Method Details
parse
Parse a set of rules. After the parse completes, examine the public data members for results.
parseRules
Parse an array of zero or more rules. The strings in the array are treated as if they were concatenated together, with rule terminators inserted between array elements if not present already. Any previous rules are discarded. Typically this method is called exactly once, during construction. The member this.data will be set to null if there are no rules.
Throws:
IllegalIcuArgumentException - if there is a syntax error in the rules
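The concatenation rule can be illustrated with a small helper. This is a hypothetical sketch, not parseRules itself. It takes a plain List<String> instead of a RuleBody and assumes ';' as the rule terminator. Unlike a real rule parser, it ignores quoting, escaping, and comments when deciding whether an element already ends with a terminator.

```java
import java.util.List;

/** Hypothetical sketch: join rule strings, inserting ';' between elements where missing. */
class RuleConcatenation {
    static final char END_OF_RULE = ';'; // assumed rule terminator

    static String concatenate(List<String> rules) {
        StringBuilder out = new StringBuilder();
        for (String rule : rules) {
            // Insert a terminator only between elements, and only if one is not already there.
            if (out.length() > 0 && !endsWithTerminator(out)) {
                out.append(END_OF_RULE);
            }
            out.append(rule);
        }
        return out.toString();
    }

    /** True if the last non-whitespace character is the terminator (quoting is not considered). */
    private static boolean endsWithTerminator(CharSequence s) {
        int i = s.length() - 1;
        while (i >= 0 && Character.isWhitespace(s.charAt(i))) {
            i--;
        }
        return i >= 0 && s.charAt(i) == END_OF_RULE;
    }
}
```

For example, joining "a > b", "c > d;" and "e > f" yields "a > b;c > d;e > f". A terminator is added after the first element only.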
parseRule
MAIN PARSER. Parse the next rule in the given rule string, starting at pos. Return the index after the last character parsed. Do not parse characters at or after limit. Important: The character at pos must be a non-whitespace character that is not the comment character. This method handles quoting, escaping, and whitespace removal. It parses the end-of-rule character. It recognizes context and cursor indicators. Once it does a lexical breakdown of the rule at pos, it creates a rule object and adds it to our rule list. This method is tightly coupled to the inner class RuleHalf.
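A small scanner can approximate the lexical breakdown by splitting one rule into its two halves around the operator. This is a hypothetical sketch. The operator set (=, >, <, <>, and the arrows ←, →, ↔) and the quote, escape, and terminator characters are assumptions based on ICU's transform rule syntax. The sketch does not handle context, cursor, segments, variables, or doubled quotes (''). It also loses the distinction between quoted and unquoted characters, which a real parser must preserve.

```java
/** Hypothetical lexical split of one rule into (left half, operator, right half) (not ICU code). */
final class RuleSplitter {
    static final String OPERATORS = "=><\u2190\u2192\u2194"; // assumed operator characters
    static final char QUOTE = '\'';
    static final char ESCAPE = '\\';
    static final char END_OF_RULE = ';';

    static final class Split {
        final String left;
        final String op;
        final String right;
        final int next; // index just past the terminating ';', or limit if none was found

        Split(String left, String op, String right, int next) {
            this.left = left;
            this.op = op;
            this.right = right;
            this.next = next;
        }
    }

    static Split split(String rule, int pos, int limit) {
        StringBuilder left = new StringBuilder();
        StringBuilder right = new StringBuilder();
        StringBuilder current = left;
        String op = null;
        boolean inQuote = false;
        int i = pos;
        while (i < limit) {
            char c = rule.charAt(i++);
            if (inQuote) {
                if (c == QUOTE) {
                    inQuote = false;
                } else {
                    current.append(c); // quoted text is literal, whitespace included
                }
            } else if (c == QUOTE) {
                inQuote = true;
            } else if (c == ESCAPE) {
                if (i >= limit) {
                    throw new IllegalArgumentException("Trailing backslash");
                }
                current.append(rule.charAt(i++)); // escaped character is taken literally
            } else if (c == END_OF_RULE) {
                break; // i already points past the terminator
            } else if (Character.isWhitespace(c)) {
                // unquoted whitespace is dropped
            } else if (OPERATORS.indexOf(c) >= 0) {
                if (op != null) {
                    throw new IllegalArgumentException("More than one operator");
                }
                op = String.valueOf(c);
                if (c == '<' && i < limit && rule.charAt(i) == '>') {
                    op = "<>"; // two-character forward-and-reverse operator
                    i++;
                }
                current = right;
            } else {
                current.append(c);
            }
        }
        if (inQuote) {
            throw new IllegalArgumentException("Unterminated quote");
        }
        if (op == null) {
            throw new IllegalArgumentException("Missing operator");
        }
        return new Split(left.toString(), op, right.toString(), i);
    }
}
```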
setVariableRange
private void setVariableRange(int start, int end)
Set the variable range to [start, end] (inclusive).

checkVariableRange
Assert that the given character is NOT within the variable range. If it is, signal an error. This is necessary to ensure that the variable range does not overlap characters used in a rule.
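The check amounts to a half-open range test. In the sketch below, VariableRangeCheck and checkRule are hypothetical names. The range is given in the inclusive [start, end] form used by setVariableRange. IllegalArgumentException stands in for whatever error the real code signals.

```java
/** Hypothetical sketch of checkVariableRange (not ICU code). */
class VariableRangeCheck {
    private final int variablesBase; // first stand-in
    private final int variableLimit; // exclusive limit

    /** end is inclusive, as in setVariableRange(start, end). */
    VariableRangeCheck(int start, int end) {
        this.variablesBase = start;
        this.variableLimit = end + 1;
    }

    /** Signals an error if ch falls inside the stand-in range; start is the rule's start index. */
    void checkVariableRange(int ch, String rule, int start) {
        if (ch >= variablesBase && ch < variableLimit) {
            throw new IllegalArgumentException(
                    "Variable range character in rule starting at index " + start + ": " + rule);
        }
    }

    /** Checks every code point of a rule that starts at index 0. */
    void checkRule(String rule) {
        rule.codePoints().forEach(cp -> checkVariableRange(cp, rule, 0));
    }
}
```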
pragmaMaximumBackup
private void pragmaMaximumBackup(int backup)
Set the maximum backup to 'backup', in response to a pragma statement.

pragmaNormalizeRules
Begin normalizing all rules using the given mode, in response to a pragma statement.
resemblesPragma
Return true if the given rule looks like a pragma.
Parameters:
pos - offset to the first non-whitespace character of the rule.
limit - index just past the last character of the rule.
parsePragma
Parse a pragma. This method assumes resemblesPragma() has already returned true.
Parameters:
pos - offset to the first non-whitespace character of the rule.
limit - index just past the last character of the rule.
Returns:
the position index after the final ';' of the pragma, or -1 on failure.
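The sketch below parses three pragma forms. As the Returns clause describes, it returns the index after the final ';' or -1. The pragma spellings (use variable range, use maximum backup, use nfd rules / use nfc rules) are assumed from ICU's transform rule syntax. The result fields are hypothetical stand-ins for calls to setVariableRange, pragmaMaximumBackup, and pragmaNormalizeRules.

```java
import java.util.Locale;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical sketch of parsePragma for three assumed pragma forms (not ICU code). */
class PragmaParser {
    private static final String NUMBER = "(0x[0-9a-f]+|\\d+)"; // hex or decimal
    private static final Pattern VARIABLE_RANGE = Pattern.compile(
            "use\\s+variable\\s+range\\s+" + NUMBER + "\\s+" + NUMBER + "\\s*;", Pattern.CASE_INSENSITIVE);
    private static final Pattern MAXIMUM_BACKUP = Pattern.compile(
            "use\\s+maximum\\s+backup\\s+(\\d+)\\s*;", Pattern.CASE_INSENSITIVE);
    private static final Pattern NORMALIZE_RULES = Pattern.compile(
            "use\\s+(nfd|nfc)\\s+rules\\s*;", Pattern.CASE_INSENSITIVE);

    // Results that the real parser would pass on to its pragma handlers.
    int rangeStart = -1;
    int rangeEnd = -1;
    int maximumBackup = -1;
    String normalization = null;

    /** Returns the index just past the pragma's ';', or -1 if no known pragma starts at pos. */
    int parsePragma(String rule, int pos, int limit) {
        Matcher m = VARIABLE_RANGE.matcher(rule).region(pos, limit);
        if (m.lookingAt()) {
            rangeStart = parseNumber(m.group(1));
            rangeEnd = parseNumber(m.group(2));
            return m.end(); // end() is an index into the whole string, not the region
        }
        m = MAXIMUM_BACKUP.matcher(rule).region(pos, limit);
        if (m.lookingAt()) {
            maximumBackup = Integer.parseInt(m.group(1));
            return m.end();
        }
        m = NORMALIZE_RULES.matcher(rule).region(pos, limit);
        if (m.lookingAt()) {
            normalization = m.group(1).toUpperCase(Locale.ROOT);
            return m.end();
        }
        return -1;
    }

    private static int parseNumber(String s) {
        if (s.length() > 2 && (s.charAt(1) == 'x' || s.charAt(1) == 'X')) {
            return Integer.parseInt(s.substring(2), 16);
        }
        return Integer.parseInt(s);
    }
}
```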
syntaxError
Throw an exception indicating a syntax error. Search the rule string for the probable end of the rule. If the error is that the end-of-rule marker is missing, the rule end will not be found; in either case, the rule start is reported correctly.
Parameters:
msg - error description
rule - pattern string
start - position of first character of current rule
ruleEnd
parseSet
Parse a UnicodeSet out of the rule, store it, and return the stand-in character used to represent it.

generateStandInFor
Generate and return a stand-in for a new UnicodeMatcher or UnicodeReplacer, and store the object.

getSegmentStandin
public char getSegmentStandin(int seg)
Return the standin for segment seg (1-based).

setSegmentObject
Set the object for segment seg (1-based).

getDotStandIn
char getDotStandIn()
Return the stand-in for the dot set. It is allocated the first time and reused thereafter.
appendVariableDef
Append the value of the given variable name to the given StringBuilder.
Throws:
IllegalIcuArgumentException - if the name is unknown.