Package com.univocity.parsers.common
Class NormalizedString
java.lang.Object
com.univocity.parsers.common.NormalizedString
All Implemented Interfaces:
Serializable, CharSequence, Comparable<NormalizedString>
public final class NormalizedString
extends Object
implements Serializable, Comparable<NormalizedString>, CharSequence
A NormalizedString represents text in a normalized fashion: strings that differ only in character case or surrounding whitespace are considered the same. It is used to represent groups of fields whose names users may refer to with different character cases or surrounding whitespace.
Where the character case or the surrounding whitespace is relevant, the NormalizedString will have its isLiteral() method return true, meaning the exact character case and surrounding whitespace are required for matching it.
Invoking valueOf(String) with a String surrounded by single quotes will create a literal NormalizedString. Use literalValueOf(String) to obtain the same NormalizedString without having to introduce single quotes.
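The matching rules above can be sketched in plain Java. This is a simplified, hypothetical model: the class `NormKey` and its methods are illustrative names, not part of univocity-parsers, and it assumes normalization means trimming surrounding whitespace and lower-casing. The library's actual normalization may differ.

```java
import java.util.Locale;

// Hypothetical, simplified model of the matching rules; NOT the univocity code.
// Assumption: normalization = trim surrounding whitespace + lower-case.
final class NormKey {
    private final String original;
    private final boolean literal;

    private NormKey(String original, boolean literal) {
        this.original = original;
        this.literal = literal;
    }

    // Non-literal: matching ignores character case and surrounding whitespace.
    static NormKey of(String s) {
        return new NormKey(s, false);
    }

    // Literal: matching requires the exact same text.
    static NormKey literalOf(String s) {
        return new NormKey(s, true);
    }

    static String normalize(String s) {
        return s.trim().toLowerCase(Locale.ROOT);
    }

    boolean isLiteral() {
        return literal;
    }

    boolean matches(String other) {
        return literal ? original.equals(other)
                       : normalize(original).equals(normalize(other));
    }

    public static void main(String[] args) {
        System.out.println(of("Name").matches("  NAME "));     // true
        System.out.println(literalOf("Name").matches("NAME")); // false
    }
}
```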
Field Summary
Fields
Modifier and Type / Field
private final int hashCode
private final boolean literal
private final String normalized
private final String original
private static final long serialVersionUID
private static final StringCache<NormalizedString> stringCache
-
Constructor Summary
Constructors
-
Method Summary
Modifier and Type / Method / Description
char charAt(int index)
int compareTo(NormalizedString)
int compareTo(String o)
    Compares a NormalizedString against a String lexicographically.
boolean equals(Object)
static StringCache<NormalizedString> getCache()
    Returns the internal string cache to allow users to tweak its size limit or clear it when appropriate.
private static <T extends Collection<String>> T getCollection(T out, NormalizedString... args)
private static <T extends Collection<NormalizedString>> T getCollection(T out, String... args)
private static <T extends Collection<NormalizedString>> T getCollection(T out, Collection<String> args)
private static <T extends Collection<String>> T getStringCollection(T out, Collection<NormalizedString> args)
int hashCode()
static boolean identifyLiterals(NormalizedString[] strings)
    Analyzes a group of NormalizedString to identify any instances whose normalized content will generate clashes.
static boolean identifyLiterals(NormalizedString[] strings, boolean lowercaseIdentifiers, boolean uppercaseIdentifiers)
    Analyzes a group of NormalizedString to identify any instances whose normalized content will generate clashes.
boolean isLiteral()
int length()
static NormalizedString literalValueOf(String string)
    Creates a literal NormalizedString, meaning it will only match with other String or NormalizedString instances if they have the exact same content, including character case and surrounding whitespace.
private String normalize
private static boolean shouldBeLiteral(String string, boolean lowercaseIdentifiers, boolean uppercaseIdentifiers)
CharSequence subSequence(int start, int end)
static String[] toArray(NormalizedString... args)
    Converts multiple normalized strings into an array of String.
static NormalizedString[] toArray(String... args)
    Converts multiple plain strings into an array of NormalizedString.
static NormalizedString[] toArray(Collection<String> args)
    Converts a collection of plain strings into an array of NormalizedString.
static ArrayList<NormalizedString> toArrayList(String... args)
    Converts multiple plain strings into an ArrayList of NormalizedString.
static ArrayList<NormalizedString> toArrayList(Collection<String> args)
    Converts multiple plain strings into an ArrayList of NormalizedString.
static ArrayList<String> toArrayListOfStrings(NormalizedString... args)
    Converts multiple normalized strings into an ArrayList of String.
static ArrayList<String> toArrayListOfStrings(Collection<NormalizedString> args)
    Converts multiple normalized strings into an ArrayList of String.
static HashSet<NormalizedString> toHashSet(String... args)
    Converts multiple plain strings into a HashSet of NormalizedString.
static HashSet<NormalizedString> toHashSet(Collection<String> args)
    Converts multiple plain strings into a HashSet of NormalizedString.
static HashSet<String> toHashSetOfStrings(NormalizedString... args)
    Converts multiple normalized strings into a HashSet of String.
static HashSet<String> toHashSetOfStrings(Collection<NormalizedString> args)
    Converts multiple normalized strings into a HashSet of String.
static NormalizedString[] toIdentifierGroupArray(NormalizedString[] strings)
    Analyzes a group of NormalizedString to identify any instances whose normalized content will generate clashes.
static NormalizedString[] toIdentifierGroupArray(String[] strings)
    Analyzes a group of String to identify any instances whose normalized content will generate clashes.
static LinkedHashSet<NormalizedString> toLinkedHashSet(String... args)
    Converts multiple plain strings into a LinkedHashSet of NormalizedString.
static LinkedHashSet<NormalizedString> toLinkedHashSet(Collection<String> args)
    Converts multiple plain strings into a LinkedHashSet of NormalizedString.
static LinkedHashSet<String> toLinkedHashSetOfStrings(NormalizedString... args)
    Converts multiple normalized strings into a LinkedHashSet of String.
static LinkedHashSet<String> toLinkedHashSetOfStrings(Collection<NormalizedString> args)
    Converts multiple normalized strings into a LinkedHashSet of String.
NormalizedString toLiteral()
    Returns the literal representation of this NormalizedString, meaning it will only match with other String or NormalizedString instances if they have the exact same content, including character case and surrounding whitespace.
String toString()
static String[] toStringArray(Collection<NormalizedString> args)
    Converts a collection of normalized strings into an array of String.
static TreeSet<NormalizedString> toTreeSet(String... args)
    Converts multiple plain strings into a TreeSet of NormalizedString.
static TreeSet<NormalizedString> toTreeSet(Collection<String> args)
    Converts multiple plain strings into a TreeSet of NormalizedString.
static TreeSet<String> toTreeSetOfStrings(NormalizedString... args)
    Converts multiple normalized strings into a TreeSet of String.
static TreeSet<String> toTreeSetOfStrings(Collection<NormalizedString> args)
    Converts multiple normalized strings into a TreeSet of String.
static NormalizedString[] toUniqueArray(String... args)
    Converts multiple plain strings into an array of NormalizedString, ensuring no duplicate NormalizedString elements exist, even if their original Strings are different.
static String valueOf(NormalizedString string)
    Converts a NormalizedString back to its original String representation.
static NormalizedString valueOf(Object o)
    Creates a non-literal NormalizedString, meaning it will match with other String or NormalizedString instances regardless of differences in character case and surrounding whitespace.
static NormalizedString valueOf(String string)
    Creates a non-literal NormalizedString, meaning it will match with other String or NormalizedString instances regardless of differences in character case and surrounding whitespace.
Methods inherited from class java.lang.Object
clone, finalize, getClass, notify, notifyAll, wait, wait, wait
Methods inherited from interface java.lang.CharSequence
chars, codePoints, isEmpty
-
Field Details
-
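serialVersionUID
private static final long serialVersionUID
-
stringCache
private static final StringCache<NormalizedString> stringCache
-
original
private final String original
-
normalized
private final String normalized
-
literal
private final boolean literal
-
hashCode
private final int hashCode
-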
Constructor Details
-
NormalizedString
-
-
Method Details
-
normalize
-
isLiteral
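public boolean isLiteral()
Returns true if this NormalizedString is a literal, meaning the exact character case and surrounding whitespace are required for matching it.
-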
equals
-
hashCode
public int hashCode()
-
length
public int length()
Specified by:
length in interface CharSequence
-
charAt
public char charAt(int index)
Specified by:
charAt in interface CharSequence
-
subSequence
Specified by:
subSequence in interface CharSequence
-
compareTo
Specified by:
compareTo in interface Comparable<NormalizedString>
-
compareTo
Compares a NormalizedString against a String lexicographically.
Parameters:
o - a plain String
Returns:
the result of String.compareTo(String). If this NormalizedString is a literal, its original content is compared with the argument string as given. If this NormalizedString is not a literal, the result comes from comparing the normalized content of both strings (i.e. differences in surrounding whitespace and character case are ignored).
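A hypothetical model of the two comparison modes of compareTo(String), again assuming normalization is trim plus lower-case. `ComparableKey` is an illustrative name only, not the library's class.

```java
import java.util.Locale;

// Hypothetical model of compareTo(String); NOT the univocity code.
// Assumption: normalization = trim + lower-case.
final class ComparableKey {
    private final String original;
    private final boolean literal;

    ComparableKey(String original, boolean literal) {
        this.original = original;
        this.literal = literal;
    }

    private static String normalize(String s) {
        return s.trim().toLowerCase(Locale.ROOT);
    }

    int compareTo(String o) {
        if (literal) {
            // Literal: exact, case-sensitive comparison of the original text.
            return original.compareTo(o);
        }
        // Non-literal: case and surrounding whitespace are ignored on both sides.
        return normalize(original).compareTo(normalize(o));
    }

    public static void main(String[] args) {
        System.out.println(new ComparableKey("Name", false).compareTo(" name ")); // 0
        System.out.println(new ComparableKey("Name", true).compareTo("name") < 0); // true
    }
}
```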
-
toString
Specified by:
toString in interface CharSequence
Overrides:
toString in class Object
-
literalValueOf
Creates a literal NormalizedString, meaning it will only match with other String or NormalizedString instances if they have the exact same content, including character case and surrounding whitespace.
Parameters:
string - the input String
Returns:
the literal NormalizedString version of the given string.
-
valueOf
Creates a non-literal NormalizedString, meaning it will match with other String or NormalizedString instances regardless of differences in character case and surrounding whitespace. If the input value is enclosed in single quotes, a literal NormalizedString will be returned, as described in literalValueOf(String).
Parameters:
o - the input object whose String representation will be used
Returns:
the NormalizedString of the given object.
-
valueOf
Creates a non-literal NormalizedString, meaning it will match with other String or NormalizedString instances regardless of differences in character case and surrounding whitespace. If the input string is enclosed in single quotes, a literal NormalizedString will be returned, as described in literalValueOf(String).
Parameters:
string - the input string
Returns:
the NormalizedString of the given string.
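A minimal sketch of the single-quote rule, assuming the enclosing quotes are stripped so that a quoted input and literalValueOf on the unquoted text carry the same content. `QuotedKey` is hypothetical, and the sketch does not handle whitespace outside the quotes.

```java
// Hypothetical sketch of the single-quote rule; NOT the univocity code.
// Assumption: enclosing single quotes are stripped before creating the literal.
final class QuotedKey {
    final String original;
    final boolean literal;

    private QuotedKey(String original, boolean literal) {
        this.original = original;
        this.literal = literal;
    }

    static QuotedKey literalValueOf(String s) {
        return new QuotedKey(s, true);
    }

    static QuotedKey valueOf(String s) {
        boolean quoted = s.length() >= 2
                && s.charAt(0) == '\''
                && s.charAt(s.length() - 1) == '\'';
        // Quoted input becomes a literal of the text between the quotes.
        return quoted ? literalValueOf(s.substring(1, s.length() - 1))
                      : new QuotedKey(s, false);
    }

    public static void main(String[] args) {
        QuotedKey a = valueOf("'Name'");
        QuotedKey b = literalValueOf("Name");
        System.out.println(a.literal && a.original.equals(b.original)); // true
    }
}
```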
-
valueOf
Converts a NormalizedString back to its original String representation.
Parameters:
string - the normalized string
Returns:
the original string used to create the given normalized representation.
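To illustrate the round trip from a normalized value back to its original text, the hypothetical `RoundTripKey` below stores the original string next to its normalized form (assumed here to be trim plus lower-case). Its static `valueOf` mirrors the idea of valueOf(NormalizedString), not its implementation.

```java
import java.util.Locale;

// Hypothetical sketch: keep the original text next to its normalized form.
// Assumption: normalization = trim + lower-case; NOT the univocity code.
final class RoundTripKey {
    private final String original;
    private final String normalized;

    RoundTripKey(String original) {
        this.original = original;
        this.normalized = original.trim().toLowerCase(Locale.ROOT);
    }

    String normalized() {
        return normalized;
    }

    // Mirrors the idea of valueOf(NormalizedString): recover the original text.
    static String valueOf(RoundTripKey key) {
        return key.original;
    }

    public static void main(String[] args) {
        RoundTripKey key = new RoundTripKey("  Customer ID ");
        System.out.println("[" + key.normalized() + "]"); // [customer id]
        System.out.println("[" + valueOf(key) + "]");      // [  Customer ID ]
    }
}
```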
-
toArray
Converts a collection of plain strings into an array of NormalizedString.
Parameters:
args - the strings to convert to NormalizedString
Returns:
the NormalizedString representations of all input strings.
-
toStringArray
Converts a collection of normalized strings into an array of String.
Parameters:
args - the normalized strings to convert back to String
Returns:
the String representations of all normalized strings.
-
toUniqueArray
Converts multiple plain strings into an array of NormalizedString, ensuring no duplicate NormalizedString elements exist, even if their original Strings are different.
Parameters:
args - the strings to convert to NormalizedString
Returns:
the NormalizedString representations of all input strings.
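The deduplication performed by toUniqueArray can be approximated with plain strings and a `LinkedHashMap` keyed by normalized content. This is a sketch under assumptions: normalization is trim plus lower-case, the first spelling of each name wins, input order is preserved, and quoted (literal) inputs are not treated specially. `UniqueArrays` is a hypothetical name.

```java
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;

// Hypothetical sketch of toUniqueArray semantics on plain strings; NOT the univocity code.
// Assumptions: trim + lower-case normalization, first spelling wins, order preserved.
final class UniqueArrays {
    static String[] toUniqueArray(String... args) {
        Map<String, String> byKey = new LinkedHashMap<>();
        for (String s : args) {
            // Later spellings of an already-seen normalized value are dropped.
            byKey.putIfAbsent(s.trim().toLowerCase(Locale.ROOT), s);
        }
        return byKey.values().toArray(new String[0]);
    }

    public static void main(String[] args) {
        System.out.println(String.join("|", toUniqueArray("Id", " ID", "name", "Name "))); // Id|name
    }
}
```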
-
toArray
Converts multiple plain strings into an array of NormalizedString.
Parameters:
args - the strings to convert to NormalizedString
Returns:
the NormalizedString representations of all input strings.
-
toArray
Converts multiple normalized strings into an array of String.
Parameters:
args - the normalized strings to convert to String
Returns:
the String representations of all input strings.
-
getCollection
-
getCollection
private static <T extends Collection<NormalizedString>> T getCollection(T out, Collection<String> args)
-
getCollection
-
getStringCollection
private static <T extends Collection<String>> T getStringCollection(T out, Collection<NormalizedString> args)
-
toArrayList
Converts multiple plain strings into an ArrayList of NormalizedString.
Parameters:
args - the strings to convert to NormalizedString
Returns:
the NormalizedString representations of all input strings.
-
toArrayList
Converts multiple plain strings into an ArrayList of NormalizedString.
Parameters:
args - the strings to convert to NormalizedString
Returns:
the NormalizedString representations of all input strings.
-
toArrayListOfStrings
Converts multiple normalized strings into an ArrayList of String.
Parameters:
args - the normalized strings to convert to String
Returns:
the original Strings of all input normalized strings.
-
toArrayListOfStrings
Converts multiple normalized strings into an ArrayList of String.
Parameters:
args - the normalized strings to convert to String
Returns:
the original Strings of all input normalized strings.
-
toTreeSet
Converts multiple plain strings into a TreeSet of NormalizedString.
Parameters:
args - the strings to convert to NormalizedString
Returns:
the NormalizedString representations of all input strings.
-
toTreeSet
Converts multiple plain strings into a TreeSet of NormalizedString.
Parameters:
args - the strings to convert to NormalizedString
Returns:
the NormalizedString representations of all input strings.
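A standard-library analogue of a sorted set of normalized values: a `TreeSet<String>` whose comparator orders by normalized content (assumed trim plus lower-case) treats spellings that differ only in case or surrounding whitespace as one element. This only illustrates the resulting set semantics; it is not how toTreeSet is implemented, and `NormalizedTreeSetDemo` is a hypothetical name.

```java
import java.util.Collections;
import java.util.Comparator;
import java.util.Locale;
import java.util.TreeSet;

// Standard-library analogue of a TreeSet of normalized values; NOT the univocity code.
// Assumption: normalization = trim + lower-case.
final class NormalizedTreeSetDemo {
    static TreeSet<String> toTreeSet(String... args) {
        TreeSet<String> set = new TreeSet<>(
                Comparator.comparing((String s) -> s.trim().toLowerCase(Locale.ROOT)));
        // add() ignores an element that compares equal to one already present,
        // so the first spelling of each normalized value is kept.
        Collections.addAll(set, args);
        return set;
    }

    public static void main(String[] args) {
        System.out.println(toTreeSet("b", " A", "a", "B ")); // [ A, b]
    }
}
```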
-
toTreeSetOfStrings
Converts multiple normalized strings into a TreeSet of String.
Parameters:
args - the normalized strings to convert to String
Returns:
the original Strings of all input normalized strings.
-
toTreeSetOfStrings
Converts multiple normalized strings into a TreeSet of String.
Parameters:
args - the normalized strings to convert to String
Returns:
the original Strings of all input normalized strings.
-
toHashSet
Converts multiple plain strings into a HashSet of NormalizedString.
Parameters:
args - the strings to convert to NormalizedString
Returns:
the NormalizedString representations of all input strings.
-
toHashSet
Converts multiple plain strings into a HashSet of NormalizedString.
Parameters:
args - the strings to convert to NormalizedString
Returns:
the NormalizedString representations of all input strings.
-
toHashSetOfStrings
Converts multiple normalized strings into a HashSet of String.
Parameters:
args - the normalized strings to convert to String
Returns:
the original Strings of all input normalized strings.
-
toHashSetOfStrings
Converts multiple normalized strings into a HashSet of String.
Parameters:
args - the normalized strings to convert to String
Returns:
the original Strings of all input normalized strings.
-
toLinkedHashSet
Converts multiple plain strings into a LinkedHashSet of NormalizedString.
Parameters:
args - the strings to convert to NormalizedString
Returns:
the NormalizedString representations of all input strings.
-
toLinkedHashSet
Converts multiple plain strings into a LinkedHashSet of NormalizedString.
Parameters:
args - the strings to convert to NormalizedString
Returns:
the NormalizedString representations of all input strings.
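Why hash-based sets of normalized values drop case and whitespace variants can be shown with a hypothetical key class whose `equals` and `hashCode` both use normalized content (assumed trim plus lower-case). A `LinkedHashSet` of such keys keeps first-insertion order, whereas a `HashSet` makes no ordering guarantee. `SetKey` is illustrative, not the library's class.

```java
import java.util.LinkedHashSet;
import java.util.Locale;

// Hypothetical key with equals/hashCode on normalized content; NOT the univocity code.
// Assumption: normalization = trim + lower-case.
final class SetKey {
    final String original;
    private final String normalized;

    SetKey(String original) {
        this.original = original;
        this.normalized = original.trim().toLowerCase(Locale.ROOT);
    }

    @Override
    public boolean equals(Object o) {
        return o instanceof SetKey && ((SetKey) o).normalized.equals(normalized);
    }

    @Override
    public int hashCode() {
        return normalized.hashCode(); // consistent with equals
    }

    // Equal keys are not re-added, so the first spelling stays, in insertion order.
    static LinkedHashSet<SetKey> toLinkedHashSet(String... args) {
        LinkedHashSet<SetKey> out = new LinkedHashSet<>();
        for (String s : args) {
            out.add(new SetKey(s));
        }
        return out;
    }

    public static void main(String[] args) {
        for (SetKey k : toLinkedHashSet("Zip", "city", " ZIP ", "City")) {
            System.out.println(k.original); // prints "Zip", then "city"
        }
    }
}
```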
-
toLinkedHashSetOfStrings
Converts multiple normalized strings into a LinkedHashSet of String.
Parameters:
args - the normalized strings to convert to String
Returns:
the original Strings of all input normalized strings.
-
toLinkedHashSetOfStrings
Converts multiple normalized strings into a LinkedHashSet of String.
Parameters:
args - the normalized strings to convert to String
Returns:
the original Strings of all input normalized strings.
-
toLiteral
Returns the literal representation of this NormalizedString, meaning it will only match with other String or NormalizedString instances if they have the exact same content, including character case and surrounding whitespace.
Returns:
the literal representation of the current NormalizedString
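A hypothetical model of toLiteral(): the copy keeps the same original text but only matches that exact text. `LiteralKey` is illustrative, and returning the same instance when the key is already literal is a choice of this sketch, not documented library behavior. Non-literal matching is assumed to be a trim plus lower-case comparison.

```java
import java.util.Locale;

// Hypothetical model of toLiteral(); NOT the univocity code.
// Assumption: non-literal matching = trim + lower-case comparison.
final class LiteralKey {
    final String original;
    final boolean literal;

    LiteralKey(String original, boolean literal) {
        this.original = original;
        this.literal = literal;
    }

    // Same original text, but the result only matches that exact text.
    // Returning `this` for an existing literal is a choice of this sketch.
    LiteralKey toLiteral() {
        return literal ? this : new LiteralKey(original, true);
    }

    boolean matches(String other) {
        if (literal) {
            return original.equals(other);
        }
        return original.trim().toLowerCase(Locale.ROOT)
                .equals(other.trim().toLowerCase(Locale.ROOT));
    }

    public static void main(String[] args) {
        LiteralKey total = new LiteralKey("Total", false);
        System.out.println(total.matches("TOTAL"));             // true
        System.out.println(total.toLiteral().matches("TOTAL")); // false
    }
}
```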
-
toIdentifierGroupArray
Analyzes a group of NormalizedString to identify any instances whose normalized content will generate clashes. Any clashing entries will be converted to their literal counterparts (using toLiteral()), making it possible to tell them apart.
Parameters:
strings - a group of identifiers that may contain ambiguous entries if their character case or surrounding whitespace is not considered. This array will be modified.
Returns:
the input string array, with NormalizedString literals in the positions where clashes would originally occur.
-
toIdentifierGroupArray
Analyzes a group of String to identify any instances whose normalized content will generate clashes. Any clashing entries will be converted to their literal counterparts (using toLiteral()), making it possible to tell them apart.
Parameters:
strings - a group of identifiers that may contain ambiguous entries if their character case or surrounding whitespace is not considered.
Returns:
a NormalizedString array with literals in the positions where clashes would originally occur.
-
identifyLiterals
Analyzes a group of NormalizedString to identify any instances whose normalized content will generate clashes. Any clashing entries will be converted to their literal counterparts (using toLiteral()), making it possible to tell them apart.
Parameters:
strings - a group of identifiers that may contain ambiguous entries if their character case or surrounding whitespace is not considered. This array will be modified.
Returns:
true if any entry has been modified to be a literal, otherwise false
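The clash detection described for identifyLiterals and toIdentifierGroupArray can be sketched as: count how many entries share each normalized value, then convert every non-literal entry of a shared value into a literal. This is a hypothetical model (`IdentifierGroups` and `Id` are illustrative names) assuming trim plus lower-case normalization; it does not reproduce the library's code.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

// Hypothetical model of clash detection; NOT the univocity code.
// Assumption: normalization = trim + lower-case.
final class IdentifierGroups {
    static final class Id {
        final String original;
        final boolean literal;

        Id(String original, boolean literal) {
            this.original = original;
            this.literal = literal;
        }

        String normalized() {
            return original.trim().toLowerCase(Locale.ROOT);
        }
    }

    // Modifies the array in place; returns true if any entry became a literal.
    static boolean identifyLiterals(Id[] ids) {
        Map<String, Integer> counts = new HashMap<>();
        for (Id id : ids) {
            counts.merge(id.normalized(), 1, Integer::sum);
        }
        boolean modified = false;
        for (int i = 0; i < ids.length; i++) {
            if (!ids[i].literal && counts.get(ids[i].normalized()) > 1) {
                ids[i] = new Id(ids[i].original, true); // clash: keep exact text
                modified = true;
            }
        }
        return modified;
    }

    // String[] variant: wrap every entry as non-literal, then resolve clashes.
    static Id[] toIdentifierGroupArray(String[] strings) {
        Id[] ids = new Id[strings.length];
        for (int i = 0; i < strings.length; i++) {
            ids[i] = new Id(strings[i], false);
        }
        identifyLiterals(ids);
        return ids;
    }

    public static void main(String[] args) {
        for (Id id : toIdentifierGroupArray(new String[] {"Name", "NAME", "age"})) {
            System.out.println(id.original + " literal=" + id.literal);
        }
    }
}
```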
-
identifyLiterals
public static boolean identifyLiterals(NormalizedString[] strings, boolean lowercaseIdentifiers, boolean uppercaseIdentifiers)
Analyzes a group of NormalizedString to identify any instances whose normalized content will generate clashes. Any clashing entries will be converted to their literal counterparts (using toLiteral()), making it possible to tell them apart.
Parameters:
strings - a group of identifiers that may contain ambiguous entries if their character case or surrounding whitespace is not considered. This array will be modified.
lowercaseIdentifiers - flag indicating that identifiers are stored in lower case (for compatibility with databases). If a string has an uppercase character, it must become a literal.
uppercaseIdentifiers - flag indicating that identifiers are stored in upper case (for compatibility with databases). If a string has a lowercase character, it must become a literal.
Returns:
true if any entry has been modified to be a literal, otherwise false
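One reading of the lowercaseIdentifiers and uppercaseIdentifiers flags, based only on the parameter descriptions above. `needsLiteral` is a hypothetical helper, not the library's private shouldBeLiteral method, and it covers only the character-case rule, not the clash detection.

```java
// Hypothetical helper based on the parameter descriptions above;
// NOT the library's private shouldBeLiteral method.
final class IdentifierCaseRules {
    static boolean needsLiteral(String s, boolean lowercaseIdentifiers, boolean uppercaseIdentifiers) {
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            // Identifiers stored in lower case: any upper-case character needs a literal.
            if (lowercaseIdentifiers && Character.isUpperCase(c)) {
                return true;
            }
            // Identifiers stored in upper case: any lower-case character needs a literal.
            if (uppercaseIdentifiers && Character.isLowerCase(c)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        System.out.println(needsLiteral("orderId", true, false));  // true
        System.out.println(needsLiteral("ORDER_ID", false, true)); // false
    }
}
```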
-
shouldBeLiteral
private static boolean shouldBeLiteral(String string, boolean lowercaseIdentifiers, boolean uppercaseIdentifiers)
-
getCache
Returns the internal string cache to allow users to tweak its size limit or clear it when appropriate.
Returns:
the string cache used to store NormalizedString instances associated with their original String.
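The methods of StringCache are not listed on this page, so the following is only a standard-library analogue of a size-limited cache that can also be cleared: a `LinkedHashMap` in access order that evicts its least recently used entry once a hypothetical `sizeLimit` is exceeded. `BoundedCache` is an illustrative name, not the StringCache API.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Standard-library analogue of a size-limited cache; NOT the StringCache API.
final class BoundedCache<K, V> extends LinkedHashMap<K, V> {
    private final int sizeLimit; // hypothetical limit for this sketch

    BoundedCache(int sizeLimit) {
        super(16, 0.75f, true); // accessOrder = true: iteration goes from least to most recently used
        this.sizeLimit = sizeLimit;
    }

    @Override
    protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
        return size() > sizeLimit; // evict the least recently used entry past the limit
    }

    public static void main(String[] args) {
        BoundedCache<String, String> cache = new BoundedCache<>(2);
        cache.put("a", "A");
        cache.put("b", "B");
        cache.get("a");      // touch "a" so "b" becomes the eldest entry
        cache.put("c", "C"); // exceeds the limit and evicts "b"
        System.out.println(cache.keySet()); // [a, c]
    }
}
```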