Class NormalizedString

java.lang.Object
com.univocity.parsers.common.NormalizedString
All Implemented Interfaces:
Serializable, CharSequence, Comparable<NormalizedString>

public final class NormalizedString extends Object implements Serializable, Comparable<NormalizedString>, CharSequence
A NormalizedString represents text in normalized form: strings that differ only in character case or surrounding whitespace are considered equal. It is used to represent groups of fields whose names users may write with different character case or whitespace. When character case or surrounding whitespace matters, the NormalizedString is a literal: its isLiteral() method returns true, and only text with exactly the same case and surrounding whitespace matches it. Invoking valueOf(String) with a String enclosed in single quotes creates a literal NormalizedString; literalValueOf(String) produces the same literal without requiring the quotes.
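The equality rules above can be sketched in plain Java. NormKey is a hypothetical stand-in written for illustration, not the univocity source. It assumes normalization means trimming surrounding whitespace and lower-casing, and that a literal on either side of a comparison requires the original texts to match exactly.

```java
import java.util.Locale;

// Hypothetical stand-in for NormalizedString; illustrates the documented rules only.
final class NormKey {
    private final String original;
    private final String normalized;
    private final boolean literal;

    NormKey(String original, boolean literal) {
        this.original = original;
        this.literal = literal;
        // Assumed normalization: strip surrounding whitespace, then fold case.
        this.normalized = original.trim().toLowerCase(Locale.ROOT);
    }

    boolean isLiteral() {
        return literal;
    }

    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (!(o instanceof NormKey)) return false;
        NormKey other = (NormKey) o;
        // A literal on either side demands an exact match of the original text.
        if (literal || other.literal) return original.equals(other.original);
        // Otherwise case and surrounding whitespace are ignored.
        return normalized.equals(other.normalized);
    }

    @Override
    public int hashCode() {
        // Hash the normalized form so that equal keys always share a hash code.
        return normalized.hashCode();
    }

    @Override
    public String toString() {
        return original;
    }
}
```

Hashing the normalized form keeps hashCode consistent with equals: any two keys that compare equal, literal or not, necessarily share the same normalized text.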
  • Field Details

    • serialVersionUID

      private static final long serialVersionUID
    • stringCache

      private static final StringCache<NormalizedString> stringCache
    • original

      private final String original
    • normalized

      private final String normalized
    • literal

      private final boolean literal
    • hashCode

      private final int hashCode
  • Constructor Details

    • NormalizedString

      private NormalizedString(String string)
  • Method Details

    • normalize

      private String normalize(Object value)
    • isLiteral

      public boolean isLiteral()
    • equals

      public boolean equals(Object anObject)
      Overrides:
      equals in class Object
    • hashCode

      public int hashCode()
      Overrides:
      hashCode in class Object
    • length

      public int length()
      Specified by:
      length in interface CharSequence
    • charAt

      public char charAt(int index)
      Specified by:
      charAt in interface CharSequence
    • subSequence

      public CharSequence subSequence(int start, int end)
      Specified by:
      subSequence in interface CharSequence
    • compareTo

      public int compareTo(NormalizedString o)
      Specified by:
      compareTo in interface Comparable<NormalizedString>
    • compareTo

      public int compareTo(String o)
      Compares a NormalizedString against a String lexicographically.
      Parameters:
      o - a plain String
      Returns:
      the result of String.compareTo(String). If this NormalizedString is a literal, its original content is compared against the argument exactly as given. If it is not a literal, the normalized content of both strings is compared, so differences in surrounding whitespace and character case are ignored.
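A minimal sketch of the two comparison modes follows, assuming the same trim-and-lower-case normalization. CompareSketch and its parameters are hypothetical names chosen for illustration; the NormalizedString is represented by its original text and literal flag.

```java
import java.util.Locale;

// Hypothetical helper mirroring the documented compareTo(String) rules.
final class CompareSketch {
    // Assumed normalization: trim surrounding whitespace and lower-case.
    static String normalize(String s) {
        return s.trim().toLowerCase(Locale.ROOT);
    }

    // original/literal describe the NormalizedString; other is the plain String argument.
    static int compareTo(String original, boolean literal, String other) {
        if (literal) {
            // Literal: compare the exact original text against the argument as given.
            return original.compareTo(other);
        }
        // Non-literal: compare the normalized forms of both sides.
        return normalize(original).compareTo(normalize(other));
    }
}
```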
    • toString

      public String toString()
      Specified by:
      toString in interface CharSequence
      Overrides:
      toString in class Object
    • literalValueOf

      public static NormalizedString literalValueOf(String string)
      Creates a literal NormalizedString, which only matches a String or NormalizedString with exactly the same content, including character case and surrounding whitespace.
      Parameters:
      string - the input String
      Returns:
      the literal NormalizedString version of the given string.
    • valueOf

      public static NormalizedString valueOf(Object o)
      Creates a non-literal NormalizedString, which matches any String or NormalizedString regardless of differences in character case and surrounding whitespace. If the String representation of the input value is enclosed in single quotes, a literal NormalizedString is returned instead, as described in literalValueOf(String).
      Parameters:
      o - the input object whose String representation will be used
      Returns:
      the NormalizedString of the given object.
    • valueOf

      public static NormalizedString valueOf(String string)
      Creates a non-literal NormalizedString, which matches any String or NormalizedString regardless of differences in character case and surrounding whitespace. If the input string is enclosed in single quotes, a literal NormalizedString is returned instead, as described in literalValueOf(String).
      Parameters:
      string - the input string
      Returns:
      the NormalizedString of the given string.
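The single-quote rule can be sketched as follows. ParsedName is hypothetical, and the sketch assumes the quotes must be the very first and last characters of the input and are dropped from the literal's content, so that valueOf("'Name'") and literalValueOf("Name") describe the same value.

```java
// Hypothetical sketch of the documented valueOf(String) parsing rule.
final class ParsedName {
    final String original;
    final boolean literal;

    private ParsedName(String original, boolean literal) {
        this.original = original;
        this.literal = literal;
    }

    static ParsedName literalValueOf(String s) {
        return new ParsedName(s, true);
    }

    static ParsedName valueOf(String s) {
        if (s == null) return null;
        // Assumption: quotes must wrap the whole input and are stripped from the content.
        if (s.length() >= 2 && s.startsWith("'") && s.endsWith("'")) {
            return literalValueOf(s.substring(1, s.length() - 1));
        }
        // Anything else is kept verbatim as a non-literal value.
        return new ParsedName(s, false);
    }
}
```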
    • valueOf

      public static String valueOf(NormalizedString string)
      Converts a NormalizedString back to its original String representation.
      Parameters:
      string - the normalized string
      Returns:
      the original string used to create the given normalized representation.
    • toArray

      public static NormalizedString[] toArray(Collection<String> args)
      Converts a collection of plain strings into an array of NormalizedString.
      Parameters:
      args - the strings to convert to NormalizedString
      Returns:
      the NormalizedString representations of all input strings.
    • toStringArray

      public static String[] toStringArray(Collection<NormalizedString> args)
      Converts a collection of normalized strings into an array of String.
      Parameters:
      args - the normalized strings to convert back to String
      Returns:
      the String representations of all normalized strings.
    • toUniqueArray

      public static NormalizedString[] toUniqueArray(String... args)
      Converts multiple plain strings into an array of NormalizedString, ensuring no duplicate NormalizedString elements exist, even if their original Strings are different.
      Parameters:
      args - the strings to convert to NormalizedString
      Returns:
      the NormalizedString representations of all input strings.
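One way to picture toUniqueArray is to key each input by its normalized form and keep a single entry per key. This hypothetical UniqueSketch returns plain Strings for brevity and assumes the first spelling encountered wins; which duplicate the library actually keeps is not stated on this page.

```java
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;

// Hypothetical sketch of toUniqueArray: duplicates are detected on the normalized form.
final class UniqueSketch {
    // Assumed normalization: trim surrounding whitespace and lower-case.
    static String normalize(String s) {
        return s.trim().toLowerCase(Locale.ROOT);
    }

    // Returns the first original spelling seen for each normalized key, in input order.
    static String[] toUniqueArray(String... args) {
        Map<String, String> firstByKey = new LinkedHashMap<>();
        for (String arg : args) {
            firstByKey.putIfAbsent(normalize(arg), arg);
        }
        return firstByKey.values().toArray(new String[0]);
    }
}
```

A LinkedHashMap is used so the result preserves input order, which a plain HashMap would not guarantee.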
    • toArray

      public static NormalizedString[] toArray(String... args)
      Converts multiple plain strings into an array of NormalizedString.
      Parameters:
      args - the strings to convert to NormalizedString
      Returns:
      the NormalizedString representations of all input strings.
    • toArray

      public static String[] toArray(NormalizedString... args)
      Converts multiple normalized strings into an array of String.
      Parameters:
      args - the normalized strings to convert to String
      Returns:
      the String representations of all input strings.
    • getCollection

      private static <T extends Collection<NormalizedString>> T getCollection(T out, String... args)
    • getCollection

      private static <T extends Collection<NormalizedString>> T getCollection(T out, Collection<String> args)
    • getCollection

      private static <T extends Collection<String>> T getCollection(T out, NormalizedString... args)
    • getStringCollection

      private static <T extends Collection<String>> T getStringCollection(T out, Collection<NormalizedString> args)
    • toArrayList

      public static ArrayList<NormalizedString> toArrayList(String... args)
      Converts multiple plain strings into an ArrayList of NormalizedString.
      Parameters:
      args - the strings to convert to NormalizedString
      Returns:
      the NormalizedString representations of all input strings.
    • toArrayList

      public static ArrayList<NormalizedString> toArrayList(Collection<String> args)
      Converts multiple plain strings into an ArrayList of NormalizedString.
      Parameters:
      args - the strings to convert to NormalizedString
      Returns:
      the NormalizedString representations of all input strings.
    • toArrayListOfStrings

      public static ArrayList<String> toArrayListOfStrings(NormalizedString... args)
      Converts multiple normalized strings into an ArrayList of String.
      Parameters:
      args - the normalized strings to convert to String
      Returns:
      the original Strings of all input normalized strings.
    • toArrayListOfStrings

      public static ArrayList<String> toArrayListOfStrings(Collection<NormalizedString> args)
      Converts multiple normalized strings into an ArrayList of String.
      Parameters:
      args - the normalized strings to convert to String
      Returns:
      the original Strings of all input normalized strings.
    • toTreeSet

      public static TreeSet<NormalizedString> toTreeSet(String... args)
      Converts multiple plain strings into a TreeSet of NormalizedString.
      Parameters:
      args - the strings to convert to NormalizedString
      Returns:
      the NormalizedString representations of all input strings.
    • toTreeSet

      public static TreeSet<NormalizedString> toTreeSet(Collection<String> args)
      Converts multiple plain strings into a TreeSet of NormalizedString.
      Parameters:
      args - the strings to convert to NormalizedString
      Returns:
      the NormalizedString representations of all input strings.
    • toTreeSetOfStrings

      public static TreeSet<String> toTreeSetOfStrings(NormalizedString... args)
      Converts multiple normalized strings into a TreeSet of String.
      Parameters:
      args - the normalized strings to convert to String
      Returns:
      the original Strings of all input normalized strings.
    • toTreeSetOfStrings

      public static TreeSet<String> toTreeSetOfStrings(Collection<NormalizedString> args)
      Converts multiple normalized strings into a TreeSet of String.
      Parameters:
      args - the normalized strings to convert to String
      Returns:
      the original Strings of all input normalized strings.
    • toHashSet

      public static HashSet<NormalizedString> toHashSet(String... args)
      Converts multiple plain strings into a HashSet of NormalizedString.
      Parameters:
      args - the strings to convert to NormalizedString
      Returns:
      the NormalizedString representations of all input strings.
    • toHashSet

      public static HashSet<NormalizedString> toHashSet(Collection<String> args)
      Converts multiple plain strings into a HashSet of NormalizedString.
      Parameters:
      args - the strings to convert to NormalizedString
      Returns:
      the NormalizedString representations of all input strings.
    • toHashSetOfStrings

      public static HashSet<String> toHashSetOfStrings(NormalizedString... args)
      Converts multiple normalized strings into a HashSet of String.
      Parameters:
      args - the normalized strings to convert to String
      Returns:
      the original Strings of all input normalized strings.
    • toHashSetOfStrings

      public static HashSet<String> toHashSetOfStrings(Collection<NormalizedString> args)
      Converts multiple normalized strings into a HashSet of String.
      Parameters:
      args - the normalized strings to convert to String
      Returns:
      the original Strings of all input normalized strings.
    • toLinkedHashSet

      public static LinkedHashSet<NormalizedString> toLinkedHashSet(String... args)
      Converts multiple plain strings into a LinkedHashSet of NormalizedString.
      Parameters:
      args - the strings to convert to NormalizedString
      Returns:
      the NormalizedString representations of all input strings.
    • toLinkedHashSet

      public static LinkedHashSet<NormalizedString> toLinkedHashSet(Collection<String> args)
      Converts multiple plain strings into a LinkedHashSet of NormalizedString.
      Parameters:
      args - the strings to convert to NormalizedString
      Returns:
      the NormalizedString representations of all input strings.
    • toLinkedHashSetOfStrings

      public static LinkedHashSet<String> toLinkedHashSetOfStrings(NormalizedString... args)
      Converts multiple normalized strings into a LinkedHashSet of String.
      Parameters:
      args - the normalized strings to convert to String
      Returns:
      the original Strings of all input normalized strings.
    • toLinkedHashSetOfStrings

      public static LinkedHashSet<String> toLinkedHashSetOfStrings(Collection<NormalizedString> args)
      Converts multiple normalized strings into a LinkedHashSet of String.
      Parameters:
      args - the normalized strings to convert to String
      Returns:
      the original Strings of all input normalized strings.
    • toLiteral

      public NormalizedString toLiteral()
      Returns the literal representation of this NormalizedString, which only matches a String or NormalizedString with exactly the same content, including character case and surrounding whitespace.
      Returns:
      the literal representation of the current NormalizedString
    • toIdentifierGroupArray

      public static NormalizedString[] toIdentifierGroupArray(NormalizedString[] strings)
      Analyzes a group of NormalizedString instances to find any whose normalized content would clash. Clashing entries are converted to their literal counterparts (using toLiteral()), so they can be distinguished from one another.
      Parameters:
      strings - a group of identifiers that may contain ambiguous entries when character case and surrounding whitespace are ignored. This array will be modified.
      Returns:
      the input array, with literal NormalizedString instances at the positions where clashes would otherwise occur.
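Clash detection can be sketched by counting normalized forms and flagging every entry whose form occurs more than once. In this hypothetical ClashSketch, a parallel boolean array stands in for replacing entries with toLiteral(), and normalization is assumed to be trim plus lower-casing.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

// Hypothetical sketch of clash detection; not the univocity implementation.
final class ClashSketch {
    // Assumed normalization: trim surrounding whitespace and lower-case.
    static String normalize(String s) {
        return s.trim().toLowerCase(Locale.ROOT);
    }

    // Sets literal[i] when strings[i] shares its normalized form with another entry.
    // Returns true if any entry was newly marked, like identifyLiterals.
    static boolean markClashes(String[] strings, boolean[] literal) {
        Map<String, Integer> counts = new HashMap<>();
        for (String s : strings) {
            counts.merge(normalize(s), 1, Integer::sum);
        }
        boolean modified = false;
        for (int i = 0; i < strings.length; i++) {
            if (!literal[i] && counts.get(normalize(strings[i])) > 1) {
                literal[i] = true;
                modified = true;
            }
        }
        return modified;
    }
}
```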
    • toIdentifierGroupArray

      public static NormalizedString[] toIdentifierGroupArray(String[] strings)
      Analyzes a group of String to find any whose normalized content would clash. Clashing entries are converted to their literal counterparts (using toLiteral()), so they can be distinguished from one another.
      Parameters:
      strings - a group of identifiers that may contain ambiguous entries when character case and surrounding whitespace are ignored.
      Returns:
      a NormalizedString array with literals at the positions where clashes would otherwise occur.
    • identifyLiterals

      public static boolean identifyLiterals(NormalizedString[] strings)
      Analyzes a group of NormalizedString instances to find any whose normalized content would clash. Clashing entries are converted to their literal counterparts (using toLiteral()), so they can be distinguished from one another.
      Parameters:
      strings - a group of identifiers that may contain ambiguous entries when character case and surrounding whitespace are ignored. This array will be modified.
      Returns:
      true if any entry has been modified to be a literal, otherwise false
    • identifyLiterals

      public static boolean identifyLiterals(NormalizedString[] strings, boolean lowercaseIdentifiers, boolean uppercaseIdentifiers)
      Analyzes a group of NormalizedString instances to find any whose normalized content would clash. Clashing entries are converted to their literal counterparts (using toLiteral()), so they can be distinguished from one another.
      Parameters:
      strings - a group of identifiers that may contain ambiguous entries when character case and surrounding whitespace are ignored. This array will be modified.
      lowercaseIdentifiers - flag indicating that identifiers are stored in lower case (for compatibility with databases). A string containing any uppercase character must become a literal.
      uppercaseIdentifiers - flag indicating that identifiers are stored in upper case (for compatibility with databases). A string containing any lowercase character must become a literal.
      Returns:
      true if any entry has been modified to be a literal, otherwise false
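The rule described for lowercaseIdentifiers and uppercaseIdentifiers can be sketched as a per-character check. CaseRuleSketch is a hypothetical helper that only illustrates the documented parameter semantics; it is not the private shouldBeLiteral method itself.

```java
// Hypothetical version of the rule described by the two case flags.
final class CaseRuleSketch {
    static boolean shouldBeLiteral(String s, boolean lowercaseIdentifiers, boolean uppercaseIdentifiers) {
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            // Identifiers stored in lower case: any uppercase character forces a literal.
            if (lowercaseIdentifiers && Character.isUpperCase(c)) return true;
            // Identifiers stored in upper case: any lowercase character forces a literal.
            if (uppercaseIdentifiers && Character.isLowerCase(c)) return true;
        }
        return false;
    }
}
```

This matches how some databases treat unquoted names; PostgreSQL, for example, folds unquoted identifiers to lower case, so a name such as userId keeps its capital letter only when quoted.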
    • shouldBeLiteral

      private static boolean shouldBeLiteral(String string, boolean lowercaseIdentifiers, boolean uppercaseIdentifiers)
    • getCache

      public static StringCache<NormalizedString> getCache()
      Returns the internal string cache, allowing users to adjust its size limit or clear it when appropriate.
      Returns:
      the string cache used to store NormalizedString instances associated with their original String.
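The idea of a size-limited instance cache can be sketched with an access-ordered LinkedHashMap. BoundedCache is hypothetical and does not reproduce the StringCache API; it only shows how a size limit and a clear operation might behave on such a cache.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Function;

// Hypothetical size-limited cache; illustrates the idea only, not univocity's StringCache.
final class BoundedCache<V> {
    private final int sizeLimit;
    private final Map<String, V> map;

    BoundedCache(int sizeLimit) {
        this.sizeLimit = sizeLimit;
        // Access order plus removeEldestEntry evicts the least recently used entry past the limit.
        this.map = new LinkedHashMap<String, V>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<String, V> eldest) {
                return size() > BoundedCache.this.sizeLimit;
            }
        };
    }

    // Returns the cached value for key, creating and storing it on a miss.
    V get(String key, Function<String, V> factory) {
        return map.computeIfAbsent(key, factory);
    }

    boolean contains(String key) {
        return map.containsKey(key);
    }

    int size() {
        return map.size();
    }

    void clear() {
        map.clear();
    }
}
```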