Class CsvFormatDetector

java.lang.Object
com.univocity.parsers.csv.CsvFormatDetector
All Implemented Interfaces:
InputAnalysisProcess

public abstract class CsvFormatDetector extends Object implements InputAnalysisProcess
An InputAnalysisProcess to detect column delimiters, quotes and quote escapes in a CSV input.
  • Field Details

    • MAX_ROW_SAMPLES

      private final int MAX_ROW_SAMPLES
    • comment

      private final char comment
    • suggestedDelimiter

      private final char suggestedDelimiter
    • normalizedNewLine

      private final char normalizedNewLine
    • whitespaceRangeStart

      private final int whitespaceRangeStart
    • allowedDelimiters

      private char[] allowedDelimiters
    • delimiterPreference

      private char[] delimiterPreference
    • suggestedQuote

      private final char suggestedQuote
    • suggestedQuoteEscape

      private final char suggestedQuoteEscape
  • Constructor Details

    • CsvFormatDetector

      public CsvFormatDetector(int maxRowSamples, CsvParserSettings settings, int whitespaceRangeStart)
      Builds a new CsvFormatDetector
      Parameters:
      maxRowSamples - the number of row samples to collect before analyzing the statistics
      settings - the configuration provided by the user with potential defaults in case the detection is unable to discover the proper column delimiter or quote character.
      whitespaceRangeStart - starting range of characters considered to be whitespace.
  • Method Details

    • calculateTotals

      protected Map<Character,Integer> calculateTotals(List<Map<Character,Integer>> symbolsPerRow)
    • execute

      public void execute(char[] characters, int length)
      Description copied from interface: InputAnalysisProcess
      Analyzes a sequence of characters from the input buffer.
      Specified by:
      execute in interface InputAnalysisProcess
      Parameters:
      characters - the input buffer
      length - the last character position loaded into the buffer.
    • pickDelimiter

      protected char pickDelimiter(Map<Character,Integer> sums, Map<Character,Integer> totals)
    • increment

      protected void increment(Map<Character,Integer> map, char symbol)
      Increments the number associated with a character in a map by 1.
      Parameters:
      map - the map of characters and their numbers
      symbol - the character whose number should be incremented
    • increment

      protected void increment(Map<Character,Integer> map, char symbol, int incrementSize)
      Increments the number associated with a character in a map by the given increment size.
      Parameters:
      map - the map of characters and their numbers
      symbol - the character whose number should be incremented
      incrementSize - the size of the increment
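The counting behavior described for the two increment overloads can be sketched outside the library as follows. This is a minimal stand-alone illustration, not the library's source: the class name IncrementSketch is hypothetical, and treating a character that is not yet in the map as starting from zero is an assumption.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-alone class; it only mirrors the documented signatures.
final class IncrementSketch {

    // Adds 1 to the number stored for `symbol`.
    static void increment(Map<Character, Integer> map, char symbol) {
        increment(map, symbol, 1);
    }

    // Adds `incrementSize` to the number stored for `symbol`.
    // Assumption: a symbol that is absent from the map starts from zero.
    static void increment(Map<Character, Integer> map, char symbol, int incrementSize) {
        map.merge(symbol, incrementSize, Integer::sum);
    }

    public static void main(String[] args) {
        Map<Character, Integer> counts = new HashMap<>();
        // Count the semicolons in one sample row.
        for (char ch : "a;b;c".toCharArray()) {
            if (ch == ';') {
                increment(counts, ch);
            }
        }
        increment(counts, ',', 3);
        System.out.println(counts.get(';') + " " + counts.get(',')); // prints "2 3"
    }
}
```

Delegating the one-argument overload to the three-argument one keeps the two methods consistent by construction.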
    • min

      protected char min(Map<Character,Integer> map, Map<Character,Integer> totals, char defaultChar)
      Returns the character with the lowest associated number.
      Parameters:
      map - the map of characters and their numbers
      defaultChar - the default character to return in case the map is empty
      Returns:
      the character associated with the lowest number.
    • max

      protected char max(Map<Character,Integer> map, Map<Character,Integer> totals, char defaultChar)
      Returns the character with the highest associated number.
      Parameters:
      map - the map of characters and their numbers
      defaultChar - the default character to return in case the map is empty
      Returns:
      the character associated with the highest number.
    • getChar

      protected char getChar(Map<Character,Integer> map, Map<Character,Integer> totals, char defaultChar, boolean min)
      Returns the character with the highest or lowest associated number.
      Parameters:
      map - the map of characters and their numbers
      defaultChar - the default character to return in case the map is empty
      min - a flag indicating whether to return the character associated with the lowest number in the map. If false then the character associated with the highest number found will be returned.
      Returns:
      the character associated with the highest or lowest number.
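The relationship between min, max and getChar can be illustrated with a simplified sketch. It is not the library's implementation: the class name ExtremeCharSketch is hypothetical, the totals argument is omitted because its role is not documented here, and resolving ties in favor of the first entry encountered is an assumption.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical stand-alone class; the `totals` parameter of the real methods is left out.
final class ExtremeCharSketch {

    // Returns the character with the lowest (min == true) or highest (min == false)
    // associated number, or `defaultChar` when the map is empty.
    static char getChar(Map<Character, Integer> map, char defaultChar, boolean min) {
        char result = defaultChar;
        Integer best = null;
        for (Map.Entry<Character, Integer> entry : map.entrySet()) {
            int value = entry.getValue();
            // Strict comparison: on a tie the first entry seen is kept (an assumption).
            if (best == null || (min ? value < best : value > best)) {
                best = value;
                result = entry.getKey();
            }
        }
        return result;
    }

    static char min(Map<Character, Integer> map, char defaultChar) {
        return getChar(map, defaultChar, true);
    }

    static char max(Map<Character, Integer> map, char defaultChar) {
        return getChar(map, defaultChar, false);
    }

    public static void main(String[] args) {
        Map<Character, Integer> counts = new LinkedHashMap<>();
        counts.put(',', 5);
        counts.put(';', 2);
        counts.put('|', 9);
        System.out.println(min(counts, '?')); // prints ";"
        System.out.println(max(counts, '?')); // prints "|"
    }
}
```

Here min and max are thin wrappers that fix the boolean flag, which matches how the flag is described for getChar.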
    • isSymbol

      protected boolean isSymbol(char ch)
    • isAllowedDelimiter

      protected boolean isAllowedDelimiter(char ch)
    • apply

      protected abstract void apply(char delimiter, char quote, char quoteEscape)
      Applies the discovered CSV format elements to the CsvParser
      Parameters:
      delimiter - the discovered delimiter character
      quote - the discovered quote character
      quoteEscape - the discovered quote escape character.
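Because apply is abstract, a subclass decides what happens with the detected characters. The callback pattern can be sketched with a stand-in class, so the example compiles without the library. DetectorStandIn, its report method and ApplySketch are hypothetical names; only the signature of apply is taken from this page.

```java
// Hypothetical stand-in for the detector; only apply(...) mirrors the documented signature.
abstract class DetectorStandIn {

    // Hypothetical driver method: hands already-detected characters to apply(...).
    final void report(char delimiter, char quote, char quoteEscape) {
        apply(delimiter, quote, quoteEscape);
    }

    protected abstract void apply(char delimiter, char quote, char quoteEscape);
}

final class ApplySketch {

    // Captures the values passed to apply(...) using an anonymous subclass.
    static char[] capture(char delimiter, char quote, char quoteEscape) {
        final char[] detected = new char[3];
        DetectorStandIn detector = new DetectorStandIn() {
            @Override
            protected void apply(char d, char q, char e) {
                detected[0] = d;
                detected[1] = q;
                detected[2] = e;
            }
        };
        detector.report(delimiter, quote, quoteEscape);
        return detected;
    }

    public static void main(String[] args) {
        char[] format = capture(';', '"', '\\');
        System.out.println(format[0]); // prints ";"
    }
}
```

An implementation of the real class would typically copy the three characters into whatever format object its parser uses instead of an array.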