Class CsvParser


public final class CsvParser extends AbstractParser<CsvParserSettings>
A very fast CSV parser implementation.
  • Field Details

    • parseUnescapedQuotes

      private boolean parseUnescapedQuotes
    • parseUnescapedQuotesUntilDelimiter

      private boolean parseUnescapedQuotesUntilDelimiter
    • backToDelimiter

      private boolean backToDelimiter
    • doNotEscapeUnquotedValues

      private final boolean doNotEscapeUnquotedValues
    • keepEscape

      private final boolean keepEscape
    • keepQuotes

      private final boolean keepQuotes
    • unescaped

      private boolean unescaped
    • prev

      private char prev
    • delimiter

      private char delimiter
    • multiDelimiter

      private char[] multiDelimiter
    • quote

      private char quote
    • quoteEscape

      private char quoteEscape
    • escapeEscape

      private char escapeEscape
    • newLine

      private char newLine
    • whitespaceAppender

      private final DefaultCharAppender whitespaceAppender
    • normalizeLineEndingsInQuotes

      private final boolean normalizeLineEndingsInQuotes
    • quoteHandling

      private UnescapedQuoteHandling quoteHandling
    • nullValue

      private final String nullValue
    • maxColumnLength

      private final int maxColumnLength
    • emptyValue

      private final String emptyValue
    • trimQuotedLeading

      private final boolean trimQuotedLeading
    • trimQuotedTrailing

      private final boolean trimQuotedTrailing
    • delimiters

      private char[] delimiters
    • match

      private int match
    • formatDetectorRowSampleCount

      private int formatDetectorRowSampleCount
  • Constructor Details

    • CsvParser

      public CsvParser(CsvParserSettings settings)
      The CsvParser supports all settings provided by CsvParserSettings, and requires this configuration to be properly initialized.
      Parameters:
      settings - the parser configuration
  • Method Details

    • parseRecord

      protected final void parseRecord()
      Description copied from class: AbstractParser
      Parser-specific implementation for reading a single record from the input.

      The AbstractParser handles the initialization and processing of the input until it is ready to be parsed.

      It then delegates the input to the parser-specific implementation defined by AbstractParser.parseRecord(). In general, an implementation of AbstractParser.parseRecord() will perform the following steps:

      • Test the character stored in ch and take some action on it (e.g. while (ch != '\n') { doSomething(); })
      • Request more characters by calling ch = input.nextChar();
      • Append the desired characters to the output by executing, for example, output.appender.append(ch)
      • Notify that a value of the record has been fully read by executing output.valueParsed(). This clears the output appender (CharAppender), so the next call to output.appender.append(ch) stores the first character of the next parsed value
      • Rinse and repeat until all values of the record are parsed
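
      The steps above can be sketched as a small, self-contained loop. This is only an illustration under stated assumptions, not the CsvParser implementation: Input and Output are hypothetical stand-ins for the parser's input and output objects, and the record format is simplified to comma-separated values with no quoting or escaping.

```java
import java.util.ArrayList;
import java.util.List;

public class RecordLoopSketch {

    /** Hypothetical stand-in for the parser's input: hands out one character at a time. */
    static final class Input {
        private final String text;
        private int position;

        Input(String text) {
            this.text = text;
        }

        boolean hasNext() {
            return position < text.length();
        }

        char nextChar() {
            return text.charAt(position++);
        }
    }

    /** Hypothetical stand-in for the parser's output: collects the values of one record. */
    static final class Output {
        final StringBuilder appender = new StringBuilder();
        final List<String> values = new ArrayList<>();

        /** Stores the value read so far and clears the appender for the next value. */
        void valueParsed() {
            values.add(appender.toString());
            appender.setLength(0);
        }
    }

    /** Reads one record, stopping at '\n' or at the end of the input. */
    static List<String> parseRecord(Input input) {
        Output output = new Output();
        while (input.hasNext()) {
            char ch = input.nextChar();     // request the next character
            if (ch == '\n') {               // end of the record
                break;
            }
            if (ch == ',') {                // delimiter: the current value is complete
                output.valueParsed();
            } else {
                output.appender.append(ch); // part of the current value
            }
        }
        output.valueParsed();               // flush the last value of the record
        return output.values;
    }

    public static void main(String[] args) {
        Input input = new Input("a,b,c\nd,e\n");
        System.out.println(parseRecord(input)); // first record
        System.out.println(parseRecord(input)); // second record
    }
}
```

      Each call to parseRecord consumes exactly one record from the shared input, which mirrors how control returns to the caller between records.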

      Once AbstractParser.parseRecord() returns, the AbstractParser takes over and handles the parsed values (generally, reorganizing them and passing them on to a RowProcessor).

      After the record processing, the AbstractParser reads the next characters from the input, delegating control again to the parseRecord() implementation for processing of the next record.

      This cycle repeats until the reading process is stopped by the user, the input is exhausted, or an error happens.

      In case of errors, the unchecked exception TextParsingException will be thrown and all resources in use will be closed automatically unless CommonParserSettings.isAutoClosingEnabled() evaluates to false. The exception should contain the cause and more information about where in the input the error happened.

      Specified by:
      parseRecord in class AbstractParser<CsvParserSettings>
    • parseSingleDelimiterRecord

      private final void parseSingleDelimiterRecord()
    • skipValue

      private void skipValue()
    • handleValueSkipping

      private void handleValueSkipping(boolean quoted)
    • handleUnescapedQuoteInValue

      private void handleUnescapedQuoteInValue()
    • nextDelimiter

      private int nextDelimiter()
    • handleUnescapedQuote

      private boolean handleUnescapedQuote()
    • processQuoteEscape

      private void processQuoteEscape()
    • parseValueProcessingEscape

      private void parseValueProcessingEscape()
    • parseQuotedValue

      private void parseQuotedValue()
    • getInputAnalysisProcess

      protected final InputAnalysisProcess getInputAnalysisProcess()
      Description copied from class: AbstractParser
      Allows the parser implementation to traverse the input buffer before the parsing process starts, in order to enable automatic configuration and discovery of data formats.
      Overrides:
      getInputAnalysisProcess in class AbstractParser<CsvParserSettings>
      Returns:
      a custom implementation of InputAnalysisProcess. By default, null is returned and no special input analysis will be performed.
    • getDetectedFormat

      public final CsvFormat getDetectedFormat()
      Returns the CSV format detected when format detection is enabled in the parser settings. The detected format becomes available once the parsing process is initialized.
      Returns:
      the detected CSV format, or null if no detection has been enabled or if the parsing process has not been started yet.
    • consumeValueOnEOF

      protected final boolean consumeValueOnEOF()
      Description copied from class: AbstractParser
      Allows the parser implementation to handle any value that was being consumed when the end of the input was reached.
      Overrides:
      consumeValueOnEOF in class AbstractParser<CsvParserSettings>
      Returns:
      a flag indicating whether the parser was processing a value when the end of the input was reached.
    • updateFormat

      public final void updateFormat(CsvFormat format)
      Allows changing the format of the input on the fly.
      Parameters:
      format - the new format to use.
    • skipWhitespace

      private void skipWhitespace()
    • saveMatchingCharacters

      private void saveMatchingCharacters()
    • matchDelimiter

      private boolean matchDelimiter()
    • matchDelimiterAfterQuote

      private boolean matchDelimiterAfterQuote()
    • parseMultiDelimiterRecord

      private void parseMultiDelimiterRecord()
    • appendUntilMultiDelimiter

      private void appendUntilMultiDelimiter()
    • parseQuotedValueMultiDelimiter

      private void parseQuotedValueMultiDelimiter()
    • parseValueProcessingEscapeMultiDelimiter

      private void parseValueProcessingEscapeMultiDelimiter()