Class CsvParserSettings

All Implemented Interfaces:
Cloneable

public class CsvParserSettings extends CommonParserSettings<CsvFormat>
This is the configuration class used by the CSV parser (CsvParser).

In addition to the configuration options provided by CommonParserSettings, CsvParserSettings includes:

  • emptyValue (defaults to null): defines a replacement string to signify an empty value (which is not a null value).

    When reading, if the parser does not read any character from the input and the input is within quotes, the emptyValue is used instead of an empty string.

  • Field Details

    • emptyValue

      private String emptyValue
    • parseUnescapedQuotes

      private boolean parseUnescapedQuotes
    • parseUnescapedQuotesUntilDelimiter

      private boolean parseUnescapedQuotesUntilDelimiter
    • escapeUnquotedValues

      private boolean escapeUnquotedValues
    • keepEscapeSequences

      private boolean keepEscapeSequences
    • keepQuotes

      private boolean keepQuotes
    • normalizeLineEndingsWithinQuotes

      private boolean normalizeLineEndingsWithinQuotes
    • ignoreTrailingWhitespacesInQuotes

      private boolean ignoreTrailingWhitespacesInQuotes
    • ignoreLeadingWhitespacesInQuotes

      private boolean ignoreLeadingWhitespacesInQuotes
    • delimiterDetectionEnabled

      private boolean delimiterDetectionEnabled
    • quoteDetectionEnabled

      private boolean quoteDetectionEnabled
    • unescapedQuoteHandling

      private UnescapedQuoteHandling unescapedQuoteHandling
    • delimitersForDetection

      private char[] delimitersForDetection
    • formatDetectorRowSampleCount

      private int formatDetectorRowSampleCount
  • Constructor Details

    • CsvParserSettings

      public CsvParserSettings()
  • Method Details

    • getEmptyValue

      public String getEmptyValue()
      Returns the String representation of an empty value (defaults to null).

      When reading, if the parser does not read any character from the input and the input is within quotes, this value is used instead of an empty string.

      Returns:
      the String representation of an empty value
    • setEmptyValue

      public void setEmptyValue(String emptyValue)
      Sets the String representation of an empty value (defaults to null).

      When reading, if the parser does not read any character from the input and the input is within quotes, this value is used instead of an empty string.

      Parameters:
      emptyValue - the String representation of an empty value
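      The emptyValue rule above can be illustrated with a small stand-alone model. This is a sketch, not univocity's implementation: the class EmptyValueModel and its parseLine method are hypothetical, the splitter ignores delimiters and escaped quotes inside quoted values, and unquoted empty fields are mapped to a separate value (passed as null here) that stands in for null-value handling, which this page does not cover.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical toy model of the emptyValue rule; NOT univocity's implementation.
// Assumes ',' as delimiter and no delimiters or escaped quotes inside quoted values.
class EmptyValueModel {

    // A quoted empty field ("") becomes emptyValue; an unquoted empty field becomes unquotedEmpty.
    static List<String> parseLine(String line, String emptyValue, String unquotedEmpty) {
        List<String> out = new ArrayList<>();
        for (String token : line.split(",", -1)) { // -1 keeps trailing empty fields
            if (token.length() >= 2 && token.startsWith("\"") && token.endsWith("\"")) {
                String content = token.substring(1, token.length() - 1);
                // Nothing was read between the quotes: use emptyValue instead of ""
                out.add(content.isEmpty() ? emptyValue : content);
            } else {
                out.add(token.isEmpty() ? unquotedEmpty : token);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // With the default emptyValue (null), both empty fields look the same
        System.out.println(parseLine("a,\"\",", null, null));      // [a, null, null]
        // A non-null emptyValue makes the quoted empty field distinguishable
        System.out.println(parseLine("a,\"\",", "<EMPTY>", null)); // [a, <EMPTY>, null]
    }
}
```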
    • newCharAppender

      protected CharAppender newCharAppender()
      Returns an instance of CharAppender with the configured limit of maximum characters per column and the default value used to represent an empty value (when the String parsed from the input, within quotes, is empty)

      This overrides the parent's version because the CSV parser does not rely on the appender to identify null values; it does, however, require the appender to identify empty values.

      Overrides:
      newCharAppender in class CommonParserSettings<CsvFormat>
      Returns:
      an instance of CharAppender with the configured limit of maximum characters per column and the default value used to represent an empty value (when the String parsed from the input, within quotes, is empty)
    • createDefaultFormat

      protected CsvFormat createDefaultFormat()
      Returns the default CsvFormat configured to handle CSV inputs compliant with the RFC 4180 standard.
      Specified by:
      createDefaultFormat in class CommonSettings<CsvFormat>
      Returns:
      an instance of CsvFormat configured to handle CSV inputs compliant with the RFC 4180 standard.
    • isParseUnescapedQuotes

      @Deprecated public boolean isParseUnescapedQuotes()
      Deprecated.
      use getUnescapedQuoteHandling() instead. The configuration returned by getUnescapedQuoteHandling() will override this setting if not null.
      Indicates whether the CSV parser should accept unescaped quotes inside quoted values and parse them normally. Defaults to true.
      Returns:
      a flag indicating whether or not the CSV parser should accept unescaped quotes inside quoted values.
    • setParseUnescapedQuotes

      @Deprecated public void setParseUnescapedQuotes(boolean parseUnescapedQuotes)
      Deprecated.
      use setUnescapedQuoteHandling(UnescapedQuoteHandling) instead. The configuration returned by getUnescapedQuoteHandling() will override this setting if not null.
      Configures how to handle unescaped quotes inside quoted values. If set to true, the parser will parse the quote normally as part of the value. If set to false, a TextParsingException will be thrown. Defaults to true.
      Parameters:
      parseUnescapedQuotes - indicates whether or not the CSV parser should accept unescaped quotes inside quoted values.
    • setParseUnescapedQuotesUntilDelimiter

      @Deprecated public void setParseUnescapedQuotesUntilDelimiter(boolean parseUnescapedQuotesUntilDelimiter)
      Deprecated.
      use setUnescapedQuoteHandling(UnescapedQuoteHandling) instead. The configuration returned by getUnescapedQuoteHandling() will override this setting if not null.
      Configures the parser to process values with unescaped quotes, and stop accumulating characters and consider the value parsed when a delimiter is found. (defaults to true)
      Parameters:
      parseUnescapedQuotesUntilDelimiter - a flag indicating that the parser should stop accumulating values when a field delimiter character is found when parsing unquoted and unescaped values.
    • isParseUnescapedQuotesUntilDelimiter

      @Deprecated public boolean isParseUnescapedQuotesUntilDelimiter()
      Deprecated.
      use getUnescapedQuoteHandling() instead. The configuration returned by getUnescapedQuoteHandling() will override this setting if not null.
      When parsing unescaped quotes, indicates the parser should stop accumulating characters and consider the value parsed when a delimiter is found. (defaults to true)
      Returns:
      a flag indicating that the parser should stop accumulating values when a field delimiter character is found when parsing unquoted and unescaped values.
    • isEscapeUnquotedValues

      public boolean isEscapeUnquotedValues()
      Indicates whether escape sequences should be processed in unquoted values. Defaults to false.

      By default, this is disabled, and if the input is A""B,C, the resulting values will be [A""B] and [C] (i.e. the content is read as-is). However, if the parser is configured to process escape sequences in unquoted values, the result will be [A"B] and [C].

      Returns:
      true if escape sequences should be processed in unquoted values, otherwise false
    • setEscapeUnquotedValues

      public void setEscapeUnquotedValues(boolean escapeUnquotedValues)
      Configures the parser to process escape sequences in unquoted values. Defaults to false.

      By default, this is disabled, and if the input is A""B,C, the resulting values will be [A""B] and [C] (i.e. the content is read as-is). However, if the parser is configured to process escape sequences in unquoted values, the result will be [A"B] and [C].

      Parameters:
      escapeUnquotedValues - a flag indicating whether escape sequences should be processed in unquoted values
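      The A""B,C example above can be reproduced with a toy model of escape processing in unquoted values. This is a hypothetical sketch (UnquotedEscapeModel and readUnquoted are illustrative names, not univocity API) that assumes '"' serves as both the quote and the quote escape, as in RFC 4180.

```java
// Hypothetical toy model of escapeUnquotedValues; NOT univocity's implementation.
// Assumes '"' is both the quote character and the quote escape (RFC 4180 style).
class UnquotedEscapeModel {

    static String readUnquoted(String raw, boolean escapeUnquotedValues) {
        if (!escapeUnquotedValues) {
            return raw; // disabled: content is read as-is
        }
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < raw.length(); i++) {
            char c = raw.charAt(i);
            // An escape immediately followed by a quote collapses into a single quote
            if (c == '"' && i + 1 < raw.length() && raw.charAt(i + 1) == '"') {
                sb.append('"');
                i++; // skip the escaped quote
            } else {
                sb.append(c);
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String line = "A\"\"B,C";
        for (boolean escape : new boolean[] {false, true}) {
            StringBuilder shown = new StringBuilder();
            for (String field : line.split(",", -1)) {
                shown.append('[').append(readUnquoted(field, escape)).append(']');
            }
            System.out.println("escapeUnquotedValues=" + escape + ": " + shown);
        }
        // escapeUnquotedValues=false: [A""B][C]
        // escapeUnquotedValues=true: [A"B][C]
    }
}
```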
    • isKeepEscapeSequences

      public final boolean isKeepEscapeSequences()
      Indicates whether the parser should keep any escape sequences present in the input (e.g. a quote escape sequence such as two double quotes "" won't be replaced by a single double quote ").

      This is disabled by default.

      Returns:
      a flag indicating whether escape sequences should be kept (and not replaced) by the parser.
    • setKeepEscapeSequences

      public final void setKeepEscapeSequences(boolean keepEscapeSequences)
      Configures the parser to keep any escape sequences present in the input (e.g. a quote escape sequence such as two double quotes "" won't be replaced by a single double quote ").

      This is disabled by default.

      Parameters:
      keepEscapeSequences - the flag indicating whether escape sequences should be kept (and not replaced) by the parser.
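      To make the difference concrete, here is a toy model of reading one quoted value with and without keepEscapeSequences. It is a hypothetical sketch (KeepEscapeModel and readQuoted are illustrative names), assuming '"' is both the quote and the quote escape and that the token is a single well-formed quoted value.

```java
// Hypothetical toy model of keepEscapeSequences for a quoted value; NOT univocity's code.
// Assumes '"' is both quote and quote escape, and the token is one well-formed quoted value.
class KeepEscapeModel {

    static String readQuoted(String token, boolean keepEscapeSequences) {
        // Strip the enclosing quotes
        String inner = token.substring(1, token.length() - 1);
        // Either keep each "" untouched or collapse it into a single "
        return keepEscapeSequences ? inner : inner.replace("\"\"", "\"");
    }

    public static void main(String[] args) {
        String token = "\"He said \"\"hi\"\"\"";
        System.out.println(readQuoted(token, false)); // He said "hi"
        System.out.println(readQuoted(token, true));  // He said ""hi""
    }
}
```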
    • isDelimiterDetectionEnabled

      public final boolean isDelimiterDetectionEnabled()
      Returns a flag indicating whether the parser should analyze the input to discover the column delimiter character.

      Note that the detection process is not guaranteed to discover the correct column delimiter. If detection fails, the delimiter provided by CsvFormat.getDelimiter() will be used.

      Returns:
      a flag indicating whether the parser should analyze the input to discover the column delimiter character.
    • setDelimiterDetectionEnabled

      public final void setDelimiterDetectionEnabled(boolean separatorDetectionEnabled)
      Configures the parser to analyze the input before parsing to discover the column delimiter character.

      Note that the detection process is not guaranteed to discover the correct column delimiter. If detection fails, the first character in the list of delimiters allowed for detection will be used, if available; otherwise, the delimiter returned by CsvFormat.getDelimiter() will be used.

      Parameters:
      separatorDetectionEnabled - the flag to enable/disable discovery of the column delimiter character.
    • setDelimiterDetectionEnabled

      public final void setDelimiterDetectionEnabled(boolean separatorDetectionEnabled, char... delimitersForDetection)
      Configures the parser to analyze the input before parsing to discover the column delimiter character.

      Note that the detection process is not guaranteed to discover the correct column delimiter. If detection fails, the first character in the list of delimiters allowed for detection will be used, if available; otherwise, the delimiter returned by CsvFormat.getDelimiter() will be used.

      Parameters:
      separatorDetectionEnabled - the flag to enable/disable discovery of the column delimiter character.
      delimitersForDetection - possible delimiters for detection when isDelimiterDetectionEnabled() evaluates to true, in order of priority.
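      The priority list and fallback rule above can be pictured with a deliberately naive detector. This is a hypothetical sketch, not univocity's detection algorithm (which this page does not describe): DelimiterDetectionModel and detect are illustrative names, quoting is ignored, and sampleCount only loosely mirrors the idea behind formatDetectorRowSampleCount. The fallback follows the rule stated here: first candidate if any, otherwise the format's delimiter.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical, naive delimiter detector for illustration only; NOT univocity's algorithm.
class DelimiterDetectionModel {

    // Returns the first candidate (in priority order) that occurs the same non-zero number of
    // times on every sampled row. If none qualifies, falls back to the first candidate, or to
    // formatDelimiter when no candidates were given. Quoted content is not handled here.
    static char detect(List<String> rows, int sampleCount, char[] candidates, char formatDelimiter) {
        char fallback = candidates.length > 0 ? candidates[0] : formatDelimiter;
        List<String> sample = rows.subList(0, Math.min(sampleCount, rows.size()));
        if (sample.isEmpty()) {
            return fallback;
        }
        for (char candidate : candidates) {
            int expected = count(sample.get(0), candidate);
            if (expected == 0) {
                continue;
            }
            boolean consistent = true;
            for (String row : sample) {
                if (count(row, candidate) != expected) {
                    consistent = false;
                    break;
                }
            }
            if (consistent) {
                return candidate;
            }
        }
        return fallback;
    }

    private static int count(String row, char c) {
        int n = 0;
        for (int i = 0; i < row.length(); i++) {
            if (row.charAt(i) == c) {
                n++;
            }
        }
        return n;
    }

    public static void main(String[] args) {
        List<String> rows = Arrays.asList("id;name;price", "1;Widget;9,99", "2;Gadget;12,50");
        // ',' has priority but is absent from the first sampled row, so ';' is chosen
        System.out.println(detect(rows, 20, new char[] {',', ';'}, ',')); // ;
    }
}
```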
    • isQuoteDetectionEnabled

      public final boolean isQuoteDetectionEnabled()
      Returns a flag indicating whether the parser should analyze the input to discover the quote character. The quote escape will also be detected as part of this process.

      Note that the detection process is not guaranteed to discover the correct quote and quote escape. If detection fails, the characters provided by CsvFormat.getQuote() and CsvFormat.getQuoteEscape() will be used.

      Returns:
      a flag indicating whether the parser should analyze the input to discover the quote character. The quote escape will also be detected as part of this process.
    • setQuoteDetectionEnabled

      public final void setQuoteDetectionEnabled(boolean quoteDetectionEnabled)
      Configures the parser to analyze the input before parsing to discover the quote character. The quote escape will also be detected as part of this process.

      Note that the detection process is not guaranteed to discover the correct quote and quote escape. If detection fails, the characters provided by CsvFormat.getQuote() and CsvFormat.getQuoteEscape() will be used.

      Parameters:
      quoteDetectionEnabled - the flag to enable/disable discovery of the quote character. The quote escape will also be detected as part of this process.
    • detectFormatAutomatically

      public final void detectFormatAutomatically()
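      Convenience method to turn on all format detection features, including delimiter detection (setDelimiterDetectionEnabled) and quote detection (setQuoteDetectionEnabled), in a single method call.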
    • detectFormatAutomatically

      public final void detectFormatAutomatically(char... delimitersForDetection)
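      Convenience method to turn on all format detection features, including delimiter detection (setDelimiterDetectionEnabled) and quote detection (setQuoteDetectionEnabled), in a single method call.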
      Parameters:
      delimitersForDetection - possible delimiters for detection, in order of priority.
    • isNormalizeLineEndingsWithinQuotes

      public boolean isNormalizeLineEndingsWithinQuotes()
      Flag indicating whether the parser should replace line separators, specified in Format.getLineSeparator(), with the normalized line separator character specified in Format.getNormalizedNewline(), even in quoted values. This is enabled by default and ensures data can be read on any platform without introducing unwanted blank lines.

      For example, consider the quoted value "Line1 \r\n Line2". If this is parsed using "\r\n" as the line separator sequence and the normalized newline is set to '\n' (the default), the output will be: [Line1 \n Line2]

      However, if the value is meant to be kept untouched and the original line separator should be maintained, set normalizeLineEndingsWithinQuotes to false. This makes the parser read the value as-is, producing: [Line1 \r\n Line2]
      Returns:
      true if line separators in quoted values will be normalized, false otherwise
    • setNormalizeLineEndingsWithinQuotes

      public void setNormalizeLineEndingsWithinQuotes(boolean normalizeLineEndingsWithinQuotes)
      Configures the parser to replace line separators, specified in Format.getLineSeparator(), with the normalized line separator character specified in Format.getNormalizedNewline(), even in quoted values. This is enabled by default and ensures data can be read on any platform without introducing unwanted blank lines.

      For example, consider the quoted value "Line1 \r\n Line2". If this is parsed using "\r\n" as the line separator sequence and the normalized newline is set to '\n' (the default), the output will be: [Line1 \n Line2]

      However, if the value is meant to be kept untouched and the original line separator should be maintained, set normalizeLineEndingsWithinQuotes to false. This makes the parser read the value as-is, producing: [Line1 \r\n Line2]
      Parameters:
      normalizeLineEndingsWithinQuotes - flag indicating whether line separators in quoted values should be replaced by the character specified in Format.getNormalizedNewline().
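      The "Line1 \r\n Line2" example can be replayed with a toy model of this setting. It is a hypothetical sketch (LineEndingModel and readQuoted are illustrative names, not univocity API) that operates on text already read from between the quotes; the show helper only makes \r and \n visible when printing.

```java
// Hypothetical toy model of normalizeLineEndingsWithinQuotes; NOT univocity's implementation.
class LineEndingModel {

    // content: text already read from between the quotes
    static String readQuoted(String content, String lineSeparator, char normalizedNewline,
                             boolean normalizeLineEndingsWithinQuotes) {
        if (!normalizeLineEndingsWithinQuotes) {
            return content; // value read as-is
        }
        return content.replace(lineSeparator, String.valueOf(normalizedNewline));
    }

    // Makes \r and \n visible when printing
    private static String show(String s) {
        return "[" + s.replace("\r", "\\r").replace("\n", "\\n") + "]";
    }

    public static void main(String[] args) {
        String value = "Line1 \r\n Line2";
        System.out.println(show(readQuoted(value, "\r\n", '\n', true)));  // [Line1 \n Line2]
        System.out.println(show(readQuoted(value, "\r\n", '\n', false))); // [Line1 \r\n Line2]
    }
}
```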
    • setUnescapedQuoteHandling

      public void setUnescapedQuoteHandling(UnescapedQuoteHandling unescapedQuoteHandling)
      Configures the handling of values with unescaped quotes. Defaults to null, for backward compatibility with isParseUnescapedQuotes() and isParseUnescapedQuotesUntilDelimiter(). If set to a non-null value, this setting will override the configuration of isParseUnescapedQuotes() and isParseUnescapedQuotesUntilDelimiter().
      Parameters:
      unescapedQuoteHandling - the handling method to be used when unescaped quotes are found in the input.
    • getUnescapedQuoteHandling

      public UnescapedQuoteHandling getUnescapedQuoteHandling()
      Returns the method of handling values with unescaped quotes. Defaults to null, for backward compatibility with isParseUnescapedQuotes() and isParseUnescapedQuotesUntilDelimiter(). If set to a non-null value, this setting will override the configuration of isParseUnescapedQuotes() and isParseUnescapedQuotesUntilDelimiter().
      Returns:
      the handling method to be used when unescaped quotes are found in the input, or null if not set.
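      The precedence rule between this setting and the deprecated flags can be sketched as a small resolver. This is an illustrative model, not univocity's code: the nested enum Handling is a hypothetical stand-in for a subset of UnescapedQuoteHandling, and the mapping from the two legacy booleans (including that parseUnescapedQuotes=false wins over parseUnescapedQuotesUntilDelimiter) is an assumption drawn from the deprecated methods' descriptions. Only "a non-null value overrides the flags" comes directly from this page.

```java
// Hypothetical model of the override rule; NOT univocity's implementation.
// The mapping of the legacy booleans below is an ASSUMPTION based on the deprecated docs.
class QuoteHandlingModel {

    // Hypothetical stand-in for a subset of UnescapedQuoteHandling
    enum Handling { STOP_AT_CLOSING_QUOTE, STOP_AT_DELIMITER, RAISE_ERROR }

    static Handling effective(Handling explicit, boolean parseUnescapedQuotes,
                              boolean parseUnescapedQuotesUntilDelimiter) {
        if (explicit != null) {
            return explicit; // a non-null setting overrides the deprecated flags
        }
        if (!parseUnescapedQuotes) {
            return Handling.RAISE_ERROR; // legacy behavior: a TextParsingException is thrown
        }
        return parseUnescapedQuotesUntilDelimiter
                ? Handling.STOP_AT_DELIMITER      // stop accumulating at the next delimiter
                : Handling.STOP_AT_CLOSING_QUOTE; // parse the quote as part of the value
    }

    public static void main(String[] args) {
        System.out.println(effective(null, true, true));                             // STOP_AT_DELIMITER
        System.out.println(effective(null, true, false));                            // STOP_AT_CLOSING_QUOTE
        System.out.println(effective(null, false, true));                            // RAISE_ERROR
        System.out.println(effective(Handling.STOP_AT_CLOSING_QUOTE, false, false)); // STOP_AT_CLOSING_QUOTE
    }
}
```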
    • getKeepQuotes

      public boolean getKeepQuotes()
      Flag indicating whether the parser should keep enclosing quote characters in the values parsed from the input.

      Defaults to false

      Returns:
      a flag indicating whether enclosing quotes should be maintained when parsing quoted values.
    • setKeepQuotes

      public void setKeepQuotes(boolean keepQuotes)
      Configures the parser to keep enclosing quote characters in the values parsed from the input.

      Defaults to false

      Parameters:
      keepQuotes - flag indicating whether enclosing quotes should be maintained when parsing quoted values.
    • addConfiguration

      protected void addConfiguration(Map<String,Object> out)
      Overrides:
      addConfiguration in class CommonParserSettings<CsvFormat>
    • clone

      public final CsvParserSettings clone()
      Description copied from class: CommonSettings
      Clones this configuration object. Use alternative CommonSettings.clone(boolean) method to reset properties that are specific to a given input, such as header names and selection of fields.
      Overrides:
      clone in class CommonParserSettings<CsvFormat>
      Returns:
      a copy of all configurations applied to the current instance.
    • clone

      public final CsvParserSettings clone(boolean clearInputSpecificSettings)
      Description copied from class: CommonSettings
      Clones this configuration object to reuse user-provided settings. Properties that are specific to a given input (such as header names and selection of fields) can be reset to their defaults if the clearInputSpecificSettings flag is set to true.
      Overrides:
      clone in class CommonParserSettings<CsvFormat>
      Parameters:
      clearInputSpecificSettings - flag indicating whether to clear settings that are likely to be associated with a given input.
      Returns:
      a copy of the configurations applied to the current instance.
    • getDelimitersForDetection

      public final char[] getDelimitersForDetection()
      Returns the sequence of possible delimiters for detection when isDelimiterDetectionEnabled() evaluates to true, in order of priority.
      Returns:
      the possible delimiter characters, in order of priority.
    • getIgnoreTrailingWhitespacesInQuotes

      public boolean getIgnoreTrailingWhitespacesInQuotes()
      Returns whether trailing whitespaces within quoted values should be skipped (defaults to false). Note: if keepQuotes evaluates to true, values won't be trimmed.
      Returns:
      true if trailing whitespaces from quoted values should be skipped, false otherwise
    • setIgnoreTrailingWhitespacesInQuotes

      public void setIgnoreTrailingWhitespacesInQuotes(boolean ignoreTrailingWhitespacesInQuotes)
      Defines whether trailing whitespaces in quoted values should be skipped (defaults to false). Note: if keepQuotes evaluates to true, values won't be trimmed.
      Parameters:
      ignoreTrailingWhitespacesInQuotes - whether trailing whitespaces from quoted values should be skipped
    • getIgnoreLeadingWhitespacesInQuotes

      public boolean getIgnoreLeadingWhitespacesInQuotes()
      Returns whether leading whitespaces in quoted values should be skipped (defaults to false). Note: if keepQuotes evaluates to true, values won't be trimmed.
      Returns:
      true if leading whitespaces from quoted values should be skipped, false otherwise
    • setIgnoreLeadingWhitespacesInQuotes

      public void setIgnoreLeadingWhitespacesInQuotes(boolean ignoreLeadingWhitespacesInQuotes)
      Defines whether leading whitespaces in quoted values should be skipped (defaults to false). Note: if keepQuotes evaluates to true, values won't be trimmed.
      Parameters:
      ignoreLeadingWhitespacesInQuotes - whether leading whitespaces from quoted values should be skipped
    • trimQuotedValues

      public final void trimQuotedValues(boolean trim)
      Configures the parser to trim any whitespaces around values extracted from within quotes. Shorthand for setIgnoreLeadingWhitespacesInQuotes(boolean) and setIgnoreTrailingWhitespacesInQuotes(boolean). Note: if keepQuotes evaluates to true, values won't be trimmed.
      Parameters:
      trim - a flag indicating whether whitespaces around values extracted from a quoted field should be removed
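      The interplay between the leading/trailing flags, the trimQuotedValues shorthand and keepQuotes can be illustrated with a toy model. It is a hypothetical sketch (QuotedTrimModel and its methods are illustrative names): it assumes a single quoted token without escape sequences and treats Character.isWhitespace as the whitespace test, which may differ from the library's exact definition.

```java
// Hypothetical toy model of quoted-value trimming and keepQuotes; NOT univocity's code.
// Assumes one quoted token without escapes; whitespace = Character.isWhitespace.
class QuotedTrimModel {

    static String readQuoted(String token, boolean ignoreLeading, boolean ignoreTrailing,
                             boolean keepQuotes) {
        if (keepQuotes) {
            return token; // enclosing quotes kept and no trimming applied
        }
        String inner = token.substring(1, token.length() - 1);
        int start = 0;
        int end = inner.length();
        if (ignoreLeading) {
            while (start < end && Character.isWhitespace(inner.charAt(start))) {
                start++;
            }
        }
        if (ignoreTrailing) {
            while (end > start && Character.isWhitespace(inner.charAt(end - 1))) {
                end--;
            }
        }
        return inner.substring(start, end);
    }

    // Mirrors the shorthand: one flag drives both the leading and trailing settings
    static String trimQuotedValues(String token, boolean trim, boolean keepQuotes) {
        return readQuoted(token, trim, trim, keepQuotes);
    }

    public static void main(String[] args) {
        String token = "\"  abc  \"";
        System.out.println("[" + trimQuotedValues(token, true, false) + "]");  // [abc]
        System.out.println("[" + readQuoted(token, true, false, false) + "]"); // [abc  ]
        System.out.println("[" + trimQuotedValues(token, true, true) + "]");   // ["  abc  "]
    }
}
```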
    • getFormatDetectorRowSampleCount

      public int getFormatDetectorRowSampleCount()
      Returns the number of sample rows used in the CSV format auto-detection process (defaults to 20)
      Returns:
      the number of sample rows used in the CSV format auto-detection process
    • setFormatDetectorRowSampleCount

      public void setFormatDetectorRowSampleCount(int formatDetectorRowSampleCount)
      Updates the number of sample rows used in the CSV format auto-detection process (defaults to 20)
      Parameters:
      formatDetectorRowSampleCount - the number of sample rows used in the CSV format auto-detection process