Class CommonSettings<F extends Format>

java.lang.Object
com.univocity.parsers.common.CommonSettings<F>
Type Parameters:
F - the format supported by this settings class.
All Implemented Interfaces:
Cloneable
Direct Known Subclasses:
CommonParserSettings, CommonWriterSettings

public abstract class CommonSettings<F extends Format> extends Object implements Cloneable
This is the parent class for all configuration classes used by parsers (AbstractParser) and writers (AbstractWriter).

By default, all parsers and writers work with, at least, the following configuration options:

  • format (each file format provides its default): the input/output format of a given file
  • nullValue (defaults to null):

    when reading, if the parser does not read any character from the input, the nullValue is used instead of an empty string

    when writing, if the writer has a null object to write to the output, the nullValue is used instead of an empty string

  • maxCharsPerColumn (defaults to 4096): The maximum number of characters allowed for any given value being written/read.

    You need this to avoid OutOfMemoryErrors in case a file does not have a valid format. In such cases the parser might just keep reading from the input until its end or the memory is exhausted. This sets a limit which avoids unwanted JVM crashes.

  • maxColumns (defaults to 512): a hard limit on how many columns a record can have. You need this to avoid OutOfMemoryErrors in case of inputs that might be inconsistent with the format you are dealing with.
  • skipEmptyLines (defaults to true):

    when reading, if the parser reads a line that is empty, it will be skipped.

    when writing, if the writer receives an empty or null row to write to the output, it will be ignored

  • ignoreTrailingWhitespaces (defaults to true): removes trailing whitespaces from values being read/written
  • ignoreLeadingWhitespaces (defaults to true): removes leading whitespaces from values being read/written
  • headers (defaults to null): the field names in the input/output, in the sequence they occur.

    when reading, the given header names will be used to refer to each column irrespective of whether or not the input contains a header row

    when writing, the given header names will be used to refer to each column and can be used for writing the header row

  • field selection (defaults to none): a selection of fields for reading and writing. Fields can be selected by their name or their position.

    when reading, the selected fields only will be parsed and the remaining fields will be discarded.

    when writing, the selected fields only will be written and the remaining fields will be discarded

  • Field Details

    • format

      private F format
    • nullValue

      private String nullValue
    • maxCharsPerColumn

      private int maxCharsPerColumn
    • maxColumns

      private int maxColumns
    • skipEmptyLines

      private boolean skipEmptyLines
    • ignoreTrailingWhitespaces

      private boolean ignoreTrailingWhitespaces
    • ignoreLeadingWhitespaces

      private boolean ignoreLeadingWhitespaces
    • fieldSelector

      private FieldSelector fieldSelector
    • autoConfigurationEnabled

      private boolean autoConfigurationEnabled
    • errorHandler

      private ProcessorErrorHandler<? extends Context> errorHandler
    • errorContentLength

      private int errorContentLength
    • skipBitsAsWhitespace

      private boolean skipBitsAsWhitespace
    • headers

      private String[] headers
    • headerSourceClass

      Class<?> headerSourceClass
  • Constructor Details

    • CommonSettings

      public CommonSettings()
      Creates a new instance of this settings object using the default format specified by the concrete class that inherits from CommonSettings
  • Method Details

    • getNullValue

      public String getNullValue()
      Returns the String representation of a null value (defaults to null)

      When reading, if the parser does not read any character from the input, the nullValue is used instead of an empty string

      When writing, if the writer has a null object to write to the output, the nullValue is used instead of an empty string

      Returns:
      the String representation of a null value
    • setNullValue

      public void setNullValue(String emptyValue)
      Sets the String representation of a null value (defaults to null)

      When reading, if the parser does not read any character from the input, the nullValue is used instead of an empty string

      When writing, if the writer has a null object to write to the output, the nullValue is used instead of an empty string

      Parameters:
      emptyValue - the String representation of a null value
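The two roles of nullValue described above can be sketched without the library. The class and method names below (NullValueSketch, onRead, onWrite) are hypothetical and only model the substitution rule; they are not the parser's or writer's actual code.

```java
class NullValueSketch {
    // Reading: a column from which no character was read yields nullValue.
    static String onRead(String parsed, String nullValue) {
        return (parsed == null || parsed.isEmpty()) ? nullValue : parsed;
    }

    // Writing: a null object is replaced by nullValue before it is written.
    static String onWrite(Object value, String nullValue) {
        return value == null ? nullValue : String.valueOf(value);
    }

    public static void main(String[] args) {
        System.out.println(onRead("", "N/A"));    // N/A
        System.out.println(onRead("abc", "N/A")); // abc
        System.out.println(onWrite(null, "N/A")); // N/A
        System.out.println(onWrite(42, "N/A"));   // 42
    }
}
```

With the default nullValue of null, both helpers simply hand back null in the "nothing to read/write" case.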
    • getMaxCharsPerColumn

      public int getMaxCharsPerColumn()
      The maximum number of characters allowed for any given value being written/read. Used to avoid OutOfMemoryErrors (defaults to 4096).

      If set to -1, then the internal array will expand automatically, up to the limit allowed by the JVM

      Returns:
      The maximum number of characters allowed for any given value being written/read
    • setMaxCharsPerColumn

      public void setMaxCharsPerColumn(int maxCharsPerColumn)
      Defines the maximum number of characters allowed for any given value being written/read. Used to avoid OutOfMemoryErrors (defaults to 4096).

      To enable auto-expansion of the internal array, set this property to -1

      Parameters:
      maxCharsPerColumn - The maximum number of characters allowed for any given value being written/read
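As an illustration of why the limit protects against runaway input, here is a minimal sketch of a per-value character buffer. ColumnBufferSketch is a hypothetical class, the starting capacity of 16 is arbitrary, and IllegalStateException stands in for whatever exception the library actually raises; only the two behaviours described above (a fixed limit, or auto-expansion when the limit is -1) are modelled.

```java
import java.util.Arrays;

class ColumnBufferSketch {
    private final int maxChars; // -1 means "expand automatically"
    private char[] chars;
    private int length;

    ColumnBufferSketch(int maxChars) {
        this.maxChars = maxChars;
        // Arbitrary starting capacity for the auto-expanding case.
        this.chars = new char[maxChars == -1 ? 16 : maxChars];
    }

    void append(char ch) {
        if (length == chars.length) {
            if (maxChars != -1) {
                // Fixed limit: fail fast instead of consuming more memory.
                throw new IllegalStateException(
                        "Value exceeds the limit of " + maxChars + " characters");
            }
            // No limit: double the array and keep going.
            chars = Arrays.copyOf(chars, chars.length * 2);
        }
        chars[length++] = ch;
    }

    String value() {
        return new String(chars, 0, length);
    }

    public static void main(String[] args) {
        ColumnBufferSketch limited = new ColumnBufferSketch(4);
        try {
            for (char ch : "abcde".toCharArray()) {
                limited.append(ch);
            }
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage());
        }

        ColumnBufferSketch expanding = new ColumnBufferSketch(-1);
        for (int i = 0; i < 100; i++) {
            expanding.append('x');
        }
        System.out.println(expanding.value().length()); // 100
    }
}
```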
    • getSkipEmptyLines

      public boolean getSkipEmptyLines()
      Returns whether or not empty lines should be ignored (defaults to true)

      when reading, if the parser reads a line that is empty, it will be skipped.

      when writing, if the writer receives an empty or null row to write to the output, it will be ignored

      Returns:
      true if empty lines are configured to be ignored, false otherwise
    • setSkipEmptyLines

      public void setSkipEmptyLines(boolean skipEmptyLines)
      Defines whether or not empty lines should be ignored (defaults to true)

      when reading, if the parser reads a line that is empty, it will be skipped.

      when writing, if the writer receives an empty or null row to write to the output, it will be ignored

      Parameters:
      skipEmptyLines - true if empty lines should be ignored, false otherwise
    • getIgnoreTrailingWhitespaces

      public boolean getIgnoreTrailingWhitespaces()
      Returns whether or not trailing whitespaces from values being read/written should be skipped (defaults to true)
      Returns:
      true if trailing whitespaces from values being read/written should be skipped, false otherwise
    • setIgnoreTrailingWhitespaces

      public void setIgnoreTrailingWhitespaces(boolean ignoreTrailingWhitespaces)
      Defines whether or not trailing whitespaces from values being read/written should be skipped (defaults to true)
      Parameters:
      ignoreTrailingWhitespaces - true if trailing whitespaces from values being read/written should be skipped, false otherwise
    • getIgnoreLeadingWhitespaces

      public boolean getIgnoreLeadingWhitespaces()
      Returns whether or not leading whitespaces from values being read/written should be skipped (defaults to true)
      Returns:
      true if leading whitespaces from values being read/written should be skipped, false otherwise
    • setIgnoreLeadingWhitespaces

      public void setIgnoreLeadingWhitespaces(boolean ignoreLeadingWhitespaces)
      Defines whether or not leading whitespaces from values being read/written should be skipped (defaults to true)
      Parameters:
      ignoreLeadingWhitespaces - true if leading whitespaces from values being read/written should be skipped, false otherwise
    • setHeaders

      public void setHeaders(String... headers)
      Defines the field names in the input/output, in the sequence they occur (defaults to null).

      when reading, the given header names will be used to refer to each column irrespective of whether or not the input contains a header row

      when writing, the given header names will be used to refer to each column and can be used for writing the header row

      Parameters:
      headers - the field name sequence associated with each column in the input/output.
    • setHeadersDerivedFromClass

      void setHeadersDerivedFromClass(Class<?> headerSourceClass, String... headers)
      Defines the field names in the input/output derived from a given class with Parsed annotated attributes/methods.

      when reading, the given header names will be used to refer to each column irrespective of whether or not the input contains a header row

      when writing, the given header names will be used to refer to each column and can be used for writing the header row

      Parameters:
      headerSourceClass - the class from which the headers have been derived.
      headers - the field name sequence associated with each column in the input/output.
    • deriveHeadersFrom

      boolean deriveHeadersFrom(Class<?> beanClass)
      Indicates whether headers should be derived from a given class.
      Parameters:
      beanClass - the class to derive headers from
      Returns:
      true if the headers used for parsing/writing should be derived from the given class; otherwise false
    • getHeaders

      public String[] getHeaders()
      Returns the field names in the input/output, in the sequence they occur (defaults to null).

      when reading, the given header names will be used to refer to each column irrespective of whether or not the input contains a header row

      when writing, the given header names will be used to refer to each column and can be used for writing the header row

      Returns:
      the field name sequence associated with each column in the input/output.
    • getMaxColumns

      public int getMaxColumns()
      Returns the hard limit of how many columns a record can have (defaults to 512). You need this to avoid OutOfMemoryErrors in case of inputs that might be inconsistent with the format you are dealing with.
      Returns:
      The maximum number of columns a record can have.
    • setMaxColumns

      public void setMaxColumns(int maxColumns)
      Defines a hard limit of how many columns a record can have (defaults to 512). You need this to avoid OutOfMemoryErrors in case of inputs that might be inconsistent with the format you are dealing with.
      Parameters:
      maxColumns - The maximum number of columns a record can have.
    • getFormat

      public F getFormat()
      Returns the format of the file to be parsed/written. Unless a format has been set, this is the default format provided by the concrete settings class.
      Returns:
      The format of the file to be parsed/written
    • setFormat

      public void setFormat(F format)
      Defines the format of the file to be parsed/written. If no format is set, the default format provided by the concrete settings class is used.
      Parameters:
      format - The format of the file to be parsed/written
    • selectFields

      public FieldSet<String> selectFields(String... fieldNames)
      Selects a sequence of fields for reading/writing by their names.

      When reading, only the values of the selected columns will be parsed, and the content of the other columns ignored. The resulting rows will be returned with the selected columns only, in the order specified. If you want to obtain the original row format, with all columns included and nulls in the fields that have not been selected, call CommonParserSettings.setColumnReorderingEnabled(boolean) with false.

      When writing, the sequence provided represents the expected format of the input rows. For example, headers can be "H1,H2,H3", but the input data is coming with values for two columns and in a different order, such as "V_H3, V_H1". Selecting fields "H3" and "H1" will allow the writer to write values in the expected locations. Using the given example, the output row will be generated as: "V_H1,null,V_H3"

      Parameters:
      fieldNames - The field names to read/write
      Returns:
      the (modifiable) set of selected fields
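The writing example above ("H1,H2,H3" with input "V_H3, V_H1") can be traced with a small stand-alone sketch. WriteSelectionSketch and its arrange method are hypothetical names used only to show how selected field names map input values to output positions; the writer's real implementation is not shown here.

```java
import java.util.Arrays;
import java.util.List;

class WriteSelectionSketch {
    // Places each input value at the position of its selected header;
    // positions that were not selected stay null.
    static String[] arrange(String[] headers, String[] selected, String[] input) {
        List<String> headerList = Arrays.asList(headers);
        String[] out = new String[headers.length];
        for (int i = 0; i < selected.length && i < input.length; i++) {
            int position = headerList.indexOf(selected[i]);
            if (position < 0) {
                throw new IllegalArgumentException("Unknown field: " + selected[i]);
            }
            out[position] = input[i];
        }
        return out;
    }

    public static void main(String[] args) {
        String[] row = arrange(
                new String[]{"H1", "H2", "H3"},
                new String[]{"H3", "H1"},
                new String[]{"V_H3", "V_H1"});
        System.out.println(Arrays.toString(row)); // [V_H1, null, V_H3]
    }
}
```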
    • excludeFields

      public FieldSet<String> excludeFields(String... fieldNames)
      Selects fields which will not be read/written, by their names.

      When reading, only the values of the columns that have not been excluded will be parsed, and the content of the excluded columns ignored. The resulting rows will be returned with the non-excluded columns only. If you want to obtain the original row format, with all columns included and nulls in the fields that have been excluded, call CommonParserSettings.setColumnReorderingEnabled(boolean) with false.

      When writing, the sequence of non-excluded fields represents the expected format of the input rows. For example, headers can be "H1,H2,H3", but the input data is coming with values for two columns only, such as "V_H1, V_H3". Excluding field "H2" will allow the writer to write values in the expected locations. Using the given example, the output row will be generated as: "V_H1,null,V_H3"

      Parameters:
      fieldNames - The field names to exclude from the parsing/writing process
      Returns:
      the (modifiable) set of ignored fields
    • selectIndexes

      public FieldSet<Integer> selectIndexes(Integer... fieldIndexes)
      Selects a sequence of fields for reading/writing by their positions.

      When reading, only the values of the selected columns will be parsed, and the content of the other columns ignored. The resulting rows will be returned with the selected columns only, in the order specified. If you want to obtain the original row format, with all columns included and nulls in the fields that have not been selected, call CommonParserSettings.setColumnReorderingEnabled(boolean) with false.

      When writing, the sequence provided represents the expected format of the input rows. For example, headers can be "H1,H2,H3", but the input data is coming with values for two columns and in a different order, such as "V_H3, V_H1". Selecting indexes "2" and "0" will allow the writer to write values in the expected locations. Using the given example, the output row will be generated as: "V_H1,null,V_H3"

      Parameters:
      fieldIndexes - The indexes to read/write
      Returns:
      the (modifiable) set of selected fields
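The two row layouts described for reading (selected columns only, or the original layout with nulls) can be compared in a short sketch. ReadSelectionSketch and its reorder flag are hypothetical; the flag plays the role that column reordering plays in the description above, and the code is an illustration rather than the parser's implementation.

```java
import java.util.Arrays;

class ReadSelectionSketch {
    // reorder == true: only the selected columns, in the order selected.
    // reorder == false: original row layout, nulls where nothing was selected.
    static String[] select(String[] parsedRow, int[] indexes, boolean reorder) {
        String[] out = new String[reorder ? indexes.length : parsedRow.length];
        for (int i = 0; i < indexes.length; i++) {
            int source = indexes[i];
            if (source < parsedRow.length) {
                out[reorder ? i : source] = parsedRow[source];
            }
        }
        return out;
    }

    public static void main(String[] args) {
        String[] row = {"a", "b", "c"};
        int[] selection = {2, 0};
        System.out.println(Arrays.toString(select(row, selection, true)));  // [c, a]
        System.out.println(Arrays.toString(select(row, selection, false))); // [a, null, c]
    }
}
```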
    • excludeIndexes

      public FieldSet<Integer> excludeIndexes(Integer... fieldIndexes)
      Selects columns which will not be read/written, by their positions.

      When reading, only the values of the columns that have not been excluded will be parsed, and the content of the excluded columns ignored. The resulting rows will be returned with the non-excluded columns only. If you want to obtain the original row format, with all columns included and nulls in the fields that have been excluded, call CommonParserSettings.setColumnReorderingEnabled(boolean) with false.

      When writing, the sequence of non-excluded fields represents the expected format of the input rows. For example, headers can be "H1,H2,H3", but the input data is coming with values for two columns only, such as "V_H1, V_H3". Excluding index "1" will allow the writer to write values in the expected locations. Using the given example, the output row will be generated as: "V_H1,null,V_H3"

      Parameters:
      fieldIndexes - indexes of columns to exclude from the parsing/writing process
      Returns:
      the (modifiable) set of ignored fields
    • selectFields

      public FieldSet<Enum> selectFields(Enum... columns)
      Selects a sequence of fields for reading/writing by their names.

      When reading, only the values of the selected columns will be parsed, and the content of the other columns ignored. The resulting rows will be returned with the selected columns only, in the order specified. If you want to obtain the original row format, with all columns included and nulls in the fields that have not been selected, call CommonParserSettings.setColumnReorderingEnabled(boolean) with false.

      When writing, the sequence provided represents the expected format of the input rows. For example, headers can be "H1,H2,H3", but the input data is coming with values for two columns and in a different order, such as "V_H3, V_H1". Selecting fields "H3" and "H1" will allow the writer to write values in the expected locations. Using the given example, the output row will be generated as: "V_H1,null,V_H3"

      Parameters:
      columns - The columns to read/write
      Returns:
      the (modifiable) set of selected fields
    • excludeFields

      public FieldSet<Enum> excludeFields(Enum... columns)
      Selects columns which will not be read/written, by their names.

      When reading, only the values of the columns that have not been excluded will be parsed, and the content of the excluded columns ignored. The resulting rows will be returned with the non-excluded columns only. If you want to obtain the original row format, with all columns included and nulls in the fields that have been excluded, call CommonParserSettings.setColumnReorderingEnabled(boolean) with false.

      When writing, the sequence of non-excluded fields represents the expected format of the input rows. For example, headers can be "H1,H2,H3", but the input data is coming with values for two columns only, such as "V_H1, V_H3". Excluding field "H2" will allow the writer to write values in the expected locations. Using the given example, the output row will be generated as: "V_H1,null,V_H3"

      Parameters:
      columns - The columns to exclude from the parsing/writing process
      Returns:
      the (modifiable) set of ignored fields
    • setFieldSet

      private <T> FieldSet<T> setFieldSet(FieldSet<T> fieldSet, T... values)
      Replaces the current field selection
      Parameters:
      fieldSet - the new set of selected fields
      values - the values to include in the selection
      Returns:
      the set of selected fields given as a parameter.
    • getFieldSet

      FieldSet<?> getFieldSet()
      Returns the set of selected fields, if any
      Returns:
      the set of selected fields. Null if no field was selected/excluded
    • getFieldSelector

      FieldSelector getFieldSelector()
      Returns the FieldSelector object, which handles selected fields.
      Returns:
      the FieldSelector object, which handles selected fields. Null if no field was selected/excluded
    • isAutoConfigurationEnabled

      public final boolean isAutoConfigurationEnabled()
      Indicates whether this settings object can automatically derive configuration options. This is used, for example, to define the headers when the user provides a BeanWriterProcessor where the bean class contains a Headers annotation, or to enable header extraction when the bean class of a BeanProcessor has attributes mapping to header names.

      Defaults to true

      Returns:
      true if the automatic configuration feature is enabled, false otherwise
    • setAutoConfigurationEnabled

      public final void setAutoConfigurationEnabled(boolean autoConfigurationEnabled)
      Defines whether this settings object can automatically derive configuration options. This is used, for example, to define the headers when the user provides a BeanWriterProcessor where the bean class contains a Headers annotation, or to enable header extraction when the bean class of a BeanProcessor has attributes mapping to header names.
      Parameters:
      autoConfigurationEnabled - a flag to turn the automatic configuration feature on/off.
    • getRowProcessorErrorHandler

      @Deprecated public RowProcessorErrorHandler getRowProcessorErrorHandler()
      Deprecated.
      Use the getProcessorErrorHandler() method as it allows format-specific error handlers to be built to work with different implementations of Context. Implementations based on RowProcessorErrorHandler can only be used with parsers that provide a ParsingContext.
      Returns the custom error handler to be used to capture and handle errors that might happen while processing records with a RowProcessor or a RowWriterProcessor (i.e. non-fatal DataProcessingExceptions).

      The parsing/writing process won't stop (unless the error handler rethrows the DataProcessingException or manually stops the process).

      Returns:
      the callback error handler with custom code to manage occurrences of DataProcessingException.
    • setRowProcessorErrorHandler

      @Deprecated public void setRowProcessorErrorHandler(RowProcessorErrorHandler rowProcessorErrorHandler)
      Deprecated.
      Use the setProcessorErrorHandler(ProcessorErrorHandler) method as it allows format-specific error handlers to be built to work with different implementations of Context. Implementations based on RowProcessorErrorHandler can only be used with parsers that provide a ParsingContext.
      Defines a custom error handler to capture and handle errors that might happen while processing records with a RowProcessor or a RowWriterProcessor (i.e. non-fatal DataProcessingExceptions).

      The parsing/writing process won't stop (unless the error handler rethrows the DataProcessingException or manually stops the process).

      Parameters:
      rowProcessorErrorHandler - the callback error handler with custom code to manage occurrences of DataProcessingException.
    • getProcessorErrorHandler

      public <T extends Context> ProcessorErrorHandler<T> getProcessorErrorHandler()
      Returns the custom error handler to be used to capture and handle errors that might happen while processing records with a Processor or a RowWriterProcessor (i.e. non-fatal DataProcessingExceptions).

      The parsing/writing process won't stop (unless the error handler rethrows the DataProcessingException or manually stops the process).

      Type Parameters:
      T - the Context type provided by the parser implementation.
      Returns:
      the callback error handler with custom code to manage occurrences of DataProcessingException.
    • setProcessorErrorHandler

      public void setProcessorErrorHandler(ProcessorErrorHandler<? extends Context> processorErrorHandler)
      Defines a custom error handler to capture and handle errors that might happen while processing records with a Processor or a RowWriterProcessor (i.e. non-fatal DataProcessingExceptions).

      The parsing/writing process won't stop (unless the error handler rethrows the DataProcessingException or manually stops the process).

      Parameters:
      processorErrorHandler - the callback error handler with custom code to manage occurrences of DataProcessingException.
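The "non-fatal error" contract described above (report the failing record to a callback, then carry on unless the callback rethrows) can be sketched with plain Java. Everything in this block is hypothetical: ErrorHandlerSketch, its process method and the BiConsumer callback are stand-ins for illustration, not the ProcessorErrorHandler interface or its real signature.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.BiConsumer;

class ErrorHandlerSketch {
    // Converts each row to an Integer. A failure on one row is passed to the
    // handler and the loop continues, unless the handler itself throws.
    static List<Integer> process(List<String> rows,
                                 BiConsumer<RuntimeException, String> handler) {
        List<Integer> out = new ArrayList<>();
        for (String row : rows) {
            try {
                out.add(Integer.valueOf(row.trim()));
            } catch (NumberFormatException e) {
                handler.accept(e, row); // non-fatal: report and move on
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> rejected = new ArrayList<>();
        List<Integer> values = process(
                Arrays.asList("1", "oops", "3"),
                (error, row) -> rejected.add(row));
        System.out.println(values);   // [1, 3]
        System.out.println(rejected); // [oops]
    }
}
```

A handler that rethrows the exception it receives would end the loop at the first bad row, which mirrors the "unless the error handler rethrows" clause.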
    • isProcessorErrorHandlerDefined

      public boolean isProcessorErrorHandlerDefined()
      Returns a flag indicating whether or not a ProcessorErrorHandler has been defined through the use of method setProcessorErrorHandler(ProcessorErrorHandler)
      Returns:
      true if the parser/writer is configured to use a ProcessorErrorHandler
    • createDefaultFormat

      protected abstract F createDefaultFormat()
      Extending classes must implement this method to return the default format settings for their parser/writer
      Returns:
      Default format configuration for the given parser/writer settings.
    • autoConfigure

      final void autoConfigure()
    • trimValues

      public final void trimValues(boolean trim)
      Configures the parser/writer to trim or keep leading and trailing whitespaces around values. This has the same effect as invoking both setIgnoreLeadingWhitespaces(boolean) and setIgnoreTrailingWhitespaces(boolean) with the same value.
      Parameters:
      trim - a flag indicating whether whitespaces around values parsed/written should be removed.
    • getErrorContentLength

      public int getErrorContentLength()
      Returns the maximum length of the contents being parsed/written that are displayed in the exception message when an error occurs.

      If set to 0, then no exceptions will include the content being manipulated in their attributes, and the "<omitted>" string will appear in error messages as the parsed/written content.

      Defaults to -1 (no limit).
      Returns:
      the maximum length of contents displayed in exception messages in case of errors while parsing/writing.
    • setErrorContentLength

      public void setErrorContentLength(int errorContentLength)
      Configures the parser/writer to limit the length of displayed contents being parsed/written in the exception message when an error occurs.

      If set to 0, then no exceptions will include the content being manipulated in their attributes, and the "<omitted>" string will appear in error messages as the parsed/written content.

      Defaults to -1 (no limit).
      Parameters:
      errorContentLength - maximum length of contents displayed in exception messages in case of errors while parsing/writing.
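The three cases above (0, -1, and a positive limit) can be sketched as a single helper. ErrorContentSketch.restrict is a hypothetical name, and two details are assumptions of this sketch rather than documented behaviour: that a positive limit keeps the trailing characters, and that truncated content is prefixed with "...".

```java
class ErrorContentSketch {
    static String restrict(int errorContentLength, String content) {
        if (content == null) {
            return null;
        }
        if (errorContentLength == 0) {
            return "<omitted>"; // content is never exposed
        }
        if (errorContentLength < 0 || content.length() <= errorContentLength) {
            return content; // -1 means no limit; short content is kept whole
        }
        // Assumption: keep the last errorContentLength characters.
        return "..." + content.substring(content.length() - errorContentLength);
    }

    public static void main(String[] args) {
        System.out.println(restrict(0, "secret"));  // <omitted>
        System.out.println(restrict(-1, "secret")); // secret
        System.out.println(restrict(3, "abcdef"));  // ...def
        System.out.println(restrict(10, "abc"));    // abc
    }
}
```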
    • runAutomaticConfiguration

      void runAutomaticConfiguration()
    • getSkipBitsAsWhitespace

      public final boolean getSkipBitsAsWhitespace()
      Returns a flag indicating whether the parser/writer should skip bit values as whitespace. By default the parser/writer removes control characters and treats as whitespace any character for which character <= ' ' evaluates to true. This includes bit values, i.e. 0 (the \0 character) and 1, which might be produced by database dumps. Disabling this flag will prevent the parser/writer from discarding these characters when getIgnoreLeadingWhitespaces() or getIgnoreTrailingWhitespaces() evaluate to true.

      defaults to true

      Returns:
      a flag indicating whether bit values (0 or 1) should be considered whitespace.
    • setSkipBitsAsWhitespace

      public final void setSkipBitsAsWhitespace(boolean skipBitsAsWhitespace)
      Configures the parser/writer to skip bit values as whitespace. By default the parser/writer removes control characters and treats as whitespace any character for which character <= ' ' evaluates to true. This includes bit values, i.e. 0 (the \0 character) and 1, which might be produced by database dumps. Disabling this flag will prevent the parser/writer from discarding these characters when getIgnoreLeadingWhitespaces() or getIgnoreTrailingWhitespaces() evaluate to true.

      defaults to true

      Parameters:
      skipBitsAsWhitespace - a flag indicating whether bit values (0 or 1) should be considered whitespace.
    • getWhitespaceRangeStart

      protected final int getWhitespaceRangeStart()
      Returns the starting decimal range for characters <= ' ' that should be skipped as whitespace, as determined by getSkipBitsAsWhitespace()
      Returns:
      the starting range after which characters will be considered whitespace
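How a "starting range" interacts with the character <= ' ' rule can be shown with a small trimming sketch. WhitespaceRangeSketch is hypothetical, and the concrete range values (-1 when bit values count as whitespace, 1 when they do not) are an assumption made for this illustration; the text above only states that the range start depends on getSkipBitsAsWhitespace().

```java
class WhitespaceRangeSketch {
    // Assumed values: -1 lets characters 0 and 1 count as whitespace,
    // 1 excludes them from the whitespace range.
    static int whitespaceRangeStart(boolean skipBitsAsWhitespace) {
        return skipBitsAsWhitespace ? -1 : 1;
    }

    // Whitespace is any character above the range start and up to ' '.
    static boolean isWhitespace(char ch, int rangeStart) {
        return ch <= ' ' && ch > rangeStart;
    }

    static String trim(String value, boolean skipBitsAsWhitespace) {
        int rangeStart = whitespaceRangeStart(skipBitsAsWhitespace);
        int start = 0;
        int end = value.length();
        while (start < end && isWhitespace(value.charAt(start), rangeStart)) {
            start++;
        }
        while (end > start && isWhitespace(value.charAt(end - 1), rangeStart)) {
            end--;
        }
        return value.substring(start, end);
    }

    public static void main(String[] args) {
        String raw = " \u0001a ";
        System.out.println(trim(raw, true).length());  // 1 (only "a" is left)
        System.out.println(trim(raw, false).length()); // 2 (the bit value 1 is kept)
    }
}
```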
    • toString

      public final String toString()
      Overrides:
      toString in class Object
    • addConfiguration

      protected void addConfiguration(Map<String,Object> out)
    • clone

      protected CommonSettings clone(boolean clearInputSpecificSettings)
      Clones this configuration object to reuse user-provided settings. Properties that are specific to a given input (such as header names and selection of fields) can be reset to their defaults if the clearInputSpecificSettings flag is set to true
      Parameters:
      clearInputSpecificSettings - flag indicating whether to clear settings that are likely to be associated with a given input.
      Returns:
      a copy of the configurations applied to the current instance.
    • clone

      protected CommonSettings clone()
      Clones this configuration object. Use the alternative clone(boolean) method to reset properties that are specific to a given input, such as header names and selection of fields.
      Overrides:
      clone in class Object
      Returns:
      a copy of all configurations applied to the current instance.
    • clearInputSpecificSettings

      protected void clearInputSpecificSettings()
      Clears settings that are likely to be specific to a given input.