Class AbstractCharInputReader

java.lang.Object
com.univocity.parsers.common.input.AbstractCharInputReader
All Implemented Interfaces:
CharInput, CharInputReader
Direct Known Subclasses:
ConcurrentCharInputReader, DefaultCharInputReader

public abstract class AbstractCharInputReader extends Object implements CharInputReader
The base class for implementing different flavours of CharInputReader.

It provides the essential conversion of sequences of newline characters defined by Format.getLineSeparator() into the normalized newline character provided in Format.getNormalizedNewline().

It also provides a default implementation for most of the methods specified by the CharInputReader interface.

Extending classes must essentially read characters from a given Reader and assign them to the public buffer when requested (in the reloadBuffer() method).

  • Field Details

    • tmp

      private final ExpandingCharAppender tmp
    • lineSeparatorDetected

      private boolean lineSeparatorDetected
    • detectLineSeparator

      private final boolean detectLineSeparator
    • inputAnalysisProcesses

      private List<InputAnalysisProcess> inputAnalysisProcesses
    • lineSeparator1

      private char lineSeparator1
    • lineSeparator2

      private char lineSeparator2
    • normalizedLineSeparator

      private final char normalizedLineSeparator
    • lineCount

      private long lineCount
    • charCount

      private long charCount
    • recordStart

      private int recordStart
    • whitespaceRangeStart

      final int whitespaceRangeStart
    • skipping

      private boolean skipping
    • commentProcessing

      private boolean commentProcessing
    • closeOnStop

      protected final boolean closeOnStop
    • i

      public int i
      Current position in the buffer
    • ch

      private char ch
    • buffer

      public char[] buffer
      The buffer itself
    • length

      public int length
      Number of characters available in the buffer.
    • incrementLineCount

      private boolean incrementLineCount
    • normalizeLineEndings

      private boolean normalizeLineEndings
  • Constructor Details

    • AbstractCharInputReader

      public AbstractCharInputReader(char normalizedLineSeparator, int whitespaceRangeStart, boolean closeOnStop)
      Creates a new instance that attempts to detect the newlines used in the input automatically.
      Parameters:
      normalizedLineSeparator - the normalized newline character (as defined in Format.getNormalizedNewline()) that is used to replace any lineSeparator sequence found in the input.
      whitespaceRangeStart - starting range of characters considered to be whitespace.
      closeOnStop - indicates whether to automatically close the input when CharInputReader.stop() is called
    • AbstractCharInputReader

      public AbstractCharInputReader(char[] lineSeparator, char normalizedLineSeparator, int whitespaceRangeStart, boolean closeOnStop)
      Creates a new instance with the mandatory characters for handling newlines transparently.
      Parameters:
      lineSeparator - the sequence of characters that represent a newline, as defined in Format.getLineSeparator()
      normalizedLineSeparator - the normalized newline character (as defined in Format.getNormalizedNewline()) that is used to replace any lineSeparator sequence found in the input.
      whitespaceRangeStart - starting range of characters considered to be whitespace.
      closeOnStop - indicates whether to automatically close the input when CharInputReader.stop() is called
  • Method Details

    • submitLineSeparatorDetector

      private void submitLineSeparatorDetector()
    • setLineSeparator

      private void setLineSeparator(char[] lineSeparator)
    • setReader

      protected abstract void setReader(Reader reader)
      Passes the Reader provided in the start(Reader) method to the extending class so it can begin loading characters from it.
      Parameters:
      reader - the Reader provided in start(Reader)
    • reloadBuffer

      protected abstract void reloadBuffer()
      Informs the extending class that the buffer has been read entirely and requests another batch of characters. Implementors must assign the new character buffer to the public buffer attribute, and the number of characters available to the public length attribute. To signal that the input has no more characters, length must be set to -1.
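The reloadBuffer() contract described above can be illustrated with a minimal, standalone sketch. It does not extend AbstractCharInputReader; the class name BufferReloadSketch, the 4-character buffer size, and the readAll helper are assumptions made for illustration. Only the buffer and length attributes and the -1 end-of-input convention come from the description above.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;

// Hypothetical stand-in that mirrors the buffer/length contract of reloadBuffer().
class BufferReloadSketch {
    public char[] buffer = new char[4]; // deliberately small to force several reloads
    public int length = 0;              // number of characters available in the buffer
    private final Reader reader;

    BufferReloadSketch(Reader reader) {
        this.reader = reader;
    }

    // Loads the next batch of characters; Reader.read returns -1 at end of input,
    // which is exactly the value the contract expects in 'length'.
    void reloadBuffer() {
        try {
            length = reader.read(buffer, 0, buffer.length);
        } catch (IOException e) {
            throw new IllegalStateException("Error reading from input", e);
        }
    }

    // Drains the whole input by reloading until 'length' becomes -1.
    static String readAll(String input) {
        BufferReloadSketch sketch = new BufferReloadSketch(new StringReader(input));
        StringBuilder out = new StringBuilder();
        sketch.reloadBuffer();
        while (sketch.length != -1) {
            out.append(sketch.buffer, 0, sketch.length);
            sketch.reloadBuffer();
        }
        return out.toString();
    }
}
```

A real subclass would perform the same assignment inside its reloadBuffer() override and obtain the Reader through setReader(Reader).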
    • unwrapInputStream

      protected final void unwrapInputStream(BomInput.BytesProcessedNotification notification)
    • start

      private void start(Reader reader, boolean resetTmp)
    • start

      public final void start(Reader reader)
      Description copied from interface: CharInputReader
      Initializes the CharInputReader implementation with a Reader which provides access to the input.
      Specified by:
      start in interface CharInputReader
      Parameters:
      reader - A Reader that provides access to the input.
    • updateBuffer

      private void updateBuffer()
      Requests the next batch of characters from the implementing class and updates the character count.

      If there are no more characters in the input, the reading will stop by invoking the CharInputReader.stop() method.

    • addInputAnalysisProcess

      public final void addInputAnalysisProcess(InputAnalysisProcess inputAnalysisProcess)
      Submits a custom InputAnalysisProcess to analyze the input buffer and potentially discover configuration options such as column separators in CSV, data formats, etc. The process will be executed only once.
      Parameters:
      inputAnalysisProcess - a custom process to analyze the contents of the input buffer.
    • throwEOFException

      private void throwEOFException()
    • nextChar

      public final char nextChar()
      Description copied from interface: CharInputReader
      Returns the next character in the input provided by the active Reader.

      If the input contains a sequence of newline characters (defined by Format.getLineSeparator()), this method will automatically convert them to the newline character specified in Format.getNormalizedNewline().

      A subsequent call to this method will return the character after the newline sequence.

      Specified by:
      nextChar in interface CharInput
      Specified by:
      nextChar in interface CharInputReader
      Returns:
      the next character in the input. '\0' if there are no more characters in the input or if the CharInputReader was stopped.
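The newline conversion performed by nextChar() can be approximated with the standalone sketch below. It is a simplification, not the library's code: the class name NewlineNormalizationSketch and the normalize method are hypothetical, it works on a whole String instead of a reloadable buffer, and it assumes a line separator of one or two characters (matching the lineSeparator1 and lineSeparator2 fields listed above).

```java
// Hypothetical helper: replaces each line separator sequence with one normalized character.
class NewlineNormalizationSketch {

    static String normalize(String input, char[] lineSeparator, char normalized) {
        char sep1 = lineSeparator[0];
        // '\0' marks "no second character" in this sketch
        char sep2 = lineSeparator.length > 1 ? lineSeparator[1] : '\0';

        StringBuilder out = new StringBuilder(input.length());
        for (int i = 0; i < input.length(); i++) {
            char ch = input.charAt(i);
            if (ch == sep1) {
                if (sep2 == '\0') {
                    // single-character separator: replace it directly
                    out.append(normalized);
                    continue;
                }
                if (i + 1 < input.length() && input.charAt(i + 1) == sep2) {
                    // two-character separator: emit one normalized newline, consume both
                    out.append(normalized);
                    i++;
                    continue;
                }
            }
            // anything else (including a lone first separator character) is kept as-is here
            out.append(ch);
        }
        return out.toString();
    }
}
```

With a separator of "\r\n" and a normalized newline of '\n', the input "a\r\nb\r\n" becomes "a\nb\n", so the character following each converted sequence is the first character of the next line.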
    • getChar

      public final char getChar()
      Description copied from interface: CharInputReader
      Returns the last character returned by the CharInputReader.nextChar() method.
      Specified by:
      getChar in interface CharInput
      Specified by:
      getChar in interface CharInputReader
      Returns:
      the last character returned by the CharInputReader.nextChar() method. '\0' if there are no more characters in the input or if the CharInputReader was stopped.
    • lineCount

      public final long lineCount()
      Description copied from interface: CharInputReader
      Returns the number of newlines read so far.
      Specified by:
      lineCount in interface CharInputReader
      Returns:
      the number of newlines read so far.
    • skipLines

      public final void skipLines(long lines)
      Description copied from interface: CharInputReader
      Skips characters in the input until the given number of lines is discarded.
      Specified by:
      skipLines in interface CharInputReader
      Parameters:
      lines - the number of lines to skip from the current location in the input
    • readComment

      public String readComment()
      Description copied from interface: CharInputReader
      Collects the comment line found on the input.
      Specified by:
      readComment in interface CharInputReader
      Returns:
      the text found in the comment from the current position.
    • charCount

      public final long charCount()
      Description copied from interface: CharInputReader
      Returns the number of characters returned by CharInputReader.nextChar() at any given time.
      Specified by:
      charCount in interface CharInputReader
      Returns:
      the number of characters returned by CharInputReader.nextChar()
    • enableNormalizeLineEndings

      public final void enableNormalizeLineEndings(boolean normalizeLineEndings)
      Description copied from interface: CharInputReader
      Indicates to the input reader that the parser is running in "escape" mode and new lines should be returned as-is to prevent modifying the content of the parsed value.
      Specified by:
      enableNormalizeLineEndings in interface CharInputReader
      Parameters:
      normalizeLineEndings - flag indicating that the parser is escaping values and line separators are to be returned as-is.
    • getLineSeparator

      public char[] getLineSeparator()
      Description copied from interface: CharInputReader
      Returns the line separator used by this character input reader. This could be the line separator defined in the Format.getLineSeparator() configuration, or the line separator sequence identified automatically when CommonParserSettings.isLineSeparatorDetectionEnabled() evaluates to true.
      Specified by:
      getLineSeparator in interface CharInputReader
      Returns:
      the line separator in use.
    • skipWhitespace

      public final char skipWhitespace(char ch, char stopChar1, char stopChar2)
      Description copied from interface: CharInputReader
      Skips characters from the current input position, until a non-whitespace character, or a stop character is found
      Specified by:
      skipWhitespace in interface CharInputReader
      Parameters:
      ch - the current character of the input
      stopChar1 - the first stop character (which can be a whitespace)
      stopChar2 - the second stop character (which can be a whitespace)
      Returns:
      the first non-whitespace character (or delimiter) found in the input.
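The interaction between stop characters and whitespaceRangeStart can be sketched as follows. This is an assumption-laden illustration, not the library's implementation: it assumes that a character counts as whitespace when it is less than or equal to ' ' and greater than whitespaceRangeStart, it returns an index rather than a character, and the class and method names are hypothetical.

```java
// Hypothetical sketch of skipping whitespace while honouring stop characters.
class SkipWhitespaceSketch {

    // Returns the index of the first character that is not skippable whitespace.
    static int skipWhitespace(char[] buffer, int i, char stop1, char stop2, int whitespaceRangeStart) {
        while (i < buffer.length) {
            char ch = buffer[i];
            // Assumed definition of whitespace: (whitespaceRangeStart, ' ']
            boolean whitespace = ch <= ' ' && ch > whitespaceRangeStart;
            // A stop character ends the skip even when it is itself a whitespace
            if (!whitespace || ch == stop1 || ch == stop2) {
                break;
            }
            i++;
        }
        return i;
    }
}
```

Under these assumptions, a tab used as a column delimiter is never skipped when it is passed as a stop character, and raising whitespaceRangeStart excludes low control characters from being treated as whitespace.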
    • currentParsedContentLength

      public final int currentParsedContentLength()
      Description copied from interface: CharInputReader
      Returns the length of the character sequence parsed to produce the current record.
      Specified by:
      currentParsedContentLength in interface CharInputReader
      Returns:
      the length of the text content parsed for the current input record
    • currentParsedContent

      public final String currentParsedContent()
      Description copied from interface: CharInputReader
      Returns a String with the input character sequence parsed to produce the current record.
      Specified by:
      currentParsedContent in interface CharInputReader
      Returns:
      the text content parsed for the current input record.
    • lastIndexOf

      public final int lastIndexOf(char ch)
      Description copied from interface: CharInputReader
      Returns the last index of a given character in the current parsed content
      Specified by:
      lastIndexOf in interface CharInputReader
      Parameters:
      ch - the character to look for
      Returns:
      the last position of the given character in the current parsed content, or -1 if not found.
    • markRecordStart

      public final void markRecordStart()
      Description copied from interface: CharInputReader
      Marks the start of a new record in the input, used internally to calculate the result of CharInputReader.currentParsedContent()
      Specified by:
      markRecordStart in interface CharInputReader
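The relationship between markRecordStart(), currentParsedContent() and currentParsedContentLength() can be pictured with the simplified sketch below. It assumes the whole input fits in a single buffer that is never reloaded, which the real class cannot assume; the class name RecordContentSketch is hypothetical.

```java
// Hypothetical single-buffer model of record tracking.
class RecordContentSketch {
    private final char[] buffer;
    private int i;           // position of the next character to return
    private int recordStart; // position where the current record began

    RecordContentSketch(String input) {
        this.buffer = input.toCharArray();
    }

    char nextChar() {
        return buffer[i++];
    }

    // Remembers the current position as the beginning of a new record.
    void markRecordStart() {
        recordStart = i;
    }

    // Everything consumed since the last call to markRecordStart().
    String currentParsedContent() {
        return new String(buffer, recordStart, i - recordStart);
    }

    int currentParsedContentLength() {
        return i - recordStart;
    }
}
```

For the input "a,b\nc,d", consuming the first four characters, calling markRecordStart(), and then consuming three more yields "c,d" as the current parsed content.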
    • skipString

      public final boolean skipString(char ch, char stop)
      Description copied from interface: CharInputReader
      Attempts to skip a String from the current position until a stop character is found on the input, or a line ending is reached. If the String can be skipped, the current position of the parser will be updated to the last consumed character. If the internal buffer needs to be reloaded, this method will return false and the current position of the buffer will remain unchanged.
      Specified by:
      skipString in interface CharInputReader
      Parameters:
      ch - the current character to be considered. If equal to the stop character, false will be returned
      stop - the stop character that identifies the end of the content to be collected
      Returns:
      true if an entire String value was found on the input and skipped, or false if the buffer needs to be reloaded.
    • getString

      public final String getString(char ch, char stop, boolean trim, String nullValue, int maxLength)
      Description copied from interface: CharInputReader
      Attempts to collect a String from the current position until a stop character is found on the input, or a line ending is reached. If the String can be obtained, the current position of the parser will be updated to the last consumed character. If the internal buffer needs to be reloaded, this method will return null and the current position of the buffer will remain unchanged.
      Specified by:
      getString in interface CharInputReader
      Parameters:
      ch - the current character to be considered. If equal to the stop character, the nullValue will be returned
      stop - the stop character that identifies the end of the content to be collected
      trim - flag indicating whether or not trailing whitespaces should be discarded
      nullValue - value to return when the length of the content to be returned is 0.
      maxLength - the maximum length of the String to be returned. If the length exceeds this limit, null will be returned
      Returns:
      the String found on the input, or null if the buffer needs to be reloaded or the maximum length has been exceeded.
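The "return null and leave the position unchanged" behaviour described for getString can be illustrated with a reduced sketch. It is not the library's method: the signature is shortened (no ch or trim parameters, and the newline is passed explicitly), the position is left at the stop character rather than following the library's exact convention, and the class name GetStringSketch is hypothetical.

```java
// Hypothetical sketch of collecting a value directly from the buffer.
class GetStringSketch {
    final char[] buffer;
    final int length;
    int i; // current position in the buffer

    GetStringSketch(String content) {
        this.buffer = content.toCharArray();
        this.length = buffer.length;
    }

    String getString(char stop, char newline, String nullValue, int maxLength) {
        int pos = i;
        while (pos < length && buffer[pos] != stop && buffer[pos] != newline) {
            pos++;
        }
        if (pos == length) {
            // End of buffer reached before a stop character or newline:
            // the value may continue in the next batch, so give up without moving 'i'.
            return null;
        }
        int len = pos - i;
        if (len > maxLength) {
            return null; // value too long; position also left unchanged
        }
        String out = len == 0 ? nullValue : new String(buffer, i, len);
        i = pos; // in this sketch the position is left at the stop character
        return out;
    }
}
```

A caller that receives null would typically fall back to reading character by character with nextChar(), which handles buffer reloads.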
    • getQuotedString

      public final String getQuotedString(char quote, char escape, char escapeEscape, int maxLength, char stop1, char stop2, boolean keepQuotes, boolean keepEscape, boolean trimLeading, boolean trimTrailing)
      Description copied from interface: CharInputReader
      Attempts to collect a quoted String from the current position until a closing quote or stop character is found on the input, or a line ending is reached. If the String can be obtained, the current position of the parser will be updated to the last consumed character. If the internal buffer needs to be reloaded, this method will return null and the current position of the buffer will remain unchanged.
      Specified by:
      getQuotedString in interface CharInputReader
      Parameters:
      quote - the quote character
      escape - the quote escape character
      escapeEscape - the escape of the quote escape character
      maxLength - the maximum length of the String to be returned. If the length exceeds this limit, null will be returned
      stop1 - the first stop character that identifies the end of the content to be collected
      stop2 - the second stop character that identifies the end of the content to be collected
      keepQuotes - flag to indicate the quotes that wrap the resulting String should be kept.
      keepEscape - flag to indicate that escape sequences should be kept
      trimLeading - flag to indicate leading whitespaces should be trimmed
      trimTrailing - flag to indicate that trailing whitespaces should be trimmed
      Returns:
      the String found on the input, or null if the buffer needs to be reloaded or the maximum length has been exceeded.
    • skipQuotedString

      public final boolean skipQuotedString(char quote, char escape, char stop1, char stop2)
      Description copied from interface: CharInputReader
      Attempts to skip a quoted String from the current position until a stop character is found on the input, or a line ending is reached. If the String can be skipped, the current position of the parser will be updated to the last consumed character. If the internal buffer needs to be reloaded, this method will return false and the current position of the buffer will remain unchanged.
      Specified by:
      skipQuotedString in interface CharInputReader
      Parameters:
      quote - the quote character
      escape - the quote escape character
      stop1 - the first stop character that identifies the end of the content to be collected
      stop2 - the second stop character that identifies the end of the content to be collected
      Returns:
      true if an entire String value was found on the input and skipped, or false if the buffer needs to be reloaded.