Class AbstractCharInputReader
- All Implemented Interfaces:
CharInput, CharInputReader
- Direct Known Subclasses:
ConcurrentCharInputReader, DefaultCharInputReader
The base class for CharInputReader implementations.
It provides the essential conversion of sequences of newline characters defined by Format.getLineSeparator() into the normalized newline character provided in Format.getNormalizedNewline().
It also provides a default implementation for most of the methods specified by the CharInputReader interface.
Extending classes must essentially read characters from a given Reader and assign them to the public buffer when requested (in the reloadBuffer() method).
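Field Summary
Modifier and Type | Field | Description
char[] | buffer | The buffer itself.
private char | ch
private long | charCount
protected final boolean | closeOnStop
private boolean | commentProcessing
private final boolean | detectLineSeparator
int | i | Current position in the buffer.
private boolean | incrementLineCount
private List<InputAnalysisProcess> | inputAnalysisProcesses
int | length | Number of characters available in the buffer.
private long | lineCount
private char | lineSeparator1
private char | lineSeparator2
private boolean | lineSeparatorDetected
private final char | normalizedLineSeparator
private boolean | normalizeLineEndings
private int | recordStart
private boolean | skipping
private final ExpandingCharAppender | tmp
(package private) final int | whitespaceRangeStart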
Constructor Summary
Constructor | Description
AbstractCharInputReader(char[] lineSeparator, char normalizedLineSeparator, int whitespaceRangeStart, boolean closeOnStop) | Creates a new instance with the mandatory characters for handling newlines transparently.
AbstractCharInputReader(char normalizedLineSeparator, int whitespaceRangeStart, boolean closeOnStop) | Creates a new instance that attempts to detect the newlines used in the input automatically.
Method Summary
Modifier and Type | Method | Description
final void | addInputAnalysisProcess(InputAnalysisProcess inputAnalysisProcess) | Submits a custom InputAnalysisProcess to analyze the input buffer and potentially discover configuration options such as column separators in CSV, data formats, etc.
final long | charCount() | Returns the number of characters returned by CharInputReader.nextChar() at any given time.
final String | currentParsedContent() | Returns a String with the input character sequence parsed to produce the current record.
final int | currentParsedContentLength() | Returns the length of the character sequence parsed to produce the current record.
final void | enableNormalizeLineEndings(boolean normalizeLineEndings) | Indicates to the input reader that the parser is running in "escape" mode and new lines should be returned as-is to prevent modifying the content of the parsed value.
final char | getChar() | Returns the last character returned by the CharInputReader.nextChar() method.
char[] | getLineSeparator() | Returns the line separator used by this character input reader.
final String | getQuotedString(char quote, char escape, char escapeEscape, int maxLength, char stop1, char stop2, boolean keepQuotes, boolean keepEscape, boolean trimLeading, boolean trimTrailing) | Attempts to collect a quoted String from the current position until a closing quote or stop character is found on the input, or a line ending is reached.
final String | getString(char ch, char stop, boolean trim, String nullValue, int maxLength) | Attempts to collect a String from the current position until a stop character is found on the input, or a line ending is reached.
final int | lastIndexOf(char ch) | Returns the last index of a given character in the current parsed content.
final long | lineCount() | Returns the number of newlines read so far.
final void | markRecordStart() | Marks the start of a new record in the input, used internally to calculate the result of CharInputReader.currentParsedContent().
final char | nextChar() | Returns the next character in the input provided by the active Reader.
String | readComment() | Collects the comment line found on the input.
protected abstract void | reloadBuffer() | Informs the extending class that the buffer has been read entirely and requests another batch of characters.
private void | setLineSeparator(char[] lineSeparator)
protected abstract void | setReader(Reader reader) | Passes the Reader provided in the start(Reader) method to the extending class so it can begin loading characters from it.
final void | skipLines(long lines) | Skips characters in the input until the given number of lines is discarded.
final boolean | skipQuotedString(char quote, char escape, char stop1, char stop2) | Attempts to skip a quoted String from the current position until a stop character is found on the input, or a line ending is reached.
final boolean | skipString(char ch, char stop) | Attempts to skip a String from the current position until a stop character is found on the input, or a line ending is reached.
final char | skipWhitespace(char ch, char stopChar1, char stopChar2) | Skips characters from the current input position until a non-whitespace character or a stop character is found.
final void | start(Reader reader) | Initializes the CharInputReader implementation with a Reader which provides access to the input.
private void | start
private void | submitLineSeparatorDetector()
private void | throwEOFException()
protected final void | unwrapInputStream(BomInput.BytesProcessedNotification notification)
private void | updateBuffer() | Requests the next batch of characters from the implementing class and updates the character count.
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
Methods inherited from interface com.univocity.parsers.common.input.CharInputReader
stop
Field Details
private final ExpandingCharAppender tmp
private boolean lineSeparatorDetected
private final boolean detectLineSeparator
private List<InputAnalysisProcess> inputAnalysisProcesses
private char lineSeparator1
private char lineSeparator2
private final char normalizedLineSeparator
private long lineCount
private long charCount
private int recordStart
final int whitespaceRangeStart
private boolean skipping
private boolean commentProcessing
protected final boolean closeOnStop
public int i
Current position in the buffer.
private char ch
public char[] buffer
The buffer itself.
public int length
Number of characters available in the buffer.
private boolean incrementLineCount
private boolean normalizeLineEndings
Constructor Details
AbstractCharInputReader
public AbstractCharInputReader(char normalizedLineSeparator, int whitespaceRangeStart, boolean closeOnStop)
Creates a new instance that attempts to detect the newlines used in the input automatically.
- Parameters:
normalizedLineSeparator - the normalized newline character (as defined in Format.getNormalizedNewline()) that is used to replace any lineSeparator sequence found in the input.
whitespaceRangeStart - starting range of characters considered to be whitespace.
closeOnStop - indicates whether to automatically close the input when CharInputReader.stop() is called.
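How this constructor detects the newline sequence is not specified on this page. As a rough illustration only, the hypothetical LineSeparatorDetector below takes the first separator it sees in a sample of the input and recognizes "\r\n", "\n" and "\r". The class name, the detect method and the "first separator wins" rule are assumptions, not the library's algorithm.

```java
/**
 * Hypothetical line separator detector (not the library's implementation):
 * returns the first separator found in a sample of the input ("\r\n", "\n"
 * or "\r"), or null if the sample contains none.
 */
final class LineSeparatorDetector {
    private LineSeparatorDetector() {
    }

    static char[] detect(CharSequence sample) {
        for (int i = 0; i < sample.length(); i++) {
            char ch = sample.charAt(i);
            if (ch == '\r') {
                // "\r\n" when followed by '\n', otherwise a lone "\r".
                boolean crlf = i + 1 < sample.length() && sample.charAt(i + 1) == '\n';
                return crlf ? new char[] {'\r', '\n'} : new char[] {'\r'};
            }
            if (ch == '\n') {
                return new char[] {'\n'};
            }
        }
        return null; // no separator in the sample
    }

    public static void main(String[] args) {
        char[] separator = detect("id,name\r\n1,foo\r\n");
        System.out.println(separator.length); // prints 2 ("\r\n")
    }
}
```

A real detector also has to cope with a "\r" that lands at the very end of a buffer, where the following "\n" may only arrive after the next reload; this sketch ignores buffer boundaries entirely.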
AbstractCharInputReader
public AbstractCharInputReader(char[] lineSeparator, char normalizedLineSeparator, int whitespaceRangeStart, boolean closeOnStop)
Creates a new instance with the mandatory characters for handling newlines transparently.
- Parameters:
lineSeparator - the sequence of characters that represent a newline, as defined in Format.getLineSeparator()
normalizedLineSeparator - the normalized newline character (as defined in Format.getNormalizedNewline()) that is used to replace any lineSeparator sequence found in the input.
whitespaceRangeStart - starting range of characters considered to be whitespace.
closeOnStop - indicates whether to automatically close the input when CharInputReader.stop() is called.
Method Details
submitLineSeparatorDetector
private void submitLineSeparatorDetector()
setLineSeparator
private void setLineSeparator(char[] lineSeparator)
setReader
protected abstract void setReader(Reader reader)
Passes the Reader provided in the start(Reader) method to the extending class so it can begin loading characters from it.
- Parameters:
reader - the Reader provided in start(Reader)
reloadBuffer
protected abstract void reloadBuffer()
Informs the extending class that the buffer has been read entirely and requests another batch of characters. Implementors must assign the new character buffer to the public buffer attribute, and the number of characters available to the public length attribute. To signal that the input has no more characters, length must be set to -1.
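The reloadBuffer() contract can be sketched with a pared-down, hypothetical base class. MiniCharInput and ReaderBackedInput are invented names; only the public buffer, length and i fields and the rule that length = -1 means "no more input" come from this page. Newline normalization, counters and stop() handling are deliberately left out.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;

/** Hypothetical, stripped-down stand-in for the buffer handling of the base class. */
abstract class MiniCharInput {
    public char[] buffer; // characters loaded by reloadBuffer()
    public int length;    // valid characters in buffer, or -1 once the input is exhausted
    public int i;         // index of the next character to return

    /** Implementors assign buffer and length; length = -1 signals end of input. */
    protected abstract void reloadBuffer();

    /** Returns the next character, or '\0' when there is no more input. */
    public final char nextChar() {
        if (length == -1) {
            return '\0';
        }
        if (i >= length) {
            reloadBuffer();
            i = 0;
            if (length == -1) {
                return '\0';
            }
        }
        return buffer[i++];
    }

    public static void main(String[] args) {
        MiniCharInput input = new ReaderBackedInput(new StringReader("hello"), 2);
        StringBuilder out = new StringBuilder();
        for (char c = input.nextChar(); c != '\0'; c = input.nextChar()) {
            out.append(c);
        }
        System.out.println(out); // prints hello
    }
}

/** Hypothetical subclass that loads fixed-size batches from a Reader. */
final class ReaderBackedInput extends MiniCharInput {
    private final Reader reader;

    ReaderBackedInput(Reader reader, int bufferSize) {
        if (bufferSize <= 0) {
            throw new IllegalArgumentException("bufferSize must be positive");
        }
        this.reader = reader;
        this.buffer = new char[bufferSize];
    }

    @Override
    protected void reloadBuffer() {
        try {
            // Reader.read returns -1 at end of stream, matching the length contract.
            length = reader.read(buffer, 0, buffer.length);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Because Reader.read(char[], int, int) already returns -1 at end of stream, the subclass can assign its result straight to length without any translation.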
unwrapInputStream
protected final void unwrapInputStream(BomInput.BytesProcessedNotification notification)
start
start
public final void start(Reader reader)
Description copied from interface: CharInputReader
Initializes the CharInputReader implementation with a Reader which provides access to the input.
- Specified by: start in interface CharInputReader
- Parameters:
reader - a Reader that provides access to the input.
updateBuffer
private void updateBuffer()
Requests the next batch of characters from the implementing class and updates the character count. If there are no more characters in the input, reading stops by invoking the CharInputReader.stop() method.
addInputAnalysisProcess
public final void addInputAnalysisProcess(InputAnalysisProcess inputAnalysisProcess)
Submits a custom InputAnalysisProcess to analyze the input buffer and potentially discover configuration options such as column separators in CSV, data formats, etc. The process will be executed only once.
- Parameters:
inputAnalysisProcess - a custom process to analyze the contents of the input buffer.
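One way to picture "executed only once" is a list of pending steps that runs against a loaded buffer and is then cleared. The AnalysisStep interface below is a hypothetical stand-in for InputAnalysisProcess (its method name and parameters are assumptions, since the interface's members are not shown on this page), and DelimiterGuesser is an invented example of a process that guesses a CSV column separator.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical runner: each submitted step sees one buffer and is then discarded. */
final class AnalysisRunner {
    private final List<AnalysisStep> pending = new ArrayList<>();

    void add(AnalysisStep step) {
        pending.add(step);
    }

    /** Runs all pending steps against this buffer and discards them. */
    void runOnce(char[] buffer, int length) {
        for (AnalysisStep step : pending) {
            step.execute(buffer, length);
        }
        pending.clear(); // every step executes at most once
    }

    public static void main(String[] args) {
        char[] firstBuffer = "a;b;c\n1;2;3\n".toCharArray();
        DelimiterGuesser guesser = new DelimiterGuesser();
        AnalysisRunner runner = new AnalysisRunner();
        runner.add(guesser);
        runner.runOnce(firstBuffer, firstBuffer.length);
        System.out.println(guesser.guess); // prints ;
    }
}

/** Hypothetical analysis hook, loosely shaped like an input analysis process. */
interface AnalysisStep {
    void execute(char[] characters, int length);
}

/** Example step: guesses ',' or ';' as the column separator by counting occurrences. */
final class DelimiterGuesser implements AnalysisStep {
    char guess = '\0';

    @Override
    public void execute(char[] characters, int length) {
        int commas = 0;
        int semicolons = 0;
        for (int i = 0; i < length; i++) {
            if (characters[i] == ',') {
                commas++;
            } else if (characters[i] == ';') {
                semicolons++;
            }
        }
        guess = semicolons > commas ? ';' : ',';
    }
}
```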
throwEOFException
private void throwEOFException()
nextChar
public final char nextChar()
Description copied from interface: CharInputReader
Returns the next character in the input provided by the active Reader.
If the input contains a sequence of newline characters (defined by Format.getLineSeparator()), this method automatically converts it to the newline character specified in Format.getNormalizedNewline(). A subsequent call to this method returns the character after the newline sequence.
- Specified by: nextChar in interface CharInput
- Specified by: nextChar in interface CharInputReader
- Returns: the next character in the input, or '\0' if there are no more characters in the input or if the CharInputReader was stopped.
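The per-call behavior described for nextChar(), together with getChar(), lineCount() and charCount(), can be sketched over a plain String. NormalizingCharStream is a hypothetical class, not the library's code: it assumes a one- or two-character line separator, counts each normalized newline as a single returned character, returns a lone first separator character unchanged, and returns '\0' once the input is exhausted. It does not deal with buffer reloads.

```java
/**
 * Hypothetical char stream that replaces a one- or two-character line
 * separator with a normalized newline and tracks line and character counts.
 */
final class NormalizingCharStream {
    private final String input;
    private final char sep1;
    private final char sep2;              // only used when twoCharSeparator is true
    private final boolean twoCharSeparator;
    private final char normalizedNewline;
    private int pos;                      // index of the next unread input character
    private char current = '\0';          // last character returned by nextChar()
    private long lineCount;
    private long charCount;

    NormalizingCharStream(String input, char[] lineSeparator, char normalizedNewline) {
        if (lineSeparator.length < 1 || lineSeparator.length > 2) {
            throw new IllegalArgumentException("line separator must have 1 or 2 characters");
        }
        this.input = input;
        this.sep1 = lineSeparator[0];
        this.twoCharSeparator = lineSeparator.length == 2;
        this.sep2 = twoCharSeparator ? lineSeparator[1] : '\0';
        this.normalizedNewline = normalizedNewline;
    }

    char nextChar() {
        if (pos >= input.length()) {
            current = '\0'; // no more input
            return current;
        }
        char ch = input.charAt(pos++);
        if (ch == sep1) {
            if (!twoCharSeparator) {
                ch = normalizedNewline;
                lineCount++;
            } else if (pos < input.length() && input.charAt(pos) == sep2) {
                pos++; // swallow the second separator character
                ch = normalizedNewline;
                lineCount++;
            }
            // Otherwise a lone first separator character is returned as-is.
        }
        charCount++;
        current = ch;
        return ch;
    }

    char getChar() {
        return current;
    }

    long lineCount() {
        return lineCount;
    }

    long charCount() {
        return charCount;
    }

    public static void main(String[] args) {
        NormalizingCharStream in =
                new NormalizingCharStream("a,b\r\nc,d", new char[] {'\r', '\n'}, '\n');
        while (in.nextChar() != '\0') {
            // consume the whole input
        }
        System.out.println(in.lineCount() + " " + in.charCount()); // prints 1 7
    }
}
```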
getChar
public final char getChar()
Description copied from interface: CharInputReader
Returns the last character returned by the CharInputReader.nextChar() method.
- Specified by: getChar in interface CharInput
- Specified by: getChar in interface CharInputReader
- Returns: the last character returned by the CharInputReader.nextChar() method, or '\0' if there are no more characters in the input or if the CharInputReader was stopped.
lineCount
public final long lineCount()
Description copied from interface: CharInputReader
Returns the number of newlines read so far.
- Specified by: lineCount in interface CharInputReader
- Returns: the number of newlines read so far.
skipLines
public final void skipLines(long lines)
Description copied from interface: CharInputReader
Skips characters in the input until the given number of lines is discarded.
- Specified by: skipLines in interface CharInputReader
- Parameters:
lines - the number of lines to skip from the current location in the input
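A simplified picture of skipLines(): scan forward, counting normalized newline characters, and stop right after the requested number. LineSkipper and its return convention (the index just past the last skipped newline, or -1 if the input ends first) are hypothetical; this page does not say how the real method reports running out of input.

```java
/** Hypothetical helper illustrating line skipping over already-normalized input. */
final class LineSkipper {
    private LineSkipper() {
    }

    /**
     * Returns the index just past the given number of newline characters,
     * starting at 'from', or -1 if the input ends before that many are seen.
     */
    static int skipLines(CharSequence input, int from, long lines, char newline) {
        if (lines <= 0) {
            return from;
        }
        long remaining = lines;
        for (int i = from; i < input.length(); i++) {
            if (input.charAt(i) == newline && --remaining == 0) {
                return i + 1;
            }
        }
        return -1; // ran out of input before discarding all lines
    }

    public static void main(String[] args) {
        String text = "# header 1\n# header 2\nid,name\n1,foo\n";
        int pos = skipLines(text, 0, 2, '\n');
        System.out.println(text.substring(pos, text.indexOf('\n', pos))); // prints id,name
    }
}
```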
readComment
Description copied from interface: CharInputReader
Collects the comment line found on the input.
- Specified by: readComment in interface CharInputReader
- Returns: the text found in the comment from the current position.
charCount
public final long charCount()
Description copied from interface: CharInputReader
Returns the number of characters returned by CharInputReader.nextChar() at any given time.
- Specified by: charCount in interface CharInputReader
- Returns: the number of characters returned by CharInputReader.nextChar()
enableNormalizeLineEndings
public final void enableNormalizeLineEndings(boolean normalizeLineEndings)
Description copied from interface: CharInputReader
Indicates to the input reader that the parser is running in "escape" mode and new lines should be returned as-is to prevent modifying the content of the parsed value.
- Specified by: enableNormalizeLineEndings in interface CharInputReader
- Parameters:
normalizeLineEndings - flag indicating that the parser is escaping values and line separators are to be returned as-is.
getLineSeparator
public char[] getLineSeparator()
Description copied from interface: CharInputReader
Returns the line separator used by this character input reader. This could be the line separator defined in the Format.getLineSeparator() configuration, or the line separator sequence identified automatically when CommonParserSettings.isLineSeparatorDetectionEnabled() evaluates to true.
- Specified by: getLineSeparator in interface CharInputReader
- Returns: the line separator in use.
skipWhitespace
public final char skipWhitespace(char ch, char stopChar1, char stopChar2)
Description copied from interface: CharInputReader
Skips characters from the current input position until a non-whitespace character or a stop character is found.
- Specified by: skipWhitespace in interface CharInputReader
- Parameters:
ch - the current character of the input
stopChar1 - the first stop character (which can be a whitespace)
stopChar2 - the second stop character (which can be a whitespace)
- Returns: the first non-whitespace character (or delimiter) found in the input.
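The loop behind skipWhitespace() can be sketched as follows. WhitespaceSkipper is hypothetical, and so is its definition of whitespace: it assumes a character counts as whitespace when its code is above whitespaceRangeStart and at or below ' ' (U+0020), which is one plausible reading of the constructor parameter rather than a documented rule. As described above, a stop character ends the scan even when it is itself whitespace.

```java
/** Hypothetical cursor over in-memory input, used to illustrate skipWhitespace(). */
final class WhitespaceSkipper {
    private final CharSequence input;
    private final int whitespaceRangeStart;
    private int pos; // index of the next unread character

    WhitespaceSkipper(CharSequence input, int whitespaceRangeStart) {
        this.input = input;
        this.whitespaceRangeStart = whitespaceRangeStart;
    }

    char nextChar() {
        return pos < input.length() ? input.charAt(pos++) : '\0';
    }

    // Assumption: whitespace means whitespaceRangeStart < ch <= ' '.
    private boolean isWhitespace(char ch) {
        return ch > whitespaceRangeStart && ch <= ' ';
    }

    /** Returns the first non-whitespace or stop character, or '\0' if the input runs out. */
    char skipWhitespace(char ch, char stopChar1, char stopChar2) {
        while (ch != stopChar1 && ch != stopChar2 && isWhitespace(ch)) {
            if (pos >= input.length()) {
                return '\0';
            }
            ch = input.charAt(pos++);
        }
        return ch;
    }

    public static void main(String[] args) {
        WhitespaceSkipper in = new WhitespaceSkipper("  \t value", 0);
        System.out.println(in.skipWhitespace(in.nextChar(), ',', '\n')); // prints v
    }
}
```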
currentParsedContentLength
public final int currentParsedContentLength()
Description copied from interface: CharInputReader
Returns the length of the character sequence parsed to produce the current record.
- Specified by: currentParsedContentLength in interface CharInputReader
- Returns: the length of the text content parsed for the current input record
currentParsedContent
public final String currentParsedContent()
Description copied from interface: CharInputReader
Returns a String with the input character sequence parsed to produce the current record.
- Specified by: currentParsedContent in interface CharInputReader
- Returns: the text content parsed for the current input record.
lastIndexOf
public final int lastIndexOf(char ch)
Description copied from interface: CharInputReader
Returns the last index of a given character in the current parsed content.
- Specified by: lastIndexOf in interface CharInputReader
- Parameters:
ch - the character to look for
- Returns: the last position of the given character in the current parsed content, or -1 if not found.
markRecordStart
public final void markRecordStart()
Description copied from interface: CharInputReader
Marks the start of a new record in the input, used internally to calculate the result of CharInputReader.currentParsedContent().
- Specified by: markRecordStart in interface CharInputReader
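The bookkeeping behind markRecordStart(), currentParsedContent(), currentParsedContentLength() and lastIndexOf() can be sketched with a single in-memory buffer. RecordTracker is a hypothetical class; it assumes the current record never spans a buffer reload and that lastIndexOf() returns a position relative to the start of the record.

```java
/** Hypothetical single-buffer sketch of record-start tracking. */
final class RecordTracker {
    private final char[] buffer;
    private int i;           // index of the next character to read
    private int recordStart; // buffer index where the current record began

    RecordTracker(String text) {
        this.buffer = text.toCharArray();
    }

    char nextChar() {
        return i < buffer.length ? buffer[i++] : '\0';
    }

    void markRecordStart() {
        recordStart = i;
    }

    int currentParsedContentLength() {
        return i - recordStart;
    }

    String currentParsedContent() {
        return new String(buffer, recordStart, i - recordStart);
    }

    /** Position relative to the start of the current record, or -1 if absent. */
    int lastIndexOf(char ch) {
        for (int p = i - 1; p >= recordStart; p--) {
            if (buffer[p] == ch) {
                return p - recordStart;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        RecordTracker in = new RecordTracker("a,b\nc,d\n");
        in.markRecordStart();
        while (in.nextChar() != '\n') {
            // read the first record, including its newline
        }
        in.markRecordStart();
        in.nextChar();
        in.nextChar();
        System.out.println(in.currentParsedContent() + " " + in.lastIndexOf(',')); // prints c, 1
    }
}
```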
skipString
public final boolean skipString(char ch, char stop)
Description copied from interface: CharInputReader
Attempts to skip a String from the current position until a stop character is found on the input, or a line ending is reached. If the String can be skipped, the current position of the parser will be updated to the last consumed character. If the internal buffer needs to be reloaded, this method will return false and the current position of the buffer will remain unchanged.
- Specified by: skipString in interface CharInputReader
- Parameters:
ch - the current character to be considered. If equal to the stop character, false will be returned.
stop - the stop character that identifies the end of the content to be collected
- Returns: true if an entire String value was found on the input and skipped, or false if the buffer needs to be reloaded.
getString
public final String getString(char ch, char stop, boolean trim, String nullValue, int maxLength)
Description copied from interface: CharInputReader
Attempts to collect a String from the current position until a stop character is found on the input, or a line ending is reached. If the String can be obtained, the current position of the parser will be updated to the last consumed character. If the internal buffer needs to be reloaded, this method will return null and the current position of the buffer will remain unchanged.
- Specified by: getString in interface CharInputReader
- Parameters:
ch - the current character to be considered. If equal to the stop character, the nullValue will be returned.
stop - the stop character that identifies the end of the content to be collected
trim - flag indicating whether or not trailing whitespaces should be discarded
nullValue - value to return when the length of the content to be returned is 0.
maxLength - the maximum length of the String to be returned. If the length exceeds this limit, null will be returned.
- Returns: the String found on the input, or null if the buffer needs to be reloaded or the maximum length has been exceeded.
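A sketch of the getString() contract over a single buffer, under stated assumptions. StringCollector is hypothetical; it assumes ch is the character most recently returned by nextChar() (so it sits just before index i), trims only trailing characters at or below ' ' when trim is true, and leaves the stop character or newline as the next character to read. Returning null leaves i untouched, which is what allows a caller to reload the buffer and retry.

```java
/** Hypothetical single-buffer sketch of the getString() contract. */
final class StringCollector {
    private final char[] buffer;
    private final char newline;
    int i; // index of the next unread character

    StringCollector(String contents, char newline) {
        this.buffer = contents.toCharArray();
        this.newline = newline;
    }

    char nextChar() {
        return i < buffer.length ? buffer[i++] : '\0';
    }

    /**
     * Collects from ch (the last character read) up to, but excluding, the stop
     * character or newline. Returns null, leaving i unchanged, if the buffer ends
     * first or the value is longer than maxLength.
     */
    String getString(char ch, char stop, boolean trim, String nullValue, int maxLength) {
        if (ch == stop) {
            return nullValue;
        }
        int from = i - 1; // position of ch
        int p = i;
        while (p < buffer.length && buffer[p] != stop && buffer[p] != newline) {
            p++;
        }
        if (p >= buffer.length) {
            return null; // caller must reload the buffer and retry
        }
        int end = p;
        if (trim) {
            while (end > from && buffer[end - 1] <= ' ') {
                end--; // drop trailing whitespace
            }
        }
        int len = end - from;
        if (len > maxLength) {
            return null;
        }
        i = p; // the stop character or newline is the next one to read
        return len == 0 ? nullValue : new String(buffer, from, len);
    }

    public static void main(String[] args) {
        StringCollector in = new StringCollector("abc  ,def\n", '\n');
        System.out.println(in.getString(in.nextChar(), ',', true, null, 100)); // prints abc
    }
}
```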
getQuotedString
public final String getQuotedString(char quote, char escape, char escapeEscape, int maxLength, char stop1, char stop2, boolean keepQuotes, boolean keepEscape, boolean trimLeading, boolean trimTrailing)
Description copied from interface: CharInputReader
Attempts to collect a quoted String from the current position until a closing quote or stop character is found on the input, or a line ending is reached. If the String can be obtained, the current position of the parser will be updated to the last consumed character. If the internal buffer needs to be reloaded, this method will return null and the current position of the buffer will remain unchanged.
- Specified by: getQuotedString in interface CharInputReader
- Parameters:
quote - the quote character
escape - the quote escape character
escapeEscape - the escape of the quote escape character
maxLength - the maximum length of the String to be returned. If the length exceeds this limit, null will be returned.
stop1 - the first stop character that identifies the end of the content to be collected
stop2 - the second stop character that identifies the end of the content to be collected
keepQuotes - flag to indicate that the quotes that wrap the resulting String should be kept.
keepEscape - flag to indicate that escape sequences should be kept
trimLeading - flag to indicate that leading whitespaces should be trimmed
trimTrailing - flag to indicate that trailing whitespaces should be trimmed
- Returns: the String found on the input, or null if the buffer needs to be reloaded or the maximum length has been exceeded.
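A reduced sketch of collecting a quoted value. QuotedCollector is hypothetical and models only the quote, escape and keepQuotes parameters; escapeEscape, maxLength, the stop characters, keepEscape and trimming are left out. It assumes the opening quote has just been read, and it returns null with the position unchanged when the closing quote is not in the buffer. When escape equals quote, as in common CSV usage, a doubled quote inside the value yields one literal quote.

```java
/** Hypothetical single-buffer sketch of quoted-value collection. */
final class QuotedCollector {
    private final char[] buffer;
    int i; // index of the next unread character

    QuotedCollector(String contents) {
        this.buffer = contents.toCharArray();
    }

    char nextChar() {
        return i < buffer.length ? buffer[i++] : '\0';
    }

    /**
     * Expects the opening quote to have just been read. Returns the unescaped
     * value, or null (with i unchanged) if no closing quote is in the buffer.
     */
    String getQuotedString(char quote, char escape, boolean keepQuotes) {
        StringBuilder out = new StringBuilder();
        if (keepQuotes) {
            out.append(quote);
        }
        int p = i;
        while (p < buffer.length) {
            char ch = buffer[p];
            if (ch == escape && p + 1 < buffer.length && buffer[p + 1] == quote) {
                out.append(quote); // escaped quote becomes one literal quote
                p += 2;
                continue;
            }
            if (ch == quote) {
                // Note: with escape == quote, a quote at the very end of the buffer is
                // ambiguous (the next buffer could start with another quote); this sketch
                // simply treats it as the closing quote.
                if (keepQuotes) {
                    out.append(quote);
                }
                i = p + 1; // consume the closing quote
                return out.toString();
            }
            out.append(ch);
            p++;
        }
        return null; // closing quote not found in this buffer
    }

    public static void main(String[] args) {
        QuotedCollector in = new QuotedCollector("\"a\"\"b\",next");
        in.nextChar(); // consume the opening quote
        System.out.println(in.getQuotedString('"', '"', false)); // prints a"b
    }
}
```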
skipQuotedString
public final boolean skipQuotedString(char quote, char escape, char stop1, char stop2)
Description copied from interface: CharInputReader
Attempts to skip a quoted String from the current position until a stop character is found on the input, or a line ending is reached. If the String can be skipped, the current position of the parser will be updated to the last consumed character. If the internal buffer needs to be reloaded, this method will return false and the current position of the buffer will remain unchanged.
- Specified by: skipQuotedString in interface CharInputReader
- Parameters:
quote - the quote character
escape - the quote escape character
stop1 - the first stop character that identifies the end of the content to be collected
stop2 - the second stop character that identifies the end of the content to be collected
- Returns: true if an entire String value was found on the input and skipped, or false if the buffer needs to be reloaded.