Package com.univocity.parsers.csv
Class CsvFormatDetector
java.lang.Object
com.univocity.parsers.csv.CsvFormatDetector
- All Implemented Interfaces:
InputAnalysisProcess
An InputAnalysisProcess to detect column delimiters, quotes and quote escapes in a CSV input.
-
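Field Summary
private char[] allowedDelimiters
private final char comment
private char[] delimiterPreference
private final int MAX_ROW_SAMPLES
private final char normalizedNewLine
private final char suggestedDelimiter
private final char suggestedQuote
private final char suggestedQuoteEscape
private final int whitespaceRangeStart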
-
Constructor Summary
CsvFormatDetector(int maxRowSamples, CsvParserSettings settings, int whitespaceRangeStart)
Builds a new CsvFormatDetector
-
Method Summary
protected abstract void apply(char delimiter, char quote, char quoteEscape)
Applies the discovered CSV format elements to the CsvParser
calculateTotals(List<Map<Character, Integer>> symbolsPerRow)
void execute(char[] characters, int length)
A sequence of characters of the input buffer to be analyzed.
protected char getChar(Map<Character, Integer> map, Map<Character, Integer> totals, char defaultChar, boolean min)
Returns the character with the highest or lowest associated number.
protected void increment
Increments the number associated with a character in a map by 1
protected void increment
Increments the number associated with a character in a map
protected boolean isAllowedDelimiter(char ch)
protected boolean isSymbol(char ch)
protected char max
Returns the character with the highest associated number.
protected char min
Returns the character with the lowest associated number.
protected char pickDelimiter
-
Field Details
-
MAX_ROW_SAMPLES
private final int MAX_ROW_SAMPLES
-
comment
private final char comment
-
suggestedDelimiter
private final char suggestedDelimiter
-
normalizedNewLine
private final char normalizedNewLine
-
whitespaceRangeStart
private final int whitespaceRangeStart
-
allowedDelimiters
private char[] allowedDelimiters
-
delimiterPreference
private char[] delimiterPreference
-
suggestedQuote
private final char suggestedQuote
-
suggestedQuoteEscape
private final char suggestedQuoteEscape
-
-
Constructor Details
-
CsvFormatDetector
Builds a new CsvFormatDetector
- Parameters:
maxRowSamples - the number of row samples to collect before analyzing the statistics
settings - the configuration provided by the user with potential defaults in case the detection is unable to discover the proper column delimiter or quote character.
whitespaceRangeStart - starting range of characters considered to be whitespace.
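
As an illustration of what collecting row samples before analyzing the statistics can look like, the sketch below counts candidate symbols in each sampled row and picks the first candidate that occurs the same non-zero number of times in every row, falling back to a caller-supplied default otherwise. It uses only the JDK and is not univocity's actual algorithm: the class `DelimiterSamplingSketch`, the method `detect`, and the candidate list are hypothetical, and quoted values are ignored for brevity.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class DelimiterSamplingSketch {

    // Hypothetical candidate delimiters, chosen for this illustration only.
    private static final char[] CANDIDATES = {',', ';', '\t', '|'};

    /**
     * Counts candidate symbols in up to maxRowSamples rows and returns the first
     * candidate that appears the same non-zero number of times in every sampled row.
     * Returns suggestedDelimiter when no candidate qualifies.
     */
    static char detect(char[] characters, int length, int maxRowSamples, char suggestedDelimiter) {
        List<Map<Character, Integer>> symbolsPerRow = new ArrayList<>();
        Map<Character, Integer> current = new HashMap<>();

        for (int i = 0; i < length && symbolsPerRow.size() < maxRowSamples; i++) {
            char ch = characters[i];
            if (ch == '\n') {
                // End of a row: store its counts and start a new row.
                symbolsPerRow.add(current);
                current = new HashMap<>();
            } else {
                for (char candidate : CANDIDATES) {
                    if (ch == candidate) {
                        current.merge(ch, 1, Integer::sum);
                    }
                }
            }
        }
        // Keep a trailing row that has no line terminator.
        if (!current.isEmpty() && symbolsPerRow.size() < maxRowSamples) {
            symbolsPerRow.add(current);
        }

        for (char candidate : CANDIDATES) {
            if (isConsistent(candidate, symbolsPerRow)) {
                return candidate;
            }
        }
        return suggestedDelimiter;
    }

    // True when the candidate has the same non-zero count in every sampled row.
    private static boolean isConsistent(char candidate, List<Map<Character, Integer>> rows) {
        if (rows.isEmpty()) {
            return false;
        }
        int expected = rows.get(0).getOrDefault(candidate, 0);
        if (expected == 0) {
            return false;
        }
        for (Map<Character, Integer> row : rows) {
            if (row.getOrDefault(candidate, 0) != expected) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        char[] input = "a;b;c\n1;2;3\n4,5;6;7\n".toCharArray();
        System.out.println(detect(input, input.length, 10, ','));
    }
}
```

The fallback argument plays the role the settings play in the constructor: it supplies a default when the samples do not single out a delimiter.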
-
-
Method Details
-
calculateTotals
-
execute
public void execute(char[] characters, int length)
Description copied from interface: InputAnalysisProcess
A sequence of characters of the input buffer to be analyzed.
- Specified by:
execute in interface InputAnalysisProcess
- Parameters:
characters - the input buffer
length - the last character position loaded into the buffer.
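
The buffer-plus-length contract can be sketched with plain JDK classes: a reader fills a fixed-size array, and only the loaded portion is meaningful to the analysis. The nested interface `Analysis` and the helper `analyzeFirstBuffer` are hypothetical stand-ins that merely mirror the shape of `execute(char[], int)`; how the parser itself invokes the process is not shown here.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;

class BufferAnalysisSketch {

    // Hypothetical stand-in mirroring the shape of execute(char[], int).
    interface Analysis {
        void execute(char[] characters, int length);
    }

    // Fills a fixed-size buffer once and hands the loaded portion to the analysis.
    static void analyzeFirstBuffer(Reader reader, int bufferSize, Analysis analysis) {
        char[] buffer = new char[bufferSize];
        try {
            // read may load fewer characters than bufferSize, or return -1 at end of input.
            int length = reader.read(buffer);
            if (length > 0) {
                analysis.execute(buffer, length);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        StringBuilder seen = new StringBuilder();
        // Only indexes below length hold input; the rest of the array is unused.
        analyzeFirstBuffer(new StringReader("a,b\n"), 1024,
                (characters, length) -> seen.append(characters, 0, length));
        System.out.println(seen.length());
    }
}
```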
-
pickDelimiter
-
increment
Increments the number associated with a character in a map by 1
- Parameters:
map - the map of characters and their numbers
symbol - the character whose number should be incremented
-
increment
Increments the number associated with a character in a map
- Parameters:
map - the map of characters and their numbers
symbol - the character whose number should be incremented
incrementSize - the size of the increment
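
The behaviour described for the two `increment` overloads can be sketched with `Map.merge`. The parameter types below are assumptions inferred from the parameter descriptions, since the page does not show the signatures, and the class `IncrementSketch` is hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

class IncrementSketch {

    // Adds 1 to the number associated with symbol.
    static void increment(Map<Character, Integer> map, char symbol) {
        increment(map, symbol, 1);
    }

    // Adds incrementSize to the number associated with symbol, starting from 0 when absent.
    static void increment(Map<Character, Integer> map, char symbol, int incrementSize) {
        map.merge(symbol, incrementSize, Integer::sum);
    }

    public static void main(String[] args) {
        Map<Character, Integer> counts = new HashMap<>();
        increment(counts, ';');      // first occurrence
        increment(counts, ';', 3);   // three more occurrences at once
        System.out.println(counts);
    }
}
```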
-
min
Returns the character with the lowest associated number.
- Parameters:
map - the map of characters and their numbers
defaultChar - the default character to return in case the map is empty
- Returns:
the character with the lowest number associated.
-
max
Returns the character with the highest associated number.
- Parameters:
map - the map of characters and their numbers
defaultChar - the default character to return in case the map is empty
- Returns:
the character with the highest number associated.
-
getChar
protected char getChar(Map<Character, Integer> map, Map<Character, Integer> totals, char defaultChar, boolean min)
Returns the character with the highest or lowest associated number.
- Parameters:
map - the map of characters and their numbers
defaultChar - the default character to return in case the map is empty
min - a flag indicating whether to return the character associated with the lowest number in the map. If false then the character associated with the highest number found will be returned.
- Returns:
the character with the highest/lowest number associated.
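
A minimal sketch of the selection described for `min`, `max` and `getChar` follows. It is a simplification under stated assumptions: the `totals` parameter of the documented `getChar` signature is left out because its role is not described on this page, ties are resolved in favour of the first entry encountered, and the class `ExtremeCharSketch` is hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.Map;

class ExtremeCharSketch {

    // Returns the character with the lowest (min == true) or highest (min == false)
    // number, or defaultChar when the map is empty. Ties keep the first entry seen.
    static char getChar(Map<Character, Integer> map, char defaultChar, boolean min) {
        char result = defaultChar;
        Integer best = null;
        for (Map.Entry<Character, Integer> entry : map.entrySet()) {
            int value = entry.getValue();
            if (best == null || (min ? value < best : value > best)) {
                best = value;
                result = entry.getKey();
            }
        }
        return result;
    }

    static char min(Map<Character, Integer> map, char defaultChar) {
        return getChar(map, defaultChar, true);
    }

    static char max(Map<Character, Integer> map, char defaultChar) {
        return getChar(map, defaultChar, false);
    }

    public static void main(String[] args) {
        Map<Character, Integer> counts = new LinkedHashMap<>();
        counts.put(',', 2);
        counts.put(';', 7);
        System.out.println(max(counts, '?'));
        System.out.println(min(counts, '?'));
    }
}
```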
-
isSymbol
protected boolean isSymbol(char ch) -
isAllowedDelimiter
protected boolean isAllowedDelimiter(char ch) -
apply
protected abstract void apply(char delimiter, char quote, char quoteEscape)
Applies the discovered CSV format elements to the CsvParser
- Parameters:
delimiter - the discovered delimiter character
quote - the discovered quote character
quoteEscape - the discovered quote escape character.
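
Because `apply` is abstract, a concrete detector decides what to do with the discovered characters. The sketch below shows that hook pattern with JDK-only stand-ins: `Detector`, its `finish` method, `DetectedFormat` and `capture` are all hypothetical names for illustration and do not reproduce how the library wires the detector into the parser.

```java
class ApplySketch {

    // Hypothetical stand-in for a detector whose analysis ends by calling the apply hook.
    abstract static class Detector {
        void finish(char delimiter, char quote, char quoteEscape) {
            apply(delimiter, quote, quoteEscape);
        }

        protected abstract void apply(char delimiter, char quote, char quoteEscape);
    }

    // Hypothetical holder for the discovered format elements.
    static final class DetectedFormat {
        char delimiter;
        char quote;
        char quoteEscape;
    }

    // Runs the stand-in detector and copies whatever it reports into a holder.
    static DetectedFormat capture(char delimiter, char quote, char quoteEscape) {
        DetectedFormat format = new DetectedFormat();
        Detector detector = new Detector() {
            @Override
            protected void apply(char d, char q, char e) {
                format.delimiter = d;
                format.quote = q;
                format.quoteEscape = e;
            }
        };
        detector.finish(delimiter, quote, quoteEscape);
        return format;
    }

    public static void main(String[] args) {
        DetectedFormat format = capture(';', '"', '\\');
        System.out.println(format.delimiter);
    }
}
```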
-