Class BatchedColumnProcessor

All Implemented Interfaces:
BatchedColumnReader<String>, ColumnReader<String>, Processor<ParsingContext>, RowProcessor

public abstract class BatchedColumnProcessor extends AbstractBatchedColumnProcessor<ParsingContext> implements RowProcessor
A RowProcessor implementation that stores the values of columns in batches. Use this implementation instead of ColumnProcessor when processing large inputs, to avoid running out of memory: only the values of the current batch are kept. Values parsed in each row are split into columns of Strings, and each column has its own list of values.

During parsing, the AbstractBatchedColumnProcessor.batchProcessed(int) method is invoked each time the number of rows given to the constructor (rowsPerBatch) has been processed.

The user can access the lists with values parsed for all columns using the methods AbstractBatchedColumnProcessor.getColumnValuesAsList(), AbstractBatchedColumnProcessor.getColumnValuesAsMapOfIndexes() and AbstractBatchedColumnProcessor.getColumnValuesAsMapOfNames().
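The three accessors expose the same column data in different shapes: a list of columns, a map keyed by column index, and a map keyed by header name. The stdlib-only sketch below illustrates those shapes with a hypothetical ColumnViews class. It is not univocity's implementation, and padding short rows with null is an assumption of the sketch.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for column-oriented storage; NOT univocity's code.
public class ColumnViews {
    private final String[] headers;
    private final List<List<String>> columns = new ArrayList<>();

    public ColumnViews(String... headers) {
        this.headers = headers;
        for (int i = 0; i < headers.length; i++) {
            columns.add(new ArrayList<>());
        }
    }

    // Splits one parsed row into its columns. Assumption of this sketch:
    // missing values in short rows become null; extra values are ignored.
    public void addRow(String... row) {
        for (int i = 0; i < columns.size(); i++) {
            columns.get(i).add(i < row.length ? row[i] : null);
        }
    }

    // Analogous to getColumnValuesAsList(): one list per column, in column order.
    public List<List<String>> asList() {
        return columns;
    }

    // Analogous to getColumnValuesAsMapOfIndexes(): column index -> values.
    public Map<Integer, List<String>> asMapOfIndexes() {
        Map<Integer, List<String>> out = new LinkedHashMap<>();
        for (int i = 0; i < columns.size(); i++) {
            out.put(i, columns.get(i));
        }
        return out;
    }

    // Analogous to getColumnValuesAsMapOfNames(): header name -> values.
    public Map<String, List<String>> asMapOfNames() {
        Map<String, List<String>> out = new LinkedHashMap<>();
        for (int i = 0; i < headers.length; i++) {
            out.put(headers[i], columns.get(i));
        }
        return out;
    }

    public static void main(String[] args) {
        ColumnViews views = new ColumnViews("name", "age");
        views.addRow("Ann", "31");
        views.addRow("Bob", "27");
        System.out.println(views.asList());          // [[Ann, Bob], [31, 27]]
        System.out.println(views.asMapOfIndexes());  // {0=[Ann, Bob], 1=[31, 27]}
        System.out.println(views.asMapOfNames());    // {name=[Ann, Bob], age=[31, 27]}
    }
}
```

All three views share the same underlying lists here, so building them costs no extra copies of the values.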

Once AbstractBatchedColumnProcessor.batchProcessed(int) has been invoked, all accumulated values are discarded and the next batch of column values starts accumulating. This cycle repeats until there are no more rows in the input.
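The accumulate, notify, discard cycle can be modeled in a few lines. The sketch below uses a hypothetical abstract class, MiniBatchedColumns, whose batchProcessed(int) plays the role of the library callback; the class, its rowProcessed/processEnded methods, and the choice to deliver leftover rows as a final, smaller batch are assumptions of this sketch, not univocity's actual code.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, simplified model of the batching cycle; NOT univocity's code.
public abstract class MiniBatchedColumns {
    private final int rowsPerBatch;
    private final List<List<String>> columns = new ArrayList<>();
    private int rowsInBatch = 0;

    protected MiniBatchedColumns(int rowsPerBatch) {
        if (rowsPerBatch <= 0) {
            throw new IllegalArgumentException("rowsPerBatch must be positive");
        }
        this.rowsPerBatch = rowsPerBatch;
    }

    // Called once per parsed row: append each value to its column list.
    public void rowProcessed(String[] row) {
        while (columns.size() < row.length) {
            // New column seen: pad with nulls for rows already in this batch.
            List<String> column = new ArrayList<>();
            for (int i = 0; i < rowsInBatch; i++) column.add(null);
            columns.add(column);
        }
        for (int i = 0; i < columns.size(); i++) {
            columns.get(i).add(i < row.length ? row[i] : null);
        }
        rowsInBatch++;
        if (rowsInBatch == rowsPerBatch) {
            flush();
        }
    }

    // Called at end of input. Assumption of this sketch: leftover rows are
    // delivered as a final, smaller batch.
    public void processEnded() {
        if (rowsInBatch > 0) flush();
    }

    // Column values of the current batch only.
    protected List<List<String>> getColumnValuesAsList() {
        return columns;
    }

    // Plays the role of batchProcessed(int): the lists hold exactly this batch.
    protected abstract void batchProcessed(int rowsInThisBatch);

    private void flush() {
        batchProcessed(rowsInBatch);
        // Discard this batch's values so memory stays bounded by rowsPerBatch.
        for (List<String> column : columns) column.clear();
        rowsInBatch = 0;
    }

    public static void main(String[] args) {
        MiniBatchedColumns p = new MiniBatchedColumns(2) {
            @Override
            protected void batchProcessed(int n) {
                System.out.println("batch of " + n + ": " + getColumnValuesAsList());
            }
        };
        String[][] input = {{"a", "1"}, {"b", "2"}, {"c", "3"}};
        for (String[] row : input) p.rowProcessed(row);
        p.processEnded();
        // batch of 2: [[a, b], [1, 2]]
        // batch of 1: [[c], [3]]
    }
}
```

Because the lists are cleared after each callback, a subclass that needs values beyond the current batch must copy them inside batchProcessed.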

Constructor Details

BatchedColumnProcessor

public BatchedColumnProcessor(int rowsPerBatch)
Constructs a batched column processor configured to invoke the AbstractBatchedColumnProcessor.batchProcessed(int) method each time the given number of rows has been processed.
Parameters:
rowsPerBatch - the number of rows to process in each batch.