Interface BatchedColumnReader<T>

Type Parameters:
T - the type of the data stored by the columns.
All Superinterfaces:
ColumnReader<T>
All Known Implementing Classes:
AbstractBatchedColumnProcessor, AbstractBatchedObjectColumnProcessor, BatchedColumnProcessor, BatchedObjectColumnProcessor

interface BatchedColumnReader<T> extends ColumnReader<T>
A common interface for Processors that collect the values parsed from each column in a row and store values of columns in batches.

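Prefer implementations of this interface over ColumnReader when processing large inputs. Because column values are discarded after each batch, memory use is bounded by the batch size rather than by the size of the input, which helps avoid running out of memory.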

During processing, the batchProcessed(int) method is invoked each time the number of rows returned by getRowsPerBatch() has been processed.

Inside batchProcessed(int), the user can access the lists of values parsed for all columns using the methods ColumnReader.getColumnValuesAsList(), ColumnReader.getColumnValuesAsMapOfIndexes() and ColumnReader.getColumnValuesAsMapOfNames().
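The three accessors above expose the same batch of values in different shapes: a list of columns, a map keyed by column index, and a map keyed by column name. The sketch below builds these views by hand for one batch of two rows so the shapes are easy to compare. It is a hypothetical illustration, not univocity-parsers code: ColumnViewsSketch, its methods and the sample headers are invented here, and it assumes header names are known and that each column's values are kept as one list in row order.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch: one batch of parsed rows exposed as three column views. */
class ColumnViewsSketch {

    // List view: element c holds every value of column c, in row order.
    static List<List<String>> toColumns(String[][] rows, int columnCount) {
        List<List<String>> columns = new ArrayList<>();
        for (int c = 0; c < columnCount; c++) {
            List<String> column = new ArrayList<>();
            for (String[] row : rows) {
                column.add(c < row.length ? row[c] : null); // short rows yield null
            }
            columns.add(column);
        }
        return columns;
    }

    // Map-of-indexes view: the same lists keyed by column position.
    static Map<Integer, List<String>> byIndex(List<List<String>> columns) {
        Map<Integer, List<String>> map = new LinkedHashMap<>();
        for (int c = 0; c < columns.size(); c++) {
            map.put(c, columns.get(c));
        }
        return map;
    }

    // Map-of-names view: the same lists keyed by header name (headers assumed known).
    static Map<String, List<String>> byName(String[] headers, List<List<String>> columns) {
        Map<String, List<String>> map = new LinkedHashMap<>();
        for (int c = 0; c < headers.length && c < columns.size(); c++) {
            map.put(headers[c], columns.get(c));
        }
        return map;
    }

    public static void main(String[] args) {
        String[] headers = {"name", "age"};
        List<List<String>> columns = toColumns(new String[][] {{"Ann", "31"}, {"Bob", "27"}}, headers.length);
        System.out.println(columns);                  // [[Ann, Bob], [31, 27]]
        System.out.println(byIndex(columns));         // {0=[Ann, Bob], 1=[31, 27]}
        System.out.println(byName(headers, columns)); // {name=[Ann, Bob], age=[31, 27]}
    }
}
```

Every view has one entry per column, and each entry is a list with one element per row of the batch.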

After batchProcessed(int) returns, all accumulated values are discarded and the next batch of column values is accumulated. This process repeats until there are no more rows in the input.
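The batching cycle described above (accumulate values per column, call back, discard, repeat) can be sketched without the library. BatchLifecycleSketch is a hypothetical stand-in, not the univocity-parsers implementation: rowProcessed, processEnded, flushBatch and the BiConsumer callback are assumptions made for illustration. It also assumes that a final, partial batch is delivered when the input ends and that this batch is counted by getBatchesProcessed().

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.function.BiConsumer;

/** Hypothetical stand-in for the documented batching cycle; not the library's implementation. */
class BatchLifecycleSketch {

    private final int rowsPerBatch;
    private final List<List<String>> columns = new ArrayList<>();
    // Hypothetical callback playing the role of batchProcessed(int): (rowsInThisBatch, column lists).
    private final BiConsumer<Integer, List<List<String>>> onBatch;
    private int rowsInThisBatch = 0;
    private int batchesProcessed = 0;

    BatchLifecycleSketch(int rowsPerBatch, BiConsumer<Integer, List<List<String>>> onBatch) {
        if (rowsPerBatch <= 0) {
            throw new IllegalArgumentException("rowsPerBatch must be positive");
        }
        this.rowsPerBatch = rowsPerBatch;
        this.onBatch = onBatch;
    }

    int getRowsPerBatch() { return rowsPerBatch; }

    int getBatchesProcessed() { return batchesProcessed; }

    // Hypothetical: called once per parsed row; appends each value to its column's list.
    void rowProcessed(String[] row) {
        while (columns.size() < row.length) {
            // Column first seen in this batch: pad with nulls so all lists stay the same length.
            columns.add(new ArrayList<>(Collections.nCopies(rowsInThisBatch, (String) null)));
        }
        for (int i = 0; i < columns.size(); i++) {
            columns.get(i).add(i < row.length ? row[i] : null);
        }
        rowsInThisBatch++;
        if (rowsInThisBatch == rowsPerBatch) {
            flushBatch();
        }
    }

    // Hypothetical: called when the input ends; delivers any remaining rows as a smaller batch.
    void processEnded() {
        if (rowsInThisBatch > 0) {
            flushBatch();
        }
    }

    private void flushBatch() {
        // The lists are only valid during the callback; copy them if they must outlive it.
        onBatch.accept(rowsInThisBatch, columns);
        batchesProcessed++;
        // Discard the values so memory use is bounded by the batch size, not the input size.
        for (List<String> column : columns) {
            column.clear();
        }
        rowsInThisBatch = 0;
    }

    public static void main(String[] args) {
        List<Integer> batchSizes = new ArrayList<>();
        BatchLifecycleSketch reader = new BatchLifecycleSketch(2, (rows, cols) -> {
            batchSizes.add(rows);
            System.out.println(rows + " row(s), first column = " + cols.get(0));
        });
        String[][] input = {{"a", "1"}, {"b", "2"}, {"c", "3"}, {"d", "4"}, {"e", "5"}};
        for (String[] row : input) {
            reader.rowProcessed(row);
        }
        reader.processEnded();
        System.out.println(batchSizes + " over " + reader.getBatchesProcessed() + " batches");
    }
}
```

With five rows and a batch size of 2, the callback fires three times, with batches of 2, 2 and 1 rows. Clearing the lists after each callback, rather than allocating new ones, keeps the live data proportional to the batch size no matter how long the input is.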

See Also:
  • Method Details

    • getRowsPerBatch

      int getRowsPerBatch()
      Returns the number of rows processed in each batch.
      Returns:
      the number of rows per batch
    • getBatchesProcessed

      int getBatchesProcessed()
      Returns the number of batches already processed.
      Returns:
      the number of batches already processed
    • batchProcessed

      void batchProcessed(int rowsInThisBatch)
      Callback to the user, where the lists with values parsed for all columns can be accessed using the methods ColumnReader.getColumnValuesAsList(), ColumnReader.getColumnValuesAsMapOfIndexes() and ColumnReader.getColumnValuesAsMapOfNames().
      Parameters:
      rowsInThisBatch - the number of rows processed in the current batch. This is also the number of elements in each column's list of values.
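Because the values are discarded once batchProcessed(int) returns, a typical callback folds each batch into some running state. The sketch below is hypothetical: AmountTotaller, its two-argument batchProcessed method and the "amount" column are invented for illustration, and the map is passed in directly where a real implementation (such as one of the implementing classes listed above) would obtain it from ColumnReader.getColumnValuesAsMapOfNames(). It also checks the documented guarantee that every column list holds rowsInThisBatch elements.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical callback body: keeps a running total across batches. */
class AmountTotaller {

    private long total = 0;

    // Hypothetical stand-in for batchProcessed(int); the column map is passed in explicitly.
    void batchProcessed(int rowsInThisBatch, Map<String, List<String>> columnsByName) {
        // Each column's list should hold exactly one value per row of this batch.
        for (Map.Entry<String, List<String>> column : columnsByName.entrySet()) {
            if (column.getValue().size() != rowsInThisBatch) {
                throw new IllegalStateException("column '" + column.getKey() + "' has "
                        + column.getValue().size() + " values, expected " + rowsInThisBatch);
            }
        }
        List<String> amounts = columnsByName.get("amount");
        if (amounts == null) {
            throw new IllegalArgumentException("no column named 'amount'");
        }
        // Fold the batch into the running total before its values are discarded.
        for (String value : amounts) {
            if (value != null) { // skip empty fields
                total += Long.parseLong(value.trim());
            }
        }
    }

    long getTotal() { return total; }

    public static void main(String[] args) {
        AmountTotaller totaller = new AmountTotaller();

        Map<String, List<String>> batch1 = new LinkedHashMap<>();
        batch1.put("id", Arrays.asList("1", "2"));
        batch1.put("amount", Arrays.asList("10", "32"));
        totaller.batchProcessed(2, batch1);

        Map<String, List<String>> batch2 = new LinkedHashMap<>();
        batch2.put("id", Arrays.asList("3"));
        batch2.put("amount", Arrays.asList("8"));
        totaller.batchProcessed(1, batch2);

        System.out.println(totaller.getTotal()); // 50
    }
}
```

Keeping only an aggregate between callbacks, instead of copying the lists, preserves the memory benefit that motivates batching in the first place.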