Stream Reader

Vinum Stream Reader class.

class vinum.StreamReader(reader)

StreamReader represents a stream of data that is used as an input for the query processor.

Since an input file may not fit into memory, StreamReader is the recommended way to execute queries on large files.

StreamReader instances are created by vinum.stream_* functions, for example: vinum.stream_csv().
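To illustrate why streaming helps with large files, here is a minimal plain-Python sketch of a batch-wise group-by count. It is not Vinum's implementation; the helper names `iter_batches` and `count_by` and the sample data are made up for illustration. The point is that only one batch and the running aggregate state need to be in memory at any time.

```python
import csv
import io
from collections import Counter
from itertools import islice


def iter_batches(rows, batch_size):
    """Yield lists of at most batch_size rows from any row iterator."""
    rows = iter(rows)
    while True:
        batch = list(islice(rows, batch_size))
        if not batch:
            return
        yield batch


def count_by(csv_file, column, batch_size=2):
    """Streaming analogue of: select <column>, count(*) from t group by <column>."""
    counts = Counter()
    reader = csv.DictReader(csv_file)
    for batch in iter_batches(reader, batch_size):
        # Only the current batch and the running counts are held in memory.
        counts.update(row[column] for row in batch)
    return dict(counts)


# Tiny in-memory stand-in for a large file (made-up data).
sample = io.StringIO(
    "passenger_count,fare\n1,7.5\n2,9.0\n1,4.2\n5,12.0\n1,6.1\n"
)
result = count_by(sample, "passenger_count")
print(result)
```

The final result of a query still has to fit in memory, which matches the description of sql() returning a materialized Table.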

Parameters

reader (pa.RecordBatchFileReader) – Arrow Stream Reader

Attributes

reader

Methods

sql(query)

Executes a SQL SELECT query on an input stream and returns the result as a Table materialized in memory.

sql(query: str)

Executes a SQL SELECT query on an input stream and returns the result as a Table materialized in memory.

Parameters: query (str) – SQL SELECT query.
Returns: Vinum Table instance.
Return type: vinum.Table

See also

sql_pd: Executes a SQL SELECT query on a Table and returns the result of the query as a pandas DataFrame.

Notes

Only SELECT statements are supported, and within them JOINs and subqueries are currently not supported. However, optimizations aside, a subquery can be modeled by running a second query on the result of the first.
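The two-step approach might look like the sketch below. It assumes that the Table returned by StreamReader.sql() has its own sql() method accepting the same dialect, and that a `count(*) cnt` alias is accepted. The function name `busy_passenger_groups` and the `min_trips` parameter are hypothetical, and the column names follow the taxi example on this page.

```python
def busy_passenger_groups(csv_path, min_trips=1000):
    """Two-step query standing in for a subquery (sketch; requires vinum)."""
    import vinum as vn  # imported lazily so this module loads without vinum

    # Step 1: aggregate over the stream. The result is materialized
    # in memory as a vinum.Table.
    grouped = vn.stream_csv(csv_path).sql(
        'select passenger_count pc, count(*) cnt from t group by pc'
    )

    # Step 2: query the materialized result. Assumption: Table.sql()
    # exists and supports where / order by on the aliased columns.
    query = (
        'select pc, cnt from t '
        f'where cnt >= {int(min_trips)} order by cnt desc'
    )
    return grouped.sql(query).to_pandas()
```

For example, `busy_passenger_groups('taxi.csv')` would return only the passenger-count groups with at least 1000 trips. Since the first result is fully materialized, this works best when step 1 reduces the data substantially.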

The table name in a ‘select * from table’ clause is ignored; the query always runs against the underlying input data.

By default, all NumPy functions are available via the ‘np.*’ namespace.

User-defined functions can be registered via vinum.register_python() or vinum.register_numpy().
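A hedged sketch of both features follows. It assumes that register_python() and register_numpy() take a function name followed by a callable, and that NumPy UDFs receive whole columns as arrays. The column names `total_amount`, `fare_amount` and `tip_amount`, the UDF names `cube` and `hypot2`, and the wrapper `query_with_udfs` are hypothetical.

```python
def query_with_udfs(csv_path):
    """Use np.* functions and registered UDFs in one query (sketch; requires vinum)."""
    import numpy as np
    import vinum as vn  # imported lazily so this module loads without vinum

    # Assumed signature: register_python(function_name, function).
    vn.register_python('cube', lambda x: x ** 3)

    # Assumed signature: register_numpy(function_name, function), with the
    # callable operating on NumPy arrays rather than single values.
    vn.register_numpy('hypot2', lambda x, y: np.sqrt(x ** 2 + y ** 2))

    query = (
        'select np.log(total_amount), cube(passenger_count), '
        'hypot2(fare_amount, tip_amount) from t limit 5'
    )
    return vn.stream_csv(csv_path).sql(query).to_pandas()
```

Vectorized NumPy UDFs are usually the better choice for numeric columns, since they avoid a Python-level call per row.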

Examples

Run an aggregation query on a CSV stream:

>>> import vinum as vn
>>> query = 'select passenger_count pc, count(*) from t group by pc'
>>> vn.stream_csv('taxi.csv').sql(query).to_pandas()
   pc  count
0   0    165
1   5   3453
2   6    989
3   1  34808
4   2   7386
5   3   2183
6   4   1016

Table Input/Output functions