class vinum.StreamReader(reader)
Vinum Stream Reader class.
StreamReader represents a stream of data that is used as the input for the query processor.
Since an input file may not fit into memory, StreamReader is the recommended way to execute queries on large files.
StreamReader instances are created by vinum.stream_* functions, for example: vinum.stream_csv().
Parameters
reader (pa.RecordBatchFileReader) – Arrow Stream Reader
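The idea behind querying a stream can be sketched with the standard library alone: rows are read in fixed-size batches and folded into a running partial aggregate, so only one batch is held in memory at a time. This is a minimal illustration under stated assumptions, not vinum's implementation. The function `stream_group_count` and its `batch_size` parameter are hypothetical, and vinum itself presumably consumes Arrow record batches through the `reader` parameter rather than raw CSV rows.

```python
import csv
import io
from collections import Counter
from itertools import islice


def stream_group_count(lines, key, batch_size=1000):
    """Hypothetical helper: count rows per value of column `key`.

    Rows are pulled `batch_size` at a time, so memory use is bounded by
    one batch plus the running counts, not by the size of the input.
    """
    reader = csv.DictReader(lines)
    counts = Counter()
    while True:
        # Take the next batch of parsed rows from the underlying iterator.
        batch = list(islice(reader, batch_size))
        if not batch:
            break
        # Merge this batch's partial aggregate into the running total.
        counts.update(row[key] for row in batch)
    return counts


# Small in-memory stand-in for a large CSV file (made-up sample data).
data = io.StringIO("passenger_count,fare\n1,7.5\n2,10.0\n1,5.0\n3,12.0\n1,6.0\n")
print(stream_group_count(data, "passenger_count", batch_size=2))
```

This mirrors the shape of a `group by ... count(*)` query: each batch contributes a partial count that is merged into the final result.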
Methods
sql(query)
Executes a SQL SELECT query on the input stream and returns the result as a Table materialized in memory.
Parameters
query (str) – SQL SELECT query.
Returns
Vinum Table instance.
Return type
vinum.Table
See also
sql_pd
Executes SQL SELECT query on a Table and returns the result of the query as a Pandas DataFrame.
Notes
Only SELECT statements are supported. Within SELECT statements, JOINs and subqueries are not currently supported. However, a subquery can be modeled, at some cost in optimization, by running a second query on the result of the first.
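The two-step pattern can be sketched as follows. Python's built-in `sqlite3` stands in for vinum purely so the example runs anywhere (SQLite itself does support subqueries); the table `t`, its sample rows, and the intermediate table `step1` are made up for illustration. With vinum, the analogous step would be to run the second query against the vinum.Table returned by the first `sql()` call.

```python
import sqlite3

con = sqlite3.connect(":memory:")
# Made-up sample data standing in for the input stream.
con.execute("create table t (passenger_count integer, fare real)")
con.executemany(
    "insert into t values (?, ?)",
    [(1, 7.5), (2, 10.0), (1, 5.0), (3, 12.0), (1, 6.0)],
)

# Step 1: the "inner" query, materialized as its own table.
con.execute(
    "create temp table step1 as "
    "select passenger_count pc, count(*) cnt from t group by pc"
)

# Step 2: the "outer" query runs on the materialized result of step 1,
# playing the role of `select ... from (subquery) where cnt > 1`.
rows = con.execute("select pc, cnt from step1 where cnt > 1 order by pc").fetchall()
print(rows)  # [(1, 3)]
```

The trade-off is that the intermediate result is fully materialized before the outer query runs, so optimizations that span both queries are not possible.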
The table name in the FROM clause (e.g. ‘select * from table’) is ignored; the query always runs on the underlying data.
By default, all NumPy functions are available via the ‘np.*’ namespace.
User-defined functions (UDFs) can be registered via vinum.register_python() or vinum.register_numpy().
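The two registration functions suggest two calling conventions for UDFs: a plain Python function applied value by value, and a NumPy-style function applied to a whole column at once. The sketch below contrasts the two using hypothetical registries (`register_row_udf`, `register_column_udf`, `apply_udf`) and plain lists in place of NumPy arrays. It is an assumption about the general idea, not vinum's API or internals.

```python
from typing import Callable, Dict, List

# Hypothetical registries for illustration only; NOT vinum's internals.
ROW_UDFS: Dict[str, Callable] = {}
COLUMN_UDFS: Dict[str, Callable] = {}


def register_row_udf(name: str, fn: Callable) -> None:
    """Register a function that is called once per value (row-at-a-time)."""
    ROW_UDFS[name] = fn


def register_column_udf(name: str, fn: Callable) -> None:
    """Register a function that receives an entire column in one call."""
    COLUMN_UDFS[name] = fn


def apply_udf(name: str, column: List) -> List:
    """Evaluate a registered UDF over one column of a batch."""
    if name in COLUMN_UDFS:
        # Vectorized style: a single call for the whole column.
        return list(COLUMN_UDFS[name](column))
    # Scalar style: one call per value.
    return [ROW_UDFS[name](value) for value in column]


def demean(col: List) -> List:
    """Column-level UDF: needs the whole column to compute its mean."""
    mean = sum(col) / len(col)
    return [v - mean for v in col]


register_row_udf("square", lambda x: x * x)
register_column_udf("demean", demean)

print(apply_udf("square", [1, 2, 3]))  # [1, 4, 9]
print(apply_udf("demean", [1, 2, 3]))  # [-1.0, 0.0, 1.0]
```

A column-level function such as `demean` can see every value in the batch, which a row-level function cannot; processing whole columns also avoids per-row call overhead.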
Examples
Run an aggregation query on a CSV stream:
>>> import vinum as vn
>>> query = 'select passenger_count pc, count(*) from t group by pc'
>>> vn.stream_csv('taxi.csv').sql(query).to_pandas()
   pc  count
0   0    165
1   5   3453
2   6    989
3   1  34808
4   2   7386
5   3   2183
6   4   1016