User Defined Functions

Functions to define UDFs.

register_numpy

vinum.register_numpy(function_name: str, function)None

Register Numpy function as a User Defined Function (UDF). UDF can perform vectorized operations on arrays passed as arguments.

Parameters
  • function_name (str) – Name of the User Defined Function.

  • function (callable) – Function to be used as a UDF. Function has to operate on vectorized numpy arrays. Numpy arrays will be passed as input arguments to the function and it should return numpy array.

See also

register_python

Register Python function as a User Defined Function.

Notes

Numpy package is imported under np namespace. You can invoke any function from the np.* namespace.

Arguments of the function would be numpy arrays of provided columns. UDF can perform vectorized operations on arrays passed as arguments. The function would be called only once.

Function names are case insensitive.

Examples

Define a function operating with Numpy arrays. Numpy function perform vectorized operations on input numpy arrays.

>>> import numpy as np
>>> import vinum as vn
>>> vn.register_numpy('cube', lambda x: np.power(x, 3))
>>> tbl = vn.Table.from_pydict({'len': [1, 2, 3], 'size': [7, 13, 17]})
>>> tbl.sql_pd('SELECT cube(size) from t ORDER BY cube(size) DESC')
   cube
0  4913
1  2197
2   343
>>> import numpy as np
>>> import vinum as vn
>>> vn.register_numpy('distance',
...                   lambda x, y: np.sqrt(np.square(x) + np.square(y)))
>>> tbl = vn.Table.from_pydict({'x': [1, 2, 3], 'y': [7, 13, 17]})
>>> tbl.sql_pd('select x, y, distance(x, y) as dist from t')
   x   y       dist
0  1   7   7.071068
1  2  13  13.152946
2  3  17  17.262677

Please note that x and y arguments are of np.array type. In both of the cases function perform vectorized operations on input numpy arrays.

>>> import numpy as np
>>> import vinum as vn
>>> def z_score(x: np.array):
...     """Compute Standard Score"""
...     mean = np.mean(x)
...     std = np.std(x)
...     return (x - mean) / std
...
>>> vn.register_numpy('score', z_score)
>>> tbl = vn.Table.from_pydict({'x': [1, 2, 3], 'y': [7, 13, 17]})
>>> tbl.sql_pd('select x, score(x), y, score(y) from t')
   x     score   y   score_1
0  1 -1.224745   7 -1.297771
1  2  0.000000  13  0.162221
2  3  1.224745  17  1.135550

Please note that x argument is of np.array type.

register_python

vinum.register_python(function_name: str, function)None

Register Python function as a User Defined Function (UDF).

Parameters
  • function_name (str) – Name of the User Defined Function.

  • function (callable, python function) – Function to be used as a UDF.

See also

register_numpy

Register Numpy function as a User Defined Function.

Notes

Python functions are “vectorized” before use, via numpy.vectorize. For better performance, please try to use numpy UDFs, operating in terms of numpy arrays. See vinum.register_numpy().

Function would be invoked for individual rows of the Table.

Any python packages used inside of the function should be imported before the invocation.

Function names are case insensitive.

Examples

Using lambda as a UDF:

>>> import vinum as vn
>>> vn.register_python('cube', lambda x: x**3)
>>> tbl = vn.Table.from_pydict({'len': [1, 2, 3], 'size': [7, 13, 17]})
>>> tbl.sql_pd('SELECT cube(size) from t ORDER BY cube(size) DESC')
   cube
0  4913
1  2197
2   343
>>> import math
>>> import vinum as vn
>>> vn.register_python('distance', lambda x, y: math.sqrt(x**2 + y**2))
>>> tbl = vn.Table.from_pydict({'x': [1, 2, 3], 'y': [7, 13, 17]})
>>> tbl.sql_pd('select x, y, distance(x, y) as dist from t')
   x   y       dist
0  1   7   7.071068
1  2  13  13.152946
2  3  17  17.262677

Using regular python function:

>>> import vinum as vn
>>> def sin_taylor(x):
...     "Taylor series approximation of the sine trig function around 0."
...     return x - x**3/6 + x**5/120 - x**7/5040
...
>>> vn.register_python('sin', sin_taylor)
>>> tbl = vn.Table.from_pydict({'x': [1, 2, 3], 'y': [7, 13, 17]})
>>> tbl.sql_pd('select sin(x) as sin_x, sin(y) as sin_y from t '
...            'order by sin_y')
      sin_x     sin_y
0  0.141120 -0.961397
1  0.909297  0.420167
2  0.841471  0.656987