pinotdb
Python DB-API and SQLAlchemy dialect for Pinot.
Description
This module allows accessing Pinot via its SQL API.
Current supported Pinot version: 1.1.0.
Usage
Using the DB API to query Pinot Broker directly:
from pinotdb import connect
# this assumes 8000 is the broker port
conn = connect(host='localhost', port=8000, path='/query/sql', scheme='http')
curs = conn.cursor()
curs.execute("""
SELECT place,
CAST(REGEXP_EXTRACT(place, '(.*),', 1) AS FLOAT) AS lat,
CAST(REGEXP_EXTRACT(place, ',(.*)', 1) AS FLOAT) AS lon
FROM places
LIMIT 10
""")
for row in curs:
    print(row)
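Because the cursor follows the Python DB-API, a small generic helper can turn rows into dictionaries keyed by column name. The sketch below is illustrative only: `fetch_dicts` is a hypothetical helper, not part of pinotdb, and it assumes the cursor fills `description` and supports `fetchall()` and `close()` as the DB-API specifies. The demo uses an in-memory `sqlite3` connection purely as a stand-in so that it runs without a broker.

```python
import sqlite3
from contextlib import closing
from typing import Any, Dict, List


def fetch_dicts(conn: Any, sql: str) -> List[Dict[str, Any]]:
    """Run one query on a DB-API connection and return rows as dicts.

    Hypothetical helper: relies only on the generic DB-API cursor interface.
    """
    with closing(conn.cursor()) as curs:
        curs.execute(sql)
        # Each entry of cursor.description describes one column; item 0 is its name.
        columns = [col[0] for col in curs.description]
        return [dict(zip(columns, row)) for row in curs.fetchall()]


if __name__ == "__main__":
    # Stand-in connection (sqlite3), used only so the sketch runs locally.
    demo = sqlite3.connect(":memory:")
    demo.execute("CREATE TABLE places (place TEXT)")
    demo.execute("INSERT INTO places VALUES ('37.77,-122.41')")
    print(fetch_dicts(demo, "SELECT place FROM places LIMIT 10"))
```

With a real broker, the same helper would presumably be called with the connection returned by `connect(...)` in place of the sqlite3 stand-in.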
For HTTPS:
from pinotdb import connect
# this assumes that 443 is the broker secure https port
conn = connect(host='localhost', port=443, path='/query/sql', scheme='https')
curs = conn.cursor()
curs.execute("""
SELECT place,
CAST(REGEXP_EXTRACT(place, '(.*),', 1) AS FLOAT) AS lat,
CAST(REGEXP_EXTRACT(place, ',(.*)', 1) AS FLOAT) AS lon
FROM places
LIMIT 10
""")
for row in curs:
    print(row)
Pinot also supports basic auth, e.g.
conn = connect(host="localhost", port=443, path="/query/sql", scheme="https", username="my-user", password="my-password", verify_ssl=True)
To pass in additional query parameters (such as useMultistageEngine=true) you may pass
them in as part of the execute method. For example:
curs.execute("select * from airlineStats air limit 10", queryOptions="useMultistageEngine=true")
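When several options are needed, it can be convenient to assemble the option string from a dictionary. The sketch below assumes the semicolon-separated `key=value` form that this README uses for engine-level `query_options`; `format_query_options` is a hypothetical helper, not a pinotdb function.

```python
from typing import Mapping, Union

OptionValue = Union[str, int, float, bool]


def format_query_options(options: Mapping[str, OptionValue]) -> str:
    """Join options as 'key=value;key=value' (hypothetical helper)."""
    parts = []
    for key, value in options.items():
        # Render Python booleans in lowercase, as in "useMultistageEngine=true".
        text = str(value).lower() if isinstance(value, bool) else str(value)
        parts.append(f"{key}={text}")
    return ";".join(parts)


print(format_query_options({"useMultistageEngine": True, "timeoutMs": 10000}))
# prints: useMultistageEngine=true;timeoutMs=10000
```

The result could then be passed as `queryOptions=format_query_options({...})` in the `execute` call shown above.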
Broker query stats are exposed after execute() on cursor.query_stats:
curs.execute("select * from airlineStats air limit 10")
print(curs.query_stats.get("numServersQueried"))
print(curs.query_stats.get("numDocsScanned"))
print(curs.timeUsedMs) # Backward compatible shorthand
cursor.query_stats contains scalar top-level metrics returned by the broker
for the latest execute() call (works for both sync and async cursors).
Common keys include:
- numServersQueried
- numServersResponded
- numSegmentsQueried
- numSegmentsProcessed
- numSegmentsMatched
- numConsumingSegmentsQueried
- numDocsScanned
- numEntriesScannedInFilter
- numEntriesScannedPostFilter
- numGroupsLimitReached
- totalDocs
- timeUsedMs
- minConsumingFreshnessTimeMs
- numSegmentsPrunedByBroker
If you need the full broker payload (including nested sections such as
resultTable, exceptions, and tracing information), use
cursor.raw_query_response.
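As an illustration of how these metrics might be consumed, the sketch below condenses a stats mapping into a few derived values. It works on any plain dictionary with the keys listed above; `summarize_query_stats` and the sample numbers are hypothetical, and the sketch assumes only that `cursor.query_stats` behaves like a mapping, as the `.get(...)` calls above suggest.

```python
from typing import Any, Dict, Mapping


def summarize_query_stats(stats: Mapping[str, Any]) -> Dict[str, Any]:
    """Derive a compact summary from broker stats (hypothetical helper)."""
    segments_queried = stats.get("numSegmentsQueried")
    segments_matched = stats.get("numSegmentsMatched")
    servers_queried = stats.get("numServersQueried")
    servers_responded = stats.get("numServersResponded")

    # Fraction of queried segments that matched; None if data is missing or zero.
    match_ratio = None
    if segments_queried and segments_matched is not None:
        match_ratio = segments_matched / segments_queried

    # True when every queried server answered; None if either count is missing.
    all_responded = None
    if servers_queried is not None and servers_responded is not None:
        all_responded = servers_responded >= servers_queried

    return {
        "time_used_ms": stats.get("timeUsedMs"),
        "docs_scanned": stats.get("numDocsScanned"),
        "segment_match_ratio": match_ratio,
        "all_servers_responded": all_responded,
    }


# Made-up sample values, for illustration only.
sample = {"numSegmentsQueried": 10, "numSegmentsMatched": 4,
          "numServersQueried": 2, "numServersResponded": 2,
          "numDocsScanned": 1000, "timeUsedMs": 12}
print(summarize_query_stats(sample))
```

Missing keys yield `None` rather than an error, since the text above only describes these keys as common, not guaranteed.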
Pass the Pinot database context
[!IMPORTANT] This feature is only available from 5.1.5
from pinotdb import connect
# this assumes 8000 is the broker port
conn = connect(host='localhost', port=8000, path='/query/sql', scheme='http', database='dbName')
curs = conn.cursor()
curs.execute("""
SELECT col1 from table1 LIMIT 10
""")
for row in curs:
    print(row)
where:
- dbName: the database context that needs to be passed
- table1: a table under the dbName database
If database is not specified the connection will use the default database context.
Using SQLAlchemy:
Because the db engine needs more information than the Pinot Broker provides, you also need to supply the Pinot Controller, which serves table and schema information.
The db engine connection string is formatted as:
pinot+<pinot-broker-protocol>://<pinot-broker-host>:<pinot-broker-port><pinot-broker-path>?controller=<pinot-controller-protocol>://<pinot-controller-host>:<pinot-controller-port>/
The default scheme is HTTP, so you can omit it. E.g. pinot+http://localhost:8099/query/sql?controller=http://localhost:9000/ and pinot://localhost:8099/query/sql?controller=localhost:9000/ work the same way.
For HTTPS, you have to specify the https scheme explicitly along with the port.
pinot+https://<pinot-broker-host>:<pinot-broker-port><pinot-broker-path>?controller=https://<pinot-controller-host>:<pinot-controller-port>/
E.g. pinot+https://pinot-broker.pinot.live:443/query/sql?controller=https://pinot-controller.pinot.live/.
Note that the broker port (443 here) must be given explicitly.
This connection string can be used for the Superset to Pinot connection:
<img title="Superset Pinot Connection" src="assets/images/screenshots/superset-connection.png"/>
If you have basic auth:
pinot+https://<my-user>:<my-password>@<pinot-broker-host>:<pinot-broker-port><pinot-broker-path>?controller=https://<pinot-controller-host>:<pinot-controller-port>/[&verify_ssl=<true/false>]
E.g.
pinot+https://my-user:my-password@my-secure-pinot-broker:443/query/sql?controller=https://my-secure-pinot-controller/&verify_ssl=true
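The connection string formats above can be assembled programmatically, which helps when credentials contain characters such as `@` or `:` that would otherwise break the URL. The following is a minimal sketch under stated assumptions: `build_pinot_url` is a hypothetical helper (not part of pinotdb), the controller URL is appended unencoded as in the examples above, and credentials are percent-encoded on the assumption that SQLAlchemy decodes them when parsing the URL.

```python
from typing import Optional
from urllib.parse import quote


def build_pinot_url(
    broker_host: str,
    broker_port: int,
    controller: str,
    scheme: str = "http",
    path: str = "/query/sql",
    username: Optional[str] = None,
    password: Optional[str] = None,
    verify_ssl: Optional[bool] = None,
) -> str:
    """Assemble a pinot+<scheme>:// connection string (hypothetical helper)."""
    auth = ""
    if username is not None:
        # Percent-encode credentials so "@", ":" and "/" cannot split the URL.
        auth = quote(username, safe="")
        if password is not None:
            auth += ":" + quote(password, safe="")
        auth += "@"
    url = f"pinot+{scheme}://{auth}{broker_host}:{broker_port}{path}?controller={controller}"
    if verify_ssl is not None:
        # Appended with "&", the standard query-string separator.
        url += "&verify_ssl=" + ("true" if verify_ssl else "false")
    return url


print(build_pinot_url("localhost", 8099, "http://localhost:9000/"))
# prints: pinot+http://localhost:8099/query/sql?controller=http://localhost:9000/
```

The returned string is meant to be passed to `create_engine(...)`. The explicit `pinot+http` prefix is used even for plain HTTP, which the text above describes as equivalent to the bare `pinot://` form.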
Below are some sample scripts to query pinot using SQLAlchemy 2.x:
from sqlalchemy import MetaData, Table, create_engine, func, select
engine = create_engine('pinot://localhost:8099/query/sql?controller=http://localhost:9000/') # uses HTTP by default :(
# or, using explicit HTTP:
# engine = create_engine('pinot+http://localhost:8099/query/sql?controller=http://localhost:9000/')
# or, using explicit HTTPS:
# engine = create_engine('pinot+https://localhost:8099/query/sql?controller=https://localhost:9000/')
# or, provide extra argument to connect with multi-stage engine enabled:
# engine = create_engine(
# "pinot://localhost:8000/query/sql?controller=http://localhost:9000/",
# connect_args={"use_multistage_engine": "true"}
# )
metadata = MetaData()
places = Table('places', metadata, autoload_with=engine)
query = select(func.count()).select_from(places)
with engine.connect() as connection:
    print(connection.execute(query).scalar())
To configure query parameters (such as timeoutMs=10000) at the engine level
you may pass them while creating the engine. For example:
engine = create_engine(
    "pinot://localhost:8000/query/sql?controller=http://localhost:9000/",
    connect_args={"query_options": "useMultistageEngine=true;timeoutMs=10000"},
)
To enable the multi-stage engine, pass the use_multistage_engine parameter in the connect_args dictionary.
E.g. in the Superset Engine Parameters, you can put the following JSON:
{"connect_args":{"use_multistage_engine":"true"}}
Pass the Pinot database context
[!IMPORTANT] This feature is only available from 5.1.5
Each connection should query only one Pinot database, so the database context is provided through the connection string itself.
The db engine connection string is formatted as:
pinot+http://pinot-broker:8099/query/sql?controller=http://pinot-controller:9000/&database=dbName
where dbName is the database context that needs to be passed.
If not specified the connection will use the default database context while querying.
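If connection strings are built in code, the database context can be appended to an existing string. The sketch below is one possible approach: `with_database` is a hypothetical helper that adds the `database` parameter in the form shown above and refuses to add it twice, since each connection is meant to target a single database.

```python
from urllib.parse import parse_qsl, urlsplit


def with_database(connection_string: str, database: str) -> str:
    """Append '&database=<name>' to a Pinot connection string (hypothetical helper)."""
    query = urlsplit(connection_string).query
    existing = dict(parse_qsl(query))
    if "database" in existing:
        raise ValueError("connection string already sets a database context")
    # Use "?" if the string has no query part yet, otherwise "&".
    separator = "&" if query else "?"
    return f"{connection_string}{separator}database={database}"


base = "pinot+http://pinot-broker:8099/query/sql?controller=http://pinot-controller:9000/"
print(with_database(base, "dbName"))
# prints: pinot+http://pinot-broker:8099/query/sql?controller=http://pinot-controller:9000/&database=dbName
```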
Examples with Pinot Quickstart
Start Pinot Batch Quickstart
docker run --name pinot-quickstart -p 2123:2123 -p 9000:9000 -p 8000:8000 -d apachepinot/pinot:latest QuickStart -type batch
Once the Pinot batch quickstart is up, you can run the sample script below to query Pinot:
python3 examples/pinot_quickstart_batch.py
Sample Output:
Sending SQL to Pinot: SELECT * FROM baseballStats LIMIT 5
[0, 11, 0, 0, 0, 0, 0, 0, 0, 0, 'NL', 11, 11, 'aardsda01', 'David Allan', 1, 0, 0, 0, 0, 0, 0, 'SFN', 0, 2004]
[2, 45, 0, 0, 0, 0, 0, 0, 0, 0, 'NL', 45, 43, 'aardsda01', 'David Allan', 1, 0, 0, 0, 1, 0, 0, 'CHN', 0, 2006]
[0, 2, 0, 0, 0, 0, 0, 0, 0, 0, 'AL', 25, 2, 'aardsda01', 'David Allan', 1, 0, 0, 0, 0, 0, 0, 'CHA', 0, 2007]
[1, 5, 0, 0, 0, 0, 0, 0, 0, 0, 'AL', 47, 5, 'aardsda01', 'David Allan', 1, 0, 0, 0, 0, 0, 1, 'BOS', 0, 2008]
[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 'AL', 73, 3, 'aardsda01', 'David Allan', 1, 0, 0, 0, 0, 0, 0, 'SEA', 0, 2009]
Sending SQL to Pinot: SELECT playerName, sum(runs) FROM baseballStats WHERE yearID>=2000 GROUP BY playerName LIMIT 5
['Scott Michael', 26.0]
['Justin Morgan', 0.0]
['Jason Andre', 0.0]
['Jeffrey Ellis', 0.0]
['Maximiliano R.', 16.0]
Sending SQL to Pinot: SELECT playerName,sum(runs) AS sum_runs FROM baseballStats WHERE yearID>=2000 GROUP BY playerName ORDER BY sum_runs DESC LIMIT 5
['Adrian', 1820.0]
['Jose Antonio', 1692.0]
['Rafael', 1565.0]
['Brian Michael', 1500.0]
['Alexander Emmanuel', 1426.0]
Start Pinot Hybrid Quickstart
docker run --name pinot-quickstart -p 2123:2123 -p 9000:9000 -p 8000:8000 -d apachepinot/pinot:latest QuickStart -type hybrid
Below is an example against Pinot Quickstart Hybrid:
python3 examples/pinot_quickstart_hybrid.py
Sending SQL to Pinot: SELECT * FROM airlineStats LIMIT 5
[171, 153, 19393, 0, 8, 8, 1433, '1400-1459', 0, 1425, 1240, 165, 'null', 0, 'WN', -2147483648, 1, 27, 17540, 0, 2, 2, 1242, '1200-1259', 0, 'MDW', 13232, 1323202, 30977, 'Chicago, IL', 'IL', 17, 'Illinois', 41, 861, 4, -2147483648, [-2147483648], 0, [-2147483648], ['null'], -2147483648, -2147483648, [-2147483648], -2147483648, ['null'], [-2147483648], [-2147483648], [-2147483648], 0, -2147483648, '2014-01-27', 402, 1, -2147483648, -2147483648, 1, -2147483648, 'BOS', 10721, 1072102, 30721, 'Boston, MA', 'MA', 25, 'Massachusetts', 13, 1, ['null'], -2147483648, 'N556WN', 6, 12, -2147483648, 'WN', -2147483648, 1254, 1427, 2014]
[183, 141, 20398, 1, 17, 17, 1302, '1200-1259', 1, 1245, 1005, 160, 'null', 0, 'MQ', 0, 1, 27, 17540, 0, -6, 0, 959, '1000-1059', -1, 'CMH', 11066, 1106603, 31066, 'Columbus, OH', 'OH', 39, 'Ohio', 44, 990, 4, -2147483648, [-2147483648], 0, [-2147483648], ['null'], -2147483648, -2147483648, [-2147483648], -2147483648, ['null'], [-2147483648], [-2147483648], [-2147483648], 0, -2147483648, '2014-01-27', 3574, 1, 0, -2147483648, 1, 17, 'MIA', 13303, 1330303, 32467, 'Miami, FL', 'FL', 12, 'Florida', 33, 1, ['null'], 0, 'N605MQ', 13, 29, -2147483648, 'MQ', 0, 1028, 1249, 2014]
[-2147483648, -2147483648,