pinotdb

Python DB-API and SQLAlchemy dialect for Pinot.

This module allows accessing Pinot via its SQL API.

Currently supported Pinot version: 1.1.0.

Usage

Using the DB API to query Pinot Broker directly:

from pinotdb import connect

# this assumes 8000 is the broker port
conn = connect(host='localhost', port=8000, path='/query/sql', scheme='http')
curs = conn.cursor()
curs.execute("""
    SELECT place,
           CAST(REGEXP_EXTRACT(place, '(.*),', 1) AS FLOAT) AS lat,
           CAST(REGEXP_EXTRACT(place, ',(.*)', 1) AS FLOAT) AS lon
      FROM places
     LIMIT 10
""")
for row in curs:
    print(row)

For HTTPS:

from pinotdb import connect

# this assumes that 443 is the broker secure https port
conn = connect(host='localhost', port=443, path='/query/sql', scheme='https')
curs = conn.cursor()
curs.execute("""
    SELECT place,
           CAST(REGEXP_EXTRACT(place, '(.*),', 1) AS FLOAT) AS lat,
           CAST(REGEXP_EXTRACT(place, ',(.*)', 1) AS FLOAT) AS lon
      FROM places
     LIMIT 10
""")
for row in curs:
    print(row)

Pinot also supports basic auth, e.g.

conn = connect(host="localhost", port=443, path="/query/sql", scheme="https", username="my-user", password="my-password", verify_ssl=True)

To pass additional query parameters (such as useMultistageEngine=true), supply them to the execute method through the queryOptions argument. For example:

curs.execute("select * from airlineStats air limit 10", queryOptions="useMultistageEngine=true")

Broker query stats are exposed after execute() on cursor.query_stats:

curs.execute("select * from airlineStats air limit 10")
print(curs.query_stats.get("numServersQueried"))
print(curs.query_stats.get("numDocsScanned"))
print(curs.timeUsedMs)  # Backward compatible shorthand

cursor.query_stats contains scalar top-level metrics returned by the broker for the latest execute() call (works for both sync and async cursors). Common keys include:

  • numServersQueried
  • numServersResponded
  • numSegmentsQueried
  • numSegmentsProcessed
  • numSegmentsMatched
  • numConsumingSegmentsQueried
  • numDocsScanned
  • numEntriesScannedInFilter
  • numEntriesScannedPostFilter
  • numGroupsLimitReached
  • totalDocs
  • timeUsedMs
  • minConsumingFreshnessTimeMs
  • numSegmentsPrunedByBroker

If you need the full broker payload (including nested sections such as resultTable, exceptions, and tracing information), use cursor.raw_query_response.
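As a small illustration of how these metrics might be used, the sketch below derives a few indicators from a stats mapping. The function name `summarize_query_stats` and the sample values are hypothetical; only the key names come from the list above. It operates on a plain dict, so with a live cursor you could pass `curs.query_stats` instead.

```python
def summarize_query_stats(stats):
    """Return a few derived indicators from a broker stats mapping.

    `stats` is any dict-like object that uses the key names listed above.
    Missing keys are treated as 0 (or False for the limit flag).
    """
    queried = stats.get("numServersQueried", 0)
    responded = stats.get("numServersResponded", 0)
    total_docs = stats.get("totalDocs", 0)
    scanned = stats.get("numDocsScanned", 0)
    return {
        # Fewer responses than servers queried suggests a partial result.
        "all_servers_responded": responded >= queried,
        # numDocsScanned divided by totalDocs (0.0 when totalDocs is 0).
        "scan_ratio": scanned / total_docs if total_docs else 0.0,
        # Truthy when the broker reports that the group limit was reached.
        "groups_limit_reached": bool(stats.get("numGroupsLimitReached", False)),
    }


# Hypothetical values, for illustration only.
sample_stats = {
    "numServersQueried": 2,
    "numServersResponded": 2,
    "numDocsScanned": 250,
    "totalDocs": 1000,
    "numGroupsLimitReached": False,
}
summary = summarize_query_stats(sample_stats)
print(summary)
```

A check like this can be logged after each query to spot partial responses or unexpectedly large scans early.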

Pass the Pinot database context

[!IMPORTANT] This feature is only available from 5.1.5

from pinotdb import connect

# this assumes 8000 is the broker port
conn = connect(host='localhost', port=8000, path='/query/sql', scheme='http', database='dbName')
curs = conn.cursor()
curs.execute("""
    SELECT col1 from table1 LIMIT 10
""")
for row in curs:
    print(row)

where,

  • dbName : the database context that needs to be passed
  • table1 : table under the dbName database

If database is not specified the connection will use the default database context.

Using SQLAlchemy:

The SQLAlchemy engine needs more than the Pinot Broker: you must also provide the Pinot Controller, which supplies table and schema information.

The engine connection string is formatted as:

pinot+<pinot-broker-protocol>://<pinot-broker-host>:<pinot-broker-port><pinot-broker-path>?controller=<pinot-controller-protocol>://<pinot-controller-host>:<pinot-controller-port>/

The default scheme is HTTP, so it can be omitted. For example, pinot+http://localhost:8099/query/sql?controller=http://localhost:9000/ and pinot://localhost:8099/query/sql?controller=localhost:9000/ behave the same way.

For HTTPS, you have to specify the https scheme explicitly along with the port.

pinot+https://<pinot-broker-host>:<pinot-broker-port><pinot-broker-path>?controller=https://<pinot-controller-host>:<pinot-controller-port>/

E.g. pinot+https://pinot-broker.pinot.live:443/query/sql?controller=https://pinot-controller.pinot.live/.

Note that the broker port 443 must be specified explicitly.

This connection string can be used to connect Superset to Pinot:

[Screenshot: Superset Pinot Connection (assets/images/screenshots/superset-connection.png)]

If you have basic auth:

pinot+https://<my-user>:<my-password>@<pinot-broker-host>:<pinot-broker-port><pinot-broker-path>?controller=https://<pinot-controller-host>:<pinot-controller-port>/[&&verify_ssl=<true/false>]

E.g. pinot+https://my-user:my-password@my-secure-pinot-broker:443/query/sql?controller=https://my-secure-pinot-controller/&&verify_ssl=true.
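The connection string formats above can be assembled programmatically. The helper below is a minimal sketch, not part of pinotdb: the name `build_pinot_url` and its parameters are hypothetical. It percent-encodes the credentials on the assumption that the URL parser decodes them again, which is how SQLAlchemy URLs generally treat special characters in passwords.

```python
from urllib.parse import quote


def build_pinot_url(broker_host, broker_port, controller_url,
                    broker_path="/query/sql", scheme="http",
                    username=None, password=None):
    """Assemble a pinot+<scheme>:// connection string from its parts."""
    credentials = ""
    if username is not None:
        # Percent-encode credentials so characters such as '@' or ':'
        # cannot be mistaken for URL delimiters.
        credentials = quote(username, safe="")
        if password is not None:
            credentials += ":" + quote(password, safe="")
        credentials += "@"
    return (f"pinot+{scheme}://{credentials}{broker_host}:{broker_port}"
            f"{broker_path}?controller={controller_url}")


plain_url = build_pinot_url("localhost", 8099, "http://localhost:9000/")
secure_url = build_pinot_url(
    "my-secure-pinot-broker", 443, "https://my-secure-pinot-controller/",
    scheme="https", username="my-user", password="my-password",
)
print(plain_url)
print(secure_url)
```

The resulting strings can be passed to SQLAlchemy's `create_engine` in the same way as the hand-written examples.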

Below is a sample script that queries Pinot using SQLAlchemy 2.x:

from sqlalchemy import MetaData, Table, create_engine, func, select

engine = create_engine('pinot://localhost:8099/query/sql?controller=http://localhost:9000/')  # uses HTTP by default :(
# or, using explicit HTTP:
# engine = create_engine('pinot+http://localhost:8099/query/sql?controller=http://localhost:9000/')
# or, using explicit HTTPS:
# engine = create_engine('pinot+https://localhost:8099/query/sql?controller=https://localhost:9000/')
# or, provide extra argument to connect with multi-stage engine enabled:
# engine = create_engine(
#     "pinot://localhost:8000/query/sql?controller=http://localhost:9000/",
#     connect_args={"use_multistage_engine": "true"}
# )

metadata = MetaData()
places = Table('places', metadata, autoload_with=engine)
query = select(func.count()).select_from(places)
with engine.connect() as connection:
    print(connection.execute(query).scalar())

To configure query parameters (such as timeoutMs=10000) at the engine level you may pass them while creating the engine. For example:

engine = create_engine(
    "pinot://localhost:8000/query/sql?controller=http://localhost:9000/",
    # query_options takes Pinot query option names, separated by semicolons
    connect_args={"query_options": "useMultistageEngine=true;timeoutMs=10000"},
)
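The query_options value is a single string of key=value pairs separated by semicolons. If the options live in a dictionary, a small helper can build that string. This is a minimal sketch; `build_query_options` is a hypothetical name and not part of pinotdb.

```python
def build_query_options(options):
    """Join option names and values as 'key=value' pairs separated by ';'."""
    parts = []
    for key, value in options.items():
        # Render Python booleans in lowercase, as in the examples above.
        if isinstance(value, bool):
            value = "true" if value else "false"
        parts.append(f"{key}={value}")
    return ";".join(parts)


query_options = build_query_options(
    {"useMultistageEngine": True, "timeoutMs": 10000}
)
print(query_options)  # useMultistageEngine=true;timeoutMs=10000
```

The result can then be supplied as `connect_args={"query_options": query_options}` when creating the engine.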

To enable the multi-stage engine, pass the use_multistage_engine parameter in the connect_args dictionary.

For example, in Superset's Engine Parameters, you can enter the following JSON:

{"connect_args":{"use_multistage_engine":"true"}}

Pass the Pinot database context

[!IMPORTANT] This feature is only available from 5.1.5

Each connection should query only one Pinot database, so the database context is provided through the connection string itself.

The engine connection string is formatted as:

pinot+http://pinot-broker:8099/query/sql?controller=http://pinot-controller:9000/&database=dbName

where dbName is the database context that needs to be passed. If not specified the connection will use the default database context while querying.
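To make the structure of this connection string concrete, the sketch below splits it into its parts using only the Python standard library. It illustrates the URL layout only and is not a description of how the dialect parses the string internally.

```python
from urllib.parse import parse_qs, urlsplit

url = (
    "pinot+http://pinot-broker:8099/query/sql"
    "?controller=http://pinot-controller:9000/&database=dbName"
)

parts = urlsplit(url)
params = parse_qs(parts.query)

# Broker side: scheme, host, port and path come before the '?'.
print(parts.scheme, parts.hostname, parts.port, parts.path)

# Controller URL and database context travel as query parameters.
controller = params["controller"][0]
# Fall back to None when no database parameter is present.
database = params.get("database", [None])[0]
print(controller, database)
```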

Examples with Pinot Quickstart

Start Pinot Batch Quickstart

docker run --name pinot-quickstart -p 2123:2123 -p 9000:9000 -p 8000:8000 -d apachepinot/pinot:latest QuickStart -type batch

Once the Pinot batch quickstart is up, you can run the sample script below to query Pinot:

python3 examples/pinot_quickstart_batch.py

Sample Output:

Sending SQL to Pinot: SELECT * FROM baseballStats LIMIT 5
[0, 11, 0, 0, 0, 0, 0, 0, 0, 0, 'NL', 11, 11, 'aardsda01', 'David Allan', 1, 0, 0, 0, 0, 0, 0, 'SFN', 0, 2004]
[2, 45, 0, 0, 0, 0, 0, 0, 0, 0, 'NL', 45, 43, 'aardsda01', 'David Allan', 1, 0, 0, 0, 1, 0, 0, 'CHN', 0, 2006]
[0, 2, 0, 0, 0, 0, 0, 0, 0, 0, 'AL', 25, 2, 'aardsda01', 'David Allan', 1, 0, 0, 0, 0, 0, 0, 'CHA', 0, 2007]
[1, 5, 0, 0, 0, 0, 0, 0, 0, 0, 'AL', 47, 5, 'aardsda01', 'David Allan', 1, 0, 0, 0, 0, 0, 1, 'BOS', 0, 2008]
[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 'AL', 73, 3, 'aardsda01', 'David Allan', 1, 0, 0, 0, 0, 0, 0, 'SEA', 0, 2009]

Sending SQL to Pinot: SELECT playerName, sum(runs) FROM baseballStats WHERE yearID>=2000 GROUP BY playerName LIMIT 5
['Scott Michael', 26.0]
['Justin Morgan', 0.0]
['Jason Andre', 0.0]
['Jeffrey Ellis', 0.0]
['Maximiliano R.', 16.0]

Sending SQL to Pinot: SELECT playerName,sum(runs) AS sum_runs FROM baseballStats WHERE yearID>=2000 GROUP BY playerName ORDER BY sum_runs DESC LIMIT 5
['Adrian', 1820.0]
['Jose Antonio', 1692.0]
['Rafael', 1565.0]
['Brian Michael', 1500.0]
['Alexander Emmanuel', 1426.0]

Start Pinot Hybrid Quickstart

docker run --name pinot-quickstart -p 2123:2123 -p 9000:9000 -p 8000:8000 -d apachepinot/pinot:latest QuickStart -type hybrid

Below is an example against Pinot Quickstart Hybrid:

python3 examples/pinot_quickstart_hybrid.py
Sending SQL to Pinot: SELECT * FROM airlineStats LIMIT 5
[171, 153, 19393, 0, 8, 8, 1433, '1400-1459', 0, 1425, 1240, 165, 'null', 0, 'WN', -2147483648, 1, 27, 17540, 0, 2, 2, 1242, '1200-1259', 0, 'MDW', 13232, 1323202, 30977, 'Chicago, IL', 'IL', 17, 'Illinois', 41, 861, 4, -2147483648, [-2147483648], 0, [-2147483648], ['null'], -2147483648, -2147483648, [-2147483648], -2147483648, ['null'], [-2147483648], [-2147483648], [-2147483648], 0, -2147483648, '2014-01-27', 402, 1, -2147483648, -2147483648, 1, -2147483648, 'BOS', 10721, 1072102, 30721, 'Boston, MA', 'MA', 25, 'Massachusetts', 13, 1, ['null'], -2147483648, 'N556WN', 6, 12, -2147483648, 'WN', -2147483648, 1254, 1427, 2014]
[183, 141, 20398, 1, 17, 17, 1302, '1200-1259', 1, 1245, 1005, 160, 'null', 0, 'MQ', 0, 1, 27, 17540, 0, -6, 0, 959, '1000-1059', -1, 'CMH', 11066, 1106603, 31066, 'Columbus, OH', 'OH', 39, 'Ohio', 44, 990, 4, -2147483648, [-2147483648], 0, [-2147483648], ['null'], -2147483648, -2147483648, [-2147483648], -2147483648, ['null'], [-2147483648], [-2147483648], [-2147483648], 0, -2147483648, '2014-01-27', 3574, 1, 0, -2147483648, 1, 17, 'MIA', 13303, 1330303, 32467, 'Miami, FL', 'FL', 12, 'Florida', 33, 1, ['null'], 0, 'N605MQ', 13, 29, -2147483648, 'MQ', 0, 1028, 1249, 2014]
[-2147483648, -2147483648,