redshift-connector

Name: redshift-connector
Author: Amazon Web Services
Redshift interface library
Downloads: 0 (30 days)
Description

=======================================================
redshift_connector
=======================================================

|Python Version| |PyPi|

.. |PyPi| image:: https://img.shields.io/pypi/v/redshift_connector.svg?maxAge=432000&style=flat-square
   :target: https://pypi.org/project/redshift_connector/

.. |Python Version| image:: https://img.shields.io/badge/python->=3.6-brightgreen.svg
   :target: https://pypi.org/project/redshift_connector/

``redshift_connector`` is the Amazon Redshift connector for
Python. Easy integration with `pandas <https://github.com/pandas-dev/pandas>`_ and `numpy <https://github.com/numpy/numpy>`_, as well as support for numerous Amazon Redshift specific features help you get the most out of your data

Supported Amazon Redshift features include:

- IAM authentication
- Identity provider (IdP) authentication
- Redshift specific data types


This pure Python connector implements `Python Database API Specification 2.0 <https://www.python.org/dev/peps/pep-0249/>`_.


Getting Started
---------------

Install from Binary
~~~~~~~~~~~~~~~~~~~

+----------------------------------------------------------------+--------------------+-----------------------------------------------------+
| Package Manager                                                | Downloads          | Installation Command                                |
+================================================================+====================+=====================================================+
| `PyPi <https://pypi.org/project/redshift-connector/>`_         |  |PyPi Downloads|  | ``pip install redshift_connector``                  |
+----------------------------------------------------------------+--------------------+-----------------------------------------------------+
| `Conda <https://anaconda.org/conda-forge/redshift_connector>`_ |  |Conda Downloads| | ``conda install -c conda-forge redshift_connector`` |
+----------------------------------------------------------------+--------------------+-----------------------------------------------------+

.. |PyPi Downloads| image:: https://pepy.tech/badge/redshift_connector
.. |Conda Downloads| image:: https://img.shields.io/conda/dn/conda-forge/redshift_connector.svg


Install from Source
~~~~~~~~~~~~~~~~~~~
You may install from source by cloning this repository.

.. code-block:: sh

    $ git clone https://github.com/aws/amazon-redshift-python-driver.git
    $ cd redshift_connector
    $ pip install .

Tutorials
~~~~~~~~~
- `001 - Connecting to Amazon Redshift <https://github.com/aws/amazon-redshift-python-driver/blob/master/tutorials/001%20-%20Connecting%20to%20Amazon%20Redshift.ipynb>`_
- `002 - Data Science Library Integrations <https://github.com/aws/amazon-redshift-python-driver/blob/master/tutorials/002%20-%20Data%20Science%20Library%20Integrations.ipynb>`_
- `003 - Amazon Redshift Feature Support <https://github.com/aws/amazon-redshift-python-driver/blob/master/tutorials/003%20-%20Amazon%20Redshift%20Feature%20Support.ipynb>`_
- `004 - Amazon Redshift Datatypes <https://github.com/aws/amazon-redshift-python-driver/blob/master/tutorials/004%20-%20Amazon%20Redshift%20Datatypes.ipynb>`_

We are working to add more documentation and would love your feedback. Please reach out to the team by `opening an issue <https://github.com/aws/amazon-redshift-python-driver/issues/new/choose>`__ or `starting a discussion <https://github.com/aws/amazon-redshift-python-driver/discussions/new>`_ to help us fill in the gaps in our documentation.

Integrations
~~~~~~~~~~~~
``redshift_connector`` integrates with various open source projects to provide an interface to Amazon Redshift. Please `open an issue <https://github.com/aws/amazon-redshift-python-driver/issues/new/choose>`__ with our project to request new integrations or get support for a ``redshift_connector`` issue seen in an existing integration.

- `apache-airflow <https://github.com/apache/airflow>`_
- `querybook <https://github.com/pinterest/querybook>`_
- `sqlalchemy-redshift <https://github.com/sqlalchemy-redshift/sqlalchemy-redshift>`_

Basic Example
~~~~~~~~~~~~~
.. code-block:: python

    import redshift_connector

    # Connects to Redshift cluster using AWS credentials
    conn = redshift_connector.connect(
        host='examplecluster.abc123xyz789.us-west-1.redshift.amazonaws.com',
        database='dev',
        user='awsuser',
        password='my_password'
     )

    cursor: redshift_connector.Cursor = conn.cursor()
    cursor.execute("create Temp table book(bookname varchar,author varchar)")
    cursor.executemany("insert into book (bookname, author) values (%s, %s)",
                        [
                            ('One Hundred Years of Solitude', 'Gabriel García Márquez'),
                            ('A Brief History of Time', 'Stephen Hawking')
                        ]
                      )
    cursor.execute("select * from book")

    result: tuple = cursor.fetchall()
    print(result)
    >> (['One Hundred Years of Solitude', 'Gabriel García Márquez'], ['A Brief History of Time', 'Stephen Hawking'])

Enabling autocommit
~~~~~~~~~~~~~~~~~~~
**Following the DB-API specification, autocommit is off by default**. It can be turned on by using the autocommit property of the connection.

.. code-block:: py3

    # Make sure we're not in a transaction
    conn.rollback()
    conn.autocommit = True
    conn.run("VACUUM")
    conn.autocommit = False


Configuring paramstyle
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Paramstyle can be set on both a module and cursor level. When paramstyle is set on a module level e.g. ``redshift_connector.paramstyle = 'qmark'``, the user specified paramstyle is used for all subsequent cursors unless set on the cursor.
When paramstyle is set on the cursor e.g. ``cursor.paramstyle = 'qmark'`` the user specified paramstyle is only used for that cursor object.

.. code-block:: python

    # setting paramstyle to qmark on a module level
    redshift_connector.paramstyle = 'qmark'


    with redshift_connector.connect() as conn1:
        with conn1.cursor() as cursor1: # this cursor will use qmark paramstyle as it's been set on the module level
            pass

        with conn1.cursor() as cursor2:
            # setting paramstyle to numeric on the cursor level only this cursor will use numeric paramstyle
            cursor.paramstyle = 'numeric'

        with conn1.cursor() as cursor3: # this cursor will use qmark paramstyle as it's been set on the module level
            pass

     with redshift_connector.connect() as conn2:
        with conn2.cursor() as cursor1: # this cursor will use qmark paramstyle as it's been set on the module level
            pass


The module level default paramstyle used is ``format``. Valid values for ``paramstyle`` include ``qmark, numeric, named, format, pyformat``. The below example shows how to use various paramstyles after the paramstyle is set on the cursor.

When paramstyle is set to ``named`` or ``pyformat``, parameters must be passed as a Python dictionary to the ``execute()`` method. Other paramstyles require parameters to be passed as a Python tuple or list.

.. code-block:: python

    # qmark
    cursor.paramstyle = 'qmark'
    sql = 'insert into foo(bar, jar) VALUES(?, ?)'
    cursor.execute(sql, (1, "hello world"))

    # numeric
    cursor.paramstyle = 'numeric'
    sql = 'insert into foo(bar, jar) VALUES(:1, :2)'
    cursor.execute(sql, (1, "hello world"))

    # named
    cursor.paramstyle = 'named'
    sql = 'insert into foo(bar, jar) VALUES(:p1, :p2)'
    cursor.execute(sql, {"p1":1, "p2":"hello world"})

    # format
    cursor.paramstyle = 'format'
    sql = 'insert into foo(bar, jar) VALUES(%s, %s)'
    cursor.execute(sql, (1, "hello world"))

    # pyformat
    cursor.paramstyle = 'pyformat'
    sql = 'insert into foo(bar, jar) VALUES(%(bar)s, %(jar)s)'
    cursor.execute(sql, {"bar": 1, "jar": "hello world"})


Exception Handling
~~~~~~~~~~~~~~~~~~~
``redshift_connector`` uses the guideline for exception handling specified in the `Python DB-API <https://www.python.org/dev/peps/pep-0249/#exceptions>`_. For exception definitions, please see `redshift_connector/error.py <https://github.com/aws/amazon-redshift-python-driver/blob/master/redshift_connector/error.py>`_

Example using IAM Credentials
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
IAM Credentials can be supplied directly to ``connect(...)`` using an AWS profile as shown below:

.. code-block:: python

    import redshift_connector

    # Connects to Redshift cluster using IAM credentials from default profile defined in ~/.aws/credentials
    conn = redshift_connector.connect(
        iam=True,
        database='dev',
        db_user='awsuser',
        password='',
        user='',
        cluster_identifier='examplecluster',
        profile='default'
     )

.. code-block:: bash

    # ~/.aws/credentials
    [default]
    aws_access_key_id="my_aws_access_key_id"
    aws_secret_access_key="my_aws_secret_access_key"
    aws_session_token="my_aws_session_token"

    # ~/.aws/config
    [default]
    region=us-west-2

If a region is not provided in `~/.aws/config` or you would like to override its value, `region` may be passed to ``connect(...)``.

Alternatively, IAM credentials can be supplied directly to ``connect(...)`` using AWS credentials as shown below:

.. code-block:: python

    import redshift_connector

    # Connects to Redshift cluster using IAM credentials from default profile defined in ~/.aws/credentials
    conn = redshift_connector.connect(
        iam=True,
        database='dev',
        db_user='awsuser',
        password='',
        user='',
        cluster_identifier='examplecluster',
        access_key_id="my_aws_access_key_id",
        secret_access_key="my_aws_secret_access_key",
        session_token="my_aws_session_token",
        region="us-east-2"
     )

Integration with pandas
~~~~~~~~~~~~~~~~~~~~~~~

Retrieving query results as a ``pandas.DataFrame``