azure-eventhub
Microsoft Azure Event Hubs Client Library for Python
Azure Event Hubs client library for Python
Azure Event Hubs is a highly scalable publish-subscribe service that can ingest millions of events per second and stream them to multiple consumers. This lets you process and analyze the massive amounts of data produced by your connected devices and applications. Once Event Hubs has collected the data, you can retrieve, transform, and store it by using any real-time analytics provider or with batching/storage adapters. If you would like to know more about Azure Event Hubs, you may wish to review: What is Event Hubs?
The Azure Event Hubs client library allows for publishing and consuming of Azure Event Hubs events and may be used to:
- Emit telemetry about your application for business intelligence and diagnostic purposes.
- Publish facts about the state of your application which interested parties may observe and use as a trigger for taking action.
- Observe interesting operations and interactions happening within your business or other ecosystem, allowing loosely coupled systems to interact without the need to bind them together.
- Receive events from one or more publishers, transform them to better meet the needs of your ecosystem, then publish the transformed events to a new stream for consumers to observe.
Source code | Package (PyPi) | Package (Conda) | API reference documentation | Product documentation | Samples
Getting started
Prerequisites
- Python 3.9 or later.
- Microsoft Azure Subscription: To use Azure services, including Azure Event Hubs, you'll need a subscription. If you do not have an existing Azure account, you may sign up for a free trial or use your MSDN subscriber benefits when you create an account.
- Event Hubs namespace with an Event Hub: To interact with Azure Event Hubs, you'll also need to have a namespace and Event Hub available. If you are not familiar with creating Azure resources, you may wish to follow the step-by-step guide for creating an Event Hub using the Azure portal. There, you can also find detailed instructions for using the Azure CLI, Azure PowerShell, or Azure Resource Manager (ARM) templates to create an Event Hub.
Install the package
Install the Azure Event Hubs client library for Python with pip:
$ pip install azure-eventhub
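To confirm the environment after installing, a small check along these lines may help. It is a minimal sketch using only the standard library; the helper names installed_version and python_is_supported are illustrative and not part of the client library.

```python
import sys
from importlib import metadata


def installed_version(package_name):
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return metadata.version(package_name)
    except metadata.PackageNotFoundError:
        return None


def python_is_supported(version_info=sys.version_info, minimum=(3, 9)):
    """Check the interpreter against the Python 3.9 prerequisite listed above."""
    return tuple(version_info[:2]) >= minimum


def report():
    """Print a short summary of the interpreter and package status."""
    print("Python supported:", python_is_supported())
    print("azure-eventhub version:", installed_version("azure-eventhub"))
```

Calling report() prints whether the interpreter meets the prerequisite and which azure-eventhub version, if any, pip installed into the active environment.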
Authenticate the client
Interaction with Event Hubs starts with an instance of the EventHubConsumerClient or EventHubProducerClient class. To instantiate the client object, you need either a connection string, or the host name, a SAS/AAD credential, and the Event Hub name.
Create client from connection string:
For the Event Hubs client library to interact with an Event Hub, the easiest means is to use a connection string, which is created automatically when creating an Event Hubs namespace. If you aren't familiar with shared access policies in Azure, you may wish to follow the step-by-step guide to get an Event Hubs connection string.
- The from_connection_string method takes the connection string of the form Endpoint=sb://<yournamespace>.servicebus.windows.net/;SharedAccessKeyName=<yoursharedaccesskeyname>;SharedAccessKey=<yoursharedaccesskey> and the entity name of your Event Hub instance. You can get the connection string from the Azure portal.
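The connection string form described above can be sketched as follows. The helpers parse_connection_string and create_producer are hypothetical names introduced for illustration. The client library does its own parsing, so the pre-check here only serves to show the expected parts of the string. The azure-eventhub import is deferred so the parser can be tried without the package installed.

```python
REQUIRED_KEYS = {"Endpoint", "SharedAccessKeyName", "SharedAccessKey"}


def parse_connection_string(conn_str):
    """Split a connection string into its key/value parts (illustrative only)."""
    parts = {}
    for segment in conn_str.strip().rstrip(";").split(";"):
        # Split on the first "=" only: key values may themselves end in "=".
        key, sep, value = segment.partition("=")
        if not sep or not key or not value:
            raise ValueError(f"Malformed segment: {segment!r}")
        parts[key] = value
    missing = REQUIRED_KEYS - parts.keys()
    if missing:
        raise ValueError(f"Missing keys: {sorted(missing)}")
    return parts


def create_producer(conn_str, eventhub_name):
    """Build a producer client from a connection string and an Event Hub name."""
    # Imported lazily so parse_connection_string works without the SDK installed.
    from azure.eventhub import EventHubProducerClient

    parse_connection_string(conn_str)  # fail early on an obviously bad string
    return EventHubProducerClient.from_connection_string(
        conn_str, eventhub_name=eventhub_name
    )
```

In practice the connection string is usually read from configuration or an environment variable rather than written into source code.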
Create client using the azure-identity library:
Alternately, one can use a Credential object to authenticate via AAD with the azure-identity package.
- This constructor demonstrated in the sample linked above takes the host name and entity name of your Event Hub instance and credential that implements the TokenCredential protocol. There are implementations of the TokenCredential protocol available in the azure-identity package. The host name is of the format <yournamespace.servicebus.windows.net>.
- To use the credential types provided by azure-identity, please install the package: pip install azure-identity
- Additionally, to use the async API, you must first install an async transport, such as aiohttp: pip install aiohttp
- When using Azure Active Directory, your principal must be assigned a role which allows access to Event Hubs, such as the Azure Event Hubs Data Owner role. For more information about using Azure Active Directory authorization with Event Hubs, please refer to the associated documentation.
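A minimal sketch of the credential-based construction described above, assuming azure-identity is installed and the principal already has a suitable role. The helpers to_fully_qualified_namespace and create_consumer are hypothetical names, and the .servicebus.windows.net suffix is taken from the host name format given above, so other Azure environments may need a different suffix.

```python
HOST_SUFFIX = ".servicebus.windows.net"  # taken from the host name format above


def to_fully_qualified_namespace(namespace_name):
    """Turn a bare namespace name into the host name the client expects."""
    name = namespace_name.strip()
    return name if name.endswith(HOST_SUFFIX) else name + HOST_SUFFIX


def create_consumer(namespace_name, eventhub_name, consumer_group="$Default"):
    """Build a consumer client that authenticates with DefaultAzureCredential."""
    # Imported lazily so the helper above can be used without the SDK installed.
    from azure.eventhub import EventHubConsumerClient
    from azure.identity import DefaultAzureCredential

    return EventHubConsumerClient(
        fully_qualified_namespace=to_fully_qualified_namespace(namespace_name),
        eventhub_name=eventhub_name,
        consumer_group=consumer_group,
        credential=DefaultAzureCredential(),
    )
```

DefaultAzureCredential is one of several TokenCredential implementations in azure-identity; any other implementation could be passed in its place.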
Key concepts
- An EventHubProducerClient is a source of telemetry data, diagnostics information, usage logs, or other log data, as part of an embedded device solution, a mobile device application, a game title running on a console or other device, some client or server based business solution, or a web site.
- An EventHubConsumerClient picks up such information from the Event Hub and processes it. Processing may involve aggregation, complex computation, and filtering. Processing may also involve distribution or storage of the information in a raw or transformed fashion. Event Hub consumers are often robust and high-scale platform infrastructure parts with built-in analytics capabilities, like Azure Stream Analytics, Apache Spark, or Apache Storm.
- A partition is an ordered sequence of events that is held in an Event Hub. Azure Event Hubs provides message streaming through a partitioned consumer pattern in which each consumer only reads a specific subset, or partition, of the message stream. As newer events arrive, they are added to the end of this sequence. The number of partitions is specified at the time an Event Hub is created and cannot be changed.
- A consumer group is a view of an entire Event Hub. Consumer groups enable multiple consuming applications to each have a separate view of the event stream, and to read the stream independently at their own pace and from their own position. There can be at most 5 concurrent readers on a partition per consumer group; however it is recommended that there is only one active consumer for a given partition and consumer group pairing. Each active reader receives all of the events from its partition; if there are multiple readers on the same partition, then they will receive duplicate events.
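To make the relationship between partitions and consumer groups concrete, here is a toy in-memory model. It is not how the service is implemented: the class ToyEventHub, its methods, and the CRC32-based partition choice are all assumptions made purely for illustration. It shows two ideas from the list above: events in a partition form an append-only ordered sequence, and each consumer group tracks its own read position.

```python
import zlib
from collections import defaultdict


class ToyEventHub:
    """In-memory illustration of partitions and consumer groups (not the real service)."""

    def __init__(self, partition_count):
        # The partition count is fixed when the hub is created.
        self.partitions = [[] for _ in range(partition_count)]
        # Each consumer group keeps its own next-read position per partition.
        self.positions = defaultdict(lambda: [0] * partition_count)

    def publish(self, body, partition_key):
        """Append an event to the partition chosen from the key; return its index."""
        # Arbitrary stable hash for the sketch; the service uses its own scheme.
        index = zlib.crc32(partition_key.encode("utf-8")) % len(self.partitions)
        self.partitions[index].append(body)
        return index

    def read(self, consumer_group, partition_id, max_events=10):
        """Return unread events for this group and advance only its position."""
        start = self.positions[consumer_group][partition_id]
        events = self.partitions[partition_id][start:start + max_events]
        self.positions[consumer_group][partition_id] = start + len(events)
        return events
```

In this model, two groups reading the same partition each receive every event, while a second read by the same group returns only events published since its previous read.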
For more concepts and deeper discussion, see: Event Hubs Features. Also, the concepts for AMQP are well documented in OASIS Advanced Messaging Queuing Protocol (AMQP) Version 1.0.
Thread safety
We do not guarantee that the EventHubProducerClient or EventHubConsumerClient are thread-safe or coroutine-safe. We do not recommend reusing these instances across threads or sharing them between coroutines. It is up to the running application to use these classes in a concurrency-safe manner.
The data model type, EventDataBatch, is not thread-safe or coroutine-safe. It should not be shared across threads nor used concurrently with client methods.
For scenarios requiring concurrent sending from multiple threads, ensure proper thread-safety management using mechanisms like threading.Lock(). Note: Native async APIs should be used instead of running in a ThreadPoolExecutor, if possible.
import threading
from concurrent.futures import ThreadPoolExecutor

from azure.eventhub import EventHubProducerClient, EventData
from azure.identity import DefaultAzureCredential

EVENTHUB_NAMESPACE = "<your-namespace>.servicebus.windows.net"
EVENTHUB_NAME = "<your-eventhub-name>"

# Create a global lock
producer_lock = threading.Lock()

def send_batch(producer_id, producer):
    # Hold the lock while the shared producer builds and sends the batch
    with producer_lock:
        event_data_batch = producer.create_batch()
        for i in range(10):
            event_data_batch.add(EventData(f"Message {i} from producer {producer_id}"))
        producer.send_batch(event_data_batch)
    print(f"Producer {producer_id} sent batch.")

credential = DefaultAzureCredential()
producer = EventHubProducerClient(
    fully_qualified_namespace=EVENTHUB_NAMESPACE,
    eventhub_name=EVENTHUB_NAME,
    credential=credential
)

with producer:
    with ThreadPoolExecutor(max_workers=5) as executor:
        for i in range(5):  # Launch 5 threads
            executor.submit(send_batch, i, producer)
For scenarios requiring concurrent sending in asyncio applications, ensure proper coroutine-safety management using mechanisms like asyncio.Lock():
import asyncio

from azure.eventhub.aio import EventHubProducerClient
from azure.eventhub import EventData
from azure.identity.aio import DefaultAzureCredential

EVENTHUB_NAMESPACE = "<your-namespace>.servicebus.windows.net"
EVENTHUB_NAME = "<your-eventhub-name>"

# Shared lock for coroutine-safe access
producer_lock = asyncio.Lock()

async def send_batch(producer_id, producer):