Deepgram SageMaker Transport

Python 3.12+ | MIT License

SageMaker transport for the Deepgram Python SDK. Uses AWS SageMaker's HTTP/2 bidirectional streaming API as an alternative to WebSocket, allowing transparent switching between Deepgram Cloud and Deepgram on SageMaker.

Requires Python 3.12+ (due to AWS SDK dependencies).

Installation

pip install deepgram-sagemaker

This installs aws-sdk-sagemaker-runtime-http2 and boto3 automatically.

Usage

The SageMaker transport is async-only and must be used with AsyncDeepgramClient:

import asyncio
from deepgram import AsyncDeepgramClient
from deepgram.core.events import EventType
from deepgram_sagemaker import SageMakerTransportFactory

factory = SageMakerTransportFactory(
    endpoint_name="my-deepgram-endpoint",
    region="us-west-2",
)

# SageMaker uses AWS credentials (not Deepgram API keys)
client = AsyncDeepgramClient(api_key="unused", transport_factory=factory)

async def main():
    async with client.listen.v1.connect(model="nova-3") as connection:
        connection.on(EventType.MESSAGE, lambda msg: print(msg))
        await connection.start_listening()

asyncio.run(main())

Configuration

For burst-tuned timeouts and retry behavior, build a SageMakerConfig and pass it via config=:

from deepgram_sagemaker import SageMakerConfig, SageMakerTransportFactory

config = SageMakerConfig(
    endpoint_name="my-deepgram-endpoint",
    region="us-east-2",
    connection_timeout=5.0,
    connection_acquire_timeout=15.0,
)
factory = SageMakerTransportFactory(config=config)

All time-based fields are float seconds (matching asyncio convention).

| Parameter | Required | Default | Description |
| --- | --- | --- | --- |
| endpoint_name | Yes | (none) | SageMaker endpoint name. |
| region | No | us-west-2 | AWS region. |
| connection_timeout | No | 30.0 | Max time for the underlying TCP/TLS connect. The AWS default is ~2 s; it is raised here so cold-start endpoints under burst load have time to accept TLS handshakes. |
| connection_acquire_timeout | No | 60.0 | Max time to acquire a connection from the underlying HTTP/2 pool. The AWS default is ~10 s; it is raised here so a 200–500-stream burst doesn't drain the acquire pool. |
| subscription_timeout | No | 60.0 | Max time the transport waits for the SageMaker bidi stream to open before failing. A timeout here is treated as a transient connect failure and counts against max_retries / retry_budget. |
| max_concurrency | No | 500 | Cap on simultaneous in-flight HTTP/2 streams. Advisory in Python today (the underlying smithy HTTP/2 stack does not expose a hard cap), but kept for surface parity with the Java transport. |
| max_retries | No | 5 | Max retries on transient AWS errors (throttling, pool-exhausted, transient connect/timeout). Set to 0 to disable internal retry. Terminal errors (auth, validation) bypass this. |
| initial_backoff | No | 0.1 | First backoff delay applied after the initial failure. |
| max_backoff | No | 5.0 | Cap on the per-attempt backoff delay regardless of multiplier. |
| backoff_multiplier | No | 2.0 | Exponential growth factor between retry attempts. Must be >= 1.0. |
| retry_budget | No | 30.0 | Total wall-clock cap across all retry attempts before giving up and surfacing the error to the application. |
| max_replay_buffer_bytes | No | 8 * 1024 * 1024 (8 MiB) | Cap on the in-memory replay buffer that holds sent-but-unacked stream events. Set to 0 to disable replay (sent events are dropped on internal reset). |
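
To make the interaction of initial_backoff, backoff_multiplier, max_backoff, and retry_budget concrete, here is a minimal sketch that computes the delay ceilings those fields would produce. The helper backoff_schedule is hypothetical and not part of deepgram_sagemaker. It omits jitter so the numbers are reproducible, and it counts only sleep time against the budget, whereas the real retry_budget is a wall-clock cap that also covers the time spent in each attempt.

```python
def backoff_schedule(
    max_retries: int = 5,
    initial_backoff: float = 0.1,
    max_backoff: float = 5.0,
    backoff_multiplier: float = 2.0,
    retry_budget: float = 30.0,
) -> list[float]:
    """Return the un-jittered delay (seconds) before each retry attempt.

    Hypothetical helper for illustration; the transport's internal
    scheduling may differ in detail.
    """
    if backoff_multiplier < 1.0:
        raise ValueError("backoff_multiplier must be >= 1.0")
    delays: list[float] = []
    waited = 0.0
    for attempt in range(max_retries):
        # Exponential growth, capped per attempt by max_backoff.
        delay = min(initial_backoff * backoff_multiplier ** attempt, max_backoff)
        if waited + delay > retry_budget:
            break  # assumption: no retry is scheduled past the budget
        delays.append(delay)
        waited += delay
    return delays


if __name__ == "__main__":
    # Defaults from the table above.
    print(backoff_schedule())  # [0.1, 0.2, 0.4, 0.8, 1.6]
```

With the defaults, five retries add about 3.1 s of waiting in total, so max_retries is normally the limit that is reached first and retry_budget acts as a backstop for slow attempts.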

High-concurrency notes

The transport's defaults are tuned for high-burst workloads: large numbers of streams opened in a tight loop against an endpoint that may need to scale up. If you open 200–500 streams simultaneously against a cold endpoint, the AWS SDK's general-purpose defaults (~2 s connect, ~10 s acquire) will fire before the load balancer has accepted all of the inbound TLS handshakes. The result is a wave of acquire and connect timeouts that look like server-side problems but are really the client-side fail-fast limits tripping early.

This transport ships with more lenient defaults (30 s connect, 60 s acquire) so the common high-concurrency path works out of the box. Tighten them if you need fail-fast behavior in low-latency pipelines:

config = SageMakerConfig(
    endpoint_name="my-deepgram-endpoint",
    region="us-east-2",
    connection_timeout=5.0,
    connection_acquire_timeout=15.0,
)
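
Because max_concurrency is advisory in Python, an application that wants a hard ceiling on simultaneous streams can enforce one itself. The sketch below is one way to do that with asyncio.Semaphore. The names run_capped and fake_stream are hypothetical, and fake_stream stands in for whatever coroutine opens and drains a single connection in your application.

```python
import asyncio


async def run_capped(jobs, limit: int):
    """Run zero-argument coroutine factories, at most `limit` at a time."""
    semaphore = asyncio.Semaphore(limit)

    async def guarded(job):
        # Each job waits here until one of the `limit` slots is free.
        async with semaphore:
            return await job()

    # gather() preserves the order of `jobs` in its result list.
    return await asyncio.gather(*(guarded(job) for job in jobs))


async def fake_stream(index: int) -> int:
    # Stand-in for opening and draining one transcription stream.
    await asyncio.sleep(0.01)
    return index


async def main() -> None:
    # Bind `i` as a default argument so each factory keeps its own index.
    jobs = [lambda i=i: fake_stream(i) for i in range(20)]
    results = await run_capped(jobs, limit=5)
    print(len(results))


if __name__ == "__main__":
    asyncio.run(main())
```

Passing factories rather than already-created coroutines means no stream is started until it holds a slot, which keeps the burst size bounded at the source.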

Retry & storm absorption

Transient AWS-side failures (ThrottlingException, connection-pool exhaustion, transient connect/timeout failures) are absorbed by the transport itself. Each one is classified as retryable and retried with jittered exponential backoff, bounded by max_retries and retry_budget. Messages buffered during the reset window are replayed onto the new stream so audio isn't dropped. Only terminal errors (auth, validation, resource-not-found) and retryable errors that exhaust the retry limits propagate to the application.

config = SageMakerConfig(
    endpoint_name="my-deepgram-endpoint",
    max_retries=10,
    initial_backoff=0.2,
    max_backoff=10.0,
    retry_budget=60.0,
)

Set max_retries=0 to disable internal retry entirely (every transient AWS error then surfaces immediately to the application).

AWS Credentials

The transport resolves AWS credentials using boto3's credential chain:

  • Environment variables (AWS_ACCESS_KEY_ID, AWS_SECRET_ACCESS_KEY)
  • Shared credentials file (~/.aws/credentials)
  • IAM role (EC2, ECS, Lambda)
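
When a connection fails with an auth error, it can help to see which of the first two sources above is even present before looking at IAM roles. The following is a rough preflight sketch using only the standard library. The function likely_credential_sources is hypothetical, it does not replace or reproduce boto3's actual resolution logic, and it cannot detect IAM role credentials.

```python
import os
from pathlib import Path


def likely_credential_sources(env=None, home=None) -> list[str]:
    """List which static credential sources appear to be configured.

    Hypothetical helper: it only checks that the environment variables are
    set and that the default shared credentials file exists.
    """
    env = os.environ if env is None else env
    home = Path.home() if home is None else Path(home)
    sources: list[str] = []
    if env.get("AWS_ACCESS_KEY_ID") and env.get("AWS_SECRET_ACCESS_KEY"):
        sources.append("environment variables")
    if (home / ".aws" / "credentials").is_file():
        sources.append("shared credentials file")
    return sources


if __name__ == "__main__":
    found = likely_credential_sources()
    # An empty list means an IAM role would have to supply the credentials.
    print(found or "no static credentials found; an IAM role must supply them")
```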

