
Developer Documentation

Everything you need to deploy, route, and optimize inference with SpinDynamics. Python and Node.js SDKs, plus a full REST API.

Installation

Install the SpinDynamics SDK for your language of choice. All SDKs are published to their respective package registries and receive weekly releases aligned with platform updates.

Python

$ pip install spindynamics

Node.js

$ npm install @spindynamics/sdk
Requirements

Python 3.9+ or Node.js 18+. All SDKs are generated from our OpenAPI spec and ship with full type annotations.

Authentication

All API requests require a valid API key. You can generate keys from the SpinDynamics dashboard under Settings → API Keys. Keys are scoped to your organization and support granular permission controls.

auth.py
from spindynamics import Cortex

# Option 1: Pass the key directly
client = Cortex(api_key="sd_live_...")

# Option 2: Set the SPINDYNAMICS_API_KEY environment variable
# export SPINDYNAMICS_API_KEY=sd_live_...
client = Cortex()  # reads from env automatically

API keys follow the format sd_live_* for production and sd_test_* for sandbox environments. Test keys route to a simulated inference backend and are free to use during development.
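
For local development, the same client works against the sandbox; just supply a test key. A minimal sketch:

sandbox.py
from spindynamics import Cortex

# sd_test_* keys route to the simulated inference backend,
# so requests made here are free during development
client = Cortex(api_key="sd_test_...")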

Quick Start

Deploy a model with adaptive routing and run your first inference request in under 30 seconds. This example uses the Python SDK, but the same flow applies to all languages.

quickstart.py
from spindynamics import Cortex

client = Cortex()

# Deploy a model with adaptive routing
deployment = client.deploy(
    model="llama-3.1-70b",
    strategy="adaptive",
    regions=["us-east-1", "eu-west-1"],
    constraints={
        "max_latency_ms": 50,
        "data_residency": "eu-gdpr"
    }
)

# Run inference -- routing is automatic
response = client.infer(
    deployment_id=deployment.id,
    prompt="Explain quantum computing",
    max_tokens=512
)

print(response.text)
print(f"Routed to: {response.region}")
print(f"Latency: {response.latency_ms}ms")

The strategy="adaptive" parameter tells the RL router to continuously optimize placement across the specified regions. The constraints object enforces hard limits -- in this case, 50ms max latency and EU GDPR data residency compliance.

Deployments

A Deployment represents a model that has been registered, compiled, and distributed across your infrastructure fleet. Deployments have a lifecycle: they are created, become active, can be updated, and eventually deleted.

Creating a deployment

Use client.deploy() to create a new deployment. The platform handles model compilation, quantization, and distribution to the target regions automatically.

deployments.py
# Create a deployment
deployment = client.deploy(
    model="llama-3.1-70b",
    strategy="adaptive",
    regions=["us-east-1", "eu-west-1", "ap-southeast-1"],
    autoscale={
        "min_replicas": 1,
        "max_replicas": 64,
        "target_p99_ms": 40
    }
)

print(deployment.id)      # "dep_3kx9f2a..."
print(deployment.status)  # "provisioning" -> "active"

Managing deployments

manage_deployments.py
# List all deployments
deployments = client.deployments.list()
for d in deployments:
    print(f"{d.id} - {d.model} - {d.status}")

# Get a specific deployment
dep = client.deployments.get("dep_3kx9f2a...")
print(dep.regions)      # ["us-east-1", "eu-west-1", ...]
print(dep.replicas)     # current active replica count
print(dep.metrics.p99)  # current p99 latency

# Delete a deployment
client.deployments.delete("dep_3kx9f2a...")

Deployments transition through the following states: provisioning → compiling → distributing → active. The full lifecycle typically completes in under 90 seconds for supported model architectures.
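
Because provisioning is asynchronous, it is common to poll a new deployment until it reaches active before sending traffic. A minimal sketch using client.deployments.get() from the examples above; the loop and interval are a suggested pattern, not an SDK feature, and deployment is assumed to come from a prior client.deploy() call:

wait_for_active.py
import time

# Poll until the lifecycle reaches "active" (typically < 90 seconds)
dep = client.deployments.get(deployment.id)
while dep.status != "active":
    print(f"Current state: {dep.status}")
    time.sleep(5)  # polling interval is illustrative; tune as needed
    dep = client.deployments.get(deployment.id)
print("Deployment is live")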

Routing Policies

Routing policies define how the RL engine makes placement decisions. Each policy specifies optimization objectives, relative weights, and hard constraints. The RL agent learns an optimal policy within these bounds.

routing.py
# Create a custom routing policy
policy = client.routing.create_policy(
    objectives=["minimize_latency", "minimize_cost"],
    weights=[0.7, 0.3],
    constraints={
        "excluded_regions": ["cn-*"],
        "preferred_providers": ["aws", "gcp"]
    }
)

# Attach policy to a deployment
client.deployments.update(
    "dep_3kx9f2a...",
    routing_policy=policy.id
)

# List active policies
policies = client.routing.list_policies()
for p in policies:
    print(f"{p.id} - {p.objectives} - {p.convergence_status}")

Available objectives include minimize_latency, minimize_cost, maximize_throughput, and maximize_availability. Weights are normalized and must sum to 1.0. The RL agent typically converges within 200 requests of a policy change.
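
Because weights must sum to 1.0, it is worth validating them client-side before the API call. A minimal sketch with the throughput-oriented objectives; the assert is a local check, not an SDK feature:

weights_check.py
import math

objectives = ["maximize_throughput", "maximize_availability"]
weights = [0.5, 0.5]

# Weights must sum to 1.0 or the API will reject the policy
assert math.isclose(sum(weights), 1.0), "weights must sum to 1.0"

policy = client.routing.create_policy(objectives=objectives, weights=weights)
print(policy.convergence_status)  # "exploring" until roughly 200 requests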

Regions & Constraints

SpinDynamics supports 200+ regions across edge PoPs, cloud providers, and on-prem clusters. Region identifiers follow the pattern provider-geography-zone and support glob matching for flexible targeting.

  • us-east-1, eu-west-1, ap-southeast-1 -- standard cloud regions
  • edge-na-*, edge-eu-* -- edge point-of-presence groups
  • onprem-* -- on-premises clusters registered via the SpinDynamics agent
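
These patterns drop straight into deploy calls. A minimal sketch targeting all European edge PoPs plus a standard cloud region (the region mix is illustrative):

glob_regions.py
# Glob patterns expand to every matching region at deploy time
deployment = client.deploy(
    model="llama-3.1-70b",
    strategy="adaptive",
    regions=["edge-eu-*", "eu-west-1"]
)
print(deployment.regions)  # the expanded list of matched regions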

Constraints are hard limits enforced at the routing layer before the RL agent evaluates placement options. Available constraint types:

Constraint | Type | Description
max_latency_ms | int | Maximum acceptable p99 latency in milliseconds
data_residency | string | Compliance profile: eu-gdpr, us-hipaa, ca-pipeda, etc.
excluded_regions | string[] | Glob patterns for regions to exclude from routing
preferred_providers | string[] | Prefer specific cloud providers: aws, gcp, azure
encryption | string | Encryption standard: aes-256-gcm, fips-140-3
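
Constraint types combine freely in a single deploy call; each is enforced before the RL agent scores the remaining placement options. A sketch using one of each type from the table (the values are illustrative):

constraints.py
deployment = client.deploy(
    model="llama-3.1-70b",
    strategy="adaptive",
    regions=["eu-west-1", "edge-eu-*"],
    constraints={
        "max_latency_ms": 50,                  # hard p99 ceiling
        "data_residency": "eu-gdpr",           # compliance profile
        "excluded_regions": ["cn-*"],          # glob exclusion
        "preferred_providers": ["aws", "gcp"],
        "encryption": "aes-256-gcm"
    }
)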

Python SDK Reference

The Python SDK (spindynamics) is the primary interface for interacting with the SpinDynamics platform. It is fully typed, async-native with sync wrappers, and supports streaming responses out of the box.

Cortex

The main client class. All platform operations are accessed through a Cortex instance.

Member | Kind | Description
Cortex(api_key=None) | class | Initialize the client. Reads SPINDYNAMICS_API_KEY from the environment if api_key is not provided.
.deploy(model, strategy, regions, constraints, autoscale) | method | Create a new deployment. Returns a Deployment object.
.infer(deployment_id, prompt, max_tokens, stream) | method | Run inference against a deployment. Returns an InferenceResponse.
.deployments | property | Access the deployments manager: .list(), .get(id), .update(id), .delete(id).
.routing | property | Access the routing manager: .create_policy(), .list_policies(), .get_policy(id).
.telemetry | property | Access telemetry data: .query(), .export(), .alerts().
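
The stream parameter on .infer() enables incremental output. The chunk shape below (a .text attribute per streamed piece) is an assumption for illustration; check the SDK's type annotations for the exact interface:

streaming.py
# Stream tokens as they are generated (chunk attributes are assumed)
for chunk in client.infer(
    deployment_id="dep_3kx9f2a...",
    prompt="Explain quantum computing",
    max_tokens=512,
    stream=True,
):
    print(chunk.text, end="", flush=True)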

Deployment

Represents a live model deployment on the SpinDynamics platform.

Member | Kind | Description
.id | property | Unique deployment identifier (e.g. dep_3kx9f2a...).
.model | property | Model name string.
.status | property | Current lifecycle state: provisioning, active, draining, deleted.
.regions | property | List of active regions where the model is deployed.
.replicas | property | Current replica count across all regions.
.metrics | property | Live metrics: .p50, .p99, .rps, .cost_per_1k.

InferenceResponse

Member | Kind | Description
.text | property | The generated text output.
.region | property | Region where the inference was executed.
.latency_ms | property | End-to-end latency in milliseconds.
.tokens_used | property | Total tokens consumed (prompt + completion).
.trace_id | property | Distributed trace ID for observability.

RoutingPolicy

Member | Kind | Description
.id | property | Unique policy identifier.
.objectives | property | List of optimization objectives.
.weights | property | Objective weights (normalized to sum to 1.0).
.constraints | property | Hard constraints dictionary.
.convergence_status | property | RL convergence state: exploring, converging, converged.

Node.js SDK

The Node.js SDK (@spindynamics/sdk) provides a TypeScript-first interface with full type definitions. It uses native fetch under the hood and supports both ESM and CommonJS.

quickstart.ts
import { Cortex } from '@spindynamics/sdk';

const client = new Cortex({
  apiKey: process.env.SPINDYNAMICS_API_KEY,
});

const deployment = await client.deploy({
  model: 'llama-3.1-70b',
  strategy: 'adaptive',
  regions: ['us-east-1', 'eu-west-1'],
});

const response = await client.infer({
  deploymentId: deployment.id,
  prompt: 'Explain quantum computing',
  maxTokens: 512,
});

console.log(response.text);
console.log(`Routed to: ${response.region}`);

The Node.js SDK mirrors the Python SDK API surface. All methods return typed promises, so they can be consumed with async/await or with .then() callbacks. Streaming responses use ReadableStream natively.

REST API

All platform capabilities are available via the REST API at https://api.spindynamics.net/v1. Authenticate by passing your API key in the Authorization header as a Bearer token.

Endpoints

POST /v1/deployments Create a new deployment
GET /v1/deployments/:id Get deployment details
GET /v1/deployments List all deployments
PATCH /v1/deployments/:id Update a deployment
DELETE /v1/deployments/:id Delete a deployment
POST /v1/infer Run inference
GET /v1/routing/policies List routing policies
POST /v1/routing/policies Create a routing policy

Example: cURL

terminal
# Create a deployment
curl -X POST https://api.spindynamics.net/v1/deployments \
  -H "Authorization: Bearer sd_live_..." \
  -H "Content-Type: application/json" \
  -d '{
    "model": "llama-3.1-70b",
    "strategy": "adaptive",
    "regions": ["us-east-1", "eu-west-1"],
    "constraints": {
      "max_latency_ms": 50,
      "data_residency": "eu-gdpr"
    }
  }'

# Run inference
curl -X POST https://api.spindynamics.net/v1/infer \
  -H "Authorization: Bearer sd_live_..." \
  -H "Content-Type: application/json" \
  -d '{
    "deployment_id": "dep_3kx9f2a...",
    "prompt": "Explain quantum computing",
    "max_tokens": 512
  }'
Rate Limits

The REST API enforces rate limits per API key: 1,000 requests/second for inference, 100 requests/second for management endpoints. Contact us for higher limits on Enterprise plans.
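
If a key exceeds its limit, requests will be rejected until the window resets; a defensive client retries with exponential backoff. A minimal sketch against the REST endpoint using requests, assuming over-limit calls return HTTP 429:

backoff.py
import time

import requests

def infer_with_backoff(payload: dict, api_key: str, max_retries: int = 5) -> dict:
    """POST /v1/infer, retrying on HTTP 429 with exponential backoff."""
    url = "https://api.spindynamics.net/v1/infer"
    headers = {"Authorization": f"Bearer {api_key}"}
    for attempt in range(max_retries):
        resp = requests.post(url, json=payload, headers=headers)
        if resp.status_code != 429:
            resp.raise_for_status()
            return resp.json()
        time.sleep(2 ** attempt)  # 1s, 2s, 4s, ... between retries
    raise RuntimeError("still rate limited after retries")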

Custom Routing

Beyond the default adaptive strategy, SpinDynamics supports fully custom routing configurations. You can define multi-objective policies, set exploration rates for the RL agent, and create region-pinned deployments for strict compliance scenarios.

custom_routing.py
# Advanced: multi-objective policy with exploration
policy = client.routing.create_policy(
    name="latency-first-eu",
    objectives=[
        "minimize_latency",
        "minimize_cost",
        "maximize_availability"
    ],
    weights=[0.6, 0.2, 0.2],
    constraints={
        "excluded_regions": ["cn-*", "ru-*"],
        "preferred_providers": ["aws", "gcp"],
        "data_residency": "eu-gdpr",
        "encryption": "aes-256-gcm"
    },
    exploration_rate=0.05  # 5% exploration for continuous learning
)

# Pin to a specific region (no RL -- static routing)
pinned = client.routing.create_policy(
    name="eu-pinned",
    strategy="pinned",
    pin_region="eu-west-1"
)

The exploration_rate parameter controls how often the RL agent explores non-optimal routes to discover better placements. Set to 0 for pure exploitation (production-safe), or up to 0.1 for aggressive exploration during testing.

Auto-scaling

SpinDynamics provides predictive auto-scaling powered by time-series forecasting. The system provisions capacity 2-5 minutes ahead of demand spikes, eliminating cold-start cascades and over-provisioning.

autoscale.py
# Configure auto-scaling on a deployment
deployment = client.deploy(
    model="mixtral-8x7b",
    strategy="adaptive",
    regions=["us-*", "eu-*"],
    autoscale={
        "min_replicas": 0,   # scale to zero when idle
        "max_replicas": 256,
        "target_p99_ms": 40,
        "predictive": True,
        "gpu_type": "a100",
        "spot_fallback": True
    }
)

# Query current scaling state
state = client.deployments.get(deployment.id)
print(f"Replicas: {state.replicas}")
print(f"GPU utilization: {state.metrics.gpu_util}%")
print(f"Predicted demand (5m): {state.metrics.predicted_rps} rps")

When spot_fallback is enabled, the scheduler automatically falls back to spot instances during demand spikes, reducing compute costs by up to 70% without impacting latency SLAs.
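
Autoscale settings can plausibly be adjusted on a live deployment through the same update() call used to attach routing policies; whether update() accepts an autoscale keyword is an assumption here, so verify against the API reference:

update_autoscale.py
# Assumed: update() takes the same autoscale dict as deploy()
client.deployments.update(
    "dep_3kx9f2a...",
    autoscale={"spot_fallback": False}  # e.g. disable spot before a launch
)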

Monitoring

SpinDynamics exposes full-stack observability through the telemetry API. Query latency distributions, cost attribution, model drift scores, and routing decisions programmatically.

monitoring.py
# Query telemetry for a deployment
metrics = client.telemetry.query(
    deployment_id="dep_3kx9f2a...",
    metrics=["p99_latency", "cost_per_1k", "rps", "drift_score"],
    interval="5m",
    range="24h"
)

for point in metrics.datapoints:
    print(f"{point.timestamp} | p99={point.p99_latency}ms | ${point.cost_per_1k}")

# Export to Prometheus / OpenTelemetry
client.telemetry.export(
    format="otlp",
    endpoint="https://otel-collector.internal:4317"
)

# Set up alerts
client.telemetry.create_alert(
    name="p99-breach",
    condition="p99_latency > 100",
    window="5m",
    notify=["slack://alerts", "pagerduty://prod"]
)

SpinDynamics natively integrates with Datadog, Prometheus, Grafana, and OpenTelemetry. All metrics are also available through the REST API at /v1/telemetry for custom dashboards and automation workflows.
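
For custom dashboards, the same data is reachable over REST. A sketch with requests; the query-parameter names mirror telemetry.query() above but are assumptions, so confirm them against the REST reference:

telemetry_rest.py
import requests

# Parameter names are assumed to mirror client.telemetry.query()
resp = requests.get(
    "https://api.spindynamics.net/v1/telemetry",
    headers={"Authorization": "Bearer sd_live_..."},
    params={
        "deployment_id": "dep_3kx9f2a...",
        "metrics": "p99_latency,rps",
        "interval": "5m",
        "range": "24h",
    },
)
resp.raise_for_status()
for point in resp.json().get("datapoints", []):
    print(point)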