
Developer Documentation

Everything you need to deploy, route, and optimize inference with SpinDynamics. Python and Node.js SDKs, plus a full REST API.

Installation

Install the SpinDynamics SDK for your language of choice. All SDKs are published to their respective package registries and receive weekly releases aligned with platform updates.

Python

$ pip install spindynamics

Node.js

$ npm install @spindynamics/sdk
Requirements

Python 3.9+ or Node.js 18+. All SDKs are generated from our OpenAPI spec and ship with full type annotations.

Authentication

All API requests require a valid API key. You can generate keys from the SpinDynamics dashboard under Settings → API Keys. Keys are scoped to your organization and support granular permission controls.

auth.py
from spindynamics import Cortex

# Option 1: Pass the key directly
client = Cortex(api_key="sd_live_...")

# Option 2: Set the SPINDYNAMICS_API_KEY environment variable
# export SPINDYNAMICS_API_KEY=sd_live_...
client = Cortex()  # reads from env automatically

API keys follow the format sd_live_* for production and sd_test_* for sandbox environments. Test keys route to a simulated inference backend and are free to use during development.
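
For local development, the same client works against the sandbox; just supply a test key. A minimal sketch:

sandbox.py
from spindynamics import Cortex

# sd_test_* keys route to the simulated inference backend,
# so requests made here are free during development
client = Cortex(api_key="sd_test_...")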

Quick Start

Deploy a model with adaptive routing and run your first inference request in under 30 seconds. This example uses the Python SDK, but the same flow applies to all languages.

quickstart.py
from spindynamics import Cortex

client = Cortex()

# Deploy a model with adaptive routing
deployment = client.deploy(
    model="llama-3.1-70b",
    strategy="adaptive",
    regions=["us-east-1", "eu-west-1"],
    constraints={
        "max_latency_ms": 50,
        "data_residency": "eu-gdpr"
    }
)

# Run inference -- routing is automatic
response = client.infer(
    deployment_id=deployment.id,
    prompt="Explain quantum computing",
    max_tokens=512
)

print(response.text)
print(f"Routed to: {response.region}")
print(f"Latency: {response.latency_ms}ms")

The strategy="adaptive" parameter tells the RL router to continuously optimize placement across the specified regions. The constraints object enforces hard limits -- in this case, 50ms max latency and EU GDPR data residency compliance.

Deployments

A Deployment represents a model that has been registered, compiled, and distributed across your infrastructure fleet. Deployments have a lifecycle: they are created, become active, can be updated, and eventually deleted.

Creating a deployment

Use client.deploy() to create a new deployment. The platform handles model compilation, quantization, and distribution to the target regions automatically.

deployments.py
# Create a deployment
deployment = client.deploy(
    model="llama-3.1-70b",
    strategy="adaptive",
    regions=["us-east-1", "eu-west-1", "ap-southeast-1"],
    autoscale={
        "min_replicas": 1,
        "max_replicas": 64,
        "target_p99_ms": 40
    }
)

print(deployment.id)      # "dep_3kx9f2a..."
print(deployment.status)  # "provisioning" -> "active"

Managing deployments

manage_deployments.py
# List all deployments
deployments = client.deployments.list()
for d in deployments:
    print(f"{d.id} - {d.model} - {d.status}")

# Get a specific deployment
dep = client.deployments.get("dep_3kx9f2a...")
print(dep.regions)      # ["us-east-1", "eu-west-1", ...]
print(dep.replicas)     # current active replica count
print(dep.metrics.p99)  # current p99 latency

# Delete a deployment
client.deployments.delete("dep_3kx9f2a...")

Deployments transition through the following states: provisioning → compiling → distributing → active. The full lifecycle typically completes in under 90 seconds for supported model architectures.
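
Because provisioning is asynchronous, it is common to poll a new deployment until it reaches active before sending traffic. A minimal sketch using client.deployments.get() from the examples above; the loop and interval are a suggested pattern, not an SDK feature, and deployment is assumed to come from a prior client.deploy() call:

wait_for_active.py
import time

# Poll until the lifecycle reaches "active" (typically < 90 seconds)
dep = client.deployments.get(deployment.id)
while dep.status != "active":
    print(f"Current state: {dep.status}")
    time.sleep(5)  # polling interval is illustrative; tune as needed
    dep = client.deployments.get(deployment.id)
print("Deployment is live")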

Routing Policies

Routing policies define how the RL engine makes placement decisions. Each policy specifies optimization objectives, relative weights, and hard constraints. The RL agent learns an optimal policy within these bounds.

routing.py
# Create a custom routing policy
policy = client.routing.create_policy(
    objectives=["minimize_latency", "minimize_cost"],
    weights=[0.7, 0.3],
    constraints={
        "excluded_regions": ["cn-*"],
        "preferred_providers": ["aws", "gcp"]
    }
)

# Attach policy to a deployment
client.deployments.update(
    "dep_3kx9f2a...",
    routing_policy=policy.id
)

# List active policies
policies = client.routing.list_policies()
for p in policies:
    print(f"{p.id} - {p.objectives} - {p.convergence_status}")

Available objectives include minimize_latency, minimize_cost, maximize_throughput, and maximize_availability. Weights are normalized and must sum to 1.0. The RL agent typically converges within 200 requests of a policy change.
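
Because weights must sum to 1.0, it is worth validating them client-side before the API call. A minimal sketch with the throughput-oriented objectives; the assert is a local check, not an SDK feature:

weights_check.py
import math

objectives = ["maximize_throughput", "maximize_availability"]
weights = [0.5, 0.5]

# Weights must sum to 1.0 or the API will reject the policy
assert math.isclose(sum(weights), 1.0), "weights must sum to 1.0"

policy = client.routing.create_policy(objectives=objectives, weights=weights)
print(policy.convergence_status)  # "exploring" until roughly 200 requests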

Regions & Constraints

SpinDynamics supports 200+ regions across edge PoPs, cloud providers, and on-prem clusters. Region identifiers follow the pattern provider-geography-zone and support glob matching for flexible targeting.

  • us-east-1, eu-west-1, ap-southeast-1 -- standard cloud regions
  • edge-na-*, edge-eu-* -- edge point-of-presence groups
  • onprem-* -- on-premises clusters registered via the SpinDynamics agent
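
These patterns drop straight into deploy calls. A minimal sketch targeting all European edge PoPs plus a standard cloud region (the region mix is illustrative):

glob_regions.py
# Glob patterns expand to every matching region at deploy time
deployment = client.deploy(
    model="llama-3.1-70b",
    strategy="adaptive",
    regions=["edge-eu-*", "eu-west-1"]
)
print(deployment.regions)  # the expanded list of matched regions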

Constraints are hard limits enforced at the routing layer before the RL agent evaluates placement options. Available constraint types:

Constraint | Type | Description
max_latency_ms | int | Maximum acceptable p99 latency in milliseconds
data_residency | string | Compliance profile: eu-gdpr, us-hipaa, ca-pipeda, etc.
excluded_regions | string[] | Glob patterns for regions to exclude from routing
preferred_providers | string[] | Prefer specific cloud providers: aws, gcp, azure
encryption | string | Encryption standard: aes-256-gcm, fips-140-3
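
Constraint types combine freely in a single deploy call; each is enforced before the RL agent scores the remaining placement options. A sketch using one of each type from the table (the values are illustrative):

constraints.py
deployment = client.deploy(
    model="llama-3.1-70b",
    strategy="adaptive",
    regions=["eu-west-1", "edge-eu-*"],
    constraints={
        "max_latency_ms": 50,                  # hard p99 ceiling
        "data_residency": "eu-gdpr",           # compliance profile
        "excluded_regions": ["cn-*"],          # glob exclusion
        "preferred_providers": ["aws", "gcp"],
        "encryption": "aes-256-gcm"
    }
)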

Python SDK Reference

The Python SDK (spindynamics) is the primary interface for interacting with the SpinDynamics platform. It is fully typed, async-native with sync wrappers, and supports streaming responses out of the box.

Cortex

The main client class. All platform operations are accessed through a Cortex instance.

Member | Kind | Description
Cortex(api_key=None) | class | Initialize the client. Reads SPINDYNAMICS_API_KEY from the environment if api_key is not provided.
.deploy(model, strategy, regions, constraints, autoscale) | method | Create a new deployment. Returns a Deployment object.
.infer(deployment_id, prompt, max_tokens, stream) | method | Run inference against a deployment. Returns an InferenceResponse.
.deployments | property | Access the deployments manager: .list(), .get(id), .update(id), .delete(id).
.routing | property | Access the routing manager: .create_policy(), .list_policies(), .get_policy(id).
.telemetry | property | Access telemetry data: .query(), .export(), .alerts().
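
The stream parameter on .infer() enables incremental output. The chunk shape below (a .text attribute per streamed piece) is an assumption for illustration; check the SDK's type annotations for the exact interface:

streaming.py
# Stream tokens as they are generated (chunk attributes are assumed)
for chunk in client.infer(
    deployment_id="dep_3kx9f2a...",
    prompt="Explain quantum computing",
    max_tokens=512,
    stream=True,
):
    print(chunk.text, end="", flush=True)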

Deployment

Represents a live model deployment on the SpinDynamics platform.

Member | Kind | Description
.id | property | Unique deployment identifier (e.g. dep_3kx9f2a...).
.model | property | Model name string.
.status | property | Current lifecycle state: provisioning, active, draining, deleted.
.regions | property | List of active regions where the model is deployed.
.replicas | property | Current replica count across all regions.
.metrics | property | Live metrics: .p50, .p99, .rps, .cost_per_1k.

InferenceResponse

Member | Kind | Description
.text | property | The generated text output.
.region | property | Region where the inference was executed.
.latency_ms | property | End-to-end latency in milliseconds.
.tokens_used | property | Total tokens consumed (prompt + completion).
.trace_id | property | Distributed trace ID for observability.

RoutingPolicy

Member | Kind | Description
.id | property | Unique policy identifier.
.objectives | property | List of optimization objectives.
.weights | property | Objective weights (normalized to sum to 1.0).
.constraints | property | Hard constraints dictionary.
.convergence_status | property | RL convergence state: exploring, converging, converged.

Node.js SDK

The Node.js SDK (@spindynamics/sdk) provides a TypeScript-first interface with full type definitions. It uses native fetch under the hood and supports both ESM and CommonJS.

quickstart.ts
import { Cortex } from '@spindynamics/sdk';

const client = new Cortex({
  apiKey: process.env.SPINDYNAMICS_API_KEY,
});

const deployment = await client.deploy({
  model: 'llama-3.1-70b',
  strategy: 'adaptive',
  regions: ['us-east-1', 'eu-west-1'],
});

const response = await client.infer({
  deploymentId: deployment.id,
  prompt: 'Explain quantum computing',
  maxTokens: 512,
});

console.log(response.text);
console.log(`Routed to: ${response.region}`);

The Node.js SDK mirrors the Python SDK API surface. All methods return typed promises, so they can be consumed with async/await or with .then() callbacks. Streaming responses use ReadableStream natively.

REST API

All platform capabilities are available via the REST API at https://api.spindynamics.net/v1. Authenticate by passing your API key in the Authorization header as a Bearer token.

Endpoints

POST /v1/deployments Create a new deployment
GET /v1/deployments/:id Get deployment details
GET /v1/deployments List all deployments
PATCH /v1/deployments/:id Update a deployment
DELETE /v1/deployments/:id Delete a deployment
POST /v1/infer Run inference
GET /v1/routing/policies List routing policies
POST /v1/routing/policies Create a routing policy

Example: cURL

terminal
# Create a deployment
curl -X POST https://api.spindynamics.net/v1/deployments \
  -H "Authorization: Bearer sd_live_..." \
  -H "Content-Type: application/json" \
  -d '{
    "model": "llama-3.1-70b",
    "strategy": "adaptive",
    "regions": ["us-east-1", "eu-west-1"],
    "constraints": {
      "max_latency_ms": 50,
      "data_residency": "eu-gdpr"
    }
  }'

# Run inference
curl -X POST https://api.spindynamics.net/v1/infer \
  -H "Authorization: Bearer sd_live_..." \
  -H "Content-Type: application/json" \
  -d '{
    "deployment_id": "dep_3kx9f2a...",
    "prompt": "Explain quantum computing",
    "max_tokens": 512
  }'
Rate Limits

The REST API enforces rate limits per API key: 1,000 requests/second for inference, 100 requests/second for management endpoints. Contact us for higher limits on Enterprise plans.
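
If a key exceeds its limit, requests will be rejected until the window resets; a defensive client retries with exponential backoff. A minimal sketch against the REST endpoint using requests, assuming over-limit calls return HTTP 429:

backoff.py
import time

import requests

def infer_with_backoff(payload: dict, api_key: str, max_retries: int = 5) -> dict:
    """POST /v1/infer, retrying on HTTP 429 with exponential backoff."""
    url = "https://api.spindynamics.net/v1/infer"
    headers = {"Authorization": f"Bearer {api_key}"}
    for attempt in range(max_retries):
        resp = requests.post(url, json=payload, headers=headers)
        if resp.status_code != 429:
            resp.raise_for_status()
            return resp.json()
        time.sleep(2 ** attempt)  # 1s, 2s, 4s, ... between retries
    raise RuntimeError("still rate limited after retries")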

Custom Routing

Beyond the default adaptive strategy, SpinDynamics supports fully custom routing configurations. You can define multi-objective policies, set exploration rates for the RL agent, and create region-pinned deployments for strict compliance scenarios.

custom_routing.py
# Advanced: multi-objective policy with exploration
policy = client.routing.create_policy(
    name="latency-first-eu",
    objectives=[
        "minimize_latency",
        "minimize_cost",
        "maximize_availability"
    ],
    weights=[0.6, 0.2, 0.2],
    constraints={
        "excluded_regions": ["cn-*", "ru-*"],
        "preferred_providers": ["aws", "gcp"],
        "data_residency": "eu-gdpr",
        "encryption": "aes-256-gcm"
    },
    exploration_rate=0.05  # 5% exploration for continuous learning
)

# Pin to a specific region (no RL -- static routing)
pinned = client.routing.create_policy(
    name="eu-pinned",
    strategy="pinned",
    pin_region="eu-west-1"
)

The exploration_rate parameter controls how often the RL agent explores non-optimal routes to discover better placements. Set to 0 for pure exploitation (production-safe), or up to 0.1 for aggressive exploration during testing.

Auto-scaling

SpinDynamics provides predictive auto-scaling powered by time-series forecasting. The system provisions capacity 2-5 minutes ahead of demand spikes, eliminating cold-start cascades and over-provisioning.

autoscale.py
# Configure auto-scaling on a deployment
deployment = client.deploy(
    model="mixtral-8x7b",
    strategy="adaptive",
    regions=["us-*", "eu-*"],
    autoscale={
        "min_replicas": 0,   # scale to zero when idle
        "max_replicas": 256,
        "target_p99_ms": 40,
        "predictive": True,
        "gpu_type": "a100",
        "spot_fallback": True
    }
)

# Query current scaling state
state = client.deployments.get(deployment.id)
print(f"Replicas: {state.replicas}")
print(f"GPU utilization: {state.metrics.gpu_util}%")
print(f"Predicted demand (5m): {state.metrics.predicted_rps} rps")

When spot_fallback is enabled, the scheduler automatically falls back to spot instances during demand spikes, reducing compute costs by up to 70% without impacting latency SLAs.
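
Autoscale settings can plausibly be adjusted on a live deployment through the same update() call used to attach routing policies; whether update() accepts an autoscale keyword is an assumption here, so verify against the API reference:

update_autoscale.py
# Assumed: update() takes the same autoscale dict as deploy()
client.deployments.update(
    "dep_3kx9f2a...",
    autoscale={"spot_fallback": False}  # e.g. disable spot before a launch
)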

Monitoring

SpinDynamics exposes full-stack observability through the telemetry API. Query latency distributions, cost attribution, model drift scores, and routing decisions programmatically.

monitoring.py
# Query telemetry for a deployment
metrics = client.telemetry.query(
    deployment_id="dep_3kx9f2a...",
    metrics=["p99_latency", "cost_per_1k", "rps", "drift_score"],
    interval="5m",
    range="24h"
)

for point in metrics.datapoints:
    print(f"{point.timestamp} | p99={point.p99_latency}ms | ${point.cost_per_1k}")

# Export to Prometheus / OpenTelemetry
client.telemetry.export(
    format="otlp",
    endpoint="https://otel-collector.internal:4317"
)

# Set up alerts
client.telemetry.create_alert(
    name="p99-breach",
    condition="p99_latency > 100",
    window="5m",
    notify=["slack://alerts", "pagerduty://prod"]
)

SpinDynamics natively integrates with Datadog, Prometheus, Grafana, and OpenTelemetry. All metrics are also available through the REST API at /v1/telemetry for custom dashboards and automation workflows.
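
For custom dashboards, the same data is reachable over REST. A sketch with requests; the query-parameter names mirror telemetry.query() above but are assumptions, so confirm them against the REST reference:

telemetry_rest.py
import requests

# Parameter names are assumed to mirror client.telemetry.query()
resp = requests.get(
    "https://api.spindynamics.net/v1/telemetry",
    headers={"Authorization": "Bearer sd_live_..."},
    params={
        "deployment_id": "dep_3kx9f2a...",
        "metrics": "p99_latency,rps",
        "interval": "5m",
        "range": "24h",
    },
)
resp.raise_for_status()
for point in resp.json().get("datapoints", []):
    print(point)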