Platform

The inference layer
your AI stack is missing.

SpinDynamics is an adaptive inference orchestration platform that uses reinforcement learning to dynamically route, scale, and optimize model serving across edge, on-prem, and multi-cloud infrastructure.

Architecture

The SpinDynamics Control Plane

A single control plane that sits between your application layer and your heterogeneous infrastructure fleet. Every inference request is observed, routed, and optimized in real time by a multi-objective RL agent.

[Architecture diagram]
Application layer: Your App, Mobile SDK, Internal Tools, 3rd-Party (HTTPS / gRPC)
SpinDynamics Control Plane: API Gateway (auth, rate limiting, schema validation) · RL Router (multi-objective optimization, exploration) · Model Registry (versioning, compilation, distribution) · Telemetry (traces, latency, cost / drift) · Scheduler (GPU allocation, queue depth, autoscale / spot) · Compliance Engine (data residency, encryption, audit log / ACL) · Feedback loop from telemetry back into the RL Router for optimized placement
Infrastructure layer: Edge PoPs (200+ regions, TensorRT, ONNX / GGUF, <50ms p99) · Cloud (AWS / GCP / Azure / OCI, A100 / H100, auto-scale) · On-prem clusters (air-gapped, NVIDIA DGX, custom silicon, FDE-managed)
Core Capabilities

Six pillars. Zero compromises.

Each layer of SpinDynamics is designed to operate independently and compose seamlessly. Here is how every piece works under the hood.

01 / ROUTING

Adaptive RL Routing Engine

Our routing engine is not rule-based — it is learned. A multi-objective reinforcement learning agent observes every inference request and continuously optimizes placement decisions across your infrastructure fleet.

The agent balances latency, throughput, cost, and data residency constraints simultaneously, adapting to traffic patterns, hardware availability, and model performance in real time. No manual tuning. No static rules. The policy improves with every request.

routing_policy.py
# Custom routing constraints
policy = cortex.routing.create_policy(
    objectives=["minimize_latency", "minimize_cost"],
    constraints={
        "data_residency": ["eu-west", "eu-central"],
        "max_p99_ms": 50,
        "prefer_gpu": "a100",
    },
    exploration_rate=0.05,  # 5% exploration for continuous learning
)

# Policy auto-adapts to fleet state
# Avg. convergence: <200 requests
# Reward signal: composite(latency, cost, compliance)
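For intuition, here is a minimal sketch of how a composite reward like the one referenced in the comment above could be formed. The weights, the RouteOutcome fields, and the hard compliance gate are illustrative assumptions, not the production reward.

reward_sketch.py (hypothetical)
from dataclasses import dataclass

@dataclass
class RouteOutcome:
    latency_ms: float   # observed latency for this placement
    cost_usd: float     # metered cost of serving the request
    compliant: bool     # did the placement satisfy residency / ACL policy?

def composite_reward(o: RouteOutcome,
                     w_latency: float = 0.6,
                     w_cost: float = 0.4) -> float:
    """Illustrative reward: compliance is a hard gate, then a
    weighted penalty on normalized latency and cost."""
    if not o.compliant:
        return -1.0  # hard-fail non-compliant placements
    latency_term = min(o.latency_ms / 50.0, 2.0)   # vs. the 50ms p99 target
    cost_term = min(o.cost_usd / 0.003, 2.0)       # vs. a nominal cost budget
    return -(w_latency * latency_term + w_cost * cost_term)

# Example: a compliant placement at 42ms and $0.002
print(composite_reward(RouteOutcome(latency_ms=42.0, cost_usd=0.002, compliant=True)))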
02 / EDGE

Edge-Native Model Serving

Models are compiled to optimized runtimes (TensorRT, ONNX, GGUF) and distributed to 200+ edge PoPs worldwide. Inference runs locally at the edge — no round-trip to origin.

Smart prefetching ensures models are warm at the edge before traffic arrives. Our edge compiler automatically quantizes and optimizes models for the target hardware at each PoP. Cold start latency: zero.

<50ms p99 latency
200+ Edge PoPs
0ms cold start
[Edge distribution diagram]
Model Registry: PyTorch → compiled runtimes, with smart prefetch to each region
NA-East: 43 PoPs · TRT+ONNX · 12ms p50
EU-West: 38 PoPs · TRT+GGUF · 14ms p50
AP-SE: 31 PoPs · ONNX · 18ms p50
+ 90 more regions
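Provisioning a model for the edge through the SDK might look like the sketch below. The strategy="edge" value and the runtimes and prefetch parameters are illustrative assumptions patterned on the deploy call shown in the SDK section, not the documented API.

edge_deploy.py (hypothetical)
from spindynamics import Cortex

client = Cortex(api_key="sd_live_...")

# Hypothetical edge deployment; parameter names are illustrative
deployment = client.deploy(
    model="llama-3.1-70b",
    strategy="edge",                          # assumed value; "adaptive" is shown elsewhere
    regions=["na-east", "eu-west", "ap-se"],
    runtimes=["tensorrt", "onnx", "gguf"],    # assumed compile targets per PoP
    prefetch=True,                            # assumed flag: warm models before traffic arrives
)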
03 / ON-PREM

On-Prem Orchestration

The full SpinDynamics platform — control plane, RL engine, observability, and all — deployed on your hardware with zero external dependencies. No phone-home. No telemetry exfiltration. Fully air-gapped.

Designed for environments where data cannot leave the building: defense, healthcare, financial services, and government. Our Field Deployment Engineers handle the installation, tuning, and ongoing optimization — so your team stays focused on the mission.

Learn about our FDE program →
[On-prem deployment diagram]
Your private network: SpinDynamics On-Prem control plane (RL engine, observability, registry, compliance, scheduler) orchestrating DGX A100 and DGX H100 nodes. External dependencies: none.
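In an air-gapped install, application code would presumably target the in-network control plane rather than a public endpoint. A minimal sketch, assuming the client accepts a base_url override (the parameter and hostname below are illustrative, not confirmed API):

onprem_client.py (hypothetical)
from spindynamics import Cortex

# Hypothetical: point the SDK at the air-gapped control plane.
# The base_url parameter and hostname are illustrative assumptions.
client = Cortex(
    api_key="sd_onprem_...",
    base_url="https://spindynamics.internal.example",
)

deployment = client.deploy(model="llama-3.1-70b", strategy="adaptive")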
04 / SCALE

HyperScale Autoscaling

Predictive autoscaling powered by time-series forecasting and reinforcement learning. SpinDynamics provisions capacity before demand spikes — not after. No more over-provisioning. No more cold-start cascades.

Scale from zero to millions of inferences per second. GPU allocation, model replica management, and queue depth optimization — all automatic. Spot instance fallback, multi-cloud burst, and graceful degradation are built in.

autoscale.py
deployment = cortex.deploy(
    model="mixtral-8x7b",
    autoscale={
        "min_replicas": 0,
        "max_replicas": 256,
        "target_p99_ms": 40,
        "predictive": True,     # RL-based predictive scaling
        "gpu_type": "a100",
        "spot_fallback": True,  # fall back to spot instances
    },
)

# Scale-to-zero when idle
# Predictive warm-up: 2-5 min ahead of demand
# GPU utilization target: 85%+
05 / COMPLIANCE

Compliance Mesh

Data residency, encryption, audit logging, and access controls — enforced at the routing layer. Not an afterthought. Not a checkbox. A core architectural primitive.

Compliance policies are defined once and applied globally. When a request hits SpinDynamics, the RL router already knows which infrastructure is eligible before it makes a placement decision. Every request is traced, every decision is auditable, and every byte is encrypted at rest and in transit.

[Compliance engine diagram]
Inbound request → policy evaluation (residency: pass · encryption: pass · ACL: pass · audit: logged) → eligible infra (eu-west-1, eu-central-1, us-east-1, ap-south-1) → RL router selects the optimal placement from the eligible set.
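Declaring such a policy in code might look like the sketch below, patterned on the cortex.routing.create_policy call shown earlier. cortex.compliance.create_policy and its parameters are hypothetical names, not the documented API.

compliance_policy.py (hypothetical API)
# Hypothetical, patterned on cortex.routing.create_policy above
policy = cortex.compliance.create_policy(
    data_residency=["eu-west", "eu-central"],  # only EU infra is eligible
    encryption="aes-256",                      # at rest and in transit
    audit_log=True,                            # log every placement decision
    acl={"teams": ["ml-platform"], "roles": ["inference:run"]},
)

# Defined once, applied globally: the RL router filters to
# eligible infrastructure before making any placement decision.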
06 / OBSERVABILITY

Observability & Telemetry

Full-stack inference observability: request traces, latency histograms, cost attribution per model/team/customer, and real-time drift detection. See exactly where every token goes, what it costs, and how the model is performing.

Export to your existing stack — Datadog, Prometheus, Grafana, OpenTelemetry. Or use the built-in SpinDynamics dashboard. Every metric is available via API for custom alerting and automation.

p99 Latency: 47ms (-12ms vs last week)
Requests / sec: 1.2M (+18% vs last week)
Cost / 1K Inferences: $0.003 (-$0.001 vs last week)
Model Drift Score: 0.02 (stable)
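Pulling the same metrics over the API might look like the sketch below; the client.telemetry.query method and its parameters are illustrative assumptions, not the documented interface.

telemetry_query.py (hypothetical API)
from spindynamics import Cortex

client = Cortex(api_key="sd_live_...")

# Hypothetical query; method and parameter names are illustrative
rows = client.telemetry.query(
    metric="p99_latency_ms",
    group_by=["model", "region"],
    window="7d",
)

for row in rows:
    print(row.model, row.region, row.value)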
SDK & API

Deploy and manage inference with a few lines of code.

Python, Go, TypeScript, and REST. First-class SDKs with full type safety, streaming support, and async-native design.

from spindynamics import Cortex

client = Cortex(api_key="sd_live_...")

# Deploy any model
deployment = client.deploy(
    model="llama-3.1-70b",
    strategy="adaptive",
    regions=["us-*", "eu-*"],
)

# Run inference
response = deployment.infer(
    "Summarize the quarterly earnings report...",
    stream=True,
)

for chunk in response:
    print(chunk.text, end="")

One SDK. Every capability.

The SpinDynamics SDK wraps the entire platform into an ergonomic, well-documented interface. Deploy models, configure routing policies, manage autoscaling, query telemetry, and run inference — all from the same client.

Streaming-first by default. Async-native with sync wrappers. Full type annotations for Python and TypeScript. Generated from our OpenAPI spec — always in sync with the platform.
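As a sketch of that async-native design, assuming a hypothetical AsyncCortex client (the shipped SDK may expose async support differently):

async_infer.py (hypothetical)
import asyncio

from spindynamics import AsyncCortex  # hypothetical async client name

async def main() -> None:
    client = AsyncCortex(api_key="sd_live_...")
    deployment = await client.deploy(model="llama-3.1-70b", strategy="adaptive")

    # Streaming by default: consume chunks as they arrive
    response = await deployment.infer("Summarize the report...", stream=True)
    async for chunk in response:
        print(chunk.text, end="")

asyncio.run(main())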

Python 3.9+
Go 1.21+
TypeScript
REST / cURL
gRPC
Full API reference →

See SpinDynamics in action.

Request a personalized demo with our engineering team.

Request a Demo