Run frontier AI locally.
exo connects your Macs and workstations into one local inference cluster. It finds devices, reads the network, splits models across memory, and gives you normal APIs.
MLX | APPLE SILICON | RDMA | THUNDERBOLT
macOS 26+ · Apache-2.0 · OpenAI, Claude, Responses, and Ollama compatible
Videos
Watch it run.
Real hardware. Real clusters. No cloud account needed.
44,515GitHub stars
3,132GitHub forks
26models on HF exolabs org
58models on 0xSero HF
68,170REAP / quant downloads
8 µslatency with RDMA over Thunderbolt 5
Product
What exo does
Find devices. Read the topology. Place model shards. Serve standard APIs.
01
Find devices
Machines running exo discover each other without hand-built cluster config.
02
Read topology
exo tracks memory, link type, latency, bandwidth, and available compute.
03
Place models
Models can be split across devices instead of fitting on one box.
04
Serve APIs
Use OpenAI, Claude, Responses, or Ollama-compatible clients.
Community
Talk to people who run it
Ask setup questions, share hardware results, and follow new model releases.
Download
GitHub