Run frontier AI locally.

exo connects your Macs and workstations into one local inference cluster. It finds devices, reads the network, splits models across memory, and gives you normal APIs.

DISTRIBUTED INFERENCE | MLX | APPLE SILICON | RDMA | THUNDERBOLT
macOS 26+ · Apache-2.0 · OpenAI, Claude, Responses, and Ollama compatible
Videos

Watch it run.

Real hardware. Real clusters. No cloud account needed.

44,515GitHub stars
3,132GitHub forks
26models on HF exolabs org
58models on 0xSero HF
68,170REAP / quant downloads
8 µslatency with RDMA over Thunderbolt 5
Product

What exo does

Find devices. Read the topology. Place model shards. Serve standard APIs.

01

Find devices

Machines running exo discover each other without hand-built cluster config.

02

Read topology

exo tracks memory, link type, latency, bandwidth, and available compute.

03

Place models

Models can be split across devices instead of fitting on one box.

04

Serve APIs

Use OpenAI, Claude, Responses, or Ollama-compatible clients.

Community

Talk to people who run it

Ask setup questions, share hardware results, and follow new model releases.