Run frontier AI locally.

exo connects your Macs and workstations into one local inference cluster. It finds devices, reads the network, splits models across memory, and gives you normal APIs.

DISTRIBUTED INFERENCE | MLX | APPLE SILICON | RDMA | THUNDERBOLT

Download GitHub Discord Hugging Face X

macOS 26+ · Apache-2.0 · OpenAI, Claude, Responses, and Ollama compatible

Videos

Watch it run.

Real hardware. Real clusters. No cloud account needed.

44,515GitHub stars

3,132GitHub forks

26models on HF exolabs org

58models on 0xSero HF

68,170REAP / quant downloads

8 µslatency with RDMA over Thunderbolt 5

Product

What exo does

Find devices. Read the topology. Place model shards. Serve standard APIs.

Find devices

Machines running exo discover each other without hand-built cluster config.

Read topology

exo tracks memory, link type, latency, bandwidth, and available compute.

Place models

Models can be split across devices instead of fitting on one box.

Serve APIs

Use OpenAI, Claude, Responses, or Ollama-compatible clients.

Community

Talk to people who run it

Ask setup questions, share hardware results, and follow new model releases.

Discord

Join the serverAsk setup questions, share hardware results, and follow new model releases.

@exolabsDemos, release notes, and project updates.

HF org

EXO LabsOfficial model releases and future reviewed artifacts.

HF proof

0xSeroModels, datasets, Spaces, and compression work.