# PanGalactic Documentation
PanGalactic is a distributed LLM inference framework that pools multiple consumer-grade GPUs over a standard 10 Gbps Ethernet network. A cluster of RTX 4090s can jointly serve a 70B-parameter model that no single card could hold on its own.
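The "no single card" claim follows from simple VRAM arithmetic. The sketch below is a back-of-envelope estimate, not PanGalactic's actual sizing logic; the quantization width and overhead factor are illustrative assumptions (roughly 4-bit GGUF weights plus headroom for KV cache and activations).

```python
import math

GPU_VRAM_GB = 24        # RTX 4090
PARAMS_B = 70           # 70B-parameter model
BYTES_PER_PARAM = 0.5   # ~4-bit quantization (illustrative assumption)
OVERHEAD = 1.2          # headroom for KV cache and activations (assumed)

weights_gb = PARAMS_B * BYTES_PER_PARAM   # 35 GB of weights alone
needed_gb = weights_gb * OVERHEAD         # 42 GB with headroom
min_nodes = math.ceil(needed_gb / GPU_VRAM_GB)

print(f"{weights_gb:.0f} GB weights, {needed_gb:.0f} GB needed, "
      f"minimum {min_nodes} x 24 GB GPUs")  # → minimum 2 GPUs
```

Even at 4-bit quantization the weights alone (~35 GB) exceed a single 24 GB card, so at least two nodes are required; see Node Sizing for the real per-model requirements.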
## Architecture
Understand how PanGalactic works internally.
| Document | What it covers |
|---|---|
| System Overview | Three-process model (Hub / Node / Gateway), control plane vs data plane, design philosophy |
| Pipeline Parallelism | Why pipeline (not tensor) parallelism; bandwidth math; continuous batching |
| StarStream Protocol | Binary frame format; HELLO handshake; Nebula compression negotiation |
| Core Algorithms | Shard assignment, Quasar routing, Supernova failover, Pulsar scheduling |
| KV Cache | PagedAttention block design; LRU eviction; multi-turn reuse |
## Operator Runbook
Everything you need to deploy and operate a cluster.
| Document | What it covers |
|---|---|
| Quickstart | Stand up a local dev cluster in under 5 minutes |
| Node Sizing | VRAM requirements by model; minimum node count; GPU labelling |
| Production Deployment | Kubernetes walkthrough; etcd HA; HPA; Prometheus |
| Model Management | Download, load, unload models; GGUF storage layout; CLI reference |
| Troubleshooting | Common failures, log signatures, health endpoints |
## Quick Reference
```shell
pangalactic hub start            # Start the Hub orchestrator
pangalactic node start           # Start a Node agent
pangalactic gateway start        # Start the OpenAI-compatible API gateway
pangalactic model download <id>  # Download a GGUF model
pangalactic cluster status       # Show cluster health
pangalactic cluster topology     # Show shard-to-node assignments
```
- API endpoint: `POST http://localhost:8080/v1/chat/completions`
- Grafana dashboard: http://localhost:3000 (admin / pangalactic)
- Prometheus: http://localhost:9090
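Since the gateway is OpenAI-compatible, any OpenAI-style client can call it. The sketch below builds a chat completion request with only the standard library; the model id and prompt are placeholders, not names PanGalactic ships with — substitute a model you have loaded.

```python
import json
import urllib.request

# Build an OpenAI-style chat completion request for the local gateway.
payload = {
    "model": "my-70b-model",  # placeholder: use a loaded model's id
    "messages": [{"role": "user", "content": "Hello from PanGalactic!"}],
}
req = urllib.request.Request(
    "http://localhost:8080/v1/chat/completions",
    data=json.dumps(payload).encode("utf-8"),
    headers={"Content-Type": "application/json"},
)

# Sending the request requires a running gateway:
# with urllib.request.urlopen(req) as resp:
#     reply = json.load(resp)
#     print(reply["choices"][0]["message"]["content"])
```

Because the wire format matches OpenAI's Chat Completions API, existing SDKs work too if you point their base URL at `http://localhost:8080/v1`.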