# PanGalactic Documentation

PanGalactic is a distributed LLM inference framework that pools multiple consumer-grade GPUs across machines over standard 10 Gbps Ethernet. A cluster of RTX 4090s can jointly serve a 70B-parameter model that no single card has the VRAM to hold.


## Architecture

Understand how PanGalactic works internally.

| Document | What it covers |
|----------|----------------|
| System Overview | Three-process model (Hub / Node / Gateway); control plane vs. data plane; design philosophy |
| Pipeline Parallelism | Why pipeline (not tensor) parallelism; bandwidth math; continuous batching |
| StarStream Protocol | Binary frame format; HELLO handshake; Nebula compression negotiation |
| Core Algorithms | Shard assignment, Quasar routing, Supernova failover, Pulsar scheduling |
| KV Cache | PagedAttention block design; LRU eviction; multi-turn reuse |

## Operator Runbook

Everything you need to deploy and operate a cluster.

| Document | What it covers |
|----------|----------------|
| Quickstart | Stand up a local dev cluster in under 5 minutes |
| Node Sizing | VRAM requirements by model; minimum node count; GPU labelling |
| Production Deployment | Kubernetes walkthrough; etcd HA; HPA; Prometheus |
| Model Management | Download, load, unload models; GGUF storage layout; CLI reference |
| Troubleshooting | Common failures, log signatures, health endpoints |

## Quick Reference

```shell
pangalactic hub start            # Start the Hub orchestrator
pangalactic node start           # Start a Node agent
pangalactic gateway start        # Start the OpenAI-compatible API gateway
pangalactic model download <id>  # Download a GGUF model
pangalactic cluster status       # Show cluster health
pangalactic cluster topology     # Show shard-to-node assignments
```

- API endpoint: `POST http://localhost:8080/v1/chat/completions`
- Grafana dashboard: `http://localhost:3000` (login: admin / pangalactic)
- Prometheus: `http://localhost:9090`
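Since the gateway exposes an OpenAI-compatible API, a standard chat-completion request should work against it. The sketch below assumes a running cluster with a model already loaded; the model id `llama-3-70b` is illustrative, not a name the docs guarantee.

```shell
# Minimal chat-completion request against a local PanGalactic gateway.
# Assumes the gateway is running on localhost:8080 and a model is loaded;
# the model id below is a placeholder -- substitute one you have loaded.
curl -s http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "llama-3-70b",
    "messages": [
      {"role": "user", "content": "Say hello in one sentence."}
    ]
  }'
```

Any OpenAI client library pointed at `http://localhost:8080/v1` as its base URL should work the same way.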