# PanGalactic Documentation

PanGalactic is a distributed LLM inference framework that pools multiple consumer-grade GPUs across machines over standard 10 Gbps Ethernet. A cluster of RTX 4090s can jointly serve a 70B-parameter model that no single card has the VRAM to hold.


## Architecture

Understand how PanGalactic works internally.

| Document | What it covers |
|----------|----------------|
| System Overview | Three-process model (Hub / Node / Gateway); control plane vs. data plane; design philosophy |
| Pipeline Parallelism | Why pipeline (not tensor) parallelism; bandwidth math; continuous batching |
| StarStream Protocol | Binary frame format; HELLO handshake; Nebula compression negotiation |
| Core Algorithms | Shard assignment, Quasar routing, Supernova failover, Pulsar scheduling |
| KV Cache | PagedAttention block design; LRU eviction; multi-turn reuse |

## Operator Runbook

Everything you need to deploy and operate a cluster.

| Document | What it covers |
|----------|----------------|
| Quickstart | Stand up a local dev cluster in under 5 minutes |
| Node Sizing | VRAM requirements by model; minimum node count; GPU labelling |
| Production Deployment | Kubernetes walkthrough; etcd HA; HPA; Prometheus |
| Model Management | Download, load, unload models; GGUF storage layout; CLI reference |
| Troubleshooting | Common failures, log signatures, health endpoints |

## Quick Reference

```shell
pangalactic hub start            # Start the Hub orchestrator
pangalactic node start           # Start a Node agent
pangalactic gateway start        # Start the OpenAI-compatible API gateway
pangalactic model download <id>  # Download a GGUF model
pangalactic cluster status       # Show cluster health
pangalactic cluster topology     # Show shard-to-node assignments
```

- API endpoint: `POST http://localhost:8080/v1/chat/completions`
- Grafana dashboard: `http://localhost:3000` (login: admin / pangalactic)
- Prometheus: `http://localhost:9090`
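Since the gateway exposes an OpenAI-compatible API, a standard chat-completion request should work against it. The sketch below assumes a running cluster with a model already loaded; the model id `llama-3-70b` is illustrative, not a name the docs guarantee.

```shell
# Minimal chat-completion request against a local PanGalactic gateway.
# Assumes the gateway is running on localhost:8080 and a model is loaded;
# the model id below is a placeholder -- substitute one you have loaded.
curl -s http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "llama-3-70b",
    "messages": [
      {"role": "user", "content": "Say hello in one sentence."}
    ]
  }'
```

Any OpenAI client library pointed at `http://localhost:8080/v1` as its base URL should work the same way.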