# Troubleshooting

## Quick Health Check

Before investigating any specific issue, run the full health sweep:
```bash
# Gateway health
curl http://localhost:8080/health

# Cluster status
pangalactic cluster status

# Node details
pangalactic node info --node-addr localhost:7200

# Recent Hub logs
docker compose logs hub --tail=50
# or in K8s:
kubectl logs deployment/pangalactic-hub -n pangalactic --tail=50
```
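If you run this sweep often, the HTTP part can be scripted with the standard library alone. A minimal sketch, assuming the default Gateway address used in this guide (`localhost:8080`); the function name is illustrative:

```python
import urllib.request
import urllib.error

def check_endpoint(url: str, timeout: float = 2.0):
    """Return (True, body) if the endpoint answers HTTP 200, else (False, reason)."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status == 200, resp.read().decode()
    except (urllib.error.URLError, OSError) as exc:
        return False, str(exc)

if __name__ == "__main__":
    ok, detail = check_endpoint("http://localhost:8080/health")
    print("gateway:", "OK" if ok else f"DOWN ({detail})")
```

A failed probe here tells you only that the Gateway is unreachable; the CLI and log checks above narrow down which component is actually at fault.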
## Hub Unreachable

Symptom: Nodes log `node.registration_failed` or `hub_client.route_plan_error`. The Gateway returns 503.

Log signature:

```json
{"event": "node.registration_failed", "error": "StatusCode.UNAVAILABLE: failed to connect to all addresses"}
```
Causes and fixes:
| Cause | Fix |
|---|---|
| Hub not started | `docker compose up hub` or `kubectl rollout restart deployment/pangalactic-hub` |
| Wrong `NODE_HUB_ADDR` | Verify the env var matches the Hub's bind address and port |
| Firewall blocking port 7100 | Open TCP 7100 between nodes and the Hub |
| Hub crashed (check logs) | `docker compose logs hub` — look for a Python traceback |
| gRPC stubs not generated | Run `make proto` and rebuild images |
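To separate "Hub process down" from "network path blocked", a plain TCP probe against the Hub port is often enough. A sketch using only the standard library; the host and port below are the defaults from this guide, so adjust them to your `NODE_HUB_ADDR`:

```python
import socket

def tcp_reachable(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

if __name__ == "__main__":
    # Default Hub gRPC port from this guide
    print("hub reachable:", tcp_reachable("localhost", 7100))
```

If the TCP handshake succeeds but registration still fails, the problem is above the transport layer (gRPC stubs, Hub crash loop), not the firewall.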
## gRPC Stubs Not Generated

Symptom: Log warning `node.grpc_stubs_missing` or `hub.grpc_stubs_missing`. The system falls back to a limited mode.

Log signature:

```json
{"event": "node.grpc_stubs_missing", "note": "Run 'make proto' to enable gRPC"}
```
Fix:

```bash
pip install grpcio-tools
make proto
```

This runs `scripts/gen_proto.sh`, which generates Python stubs into `pangalactic/_grpc/`. Rebuild and restart all containers if running in Docker.
Without generated stubs, the Hub and Nodes can still start but will not communicate over gRPC. Most operations are no-ops.
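This kind of fallback typically comes from a guarded import. A hypothetical sketch — the module path `pangalactic._grpc` is taken from the text above, but the stub module names, flag, and function are illustrative, not the project's actual code:

```python
try:
    # Generated by `make proto`; absent until the stubs are built.
    from pangalactic._grpc import hub_pb2_grpc
    GRPC_AVAILABLE = True
except ImportError:
    # Stubs missing: components can still start, but gRPC calls become no-ops.
    hub_pb2_grpc = None
    GRPC_AVAILABLE = False

def make_hub_stub(channel):
    """Illustrative guard: skip gRPC work entirely when stubs are absent."""
    if not GRPC_AVAILABLE:
        return None  # no-op, matching the limited mode described above
    return hub_pb2_grpc.HubStub(channel)
```

The pattern explains the observed behaviour: startup succeeds, but every operation that would touch the missing stub silently does nothing until `make proto` is run.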
## Node Registration Fails / No Shards Assigned

Symptom: The node starts, but `pangalactic cluster topology` shows no assignments.
Steps:

1. Check the node registered successfully:

   ```bash
   docker compose logs node-0 --tail=20 | grep "node.registered"
   ```

   Expected:

   ```json
   {"event": "node.registered", "hub": "hub:7100"}
   ```

2. Check that the Hub received the registration:

   ```bash
   docker compose logs hub --tail=20 | grep "hub.node_registered"
   ```

3. Check whether a model has been loaded:

   ```bash
   pangalactic model list
   ```

   Shard assignment only happens after a model is registered.

4. Check VRAM: if the node reports 0 VRAM (simulated mode not enabled, no real GPU), the Hub may reject the assignment.

   ```bash
   docker compose logs node-0 | grep "vram"
   ```
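The grep-based checks above match raw substrings; when log lines get noisy, matching the structured `event` field is more precise. A small sketch assuming the JSON log shape shown throughout this guide:

```python
import json

def find_events(lines, event_name):
    """Return parsed log records whose 'event' field matches exactly."""
    hits = []
    for line in lines:
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip non-JSON lines (startup banners, stack traces)
        if record.get("event") == event_name:
            hits.append(record)
    return hits

if __name__ == "__main__":
    sample = [
        '{"event": "node.registered", "hub": "hub:7100"}',
        "plain text line",
    ]
    print(find_events(sample, "node.registered"))
```

Pipe `docker compose logs --no-log-prefix` into a script built around this to avoid false positives from event names appearing inside error messages.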
## VRAM Fragmentation / `InsufficientVRAMError`

Symptom: `pangalactic model load` fails. The Hub log shows a fragmentation error.

Log signature:

```json
{"event": "hub.shard_assignment_failed", "error": "VRAM fragmentation: shard llama3-70b:0 requires 5.2 GB but best node has only 3.1 GB free"}
```
Causes:

- Another model is already loaded and consuming most VRAM
- Node count is too low for the model size (see Node Sizing)
- `VRAM_FILL_FACTOR` is too conservative for your workload
Fixes:

```bash
# Unload another model to free VRAM
pangalactic model unload <other-model-id>

# Add more nodes
pangalactic node start --simulated-vram-gb 24   # for testing

# Reduce fill factor (not recommended for production)
export HUB_VRAM_FILL_FACTOR=0.90
```
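The failure mode is easier to reason about with the placement arithmetic written out. A deliberately simplified best-fit sketch — the real Hub scheduler is more involved, and the fill-factor semantics are assumed from the env var above:

```python
def pick_node(free_vram_gb, shard_gb, fill_factor=0.95):
    """Best-fit placement: smallest node that still fits the shard.

    Only `fill_factor` of each node's free VRAM is usable. Returns the
    node index, or None when no single node can host the shard — the
    'VRAM fragmentation' case: total free VRAM may exceed the shard size,
    but it is split across nodes.
    """
    candidates = [
        (free, i) for i, free in enumerate(free_vram_gb)
        if free * fill_factor >= shard_gb
    ]
    return min(candidates)[1] if candidates else None

if __name__ == "__main__":
    # 8.2 GB free in total, but no single node can take a 5.2 GB shard.
    print(pick_node([3.1, 2.8, 2.3], 5.2))  # fragmentation: no placement
    print(pick_node([3.1, 6.0, 2.3], 5.2))  # fits on the 6.0 GB node
```

This is why unloading a model or adding one sufficiently large node fixes the error even when aggregate free VRAM looked adequate.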
## StarStream Connection Refused / Tensor Not Delivered

Symptom: Requests time out. Node logs show connection errors. No tokens are generated.

Log signature:

```json
{"event": "starstream.connect_failed", "addr": "192.168.1.5:7201", "error": "Connection refused"}
```
Causes and fixes:
| Cause | Fix |
|---|---|
| StarStream port not exposed | Check `NODE_STARSTREAM_BIND` and the host firewall on TCP port 7201 |
| Wrong public addr | Set `NODE_PUBLIC_ADDR` to the node's externally reachable IP |
| Transport not started | Check node startup logs for `node.starstream_ready` |
| Docker network isolation | Ensure `node-0` and `node-1` are on the same Docker network |
Verify connectivity:

```bash
# From node-0's container, test node-1's StarStream port
docker compose exec node-0 nc -zv node-1 7201
```
## Hot-Swap Drain Timeout

Symptom: `HotSwapShard` completes with stage `DONE`, but `in_flight_requests` was still > 0 at swap time.

Log signature:

```json
{"event": "hot_swap.drain_timeout", "in_flight": 3, "shard_id": "llama3-8b:0"}
```
This is a warning, not an error. It means the drain timeout elapsed with requests still active, so the swap proceeded anyway. The in-flight requests may produce slightly inconsistent output for the generation steps that straddle the shard swap.
Fix: Increase `drain_timeout_s` in the `HotSwapShard` request, or reduce the traffic rate before triggering a swap.
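The drain step can be pictured as a bounded wait on the in-flight counter. A simplified sketch — the names mirror the log fields above, but the actual swap logic lives in the Hub and is not reproduced here:

```python
import time

def drain(get_in_flight, drain_timeout_s: float, poll_s: float = 0.05):
    """Wait until in-flight requests reach zero or the timeout elapses.

    Returns the remaining in-flight count. A return value > 0 corresponds
    to the 'hot_swap.drain_timeout' warning: the swap proceeds anyway.
    """
    deadline = time.monotonic() + drain_timeout_s
    while time.monotonic() < deadline:
        if get_in_flight() == 0:
            return 0
        time.sleep(poll_s)
    return get_in_flight()
```

Raising `drain_timeout_s` widens the window for the counter to hit zero; lowering traffic before the swap shrinks the counter itself. Either removes the warning.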
## Gateway Returns 503 "No Route Available"

Symptom: Chat completions fail with:

```json
{"error": {"message": "No route available — cluster may be loading", "type": "server_error"}}
```
Causes:

- No model is loaded yet — run `pangalactic model load <model-id>`
- Hub is unreachable from the Gateway — check `GATEWAY_HUB_ADDR`
- All pipelines for the requested model are degraded — check `pangalactic cluster topology`
- gRPC stubs not generated — the Gateway falls back to a synthetic route plan in dev mode; check logs
## Gateway Returns 401 Unauthorized

```json
{"error": {"message": "Missing or invalid Authorization header", "type": "invalid_request_error"}}
```

Include the `Authorization: Bearer <key>` header. In dev mode, the API key is `dev-key-not-for-production` (set in `docker-compose.yml`). In production, use the key from the `pangalactic-gateway-secret` Kubernetes secret.

To disable auth entirely (dev only), set `GATEWAY_API_KEY=""`.
## GPU Not Detected

Symptom: Node logs show `node.using_cpu_fallback`, or VRAM shows 0 GB.

Log signature:

```json
{"event": "utils.gpu_detection", "result": "cpu_fallback", "reason": "CUDA not available"}
```
Fixes:

- Verify CUDA is installed: `nvidia-smi`
- Check PyTorch sees the GPU: `python -c "import torch; print(torch.cuda.device_count())"`
- In Docker: pass `--gpus all` or use the `nvidia/cuda` base image with the NVIDIA container runtime
- In Kubernetes: verify the NVIDIA GPU operator is installed and the node has an `nvidia.com/gpu: 1` resource
For development without a GPU, use simulated mode:

```bash
export NODE_SIMULATED_VRAM_GB=24
export NODE_GPU_COMPUTE_TYPE=cpu
export NODE_INFERENCE_BACKEND=mock
```
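How a node might interpret these variables can be sketched as follows. The env var names come from this guide, but the parsing logic and defaults are illustrative, not the project's actual implementation:

```python
import os

def node_gpu_config(env=os.environ):
    """Resolve GPU settings from the environment, with simulated-mode fallback."""
    simulated_gb = float(env.get("NODE_SIMULATED_VRAM_GB", "0") or "0")
    return {
        "vram_gb": simulated_gb,  # 0 means: probe a real GPU instead
        "compute": env.get("NODE_GPU_COMPUTE_TYPE", "cuda"),
        "backend": env.get("NODE_INFERENCE_BACKEND", "auto"),
        "simulated": simulated_gb > 0,
    }

if __name__ == "__main__":
    cfg = node_gpu_config({
        "NODE_SIMULATED_VRAM_GB": "24",
        "NODE_GPU_COMPUTE_TYPE": "cpu",
        "NODE_INFERENCE_BACKEND": "mock",
    })
    print(cfg)
```

With all three variables set as above, a node reports 24 GB of simulated VRAM and never touches CUDA, which is why the GPU-not-detected warnings disappear in this mode.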
## High GPU Temperature Warning

Symptom: Log warning `telemetry.gpu_high_temp` or `telemetry.gpu_critical_temp`.

```json
{"event": "telemetry.gpu_critical_temp", "node_id": "node-0", "gpu_index": 0, "temp_celsius": 96}
```
Thresholds:
- Warning: ≥ 85°C
- Critical: ≥ 95°C
Actions:

- Check case airflow and the GPU cooler
- Reduce `max_concurrent_requests` on the Gateway to lower GPU load
- Consider thermal throttling — most cards auto-throttle at 83–87°C
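The thresholds above map directly onto a tiny classifier. A sketch using the event names from this guide; the function itself is illustrative:

```python
def classify_gpu_temp(temp_celsius: float):
    """Map a GPU temperature to the telemetry event severity, if any."""
    if temp_celsius >= 95:
        return "telemetry.gpu_critical_temp"
    if temp_celsius >= 85:
        return "telemetry.gpu_high_temp"
    return None  # nominal, no event emitted

if __name__ == "__main__":
    print(classify_gpu_temp(96))  # matches the critical log line above
```

Note the boundaries are inclusive (≥ 85°C warns, ≥ 95°C is critical), so a card sitting exactly at a threshold will emit the corresponding event.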
## Logs Reference
All components use structured JSON logging. Key fields:
| Field | Description |
|---|---|
| `event` | Dot-separated event name (e.g. `node.registered`) |
| `node_id` | 8-character truncated UUID of the emitting node |
| `shard_id` | `<model_id>:<shard_index>` |
| `error` | Error message (present on warning/error events) |
| `timestamp` | ISO 8601 UTC timestamp |
To filter logs by event prefix:

```bash
docker compose logs hub | grep '"event": "hub.failover'
```
To get all error/warning events:

```bash
docker compose logs --no-log-prefix 2>&1 | python3 -c "
import sys, json
for line in sys.stdin:
    try:
        e = json.loads(line)
    except json.JSONDecodeError:
        continue
    if e.get('level') in ('error', 'warning'):
        print(json.dumps(e))
"
```
## See Also
- Quickstart — basic setup verification
- Deployment — production configuration reference
- System Overview — understand what each component does