# Troubleshooting

## Quick Health Check

Before investigating any specific issue, run the full health sweep:
```bash
# Gateway health
curl http://localhost:8080/health

# Cluster status
pangalactic cluster status

# Node details
pangalactic node info --node-addr localhost:7200

# Recent Hub logs
docker compose logs hub --tail=50
# or in K8s:
kubectl logs deployment/pangalactic-hub -n pangalactic --tail=50
```
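If you run this sweep often, the HTTP part can be scripted with the standard library alone. A minimal sketch, assuming the default Gateway address used in this guide (`localhost:8080`); the function name is illustrative:

```python
import urllib.request
import urllib.error

def check_endpoint(url: str, timeout: float = 2.0):
    """Return (True, body) if the endpoint answers HTTP 200, else (False, reason)."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status == 200, resp.read().decode()
    except (urllib.error.URLError, OSError) as exc:
        return False, str(exc)

if __name__ == "__main__":
    ok, detail = check_endpoint("http://localhost:8080/health")
    print("gateway:", "OK" if ok else f"DOWN ({detail})")
```

A failed probe here tells you only that the Gateway is unreachable; the CLI and log checks above narrow down which component is actually at fault.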
## Hub Unreachable

Symptom: Nodes log `node.registration_failed` or `hub_client.route_plan_error`. The Gateway returns 503.

Log signature:

```json
{"event": "node.registration_failed", "error": "StatusCode.UNAVAILABLE: failed to connect to all addresses"}
```
Causes and fixes:
| Cause | Fix |
|---|---|
| Hub not started | `docker compose up hub` or `kubectl rollout restart deployment/pangalactic-hub` |
| Wrong `NODE_HUB_ADDR` | Verify the env var matches the Hub's bind address and port |
| Firewall blocking port 7100 | Open TCP 7100 between nodes and the Hub |
| Hub crashed (check logs) | `docker compose logs hub` — look for a Python traceback |
| gRPC stubs not generated | Run `make proto` and rebuild images |
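To separate "Hub process down" from "network path blocked", a plain TCP probe against the Hub port is often enough. A sketch using only the standard library; the host and port below are the defaults from this guide, so adjust them to your `NODE_HUB_ADDR`:

```python
import socket

def tcp_reachable(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

if __name__ == "__main__":
    # Default Hub gRPC port from this guide
    print("hub reachable:", tcp_reachable("localhost", 7100))
```

If the TCP handshake succeeds but registration still fails, the problem is above the transport layer (gRPC stubs, Hub crash loop), not the firewall.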
## gRPC Stubs Not Generated

Symptom: Log warning `node.grpc_stubs_missing` or `hub.grpc_stubs_missing`. The system falls back to a limited mode.

Log signature:

```json
{"event": "node.grpc_stubs_missing", "note": "Run 'make proto' to enable gRPC"}
```
Fix:

```bash
pip install grpcio-tools
make proto
```

This runs `scripts/gen_proto.sh`, which generates Python stubs into `pangalactic/_grpc/`. Rebuild and restart all containers if running in Docker.
Without generated stubs, the Hub and Nodes can still start but will not communicate over gRPC. Most operations are no-ops.
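This kind of fallback typically comes from a guarded import. A hypothetical sketch — the module path `pangalactic._grpc` is taken from the text above, but the stub module names, flag, and function are illustrative, not the project's actual code:

```python
try:
    # Generated by `make proto`; absent until the stubs are built.
    from pangalactic._grpc import hub_pb2_grpc
    GRPC_AVAILABLE = True
except ImportError:
    # Stubs missing: components can still start, but gRPC calls become no-ops.
    hub_pb2_grpc = None
    GRPC_AVAILABLE = False

def make_hub_stub(channel):
    """Illustrative guard: skip gRPC work entirely when stubs are absent."""
    if not GRPC_AVAILABLE:
        return None  # no-op, matching the limited mode described above
    return hub_pb2_grpc.HubStub(channel)
```

The pattern explains the observed behaviour: startup succeeds, but every operation that would touch the missing stub silently does nothing until `make proto` is run.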
## Node Registration Fails / No Shards Assigned

Symptom: The node starts, but `pangalactic cluster topology` shows no assignments.
Steps:

1. Check the node registered successfully:

   ```bash
   docker compose logs node-0 --tail=20 | grep "node.registered"
   ```

   Expected:

   ```json
   {"event": "node.registered", "hub": "hub:7100"}
   ```

2. Check that the Hub received the registration:

   ```bash
   docker compose logs hub --tail=20 | grep "hub.node_registered"
   ```

3. Check whether a model has been loaded:

   ```bash
   pangalactic model list
   ```

   Shard assignment only happens after a model is registered.

4. Check VRAM: if the node reports 0 VRAM (simulated mode not enabled, no real GPU), the Hub may reject the assignment.

   ```bash
   docker compose logs node-0 | grep "vram"
   ```
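The grep-based checks above match raw substrings; when log lines get noisy, matching the structured `event` field is more precise. A small sketch assuming the JSON log shape shown throughout this guide:

```python
import json

def find_events(lines, event_name):
    """Return parsed log records whose 'event' field matches exactly."""
    hits = []
    for line in lines:
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip non-JSON lines (startup banners, stack traces)
        if record.get("event") == event_name:
            hits.append(record)
    return hits

if __name__ == "__main__":
    sample = [
        '{"event": "node.registered", "hub": "hub:7100"}',
        "plain text line",
    ]
    print(find_events(sample, "node.registered"))
```

Pipe `docker compose logs --no-log-prefix` into a script built around this to avoid false positives from event names appearing inside error messages.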
## VRAM Fragmentation / `InsufficientVRAMError`

Symptom: `pangalactic model load` fails. The Hub log shows a fragmentation error.

Log signature:

```json
{"event": "hub.shard_assignment_failed", "error": "VRAM fragmentation: shard llama3-70b:0 requires 5.2 GB but best node has only 3.1 GB free"}
```
Causes:

- Another model is already loaded and consuming most VRAM
- Node count is too low for the model size (see Node Sizing)
- `VRAM_FILL_FACTOR` is too conservative for your workload
Fixes:

```bash
# Unload another model to free VRAM
pangalactic model unload <other-model-id>

# Add more nodes
pangalactic node start --simulated-vram-gb 24   # for testing

# Reduce fill factor (not recommended for production)
export HUB_VRAM_FILL_FACTOR=0.90
```
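The failure mode is easier to reason about with the placement arithmetic written out. A deliberately simplified best-fit sketch — the real Hub scheduler is more involved, and the fill-factor semantics are assumed from the env var above:

```python
def pick_node(free_vram_gb, shard_gb, fill_factor=0.95):
    """Best-fit placement: smallest node that still fits the shard.

    Only `fill_factor` of each node's free VRAM is usable. Returns the
    node index, or None when no single node can host the shard — the
    'VRAM fragmentation' case: total free VRAM may exceed the shard size,
    but it is split across nodes.
    """
    candidates = [
        (free, i) for i, free in enumerate(free_vram_gb)
        if free * fill_factor >= shard_gb
    ]
    return min(candidates)[1] if candidates else None

if __name__ == "__main__":
    # 8.2 GB free in total, but no single node can take a 5.2 GB shard.
    print(pick_node([3.1, 2.8, 2.3], 5.2))  # fragmentation: no placement
    print(pick_node([3.1, 6.0, 2.3], 5.2))  # fits on the 6.0 GB node
```

This is why unloading a model or adding one sufficiently large node fixes the error even when aggregate free VRAM looked adequate.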
## StarStream Connection Refused / Tensor Not Delivered

Symptom: Requests time out. Node logs show connection errors. No tokens are generated.

Log signature:

```json
{"event": "starstream.connect_failed", "addr": "192.168.1.5:7201", "error": "Connection refused"}
```
Causes and fixes:
| Cause | Fix |
|---|---|
| StarStream port not exposed | Check `NODE_STARSTREAM_BIND` and the host firewall on TCP port 7201 |
| Wrong public addr | Set `NODE_PUBLIC_ADDR` to the node's externally reachable IP |
| Transport not started | Check node startup logs for `node.starstream_ready` |
| Docker network isolation | Ensure `node-0` and `node-1` are on the same Docker network |
Verify connectivity:

```bash
# From node-0's container, test node-1's StarStream port
docker compose exec node-0 nc -zv node-1 7201
```
## Hot-Swap Drain Timeout

Symptom: `HotSwapShard` completes with stage `DONE`, but `in_flight_requests` was still > 0 at swap time.

Log signature:

```json
{"event": "hot_swap.drain_timeout", "in_flight": 3, "shard_id": "llama3-8b:0"}
```
This is a warning, not an error. It means the drain timeout elapsed with requests still active, so the swap proceeded anyway. The in-flight requests may produce slightly inconsistent output for the generation steps that straddle the shard swap.
Fix: Increase `drain_timeout_s` in the `HotSwapShard` request, or reduce the traffic rate before triggering a swap.
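The drain step can be pictured as a bounded wait on the in-flight counter. A simplified sketch — the names mirror the log fields above, but the actual swap logic lives in the Hub and is not reproduced here:

```python
import time

def drain(get_in_flight, drain_timeout_s: float, poll_s: float = 0.05):
    """Wait until in-flight requests reach zero or the timeout elapses.

    Returns the remaining in-flight count. A return value > 0 corresponds
    to the 'hot_swap.drain_timeout' warning: the swap proceeds anyway.
    """
    deadline = time.monotonic() + drain_timeout_s
    while time.monotonic() < deadline:
        if get_in_flight() == 0:
            return 0
        time.sleep(poll_s)
    return get_in_flight()
```

Raising `drain_timeout_s` widens the window for the counter to hit zero; lowering traffic before the swap shrinks the counter itself. Either removes the warning.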
## Gateway Returns 503 "No Route Available"

Symptom: Chat completions fail with:

```json
{"error": {"message": "No route available — cluster may be loading", "type": "server_error"}}
```
Causes:

- No model is loaded yet — run `pangalactic model load <model-id>`
- Hub is unreachable from the Gateway — check `GATEWAY_HUB_ADDR`
- All pipelines for the requested model are degraded — check `pangalactic cluster topology`
- gRPC stubs not generated — the Gateway falls back to a synthetic route plan in dev mode; check logs
## Gateway Returns 401 Unauthorized

```json
{"error": {"message": "Missing or invalid Authorization header", "type": "invalid_request_error"}}
```

Include the `Authorization: Bearer <key>` header. In dev mode, the API key is `dev-key-not-for-production` (set in `docker-compose.yml`). In production, use the key from the `pangalactic-gateway-secret` Kubernetes secret.

To disable auth entirely (dev only), set `GATEWAY_API_KEY=""`.
## GPU Not Detected

Symptom: Node logs show `node.using_cpu_fallback`, or VRAM shows 0 GB.

Log signature:

```json
{"event": "utils.gpu_detection", "result": "cpu_fallback", "reason": "CUDA not available"}
```
Fixes:

- Verify CUDA is installed: `nvidia-smi`
- Check PyTorch sees the GPU: `python -c "import torch; print(torch.cuda.device_count())"`
- In Docker: pass `--gpus all` or use the `nvidia/cuda` base image with the NVIDIA container runtime
- In Kubernetes: verify the NVIDIA GPU operator is installed and the node has an `nvidia.com/gpu: 1` resource
For development without a GPU, use simulated mode:

```bash
export NODE_SIMULATED_VRAM_GB=24
export NODE_GPU_COMPUTE_TYPE=cpu
export NODE_INFERENCE_BACKEND=mock
```
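How a node might interpret these variables can be sketched as follows. The env var names come from this guide, but the parsing logic and defaults are illustrative, not the project's actual implementation:

```python
import os

def node_gpu_config(env=os.environ):
    """Resolve GPU settings from the environment, with simulated-mode fallback."""
    simulated_gb = float(env.get("NODE_SIMULATED_VRAM_GB", "0") or "0")
    return {
        "vram_gb": simulated_gb,  # 0 means: probe a real GPU instead
        "compute": env.get("NODE_GPU_COMPUTE_TYPE", "cuda"),
        "backend": env.get("NODE_INFERENCE_BACKEND", "auto"),
        "simulated": simulated_gb > 0,
    }

if __name__ == "__main__":
    cfg = node_gpu_config({
        "NODE_SIMULATED_VRAM_GB": "24",
        "NODE_GPU_COMPUTE_TYPE": "cpu",
        "NODE_INFERENCE_BACKEND": "mock",
    })
    print(cfg)
```

With all three variables set as above, a node reports 24 GB of simulated VRAM and never touches CUDA, which is why the GPU-not-detected warnings disappear in this mode.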
## High GPU Temperature Warning

Symptom: Log warning `telemetry.gpu_high_temp` or `telemetry.gpu_critical_temp`.

```json
{"event": "telemetry.gpu_critical_temp", "node_id": "node-0", "gpu_index": 0, "temp_celsius": 96}
```
Thresholds:
- Warning: ≥ 85°C
- Critical: ≥ 95°C
Actions:

- Check case airflow and the GPU cooler
- Reduce `max_concurrent_requests` on the Gateway to lower GPU load
- Consider thermal throttling — most cards auto-throttle at 83–87°C
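The thresholds above map directly onto a tiny classifier. A sketch using the event names from this guide; the function itself is illustrative:

```python
def classify_gpu_temp(temp_celsius: float):
    """Map a GPU temperature to the telemetry event severity, if any."""
    if temp_celsius >= 95:
        return "telemetry.gpu_critical_temp"
    if temp_celsius >= 85:
        return "telemetry.gpu_high_temp"
    return None  # nominal, no event emitted

if __name__ == "__main__":
    print(classify_gpu_temp(96))  # matches the critical log line above
```

Note the boundaries are inclusive (≥ 85°C warns, ≥ 95°C is critical), so a card sitting exactly at a threshold will emit the corresponding event.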
## Logs Reference
All components use structured JSON logging. Key fields:
| Field | Description |
|---|---|
| `event` | Dot-separated event name (e.g. `node.registered`) |
| `node_id` | 8-character truncated UUID of the emitting node |
| `shard_id` | `<model_id>:<shard_index>` |
| `error` | Error message (present on warning/error events) |
| `timestamp` | ISO 8601 UTC timestamp |
To filter logs by event prefix:

```bash
docker compose logs hub | grep '"event": "hub.failover'
```
To get all error/warning events:

```bash
docker compose logs --no-log-prefix 2>&1 | python3 -c "
import sys, json
for line in sys.stdin:
    try:
        e = json.loads(line)
    except json.JSONDecodeError:
        continue
    if e.get('level') in ('error', 'warning'):
        print(json.dumps(e))
"
```
## See Also
- Quickstart — basic setup verification
- Deployment — production configuration reference
- System Overview — understand what each component does