Troubleshooting


Quick Health Check

Before investigating any specific issue, run the full health sweep:

# Gateway health
curl http://localhost:8080/health

# Cluster status
pangalactic cluster status

# Node details
pangalactic node info --node-addr localhost:7200

# Recent Hub logs
docker compose logs hub --tail=50
# or in K8s:
kubectl logs deployment/pangalactic-hub -n pangalactic --tail=50

Hub Unreachable

Symptom: Nodes log node.registration_failed or hub_client.route_plan_error. Gateway returns 503.

Log signature:

{"event": "node.registration_failed", "error": "StatusCode.UNAVAILABLE: failed to connect to all addresses"}

Causes and fixes:

Cause                         Fix
Hub not started               docker compose up hub or kubectl rollout restart deployment/pangalactic-hub
Wrong NODE_HUB_ADDR           Verify the env var matches the Hub's bind address and port
Firewall blocking port 7100   Open TCP 7100 between nodes and the Hub
Hub crashed                   docker compose logs hub; look for a Python traceback
gRPC stubs not generated      Run make proto and rebuild images

gRPC Stubs Not Generated

Symptom: Log warning node.grpc_stubs_missing or hub.grpc_stubs_missing. The system falls back to a limited mode.

Log signature:

{"event": "node.grpc_stubs_missing", "note": "Run 'make proto' to enable gRPC"}

Fix:

pip install grpcio-tools
make proto

This runs scripts/gen_proto.sh which generates Python stubs into pangalactic/_grpc/. Rebuild and restart all containers if running in Docker.

Without generated stubs, the Hub and Nodes can still start but will not communicate over gRPC. Most operations are no-ops.
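This degrade-gracefully behavior is typically implemented with a guarded import. A minimal sketch, assuming the stubs live under pangalactic/_grpc/ as described above (the exact module name hub_pb2_grpc and the helper are illustrative, not the project's actual API):

```python
import importlib


def load_grpc_stubs(module_path: str = "pangalactic._grpc.hub_pb2_grpc"):
    """Return the generated stub module, or None if `make proto` hasn't run.

    The module path is a hypothetical example of a generated stub module.
    """
    try:
        return importlib.import_module(module_path)
    except ImportError:
        # Stubs missing: caller logs *.grpc_stubs_missing and runs degraded.
        return None


stubs = load_grpc_stubs()
grpc_enabled = stubs is not None
```

With this pattern a missing stub module never crashes startup; components simply set a flag and skip gRPC calls, which matches the "most operations are no-ops" behavior above.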


Node Registration Fails / No Shards Assigned

Symptom: Node starts but pangalactic cluster topology shows no assignments.

Steps:

  1. Check the node registered successfully:

    docker compose logs node-0 --tail=20 | grep "node.registered"
    

    Expected: {"event": "node.registered", "hub": "hub:7100"}

  2. Check the Hub received the registration:

    docker compose logs hub --tail=20 | grep "hub.node_registered"
    
  3. Check if a model has been loaded:

    pangalactic model list
    

    Shard assignment only happens after a model is registered.

  4. Check VRAM: if the node reports 0 GB of VRAM (no real GPU and simulated mode not enabled), the Hub may reject the assignment.

    docker compose logs node-0 | grep "vram"
    

VRAM Fragmentation / InsufficientVRAMError

Symptom: pangalactic model load fails. Hub log shows fragmentation error.

Log signature:

{"event": "hub.shard_assignment_failed", "error": "VRAM fragmentation: shard llama3-70b:0 requires 5.2 GB but best node has only 3.1 GB free"}

Causes:

  • Another model is already loaded and consuming most VRAM
  • Node count is too low for the model size (see Node Sizing)
  • VRAM_FILL_FACTOR is too conservative for your workload

Fixes:

# Unload another model to free VRAM
pangalactic model unload <other-model-id>

# Add more nodes
pangalactic node start --simulated-vram-gb 24  # for testing

# Reduce fill factor (not recommended for production)
export HUB_VRAM_FILL_FACTOR=0.90
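Fragmentation arises because a shard must fit entirely on a single node: the cluster's total free VRAM can exceed the shard size while no one node can hold it. A toy best-fit placement check illustrates the failure mode in the log signature above (the fill-factor semantics here, usable = free × factor, are an assumption for this sketch):

```python
def best_node_for_shard(shard_gb, free_vram_by_node, fill_factor=0.95):
    """Return the node whose usable free VRAM best fits the shard, or None."""
    usable = {n: free * fill_factor for n, free in free_vram_by_node.items()}
    candidates = {n: v for n, v in usable.items() if v >= shard_gb}
    if not candidates:
        return None  # -> hub.shard_assignment_failed (fragmentation)
    # Best fit: the candidate with the least leftover VRAM after placement.
    return min(candidates, key=candidates.get)
```

For example, a 5.2 GB shard cannot be placed on two nodes with 3.1 GB free each (6.2 GB total), which is exactly the situation the error message describes.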

StarStream Connection Refused / Tensor Not Delivered

Symptom: Requests time out. Node logs show connection errors. No tokens are generated.

Log signature:

{"event": "starstream.connect_failed", "addr": "192.168.1.5:7201", "error": "Connection refused"}

Causes and fixes:

Cause                         Fix
StarStream port not exposed   Check NODE_STARSTREAM_BIND and the host firewall on TCP 7201
Wrong public addr             Set NODE_PUBLIC_ADDR to the node's externally reachable IP
Transport not started         Check node startup logs for node.starstream_ready
Docker network isolation      Ensure node-0 and node-1 are on the same Docker network

Verify connectivity:

# From node-0's container, test node-1's StarStream port
docker compose exec node-0 nc -zv node-1 7201
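If nc is not installed in the container image, the same reachability check can be done with Python's standard library; a minimal equivalent of `nc -zv`:

```python
import socket


def port_reachable(host: str, port: int, timeout: float = 2.0) -> bool:
    """Attempt a TCP connection to host:port, mirroring `nc -zv`."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers connection refused, timeouts, and DNS failures alike.
        return False
```

Run it inside the container, e.g. `docker compose exec node-0 python3 -c '...'` with the function above, passing node-1 and 7201.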

Hot-Swap Drain Timeout

Symptom: HotSwapShard completes with stage DONE but in_flight_requests was still > 0 at swap time.

Log signature:

{"event": "hot_swap.drain_timeout", "in_flight": 3, "shard_id": "llama3-8b:0"}

This is a warning, not an error. It means the drain timeout elapsed with requests still active, so the swap proceeded anyway. The in-flight requests may produce slightly inconsistent output for the generation steps that straddle the shard swap.

Fix: Increase drain_timeout_s in the HotSwapShard request, or reduce the traffic rate before triggering a swap.
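The drain-then-swap behavior behind this warning can be sketched as: poll the in-flight counter until it hits zero or the timeout elapses, then swap regardless. The function and counter source below are illustrative, not the project's actual code:

```python
import time


def drain_then_swap(get_in_flight, do_swap, drain_timeout_s=30.0, poll_s=0.1):
    """Wait for in-flight requests to finish, then swap; warn on timeout."""
    deadline = time.monotonic() + drain_timeout_s
    while time.monotonic() < deadline:
        if get_in_flight() == 0:
            break
        time.sleep(poll_s)
    remaining = get_in_flight()
    if remaining > 0:
        # Matches the hot_swap.drain_timeout warning: swap proceeds anyway.
        print(f'{{"event": "hot_swap.drain_timeout", "in_flight": {remaining}}}')
    do_swap()
    return remaining
```

This makes the trade-off explicit: a longer drain_timeout_s lowers the chance of swapping with requests in flight, at the cost of a slower swap under sustained load.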


Gateway Returns 503 "No Route Available"

Symptom: Chat completions fail with:

{"error": {"message": "No route available — cluster may be loading", "type": "server_error"}}

Causes:

  1. No model is loaded yet — run pangalactic model load <model-id>
  2. Hub is unreachable from Gateway — check GATEWAY_HUB_ADDR
  3. All pipelines for the requested model are degraded — check pangalactic cluster topology
  4. gRPC stubs not generated — Gateway falls back to a synthetic route plan in dev mode; check logs

Gateway Returns 401 Unauthorized

{"error": {"message": "Missing or invalid Authorization header", "type": "invalid_request_error"}}

Include the Authorization: Bearer <key> header. In dev mode, the API key is dev-key-not-for-production (set in docker-compose.yml). In production, use the key from the pangalactic-gateway-secret Kubernetes secret.

To disable auth entirely (dev only), set GATEWAY_API_KEY="".
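The header check behind this error reduces to a simple comparison, including the empty-key case that disables auth. A sketch (the function is illustrative, not the Gateway's actual code):

```python
def authorize(headers: dict, api_key: str) -> bool:
    """Return True if a request passes bearer-token auth."""
    if api_key == "":
        return True  # auth disabled via GATEWAY_API_KEY="" (dev only)
    auth = headers.get("Authorization", "")
    return auth == f"Bearer {api_key}"
```

Anything else, including a missing header, a wrong key, or a malformed scheme, yields the 401 response shown above.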


GPU Not Detected

Symptom: Node logs show node.using_cpu_fallback or VRAM shows 0 GB.

Log signature:

{"event": "utils.gpu_detection", "result": "cpu_fallback", "reason": "CUDA not available"}

Fixes:

  • Verify CUDA is installed: nvidia-smi
  • Check PyTorch sees the GPU: python -c "import torch; print(torch.cuda.device_count())"
  • In Docker: pass --gpus all or use the nvidia/cuda base image with the NVIDIA container runtime
  • In Kubernetes: verify NVIDIA GPU operator is installed and the node has nvidia.com/gpu: 1 resource

For development without a GPU, use simulated mode:

export NODE_SIMULATED_VRAM_GB=24
export NODE_GPU_COMPUTE_TYPE=cpu
export NODE_INFERENCE_BACKEND=mock
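The detection order implied by the fixes above, CUDA first and then CPU fallback, can be sketched as follows. This assumes PyTorch is the backend in use, and degrades cleanly when it is absent:

```python
def detect_compute():
    """Return 'cuda' if PyTorch can see a GPU, else 'cpu_fallback'."""
    try:
        import torch
    except ImportError:
        return "cpu_fallback"  # no PyTorch installed at all
    if torch.cuda.is_available() and torch.cuda.device_count() > 0:
        return "cuda"
    return "cpu_fallback"  # -> utils.gpu_detection result shown above
```

If this returns "cpu_fallback" on a machine with a GPU, the problem is usually the container runtime (missing --gpus all) or a CPU-only PyTorch build.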

High GPU Temperature Warning

Symptom: Log warning telemetry.gpu_high_temp or telemetry.gpu_critical_temp.

{"event": "telemetry.gpu_critical_temp", "node_id": "node-0", "gpu_index": 0, "temp_celsius": 96}

Thresholds:

  • Warning: ≥ 85°C
  • Critical: ≥ 95°C

Actions:

  • Check case airflow and GPU cooler
  • Reduce max_concurrent_requests on the Gateway to lower GPU load
  • Consider thermal throttling — most cards auto-throttle at 83–87°C
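The thresholds above map directly to a small classifier; the event names come from the log signature:

```python
def classify_gpu_temp(temp_celsius):
    """Map a GPU temperature reading to the telemetry event it triggers."""
    if temp_celsius >= 95:
        return "telemetry.gpu_critical_temp"
    if temp_celsius >= 85:
        return "telemetry.gpu_high_temp"
    return None  # within normal operating range
```

Useful for alerting pipelines that consume the structured logs rather than the raw temperature field.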

Logs Reference

All components use structured JSON logging. Key fields:

Field       Description
event       Dot-separated event name (e.g. node.registered)
node_id     8-character truncated UUID of the emitting node
shard_id    <model_id>:<shard_index>
error       Error message (present on warning/error events)
timestamp   ISO 8601 UTC timestamp

To filter logs by event prefix:

docker compose logs hub | grep '"event": "hub.failover'

To get all error/warning events:

docker compose logs --no-log-prefix 2>&1 | python3 -c "
import sys, json
for line in sys.stdin:
    try:
        e = json.loads(line)
    except json.JSONDecodeError:
        continue
    if e.get('level') in ('error', 'warning'):
        print(json.dumps(e))
"

See Also