Nerd Dashboard

Chat →

Backend & queue

Jobs wait in the backend queue until a worker is free; the backend then assigns one job at a time. Workers do not hold local queues.
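A minimal sketch of this dispatch rule (class and method names are hypothetical, not the actual backend API): the broker keeps the only queue, and each free worker is handed exactly one job.

```python
from collections import deque

class Broker:
    """Sketch: jobs wait in the backend's FIFO queue; each free worker
    gets exactly one job, and workers never hold a local queue."""

    def __init__(self, workers):
        self.pending = deque()     # backend-side FIFO of waiting jobs
        self.free = set(workers)   # workers with no job assigned
        self.running = {}          # worker_id -> currently assigned job

    def submit(self, job):
        self.pending.append(job)
        self._dispatch()

    def finish(self, worker_id):
        # Worker reports completion and becomes eligible for the next job.
        self.running.pop(worker_id, None)
        self.free.add(worker_id)
        self._dispatch()

    def _dispatch(self):
        # One job per free worker; everything else stays in the queue.
        while self.pending and self.free:
            worker = self.free.pop()
            self.running[worker] = self.pending.popleft()
```

With one worker and two submitted jobs, the second job stays in `pending` until `finish` frees the worker.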

Pod:

Status
Pending (queue)
Total jobs
Errors
Workers
Uptime

Pipeline mesh (A3)

Live view of slice stages and stations. Set the mesh rig order (head → tail) to match how you assigned layers: early layers on the first rig, the tail on the last. The broker flags mismatches against this order.
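One way the mismatch check could work (a sketch with hypothetical names, not the broker's actual logic): walk the rigs head → tail and flag any rig whose layer range starts before an earlier rig's range ends.

```python
def find_order_mismatches(rig_order, layer_ranges):
    """Flag rigs whose assigned layers overlap or precede an earlier rig's,
    i.e. the mesh rig order disagrees with the layer assignment.

    rig_order:    list of rig ids, head -> tail
    layer_ranges: rig id -> (first_layer, last_layer), inclusive
    """
    mismatches = []
    prev_end = -1
    for rig in rig_order:
        start, end = layer_ranges[rig]
        if start <= prev_end:       # starts before an earlier rig finished
            mismatches.append(rig)
        prev_end = max(prev_end, end)
    return mismatches
```

For example, with order A → B → C and C assigned layers 10–20 while B holds 16–31, C is flagged.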

Settings (Chat)

Chat page reads these from localStorage when you send a message.

Model

vLLM options (tensor parallel, memory, quantization)
No model active

Configurations

Assign each GPU (from any system) to a configuration. A configuration can use GPUs from multiple workers. Save, then load a model onto each configuration (one load per worker in that config).
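A sketch of the "one load per worker in that config" rule (the row format and function name are assumptions for illustration): group the assigned GPUs by configuration, collapsing to the set of distinct workers each configuration spans.

```python
def loads_needed(assignments):
    """Given GPU assignment rows as (worker_id, gpu_id, config) tuples
    (config may be None for unassigned GPUs), return a mapping of
    config -> set of workers. Each worker in a config needs one model load,
    regardless of how many of its GPUs are in that config."""
    per_config = {}
    for worker, gpu, config in assignments:
        if config is None:
            continue  # unassigned GPU: no load required
        per_config.setdefault(config, set()).add(worker)
    return per_config
```

So a configuration spanning two GPUs on rigA plus one GPU on rigB needs two loads, one per worker.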

Assign GPUs to configurations

Each row is one GPU. Assign it to a configuration or leave unassigned. One config can span multiple workers.

Worker | GPU | Assign to

Test inference

Run a single prompt against the loaded model. Same as Chat but inline.

Result appears here.

Stats

Pending (queue)
Total jobs
Errors
Uptime
Workers

Last jobs

Task | Type | Status | Duration | tok/s | Completed

Systems & GPUs

User = optional grouping (set via BROKER_USER_ID on the worker). System = one rig (worker_id). Each system can have multiple GPUs; one row per GPU. Model and GPU lists update only on page load or "Refresh workers".

User | System | GPU | GPU ID | VRAM (used/total) | Loaded model | Chunk | Current work

Worker ping (v2)

Test connectivity. Single ping: one worker responds. Roundtrip: one worker pings another, directly or via a relay. Requires v2 workers and broker ping support.
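A sketch of the two ping modes (function names and the `send` transport are hypothetical; real v2 workers go through the broker): a single ping times one worker's response, while a roundtrip walks the hop chain, inserting the relay if one is given.

```python
import time

def single_ping(send, worker):
    """One worker responds; returns elapsed time in seconds.
    `send(worker, message)` stands in for the broker transport."""
    t0 = time.monotonic()
    send(worker, {"type": "ping"})
    return time.monotonic() - t0

def roundtrip(send, src, dst, relay=None):
    """src pings dst, directly or via an optional relay hop.
    Returns (elapsed_seconds, hop_chain)."""
    hops = [src] + ([relay] if relay else []) + [dst]
    t0 = time.monotonic()
    for sender, receiver in zip(hops, hops[1:]):
        send(receiver, {"type": "ping", "from": sender})
    return time.monotonic() - t0, hops
```

With a relay the chain is src → relay → dst; without one it is a direct src → dst hop.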

Result appears here.