Nerd Dashboard

Chat →

Backend & queue

Jobs wait in the backend queue until a worker is free; the backend then assigns one job at a time. Workers do not hold local queues.
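A minimal sketch of this dispatch rule (class and method names are hypothetical, not the actual backend API): the broker keeps the only queue, and each free worker is handed exactly one job.

```python
from collections import deque

class Broker:
    """Sketch: jobs wait in the backend's FIFO queue; each free worker
    gets exactly one job, and workers never hold a local queue."""

    def __init__(self, workers):
        self.pending = deque()     # backend-side FIFO of waiting jobs
        self.free = set(workers)   # workers with no job assigned
        self.running = {}          # worker_id -> currently assigned job

    def submit(self, job):
        self.pending.append(job)
        self._dispatch()

    def finish(self, worker_id):
        # Worker reports completion and becomes eligible for the next job.
        self.running.pop(worker_id, None)
        self.free.add(worker_id)
        self._dispatch()

    def _dispatch(self):
        # One job per free worker; everything else stays in the queue.
        while self.pending and self.free:
            worker = self.free.pop()
            self.running[worker] = self.pending.popleft()
```

With one worker and two submitted jobs, the second job stays in `pending` until `finish` frees the worker.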

Pod:

Status
Pending (queue)
Total jobs
Errors
Workers
Uptime

Pipeline mesh (A3)

Live view of slice stages and stations. Set the mesh rig order (head → tail) to match how you assigned layers: early layers on the first rig, the tail on the last. The broker flags mismatches against this order.
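One way the mismatch check could work (a sketch with hypothetical names, not the broker's actual logic): walk the rigs head → tail and flag any rig whose layer range starts before an earlier rig's range ends.

```python
def find_order_mismatches(rig_order, layer_ranges):
    """Flag rigs whose assigned layers overlap or precede an earlier rig's,
    i.e. the mesh rig order disagrees with the layer assignment.

    rig_order:    list of rig ids, head -> tail
    layer_ranges: rig id -> (first_layer, last_layer), inclusive
    """
    mismatches = []
    prev_end = -1
    for rig in rig_order:
        start, end = layer_ranges[rig]
        if start <= prev_end:       # starts before an earlier rig finished
            mismatches.append(rig)
        prev_end = max(prev_end, end)
    return mismatches
```

For example, with order A → B → C and C assigned layers 10–20 while B holds 16–31, C is flagged.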

Settings (Chat)

Chat page reads these from localStorage when you send a message.

Model

vLLM options (tensor parallel, memory, quantization)
No model active

Configurations

Assign each GPU (from any system) to a configuration. A configuration can use GPUs from multiple workers. Save, then load a model onto each configuration (one load per worker in that config).
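A sketch of the "one load per worker in that config" rule (the row format and function name are assumptions for illustration): group the assigned GPUs by configuration, collapsing to the set of distinct workers each configuration spans.

```python
def loads_needed(assignments):
    """Given GPU assignment rows as (worker_id, gpu_id, config) tuples
    (config may be None for unassigned GPUs), return a mapping of
    config -> set of workers. Each worker in a config needs one model load,
    regardless of how many of its GPUs are in that config."""
    per_config = {}
    for worker, gpu, config in assignments:
        if config is None:
            continue  # unassigned GPU: no load required
        per_config.setdefault(config, set()).add(worker)
    return per_config
```

So a configuration spanning two GPUs on rigA plus one GPU on rigB needs two loads, one per worker.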

Assign GPUs to configurations

Each row is one GPU. Assign it to a configuration or leave unassigned. One config can span multiple workers.

Worker | GPU | Assign to

Test inference

Run a single prompt against the loaded model. Same as Chat but inline.

Result appears here.

Stats

Pending (queue)
Total jobs
Errors
Uptime
Workers

Last jobs

Task | Type | Status | Duration | tok/s | Completed

Systems & GPUs

User = optional grouping (set via BROKER_USER_ID on the worker). System = one rig (worker_id). Each system can have multiple GPUs; one row per GPU. Model and GPU lists update only on page load or "Refresh workers".

User | System | GPU | GPU ID | VRAM (used/total) | Loaded model | Chunk | Current work

Worker ping (v2)

Test connectivity. Single ping: one worker responds. Roundtrip: one worker pings another, directly or via a relay. Requires v2 workers and broker ping support.
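A sketch of the two ping modes (function names and the `send` transport are hypothetical; real v2 workers go through the broker): a single ping times one worker's response, while a roundtrip walks the hop chain, inserting the relay if one is given.

```python
import time

def single_ping(send, worker):
    """One worker responds; returns elapsed time in seconds.
    `send(worker, message)` stands in for the broker transport."""
    t0 = time.monotonic()
    send(worker, {"type": "ping"})
    return time.monotonic() - t0

def roundtrip(send, src, dst, relay=None):
    """src pings dst, directly or via an optional relay hop.
    Returns (elapsed_seconds, hop_chain)."""
    hops = [src] + ([relay] if relay else []) + [dst]
    t0 = time.monotonic()
    for sender, receiver in zip(hops, hops[1:]):
        send(receiver, {"type": "ping", "from": sender})
    return time.monotonic() - t0, hops
```

With a relay the chain is src → relay → dst; without one it is a direct src → dst hop.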

Result appears here.