# Status Monitor

| Developer | Last modified |
| --------- | ------------- |
| AIDC Team | 2026/01/23    |

## Board Introduction

Status Monitor presents all deployment task statuses in a board format, allowing you to grasp deployment progress at a glance.

![](https://pub-f334ff01208c4e6195b80133ac6e6030.r2.dev/portal/marketplace/status.png)

## Status Type Descriptions

The system defines the following six statuses:

| Status          | Color     | Description                               | Next Action                                                                          |
| --------------- | --------- | ----------------------------------------- | ------------------------------------------------------------------------------------ |
| **Created**     | 🔵 Blue   | Task created, waiting to be queued        | System handles automatically                                                         |
| **Queued**      | ⚪ Gray   | Task in queue waiting for execution       | System handles automatically                                                         |
| **In Progress** | 🟡 Yellow | Task is executing                         | Wait for completion or check progress                                                |
| **Failed**      | 🔴 Red    | Task execution failed                     | 1. Check Node Status → 2. View Error Log → 3. Edit & Redeploy                        |
| **Completed**   | 🟢 Green  | Task completed successfully               | Deployed service is ready to use                                                     |
| **Timeout**     | 🟠 Orange | Task execution timed out after 1.25 hours | 1. Check if node is reachable via SSH → 2. Check network latency → 3. Edit & Redeploy |
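
The Timeout rule is simple arithmetic: 1.25 hours is 75 minutes. The sketch below, assuming elapsed time is measured from the task's start time and compared strictly (the diagram label reads "> 1.25 hrs"), shows how a monitor could decide that an In Progress task has timed out. `is_timed_out` is a hypothetical helper, not the portal's actual code.

```python
from datetime import datetime, timedelta

# Threshold from the status table: 1.25 hours, i.e. 75 minutes.
TIMEOUT = timedelta(hours=1.25)


def is_timed_out(started_at: datetime, now: datetime) -> bool:
    """Return True once a task has run longer than the timeout window.

    Hypothetical helper: which timestamp the platform measures from and
    whether the comparison is strict are assumptions, not documented behavior.
    """
    return now - started_at > TIMEOUT


start = datetime(2026, 1, 23, 9, 0)
print(TIMEOUT == timedelta(minutes=75))                    # True
print(is_timed_out(start, datetime(2026, 1, 23, 10, 15)))  # False (exactly 75 min)
print(is_timed_out(start, datetime(2026, 1, 23, 10, 16)))  # True
```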

### Status Flow Diagram

```mermaid
graph TD
    %% Status definitions
    Created["Created"] -- "Auto-queued" --> Queued["Queued"]
    Queued -- "Agent Picked Up" --> InProgress["In Progress"]

    %% Terminal states
    InProgress -- "All Tasks Success" --> Completed["Completed"]
    InProgress -- "Any Task Error" --> Failed["Failed"]
    InProgress -- "> 1.25 hrs" --> Timeout["Timeout"]

    %% Recovery flow
    Failed -- "User Action" --> Edit["Edit & Redeploy"]
    Timeout -- "User Action" --> Edit
    Edit --> Created

    %% Styling
    style Created stroke:#909399,stroke-width:2px
    style Queued stroke:#73c0de,stroke-width:2px
    style InProgress stroke:#5470c6,stroke-width:2px
    style Completed stroke:#91cc75,stroke-width:2px
    style Failed stroke:#ee6666,stroke-width:2px
    style Timeout stroke:#fac858,stroke-width:2px
    style Edit stroke:#3b82f6,stroke-width:2px
```
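
The flow diagram is a small state machine. As a minimal sketch, assuming the edges drawn in the diagram are the only legal transitions, it can be encoded as a transition table. `Status`, `TRANSITIONS`, `can_transition`, and `resolve_outcome` are illustrative names, not part of the platform; "Edit & Redeploy" is folded into the Failed → Created and Timeout → Created edges.

```python
from enum import Enum


class Status(Enum):
    CREATED = "Created"
    QUEUED = "Queued"
    IN_PROGRESS = "In Progress"
    COMPLETED = "Completed"
    FAILED = "Failed"
    TIMEOUT = "Timeout"


# Edges from the diagram; the user's "Edit & Redeploy" action sends a
# Failed or Timeout task back to Created.
TRANSITIONS = {
    Status.CREATED: {Status.QUEUED},
    Status.QUEUED: {Status.IN_PROGRESS},
    Status.IN_PROGRESS: {Status.COMPLETED, Status.FAILED, Status.TIMEOUT},
    Status.FAILED: {Status.CREATED},
    Status.TIMEOUT: {Status.CREATED},
    Status.COMPLETED: set(),  # terminal: no outgoing edge in the diagram
}


def can_transition(src: Status, dst: Status) -> bool:
    """True if the diagram has an edge from src to dst."""
    return dst in TRANSITIONS[src]


def resolve_outcome(task_ok: list[bool]) -> Status:
    """Completed if every task succeeded, Failed if any task errored.

    The timeout branch is decided by elapsed time and is not modeled here.
    """
    return Status.COMPLETED if all(task_ok) else Status.FAILED
```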

## Status Card Features

### Expand/Collapse

Each status card can be expanded to view detailed information:

* **Collapsed state**: Shows the task name, service type, and status label
* **Expanded state**: Shows the full task details, the node list, and action buttons

### View Detailed Information

After expanding the card, you can view:

![](https://pub-f334ff01208c4e6195b80133ac6e6030.r2.dev/portal/marketplace/status-monitor-card.png)

| Information          | Description                                |
| -------------------- | ------------------------------------------ |
| **Task Name**        | Identification name of the deployment task |
| **Task Description** | Task description text                      |
| **Service Type**     | Slurm, MariaDB, Docker, etc.               |
| **Start Time**       | Time when task started executing           |
| **Duration**         | Total duration of task execution           |
| **Update Time**      | Last status update time                    |
| **Progress**         | Task completion percentage                 |
| **Triggered By**     | User who initiated the deployment          |

#### Node Status Icons

| Icon     | Status               | Description                 |
| -------- | -------------------- | --------------------------- |
| ⚪ Gray  | Pending, In Progress | Waiting for execution       |
| 🟢 Green | Completed            | Node finished successfully  |
| 🔴 Red   | Failed               | Node execution failed       |

#### Viewing Error Logs

For tasks in **Completed** or **Failed** status, the card expands to show the individual task execution results for each node, each with a status label (OK/FAILED/CHANGED/SKIPPING).

> 🔍 **Debug Tip**: Click on a **FAILED** task label to directly view the specific error log for that step. This is the most efficient way to diagnose deployment issues and identify the root cause of failures.
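
The same triage idea (jump straight to the FAILED steps) can be sketched in code. This assumes a hypothetical record shape of `(node, step, label)` tuples; only the four label values come from this page, and the step names in the usage example are made up for illustration.

```python
from collections import defaultdict

# Label values shown on the card; everything else here is hypothetical.
LABELS = {"OK", "FAILED", "CHANGED", "SKIPPING"}


def failed_steps(results):
    """Group FAILED steps by node, keeping execution order.

    `results` is a hypothetical list of (node, step, label) tuples.
    """
    by_node = defaultdict(list)
    for node, step, label in results:
        if label not in LABELS:
            raise ValueError(f"unknown label: {label!r}")
        if label == "FAILED":
            by_node[node].append(step)
    return dict(by_node)


results = [
    ("865235000205-2", "Install packages", "CHANGED"),
    ("865235000205-2", "Start service", "FAILED"),
    ("TEST-001", "Install packages", "OK"),
    ("TEST-001", "Start service", "SKIPPING"),
]
print(failed_steps(results))  # {'865235000205-2': ['Start service']}
```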

### Edit Deployment (Edit)

For tasks in Created or Failed status, you can edit the deployment settings:

1. Click the **Edit** button. The deployment dialog reopens with **all previous settings pre-filled**, so you don't need to re-enter parameters.
2. Modify only the fields that need to change.
3. Submit the deployment again.

![](https://pub-f334ff01208c4e6195b80133ac6e6030.r2.dev/portal/marketplace/status-monitor-edit.png)

> 💡 **Tip**: The Edit function preserves all your previous configuration, including node selections, group assignments, and custom parameters. This significantly reduces the effort needed to retry a failed deployment.

> ⚠️ **Note**: Only tasks in Created or Failed status can be edited.

### Delete Task (Delete)

Tasks you no longer need can be deleted:

1. Click the **Delete** button.
2. Confirm the deletion in the dialog.
3. The task is removed once you confirm.

![](https://pub-f334ff01208c4e6195b80133ac6e6030.r2.dev/portal/marketplace/status-monitor-delete.png)

> ⚠️ **Warning**: Deleting a task **only removes the task record** from Status Monitor. It does **NOT uninstall** the deployed software or services from target nodes. If you need to remove the installed service, please perform the uninstallation manually on the respective nodes before deleting the task record.

***

## Service Card Displays

> Cards in **Created**, **Queued**, or **In Progress** status display the Groups and Nodes selected by the user.\
> Cards in **Completed** or **Failed** status display per-node task information.

### Slurm Status Card

![](https://pub-f334ff01208c4e6195b80133ac6e6030.r2.dev/portal/marketplace/status-monitor-slurm.png)

| Display Information | Description | Example |
| ------------------- | ----------- | ------- |
| Headnode Group | Responsible for job scheduling and cluster management | `computing_node` |
| Compute Group | Executes actual computing tasks | `datatransfer_node` |
| Database Group | Stores job records and account information | `service_node` |
| Username | Slurm database username | `slurmadmin` |
| Node List | Node names within each Group | `865235000205-2` |
| Tasks | In Completed/Failed status, expands to show the execution tasks and status labels (OK/FAILED/CHANGED/SKIPPING) for each node | - |

### MariaDB Status Card

![](https://pub-f334ff01208c4e6195b80133ac6e6030.r2.dev/portal/marketplace/status-monitor-mariaDB.png)

| Display Information | Description | Example |
| ------------------- | ----------- | ------- |
| Database Group | Group for MariaDB Galera cluster nodes | `db_group` |
| Node List | Names of each node in the cluster | `865235000205-2` |
| Tasks | In Completed/Failed status, expands to show the execution tasks and status labels (OK/FAILED/CHANGED/SKIPPING) for each node | - |

### UFM Status Card

![](https://pub-f334ff01208c4e6195b80133ac6e6030.r2.dev/portal/marketplace/status-monitor-UFM.png)

| Display Information | Description | Example |
| ------------------- | ----------- | ------- |
| License Path | Full path where the UFM license file is stored | `/root/mlnx-ufm-xxx.lic` |
| Interface | Network interface name used for InfiniBand network management | `eth0` |
| Installation Node | Target node for UFM service installation | `8652350002103` |
| Tasks | In Completed/Failed status, expands to show the execution tasks and status labels (OK/FAILED/CHANGED/SKIPPING) for each node | - |

### NMX-M Status Card

![](https://pub-f334ff01208c4e6195b80133ac6e6030.r2.dev/portal/marketplace/status-monitor-NMX.png)

| Display Information | Description | Example |
| ------------------- | ----------- | ------- |
| Installation Node | Physical node hosting the VMs, with the number of VMs deployed on it | `nmx-host` `3 VMs` |
| VM Name and Role | Each VM's name and role label (master/worker), with its running status | `nmx-vm1` `master` 🟢 Running |
| Network Configuration | Network configuration for each VM | - |
| ↳ IP | VM's IP address | `10.10.29.1` |
| ↳ Mask | Subnet mask bits (prefix length) | `22` |
| ↳ Gateway | Network gateway address | `10.10.30.1` |
| ↳ DNS | DNS server address | `8.8.8.8` |
| ↳ Bridge | Network bridge name | `br0` |
| QCOW Base Path | Storage path of the QCOW2 VM image file | `/root/netq-4.15.0-ubuntu-24.04-ts-qemu.qcow2` |
| Tasks | In Completed/Failed status, expands to show the execution tasks and status labels (OK/FAILED/CHANGED/SKIPPING) for each node | - |
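
The example network values can be sanity-checked with Python's standard `ipaddress` module: with 22 mask bits, `10.10.29.1` belongs to `10.10.28.0/22` (10.10.28.0 to 10.10.31.255), so the gateway `10.10.30.1` sits on the same subnet. This is an illustrative pre-deployment check under that reading of the fields; `gateway_in_subnet` is a hypothetical helper, and the portal is not documented to run it.

```python
import ipaddress


def gateway_in_subnet(ip: str, mask_bits: int, gateway: str) -> bool:
    """True if `gateway` lies inside the subnet defined by ip/mask_bits."""
    network = ipaddress.ip_interface(f"{ip}/{mask_bits}").network
    return ipaddress.ip_address(gateway) in network


# Values from the NMX-M example table.
print(ipaddress.ip_interface("10.10.29.1/22").network)    # 10.10.28.0/22
print(gateway_in_subnet("10.10.29.1", 22, "10.10.30.1"))  # True
print(gateway_in_subnet("10.10.29.1", 24, "10.10.30.1"))  # False
```

A `/24` mask would put the gateway outside the VM's subnet, which is the kind of mismatch worth catching before a long NMX-M deployment starts.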

> 💡 **Tip**: NMX-M deployment involves VM image distribution, which typically takes longer than other service deployments. Please be patient and wait for the status to update to **Running**. The deployment time depends on network bandwidth and the size of the QCOW2 image file.

### Docker Status Card

![](https://pub-f334ff01208c4e6195b80133ac6e6030.r2.dev/portal/marketplace/status-monitor-docker.png)

| Display Information | Description | Example |
| ------------------- | ----------- | ------- |
| Image | Container image name used for deployment | `alpine:latest` |
| Container Count | Number of container instances created on each node | `4` |
| Installation Group | Target Group for Docker Engine installation (multiple selection allowed) | `computing_node` |
| Installation Node | Target Node for Docker Engine installation (multiple selection allowed) | `865235000205-2` |
| Tasks | In Completed/Failed status, expands to show the execution tasks and status labels (OK/FAILED/CHANGED/SKIPPING) for each node | - |

### K8s Status Card

![](https://pub-f334ff01208c4e6195b80133ac6e6030.r2.dev/portal/marketplace/status-monitor-k8s.png)

| Display Information | Description | Example |
| ------------------- | ----------- | ------- |
| Gateway IP | Network gateway IP address used by the Kubernetes cluster | `192.168.2.1` |
| Interface | Network interface name used for Kubernetes network communication | `eth0`, `enp3s0` |
| Control Plane Group | Group for Kubernetes Control Plane nodes | `computing_node` |
| Data Nodes Group | Group for Kubernetes Worker Nodes | `datatransfer_node` |
| Node List | Node names within each Group | `TEST-001`, `865235000205-2` |
| Tasks | In Completed/Failed status, expands to show the execution tasks and status labels (OK/FAILED/CHANGED/SKIPPING) for each node | - |

### Podman Status Card

![](https://pub-f334ff01208c4e6195b80133ac6e6030.r2.dev/portal/marketplace/status-monitor-podman.png)

| Display Information | Description | Example |
| ------------------- | ----------- | ------- |
| Image | Container image name used for deployment | `alpine:latest` |
| Container Count | Number of container instances created on each node | `2` |
| Installation Group | Target Group for Podman Engine installation | `service_node` |
| Installation Node | Target Node for Podman Engine installation (multiple selection allowed) | `865235000209-4` |
| Tasks | In Completed/Failed status, expands to show the execution tasks and status labels (OK/FAILED/CHANGED/SKIPPING) for each node | - |

### WEKA Status Card

![](https://pub-f334ff01208c4e6195b80133ac6e6030.r2.dev/portal/marketplace/status-monitor-weka.png)

| Display Information | Description | Example |
| ------------------- | ----------- | ------- |
| ISO Name | ISO image file name used for WEKA installation | `weka-4.4.10.150-2.0.2.iso` |
| Installation Group | Group for WEKA cluster nodes | `datatransfer_node` |
| Installation Node | Target Node for WEKA installation (multiple selection allowed) | `865235000210-4` |
| Node List | Names of each node in the cluster | `weka224`, `weka222` |
| Tasks | In Completed/Failed status, expands to show the execution tasks and status labels (OK/FAILED/CHANGED/SKIPPING) for each node | - |

***

## Search and Filter

### Service Type Filter

Filter tasks by service type:

![](https://pub-f334ff01208c4e6195b80133ac6e6030.r2.dev/portal/marketplace/service-filter.png)

### Status Filter

Filter displayed columns by status:

![](https://pub-f334ff01208c4e6195b80133ac6e6030.r2.dev/portal/marketplace/status-filter.png)

* **All**: Show all statuses
* Single status: Only show columns of selected status

> 💡 **Usage Scenario**: When you only want to focus on failed tasks for troubleshooting, uncheck **Completed** and other statuses to display only the **Failed** column. This helps you quickly identify and address problematic deployments.
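
The filter behavior can be modeled as selecting a subset of columns. The sketch below assumes a hypothetical board shape (status name → list of task names) and treats "no selection" as **All**; `visible_columns` and the task names in the example are illustrative only.

```python
from typing import Dict, List, Optional, Set

# Statuses in the order of the status table; the board model is hypothetical.
ALL_STATUSES = ["Created", "Queued", "In Progress", "Failed", "Completed", "Timeout"]


def visible_columns(
    board: Dict[str, List[str]], selected: Optional[Set[str]] = None
) -> Dict[str, List[str]]:
    """Keep only the selected status columns; None behaves like "All"."""
    return {
        status: board.get(status, [])
        for status in ALL_STATUSES
        if selected is None or status in selected
    }


board = {"Failed": ["slurm-prod"], "Completed": ["mariadb-ha", "docker-lab"]}
print(visible_columns(board, {"Failed"}))  # {'Failed': ['slurm-prod']}
```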

### Keyword Search

Supports searching the following fields:

* Task name
* Task description
* Service type

> 📝 **Note**: The search function supports **fuzzy matching**. For example, searching for `Slur` will match tasks with `Slurm` service type. This makes it easier to find tasks even with partial keywords.
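
The `Slur` → `Slurm` example can be reproduced with plain case-insensitive substring matching over the three searchable fields. This is only an approximation: the portal's actual matching algorithm is not documented here and may be looser (for example, tolerating typos). The `Task` class and the sample tasks are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Task:  # hypothetical card model; field names are illustrative
    name: str
    description: str
    service_type: str


def matches(task: Task, keyword: str) -> bool:
    """Case-insensitive substring match over name, description, service type."""
    needle = keyword.strip().lower()
    fields = (task.name, task.description, task.service_type)
    return any(needle in field.lower() for field in fields)


tasks = [
    Task("hpc-cluster", "Scheduler for lab A", "Slurm"),
    Task("db-ha", "Galera cluster", "MariaDB"),
]
print([t.name for t in tasks if matches(t, "Slur")])  # ['hpc-cluster']
```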

***

## Interactive Features

### Horizontal Scrolling

When there are too many status columns to fit on screen:

* Use the mouse scroll wheel to scroll horizontally
* Scrolling automatically switches from vertical to horizontal once the vertical scroll reaches its boundary
* Touchpad gestures are supported

![Horizontal scrolling](https://pub-f334ff01208c4e6195b80133ac6e6030.r2.dev/portal/marketplace/horizontal-scrolling.png)

## Next Steps

👉 [Technical Documentation](https://gitlab.com/AIDCTeam/gitbook-docs-portal/-/blob/release/v1.3.0/projects/AIDC/guide/Marketplace/technical-docs.md) - Developer technical documentation

