This shows you the differences between two versions of the page.
| servst [2025/07/09 13:42] – zhangyk | servst [2025/07/17 04:34] (current) – zhangyk | ||
|---|---|---|---|
| Line 2: | Line 2: | ||
| # servst | # servst | ||
| - | [Gitlab-servst](https:// | + | [Gitlab](https:// |
| - | ## Description | + | ## 1. Description |
| - | This is a set of tools to inspect the utilization of each server. | + | `servst` |
| + | status | ||
| + | them conveniently and thoroughly. | ||
| - | - `gpust` and `cpust` | + | - `gpust` and `cpust` |
| - | - `gpurf` and `gpurf` will refresh | + | - `gpurf` and `cpurf` refresh |
| + | - `lsmd` | ||
| + | - `killmd` kills the target MD tasks | ||
| - | ## Installation | + | ## 2. Installation |
| - | You don't have to install anything actually. I have add the commands as global | + | There is no need for installation. The commands |
| - | aliases. | + | |
| - | BUT, if you want to use the refreshing commands, password-free configuration | + | HOWEVER, password-free configuration |
| - | would be a must. | + | and `lsmds` |
| - | ## Usage | + | ## 3. Usage |
| - | For `gpust` and `cpust`: | + | ### 3.1 For gpust and cpust: |
| - | Everyone | + | These commands |
| - | status of any server's GPU cards. If asked, type the password of `rainbow`. | + | If prompted, enter the password of `rainbow`. |
| - | For `gpurf` and `cpurf`: | + | ``` |
| + | +---------------------------- GPU STATUS ----------------------------+ | ||
| + | | Server: name of the server | ||
| + | | Occupation: * for occupied GPU, 4 for free 4080 card, and so on. | | ||
| + | | processing servers: | ||
| + | | yellow purple orange indigo gold green white red blue | | ||
| + | | 2024-07-10 10:00:01 | | ||
| + | +--------------------------------------------------------------------+ | ||
| - | 1. `gpurf` will update the status of all servers. If not having set up | + | Server |
| - | password-free configuration, | + | blue 33 2024-07-10 10:00:01 |
| + | gold 3 | ||
| + | green | ||
| + | indigo | ||
| + | orange | ||
| + | purple | ||
| + | red | ||
| + | yellow | ||
| + | ``` | ||
| - | 2. `gpurf yellow` will update the status | + | ``` |
| - | shall be able to log in it without password or just type the corresponding | + | +---------------------------- CPU STATUS ----------------------------+ |
| - | password when asked. What about the other servers | + | | Server: name of the server |
| - | status will be shown. | + | | Total: total number of cores in one server |
| + | | Idle: average number of idle (not used) cores in the last 5 seconds| | ||
| + | | processing | ||
| + | | yellow purple orange indigo gold green white violet black | | ||
| + | | 2024-07-10 10: | ||
| + | +--------------------------------------------------------------------+ | ||
| + | Server | ||
| + | black | ||
| + | gold 56 32 2024-07-10 10:00:01 | ||
| + | green | ||
| + | indigo | ||
| + | orange | ||
| + | purple | ||
| + | violet | ||
| + | yellow | ||
| + | ``` | ||
| + | |||
| + | ### 3.2 For gpurf and cpurf: | ||
| + | |||
| + | 1. `gpurf` updates the status of all servers. Without password-free configuration, | ||
| + | you will need to enter the password 11 times, so you are unlikely to use | ||
| + | this form. | ||
| + | |||
| + | 2. `gpurf yellow` updates the status of only the specified server `yellow`. You | ||
| + | must either have password-free access or enter the password of `yellow` and then | ||
| + | that of `rainbow` when prompted. The statuses of the other servers remain as | ||
| + | previously displayed. The `Last Updated` column shows when each server was | ||
| + | last updated. | ||
| + | |||
| + | ``` | ||
| + | +---------------------------- GPU STATUS ----------------------------+ | ||
| + | | Server: name of the server | ||
| + | | Occupation: * for occupied GPU, 4 for free 4080 card, and so on. | | ||
| + | | processing servers: | ||
| + | | yellow purple orange indigo gold green white red blue | | ||
| + | | 2024-07-10 10: | ||
| + | +--------------------------------------------------------------------+ | ||
| + | |||
| + | Server | ||
| + | blue 33 2024-07-10 10:00:01 | ||
| + | gold 3 | ||
| + | green | ||
| + | indigo | ||
| + | orange | ||
| + | purple | ||
| + | red | ||
| + | white | ||
| + | yellow | ||
| + | ``` | ||
| + | |||
| + | ### 3.3 For lsmd and lsmds | ||
| + | |||
| + | 1. `lsmd` lists the status of MD tasks on the current server. `lsmds` lists the | ||
| + | statuses for several servers. | ||
| + | |||
| + | 2. `lsmds purple` fetches the information for the specified server `purple`. | ||
| + | |||
| + | ``` | ||
| + | purple | ||
| + | / | ||
| + | / | ||
| + | / | ||
| + | yellow | ||
| + | |||
| + | red | ||
| + | / | ||
| + | blue | ||
| + | / | ||
| + | / | ||
| + | orange | ||
| + | / | ||
| + | / | ||
| + | indigo | ||
| + | / | ||
| + | / | ||
| + | gold | ||
| + | / | ||
| + | green | ||
| + | / | ||
| + | ``` | ||
| + | |||
| + | ### 3.4 For killmd | ||
| + | |||
| + | Run `killmd -h` to display the usage guide. | ||
| + | ``` | ||
| + | usage: killmd.py [-h] [-a] [-p PID] [-g GPU] | ||
| + | |||
| + | Kill series of md tasks instantly. | ||
| + | |||
| + | options: | ||
| + | -h, --help | ||
| + | -a, --all Kill all md tasks | ||
| + | -p PID, --pid PID Process ID to kill | ||
| + | -g GPU, --gpu GPU GPU id | ||
| + | ``` | ||
| + | |||
| + | - `killmd -a` kills all md tasks on the current server. | ||
| + | - `killmd -p 12345` kills the task with pid `12345`, as well as related tasks. | ||
| + | - `killmd -g 0` kills the tasks running on `GPU 0`. | ||
| + | |||
| + | ## 4. Acknowledgement | ||
| + | |||
| + | Thanks to Prof. Xue and Zhewei Qiu: `lsmd.py` is an optimized version of | ||
| + | their script `checkamber.py`. | ||
| </ | </ | ||
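
The `killmd -h` text in section 3.4 looks like standard `argparse` output, so the option handling could plausibly be declared as in the sketch below. This is only an illustration, not the actual `killmd.py` source. The function names `build_parser` and `describe` are hypothetical, and parsing `PID` and `GPU` as integers is an assumption. The sketch only reports what would be selected and never kills any process.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a parser whose options mirror the `killmd -h` text (hypothetical helper)."""
    parser = argparse.ArgumentParser(
        prog="killmd.py",
        description="Kill series of md tasks instantly.",
    )
    parser.add_argument("-a", "--all", action="store_true", help="Kill all md tasks")
    # Assumption: PID and GPU are integers; the real script may keep them as strings.
    parser.add_argument("-p", "--pid", type=int, help="Process ID to kill")
    parser.add_argument("-g", "--gpu", type=int, help="GPU id")
    return parser


def describe(argv: list[str]) -> str:
    """Describe which tasks the given flags select, without touching any process."""
    args = build_parser().parse_args(argv)
    if args.all:
        return "all md tasks on the current server"
    if args.pid is not None:
        return f"task with pid {args.pid} and related tasks"
    if args.gpu is not None:
        return f"tasks running on GPU {args.gpu}"
    return "nothing selected"
```

For example, `describe(["-g", "0"])` reports the tasks running on GPU 0, matching the `killmd -g 0` bullet. The precedence chosen here when several flags are combined (`-a` first, then `-p`, then `-g`) is a guess, since the help text does not say how the real tool resolves that case.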