This shows you the differences between two versions of the page.
| servst [2025/07/16 02:59] – zhangyk | servst [2025/07/17 04:34] (current) – zhangyk | ||
|---|---|---|---|
| Line 1: | Line 1: | ||
| < | < | ||
| # servst | # servst | ||
| + | |||
| + | [Gitlab](https:// | ||
| ## 1. Description | ## 1. Description | ||
| - | `servst` is a collection of tools designed to inspect the utilization status of | + | `servst` is primarily |
| - | each server. | + | status of each server. I later added support for MD tasks, to list or kill |
| + | them conveniently and thoroughly. | ||
| - `gpust` and `cpust` provide status for servers | - `gpust` and `cpust` provide status for servers | ||
| - `gpurf` and `cpurf` refresh status information for all or specific servers | - `gpurf` and `cpurf` refresh status information for all or specific servers | ||
| - `lsmd` | - `lsmd` | ||
| + | - `killmd` kills the target MD tasks | ||
| ## 2. Installation | ## 2. Installation | ||
| Line 129: | Line 133: | ||
| ``` | ``` | ||
| - | ## 4. Design | + | ### 3.4 For killmd |
| - | ### 4.1 For gpust and gpurf | + | The usage guide can be shown by running `killmd -h`. |
| + | ``` | ||
| + | usage: killmd.py [-h] [-a] [-p PID] [-g GPU] | ||
| - | #### 4.1.1 Outer layer | + | Kill series |
| - | I write some global aliases which will be ready when a shell instance is opened. | + | |
| - | As you can see, the content | + | |
| - | be fetched. | + | |
| - | For `tcsh`: | + | options: |
| - | ```sh | + | -h, --help |
| - | # File location: | + | -a, --all Kill all md tasks |
| - | # / | + | -p PID, --pid PID Process ID to kill |
| - | # / | + | -g GPU, --gpu GPU GPU id |
| - | alias gpust ssh 101.6.120.23 'cat / | + | |
| - | alias cpust ssh 101.6.120.23 'cat / | + | |
| - | alias gpurf ssh 101.6.120.23 '/ | + | |
| - | alias cpurf ssh 101.6.120.23 '/ | + | |
| - | alias lsmds / | + | |
| - | alias lsmd / | + | |
| ``` | ``` | ||
| - | For `bash`: | + | - `killmd -a` will kill all md tasks in the current |
| - | ```sh | + | - `killmd -p 12345` will kill the task with pid `12345`, as well as related tasks |
| - | # File location: | + | - `killmd -g 0` will kill tasks running on `GPU 0` |
| - | # / | + | |
| - | # /etc/bashrc (for servers with OS Slackware, including orange and violet) | + | |
| - | alias gpust=" | + | |
| - | alias cpust=" | + | |
| - | alias gpurf=" | + | |
| - | alias cpurf=" | + | |
| - | alias lsmds='/ | + | |
| - | alias lsmd='/ | + | |
| - | ``` | + | |
| - | + | ||
| - | #### 4.1.2 Inner layer | + | |
| - | + | ||
| - | The scripts `gpust.py` and `cpust.py` execute hourly, collecting information | + | |
| - | from each server, extracting useful data and storing it in `gpuinfo` and `cpuinfo` | + | |
| - | + | ||
| - | ### 4.2 For lsmd and lsmds | + | |
| - | + | ||
| - | `lsmd` will fetch `PID` of processes | + | |
| - | the user, get the `cwd` (current work directory) then find the latest | + | |
| - | `lsmds` will collect the output of `lsmd` on the requested servers and print them all. | + | ## 4. Acknowledgement |
| + | Thanks to Prof. Xue and Zhewei Qiu. I optimized their script | ||
| + | `checkamber.py` to get `lsmd.py`. | ||
| </ | </ | ||
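
The `killmd` options listed in section 3.4 (`-a`, `-p PID`, `-g GPU`) could be declared with `argparse` roughly as follows. This is only a minimal sketch and not the actual `killmd.py`: the description string, the integer types for `PID` and `GPU`, the precedence between the options, and the helper names `build_parser` and `describe_selection` are all assumptions made for illustration. The sketch only reports what would be selected and does not kill anything.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a parser mirroring the flags shown by `killmd -h`."""
    parser = argparse.ArgumentParser(
        prog="killmd.py",
        # Hypothetical description; the real one is not reproduced here.
        description="Illustrative sketch of the killmd command line",
    )
    parser.add_argument("-a", "--all", action="store_true",
                        help="Kill all md tasks")
    # Assumption: PID and GPU id are parsed as integers.
    parser.add_argument("-p", "--pid", type=int, help="Process ID to kill")
    parser.add_argument("-g", "--gpu", type=int, help="GPU id")
    return parser


def describe_selection(args: argparse.Namespace) -> str:
    """Return a description of what would be targeted (nothing is killed).

    Assumption: -a takes precedence over -p, which takes precedence over -g.
    """
    if args.all:
        return "all md tasks"
    if args.pid is not None:
        return f"pid {args.pid} and related tasks"
    if args.gpu is not None:
        return f"tasks on GPU {args.gpu}"
    return "nothing selected"


# Equivalent to running `killmd -g 0`
print(describe_selection(build_parser().parse_args(["-g", "0"])))
```

Keeping the argument parsing separate from the code that actually terminates processes makes it possible to check the selection logic without touching any running task.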