# servst

[Gitlab](https://www.frcbs.tsinghua.edu.cn/gitlab/zhangyuankuan/servst)

## 1. Description

`servst` is primarily a collection of tools for inspecting the utilization status of each server. It also supports MD tasks, so that they can be listed or killed conveniently and thoroughly.

- `gpust` and `cpust` show the status of the servers
- `gpurf` and `cpurf` refresh the status information for all or specific servers
- `lsmd` and `lsmds` fetch the latest progress of MD tasks on the servers
- `killmd` kills the target MD tasks

## 2. Installation

No installation is needed. The commands are available as global aliases. However, password-free configuration is recommended for `gpurf`, `cpurf` and `lsmds`.

## 3. Usage

### 3.1 For gpust and cpust

These commands can be run on any server to get the GPU or CPU status of any server. If prompted, enter the password of `rainbow`.

```
+---------------------------- GPU STATUS ----------------------------+
| Server: name of the server                                         |
| Occupation: * for occupied GPU, 4 for free 4080 card, and so on.   |
| processing servers:                                                |
| yellow purple orange indigo gold green white red blue              |
| 2024-07-10 10:00:01                                                |
+--------------------------------------------------------------------+
Server    Occupation    Last Updated
blue      33            2024-07-10 10:00:01
gold      3             2024-07-10 10:00:01
green     3             2024-07-10 10:00:01
indigo    33            2024-07-10 10:00:01
orange    33            2024-07-10 10:00:01
purple    4*******      2024-07-10 10:00:01
red       33            2024-07-10 10:00:01
yellow    444*****      2024-07-10 10:00:01
```

```
+---------------------------- CPU STATUS ----------------------------+
| Server: name of the server                                         |
| Total: total number of cores in one server                         |
| Idle: average number of idle (not used) cores in the last 5 seconds|
| processing servers:                                                |
| yellow purple orange indigo gold green white violet black          |
| 2024-07-10 10:00:01                                                |
+--------------------------------------------------------------------+
Server    Total   Idle   Last Updated
black     56      28     2024-07-10 10:00:01
gold      56      32     2024-07-10 10:00:01
green     24      22     2024-07-10 10:00:01
indigo    56      15     2024-07-10 10:00:01
orange    56      31     2024-07-10 10:00:01
purple    96      10     2024-07-10 10:00:01
violet    56      32     2024-07-10 10:00:01
yellow    96      06     2024-07-10 10:00:01
```

### 3.2 For gpurf and cpurf

1. `gpurf` updates the status of all servers. Without password-free configuration, you will need to enter the password 11 times, so you are unlikely to use this form.
2. `gpurf yellow` updates the status of only the specified server `yellow`. You must either have password-free access or enter the password of `yellow` and then of `rainbow` when prompted. The statuses of the other servers remain as previously displayed. The `Last Updated` column shows when each server was last updated.

```
+---------------------------- GPU STATUS ----------------------------+
| Server: name of the server                                         |
| Occupation: * for occupied GPU, 4 for free 4080 card, and so on.   |
| processing servers:                                                |
| yellow purple orange indigo gold green white red blue              |
| 2024-07-10 10:00:01                                                |
+--------------------------------------------------------------------+
Server    Occupation    Last Updated
blue      33            2024-07-10 10:00:01
gold      3             2024-07-10 10:00:01
green     3             2024-07-10 10:00:01
indigo    33            2024-07-10 10:00:01
orange    33            2024-07-10 10:00:01
purple    4*******      2024-07-10 10:00:01
red       33            2024-07-10 10:00:01
white     3             2024-07-10 10:00:01
yellow    444*****      2024-07-10 10:26:54
```

### 3.3 For lsmd and lsmds

1. `lsmd` gets the status of MD tasks on the current server. `lsmds` gets the statuses for several servers.
2. `lsmds purple` fetches the information for the specified server `purple`.

```
purple
/home/zhangyk/tmp9/5_ff14SB_1/8ubuild/5_run/run00094.nc
/home/zhangyk/tmp9/4_ff99SBildn_1/8ubuild/5_run/run00094.nc
/home/zhangyk/tmp9/6_ff19SB_1/8ubuild/5_run/run00180.nc
yellow
red
/mnt/d4/zhangyk/tmp2/6_ff19SB_1/8ubuild/5_run/run00930.nc
blue
/home/zhangyk/tmp7/3_ff99SB_1/8ubuild/5_run/run00301.nc
/home/zhangyk/tmp6/4_ff99SBildn_1/8ubuild/5_run/run00516.nc
orange
/home/zhangyk/tmp6/3_ff99SB_1/8ubuild/5_run/run00374.nc
/home/zhangyk/tmp6/1_ff94_1/8ubuild/5_run/run00403.nc
indigo
/home/zhangyk/tmp7/2_ff99_1/8ubuild/5_run/run00378.nc
/home/zhangyk/tmp7/1_ff94_1/8ubuild/5_run/run00385.nc
gold
/home/zhangyk/tmp7/4_ff99SBildn_1/8ubuild/5_run/run00081.nc
green
/mnt/d8/zhangyk/tmp2/7_charmm22_1/8ubuild/5_run/run00964.nc
```

### 3.4 For killmd

The usage guide is shown by running `killmd -h`.

```
usage: killmd.py [-h] [-a] [-p PID] [-g GPU]

Kill series of md tasks instantly.

options:
  -h, --help         show this help message and exit
  -a, --all          Kill all md tasks
  -p PID, --pid PID  Process ID to kill
  -g GPU, --gpu GPU  GPU id
```

- `killmd -a` kills all MD tasks on the current server.
- `killmd -p 12345` kills the task with PID `12345`, as well as its related tasks.
- `killmd -g 0` kills the tasks running on `GPU 0`.

## 4. Acknowledgement

Thanks to Prof. Xue and Zhewei Qiu. I optimized their script `checkamber.py` to create `lsmd.py`.
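
## 5. Example: reading the Occupation column

The `Occupation` strings printed by `gpust` (section 3.1) use `*` for an occupied GPU and a digit for a free card, for example `4` for a free 4080. The sketch below is not part of `servst`; it only illustrates how such strings could be counted in a script. The function names `summarize_occupation` and `parse_gpust_rows` are hypothetical, and the code assumes each data row has four whitespace-separated fields (server, occupation, date, time), as in the sample output. Codes other than `4` are reported as they appear, without guessing the card model.

```python
from collections import Counter


def summarize_occupation(occupation: str) -> dict:
    """Summarize one Occupation string such as '444*****'.

    '*' marks an occupied GPU; every other character is counted as a
    free GPU and grouped by that character (e.g. '4' for a free 4080).
    """
    occupation = occupation.strip()
    free_by_code = Counter(ch for ch in occupation if ch != "*")
    return {
        "total": len(occupation),
        "occupied": occupation.count("*"),
        "free": sum(free_by_code.values()),
        "free_by_code": dict(free_by_code),
    }


def parse_gpust_rows(text: str) -> dict:
    """Map server name -> occupation summary for gpust table rows.

    Assumption: the banner lines start with '+' or '|' and the header
    row starts with 'Server'; both are skipped.
    """
    summary = {}
    for line in text.splitlines():
        stripped = line.strip()
        if not stripped or stripped.startswith(("+", "|")):
            continue
        fields = stripped.split()
        if len(fields) != 4 or fields[0] == "Server":
            continue
        server, occupation = fields[0], fields[1]
        summary[server] = summarize_occupation(occupation)
    return summary


sample = """\
Server Occupation Last Updated
purple 4******* 2024-07-10 10:00:01
yellow 444***** 2024-07-10 10:26:54
"""

for server, info in parse_gpust_rows(sample).items():
    print(server, "free:", info["free"], "occupied:", info["occupied"])
```

With the two sample rows above, `purple` has one free card and seven occupied ones, and `yellow` has three free cards and five occupied ones.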