For internal use
servst is primarily a collection of tools for inspecting the utilization
status of each server. I later added support for MD tasks, so that they can be
listed or killed conveniently and thoroughly.
- gpust and cpust provide status for servers
- gpurf and cpurf refresh status information for all or specific servers
- lsmd and lsmds fetch the latest progress of MD tasks for servers
- killmd kills the target MD tasks

There is no need for installation. The commands are available as global aliases.
However, password-free configuration is recommended for gpurf, cpurf,
and lsmds.
gpust and cpust can be run on any server to get the GPU or CPU status of any
server. If prompted, enter the password for rainbow.
    +---------------------------- GPU STATUS ----------------------------+
    | Server: name of the server                                         |
    | Occupation: * for occupied GPU, 4 for free 4080 card, and so on.   |
    | processing servers:                                                |
    | yellow purple orange indigo gold green white red blue              |
    | 2024-07-10 10:00:01                                                |
    +--------------------------------------------------------------------+
    Server    Occupation    Last Updated
    blue      33            2024-07-10 10:00:01
    gold      3             2024-07-10 10:00:01
    green     3             2024-07-10 10:00:01
    indigo    33            2024-07-10 10:00:01
    orange    33            2024-07-10 10:00:01
    purple    4*******      2024-07-10 10:00:01
    red       33            2024-07-10 10:00:01
    yellow    444*****      2024-07-10 10:00:01
    +---------------------------- CPU STATUS ----------------------------+
    | Server: name of the server                                         |
    | Total: total number of cores in one server                         |
    | Idle: average number of idle (not used) cores in the last 5 seconds|
    | processing servers:                                                |
    | yellow purple orange indigo gold green white violet black          |
    | 2024-07-10 10:00:01                                                |
    +--------------------------------------------------------------------+
    Server    Total    Idle    Last Updated
    black     56       28      2024-07-10 10:00:01
    gold      56       32      2024-07-10 10:00:01
    green     24       22      2024-07-10 10:00:01
    indigo    56       15      2024-07-10 10:00:01
    orange    56       31      2024-07-10 10:00:01
    purple    96       10      2024-07-10 10:00:01
    violet    56       32      2024-07-10 10:00:01
    yellow    96       06      2024-07-10 10:00:01
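If you want to post-process the GPU table in a script, the Occupation column can be decoded character by character. The sketch below is only an illustration and not part of servst: the function name `decode_occupation` is hypothetical, and it relies solely on the legend in the table header (`*` means an occupied GPU, any other character is the code of a free card, with `4` documented as a free 4080).

```python
from collections import Counter


def decode_occupation(occupation: str) -> dict:
    """Summarize one Occupation string from the gpust table.

    '*' marks an occupied GPU. Any other character is treated as the code
    of a free card (the table header only documents '4' for a free 4080).
    """
    occupied = occupation.count("*")
    free_by_code = Counter(ch for ch in occupation if ch != "*")
    return {
        "total": len(occupation),
        "occupied": occupied,
        "free": len(occupation) - occupied,
        "free_by_code": dict(free_by_code),
    }


if __name__ == "__main__":
    # Occupation value of yellow in the example table.
    print(decode_occupation("444*****"))
    # {'total': 8, 'occupied': 5, 'free': 3, 'free_by_code': {'4': 3}}
```

For yellow in the example, this reports three free cards with code `4` and five occupied GPUs.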
gpurf updates the status of all servers. Without password-free configuration,
you will need to enter the password 11 times, so you are unlikely to use it
this way.
gpurf yellow updates the status of only the specified server, yellow. You
must either have password-free access or provide the password for yellow and
then for rainbow when prompted. The statuses of the other servers remain as
previously displayed. The Last Updated column shows when each server was last
refreshed.
    +---------------------------- GPU STATUS ----------------------------+
    | Server: name of the server                                         |
    | Occupation: * for occupied GPU, 4 for free 4080 card, and so on.   |
    | processing servers:                                                |
    | yellow purple orange indigo gold green white red blue              |
    | 2024-07-10 10:00:01                                                |
    +--------------------------------------------------------------------+
    Server    Occupation    Last Updated
    blue      33            2024-07-10 10:00:01
    gold      3             2024-07-10 10:00:01
    green     3             2024-07-10 10:00:01
    indigo    33            2024-07-10 10:00:01
    orange    33            2024-07-10 10:00:01
    purple    4*******      2024-07-10 10:00:01
    red       33            2024-07-10 10:00:01
    white     3             2024-07-10 10:00:01
    yellow    444*****      2024-07-10 10:26:54
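The per-server refresh can be pictured as updating one row of a cached table while leaving the other rows and their timestamps alone. This README does not describe how gpurf actually stores its data, so the sketch below is purely hypothetical: the `refresh_server` function and the dictionary layout are assumptions made for illustration.

```python
from datetime import datetime


def refresh_server(status: dict, server: str, occupation: str, now: datetime) -> dict:
    """Return a copy of the status table in which only `server` is refreshed.

    `status` maps a server name to an (occupation, last_updated) pair.
    This layout is an assumption, not gpurf's real storage format.
    """
    updated = dict(status)
    updated[server] = (occupation, now.strftime("%Y-%m-%d %H:%M:%S"))
    return updated


if __name__ == "__main__":
    cached = {
        "purple": ("4*******", "2024-07-10 10:00:01"),
        "yellow": ("444*****", "2024-07-10 10:00:01"),
    }
    refreshed = refresh_server(
        cached, "yellow", "444*****", datetime(2024, 7, 10, 10, 26, 54)
    )
    for name in sorted(refreshed):
        occupation, stamp = refreshed[name]
        print(f"{name:<10}{occupation:<14}{stamp}")
```

Only the yellow row gets the new timestamp, which matches the Last Updated column in the example above.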
lsmd gets the status of MD tasks on the current server. lsmds gets the
statuses for several servers.
lsmds purple fetches the information for the specified server, purple.
    purple
    /home/zhangyk/tmp9/5_ff14SB_1/8ubuild/5_run/run00094.nc
    /home/zhangyk/tmp9/4_ff99SBildn_1/8ubuild/5_run/run00094.nc
    /home/zhangyk/tmp9/6_ff19SB_1/8ubuild/5_run/run00180.nc
    yellow
    red
    /mnt/d4/zhangyk/tmp2/6_ff19SB_1/8ubuild/5_run/run00930.nc
    blue
    /home/zhangyk/tmp7/3_ff99SB_1/8ubuild/5_run/run00301.nc
    /home/zhangyk/tmp6/4_ff99SBildn_1/8ubuild/5_run/run00516.nc
    orange
    /home/zhangyk/tmp6/3_ff99SB_1/8ubuild/5_run/run00374.nc
    /home/zhangyk/tmp6/1_ff94_1/8ubuild/5_run/run00403.nc
    indigo
    /home/zhangyk/tmp7/2_ff99_1/8ubuild/5_run/run00378.nc
    /home/zhangyk/tmp7/1_ff94_1/8ubuild/5_run/run00385.nc
    gold
    /home/zhangyk/tmp7/4_ff99SBildn_1/8ubuild/5_run/run00081.nc
    green
    /mnt/d8/zhangyk/tmp2/7_charmm22_1/8ubuild/5_run/run00964.nc
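A listing like the one above can be grouped by server with a few lines of Python if you want to process it further. This is a minimal sketch and not part of servst: `parse_lsmds` and `run_index` are hypothetical helpers, and they assume that every path starts with `/`, that paths contain no whitespace, and that the number in `runNNNNN.nc` is a run counter.

```python
import re
from pathlib import PurePosixPath


def parse_lsmds(text: str) -> dict:
    """Group the paths in an lsmds listing under the server named before them.

    Tokens starting with '/' are taken to be paths. Every other token is
    taken to be a server name (an assumption about the output format).
    """
    tasks = {}
    current = None
    for token in text.split():
        if token.startswith("/"):
            if current is None:
                raise ValueError("path found before any server name")
            tasks[current].append(token)
        else:
            current = token
            tasks.setdefault(current, [])
    return tasks


def run_index(path: str):
    """Return the number in a 'runNNNNN.nc' file name, or None if absent."""
    match = re.fullmatch(r"run(\d+)\.nc", PurePosixPath(path).name)
    return int(match.group(1)) if match else None


if __name__ == "__main__":
    listing = """
    purple
    /home/zhangyk/tmp9/5_ff14SB_1/8ubuild/5_run/run00094.nc
    yellow
    red
    /mnt/d4/zhangyk/tmp2/6_ff19SB_1/8ubuild/5_run/run00930.nc
    """
    for server, paths in parse_lsmds(listing).items():
        print(server, [run_index(p) for p in paths])
```

A server with no paths after its name, such as yellow here, ends up with an empty list.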
The usage guide can be shown by typing killmd -h.
    usage: killmd.py [-h] [-a] [-p PID] [-g GPU]

    Kill series of md tasks instantly.

    options:
      -h, --help         show this help message and exit
      -a, --all          Kill all md tasks
      -p PID, --pid PID  Process ID to kill
      -g GPU, --gpu GPU  GPU id
- killmd -a kills all MD tasks on the current server.
- killmd -p 12345 kills the task with PID 12345, as well as related tasks.
- killmd -g 0 kills the tasks running on GPU 0.
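For readers curious how such an interface is declared, the help text above is the kind of output Python's `argparse` produces. The following is a sketch of a parser with the same options, not the actual killmd.py source: the `build_parser` name and the `int` types for PID and GPU are assumptions, and the example only prints the parsed options without killing anything.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a parser mirroring the options listed by `killmd -h`."""
    parser = argparse.ArgumentParser(
        prog="killmd.py",
        description="Kill series of md tasks instantly.",
    )
    parser.add_argument("-a", "--all", action="store_true", help="Kill all md tasks")
    # Treating PID and GPU id as integers is an assumption.
    parser.add_argument("-p", "--pid", type=int, help="Process ID to kill")
    parser.add_argument("-g", "--gpu", type=int, help="GPU id")
    return parser


if __name__ == "__main__":
    # Parse a fixed example command line instead of sys.argv.
    args = build_parser().parse_args(["-p", "12345"])
    print(args.all, args.pid, args.gpu)
```

With `-p 12345`, the parsed values are `all=False`, `pid=12345`, and `gpu=None`.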
Thanks to Prof. Xue and Zhewei Qiu. I optimized their script
checkamber.py to get lsmd.py.