servst


1. Description

servst is a collection of tools designed to inspect the utilization status of each server.

  • gpust and cpust show the GPU and CPU status of the servers
  • gpurf and cpurf refresh status information for all or specific servers
  • lsmd and lsmds fetch the latest progress of MD tasks for servers

2. Installation

There is no need for installation. The commands are available as global aliases.

However, password-free configuration is recommended for gpurf, cpurf and lsmds.

3. Usage

3.1 For gpust and cpust:

These commands can be run on any server to show the GPU or CPU status of every server. If prompted, enter the password for rainbow.

+---------------------------- GPU STATUS ----------------------------+
| Server: name of the server                                         |
| Occupation: * for occupied GPU, 4 for free 4080 card, and so on.   |
| processing servers:                                                |
| yellow purple orange indigo gold green white red blue              |
| 2024-07-10 10:00:01                                                |
+--------------------------------------------------------------------+

Server  Occupation      Last Updated
blue    33              2024-07-10 10:00:01
gold    3               2024-07-10 10:00:01
green   3               2024-07-10 10:00:01
indigo  33              2024-07-10 10:00:01
orange  33              2024-07-10 10:00:01
purple  4*******        2024-07-10 10:00:01
red     33              2024-07-10 10:00:01
yellow  444*****        2024-07-10 10:00:01
+---------------------------- CPU STATUS ----------------------------+
| Server: name of the server                                         |
| Total: total number of cores in one server                         |
| Idle: average number of idle (not used) cores in the last 5 seconds|
| processing servers:                                                |
| yellow purple orange indigo gold green white violet black          |
| 2024-07-10 10:00:01                                                |
+--------------------------------------------------------------------+

Server  Total   Idle    Last Updated
black   56      28      2024-07-10 10:00:01
gold    56      32      2024-07-10 10:00:01
green   24      22      2024-07-10 10:00:01
indigo  56      15      2024-07-10 10:00:01
orange  56      31      2024-07-10 10:00:01
purple  96      10      2024-07-10 10:00:01
violet  56      32      2024-07-10 10:00:01
yellow  96      06      2024-07-10 10:00:01
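
The Idle column can be reproduced from two readings of /proc/stat. The following is a minimal sketch, assuming the collector samples the aggregate cpu line twice, 5 seconds apart; this page does not state how cpust.py actually measures idleness, and the helper names read_cpu_times, idle_cores and sample_idle_cores are hypothetical.

```python
import os
import time


def read_cpu_times(path="/proc/stat"):
    """Return (idle, total) jiffies from the aggregate 'cpu' line."""
    with open(path) as fh:
        fields = fh.readline().split()
    # fields[0] is the label 'cpu'; the next eight values are
    # user, nice, system, idle, iowait, irq, softirq, steal.
    values = [int(v) for v in fields[1:9]]
    return values[3], sum(values)


def idle_cores(total_cores, first, second):
    """Average number of idle cores between two (idle, total) samples."""
    idle_delta = second[0] - first[0]
    total_delta = second[1] - first[1]
    if total_delta <= 0:
        return 0
    return round(total_cores * idle_delta / total_delta)


def sample_idle_cores(interval=5.0):
    """Sample /proc/stat twice, `interval` seconds apart (Linux only)."""
    first = read_cpu_times()
    time.sleep(interval)
    second = read_cpu_times()
    return idle_cores(os.cpu_count() or 1, first, second)
```

For example, idle_cores(56, (1000, 4000), (1500, 5000)) gives 28: half of the elapsed CPU time was idle, so half of the 56 cores count as idle.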

3.2 For gpurf and cpurf:

  1. gpurf updates the status of all servers. Without password-free configuration, you will need to enter a password 11 times, so you are unlikely to use this form.

  2. gpurf yellow updates the status of only the specified server, yellow. You must either have password-free access or enter the password for yellow and then for rainbow when prompted. The statuses of the other servers remain as previously displayed. The Last Updated column shows when each server was last refreshed.

+---------------------------- GPU STATUS ----------------------------+
| Server: name of the server                                         |
| Occupation: * for occupied GPU, 4 for free 4080 card, and so on.   |
| processing servers:                                                |
| yellow purple orange indigo gold green white red blue              |
| 2024-07-10 10:00:01                                                |
+--------------------------------------------------------------------+

Server  Occupation      Last Updated
blue    33              2024-07-10 10:00:01
gold    3               2024-07-10 10:00:01
green   3               2024-07-10 10:00:01
indigo  33              2024-07-10 10:00:01
orange  33              2024-07-10 10:00:01
purple  4*******        2024-07-10 10:00:01
red     33              2024-07-10 10:00:01
white   3               2024-07-10 10:00:01
yellow  444*****        2024-07-10 10:26:54
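
If the table needs to be consumed by another script, the Occupation strings are easy to decode. The sketch below is only an illustration: it assumes that every character other than * stands for one free card and that data rows always have the four whitespace-separated fields shown above. The function names summarize_occupation and parse_gpu_table are hypothetical and not part of servst.

```python
def summarize_occupation(occupation):
    """Count free and occupied GPUs in an Occupation string such as '444*****'."""
    occupied = occupation.count("*")
    free = len(occupation) - occupied
    return {"free": free, "occupied": occupied}


def parse_gpu_table(text):
    """Parse 'Server  Occupation  Last Updated' rows into a dict keyed by server."""
    rows = {}
    for line in text.splitlines():
        # Skip the lines of the legend box, which start with '|' or '+'.
        if line.startswith(("|", "+")):
            continue
        parts = line.split()
        # A data row has four fields: server, occupation, date, time.
        if len(parts) != 4 or parts[0] == "Server":
            continue
        server, occupation, date, clock = parts
        rows[server] = {
            **summarize_occupation(occupation),
            "updated": f"{date} {clock}",
        }
    return rows
```

Feeding the table above to parse_gpu_table makes it possible to see, per server, how many cards are free and whether the row was refreshed recently.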

3.3 For lsmd and lsmds:

  1. lsmd reports the status of MD tasks on the current server. lsmds reports the statuses for several servers.

  2. lsmds purple fetches the information for only the specified server, purple.

purple
/home/zhangyk/tmp9/5_ff14SB_1/8ubuild/5_run/run00094.nc
/home/zhangyk/tmp9/4_ff99SBildn_1/8ubuild/5_run/run00094.nc
/home/zhangyk/tmp9/6_ff19SB_1/8ubuild/5_run/run00180.nc
yellow

red
/mnt/d4/zhangyk/tmp2/6_ff19SB_1/8ubuild/5_run/run00930.nc
blue
/home/zhangyk/tmp7/3_ff99SB_1/8ubuild/5_run/run00301.nc
/home/zhangyk/tmp6/4_ff99SBildn_1/8ubuild/5_run/run00516.nc
orange
/home/zhangyk/tmp6/3_ff99SB_1/8ubuild/5_run/run00374.nc
/home/zhangyk/tmp6/1_ff94_1/8ubuild/5_run/run00403.nc
indigo
/home/zhangyk/tmp7/2_ff99_1/8ubuild/5_run/run00378.nc
/home/zhangyk/tmp7/1_ff94_1/8ubuild/5_run/run00385.nc
gold
/home/zhangyk/tmp7/4_ff99SBildn_1/8ubuild/5_run/run00081.nc
green
/mnt/d8/zhangyk/tmp2/7_charmm22_1/8ubuild/5_run/run00964.nc
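
Output in this shape can be turned into a per-server progress summary. The sketch below assumes that server names never start with / and that the number in runNNNNN.nc reflects how far a task has progressed; both are assumptions read off the sample above, and parse_lsmds is a hypothetical helper, not part of servst.

```python
import re

RUN_RE = re.compile(r"run(\d+)\.nc$")


def parse_lsmds(text):
    """Group trajectory paths by server and extract each run index."""
    result = {}
    current = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith("/"):
            if current is None:
                continue  # a path before any server name; ignore it
            match = RUN_RE.search(line)
            index = int(match.group(1)) if match else None
            result[current].append((line, index))
        else:
            # Any other non-empty line is treated as a server name.
            current = line
            result.setdefault(current, [])
    return result
```

A server with no running task, such as yellow in the sample, maps to an empty list.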

4. Design

4.1 For gpust, cpust, gpurf and cpurf

4.1.1 Outer layer

I wrote some global aliases, which are ready whenever a shell instance is opened. As the aliases show, gpust fetches the content of gpuinfo, located at rainbow:/tmp/gpust.

For tcsh:

# File location: 
    # /etc/csh.cshrc (for servers with OS Ubuntu and CentOS)
    # /etc/tmp.cshrc (for servers with OS Slackware, including orange and violet)
alias gpust ssh 101.6.120.23 'cat /tmp/gpust/gpuinfo'
alias cpust ssh 101.6.120.23 'cat /tmp/cpust/cpuinfo'
alias gpurf ssh 101.6.120.23 '/home/zhangyk/codelib/cmd/servst/gpust.py'
alias cpurf ssh 101.6.120.23 '/home/zhangyk/codelib/cmd/servst/cpust.py'
alias lsmds /home/zhangyk/codelib/cmd/servst/lsmds.py
alias lsmd /home/zhangyk/codelib/cmd/servst/lsmd.py

For bash:

# File location: 
    # /etc/bash.bashrc (for servers with OS Ubuntu and CentOS)
    # /etc/bashrc (for servers with OS Slackware, including orange and violet)
alias gpust="ssh 101.6.120.23 'cat /tmp/gpust/gpuinfo'"
alias cpust="ssh 101.6.120.23 'cat /tmp/cpust/cpuinfo'"
alias gpurf="ssh 101.6.120.23 '/home/zhangyk/codelib/cmd/servst/gpust.py'"
alias cpurf="ssh 101.6.120.23 '/home/zhangyk/codelib/cmd/servst/cpust.py'"
alias lsmds='/home/zhangyk/codelib/cmd/servst/lsmds.py'
alias lsmd='/home/zhangyk/codelib/cmd/servst/lsmd.py'

4.1.2 Inner layer

The scripts gpust.py and cpust.py run hourly. They collect information from each server, extract the useful data and store it in gpuinfo and cpuinfo.
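
How gpust.py extracts its data is not described on this page. As a rough sketch of one possible approach, the code below builds an Occupation string from the CSV output of nvidia-smi --query-gpu=name,utilization.gpu --format=csv,noheader,nounits. The rule "a card is occupied when its utilization is above a threshold", the use of the first digit of the model number as the symbol for a free card, and the names query_gpus and occupation_string are all assumptions made for illustration.

```python
import re
import subprocess


def query_gpus():
    """Return 'name, utilization' CSV rows for the local GPUs."""
    done = subprocess.run(
        ["nvidia-smi", "--query-gpu=name,utilization.gpu",
         "--format=csv,noheader,nounits"],
        capture_output=True, text=True, check=True,
    )
    return done.stdout


def occupation_string(csv_text, busy_threshold=0):
    """Build an Occupation string from 'name, utilization' CSV rows."""
    symbols = []
    for line in csv_text.splitlines():
        if not line.strip():
            continue
        name, utilization = (part.strip() for part in line.rsplit(",", 1))
        if not utilization.isdigit():
            symbols.append("?")  # utilization not reported for this card
            continue
        if int(utilization) > busy_threshold:
            symbols.append("*")  # assumed rule: any load means occupied
            continue
        # Free card: first digit of the four-digit model number, e.g. 4 for a 4080.
        model = re.search(r"(\d)\d{3}", name)
        symbols.append(model.group(1) if model else "?")
    return "".join(symbols)
```

On a server with NVIDIA drivers installed, occupation_string(query_gpus()) would produce one symbol per card, in the order reported by nvidia-smi.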

4.2 For lsmd and lsmds

lsmd finds the PIDs of processes that have pmemd in their names and were started by the current user, gets the cwd (current working directory) of each process, and then finds the latest .nc file in that directory.
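
The steps above can be sketched as follows. This is not the actual lsmd.py: it assumes a Linux /proc filesystem, matches pmemd against /proc/<pid>/comm, and takes "latest" to mean the most recently modified file. The function names latest_nc, pmemd_cwds and lsmd_report are hypothetical.

```python
import glob
import os


def latest_nc(directory):
    """Return the most recently modified .nc file in `directory`, or None."""
    candidates = glob.glob(os.path.join(directory, "*.nc"))
    return max(candidates, key=os.path.getmtime) if candidates else None


def pmemd_cwds(uid=None):
    """Return the working directories of the given user's pmemd processes."""
    uid = os.getuid() if uid is None else uid
    if not os.path.isdir("/proc"):
        return []
    cwds = set()
    for pid in filter(str.isdigit, os.listdir("/proc")):
        proc = os.path.join("/proc", pid)
        try:
            if os.stat(proc).st_uid != uid:
                continue
            with open(os.path.join(proc, "comm")) as fh:
                if "pmemd" not in fh.read():
                    continue
            cwds.add(os.readlink(os.path.join(proc, "cwd")))
        except OSError:
            continue  # the process exited or is not readable
    return sorted(cwds)


def lsmd_report():
    """Return the latest .nc file for each running pmemd task."""
    found = (latest_nc(cwd) for cwd in pmemd_cwds())
    return [path for path in found if path]
```

Printing each entry of lsmd_report() on its own line gives output in the same shape as the per-server path lists shown in section 3.3.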

lsmds collects the output of lsmd from the requested servers and prints it all.
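
A minimal sketch of this collection step, assuming that lsmd is run on each server over ssh using the script path from the aliases and that each server name is printed before its output. The functions run_remote and lsmds_report are hypothetical and are not the actual lsmds.py.

```python
import subprocess

# Script path taken from the lsmd alias in section 4.1.1.
LSMD = "/home/zhangyk/codelib/cmd/servst/lsmd.py"


def run_remote(server, command=LSMD):
    """Run `command` on `server` over ssh and return its standard output."""
    done = subprocess.run(["ssh", server, command],
                          capture_output=True, text=True)
    return done.stdout


def lsmds_report(servers, run=run_remote):
    """Return a report: each server name followed by its lsmd output."""
    lines = []
    for server in servers:
        lines.append(server)
        output = run(server).strip()
        if output:
            lines.extend(output.splitlines())
    return "\n".join(lines)
```

Passing the runner as a parameter keeps the formatting logic testable without network access; for example, lsmds_report(["purple"], run=lambda s: "") returns just the server name.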

servst.1752634788.txt.gz · Last modified: 2025/07/16 02:59 by zhangyk