
πŸ‹ ezpz: PBS Job Management

🤔 Determine Details of Currently Active Job

  1. Find all currently running[1] jobs owned by the user.
  2. For each of these running jobs, build a dictionary that maps its job ID to the list of hosts it is running on:

    >>> jobs = ezpz.pbs.get_users_running_pbs_jobs()
    >>> jobs
    {
        jobid_A: [host_A0, host_A1, host_A2, ..., host_AN],
        jobid_B: [host_B0, host_B1, host_B2, ..., host_BN],
        ...,
    }
    
  3. Look for our hostname in the list of hosts for each job.

  4. If our hostname is found, we are participating in that job, and it is our active job.

  5. Once we have the PBS_JOBID of the job containing our hostname, we can find the hostfile for that job.

  6. The hostfile is located in /var/spool/pbs/aux/.
  7. The filename is of the form jobid.hostname.

  8. ✅ Done!

    Example:

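    import ezpz.pbs  # also binds the top-level ezpz package
    jobs = ezpz.pbs.get_users_running_pbs_jobs()
    jobid = ezpz.pbs.get_pbs_jobid_of_active_job()
    num_nodes = len(jobs[jobid])  # number of hosts in the active job
    world_size = num_nodes * ezpz.get_gpus_per_node()  # total GPUs in the job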
    
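Steps 1 and 2 can be sketched in plain Python as follows. This is not ezpz's implementation; it assumes the running jobs are listed by qstat -n -1 -w -u $USER, one line per job, with the job state as a standalone R column and the exec-host list (e.g. nodeA/0*64+nodeB/0*64) as the last field. The names parse_running_jobs and query_running_jobs are hypothetical, and the sample qstat line is made up for illustration.

```python
import getpass
import subprocess


def parse_running_jobs(qstat_text: str) -> dict[str, list[str]]:
    """Map each running job ID to the list of hosts it occupies.

    Assumed input: one line per job (qstat -n -1 -w style), where the job
    state is a standalone "R" column and the last whitespace-separated
    field is the exec-host list, e.g. "nodeA/0*64+nodeB/0*64".
    """
    jobs: dict[str, list[str]] = {}
    for line in qstat_text.splitlines():
        fields = line.split()
        # Skip blank lines, headers, separators, and jobs not in state "R".
        if len(fields) < 2 or "R" not in fields[1:-1]:
            continue
        jobid = fields[0]
        # "nodeA/0*64+nodeB/0*64" -> ["nodeA", "nodeB"]
        hosts = [chunk.split("/")[0] for chunk in fields[-1].split("+")]
        jobs[jobid] = hosts
    return jobs


def query_running_jobs() -> dict[str, list[str]]:
    """Run qstat for the current user and parse it (requires a PBS system)."""
    out = subprocess.run(
        ["qstat", "-n", "-1", "-w", "-u", getpass.getuser()],
        capture_output=True,
        text=True,
        check=True,
    ).stdout
    return parse_running_jobs(out)


if __name__ == "__main__":
    # Made-up qstat lines: one running job and one queued job.
    sample = (
        "1234567.pbs01 alice prod train 12345 2 128 -- 01:00 R 00:10 nodeA/0*64+nodeB/0*64\n"
        "1234568.pbs01 alice prod eval -- 1 64 -- 01:00 Q -- --\n"
    )
    print(parse_running_jobs(sample))
    # {'1234567.pbs01': ['nodeA', 'nodeB']}
```

Keeping the parser separate from the subprocess call means the parsing logic can be checked on saved qstat output without access to a PBS scheduler.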

    [1] Running vs. active:
    • Running: a user can have multiple PBS jobs running at the same time.
    • Active: a user can have only one active PBS job at a time, namely the job we are currently running on.
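The running/active distinction, together with steps 3 to 7, can be sketched as follows: among all running jobs, the active one is the job whose host list contains this node's hostname, and its hostfile sits in /var/spool/pbs/aux/ under the full job ID (jobid.hostname). The helpers find_active_jobid and hostfile_for are hypothetical names; defaulting to socket.gethostname() and comparing short hostnames (the text before the first dot) are assumptions, not necessarily what ezpz does.

```python
import posixpath
import socket
from typing import Optional

# Directory named in the steps above; the file name is the full PBS job ID.
PBS_AUX_DIR = "/var/spool/pbs/aux"


def find_active_jobid(
    jobs: dict[str, list[str]], hostname: Optional[str] = None
) -> Optional[str]:
    """Return the running job whose host list contains this node, or None.

    Assumption: hosts are compared by short name so that "nodeA" matches
    "nodeA.example.com".
    """
    me = (hostname or socket.gethostname()).split(".")[0]
    for jobid, hosts in jobs.items():
        if me in {h.split(".")[0] for h in hosts}:
            # First match; only one job should be active on this node.
            return jobid
    return None


def hostfile_for(jobid: str, aux_dir: str = PBS_AUX_DIR) -> str:
    """Build the hostfile path for jobid inside the PBS aux directory."""
    return posixpath.join(aux_dir, jobid)


if __name__ == "__main__":
    # Made-up running jobs for illustration.
    running = {"111.pbs01": ["nodeA", "nodeB"], "222.pbs01": ["nodeC"]}
    active = find_active_jobid(running, hostname="nodeC.example.com")
    print(active, hostfile_for(active))
    # 222.pbs01 /var/spool/pbs/aux/222.pbs01
```

Taking the hostname as an optional argument keeps the lookup testable off-cluster; on a compute node it can simply be called with the running-jobs dictionary alone.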