Setting Walltimes for Compute Jobs
As of July 30th, 2017, POD compute jobs require a walltime request along with your request for nodes, cores per node, and compute node queue.
- Walltimes define the maximum amount of time a job should run.
- You are only billed for the time you consume on the cluster. So, a job that happens to complete in 5 minutes, but was submitted with a maximum walltime of 1 hour, only gets billed for 5 minutes.
- Walltimes are a hard limit, and jobs will be killed if they exceed your walltime request.
Below, we review the advantages of walltime requests and best practices for avoiding premature termination of a compute job.
Walltimes Optimize Scheduling
Walltime requests optimize your job's scheduling and ensure that your compute jobs start as quickly as possible. Previously, jobs submitted without a walltime request defaulted to a 99-day request from the scheduler. This created unnecessary wait time in the queue, especially for jobs with large core counts.
As an example, if your 1,000-core job should run no longer than three hours, not setting a walltime resulted in the scheduler ensuring you had 99 straight days of access to the 1,000 cores, triggering undesired wait time in the queue due to POD's "fair share" algorithm for compute resources.
Walltimes for Billing Control
Since walltimes are a hard limit, they enable you to control the maximum a job can be billed. Walltimes can also prevent experimental code or models from running longer than expected. For instance, if you launch a large array of jobs that should each take at most 15 minutes, setting a walltime of 20 minutes per job will prevent "run-away" jobs as you experiment with new codes, workflows and models.
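To make the billing cap concrete, the sketch below estimates the most a single job could consume before the walltime kills it. It assumes usage is measured in core-hours (nodes x cores per node x hours), which this page does not state, and the helper names walltime_to_seconds and max_core_hours are hypothetical, not POD commands.

```shell
# Convert an HH:MM:SS walltime to seconds (awk reads "08" as decimal 8).
walltime_to_seconds() {
    echo "$1" | awk -F: '{ print $1 * 3600 + $2 * 60 + $3 }'
}

# Upper bound on the core-hours one job can consume before it is killed:
# nodes x cores-per-node x walltime, rounded up to whole core-hours.
# ASSUMPTION: usage is counted in core-hours; check your billing terms.
max_core_hours() {
    local nodes=$1 ppn=$2 walltime=$3
    local secs
    secs=$(walltime_to_seconds "$walltime")
    echo $(( (nodes * ppn * secs + 3599) / 3600 ))
}

# One array element on 1 node with 20 cores, capped at 20 minutes
echo "worst case per job: $(max_core_hours 1 20 00:20:00) core-hours"
```

Multiplying the per-job figure by the number of array elements gives a conservative ceiling for the whole array.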
For first-time users or new workloads, if you want to ensure your job runs to completion no matter how long it takes, it is recommended that you grossly overestimate the walltime. For instance, if you think your job should finish in 4 hours, you might want to set the walltime request to 8 or 10 hours the first time around.
Then, once you have a good feel for your job's walltime needs, it is recommended that you pad walltimes by 10-15% going forward to allow some buffer. For production runs, you will generally want to request a walltime that is reasonable and does not reserve compute time that your job will not need.
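The padding advice above can be sketched as a small shell helper. The function name pad_walltime and its default of 15% are illustrative assumptions. It takes a measured runtime in seconds and prints a padded value in the HH:MM:SS form used by walltime requests.

```shell
# Hypothetical helper: pad a measured runtime (in seconds) by a percentage
# and print the result as HH:MM:SS for use with -l walltime=.
pad_walltime() {
    local runtime_s=$1       # measured runtime, in seconds
    local pad_pct=${2:-15}   # padding percentage (defaults to 15)
    # Integer arithmetic, rounded up to the next whole second
    local padded=$(( (runtime_s * (100 + pad_pct) + 99) / 100 ))
    printf '%02d:%02d:%02d\n' \
        $(( padded / 3600 )) $(( padded % 3600 / 60 )) $(( padded % 60 ))
}

# A job that previously finished in 4 hours (14400 s), padded by 15%
pad_walltime 14400 15
```

The printed value can be used directly in a job request, for example as -l walltime=$(pad_walltime 14400 15) on the qsub command line.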
Job Request Usage
Walltime requests are specified with -l walltime=HH:MM:SS. For example, a B30 job on 5 nodes with 28 cores per node that runs no longer than 24 hours:
#PBS -q B30
#PBS -l nodes=5:ppn=28
#PBS -l walltime=24:00:00
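Because a mistyped walltime can cause a job to be rejected or misread, it may help to check the string before submitting. The sketch below is a hypothetical pre-submission check, not part of the scheduler. It accepts only the HH:MM:SS form described on this page, so it is stricter than the scheduler itself may be.

```shell
# Hypothetical check: succeed only if the argument has the HH:MM:SS form,
# with minutes and seconds in the range 00-59.
is_valid_walltime() {
    printf '%s\n' "$1" | grep -Eq '^[0-9]+:[0-5][0-9]:[0-5][0-9]$'
}

# Try a well-formed value and one with a doubled colon
for wt in 24:00:00 24:00::00; do
    if is_valid_walltime "$wt"; then
        echo "$wt looks valid"
    else
        echo "$wt rejected"
    fi
done
```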
Walltimes for Job Arrays
For job arrays, the walltime is applied to each job element. As an example, this job array of 10 elements, using the job script myscript.sub, applies a 2-hour walltime to each job element, and each element runs on 1 node with 20 cores.
pod$ qsub -t1-10 -q T30 -l nodes=1:ppn=20 -l walltime=02:00:00 myscript.sub
Getting Additional Help
If you have any questions about job walltimes or best practices, please contact POD Support: email@example.com