CPUSETs on cosmos
Last update on Friday, 28-Jul-2006 16:20:28 CDT.
cosmos has 64 physical nodes. Each node contains 2 cpus and 4 GB of memory, as shown in Figure 2 in the system description of cosmos. However, some memory is reserved by the memory system of cosmos, so the approximate available memory per node is 3850 MB. PBS, the batch system on cosmos, takes this into account when assigning and using CPUSETs, an operating system construct for restricting processes to certain cpus and memory.

The system is currently configured so that all interactive processing is restricted to 6 cpus. The remaining 122 cpus are available to PBS.

PBS allocates and deallocates two types of cpusets for batch jobs: dedicated and shared. The type and size of the cpuset is determined by the batch job's cpu and memory requirements. PBS is configured to assign dedicated cpusets only to jobs whose resource requirements exceed 2 cpus and 3850 MB. Jobs whose resource requirements are smaller than this threshold are placed in shared cpusets.

A job whose resource requirements occupy most of a shared cpuset can prevent other candidate jobs from being assigned to it. The leftover cpus are then unusable, and thus wasted, until that job finishes.

To get an estimate of your job's resource usage in your batch job output, add the following command at the end of your job:
An example of the relevant output about job resource usage from the above command:
This shows that the actual memory usage is only about 1.3 GB. However, the following shows that the job requested 3 GB of memory:
So this job has made a cpu unusable for its entire duration. If you specify more accurate memory requirements, there is a greater chance that other 1-cpu jobs can fit into a shared cpuset alongside yours. If your job genuinely needs the memory it requests, however, this cannot be avoided.
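The arithmetic behind this example can be sketched in a few lines of Python. This is only an illustration, not how PBS itself decides placement: it assumes a shared cpuset can be modeled as a single node (2 cpus, roughly 3850 MB), that 1 GB is 1024 MB, and that a second 1-cpu job asks for 1 GB. The function names are hypothetical.

```python
# Illustrative arithmetic only; PBS's real placement logic is not shown here.
NODE_CPUS = 2          # cpus per node (from the text)
NODE_MEM_MB = 3850     # approximate usable memory per node (from the text)


def leftover_mb(requested_mb: int, node_mem_mb: int = NODE_MEM_MB) -> int:
    """Memory left on one node after a job's requested memory is set aside."""
    return max(node_mem_mb - requested_mb, 0)


def second_job_fits(first_request_mb: int, second_request_mb: int) -> bool:
    """Assumption: two 1-cpu jobs can share a node only if their
    memory requests together fit in the node's usable memory."""
    return first_request_mb + second_request_mb <= NODE_MEM_MB


requested = 3 * 1024   # the job requested 3 GB
used = 1331            # about 1.3 GB actually used (1.3 * 1024, rounded)
other_job = 1024       # hypothetical second 1-cpu job asking for 1 GB

print(leftover_mb(requested))                # memory left with the 3 GB request
print(leftover_mb(used))                     # memory left with an accurate request
print(second_job_fits(requested, other_job)) # can the second job share the node?
print(second_job_fits(used, other_job))
```

Under these assumptions, the 3 GB request leaves only 778 MB for the node's second cpu, so the hypothetical 1 GB job cannot be placed there. A request close to the actual 1.3 GB usage would leave about 2.5 GB and the second job would fit.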