Basic System Information

Last update on Sunday, 07-Sep-2008 20:06:58 CDT.

Hydra is a 640-processor IBM cluster. The processors are IBM's 1.9 GHz Power5+, physically packaged 16 to a node. Each node, a p575, is a symmetric multiprocessor (SMP) system with 16 Power5+ processors and 32 gigabytes of shared memory. Of these 32 gigabytes, only 25 are available for user processing; keep that in mind when setting batch job memory limits. The 40 nodes are housed in four physical frames (racks), stacked ten nodes to a rack, and are interconnected by a High Performance Switch (HPS) as well as by gigabit Ethernet. The cluster uses the HPS for parallel processing and communication between the nodes. Each p575 node connects to the HPS network through two adapters. Each adapter attaches to one of the two available subnetworks, over which the HPS routes message packets to other nodes.
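As a rough illustration of the memory arithmetic above, the following Python sketch divides the 25 gigabytes of user-available memory on a node evenly among the tasks placed on it. The function name and the even-split assumption are hypothetical and are not part of Hydra's configuration; the limits actually enforced depend on the LoadLeveler class you submit to.

```python
# Figures taken from the description above.
NODE_CPUS = 16        # Power5+ processors per p575 node
NODE_MEMORY_GB = 32   # physical shared memory per node
USER_MEMORY_GB = 25   # portion of node memory available to user processes


def memory_per_task_gb(tasks_per_node: int) -> float:
    """Hypothetical helper: even share of user memory for each task on a node."""
    if not 1 <= tasks_per_node <= NODE_CPUS:
        raise ValueError(f"tasks_per_node must be between 1 and {NODE_CPUS}")
    return USER_MEMORY_GB / tasks_per_node


print(memory_per_task_gb(16))  # fully packed node: 1.5625 GB per task
print(memory_per_task_gb(8))   # half-packed node: 3.125 GB per task
```

A job that needs more than about 1.5 gigabytes per task therefore cannot use all 16 processors of a node under this even-split assumption.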

Fig. 1: Power5+ Dual Core Chip. The Basic Building Block

Fig. 2: The P5-575+ Node in Reality

Fig. 3: Internal Node Interconnections Schematic

Fig. 4: Node Roles as Configured on Hydra

Fig. 5: The DDN Disk RAID Array Connections to Hydra

A much more detailed description of the hardware is available at Architecture.

Login Nodes: hydra & hydra2

The staff has named the nodes to reflect their physical location in the four racks. A node name is a four- or five-character string of the form f[1-4]n[1-10]. For example, f3n9 refers to the 9th node in rack 3, and f1n1 is node 1 in rack 1. Node numbers increase from the physical bottom of the rack upward, 1 to 10. Two of the 40 nodes, f1n1 and f1n10, are allocated to interactive processing, and logins are enabled only on those two nodes. The internet host names of f1n1 and f1n10 are hydra and hydra2, respectively. The rest of the nodes are accessible only through LoadLeveler, the batch facility. You can view the list of nodes using the LoadLeveler command listnodes (llstatus -f %n will also work). Even more useful for tracking batch jobs is the listnodeusage command, which lists the nodes that specific jobs run on. A sample listing follows.

 > listnodeusage
Job ID            Owner        Class  Cpus  ST  Node(Tasks,CCpusPerTask)
------            -----   ----------  ----  --  ------------------------
f1n2.154280.0   y0m4156        mpi32    32   R  f1n3(16,1), f1n4(16,1)
f1n2.154281.0   y0m4156        mpi32    32   R  f1n8(16,1), f3n7(16,1)
f1n2.154724.0   c0s2008     smp_long     8   R  f3n9(8,1)
f1n2.154781.0   hhp0872        mpi32    32   R  f3n4(16,1), f4n7(16,1)
f1n2.155218.0   q0s1711        mpi32    32   R  f1n7(16,1), f2n8(16,1)
f1n2.155232.0      link        mpi64    58   R  f2n3(10,1), f2n6(16,1), f3n10(16,1), f4n5(16,1)
f1n2.155844.0      ryan    geo_group    32   R  f3n2(16,1), f3n5(16,1)
f1n2.156041.0   georget     cs_group     4   R  f2n2(4,1)
f1n2.156042.0   georget     cs_group     8   R  f4n6(8,1)
f1n2.156043.0   georget     cs_group     8   R  f4n2(8,1)
f1n9.153824.0   y0m4156        mpi32    32   R  f3n1(16,1), f4n9(16,1)
f1n9.153825.0   y0m4156        mpi32    32   R  f2n1(16,1), f2n10(16,1)
f1n9.153826.0   y0m4156        mpi32    32   R  f4n3(16,1), f4n4(16,1)
f1n9.154325.0   hhp0872        mpi32    32   R  f3n3(16,1), f3n6(16,1)
f1n9.154763.0   q0s1711        mpi32    32   R  f1n6(16,1), f2n7(16,1)
f1n9.155270.0    rivera        mpi32    32   R  f3n8(16,1), f4n8(16,1)
f1n9.155372.0   yubofan        mpi32    32   R  f2n4(16,1), f4n1(16,1)
f1n9.155586.0   georget     cs_group     4   R  f1n5(4,1)
f1n9.155589.0   georget     cs_group     4   R  f4n10(4,1)
f1n9.155590.0   georget     cs_group     8   R  f2n9(8,1)

Total                                  486      
>
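The node naming scheme and the last column of the listing lend themselves to simple scripting. The Python sketch below is a minimal illustration, not a Hydra-provided tool: it enumerates the 40 node names and totals the CPUs in use per node from listnodeusage-style text. The function names are hypothetical, and the parsing assumes the column layout shown in the sample above.

```python
import re

# Node names follow f<rack>n<slot>: racks 1-4, slots 1-10.
# Each entry in the last column looks like f3n9(8,1): node(tasks,cpus per task).
NODE_RE = re.compile(r"\b(f[1-4]n(?:10|[1-9]))\((\d+),(\d+)\)")


def all_node_names():
    """Return the 40 node names, f1n1 through f4n10."""
    return [f"f{rack}n{slot}" for rack in range(1, 5) for slot in range(1, 11)]


def cpus_per_node(listing: str) -> dict:
    """Sum tasks * cpus-per-task for every node mentioned in the listing."""
    usage = {}
    for name, tasks, cpus in NODE_RE.findall(listing):
        usage[name] = usage.get(name, 0) + int(tasks) * int(cpus)
    return usage


# Two rows copied from the sample listing above.
sample = """\
f1n2.154724.0   c0s2008     smp_long     8   R  f3n9(8,1)
f1n2.155232.0      link        mpi64    58   R  f2n3(10,1), f2n6(16,1), f3n10(16,1), f4n5(16,1)
"""

usage = cpus_per_node(sample)
print(usage)
print(sum(usage.values()))  # 8 + 58 = 66 CPUs in use across these two jobs
```

The job ID prefixes such as f1n2 are not counted as nodes in use because the pattern requires a parenthesized (tasks,cpus) pair immediately after the node name.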