Basic System Information
Hydra is a 52-node, 832-processor IBM Cluster 1600. The processors are IBM 1.9 GHz POWER5+ RISC processors, physically packaged and organized 16 to a node. A node, a so-called p5 575 node, is a symmetric multiprocessor (SMP) system with 16 POWER5+ processors and 32 gigabytes of shared memory. Of these 32 gigabytes, only 25 are available for user processing; keep that in mind when setting batch-job memory limits. The 52 nodes are housed in five physical frames, or racks: ten nodes per rack in frames 1-4 and twelve in frame 5. Of the 52 nodes, 48 are interconnected via the HPS, the IBM High-Performance communication Switch: all 40 nodes in frames 1-4 and 8 of the 12 nodes in frame 5. All 52 nodes are also interconnected with gigabit Ethernet. The cluster uses the HPS for parallel processing and other communication among the 48 nodes. Each of the 48 p5 575 nodes connects to the HPS network through two adapters, each of which attaches to one of the two available subnetworks via which the HPS routes message packets to any other of the 48 nodes.
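Because only 25 of each node's 32 GB are usable, per-task memory requests in a LoadLeveler job command file should be sized so that tasks-per-node times the per-task request stays under 25 GB. The following is a minimal sketch of such a job file; the class name mpi32 matches the classes seen in this guide's sample listing, but the memory and wall-clock figures are illustrative assumptions, not site-mandated values:

```
#!/bin/sh
# Hypothetical LoadLeveler job command file for one fully packed 16-task node.
# @ job_type         = parallel
# @ class            = mpi32
# @ node             = 1
# @ tasks_per_node   = 16
# @ resources        = ConsumableMemory(1500 mb)
# @ wall_clock_limit = 1:00:00
# @ queue
# 16 tasks x 1500 MB = ~23.4 GB, safely under the 25 GB available per node.
poe ./my_mpi_program
```

Submit with llsubmit as usual; if your job needs more memory per task, reduce tasks_per_node accordingly.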
Fig. 1: Power5+ Dual Core Chip. The Basic Building Block
Fig. 2: The p5 575 Node at a Glance
Fig. 3: p5 575 Internal Schematic
Fig. 4: Node Roles as Configured on HYDRA
Fig. 5: The DDN Disk Raid Array connections to Hydra
Advanced Hardware and Software Architecture
A much more detailed description of a number of advanced hardware and software architecture issues for the entire Hydra cluster appears in the Advanced Cluster Architecture section of the user guide.
Login Nodes: hydra1.tamu.edu and hydra2.tamu.edu
The staff has configured node names to reflect their physical location in the five racks. Each of the first four racks holds ten nodes; a fifth rack, added in February 2009, holds twelve. A node name is a four- or five-character string of the form f[1-5]n[1-10/12]. For example, f3n9 refers to the 9th node in rack 3, f5n12 is node 12 in rack 5, and so on. Node numbers increase from the physical bottom up, 1-10/12. Two of the 48 HPS-interconnected nodes, f1n9 and f1n10, are allocated to interactive processing, and logins are enabled only on those two nodes. The Internet host names of f1n9 and f1n10 are hydra1 and hydra2, respectively. The rest of the nodes are accessible only through LoadLeveler, the batch facility. You can view the list of nodes with the LoadLeveler command listnodes (llstatus -f %n will also work). Even more useful for tracking batch jobs is the listnodeusage command, which lists the nodes that specific jobs run on. A sample listing follows.
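As an aside, the f[1-5]n[1-10/12] naming scheme is regular enough to enumerate with a short shell loop. This sketch simply prints the 52 node names implied by the scheme (it runs anywhere; it does not query the cluster):

```shell
# Enumerate Hydra's node names: racks 1-4 hold 10 nodes each, rack 5 holds 12.
nodes=$(
  for f in 1 2 3 4 5; do
    max=10
    [ "$f" -eq 5 ] && max=12
    n=1
    while [ "$n" -le "$max" ]; do
      echo "f${f}n${n}"
      n=$((n + 1))
    done
  done
)
echo "$nodes" | wc -l   # should print 52
```

Remember that of these, only f1n9 (hydra1) and f1n10 (hydra2) accept logins.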
hydra# listnodeusage
Job ID         Owner    Class      Cpus ST Node(Tasks,CpusPerTask)
------         -----    ---------- ---- -- ------------------------
f1n2.154280.0  y0m4156  mpi32       32  R  f1n3(16,1), f1n4(16,1)
f1n2.154281.0  y0m4156  mpi32       32  R  f1n8(16,1), f3n7(16,1)
f1n2.154724.0  c0s2008  smp_long     8  R  f3n9(8,1)
f1n2.154781.0  hhp0872  mpi32       32  R  f3n4(16,1), f4n7(16,1)
f1n2.155218.0  q0s1711  mpi32       32  R  f1n7(16,1), f2n8(16,1)
f1n2.155232.0  link     mpi64       58  R  f2n3(10,1), f2n6(16,1), f3n10(16,1), f4n5(16,1)
f1n2.155844.0  ryan     geo_group   32  R  f3n2(16,1), f3n5(16,1)
f1n2.156041.0  georget  cs_group     4  R  f2n2(4,1)
f1n2.156042.0  georget  cs_group     8  R  f4n6(8,1)
f1n2.156043.0  georget  cs_group     8  R  f4n2(8,1)
f1n9.153824.0  y0m4156  mpi32       32  R  f3n1(16,1), f4n9(16,1)
f1n9.153825.0  y0m4156  mpi32       32  R  f2n1(16,1), f2n10(16,1)
f1n9.153826.0  y0m4156  mpi32       32  R  f4n3(16,1), f4n4(16,1)
f1n9.154325.0  hhp0872  mpi32       32  R  f3n3(16,1), f3n6(16,1)
f1n9.154763.0  q0s1711  mpi32       32  R  f1n6(16,1), f2n7(16,1)
f1n9.155270.0  rivera   mpi32       32  R  f3n8(16,1), f4n8(16,1)
f1n9.155372.0  yubofan  mpi32       32  R  f2n4(16,1), f4n1(16,1)
f1n9.155586.0  georget  cs_group     4  R  f1n5(4,1)
f1n9.155589.0  georget  cs_group     4  R  f4n10(4,1)
f1n9.155590.0  georget  cs_group     8  R  f2n9(8,1)
Total                               486
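If you script around listnodeusage, the Cpus column (field 4 of each job line) can be totalled with awk. Since listnodeusage itself is only available on the cluster, this sketch substitutes a few lines copied from the listing above as sample input:

```shell
# Sum the Cpus column (4th field) of listnodeusage-style job lines.
sample='f1n2.154280.0 y0m4156 mpi32    32 R f1n3(16,1), f1n4(16,1)
f1n2.154724.0 c0s2008 smp_long  8 R f3n9(8,1)
f1n2.155232.0 link    mpi64    58 R f2n3(10,1), f2n6(16,1), f3n10(16,1), f4n5(16,1)'

total=$(printf '%s\n' "$sample" | awk '{sum += $4} END {print sum}')
echo "Total CPUs in use: $total"   # 32 + 8 + 58 = 98 for this sample
```

On the cluster itself you would pipe the real command instead, skipping the two header lines and the trailing total, e.g. listnodeusage | awk 'NR > 2 && $1 != "Total" {sum += $4} END {print sum}'.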