Texas A&M Supercomputing Facility, Texas A&M University

Emergency Maintenance for Hydra node f1n10 (hydra2)

Hydra node f1n10 (hydra2.tamu.edu) suffered one or more hardware malfunctions in its "Power and Cooling" subsystem this past Sunday, 08/05/2012. The node is currently operating with reduced power and cooling capacity, and any further failure will take it offline completely. Unfortunately, node f1n10 is very important for GPFS operations across the entire Hydra cluster. Together with f1n9 (hydra1.tamu.edu), it also serves as an interactive log-on node.

We will attempt a hardware repair on the node as soon as possible, so expect it to become unavailable at any time. We advise you to save your work, log out of hydra2, and start using log-on node hydra1.tamu.edu as soon as possible.

The success of the repair depends on the availability of the specific component that has failed. It is possible that we will not be able to restore the power and cooling redundancy, in which case the node will eventually fail and become completely unusable. Also note that the repair requires powering the node off. Unfortunately, on hardware of this age, powering a node off and on may cause other aging components to fail and thus make the node unusable.

We will post further announcements as the situation develops. In the meantime, save your work, log out of hydra2.tamu.edu, and start using the other interactive log-on node, hydra1.tamu.edu.

Posted on: 12:46 PM, August 6, 2012