Texas A&M Supercomputing Facility, Texas A&M University

Emergency Maintenance for Hydra -- UPDATE

Dear Hydra users: this is an update on the efforts to recover the Hydra cluster.

Node hydra2.tamu.edu, which suffered hardware failures in its I/O subsystem, has been improperly holding on to resources on the DDN9550 disk storage. This prevents other I/O servers from taking over access to these resources, whether manually or automatically, and from restoring proper connectivity to the DDN9550 disk system. The /work and /scratch GPFS file systems are affected by the I/O outage. /home and /usr/local are accessible.

To manually recover access to the disks on the DDN9550, we will have to reboot the other login node, hydra1.tamu.edu (and likely another I/O node, f1n2), at any time.

This means that your login sessions on hydra1.tamu.edu can be terminated at any time. We will keep hydra1.tamu.edu accessible for at least another hour so that you can save your work or copy any files you may need out of your /home directory.

Once hydra1.tamu.edu becomes unavailable, it will remain so for an undetermined length of time. You should consider the entire Hydra cluster unavailable for any processing. We will make login node(s) available so that users can retrieve files from their /home directories.

We will make further announcements as things progress.

Posted on: 11:45 AM, August 8, 2012