Architecture

Last update on Friday, 28-Jul-2006 16:20:28 CDT.

Introduction to Altix 3700 Architecture

cosmos.tamu.edu is an SGI Altix 3700 supercomputer with 128 Itanium-2® processors and 256 Gigabytes of main memory. It is based on a Distributed Shared Memory (DSM) architecture, in which memory is physically distributed among 32 Computation-bricks ("C-bricks"). Each C-brick contains two pairs of 1.3 GHz Itanium-2 64-bit μ-processors, locally attached memory, cache-coherence logic and interconnection fabric. The Altix 3000 class of computers provides a global, cache-coherent, Non-Uniform Memory Access (cc-NUMA) shared memory.
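As a quick consistency check of these figures, the short Python sketch below derives the system totals from the per-brick numbers (the 4 GB of memory per node comes from Table 1) and maps a CPU number to its C-brick and node. The consecutive CPU numbering it uses is a hypothetical assumption for illustration; the numbering chosen by the operating system may differ.

```python
# Consistency check of the cosmos hierarchy (figures from the text and Table 1).
# ASSUMPTION (hypothetical): CPUs are numbered consecutively, two per node and
# two nodes per C-brick; the operating system's real numbering may differ.
C_BRICKS = 32
NODES_PER_C_BRICK = 2
CPUS_PER_NODE = 2
MEMORY_PER_NODE_GB = 4


def system_totals():
    """Return (nodes, cpus, memory_gb) implied by the per-brick figures."""
    nodes = C_BRICKS * NODES_PER_C_BRICK
    cpus = nodes * CPUS_PER_NODE
    memory_gb = nodes * MEMORY_PER_NODE_GB
    return nodes, cpus, memory_gb


def locate_cpu(cpu):
    """Map a CPU number to (c_brick, node_in_brick) under the assumed numbering."""
    if not 0 <= cpu < C_BRICKS * NODES_PER_C_BRICK * CPUS_PER_NODE:
        raise ValueError(f"cpu {cpu} out of range")
    node = cpu // CPUS_PER_NODE          # global node index, 0..63
    return node // NODES_PER_C_BRICK, node % NODES_PER_C_BRICK


if __name__ == "__main__":
    print(system_totals())  # (64, 128, 256)
    print(locate_cpu(5))    # (1, 0)
```

The totals agree with the 128 processors and 256 GB quoted above.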

Each processor can access any memory location through a high-bandwidth, low-latency system interconnect based on SGI's NUMAflex architecture. The interconnect is organized in a dual-plane, fat-tree layout and scales as the system grows. Memory access latency therefore varies with the number of router hops (the distance) that data items have to travel.
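A deliberately simplified latency model makes the hop dependence concrete. It charges the 50 ns per router and enforces the 5-hop maximum listed in Table 1, assuming one router per hop, and treats the local-memory latency `local_ns` as a hypothetical input because this page does not give one. Wire delay, contention and the reply path are ignored.

```python
# Simplified model: remote-access latency grows linearly with router hops.
# ROUTER_DELAY_NS and MAX_HOPS come from Table 1; local_ns is a hypothetical
# input (no local-memory latency is given on this page). Assumes one router
# per hop and ignores wire delay, contention and the reply path.
ROUTER_DELAY_NS = 50
MAX_HOPS = 5


def access_latency_ns(local_ns, hops):
    """Estimated latency = local latency + per-router delay for each hop."""
    if not 0 <= hops <= MAX_HOPS:
        raise ValueError(f"hops must be in 0..{MAX_HOPS}")
    return local_ns + hops * ROUTER_DELAY_NS


if __name__ == "__main__":
    # Router contribution alone (local_ns = 0): 0, 50, 100, 150, 200, 250 ns
    for h in range(MAX_HOPS + 1):
        print(h, access_latency_ns(0, h))
```

Under this model the routers alone add at most 5 × 50 = 250 ns to a farthest-distance access.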

cosmos is directly attached to a 10 Terabyte TP9500 RAID system via four 2-Gigabit/sec Fibre Channel links.
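To put the storage connection in perspective, the following back-of-the-envelope sketch estimates the aggregate peak payload bandwidth of the four links. It assumes the nominal 2 Gb/s line rate with 8b/10b encoding (8 data bits per 10 line bits), one direction only and no protocol overhead, which works out to roughly 200 MB/s per link. Sustained RAID throughput will be lower.

```python
# Back-of-the-envelope peak bandwidth between cosmos and the TP9500 RAID.
# ASSUMPTIONS: nominal 2 Gb/s line rate per Fibre Channel link, 8b/10b
# encoding (8 data bits per 10 line bits), one direction, no protocol overhead.
LINKS = 4
LINE_RATE_BITS_PER_S = 2e9
ENCODING_EFFICIENCY = 8 / 10


def peak_payload_mb_per_s(links=LINKS):
    """Aggregate payload bandwidth in MB/s (10**6 bytes per second)."""
    bytes_per_s = links * LINE_RATE_BITS_PER_S * ENCODING_EFFICIENCY / 8
    return bytes_per_s / 1e6


if __name__ == "__main__":
    print(peak_payload_mb_per_s(1))  # about 200 MB/s for one link
    print(peak_payload_mb_per_s())   # about 800 MB/s for all four links
```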

The operating system of cosmos is the IA-64 version of Linux, with several SMP and NUMA-aware enhancements provided by SGI and the open-source community. The operating system runs as a Single System Image (SSI) on all 128 processing elements (PEs) of cosmos. We are currently using SGI ProPack 4, which is based on the 2.6 Linux kernel.
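The NUMA layout that the SSI kernel manages is visible to user programs. The sketch below lists each NUMA node and its CPUs by reading `/sys/devices/system/node/node<N>/cpulist`, the sysfs interface on current Linux kernels. Whether the older 2.6 kernel in ProPack 4 exposes exactly these files is an assumption to verify on the machine itself; the function returns an empty result if they are absent.

```python
# List NUMA nodes and their CPUs as a NUMA-aware Linux kernel exposes them.
# ASSUMPTION: the kernel provides /sys/devices/system/node/node<N>/cpulist;
# older 2.6 kernels (such as the one under ProPack 4) may expose different files.
import os
import re

NODE_ROOT = "/sys/devices/system/node"


def parse_cpulist(text):
    """Expand a kernel cpulist string such as '0-3,8' into a sorted list of ints."""
    cpus = []
    for part in text.strip().split(","):
        if not part:                 # memory-only nodes have an empty cpulist
            continue
        if "-" in part:
            lo, hi = part.split("-")
            cpus.extend(range(int(lo), int(hi) + 1))
        else:
            cpus.append(int(part))
    return sorted(cpus)


def numa_topology(root=NODE_ROOT):
    """Return {node_id: [cpu, ...]} read from sysfs, or {} if the files are absent."""
    topology = {}
    if not os.path.isdir(root):
        return topology
    for name in os.listdir(root):
        match = re.fullmatch(r"node(\d+)", name)
        path = os.path.join(root, name, "cpulist")
        if match and os.path.exists(path):
            with open(path) as f:
                topology[int(match.group(1))] = parse_cpulist(f.read())
    return topology


if __name__ == "__main__":
    for node, cpus in sorted(numa_topology().items()):
        print(f"node {node}: cpus {cpus}")
```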

The information on this page has been compiled from SGI manuals and papers published in the literature. See also the architecture part of the cosmos user guide.

Table 1. Altix 3700 Configuration -- Summary

Number of processors: 128
Physical memory size: 256 GBytes DDR SDRAM with ECC
Memory architecture: Cache-Coherent, Non-Uniform Memory Access (cc-NUMA)
Operating system:
  • ProPack 4, based on IA-64 Linux 2.6
  • Single System Image (SSI) spanning all 128 processors
Processor type, ISA: 1.3 GHz Intel Itanium-2® μ-processors (Madison), IA-64, little-endian
Cache memories (on processor die):
    Level  Total size      Block size  Associativity  Write policy   Load/store latency (clocks)
    L1     16KB instr      64 bytes    4-way          N/A            1/-
    L1     16KB data       64 bytes    4-way          write-through  1/3
    L2     256KB unified   128 bytes   8-way          write-back     5/7
    L3     3MB unified     128 bytes   12-way         write-back     14/7
Processors per node (on one FSB): 2 [Fig. 1]
Local memory per node: 4 GBytes [Fig. 1]
Nodes and SHUBs per C-brick: 2 (NODE 0, NODE 1) [Fig. 2]
Number of C-bricks: 32
Cache coherence protocol within a node: Itanium-2 snoopy bus (FSB)
Interconnection between nodes for global shared memory: SGI® NUMAlink™-3
  • Dual-plane, fat-tree topology [Fig. 3]
  • Router-brick (R-brick) with 8×8 crossbar switch [Fig. 4]
  • 3.2 GB/sec bidirectional bandwidth per link
  • 400 MB/sec bandwidth available to each processor
  • Path length O(log N), where N = number of processors; 5 routing hops maximum; 50 nanosecs per router
Global cache coherence protocol: SGI® NUMAflex™ directory-based, write-invalidation (cache controllers in SHUBs)
Memory-only modules: Not on cosmos
Number of IX-bricks: 2 (1 for each 64 processors)
Base PCI/PCI-X slots per IX-brick:
  • 5 PCI-X buses (64-bit, 133 MHz), 10 slots available
  • 1 64-bit, 66 MHz PCI slot
Networking: Six 1-Gigabit Ethernet ports (2 fiber, 4 copper)
System disks: Four 36GB disks (1 for each IX-brick)
Disk expansion unit: TP9500 10 Terabyte RAID disk
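The cache parameters in Table 1 determine each cache's geometry through the standard relation sets = total size / (block size × associativity). The sketch below applies it and reports how many low-order address bits select the byte within a block and the set. It assumes a conventional offset/index/tag address split for illustration; the bit counts are derived here, not taken from Intel documentation.

```python
# Derive set-associative cache geometry from the Table 1 parameters.
# sets = capacity / (block_size * ways); ASSUMES a conventional
# (tag | index | offset) address split, for illustration only.
KB, MB = 1024, 1024 * 1024

CACHES = {
    # name: (capacity_bytes, block_bytes, ways)
    "L1I": (16 * KB, 64, 4),
    "L1D": (16 * KB, 64, 4),
    "L2": (256 * KB, 128, 8),
    "L3": (3 * MB, 128, 12),
}


def geometry(capacity, block, ways):
    """Return (sets, offset_bits, index_bits); block and sets must be powers of two."""
    sets = capacity // (block * ways)
    offset_bits = block.bit_length() - 1   # log2(block) for a power of two
    index_bits = sets.bit_length() - 1     # log2(sets) for a power of two
    return sets, offset_bits, index_bits


if __name__ == "__main__":
    for name, params in CACHES.items():
        sets, off, idx = geometry(*params)
        print(f"{name}: {sets} sets, {off} offset bits, {idx} index bits")
```

For example, the 3MB, 12-way L3 with 128-byte blocks has 3 × 2^20 / 1536 = 2048 sets, so 11 index bits sit above 7 offset bits.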


Fig. 1. An Altix node with 2 Itanium-2 μ-processors on the Front-Side Bus (FSB), local DDR SDRAM, a SHUB chip, and NUMAlink-3 and XIO channels.

Fig. 2. An Altix C-brick with 2 nodes, 2 NUMAlink-3 channels and 2 XIO channels.

Fig. 3. A 128-PE Altix with a dual-plane, fat-tree interconnection topology.

Fig. 4. Internals of a Router-brick (R-brick) with an 8×8 crossbar switch and 8 bidirectional NUMAlink channels.
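The directory-based, write-invalidation protocol listed in Table 1 can be illustrated with a toy model: a directory entry records which nodes hold a copy of a memory line, a read adds the requesting node as a sharer, and a write invalidates every other sharer before granting exclusive ownership. The three states, the node-granularity sharer sets and the `Directory` class are hypothetical simplifications; the real protocol implemented in the SHUBs is more involved and also interacts with the snoopy bus inside each node.

```python
# Toy model of directory-based, write-invalidate coherence between nodes.
# ASSUMPTIONS (hypothetical, for illustration only): sharers are tracked per
# node, a line is UNOWNED, SHARED or EXCLUSIVE, write-backs and the intra-node
# snoopy bus are not modeled.
from dataclasses import dataclass, field


@dataclass
class DirectoryEntry:
    state: str = "UNOWNED"                      # UNOWNED, SHARED or EXCLUSIVE
    sharers: set = field(default_factory=set)   # ids of nodes holding a copy


class Directory:
    def __init__(self):
        self.entries = {}                       # line address -> DirectoryEntry

    def _entry(self, line):
        return self.entries.setdefault(line, DirectoryEntry())

    def read(self, node, line):
        """Grant a shared copy; an exclusive owner is downgraded to a sharer."""
        entry = self._entry(line)
        entry.sharers.add(node)
        entry.state = "SHARED"
        return []                               # a read invalidates nobody

    def write(self, node, line):
        """Grant exclusive ownership; return the nodes whose copies are invalidated."""
        entry = self._entry(line)
        invalidated = sorted(entry.sharers - {node})
        entry.sharers = {node}
        entry.state = "EXCLUSIVE"
        return invalidated


if __name__ == "__main__":
    d = Directory()
    d.read(0, 0x100)
    d.read(3, 0x100)
    print(d.write(5, 0x100))  # [0, 3]
```

For example, after nodes 0 and 3 read line 0x100, a write from node 5 returns [0, 3], the nodes whose copies must be invalidated before node 5 may modify the line.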
