Base the decision on the kinds of applications you plan to run on the cluster. Will those applications perform larger-than-32-bit floating-point calculations, depend on massive data sets, or require the fastest possible turnaround times? Will the applications and their data fit within a 32-bit addressable memory space? How constrained is the budget?
Answering these questions helps you determine whether nodes should be based on 64-bit Intel® Itanium® 2 processors, 32-bit Intel® Xeon® processors, or Intel Xeon processors MP. An Intel Xeon processor or Intel Xeon processor MP can be used for high-performance clusters when:
You need high-performance, 32-bit computing.
32 bits of addressable memory is adequate.
Price-performance is critical.
Applications contain mostly integer (versus floating-point) calculations.
The application code can benefit from Streaming SIMD Extensions (i.e., it contains small, repetitive loops that operate on sequential arrays of integers).
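The addressability criterion above can be checked with simple arithmetic. The following is a minimal sketch, not a sizing tool: the function names and the `reserved_bytes` parameter are hypothetical, and the amount of address space an operating system keeps for itself is platform-specific, so the usable space per process is typically smaller than the full 4 GiB.

```python
# Illustrative arithmetic only: compares a working-set size against the
# size of an address space. All names here are hypothetical.

def addressable_bytes(address_bits: int) -> int:
    """Number of distinct byte addresses for a given address width."""
    return 2 ** address_bits

def fits_in_address_space(working_set_bytes: int, address_bits: int = 32,
                          reserved_bytes: int = 0) -> bool:
    """True if the working set fits after subtracting reserved space.

    reserved_bytes is an assumption standing in for address space the
    operating system keeps for itself; the right value is platform-specific.
    """
    return working_set_bytes <= addressable_bytes(address_bits) - reserved_bytes

GIB = 2 ** 30
print(addressable_bytes(32) // GIB)        # 4
print(fits_in_address_space(3 * GIB))      # True
print(fits_in_address_space(6 * GIB))      # False
print(fits_in_address_space(6 * GIB, 64))  # True
```

A working set that fails this check at 32 bits is a strong hint that 64-bit addressability is needed.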
Intel Xeon processor-based platforms offer cost-effective, high-performance computing for HPC applications. With the Intel Xeon processor’s Intel NetBurst microarchitecture, applications can take advantage of a deep, 20-stage pipeline that increases throughput through instruction-level parallelism (ILP) by retiring up to three instructions per clock. For applications that depend on very large data sets or need shorter turnaround times, Intel Xeon processors MP in multiprocessor platforms provide an integrated three-level (iL3) on-chip cache for fast access to data. Most high-performance processors use only two levels of cache. With the Intel Xeon processor MP iL3 cache, more data can be stored closer to the processor's execution units, resulting in higher system throughput and shorter turnaround times.
An Intel Itanium 2 processor-based cluster is recommended when:
The problem requires enormous floating-point performance.
The problem is very large and relies on very large data sets.
You need massive amounts of addressable memory (64-bit addressability).
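The two sets of criteria can be sketched as a small decision helper. This is only an illustration of the reasoning in this section: the `Workload` fields, the function name, and the hard 4 GiB threshold are assumptions made for the example, and a real selection would also weigh budget and measured application performance.

```python
from dataclasses import dataclass

LIMIT_32_BIT_BYTES = 2 ** 32  # size of a 32-bit address space

@dataclass
class Workload:
    # Hypothetical description of an application; fields are assumptions.
    needs_enormous_floating_point: bool
    working_set_bytes: int

def suggest_node_processor(workload: Workload) -> str:
    """Map the criteria listed in this section to a processor family."""
    # Either Itanium 2 criterion is enough to point toward 64-bit nodes.
    if workload.needs_enormous_floating_point:
        return "Intel Itanium 2"
    if workload.working_set_bytes > LIMIT_32_BIT_BYTES:
        return "Intel Itanium 2"
    # Otherwise a 32-bit, price-performance-oriented node is adequate.
    return "Intel Xeon or Intel Xeon MP"

integer_job = Workload(needs_enormous_floating_point=False,
                       working_set_bytes=2 ** 30)
big_data_job = Workload(needs_enormous_floating_point=False,
                        working_set_bytes=2 ** 34)
print(suggest_node_processor(integer_job))   # Intel Xeon or Intel Xeon MP
print(suggest_node_processor(big_data_job))  # Intel Itanium 2
```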
The Intel Itanium 2 processor provides two separate floating-point execution units for fast processing of large, complex numeric calculations. Each floating-point unit is capable of executing two calculations per clock. The processor is built around a 6.4 GB/s system bus, and it offers on-die cache configurations of up to 3 MB that keep large sets of critical data close to the execution units for very fast access. The Intel Itanium 2 processor is based on the Explicitly Parallel Instruction Computing (EPIC) architecture, which increases throughput through instruction-level parallelism and allows up to eight instructions to be retired per clock. Sophisticated branch prediction and predication increase application efficiency and performance.
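The floating-point figures above imply a theoretical peak that is easy to work out. The sketch below multiplies the two floating-point units by the two calculations each can execute per clock; the clock frequency is not given in this section, so the 1.0 GHz value is only an example input, and sustained application performance will be lower than this peak.

```python
FP_UNITS = 2                 # floating-point execution units (from the text)
OPS_PER_UNIT_PER_CLOCK = 2   # calculations per unit per clock (from the text)

def peak_flops_per_clock() -> int:
    """Theoretical floating-point calculations completed each clock."""
    return FP_UNITS * OPS_PER_UNIT_PER_CLOCK

def peak_gflops(clock_ghz: float) -> float:
    """Theoretical peak in GFLOPS for a given clock frequency in GHz."""
    return peak_flops_per_clock() * clock_ghz

print(peak_flops_per_clock())  # 4
print(peak_gflops(1.0))        # 4.0 (1.0 GHz is an example input)
```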