New quad core server CPUs: AMD Barcelona vs. Intel Harpertown

Which quad core server CPU wins the day?

Author: Koen Crijns

Introduction

Last week AMD finally, after months of delays, introduced its quad core Opteron processors, better known by their codename Barcelona. However, AMD is not the only one with a new product: on the first day of the 2007 Fall IDF, Intel will unveil its second generation quad core Xeon server processors, which so far have been referred to as Harpertown. Both manufacturers promise better performance than ever before, and both especially emphasise the improvements in performance per watt. For the past few weeks we have been testing a Dell pre-production Barcelona server and a Supermicro pre-production Harpertown server, in order to find out which of the competing CPU manufacturers has the best offering in the very popular dual socket server market.

The current situation

Before we go into the new processors' specifics, let us first have a look at the current market for this type of product. Last year Intel was the first to switch from dual to quad core CPUs in the server market, with the Xeon 5300 series, codenamed Clovertown. Here is a review of this processor. Just like Intel's current quad core CPUs for the desktop, Clovertown consists of two dual core processors, packaged together. These CPUs are manufactured using Intel's by now proven 65 nanometre process and are available with clock frequencies up to 3.0 GHz. Clovertown CPUs come with a total of 8 MB L2 cache, with each dual core sharing 4 MB. This processor is based on the Core micro-architecture, which is proving to be very successful for Intel in just about every part of the market, ranging from notebooks to high-end servers. The so-called Bensley platform combines the Clovertown processors with Intel's 5000X chipset, which offers two separate 1333 MHz front side buses for the two CPUs in a dual processor server, and supports quad channel FB-DIMM memory running at 667 MHz. Intel chose FB-DIMM technology to guarantee high speed even when a very large number of modules is used; this enables Intel to provide Clovertown servers with up to 64 GB of memory.

In most scenarios Intel's Clovertowns, with their extra cores and speedy Core architecture, outmatch AMD's current offering, the second generation Opterons on Socket F. Even so, there are plenty of applications where AMD comes out the winner. The secret weapon of the Opteron is still its integrated memory controller, which brings much higher bandwidth and lower latencies than Intel manages to obtain. Any workload that demands fast memory access, such as database servers, is where AMD can really shine. Existing dual core Opteron CPUs, codenamed Santa Rosa (not to be confused with Intel's identically named notebook platform), are by now available at clock speeds up to 3.2 GHz. As Intel is by no means resting on its laurels, it is high time for a successor...

AMD Barcelona

AMD released the specifications of its new quad core processor, codenamed Barcelona, quite a while ago. Now that the product itself has been released, there is little to add to the story, except all the fancy names AMD has given to the new technologies in the CPU. A short recap for those hazy on the details: Barcelona is a quad core processor, which means AMD has placed four cores on one chip. Within the new Opteron processors, each of these cores has 512 kB of L2 cache. In addition to that, there is 2 MB of L3 cache. The CPU is manufactured on a 65 nm process in AMD's plants in Dresden. It has about 463 million transistors.

amdbarcelonadie
The four cores in the Barcelona CPU are easily identifiable.

Aside from a lot of architectural improvements to heighten performance, Barcelona has a lot of tricks up its sleeve to lower power usage. For example, the clock frequency of all four cores can be independently varied (Independent Dynamic Core Technology), the cores and integrated memory controller work where possible at separate voltages that are kept as low as possible (Dual Dynamic Power Management) and inactive parts of the CPU are shut down when they are not needed (CoolCore technology).

Promises

At the introduction in Barcelona, Spain (where else?), AMD made no promises that the new product would be the fastest of its kind. As it turns out, concessions had to be made on clock speeds: while rumours spoke of 2.5 GHz for the new Opterons, the first CPUs to market are no faster than 2.0 GHz. A faster 2.5 GHz model is planned for release before the year is over, though. AMD does, however, intend to deliver the best performance per watt, not just by constraining the energy use of the processor, but also by opting for ordinary DDR2 memory rather than the power guzzling FB-DIMM technology.

Another focal point for AMD is 'investment protection': the new quad core Opteron models should work seamlessly in all existing Socket F servers, with only a BIOS upgrade as a condition for a smooth transition. AMD sees a large market for server upgrades, i.e. companies outfitting their existing dual core Opteron servers with new quad core CPUs. We approached several server manufacturers to ask their opinion on this strategy, but it would appear no one really buys into the idea of end users purchasing separate CPUs. The platform not changing with the new processors does have another positive consequence: it will be relatively easy for server manufacturers to bring a quad core Opteron offering to market, as they can use their existing product lines as a basis.

An important part of the idea of 'investment protection' is that the new Opterons will not only use the same platform, but also have the same demands when it comes to maximum power usage (TDP). With the current dual core Opterons available in varieties that need a maximum of 68, 95 or 120 Watt, exactly the same demands go for the new quad cores. As AMD puts it, you get two more cores without a rise in energy use or temperature.

Because a PC's energy consumption is becoming more and more important, AMD from now on will not speak of TDP values, as these are only of interest to server designers. The new magic abbreviation is ACP, meaning Average Compute Power, and as the name indicates this should tell you how high the average power consumption is when the CPUs are fully stressed. For the 120 W TDP CPU, AMD figures on an ACP of 105 W. Similarly, 95 W TDP will be about 75 W ACP and 68 W TDP comes to 55 W ACP.

TDP | ACP
68 W | 55 W
95 W | 75 W
120 W | 105 W

Opteron models

As we mentioned, the first quad core Opterons are available at a maximum clock frequency of 2.0 GHz. Before the year is over we should see models running at 2.5 GHz. The new Opterons with clock frequencies of 1.7, 1.8 and 1.9 GHz are available in an HE (high efficiency) variant, meaning these CPUs have an ACP of 55 W (68 W TDP). As a first for AMD, the company is introducing models not just for 2-way servers but also for 4- and 8-way servers at the same time. Intel had nearly a year between the release of its Xeon DP (Clovertown) and MP (Tigerton) models. The table below lists all the quad core Opterons AMD has introduced.

Model number | Clock frequency | Clock frequency mem. contr. | ACP | L2 cache | L3 cache | Max. CPUs per server
8350 | 2.0 GHz | 1.8 GHz | 75 W | 4x 512 kB | 2 MB | 8
8347 | 1.9 GHz | 1.6 GHz | 75 W | 4x 512 kB | 2 MB | 8
8347 HE | 1.9 GHz | 1.6 GHz | 55 W | 4x 512 kB | 2 MB | 8
8346 HE | 1.8 GHz | 1.6 GHz | 55 W | 4x 512 kB | 2 MB | 8
2350 | 2.0 GHz | 1.8 GHz | 75 W | 4x 512 kB | 2 MB | 2
2347 | 1.9 GHz | 1.6 GHz | 75 W | 4x 512 kB | 2 MB | 2
2347 HE | 1.9 GHz | 1.6 GHz | 55 W | 4x 512 kB | 2 MB | 2
2346 HE | 1.8 GHz | 1.6 GHz | 55 W | 4x 512 kB | 2 MB | 2
2344 HE | 1.7 GHz | 1.4 GHz | 55 W | 4x 512 kB | 2 MB | 2

Partners

During the introduction event AMD was able to show configurations from several of its partners. Among others, Sun, Dell, HP, Acer and IBM showed working systems based on Barcelona, while Supermicro, Tyan and Uniwide demonstrated suitable barebones. In other words, AMD has a nice and complete list of partners; pretty much no big server manufacturer is missing from it. The time until the new servers are actually available varies from 'a few weeks' to 'a few months'. This is not surprising: even if a new system does not require much more than a change of CPU, this market segment demands long and careful validation testing. Even so, before too long you should be able to order a Barcelona server from one of the parties mentioned.

Dell PowerEdge 2970 'Barcelona'

For this test Dell sent us a pre-production Barcelona server, based on their existing PowerEdge 2970. What it comes down to is Dell has done pretty much what we described on a previous page: switched out the dual core CPUs in an existing Opteron server for Barcelona models. As such it may well be this server will not be available exactly as we describe it, but at least it will give a good indication of what performance we can expect of the new quad core Opterons.

Our test server is kitted out with two AMD Opteron 2350 CPUs, running at 2.0 GHz. Dell supplied the machine with 4 GB of registered DDR2-667 memory, but to make a better performance comparison with the other servers in this test, we added extra memory to raise the total to 8 GB.

The server is built on a Broadcom / ServerWorks HT-2100 / HT-1000 chipset. The mainboard offers space for a maximum of eight memory modules and three PCI Express x8 cards. Besides that, the board has two Gigabit LAN connections, controlled through PCI Express by Broadcom BCM5708 chips. Dell delivered the server with a PERC 5/i SAS RAID controller based on an LSI chip, with 256 MB of memory as a buffer. Connected to this were four Fujitsu 36 GB 10k rpm 2.5" SAS disks, which we configured in a RAID 10 array for our test.

There is little to criticise about Dell's server layout. The 2U chassis has eight removable SAS hard drive bays at the front. Directly behind those Dell has placed four powerful, hot swappable fans, ensuring sufficient airflow over processors and memory. To the side of the mainboard we find space for two power supplies. The hard disk controller has found a spot at the front of the server.

We can't say much about pricing for this server, as currently the PowerEdge 2970 cannot be ordered yet with AMD's new quad core CPUs.

amdopteron2350_550
The Opteron 2350 processor as we found it in Dell's server.

Intel Harpertown

Like in the world of desktop CPUs, AMD and Intel are playing a cat and mouse game with server products as well. Just when one of the two has an important product introduction, the other will do anything to outdo the competitor. That is the way it went this time too; a week after the official introduction of AMD's Barcelona, Intel today announces its second generation quad core server processors. Mind you, these new CPUs codenamed Harpertown will not be available immediately, but they should come to market before the end of the year.

Harpertown is the first CPU from Intel's new Penryn product family to be manufactured on the brand new 45 nanometre production process. A while ago we discussed the most important aspects of these Penryn CPUs in a detailed article. The smaller transistors give Intel a few advantages. First of all, the same surface area should offer space for about double the number of transistors compared to the 65 nanometre process. More importantly, the new 45 nm transistors need about 30% less power, while they can switch about 20% faster. It should be clear that the new production process enables Intel to make its processors faster, more energy efficient and at the same time more complex.
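As a rough sanity check on the "about double" figure, the sketch below assumes ideal scaling, where transistor density grows with the square of the ratio between the two process node names. Real processes do not scale this cleanly, and the function name is made up for this illustration.

```python
def ideal_density_gain(old_nm: float, new_nm: float) -> float:
    """Density gain if every feature shrank linearly with the node name (an idealisation)."""
    return (old_nm / new_nm) ** 2


gain = ideal_density_gain(65, 45)
print(f"Ideal density gain from 65 nm to 45 nm: {gain:.2f}x")
```

Under this assumption the gain comes out slightly above two, which is in line with the doubling mentioned above.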

Although Harpertown and its little dual core brother Wolfdale are largely identical to the existing Clovertown and Woodcrest CPUs when it comes to architecture, Intel has made a number of adjustments, as described in the previously mentioned article. A short recap: firstly, the amount of L2 cache has been raised to 6 MB in the dual core and 12 MB in the quad core processors. Secondly, new instructions named SSE 4.1 were added. In April we wrote that SSE4 can nearly double video compression speeds when programmers adjust their code to make use of these instructions. Other improvements are a fast execution unit for Radix-16 divisions, a new super shuffle engine and improved ways to save energy, bundled under the name Enhanced Intel Dynamic Acceleration Technology. We discussed the technical background to these technologies in the previously mentioned article. Besides this, a number of the new processors have a faster frontside bus: 1600 MHz instead of the 1333 MHz that was usual until now.

One thing has not changed: in the new generation as well, quad core processors consist of two dual core CPUs packaged as one. A Harpertown CPU therefore is actually two Wolfdale chips.


A die shot of the dual core 45 nm Wolfdale. A Harpertown CPU is made up of two of these.

Stoakley platform

The new Xeon processors have exactly the same LGA771 socket as their predecessors. The power consumption has decreased rather than increased, even with higher clock frequencies. Theoretically the Harpertown chips should work without problems in systems now equipped with Clovertown quad core Xeons. In some cases that should be possible with the right BIOS update, but this is not a direction Intel prefers to take. While AMD makes a big story out of the unchanging platform, Intel is proud to announce it has developed a new platform for Harpertown. This platform, codenamed Stoakley, consists of on the one hand a new Harpertown CPU and on the other a new chipset with the codename Seaburg.

This Seaburg chipset has quite a list of improvements compared to Intel's current 5000X 'Blackford' chipset. The frontside bus speed has been increased from 1333 MHz to 1600 MHz. As both CPUs in a dual CPU server have their own FSB between themselves and the chipset, Seaburg offers a bandwidth in excess of 25 GB/s. The integrated memory controller has been improved as well; while the existing chipset supports FB-DIMM DDR2 memory at a maximum of 667 MHz, Seaburg supports speeds of up to 800 MHz. Because this is a quad channel memory controller, the maximum bandwidth between chipset and memory has also been raised to 25 GB/s. Besides that, a variety of internal changes should reduce latencies. Intel has raised the maximum amount of supported memory to 128 GB, from 64 GB on the old platform. Also new is the improved snoop filter, which monitors whether changes to data in one CPU's cache are propagated consistently to all caches.
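The figures of roughly 21 and 25 GB/s follow from simple arithmetic. The sketch below assumes that each front side bus and each FB-DIMM channel moves 8 bytes (64 bits) per transfer, and it treats the quoted MHz ratings as millions of transfers per second. Both are our assumptions rather than something stated in Intel's material, and the function name is made up.

```python
BYTES_PER_TRANSFER = 8  # assumed 64-bit data path per bus or memory channel


def peak_bandwidth_gb_s(mega_transfers_per_s: float, links: int) -> float:
    """Theoretical peak in GB/s (1 GB = 10**9 bytes) across `links` parallel links."""
    return mega_transfers_per_s * 1e6 * BYTES_PER_TRANSFER * links / 1e9


for name, fsb, mem in (("Blackford", 1333, 667), ("Seaburg", 1600, 800)):
    cpu_side = peak_bandwidth_gb_s(fsb, links=2)  # two independent front side buses
    mem_side = peak_bandwidth_gb_s(mem, links=4)  # quad channel FB-DIMM
    print(f"{name}: CPU side {cpu_side:.1f} GB/s, memory side {mem_side:.1f} GB/s")
```

These are theoretical peaks; the bandwidth actually measured in a running system will be lower.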

On the I/O front Seaburg has vast improvements as well. As an example the number of PCI Express lanes has been increased significantly and PCI Express 2.0 is now supported. Intel has also overhauled its I/O Acceleration Technology (IOAT): IOAT2 offers support for among other things 10 gigabit networks. Despite these improvements, Seaburg's power needs should be similar to its predecessor Blackford, according to Intel.

In the table below you can see the most important differences between the current and new chipset.

Chipset | Blackford (Intel 5000X) | Seaburg
Frontside bus | 2x 1333 MHz | 2x 1600 MHz
Bandwidth chipset <-> CPUs | about 21 GB/s | about 25 GB/s
Memory controller | Quad channel FB-DIMM | Quad channel FB-DIMM
Memory clock frequency | 667 MHz | 800 MHz
Bandwidth chipset <-> memory | about 21 GB/s | about 25 GB/s
Max. memory capacity | 64 GB | 128 GB

Xeon models

As opposed to AMD's Barcelona models, the new quad core Xeons are not immediately available. It will also be some time before the Stoakley platform is available to Intel's partners. In total there will be eleven new Xeon models based on the new technology. Most of these use a 1333 MHz FSB and should be compatible with both the new and the existing 5000-series chipsets. The clock frequencies of the series range from 2.0 GHz to 3.17 GHz. Most of the new Xeons have a maximum energy consumption (TDP) of 80 Watt. Only the top model, the Xeon X5460, which is really intended only for workstations, has a TDP of 120 Watt. That aside, there will be two low voltage models, meant to be used in blades where a lot of processors have to operate in a relatively small space. These energy efficient L5410 and L5430 models have a TDP of 50 Watt. The table below lists the entire range.

Model number | Clock frequency | Frontside bus | TDP | L2 cache
E5405 | 2.0 GHz | 1333 MHz | 80 W | 2x 6 MB
L5410 | 2.33 GHz | 1333 MHz | 50 W | 2x 6 MB
E5410 | 2.33 GHz | 1333 MHz | 80 W | 2x 6 MB
E5420 | 2.5 GHz | 1333 MHz | 80 W | 2x 6 MB
L5430 | 2.67 GHz | 1333 MHz | 50 W | 2x 6 MB
E5430 | 2.67 GHz | 1333 MHz | 80 W | 2x 6 MB
E5440 | 2.83 GHz | 1333 MHz | 80 W | 2x 6 MB
E5450 | 3.0 GHz | 1333 MHz | 80 W | 2x 6 MB
X5460 | 3.17 GHz | 1333 MHz | 120 W | 2x 6 MB
E5462 | 2.8 GHz | 1600 MHz | 80 W | 2x 6 MB
E5472 | 3.0 GHz | 1600 MHz | 80 W | 2x 6 MB

Supposedly the models with a 1333 MHz frontside bus will be available on 11 November. An exact date for the 1600 MHz models and the actual release of the new Seaburg chipset are as yet unknown to us.

SuperMicro Stoakley Server

To measure performance of the new Intel Xeon processors, Intel supplied us with a pre-production server based on a barebone manufactured by SuperMicro. This 2U server is built on the discussed Seaburg chipset and contains two Intel Xeon E5472 Harpertown CPUs, running at 3.0 GHz with a 1600 MHz frontside bus.

Intel supplied the server with 16 GB of DDR2-800 FB-DIMM memory, manufactured by Nanya, in the shape of eight 2 GB modules. The server has a total of sixteen memory slots, half of which are taken. The SuperMicro X7DWN+ mainboard offers space for four PCI-Express x8 cards, one PCI-Express x4 card and two PCI-X expansion boards. Intel opted for a SAS RAID controller as well, the Adaptec ASR-3805. Connected to this are two Seagate Cheetah 15k.5 146 GB hard disks in a fast RAID 0 array. The front panel of the server accommodates a maximum of eight hot swappable drives.

Around the processor and memory modules, SuperMicro placed a large airduct which directs airflow from four high capacity fans in the front to all components that will heat up. The layout of this server is excellent and offers a great basis for local server builders. Again we cannot give a price for this machine, as it is not yet available.

Test

To find out which of the two new server CPUs offers the best real-world performance and, perhaps more importantly, the best performance per Watt, we subjected the two servers we described to a large number of server oriented benchmarks. In order to compare the results with products currently on the market, we added another couple of servers to the test.

The Dell PowerEdge 2950 is based on two Intel Xeon X5355 quad core processors from the Clovertown generation, running at 2.67 GHz with a 1333 MHz frontside bus. This server uses the Intel 5000X 'Blackford' chipset. It has 8 GB DDR2-667 FB-DIMM memory and a RAID 5 array of three Fujitsu Allegro MAX3073RC harddisks.

The HP Proliant DL385 is based on two AMD Opteron 2218 dual core processors, running at 2.6 GHz. This machine uses the same chipset as the Barcelona server on test, namely the Broadcom HT-2100. Memory consists of 8 GB DDR2-667. In this machine we find a RAID 0 array of two Seagate Savvio ST936701SS SAS disks.

We installed Microsoft Windows Server 2003 x64 R2 Enterprise Edition as operating system on all servers. We selected benchmarks that are specifically intended to demonstrate server workloads, as opposed to workstation workloads. Power consumption was measured with a calibrated EMU 1.58x meter.

You will have noticed that the chosen storage solution is completely different from one test system to the other. However the benchmarks we picked do not scale with harddisk performance, hence this does not affect the final results.

barcelonaclovertownharpertowncores_550
The three tested types of quad core processors next to one another.

SunGard AA

SunGard Adaptiv Analytics is a benchmark based on SunGard software that is widely used in the financial world. The benchmark we used is a stripped-down version of the full package. The software calculates the future value of a fictitious stock portfolio using the frequently used Monte Carlo method. The results of this benchmark indicate how servers perform when large financial institutions use them for complex calculations.
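To give an idea of the kind of calculation involved, here is a minimal Monte Carlo sketch. It assumes a single asset that follows geometric Brownian motion, with made-up return and volatility figures. We do not know which model SunGard's software actually uses, and the function name and parameters are ours.

```python
import math
import random


def simulate_portfolio_value(start_value, yearly_return, yearly_volatility,
                             years, paths, seed=0):
    """Average simulated end value over many random paths (one time step per path)."""
    rng = random.Random(seed)  # fixed seed so repeated runs give the same estimate
    drift = (yearly_return - 0.5 * yearly_volatility ** 2) * years
    spread = yearly_volatility * math.sqrt(years)
    total = 0.0
    for _ in range(paths):
        shock = rng.gauss(0.0, 1.0)  # one standard normal draw per path
        total += start_value * math.exp(drift + spread * shock)
    return total / paths


# Hypothetical inputs: 5% expected return and 20% volatility over one year.
estimate = simulate_portfolio_value(100.0, 0.05, 0.20, years=1.0, paths=20_000)
print(f"Estimated portfolio value after one year: {estimate:.2f}")
```

Every path is independent of the others, which is why this type of workload spreads so well over many cores.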

graph

It immediately becomes clear that it was high time for AMD to switch from dual core to quad core technology. Where the AMD Opteron 2218 equipped server needs 582 seconds, the Dell machine with AMD Opteron 2350 quad core CPUs needs only 367 seconds. Intel, however, leads the game: the Clovertown based server does the calculation in 252 seconds, and the 45 nm Harpertown is significantly faster still with a mere 220 seconds.

Another benchmark from the same sector is Black-Scholes, also widely used in the financial world. Due to licensing, our benchmark only works on Intel processors and thus can only serve to illustrate once more that Harpertown means a measurable improvement for Intel.

graph

Benchmark: Caselab Euler3D

The Caselab Euler3D benchmark is based on the identically named software used to calculate fluid dynamics. As the benchmark's maker describes it:

"The benchmark testcase is the AGARD 445.6 aeroelastic test wing. The wing uses a NACA 65A004 airfoil section and has a panel aspect ratio of 1.65, taper ratio of 0.66, and a quarter-chord sweep angle of 45º. This AGARD wing was tested at the NASA Langley Research Center in the 16-foot Transonic Dynamics Tunnel and is a standard aeroelastic test case used for validation of unsteady, compressible CFD codes. The CFD grid contains 1.23 million tetrahedral elements and 223 thousand nodes. The benchmark executable advances the Mach 0.50 AGARD flow solution."

Fluid dynamics is one of the oft needed applications for high performance computing, where servers can't be fast enough. The graph below shows the benchmark results.

graph

It is remarkable to see how much faster AMD's Barcelona is compared to the dual core Opteron server. Nevertheless, even Barcelona cannot keep up with the Clovertown server, let alone the Harpertown based machine, which finishes the workload in about 25% less time than its Clovertown predecessor.

FlamMap FSPRO Benchmark

Another benchmark from the high performance computing category is FlamMap FSPRO. This software is used to calculate spread of forest fires, another task that needs plenty of computing power. The special test version calculates a relatively small workload. The graph below has the results:

graph

As yet we have not found an explanation, but for some reason the Barcelona server is outperformed by the dual core Opteron machine. Both of the Intel servers perform better and again Harpertown shows significant improvement over Clovertown. Calculating the workload takes the SuperMicro machine 347 seconds, which is 27% faster than the Clovertown server and 41% faster than the Barcelona one.

Half-Life 2 Build Map benchmark

Don't worry, we are not using the game Half-Life 2 to benchmark a server. However, we are using a derivative: Valve, the maker of Half-Life 2, sent us a benchmark based on the code used to compile levels for the popular game from all the data needed, such as blueprints, textures, lighting data and so on. Valve itself has a rack of servers dedicated to these workloads. This benchmark does the calculations needed for a very small level.

graph

Again it is Intel with the best scores. The dual core Opteron server completes the task in 213 seconds. The quad core Opteron is quite a bit faster with 121 seconds. However Intel's Clovertown goes through the workload in 86 seconds and the Harpertown based machine is done and resting again in 72 seconds.

Benchmark: Cinebench 9.5 and 10

The Cinebench 9.5 and 10 benchmarks are of course not intended for servers, but they are used on workstations. As these benchmarks are fully multi-threaded, they give an excellent impression of CPU performance. Both versions of the benchmark are based on the Maxon Cinema4D 3D rendering software, with the newer version 10 having the more complex workload of the two.

graph

graph

Naturally the version 10 test results are the most interesting. The dual core Opteron machine manages to score 7783 points. With a score of 13002 points, the Barcelona equipped server is about 67% faster. Nevertheless, in this task as well Intel remains unbeaten. The current Clovertown quad cores reach 18622 points; the new 45 nm generation surpasses this by about 18.5% and reaches 22072 points.
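The percentages quoted here can be reproduced directly from the scores. The small helper below is only an illustration; its name is ours, and the CPU model names in the dictionary come from the test setup described earlier in this article.

```python
def percent_faster(new_score: float, old_score: float) -> float:
    """Relative gain of new_score over old_score, in percent (higher score is better)."""
    return (new_score / old_score - 1.0) * 100.0


cinebench10 = {  # Cinebench 10 scores as quoted in the text
    "Opteron 2218": 7783,
    "Opteron 2350": 13002,
    "Xeon X5355": 18622,
    "Xeon E5472": 22072,
}

barcelona_gain = percent_faster(cinebench10["Opteron 2350"], cinebench10["Opteron 2218"])
harpertown_gain = percent_faster(cinebench10["Xeon E5472"], cinebench10["Xeon X5355"])
print(f"Barcelona over dual core Opteron: {barcelona_gain:.1f}%")
print(f"Harpertown over Clovertown: {harpertown_gain:.1f}%")
```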

Benchmark: Povray 3.7b21

From the same sector we have the 3D rendering software Povray as a benchmark, where we render the images specifically intended for benchmarking with a resolution of 1024x768 pixels. Version 3.7 of Povray is fully multithreaded, resulting in excellent scores for 8 core servers.

graph

Anyone upgrading from dual core AMD Opterons to quad cores for this type of task can expect huge performance improvements. The dual core machine needs 688 seconds, while Barcelona is done in 445 seconds. Intel, however, is even faster, with the image completed by Clovertown in 302 seconds and by Harpertown in 265 seconds.

Spec CPU2006 - Integer

Spec CPU2006 is widely respected as one of the best server CPU benchmarks. It consists of 29 subtests, which can be divided into two categories. About half of the subtests mostly use integer calculations, while the other half is mainly based on floating point operations. The benchmarks are provided as source code and must be compiled by the tester. We used Microsoft Visual Studio 2005 for this task, with Intel's own compiler suite version 10.0 for the Intel platform and special compilers for AMD from PGI for the AMD servers. We compiled all tests with the standard optimisations, or 'base' in Spec terminology. In the graphs you will find the results of the so-called 'rate tests', which run as many copies of each subtest as there are cores in a system to find out what the total available processing power is. Some results are missing because we did not manage to compile these subtests correctly and did not have enough time to solve the issues. The online database of Spec of course has much better scores for the various benchmarks we ran, as the manufacturers can use a plethora of compiler tricks to get much better optimised versions of the Spec benchmark. So keep in mind that our results are based on out of the box compiling, with equal effort invested in both the AMD and the Intel versions.
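For readers unfamiliar with rate scores, the sketch below shows how such a number is put together, as far as we understand Spec's run rules: each subtest gets a throughput ratio (number of copies times a reference time, divided by the elapsed time), and an overall score is the geometric mean of those ratios. The timings are hypothetical and chosen only to show the arithmetic; the function names are ours.

```python
import math


def rate_ratio(copies, reference_seconds, elapsed_seconds):
    """Throughput ratio for one subtest: more copies or less elapsed time raises it."""
    return copies * reference_seconds / elapsed_seconds


def overall_rate(ratios):
    """Geometric mean of the per-subtest ratios."""
    return math.exp(sum(math.log(r) for r in ratios) / len(ratios))


# Hypothetical numbers for an eight core machine (not measured results).
ratios = [
    rate_ratio(copies=8, reference_seconds=1000.0, elapsed_seconds=500.0),
    rate_ratio(copies=8, reference_seconds=2000.0, elapsed_seconds=4000.0),
]
print(f"Overall rate: {overall_rate(ratios):.1f}")
```

Because the geometric mean is used, one very fast subtest cannot dominate the overall score the way it would in an arithmetic average.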

The table below has a short description of the tests that are part of the integer part of Spec CPU2006.

Benchmark | Description
400.perlbench | PERL programming language
401.bzip2 | Data compression with the Bzip2 algorithm
403.gcc | Compiler for the programming language C
429.mcf | Based on MCF, a program to optimise schedules for public transport
445.gobmk | Artificial intelligence, based on the game Go
456.hmmer | Searching through a database filled with DNA information
458.sjeng | Artificial intelligence, based on the chess program Sjeng
462.libquantum | Quantum mechanics calculations
464.h264ref | Video compression based on the H.264 codec
471.omnetpp | Simulation of low level traffic in an ethernet network
473.astar | Path finding calculation, based on game AI
483.xalancbmk | XML processing; converting XML to HTML, text or different XML types

The below graphs show the results:

graphs (twelve, one for each integer subtest in the table above)

The results speak for themselves: in nearly all cases Intel's Harpertown system has much better performance than AMD's Barcelona. At the same time we can conclude Barcelona is a big step in the right direction compared to AMD's previous generation of CPUs.

Spec CPU2006 - Floating Point

The second set of benchmarks that are part of the Spec CPU2006 test are mostly based on floating point calculations. Many of the benchmark parts are directly derived from oft used HPC software. As such the FP part of Spec is generally viewed as a very good indication of HPC server performance. The table below lists the benchmark parts and gives a short description.

Benchmark | Description
410.bwaves | Computational fluid dynamics
416.gamess | Quantum chemical computations
433.milc | Physics / Quantum Chromodynamics
434.zeusmp | Physics / Magnetohydrodynamics
435.gromacs | Chemistry / Molecular Dynamics
436.cactusADM | Physics / General Relativity
437.leslie3d | Computational fluid dynamics
444.namd | Scientific, Structural Biology, Classical Molecular Dynamics Simulation
447.dealII | Solution of partial differential equations using the adaptive finite element method
450.soplex | Simplex linear program solver
453.povray | Ray tracing
454.calculix | Structural Mechanics
459.GemsFDTD | Computational Electromagnetics
465.tonto | Quantum Crystallography
470.lbm | Computational fluid dynamics
481.wrf | Weather forecasting
482.sphinx3 | Speech recognition

And these are the results in the graphs below:

graphs (seventeen, one for each floating point subtest in the table above)

In the various Spec CPU2006 FP benchmarks Intel's Harpertown and AMD's Barcelona take turns coming out on top. That is less surprising than it seems, as both architectures are well matched when it comes to FP processing. By now it should be clear, however, that when it comes to pure processing power, AMD must yield to Intel. In many HPC applications, though, the speed of communication between CPU and memory is as important as pure processing speed, as these applications work with very large datasets that have to be stored in and retrieved from memory over and over again. With its integrated memory controller, AMD can offer better memory performance, resulting in better scores in several of the Spec FP benchmarks.

Benchmark: Rightmark Memory Analyzer - Multi-threaded Memory Test

We tested the actual bandwidth between processor and memory using the multi-threaded memory test of Rightmark Memory Analyzer. To this end we performed read or write operations on the memory in blocks of 64 MB with eight threads (four on the dual core machine), to ensure we weren't just measuring cache performance.

graph

graph

The graphs show two things. Firstly, AMD is still king of memory performance. Small wonder: Intel has to route all memory operations through the chipset, while AMD has the controller embedded in the CPU. Secondly, Barcelona's controller is a lot more efficient than the one in the previous generation of Opterons. A continuous read performance of 15 GB/s can only be described as very impressive.

Benchmarks: power consumption

Both AMD and Intel suggest their new generation of processors offers the best "performance per Watt". With a power meter at hand we try to find out which of the two competitors most lives up to this claim.

A few important remarks before you look at the graphs below. The two AMD servers and the Dell PowerEdge 2950 with Intel Clovertown each have 8 GB of memory on board, consisting of 8 modules of 1 GB each. The SuperMicro server with Intel Harpertown has 16 GB, consisting of 8 modules of 2 GB each. We decided to present the results like this in the graphs, as the power consumption of a server mostly scales with the number of modules rather than the total memory capacity.

Besides that, do take into account that we measured the power consumption of four completely different servers. At most the results give an impression of the energy usage of the different platforms, but not more than that.

We measured power consumption in three different situations: with the server idle, with 100% CPU load (generated by the SunGard AA benchmark) and with 100% memory load (while running the Rightmark Memory Analyzer benchmark).

graph

At idle the Barcelona server is clearly the most efficient, with a power consumption of 236 Watt. Second place goes to AMD's dual core machine, with 265 Watt. Both Intel servers are a bit less efficient, the Harpertown least of all with 273 Watt.

graph

With the CPUs fully loaded it is again the Barcelona server that uses the least power: we did not measure more than 301 Watt. Second place goes to the SuperMicro machine based on the 45 nm Harpertown processors, which uses significantly more power at 364 Watt. Least efficient of all is Dell's Clovertown server with no less than 412 Watt.

In absolute terms, AMD clearly wins here; the Barcelona server draws 17.3% less power than the Harpertown server. On the other hand, the Harpertown completed the Sungard AA benchmark in 39.8% less time. Looking at the performance per watt graph, Intel is the clear winner.
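Because energy is power multiplied by time, the two percentages can be combined into a rough energy-per-run comparison. The sketch below assumes that each server draws its measured full-load power for the entire Sungard AA run; normalising the Barcelona run time to 1 is our own convention, not a measured value.

```python
# Rough energy-per-run comparison for the Sungard AA benchmark.
# Assumption: each server draws its measured full-load power for the whole run.

BARCELONA_WATT = 301            # full CPU load, from the text
HARPERTOWN_WATT = 364           # full CPU load, from the text
HARPERTOWN_TIME_SAVING = 0.398  # Harpertown needs 39.8% less time

# Relative power saving of Barcelona versus Harpertown.
power_saving = 1 - BARCELONA_WATT / HARPERTOWN_WATT

# Energy per run in arbitrary units, with the Barcelona run time set to 1.
barcelona_energy = BARCELONA_WATT * 1.0
harpertown_energy = HARPERTOWN_WATT * (1 - HARPERTOWN_TIME_SAVING)

# How much less energy Harpertown needs to finish the same job.
energy_saving = 1 - harpertown_energy / barcelona_energy

# Work done per unit of energy, Harpertown relative to Barcelona.
perf_per_watt_ratio = barcelona_energy / harpertown_energy

print(f"Barcelona power saving:    {power_saving:.1%}")
print(f"Harpertown energy saving:  {energy_saving:.1%}")
print(f"Harpertown perf/watt lead: {perf_per_watt_ratio:.2f}x")
```

Under this assumption the Harpertown server needs roughly 27% less energy to complete the same run, even though it draws more power while doing so.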

graph

Measuring the power consumption under memory load, Barcelona is again the most efficient server with an average of 305 Watt. For Intel's Harpertown server we measured 349 Watt.
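The three measurement points can be put side by side to see how much less power the Barcelona server draws than the Harpertown server in each state. This is a small sketch using only the wattages quoted in the text; the unweighted average at the end is our own simplification and says nothing about a real workload mix.

```python
# Relative power saving of the Barcelona server versus the Harpertown server
# at the three measurement points. Values are (Barcelona, Harpertown) in Watt.

MEASUREMENTS = {
    "idle": (236, 273),
    "CPU load": (301, 364),
    "memory load": (305, 349),
}


def relative_saving(lower, higher):
    """Return the fraction by which `lower` is below `higher`."""
    return 1 - lower / higher


savings = {
    state: relative_saving(barcelona, harpertown)
    for state, (barcelona, harpertown) in MEASUREMENTS.items()
}

# Unweighted mean over the three states (a simplification, see lead-in).
average_saving = sum(savings.values()) / len(savings)

for state, saving in savings.items():
    print(f"{state:12s} {saving:.1%}")
print(f"{'average':12s} {average_saving:.1%}")
```

The saving comes out at roughly 13.6% when idle, 17.3% under CPU load and 12.6% under memory load, or about 14.5% on average.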

Conclusion

A week apart, AMD and Intel introduced their newest generation of server CPUs. AMD finally makes the transition to quad core, while Intel has the switch to 45 nanometer production as its biggest claim to fame. Both promise much better performance than we have seen so far and both claim to have the best performance per watt. AMD is proud to keep the platform unchanged, Intel to have a new one that is more versatile and faster.

xeonopteronlogos

Looking purely at benchmarks, we can only conclude AMD must yield to Intel. In the vast majority of the benchmarks the SuperMicro server with Intel Xeon E5472 Harpertown CPUs has the best scores, and by a significant margin. For an all-round server, Intel would seem to be the brand of choice for the moment. Nevertheless AMD does win in a number of benchmarks, especially several in the FP part of SPEC CPU2006. For HPC applications where memory bandwidth is as important as processing performance, a quad core Opteron server may well be the best solution. However, we are obviously discussing niche markets here.

Next, the claims on performance per watt. AMD is completely right about the new quad core Opteron being a very efficient CPU. During every one of our tests, the Dell server with a total of eight cores used less power than the much slower HP server with only four cores. AMD's claim that Barcelona gives you two extra cores per CPU without a higher energy bill is therefore well founded. In a similar vein, the Dell server with Barcelona processors draws less power in all applications than the SuperMicro server with Intel Harpertown chips. That said, the average performance difference is bigger than the average power consumption difference, so we can only conclude that Intel offers the best performance per watt. Compared to Clovertown it would be an easy win for AMD, but to beat Harpertown AMD will have to show quite a bit more performance.

Finally, the claims regarding the platforms. It is commendable that the new Opteron CPUs work so well with the existing Opteron servers, albeit with a BIOS upgrade. However, we are sceptical whether this will result in a wave of server upgrades. Using a proven platform does ensure a lot of partners can offer Barcelona based products before too long. Another benefit is the continued use of existing software installations, as the platform remains the same. If you intend to use Intel's new processors to best effect, you can't avoid buying new servers. The fast Seaburg chipset and the new DDR2-800 FB-DIMM memory are an important part of the improved performance. Upgrading just isn't that good an idea here. To use the extra performance of Harpertown, you will need to invest in both a new server and a new software installation. But then, that is usually the case anyway.

What can we conclude? Firstly, AMD's Barcelona is a big step in the right direction compared to the dual core processors. Truth be told, the new server CPU is just late, too late. If it had been released at the start of the year and at higher clock frequencies, our conclusion would have been a lot more positive. Still we remain optimistic: with a new stepping and a bit of luck AMD may well be able to make a big increase in clock frequencies - and for AMD too the transition to 45 nanometer is only a matter of time. At the moment Intel has the better offer. The new quad core Xeons built on the 45 nanometer process are faster and offer better performance per watt than both their predecessors and the competing Barcelona CPU. And that's what it is about in the end.

In addition: what does this tell us about future desktop CPUs?

Many of you will now be wondering: what do these server test results tell us about future desktop processors? AMD is of course releasing Phenom by the end of this year, the quad core CPU for desktop PCs based on Barcelona. Intel too will bring the 45 nm technology to its Core 2 Duo desktop CPUs, where we will see a change similar to the Clovertown - Harpertown developments.

To start with Phenom, this processor is indeed based on the same chips as the quad core Opterons. Still, there are a small number of differences. Firstly, the clock frequencies will be a lot higher for Phenom: AMD is referring to 3 GHz and higher. That is one and a half times the clock frequency of the Opterons we tested here. Next to that there are large differences in the memory controller: the quad core Opteron version is made for registered DDR2-667, while the one in Phenom will support DDR2-1066. The last part may be obvious, but still: a server platform is not optimised to run games or benchmarks like 3DMark, while a desktop board is. In other words: Barcelona's performance as shown here for the server market only gives the vaguest of impressions of what we can expect of Phenom. We will have to wait for the eventual introduction to see what AMD will do in the battle against the Core 2 Duo.
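The size of those two differences is easy to quantify. The sketch below uses the 2.0 GHz clock of the tested Opteron 2350 and the 3 GHz AMD mentions for Phenom, and assumes a standard 64-bit memory channel (8 bytes per transfer) to turn the DDR2 ratings into nominal per-channel peak bandwidth. These are paper figures only and say nothing about real-world performance.

```python
# Nominal clock and per-channel memory bandwidth ratios, Phenom versus the
# tested quad core Opteron. Assumption: a 64-bit channel moves 8 bytes per transfer.

OPTERON_CLOCK_GHZ = 2.0   # Opteron 2350, from the specification table
PHENOM_CLOCK_GHZ = 3.0    # lower bound mentioned by AMD

OPTERON_MTS = 667         # registered DDR2-667
PHENOM_MTS = 1066         # DDR2-1066
BYTES_PER_TRANSFER = 8    # assumption: 64-bit channel


def channel_peak_gbs(mega_transfers):
    """Nominal peak bandwidth of one memory channel in GB/s."""
    return mega_transfers * 1e6 * BYTES_PER_TRANSFER / 1e9


clock_ratio = PHENOM_CLOCK_GHZ / OPTERON_CLOCK_GHZ
memory_ratio = channel_peak_gbs(PHENOM_MTS) / channel_peak_gbs(OPTERON_MTS)

print(f"Clock ratio:  {clock_ratio:.2f}x")
print(f"Memory ratio: {memory_ratio:.2f}x per channel")
```

On paper that is 1.5 times the clock frequency and roughly 1.6 times the per-channel memory bandwidth.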

The same goes for Intel: although the switch to 45 nm will have the same consequences for the desktop as for servers - higher clock speeds, more cache, SSE4.1, etc. - again we have to conclude that servers are quite a different kettle of fish from desktop PCs, so we can't base any serious statement about Intel's future desktop CPUs on Harpertown.

The switch from Athlon X2 to Phenom X4 will truly be a revolution for AMD. The transition from the current generation of Core 2 Duos to a new generation is more a matter of evolution. Also, we should not forget that Intel has currently pulled ahead quite a bit. Whether Phenom will be fast enough to keep up with Yorkfield is the question that keeps us all occupied. We will know before the end of the year.

Addendum: comparison table of tested servers

You can compare the specifications of the tested servers in the table below.

General
   
Brand   Dell Dell HP SuperMicro
Product name   PowerEdge 2950 PowerEdge 2970 Barcelona Proliant DL385 G2 2218 4G HPM Stoakley Server
Product code   S3E9SY1600PB
Tested by   Koen Crijns Koen Crijns Koen Crijns Koen Crijns
Test date  
Processor
Processor type   Intel Xeon X5355 AMD Opteron 2350 AMD Opteron 2218 Intel Xeon E5472
Number of processors   2 2 2 2
Number of cores per processor   4 4 2 4
Total number of cores   8 8 4 8
Clock frequency   2.67 GHz 2.0 GHz 2.6 GHz 3.0 GHz
Frontside bus   1333 MHz 1000 MHz 1000 MHz 1600 MHz
L1 cache per CPU   256 kB 512 kB 256 kB 256 kB
L2 cache per CPU   8 MB 2 MB 2 MB 12 MB
L3 cache per CPU   2 MB
Memory
Total memory capacity   8 GB 8 GB 4 GB 16 GB
Number of modules   8 4 8
Module type   DDR2 FB-DIMM DDR2 DDR2 DDR2 FB-DIMM
Rating   667 MHz 667 MHz 667 MHz 800 MHz
Brand + model   Samsung PC2-5300P-555-12-H3 Elpida EBE10AD4AGFA-6E-E Nanya NT2GT72U4NB1BD-2C
Motherboard
Brand + model   Dell 0NR282 Dell HP SuperMicro X7DWN+
North bridge   Intel 5000X Broadcom HT-2100 Serverworks HT-2100 Intel Seaburg
South bridge   Intel ESB Broadcom HT-1000 Serverworks HT-1000 Intel ESB
Processor sockets   2 2 2 2
DIMM slots   8 8 8 16
PCI-Express x8   2 3 4 4
PCI-Express x4   1 0 1 1
PCI-X   0 0 0 2
Integrated video   ATI ES1000 ATI ES1000 ATI ES1000 ATI ES1000
LAN
LAN 1   Broadcom BCM5708 Broadcom BCM5708 Broadcom BCM5708 Intel Zoar
LAN 1 - Speed   1000 Mbit/s 1000 Mbit/s 1000 Mbit/s 1000 Mbit/s
LAN 1 - Type controller   PCI-Express PCI-Express PCI-Express PCI-Express
LAN 2   Broadcom BCM5708 Broadcom BCM5708 Broadcom BCM5708 Intel Zoar
LAN 2 - Speed   1000 Mbit/s 1000 Mbit/s 1000 Mbit/s 1000 Mbit/s
LAN 2 - Type controller   PCI-Express PCI-Express PCI-Express PCI-Express
Storage
Controller used   Dell PERC 5/i Dell PERC 5/i HP Smart Array P400/512MB BBWC Adaptec ASR-3805
Controller cache   256 MB 256 MB 512 MB 128 MB
Number of hard disks   3 4 2 2
Max. number of hard disks   6 8 8 8
Hard disk type   Fujitsu Allegro MAX3073RC Fujitsu MAY2036RC Seagate Savvio ST936701SS Seagate Cheetah 15k.5 146 GB
Hard disk capacity   73 GB 36 GB 36 GB 146 GB
Hard disk rotational speed   15000 rpm 10000 rpm 10000 rpm 15000 rpm
Hard disk cache   16 MB 16 MB 8 MB 16 MB
Interface   SAS SAS SAS SAS
RAID level   5 10 0 0
Hot-swappable hard disks  
Chassis
Brand + model   Dell Poweredge Dell Poweredge HP Proliant SuperMicro
Chassis type   Rackmount Rackmount Rackmount Rackmount
Rack height   2 U 2 U 2 U 2 U
Fans   4 4 12 4
Hotswappable fans  
Power supply
Number of power supplies   2 1 2 1
Brand + model   Dell NPS-750BB Dell N750P-S0 HP DPS-800GB Ablecom PWS-702A-1R
Power rating   750 W 750 W 800 W 700 W
Other
Optical drive   Slimline DVD-ROM Slimline DVD-ROM Slimline DVD-ROM Slimline DVD-ROM
Fibre Channel controller  
Remote Management Module   Integrated Lights Out 2
Benchmarks - Spec CPU 2006
Compiler suite used for SPEC CPU 2006   Intel Compiler v. 10.0 PGI 7.0.7 PGI Compiler Suite v. 6.2 Intel Compiler v. 10.0
