# Intel Core i7 Sandy Bridge-E 3960X LGA-2011



#### Course on Architectures and Design of Computer Systems and Services



Cicero - Ridolfi - Siracusa

### Roadmap

- Overview
- What's New
- Inside the Architecture
- Intel's Technologies
- Performance

## Overview



#### Intel<sup>®</sup> Core<sup>™</sup> i7-3960X processor Extreme Edition Summary of Product Features



- 6 Cores, 12 Threads
- Intel<sup>®</sup> Turbo Boost Technology 2.0
- Intel<sup>®</sup> Hyper-Threading Technology
- Supports LGA 2011 socket Intel<sup>®</sup> X79 Express Chipset-based motherboards
- Up to 15 MB Intel<sup>®</sup> Smart Cache
- Integrated Memory Controller
  - 4 channels of DDR3 1600 MHz, 1DPC
- Intel<sup>®</sup> AVX and AES
- 40 PCI Express<sup>\*1</sup> Lanes
- SSE4.1 & SSE4.2 Instructions

<sup>1</sup> Intel believes that some PCIe devices may be able to achieve the 8GT/s PCIe transfer rate on the X79 Express Chipset based platform.

\*Other names and brands may be claimed as the property of others


Copyright\* 2011 Intel Corporation. All rights reserved. Under embargo until 12:01am PT November 14, 2011


## **Other Specs**

- 2.27 billion transistors
- 3.30 GHz core clock speed
- 3.90 GHz Turbo Boost core clock speed
- 6x64 KB L1 cache
- 6x256 KB L2 cache
- 130 W TDP





Intel<sup>®</sup> X79 Express Chipset Block Diagram

## What's New

## **Core Block Diagram**



#### **Front End Microarchitecture**



#### **Instruction Decode in Processor Core**

- 32 Kilo-byte 8-way Associative ICache
- 4 Decoders, up to 4 instructions / cycle
- Micro-Fusion
  - Bundle multiple instruction events into a single "Uop"
- Macro-Fusion
  - Fuse instruction pairs into a complex "Uop"
- Decode Pipeline supports 16 bytes per cycle

#### **New: Decoded Uop Cache**



#### Add a Decoded Uop Cache

- An L0 Instruction Cache for Uops instead of Instruction Bytes
  - $\sim$ 80% hit rate for most applications
- Higher Instruction Bandwidth and Lower Latency
  - Decoded Uop Cache can represent 32-byte / cycle
    - More Cycles sustaining 4 instruction/cycle
  - Able to 'stitch' across taken branches in the control flow

#### **New Branch Prediction Unit**



#### Do a 'Ground Up' Rebuild of Branch Predictor

- Twice as many targets
- Much more effective storage for history
- Much longer history for data dependent behaviors



#### **Execution Cluster – A Look Inside**

#### Scheduler sees matrix:

- 3 "ports" to 3 "stacks" of execution units
  - General Purpose Integer
  - SIMD (Vector) Integer
  - SIMD Floating Point
- The challenge is to double the output of one of these stacks in a manner that is invisible to the others

| Port   | GPR      | SIMD INT           | SIMD FP                 |
|--------|----------|--------------------|-------------------------|
| Port 0 | ALU      | VI MUL, VI Shuffle | FP MUL, Blend, DIV      |
| Port 1 | ALU      | VI ADD, VI Shuffle | FP ADD                  |
| Port 5 | ALU, JMP |                    | FP Shuf, FP Bool, Blend |

#### **Execution Cluster**

#### Solution:

- Repurpose existing datapaths to dual-use
- SIMD integer and legacy SIMD FP use legacy stack style
- Intel<sup>®</sup> AVX utilizes *both* 128-bit execution stacks
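As a rough worked example of what using both 128-bit stacks buys, the sketch below estimates theoretical peak double-precision throughput. It assumes one vector multiply and one vector add can issue per clock on each core, and it uses the 6 cores and 3.3 GHz base clock from the overview while ignoring Turbo Boost. The function name is illustrative and does not come from any Intel tool.

```c
/* Theoretical peak double-precision GFLOPS (a sketch, not a measurement).
 * vector_bits: SIMD width, 128 for SSE or 256 for AVX.
 * Assumption: one multiply and one add issue per cycle on every core. */
double peak_dp_gflops(int cores, double ghz, int vector_bits)
{
    int lanes = vector_bits / 64;     /* 64-bit doubles per vector */
    int flops_per_cycle = 2 * lanes;  /* one MUL plus one ADD per lane */
    return cores * ghz * flops_per_cycle;
}
```

Under these assumptions, `peak_dp_gflops(6, 3.3, 256)` gives about 158.4 GFLOPS, twice the figure for 128-bit vectors at the same clock.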

"Cool" implementation of Intel AVX: 256-bit multiply + 256-bit add + 256-bit load per clock. Double your FLOPs with great energy efficiency.







- Solution : Dual-Use the existing connections
  - Make load/store pipes symmetric
- Memory Unit services three data accesses per cycle
  - 2 read requests of up to 16 bytes AND 1 store of up to 16 bytes
  - Internal sequencer deals with queued requests

The second load port is one of the highest-performance features. It is required to keep the Intel<sup>®</sup> Advanced Vector Extensions (Intel<sup>®</sup> AVX) instruction set fed, and its linear power/performance scaling means it is "cool".



## Intel AVX

- New 256-bit instruction set extension to Intel Streaming SIMD Extensions (Intel SSE)
- Released as part of the Intel microarchitecture code name Sandy Bridge
- Can give great computation power to boost applications

## Applications

- Suitable for floating point-intensive calculations in multimedia, scientific and financial applications
- Increases parallelism and throughput in floating point SIMD calculations
- Reduces register load due to the nondestructive instructions.

### SSE Vs AVX





#### **8 times faster!**



# **Original C Implementation**

```
for (int j = 0; j < firHalfLength; j++) // firHalfLength is 1023
{
    dFirCoefs = pFIRBuf[j];
    acc1 += pDllBuf[lFirIndex] * dFirCoefs;    // acc1 is accumulator for Index
    acc2 += pDllBuf[lFirIndexRev] * dFirCoefs; // acc2 is accumulator for IndexRev
    lFirIndex = (lFirIndex - 1) & lMask;       // dec backward index (modulo operation)
    lFirIndexRev = (lFirIndexRev + 1) & lMask; // inc forward index (modulo operation)
}
```
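The loop above is a fragment, so here is a minimal self-contained sketch of the same scalar kernel wrapped in a function. The function name and parameter names are hypothetical. It also assumes the delay-line length is a power of two, so that `& mask` behaves as a modulo, and that the indices use a two's-complement representation.

```c
/* Scalar reference for the symmetric FIR kernel over a circular delay line.
 * mask must equal (buffer_length - 1), with buffer_length a power of two. */
void fir_half_scalar(const double *coefs, int half_len,
                     const double *delay, long mask,
                     long index, long index_rev,
                     double *acc1, double *acc2)
{
    double a1 = 0.0, a2 = 0.0;
    for (int j = 0; j < half_len; j++) {
        double c = coefs[j];
        a1 += delay[index] * c;              /* walks backward */
        a2 += delay[index_rev] * c;          /* walks forward */
        index = (index - 1) & mask;          /* wrap around at the start */
        index_rev = (index_rev + 1) & mask;  /* wrap around at the end */
    }
    *acc1 = a1;
    *acc2 = a2;
}
```

A scalar version like this is useful as a reference when checking the results of the vectorized variants.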

### Intel SSE 128-bit Implementation

```
#include <emmintrin.h> // SSE2 intrinsics

__m128d DllVal, FIRCoef, mulVal;
for (int i = 0; i < firHalfLength; i += 2) // Operate on 2 elements at a time
{
    FIRCoef = _mm_load_pd(pFIRBuf + i);
    // acc1
    DllVal = _mm_load_pd(pDllBuf + lFIRIndexRev);
    mulVal = _mm_mul_pd(FIRCoef, DllVal);
    acc1 = _mm_add_pd(acc1, mulVal);
    // acc2
    DllVal = _mm_load_pd(pDllBuf + lFIRIndex);
    DllVal = _mm_shuffle_pd(DllVal, DllVal, 0x1); // Swap the two elements
    mulVal = _mm_mul_pd(FIRCoef, DllVal);
    acc2 = _mm_add_pd(acc2, mulVal);
    lFIRIndex -= 2;
    lFIRIndex = (lFIRIndex & lMask);
    lFIRIndexRev += 2;
    lFIRIndexRev = (lFIRIndexRev & lMask);
}
```

### Intel AVX Implementation

```
#include <immintrin.h> // AVX intrinsics

__m256d DllVal, FIRCoef, mulVal;
__m128d tmph, tmpl, tmplsh, tmphsh;
for (int i = 0; i < firHalfLength; i += 4) // Operate on 4 elements at a time
{
    FIRCoef = _mm256_load_pd(pFIRBuf + i);
    // acc1
    DllVal = _mm256_load_pd(pDllBuf + lFIRIndexRev);
    mulVal = _mm256_mul_pd(FIRCoef, DllVal);
    acc1 = _mm256_add_pd(acc1, mulVal);
    // acc2
    DllVal = _mm256_load_pd(pDllBuf + lFIRIndex);
    DllVal = _mm256_permute2f128_pd(DllVal, DllVal, 0x1); // Cross lane shuffle
    DllVal = _mm256_permute_pd(DllVal, 0x5);              // Swap within each lane
    mulVal = _mm256_mul_pd(FIRCoef, DllVal);
    acc2 = _mm256_add_pd(acc2, mulVal);
    lFIRIndex -= 4;
    lFIRIndex = (lFIRIndex & lMask);
    lFIRIndexRev += 4;
    lFIRIndexRev = (lFIRIndexRev & lMask);
}
```
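The two permutes in the AVX loop are the least obvious step, so the sketch below models them on a plain array of four doubles. It is a scalar emulation written for illustration, with hypothetical helper names. It assumes the immediates used above: `0x1` for `_mm256_permute2f128_pd` with both sources equal, and `0x5` for `_mm256_permute_pd`.

```c
/* Scalar model of _mm256_permute2f128_pd(v, v, 0x1):
 * swaps the low and high 128-bit lanes. */
void swap_lanes(const double in[4], double out[4])
{
    out[0] = in[2]; out[1] = in[3];
    out[2] = in[0]; out[3] = in[1];
}

/* Scalar model of _mm256_permute_pd(v, 0x5):
 * swaps the two doubles inside each 128-bit lane. */
void swap_within_lanes(const double in[4], double out[4])
{
    out[0] = in[1]; out[1] = in[0];
    out[2] = in[3]; out[3] = in[2];
}

/* The two steps together reverse the order of all four elements. */
void reverse4(const double in[4], double out[4])
{
    double tmp[4];
    swap_lanes(in, tmp);
    swap_within_lanes(tmp, out);
}
```

AVX needs two instructions here because most of its shuffles operate within a 128-bit lane, whereas the SSE version reverses its two elements with a single `_mm_shuffle_pd`.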

### **Execution Speed Comparison**



#### **Key Intel® AVX Features**

| Key Features | Benefits |
|--------------|----------|
| Wider Vectors: increased from 128 to 256 bit; two 128-bit load ports | Up to 2x peak floating point operations per second (FLOPs) output with good power efficiency |
| Enhanced Data Rearrangement: use the new 256 bit primitives to broadcast, mask loads and permute data | Organize, access and pull only necessary data more quickly and efficiently |
| Three and four Operands: Non Destructive Syntax for both 128 bit and 256 bit Intel AVX instructions | Fewer register copies, better register use for both vector and scalar code |
| Flexible unaligned memory access support | More opportunities to fuse load and compute operations |
| Extensible new opcode (VEX) | Code size reduction |

Intel<sup>®</sup> AVX is a general purpose architecture.

# Inside the Architecture

- Basic Execution Environment
- Protection
- Multiple-Processor Management
- Memory Cache Control
- Power and Thermal Management

## Basic Execution Environment

### Modes of Operation



### Resources

- Basic Program Execution Registers
- Address Space
- FPU Registers
- MMX Registers
- XMM Registers
- Stack

### **Additional Resources**

- I/O ports
- Control Registers
- Memory Management Registers
- Debug Registers
- Memory Type Range Registers (MTRRs)
- Machine Specific Registers (MSRs)
- Machine Check Registers
- Performance Monitoring Counters

## Protection

- Operates at both the segment level and the page level
- Four privilege levels for segments
- Two privilege levels for pages
- Any violation results in an exception
- No performance penalty

### **Protection Checks**

- Limit Checks
- Type Checks
- Privilege Level Checks
- Restriction of Addressable Domain
- Restriction of Procedure Entry-Points
- Restriction of Instruction Set
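Two of the checks listed above can be sketched in a few lines. The privilege check follows the IA-32 rule for data-segment access: the weaker of the current privilege level (CPL) and the selector's requested privilege level (RPL) must be at least as privileged as the descriptor privilege level (DPL). The limit check assumes a byte-granular, expand-up segment whose limit is the last valid offset. The function names are illustrative.

```c
#include <stdbool.h>

/* Privilege levels run from 0 (most privileged) to 3 (least privileged). */
bool data_access_allowed(unsigned cpl, unsigned rpl, unsigned dpl)
{
    unsigned effective = cpl > rpl ? cpl : rpl; /* weaker of CPL and RPL */
    return effective <= dpl;
}

/* Limit check: every byte of the access must lie at or below the limit.
 * Assumes a byte-granular, expand-up segment. */
bool limit_check_ok(unsigned long long offset, unsigned long long size,
                    unsigned long long limit)
{
    return size != 0 && offset <= limit && size - 1 <= limit - offset;
}
```

On real hardware a failed check raises an exception; in this sketch it simply returns `false`.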

### Multi-Processor Management

## Goals

- Maintain system memory coherency
- Maintain cache consistency
- Allow predictable ordering of writes to memory
- Distribute interrupt handling among a group of processors.
- Increase system performance by exploiting the multi-threaded and multiprocess nature of contemporary operating systems and applications.

# How

- Bus locking and/or cache coherency management
- Serializing instructions
- An advanced programmable interrupt controller (APIC)
- Intel Hyper-Threading Technology
- A second-level cache (level 2, L2)
- A third-level cache (level 3, L3)

### Mechanisms for Locked Atomic Operations

- Guaranteed atomic operations
- Bus locking, using the LOCK# signal and the LOCK instruction prefix
- Cache coherency protocols that ensure that atomic operations can be carried out on cached data structures (cache lock)

# Automatic Locking

- When executing an XCHG instruction that references memory
- When setting the B (busy) flag of a TSS descriptor
- When updating page-directory and page-table entries
- When acknowledging interrupts
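From C, locked atomic operations are usually reached through C11 `<stdatomic.h>` and not through the LOCK prefix directly. The sketch below builds a try-lock on an atomic exchange and a shared counter on an atomic fetch-and-add. The type and function names are hypothetical. That an x86 compiler lowers these calls to XCHG and LOCK XADD is a common outcome, but it is an assumption about the toolchain, not a guarantee.

```c
#include <stdatomic.h>
#include <stdbool.h>

typedef struct { atomic_int held; } xchg_lock;

void xchg_lock_init(xchg_lock *l) { atomic_init(&l->held, 0); }

/* Atomically writes 1 and returns true only if the lock was free before. */
bool xchg_try_lock(xchg_lock *l)
{
    return atomic_exchange(&l->held, 1) == 0;
}

void xchg_unlock(xchg_lock *l) { atomic_store(&l->held, 0); }

/* Atomic increment that returns the new value. */
int counter_increment(atomic_int *counter)
{
    return atomic_fetch_add(counter, 1) + 1;
}
```

The exchange-based lock mirrors the automatic locking of XCHG described above: the read and the write happen as one indivisible operation, so two processors cannot both observe the lock as free.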

# Serializing Instructions

- Force the processor to complete all modifications to flags, registers, and memory by previous instructions and to drain all buffered writes to memory before the next instruction is fetched and executed
- Privileged serializing instructions: INVD, INVEPT, INVLPG, INVVPID, LGDT, LIDT, LLDT, LTR, MOV (to control register, with the exception of MOV CR8), MOV (to debug register), WBINVD, and WRMSR.
- Non-privileged serializing instructions: CPUID, IRET, and RSM.

## **Multiprocessor Initialization**

- Supports controlled booting of multiple processors without requiring dedicated system hardware.
- Allows hardware to initiate the booting of a system without the need for a dedicated signal or a predefined boot processor.
- Allows all IA-32 processors to be booted in the same manner, including those supporting Intel Hyper-Threading Technology.



#### Bootstrap processor (BSP)

- The BSP flag is set in the IA32\_APIC\_BASE MSR of the BSP.
- The BSP then begins executing the operating-system initialization code

#### Application Processors (APs)

- The BSP flag is cleared for all other processors, which become APs.
- APs wait for a startup signal (a SIPI message) from the BSP. Upon receiving a SIPI message, an AP executes the BIOS AP configuration code, which ends with the AP being placed in halt state.
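To make the BSP flag concrete, the sketch below decodes a raw IA32\_APIC\_BASE value. In that MSR, bit 8 is the BSP flag, bit 11 is the APIC global enable, and the APIC base address starts at bit 12. Reading the MSR itself needs RDMSR at privilege level 0, so the example only decodes a value passed in. The helper names are illustrative, and the address helper assumes the reserved upper bits read as zero.

```c
#include <stdbool.h>
#include <stdint.h>

#define APIC_BASE_BSP_BIT    (1ull << 8)   /* set only on the bootstrap processor */
#define APIC_BASE_ENABLE_BIT (1ull << 11)  /* APIC global enable */

bool apic_base_is_bsp(uint64_t msr)
{
    return (msr & APIC_BASE_BSP_BIT) != 0;
}

bool apic_base_is_enabled(uint64_t msr)
{
    return (msr & APIC_BASE_ENABLE_BIT) != 0;
}

/* Physical base address of the local APIC registers (bits 12 and up). */
uint64_t apic_base_address(uint64_t msr)
{
    return msr & ~0xFFFull;
}
```

For example, `0xFEE00900` decodes as an enabled APIC at `0xFEE00000` on the BSP, while `0xFEE00800` is the same configuration on an AP.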

### Management of Idle and Blocked Conditions

- HLT instruction
- PAUSE instruction
- MONITOR/MWAIT instruction

# Memory Cache Control

# Methods of Caching

- Strong Uncacheable (UC)
- Write Combining (WC)
- Uncacheable (UC-)
- Write Through (WT)
- Write Back (WB)
- Write Protected (WP)

# **Cache Control Protocol**

| Cache Line State                            | M (Modified)                      | E (Exclusive)                     | S (Shared)                                                                | I (Invalid)                      |
|---------------------------------------------|-----------------------------------|-----------------------------------|---------------------------------------------------------------------------|----------------------------------|
| This cache line is valid?                   | Yes                               | Yes                               | Yes                                                                       | No                               |
| The memory copy is                          | Out of date                       | Valid                             | Valid                                                                     | _                                |
| Copies exist in caches of other processors? | No                                | No                                | Maybe                                                                     | Maybe                            |
| A write to this line                        | Does not go to<br>the system bus. | Does not go to<br>the system bus. | Causes the<br>processor to gain<br>exclusive<br>ownership of the<br>line. | Goes directly to the system bus. |

## **MESI** Protocol

- Upon loading:
  - A line is marked "E"
  - Subsequent read OK
  - Write marks "M"
- If another processor reads an "M" line
  - Write it back
  - Mark it "S"
- Write to an "S", send "I" to all, mark "M"
- Read/write to an "I" misses
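The transitions above can be written as a small state machine for a single cache line. This is a simplified sketch. It assumes a write-allocate cache, so a write to an Invalid line fetches the line and ends in Modified, and it takes an `others_have_copy` hint to choose between Exclusive and Shared on a load. All names are hypothetical.

```c
typedef enum { MESI_I, MESI_S, MESI_E, MESI_M } mesi_state;

/* This processor reads the line. A miss on an Invalid line loads it. */
mesi_state mesi_local_read(mesi_state s, int others_have_copy)
{
    if (s == MESI_I)
        return others_have_copy ? MESI_S : MESI_E;
    return s; /* hits in M, E and S leave the state unchanged */
}

/* This processor writes the line. Writes to S or I need a bus
 * transaction to gain exclusive ownership; M and E do not. */
mesi_state mesi_local_write(mesi_state s, int *bus_transaction)
{
    *bus_transaction = (s == MESI_S || s == MESI_I);
    return MESI_M;
}

/* Another processor reads the line. A Modified line is written back first. */
mesi_state mesi_remote_read(mesi_state s, int *write_back)
{
    *write_back = (s == MESI_M);
    return s == MESI_I ? MESI_I : MESI_S;
}

/* Another processor writes the line, which invalidates our copy. */
mesi_state mesi_remote_write(mesi_state s, int *write_back)
{
    *write_back = (s == MESI_M);
    return MESI_I;
}
```

Tracing a line through load, write, remote read and a second write gives E, M, S and M, matching the bullet list.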

# Power and Thermal Management

# ACPI

- Open industry standard
- Provides methods for low-level hardware control
- Defines performance states that are used to facilitate system software's ability to manage processor power consumption
- Needs compatible hardware

# ACPI System State

| State      | Description                                                                               |
|------------|-------------------------------------------------------------------------------------------|
| G0/S0      | Full On                                                                                   |
| G1/S3-Cold | Suspend-to-RAM (STR). Context saved to memory (S3-Hot is not supported by the processor). |
| G1/S4      | Suspend-to-Disk (STD). All power lost (except wakeup on PCH).                             |
| G2/S5      | Soft off. All power lost (except wakeup on PCH). Total reboot.                            |
| G3         | Mechanical off. All power removed from system.                                            |

# Core C-State

| Core C-State | Global Clock | PLL | L1/L2 Cache    | Core VCC          | Context        |
|--------------|--------------|-----|----------------|-------------------|----------------|
| CC0          | Running      | On  | Coherent       | Active            | Maintained     |
| CC1          | Stopped      | On  | Coherent       | Active            | Maintained     |
| CC1E         | Stopped      | On  | Coherent       | Request LFM       | Maintained     |
| CC3          | Stopped      | On  | Flushed to LLC | Request Retention | Maintained     |
| CC6          | Stopped      | On  | Flushed to LLC | Power Gate        | Flushed to LLC |
| CC7          | Stopped      | Off | Flushed to LLC | Power Gate        | Flushed to LLC |

### Threads and Core C-State
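With Hyper-Threading, each of a core's two threads requests its own C-state. The sketch below assumes, as described in Intel's datasheets for this processor family, that the core resolves to the shallower of the two requests. The enum values are arbitrary ordering ranks chosen for this example, and the two helpers only restate the Core C-State table above.

```c
#include <stdbool.h>

/* Ranks ordered from shallowest to deepest; the numbers are not
 * hardware encodings. */
typedef enum { CC0 = 0, CC1 = 1, CC1E = 2, CC3 = 3, CC6 = 6, CC7 = 7 } core_cstate;

/* The core can only go as deep as its least idle thread allows. */
core_cstate resolve_core_cstate(core_cstate thread0, core_cstate thread1)
{
    return thread0 < thread1 ? thread0 : thread1;
}

/* Per the table: L1/L2 are flushed to the LLC from CC3 onward. */
bool l1_l2_flushed(core_cstate c) { return c >= CC3; }

/* Per the table: core context is flushed to the LLC from CC6 onward. */
bool context_flushed(core_cstate c) { return c >= CC6; }
```

So a core with one thread in CC6 and the other in CC1 stays in CC1, and its caches remain coherent.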



# Package C-State

| Package C-State        | Core States             | Limiting Factors | Retention and PLL-Off | LLC Fully Flushed | Notes<sup>1</sup> |
|------------------------|-------------------------|------------------|-----------------------|-------------------|-------------------|
| PC0 – Active           | CC0                     | N/A | No | No | 2 |
| PC2 – Snoopable Idle   | CC3-CC7                 | PCIe/PCH and Remote Socket Snoops<br/>PCIe/PCH and Remote Socket Accesses<br/>Interrupt response time requirement<br/>DMI Sidebands<br/>Configuration Constraints | VccMin<br/>Freq = MinFreq<br/>PLL = ON | No | 2 |
| PC3 – Light Retention  | At least one core in C3 | Core C-state<br/>Snoop Response Time<br/>Interrupt Response Time<br/>Non Snoop Response Time | | | 2,3,4 |
| PC6 – Deeper Retention | CC6-CC7                 | LLC ways open<br/>Snoop Response Time<br/>Non Snoop Response Time<br/>Interrupt Response Time | Vcc = retention<br/>PLL = OFF | No | 2,3,4 |

# Package C-State Entry/Exit



# State Combinations

| Global (G)<br>State | Sleep<br>(S) State | Processor<br>Core<br>(C) State | Processor<br>State | System<br>Clocks | Description     |
|---------------------|--------------------|--------------------------------|--------------------|------------------|-----------------|
| G0                  | S0                 | C0                             | Full On            | On               | Full On         |
| G0                  | S0                 | C1/C1E                         | Auto-Halt          | On               | Auto-Halt       |
| G0                  | S0                 | C3                             | Deep Sleep         | On               | Deep Sleep      |
| G0                  | S0                 | C6/C7                          | Deep Power<br>Down | On               | Deep Power Down |
| G1                  | S3                 | Power off                      | -                  | Off, except RTC  | Suspend to RAM  |
| G1                  | S4                 | Power off                      | -                  | Off, except RTC  | Suspend to Disk |
| G2                  | S5                 | Power off                      | -                  | Off, except RTC  | Soft Off        |
| G3                  | NA                 | Power off                      |                    | Power off        | Hard off        |

### Thermal Monitoring and Protection

- Catastrophic shutdown detector
- Automatic and adaptive thermal monitoring
- Software controlled clock modulation
- On-die digital thermal sensor and interrupt



# Intel's Technologies

- Advanced Smart Cache
- Smart Memory Access
- Turbo Boost 2.0
- Enhanced SpeedStep
- Hyper-Threading

#### Advanced Smart Cache Efficient Data Sharing

Figure: Advanced Smart Cache (one L2 cache shared by Core1 and Core2) compared with independent per-core L2 caches, each connected over the FSB to main memory. 2X L2 to L1 bandwidth.

### Intel Advanced Smart Cache

- Higher Cache Hit Rate
- Reduced BUS traffic
- Lower Latency to Data

# Intel Smart Memory Access

Goals:

- Improves system performance
- Hides latency of memory accesses

How:

- Memory Disambiguation
- IP-based prefetcher





#### Intel® Hyper-Threading Technology



#### What is it?

- Intel® Hyper-Threading Technology enables each processor core to run two tasks at the same time
- Two thread engines per core, enabling 4-way processing in dual core systems and 8-way processing in quad core systems
- Available with the new Intel<sup>®</sup> Core<sup>™</sup> family of processors

#### Benefits for consumers

- More threads and smart multitasking equals better performance
- Faster response time = less waiting






#### Features

- Duplicated for each logical processor
- Shared by logical processors in a physical processor
- Shared or duplicated, depending on the implementation







#### Intel<sup>®</sup> Turbo Boost Technology 2.0 Dynamically Delivering Optimal Performance



#### Intel<sup>®</sup> Turbo Boost Technology<sup>1</sup> 2.0



#### **Graphics Dynamic Frequency and Power Sharing**



Note1: Power Sharing shown here with Single Core Turbo is only for Illustrative purposes. Power Sharing can also occur when other cores are active as long as thermal headroom exists

Note2: Sandy Bridge is a monolithic die with Integrated graphics. Graphics Core shown above as separate from CPU Cores is only for illustrative purposes.

- Intel® HD Graphics with Dynamic Frequency delivers graphics performance boost to graphics intensive applications
- Power sharing algorithm works in concert with Intel<sup>®</sup> Turbo Boost Technology 2.0 to deliver performance when and where needed

Performance boost to graphics intensive applications when power and thermal headroom exist

#### **Next Generation Intel® Turbo Boost Technology**

| Client | Merom/Penryn (Mobile only) | Nehalem/Westmere: Clarksfield, Lynnfield/Clarkdale | Nehalem/Westmere: Arrandale | Sandy Bridge |
|--------|----------------------------|-----------------------------------------------------|-----------------------------|--------------|
| Key New Capabilities | 1 turbo bin when other core is asleep | Turbo controlled within power limit<br/>Multi-core turbo<br/>More turbo if cores are asleep | Graphics Dynamic Frequency<br/>Driver controlled power sharing between IA and Graphics (Mobile) | HW controlled power sharing between IA cores and Graphics<br/>Dynamic Turbo provides high responsiveness<br/>More Turbo headroom from improved power monitoring and control |
| Turbo Behavior (illustrative only; does not represent actual number of turbo bins) | | Quad core die: single core, dual core and quad core turbo | Dual core die: single core, dual core and graphics turbo | Dual/quad core die |

## TB 1.0 Vs TB 2.0

#### **Innovative Concept: Thermal Capacitance**



### Next Generation Intel<sup>®</sup> Turbo Boost Benefit



## Synthetic Test



### Multimedia Test



# Videogames Test



#### Processors @ Dinox PC

Percentage gain with Turbo ON (Core i5-2500K)



## Consumption



## Overclock wins!



## Intel Enhanced SpeedStep

- Advanced means of enabling very high performance while also meeting the power-conservation needs of mobile systems.
- Switches both voltage and frequency in tandem between high and low levels in response to processor load
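A rough way to see why voltage and frequency are switched together is the usual CMOS approximation that dynamic power scales with capacitance × voltage² × frequency. That formula is a textbook assumption, not something stated in these slides, and the ratios in the usage note are hypothetical.

```c
/* Dynamic power relative to the starting operating point, using the
 * approximation P ~ C * V^2 * f with constant capacitance.
 * v_ratio and f_ratio are new/old voltage and frequency. */
double relative_dynamic_power(double v_ratio, double f_ratio)
{
    return v_ratio * v_ratio * f_ratio;
}
```

For instance, halving the frequency alone gives `relative_dynamic_power(1.0, 0.5)`, or 50% of the original dynamic power. Dropping the voltage to 80% as well gives `relative_dynamic_power(0.8, 0.5)`, or about 32%.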

# Performance

# Test Setups

| Component          | Configuration                                                                                    |
|--------------------|--------------------------------------------------------------------------------------------------|
| Motherboard        | ASUS P8Z68-V Pro (Intel Z68)<br>ASUS Crosshair V Formula (AMD 990FX)<br>Intel DX79SI (Intel X79) |
| Hard Disk          | Intel X25-M SSD (80GB)<br>Crucial RealSSD C300                                                   |
| Memory             | 4 x 4GB G.Skill Ripjaws X DDR3-1600 9-9-9-20                                                     |
| Video Card         | ATI Radeon HD 5870 (Windows 7)                                                                   |
| Video Drivers      | AMD Catalyst 11.10 Beta (Windows 7)                                                              |
| Desktop Resolution | 1920 x 1200                                                                                      |
| OS                 | Windows 7 x64                                                                                    |

## Processor Comparison

| Processor Number    | i7-3960X                 | i7-2600K                | i7-990X                  |
|---------------------|--------------------------|-------------------------|--------------------------|
| # of Cores          | 6                        | 4                       | 6                        |
| # of Threads        | 12                       | 8                       | 12                       |
| Clock Speed         | 3.3 GHz                  | 3.4 GHz                 | 3.46 GHz                 |
| Max Turbo Frequency | 3.9 GHz                  | 3.8 GHz                 | 3.73 GHz                 |
| Cache               | 15 MB Intel® Smart Cache | 8 MB Intel® Smart Cache | 12 MB Intel® Smart Cache |

### Cache and Memory Bandwidth Performance

| Cache/Memory Latency Comparison  | L1 | L2 | L3 | Main Memory |
|----------------------------------|----|----|----|-------------|
| AMD FX-8150 (3.6GHz)             | 4  | 21 | 65 | 195         |
| AMD Phenom II X4 975 BE (3.6GHz) | 3  | 15 | 59 | 182         |
| AMD Phenom II X6 1100T (3.3GHz)  | 3  | 14 | 55 | 157         |
| Intel Core i5 2500K (3.3GHz)     | 4  | 11 | 25 | 148         |
| Intel Core i7 3960X (3.3GHz)     | 4  | 11 | 30 | 167         |

| Memory Bandwidth Comparison - Sandra 2012.01.18.10 | Intel Core i7 3960X (Quad Channel, DDR3-1600) | Intel Core i7 2600K (Dual Channel, DDR3-1600) | Intel Core i7 990X (Triple Channel, DDR3-1333) |
|-----------------------------------------------------|-----------------------------------------------|-----------------------------------------------|------------------------------------------------|
| Aggregate Memory Bandwidth                          | 37.0 GB/s                                     | 21.2 GB/s                                     | 19.9 GB/s                                      |
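To put the measured figures in context, the sketch below computes the theoretical peak for each memory configuration, assuming each DDR3 channel is 64 bits (8 bytes) wide and the module rating (1600, 1333) is the transfer rate in MT/s. The function names are illustrative.

```c
/* Theoretical peak bandwidth in GB/s (1 GB = 10^9 bytes).
 * Assumption: each DDR3 channel transfers 8 bytes per transfer. */
double ddr3_peak_gbs(int channels, int mega_transfers_per_s)
{
    return channels * (double)mega_transfers_per_s * 8.0 / 1000.0;
}

/* Fraction of the theoretical peak that a benchmark actually reached. */
double bandwidth_efficiency(double measured_gbs, double peak_gbs)
{
    return measured_gbs / peak_gbs;
}
```

With these assumptions the quad-channel 3960X peaks at 51.2 GB/s, so the measured 37.0 GB/s is about 72% of peak. The dual-channel 2600K reaches about 83% of its 25.6 GB/s peak, and the triple-channel 990X about 62% of roughly 32 GB/s.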

# Windows 7 Application Performance

#### Cinebench 11.5 - Single Threaded

Score in CBMarks - Higher is Better

| Processor                 | Score |
|---------------------------|-------|
| Intel Core i7 3960X       | 1.57  |
| Intel Core i7 2600K       | 1.52  |
| Intel Core i5 2500K       | 1.47  |
| Intel Core i7 990X        | 1.2   |
| AMD Phenom II X6 1100T BE | 1.11  |
| AMD FX-8150               | 1.02  |

#### **Cinebench 11.5 - Multi-Threaded**

Score in CBMarks - Higher is Better



#### 7-zip Benchmark 32MB Dictionary - Total MIPS - Higher is Better



#### AES-128 Performance - TrueCrypt 7.1 Benchmark Mean Encryption/Decryption AES Algorithm - GB/s



### x264 HD Benchmark - 1st pass - v3.03 Frames per Second - Higher is Better



### x264 HD Benchmark - 2nd pass - v3.03 Frames per Second - Higher is Better



### Adobe Photoshop CS4 - Retouch Artists Speed Test Time in Seconds - Lower is Better

| Processor                         | Time (s) |
|-----------------------------------|----------|
| Intel Core i7 3960X               | 11       |
| Intel Core i7 2600K               | 11.3     |
| Intel Core i7 990X                | 12.4     |
| Intel Core i7 980X                | 12.4     |
| Intel Core i5 2500K               | 12.6     |
| AMD FX-8150                       | 14.8     |
| AMD Phenom II X6 1100T BE         | 18.4     |
| Intel Core 2 Extreme QX9770       | 18.5     |
| Intel Pentium Extreme Edition 955 | 44       |

#### **Build Chromium Project - Visual Studio 2008**

**Compile Time in Minutes - Lower is Better** 

| Processor                        | Compile Time (min) |
|----------------------------------|--------------------|
| Intel Core i7 3960X (3.3GHz)     | 15                 |
| Intel Core i7 990X (3.46GHz)     | 15.3               |
| Intel Core i7 2600K (3.4GHz)     | 18.6               |
| AMD Phenom II X6 1100T (3.3GHz)  | 24                 |
| Intel Core i5 2500K (3.3GHz)     | 27.1               |
| Intel Core i5 2400 (3.1GHz)      | 27.5               |
| AMD FX-8150 (3.6GHz)             | 28.8               |
| AMD Phenom II X4 975 BE (3.6GHz) | 30.3               |

# Gaming Performance





#### DiRT 3 - Aspen Benchmark - 1920 x 1200 High Quality Average Frames per Second - Higher is Better



#### World of Warcraft FRAPS Runthrough - FPS - Higher is Better



# **Power Consumption**

### **Power Consumption - Idle** Total System Power Consumption in Watts (Lower is Better)

| Processor                        | Idle Power (W) |
|----------------------------------|----------------|
| Intel Core i5 2500K (3.3GHz)     | 76             |
| Intel Core i5 2400 (3.1GHz)      | 76.2           |
| Intel Core i7 2600K (3.4GHz)     | 77.6           |
| Intel Core i7 3960X              | 79.3           |
| AMD FX-8150 (3.6GHz)             | 84.8           |
| Intel Core i7 990X               | 97.4           |
| AMD Phenom II X6 1100T (3.3GHz)  | 109.4          |
| AMD Phenom II X4 975 BE (3.6GHz) | 110            |

#### Power Consumption - Load (x264 HD 3.03 2nd Pass) Total System Power Consumption in Watts (Lower is Better)



# Overclocked Performance and Consumption





#### Overclocked Power Consumption - Load (x264 HD 3.03 2nd Pass) Total System Power Consumption in Watts (Lower is Better)

| Processor                        | Load Power (W)       |
|----------------------------------|----------------------|
| Intel Core i5 2400 (3.1GHz)      | 131.6                |
| Intel Core i5 2500K (3.3GHz)     | 133.3                |
| Intel Core i7 2600K (3.4GHz)     | 155.4                |
| AMD Phenom II X4 975 BE (3.6GHz) | 183.8                |
| Intel Core i7 990X               | 200                  |
| AMD Phenom II X6 1100T (3.3GHz)  | 200                  |
| Intel Core i7 3960X              | 211                  |
| AMD FX-8150 (3.6GHz)             | 229                  |
| Intel Core i7 3960X @ 4.6GHz     | 320 (+52% vs. stock) |
| AMD FX-8150 @ 4.8GHz             | 406                  |

### **Final Words**

- No-compromise, ultra high-end desktop solution
- Possibly the world's fastest desktop CPU
- Lacks an on-die GPU
- Does not improve the gaming experience or speed up the majority of desktop applications

# Thank You