The Intel EM64T direct competitors
Higino Augusto Marinho Vieira da Cunha e Costa
Departamento de Informática, Universidade do Minho
[email protected]

Abstract. At the beginning of 64-bit computing on the x86 architecture there were two processor technologies compatible with IA32: AMD64 and EM64T. Both have almost identical instruction sets, but they differ internally. This paper presents the main differences between AMD64 and EM64T.

1 Introduction

When the first 32-bit processors of the x86 family were released, the 4 GB memory limit appeared to be more than enough for any requirement. As time passed, the constant growth of databases and other memory-hungry applications made the 4 GB per-process limit tight, and an alternative to the 32-bit architecture became necessary.

Intel made the first move by introducing the IA64 architecture with the Itanium. It had a completely new instruction set and broke compatibility with IA32. AMD took a different approach with the Opteron, making the new processor capable of executing 32-bit code natively by extending the existing IA32 instruction set to 64 bits. AMD called it x86-64 and later renamed it AMD64. It was the first time a company other than Intel made successful changes to the x86 architecture. Due to the success of AMD64, Intel developed EM64T and integrated this technology into the Xeon family of processors.

The advantages of the 64-bit architecture are not limited to the amount of memory a process can address. Applications that use integers of 64 or more bits require fewer clock cycles to complete the same operation. Typical examples are scientific calculation and cryptographic algorithms.

Even running 32-bit applications on a 64-bit operating system has benefits. In March 2004, Microsoft moved all its web servers to Opterons running a prerelease of Windows 2003 x64 Edition, initially keeping the applications unmodified [5].
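The earlier point about integers of 64 or more bits can be illustrated with a small sketch. It is not taken from the paper: it assumes a machine whose widest multiply takes two 32-bit operands, and it counts multiply operations rather than real clock cycles. The function names and the counting scheme are hypothetical.

```python
MASK32 = 0xFFFFFFFF
MASK64 = 0xFFFFFFFFFFFFFFFF


def mul64_with_32bit_limbs(a, b):
    """Low 64 bits of a*b built only from 32x32 multiplies.

    Returns (result, number_of_multiplies). This mimics what a 32-bit
    processor has to do; it is an illustration, not a cycle-accurate model.
    """
    a_lo, a_hi = a & MASK32, (a >> 32) & MASK32
    b_lo, b_hi = b & MASK32, (b >> 32) & MASK32
    # Three partial products reach the low 64 bits. a_hi * b_hi only
    # affects bits 64 and above, so it is not needed here.
    low = a_lo * b_lo
    cross = (a_lo * b_hi + a_hi * b_lo) & MASK32
    result = (low + (cross << 32)) & MASK64
    return result, 3


def mul64_native(a, b):
    """The same product when the hardware has a 64-bit multiply."""
    return (a * b) & MASK64, 1


if __name__ == "__main__":
    a, b = 0x123456789ABCDEF0, 0x0FEDCBA987654321
    print(mul64_with_32bit_limbs(a, b))
    print(mul64_native(a, b))
```

Both functions return the same product; the difference is that the 32-bit version needs three multiplies plus extra additions and shifts where the 64-bit version needs one multiply.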
With 32 bits, the operating system and the applications had to share 4 GB of memory. With 64 bits, this operating system can address up to 8 terabytes of memory for the kernel and 8 terabytes of memory for user processes. This way, each application can use 4 GB of memory without interfering with the kernel memory.

2 64-Bit Processors Compatible with the IA32 Instruction Set

Currently there are two major 64-bit processor technologies able to run 32-bit code natively. The first to be released was AMD64, as an answer to Intel's initial approach to 64-bit processors. The main concern of AMD was to make the processor compatible with the existing 32-bit architecture [2]. After the initial success of AMD64, Intel decided to follow the AMD approach, implementing the AMD64 instruction set and calling it EM64T (Extended Memory 64-bit Technology), later renamed Intel 64.

The instruction sets of both processors are almost identical, and in most cases each is able to run the other's code. Compilers avoid the few instructions that differ, so the same binary code can run on both. The name x86-64 was given by AMD, but is currently used as a vendor-neutral way to describe this family of processors.

2.1 AMD64

As explained before, since the new instruction set of the Itanium was not compatible with previous processors of the x86 family, AMD decided to build its own 64-bit processor compatible with IA32.

The AMD64 is a 64-bit processor: it has 64-bit general purpose registers and the corresponding logical and arithmetic operations. Memory pointers are also 64 bits wide. In addition, the number of general purpose registers was extended from eight to sixteen, all of them 64 bits wide. There are also eight new SSE registers (128 bits wide) (pages 2 to 7 of [7]).

The No-Execute bit, already available in systems that use PAE (Physical Address Extension), was implemented. This mechanism defines whether a block of memory can contain code or only data (page 143 of [8]).
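As a rough illustration of the No-Execute mechanism, the sketch below treats a page table entry as a 64-bit integer in which bit 63 is the no-execute flag and bit 0 is the present flag. The helper names are hypothetical, every other field of the entry is ignored, and a real system also has to enable the feature in the processor before the bit takes effect.

```python
NX_BIT = 1 << 63       # no-execute flag: top bit of a 64-bit page table entry
PRESENT_BIT = 1 << 0   # set when the page is mapped


def is_executable(pte):
    """True if the page is mapped and code may be fetched from it."""
    return bool(pte & PRESENT_BIT) and not (pte & NX_BIT)


def mark_data_only(pte):
    """Return a copy of the entry with the no-execute flag set."""
    return pte | NX_BIT


if __name__ == "__main__":
    code_page = 0x1000 | PRESENT_BIT         # hypothetical entry for a code page
    data_page = mark_data_only(code_page)    # same frame, but data only
    print(is_executable(code_page), is_executable(data_page))
```

An operating system would typically mark stack and heap pages this way, so that data written there cannot be executed as instructions.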
The processor determines whether it is working in 64 bits or 32 bits from its operating mode [Table 1].

In legacy mode, it acts like any other 32-bit processor of the x86 family. There are no significant performance gains in this mode compared to an equivalent 32-bit processor.

In compatibility mode, a sub-mode of long mode, the processor requires a 64-bit operating system but is able to execute 32-bit code. All the registers are seen by the application as 32-bit registers [Fig. 1]. The major improvement in this mode is that each application can have 4 GB of memory.

In 64-bit mode, the other sub-mode of long mode, the processor works as a true 64-bit processor. Both the operating system and the application have to be 64-bit. Each application has access to 1 TB of memory and to the registers and instructions added by this architecture.

Table 1. AMD64 Operating Modes (source: page 3 of [7])

Fig. 1. AMD64 Register Set (source: page 2 of [7])

3 Differences between AMD64 and EM64T Processors

Comparing the instructions of AMD64 and EM64T, it can be seen that Intel reverse engineered the AMD64 instruction set [6]. EM64T was based on AMD's prerelease documentation: two instructions that EM64T did not initially implement were also absent from the early AMD documentation. As the Intel processor was based on the AMD64 instruction set and not on the hardware itself, the way the two processors decode and execute those instructions is different.

3.1 Pipelines

The pipeline (chapter 3 of [1]) allows a processor to split instructions into smaller micro-operations and overlap their execution, so the processor can work on more than one micro-operation at the same time (pages 9-10 of [3]). The way the pipeline is designed directly affects the processor clock frequency. The longer the pipeline, the less work each stage does and the higher the clock frequency can be, but each instruction then needs more clock cycles to complete. The problems with long pipelines are cache misses and branch mispredictions.
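The trade-off between pipeline depth and clock frequency described above can be sketched with a deliberately simple model. It assumes a single-issue pipeline that completes one instruction per cycle once full, and it assumes that each mispredicted branch costs a number of cycles equal to the pipeline depth. The depths, clock frequencies, and instruction counts used here are illustrative values, not measurements.

```python
def cycles(instructions, depth, mispredictions):
    """Idealized cycle count for a single-issue pipeline.

    Assumption: one instruction completes per cycle once the pipeline is
    full, and every mispredicted branch costs `depth` cycles to flush
    and refill the pipeline.
    """
    fill = depth - 1
    return instructions + fill + mispredictions * depth


def run_time_ns(instructions, depth, mispredictions, clock_ghz):
    """Run time in nanoseconds: cycles divided by cycles per nanosecond."""
    return cycles(instructions, depth, mispredictions) / clock_ghz


if __name__ == "__main__":
    # Hypothetical designs: a short pipeline at 2.4 GHz and a long one at 3.6 GHz.
    for mispredictions in (0, 50):
        short = run_time_ns(1000, 12, mispredictions, 2.4)
        long_ = run_time_ns(1000, 31, mispredictions, 3.6)
        print(mispredictions, round(short, 1), round(long_, 1))
```

Under these assumptions the longer, faster-clocked pipeline finishes first when there are no mispredictions, and the shorter pipeline finishes first once mispredictions become frequent.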
A cache miss occurs when a micro-operation needs data that is not in the cache. The processor has to wait until the data arrives from slower memory, and every micro-operation that depends on the result waits with it. A branch misprediction occurs when the processor guesses the outcome of a branch incorrectly and starts executing the wrong instructions. The pipeline then has to be flushed and restarted from the correct path, and the cost of the flush grows with the number of stages.

The EM64T pipeline is longer than the AMD64 one (31 stages versus 12) (pages 11-12 of [3]). Because the AMD pipeline has fewer stages, it is less affected by cache misses and branch mispredictions. A shorter pipeline also allows a less complex prediction algorithm. Additionally, the AMD64 has more execution and decode units than the EM64T, which allows the Opteron to start more micro-operations per clock cycle. These differences make the Opteron pipeline more efficient on structured code with many branches, and the Xeon pipeline more efficient on linear code (page 13 of [3]).

3.2 Memory controller

AMD and Intel made different choices for the memory controller (pages 15-20 of [3]). In AMD processors the memory controller is inside the processor, while in Intel systems it is in the chipset. This gives AMD lower memory latencies, as the memory is attached directly to the processor. The Intel processor has to send each request outside the processor and through the external memory controller. Typically the latency of the Opteron is 10 to 40 percent lower. In terms of bandwidth between memory controller and processor, the Opteron has a maximum transfer rate of 8 GB/s while the Xeon has 6.4 GB/s.

3.3 Power consumption

The operating cost of a server depends on its power consumption. In datacenters, cooling is an important issue to consider: as power consumption rises, better cooling solutions are needed. The operating cost of a server is therefore not just the power it consumes, but also the power needed to keep the room temperature low.
The power consumption of the Xeon is around 130 W, while that of the Opteron is about 95 W (pages 20-21 of [3]).

4 Benchmarks

4.1 Databases

The benchmarks in Tables 2 and 3 count the number of queries per second that a MySQL database can handle on the Xeon and on the Opteron. As databases rely on structured programming, the Opteron performed better than the Xeon.

Table 2. MySQL 4.0.18 using MyISAM, queries per second (source: http://www.anandtech.com/IT/showdoc.aspx?i=2447&p=4)

Concurrency   Single Xeon (Nocona)   Single Opteron 250   Single Opteron 252
              with HT                2.4 GHz              2.6 GHz
1             277                    298                  319
2             338                    370                  399
5             358                    435                  470
10            375                    465                  502
20            371                    455                  498
35            371                    470                  507
50            368                    472                  508
AVG           368                    460                  497
MAX           375                    472                  508

Table 3. MySQL 4.0.18 using InnoDB, queries per second (source: http://www.anandtech.com/IT/showdoc.aspx?i=2447&p=5)

Concurrency   Single Xeon (Irwindale)   Single Opteron 248
              3.6 GHz with HT           Dual Channel
1             191                       192
2             201                       223
5             219                       259
10            204                       242
20            199                       236
35            193                       221
50            181                       209
AVG           199                       233
MAX           219                       259

5 Conclusion

In the early stages of 64-bit computing on the x86 architecture there is no absolute winner. While the Opteron seems to be better in most situations, the longer pipeline of the Xeon makes it a good candidate for scientific calculation.

References

1. D. Patterson, J. Hennessy, "Computer Architecture: A Quantitative Approach", 3rd Ed., Morgan Kaufmann Publishers, 2002
2. The AMD64 Computing Platform: Your Link to the Future of Computing. http://www.amd.com/us-en/assets/content_type/white_papers_and_tech_docs/30172C.pdf
3. Characterizing x86 processors for industry-standard servers: AMD Opteron and Intel Xeon. http://h20000.www2.hp.com/bc/docs/support/SupportManual/c00238028/c00238028.pdf
4. Intel® Extended Memory 64 Technology (EM64T). http://www.dell.com/downloads/global/vectors/2004_em64t.pdf
5. Microsoft.com Moves to x64 Version of Windows. http://www.microsoft.com/technet/itshowcase/content/mscom64bitarchi.mspx
6. AMD and Intel Harmonize on 64. http://www.mdronline.com/watch/watch_abstract.asp?Volname=Issue%20%23118&SID=1137&on=T&SourceID=00000377000000000000
7. AMD64 Architecture Programmer's Manual - Volume 1: Application Programming. http://www.amd.com/us-en/assets/content_type/DownloadableAssets/dwamd_24592.pdf
8. AMD64 Architecture Programmer's Manual - Volume 2: System Programming. http://www.amd.com/us-en/assets/content_type/white_papers_and_tech_docs/24593.pdf
9. Tuning IBM eServer xSeries Servers for Performance. http://www.redbooks.ibm.com/redbooks/pdfs/sg245287.pdf