
Progress in Chip Design

The year 1992, as we saw on the previous page, was notable for the introduction of Windows 3.1, which proved to be very successful, transitioning the IBM PC platform from being one with a command-line interface to one that, like its competitor the Macintosh, had a Graphical User Interface (GUI).

That year is also notable for the introduction, by the Digital Equipment Corporation, of the Alpha AXP 21064 microprocessor. This microprocessor, like several other RISC processors, avoided the use of condition code bits. It was one of the earliest chips to have a 64-bit architecture. Because it was a very high-performance design, representing the peak of what was possible to put on a single chip at the time, it was also quite expensive, and that limited its popularity in the marketplace, but it is remembered for the many innovations which it embodied.

That is not to say that nothing happened between 1984 and 1992. July 1985 marked the availability of the Atari ST computer, and in early 1986 one could get one's hands on an Amiga 1000. So there were GUI alternatives cheaper than a Macintosh before 1992, but Windows 3.1 was one that just involved buying software for the computer you already had, the computer that was the standard everyone else was using for serious business purposes.

In 1989, the Intel 80486 chip came out; unlike previous chips, it included full floating-point hardware as a standard feature right on the chip, although later the pin-compatible 80486SX was offered at a lower price without floating-point.

In February, 1990, IBM released its RS/6000 line of computers. They were based on the POWER architecture, which later gave rise to the PowerPC. This RISC architecture had multiple sets of condition codes to allow the instructions that set conditions to be separated from conditional branch instructions, reducing the need for branch prediction.

The high-end machines in the RS/6000 line used a ten-chip processor, the RIOS-1, notable for being the first microprocessor to use register renaming and out-of-order execution. This technique was invented by IBM for the IBM System/360 Model 91 computer. That computer had several immediate descendants, the 95, the 360/195 and 370/195, but after them, IBM did not make use of this groundbreaking technique in its computers for a while. This has been perceived by some as an inexcusable oversight on their part, but given that this technique is only applicable to computer systems of a size and complexity that, until recently, were associated only with the very largest computers, it should be more appropriately viewed as a natural consequence of IBM making the computers that were relevant to its customers within its core business.

And IBM did make use of out-of-order execution when appropriate, and often before others.

And then out-of-order execution was again used, first in the RIOS-1, as noted, in 1990, and then in IBM mainframes in the IBM ES/9000 Model 520, from 1992. It was again used with the G3 CMOS processor in the 9672 processor in the ES/9000 family in 1996.

The RIOS-1, like the 360/91, only used OoO for its floating-point unit; the same was true of the Pentium Pro and Pentium II processors, which introduced this technique to the world of personal computers.

Out-of-order execution was first used in CMOS single-chip processors implementing the zArchitecture with the z196 from 2010.

In 1993, Intel offered the first Pentium chips, available in two versions, the full 66 MHz version, and a less expensive 60 MHz version. These chips were criticized for dissipating a considerable amount of heat, and there was the unfortunate issue of the floating-point division bug. These chips were pipelined, but in-order, in both their integer and floating-point units.

In 1994, the last chip in the Motorola 680x0 series was introduced, the 68060. Its integer unit, but not its floating-point unit, was pipelined (with an in-order pipeline). This processor was never used in any Apple Macintosh computers (or, more correctly, no Apple Macintosh computers were manufactured with that chip; there were third-party upgrade boards that let you replace the existing 68040 processor in one with a 68060 and some interface circuitry), as Apple began selling Macintosh computers using the PowerPC chip instead in March 1994.

As a result, the 68060 never made much of a splash in the market, although there were a few high-end Amiga computers, such as the Amiga 4000T, that used it. There was even a motherboard, the Q60, made by a German company, that could fit in a PC case and which allowed one to run the operating system from the Sinclair QL computer with the 68060 chip.

The Intel Pentium Pro chip was announced on November 1, 1995. This design was optimized for 32-bit software, and was criticized for its performance with the 16-bit software that most people were still using. The later Pentium II resolved that issue, and was otherwise largely the same design, although of course with improvements. However, unlike the case with the Pentium Pro, the cache ran at a slower speed than the processor. Further improvements appeared in the Pentium III. A hardware random-number generator was included in the support chipset for that processor, as part of a feature which included a serial number on the chip that software could read. Although that feature could be disabled, it was controversial; the intent was to facilitate the distribution of software that had to be well-protected against piracy or misuse (i.e., software to display protected content under controlled circumstances).

The Pentium Pro was available in different versions; a 150 MHz version used a processor built on a 500 nm process (or 0.5 micron), while a 200 MHz version used a processor built on a 350 nm process.

The AMD K5 microprocessor was also out-of-order, and was introduced in 1996, not long after. It initially had bugs in branch prediction and caching, but these were later resolved. Even after the bugs were corrected, it was never very successful, unlike the K6, its successor. The K5 was largely based on the Am29050, a version of the AMD Am29000 RISC microprocessor. As this high-performance processor was used in avionics, when AMD decided to cease development of the 29000 series, it sold the design to Honeywell, which still makes products in the 29KII family to this day. 29000 microprocessors were also used in some laser printers.

The 29050 was the only out-of-order member of the 29000 family; this chip was introduced in 1990, and so it predated the Pentium Pro. However, the Intel 80960CA, from its i960 lineup, was introduced in 1989, and was also superscalar, but unlike the 29050, it required a coprocessor for hardware floating-point.

The AMD K6, introduced in 1997, was based on the NexGen Nx686, and thus resulted from AMD's purchase of that company. Unlike the K5, it was a success, and thus allowed AMD to provide serious competition to Intel.

The Pentium 4 chip, introduced on November 20, 2000, was a completely new design. It had fewer gate delays per pipeline stage. This meant that the chip's cycle frequency was faster, but instructions took more cycles to execute. At the time, this sounded to people like it was a marketing gimmick instead of an improvement. In fact, though, it was a real improvement, because the Pentium 4 was a pipelined chip with out-of-order execution, intended to issue new instructions in every single cycle, rather than waiting to perform one instruction after another: therefore, a higher cycle frequency did mean that more instructions were being performed in a given time, as long as the pipeline could be kept just as full.
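
To make the reasoning above concrete, here is a minimal sketch of a toy pipeline model in Python. It is not a model of the Pentium 4 or of any real chip: the gate delay, latch overhead, and total logic depth are invented numbers, and the function names are hypothetical. It only illustrates that splitting the same logic into more stages raises the latency of one instruction in cycles while shortening the total run time, provided that the pipeline stays full.

```python
def cycle_time_ps(total_gate_delays, stages, gate_delay_ps=20.0, latch_overhead_ps=40.0):
    """Cycle time when a fixed amount of logic is split evenly into `stages` stages.

    All figures are illustrative assumptions, not measurements of any real chip.
    """
    gate_delays_per_stage = total_gate_delays / stages
    return gate_delays_per_stage * gate_delay_ps + latch_overhead_ps


def run_time_ns(instructions, total_gate_delays, stages, issue_rate=1.0):
    """Time to finish `instructions` when `issue_rate` instructions enter per cycle.

    The first instruction needs `stages` cycles to pass through the pipeline;
    each later one completes 1 / issue_rate cycles after its predecessor.
    """
    cycles = stages + (instructions - 1) / issue_rate
    return cycles * cycle_time_ps(total_gate_delays, stages) / 1000.0


if __name__ == "__main__":
    for stages in (10, 20):
        print(stages, "stages:",
              cycle_time_ps(200, stages), "ps per cycle,",
              round(run_time_ns(1000, 200, stages), 2), "ns for 1000 instructions")
    # The deeper pipeline again, but kept only half full on average.
    print("20 stages, half full:",
          round(run_time_ns(1000, 200, 20, issue_rate=0.5), 2), "ns")
```

With these made-up figures, the 20-stage pipeline has a shorter cycle and finishes 1000 instructions sooner than the 10-stage one, even though each instruction takes more cycles to pass through it; when the deeper pipeline is only half full, that advantage disappears.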

But initially it required the use of a new and more expensive form of memory, which did not help its success in the market.

Intel's subsequent chips in the Core microarchitecture went back to many aspects of the Pentium III design for an important technical reason: shorter pipeline stages meant that the transistors on the chip were doing something a greater fraction of the time. This would produce more heat, and the characteristics of newer, smaller integrated circuit processes were not proving as favorable as hoped (this is known as the demise of Dennard scaling), and so a similar design on a newer process would have dissipated more heat than it would be practical to remove.

Some sources give a date of 2006-2007 for when Dennard scaling came to an end; looking at a graph of how clock speeds improved in processors, it seemed to me as if the transition from rapid progress to a much slower pace that shortly came almost to a stop took place in 2003. A closer look at what was going on at the time, though, shows that the high clock rates seen in 2003 were due to the design characteristics of the Pentium 4 processor of that period.

And, thus, other means of increasing performance were needed, and we entered the era of dual-core and quad-core microprocessors.

IBM came out with a dual-core PowerPC processor in the POWER4 series in 2001.

It was not until May 2005, though, that Intel released the Pentium D and the Pentium Extreme Edition 840. AMD released the Athlon 64 X2 on May 31, 2005, although a dual-core Opteron had already been released on April 21.

The first Core 2 Quad processor was the Extreme Edition QX6700, introduced in November 2006; it was followed by one "for the rest of us" in January 2007, the Q6600.

The Q6600 was a 65nm chip, with a clock frequency of 2.4 GHz, making it not too different from contemporary chips in raw speed, even though today's chips are on considerably smaller process nodes.

These early quad-core chips were multi-chip modules with two dual-core dies in one package; in March 2008, AMD released a monolithic quad-core Phenom processor. However, Intel had already offered Hyper-Threading on some Pentium 4 chips (the Core 2 Quad itself did not include it), whereas AMD did not introduce SMT to its line-up of processors until much later with Ryzen. And the original Threadripper 1950X from AMD used two eight-core dies to achieve its sixteen cores, while Intel's 18-core i9-7980XE was monolithic.

For comparison, in May, 2003, Pentium 4 chips with clock frequencies of up to 2.8 GHz on a 130 nm process were released.

In both cases, shortly after, more expensive versions with even higher speeds were released; these are merely the top-speed chips considered to be part of the "mainstream". Since the Pentium 4 achieved a high clock rate by using unusually short pipeline stages, however, the fact that similar clock speeds were subsequently achieved by the Core 2 design (which was viewed at the time as a return to an internal design, or microarchitecture, similar to that of the Pentium III) would imply that the move from 130 nm to 65 nm did make the logic gates faster, making it correct to view 65 nm as the point at which Dennard scaling came to an end.

We don't have to guess, however. Pentium III chips were also produced on a 130 nm process, and they went up to 1.4 GHz in speed, half the clock frequency of Pentium 4 chips made on the same process.

The Core 2 Duo E6850, manufactured on a 65nm process, offered a clock speed of 3 GHz, but the Core 2 Quad Q6700 offered 2.67 GHz.

At 45nm, one could go up to 3.33 GHz, although the Core 2 Quad Q9650 went up to 3 GHz, so, again, quad-core processors were limited to lower speeds than dual-core processors on the same process. At 32nm, one could go up to 3.4 GHz, and at 22nm up to 3.5 GHz, so progress in clock speed was very gradual after that point.

Outside the Processor

Some dates to provide a frame of reference for the progress in bus connectors and memory modules might also be in order.

In the original IBM PC, in 1981, memory was either added in sockets on the motherboard, or by means of memory cards that plugged into the standard peripheral bus, which was also used for video cards. The IBM Personal Computer/AT from 1984 included a revised bus with an extended connector that was upwards compatible, adding support for a larger address bus, and for a 16-bit data bus instead of an 8-bit data bus.

After the IBM Personal System/2 introduced the Micro Channel bus in 1987, a competing standard, EISA, was put forward by other computer manufacturers to provide similar features but with upwards compatibility with older peripherals.

Then the PCI bus was introduced by Intel in 1992, originally in a 32-bit version.

By 1986, 30-contact single in-line memory modules (SIMMs) were used with AT-compatible computers and others. The 72-pin SIMM was adopted when newer processors encouraged a move to a 32-bit memory bus from a 16-bit one, and the 168-pin DIMM, similarly, replaced matched pairs of 72-pin SIMMs as a 64-bit memory bus was needed for the Pentium. Newer generations of memory have resulted in changes to the DIMM design, adding more contacts.
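
The relationship between bus width and the number of modules that must be installed together can be sketched in a few lines of Python. The module data widths below (8 bits for a 30-pin SIMM, 32 bits for a 72-pin SIMM, 64 bits for a 168-pin DIMM, ignoring parity bits) are the commonly cited figures rather than something stated above, and the names used are hypothetical.

```python
# Data width in bits of each module type, parity bits ignored.
# These widths are commonly cited figures, assumed here for illustration.
MODULE_DATA_BITS = {
    "30-pin SIMM": 8,
    "72-pin SIMM": 32,
    "168-pin DIMM": 64,
}


def modules_per_bank(bus_bits, module):
    """Number of identical modules needed to fill one bank of a memory bus."""
    width = MODULE_DATA_BITS[module]
    return -(-bus_bits // width)  # ceiling division


if __name__ == "__main__":
    for bus_bits, module in [(16, "30-pin SIMM"), (32, "30-pin SIMM"),
                             (32, "72-pin SIMM"), (64, "72-pin SIMM"),
                             (64, "168-pin DIMM")]:
        print(f"{bus_bits}-bit bus, {module}: "
              f"{modules_per_bank(bus_bits, module)} per bank")
```

Under these assumptions, a 64-bit memory bus needs two 72-pin SIMMs per bank but only one 168-pin DIMM, which agrees with the matched pairs mentioned above.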

High-performance graphics cards began to move to the Accelerated Graphics Port (AGP) soon after Intel announced the spec in 1997, and then they, along with other peripherals, moved to PCI-Express (PCIe) after it was introduced in 2003.

The Transition to 64 Bits

As noted above, one of the first 64-bit microprocessors was the DEC Alpha, first released in 1992. The MIPS R4000 was announced on October 1st, 1991, and has been referred to as the first 64-bit microprocessor; this chip, and its derivatives, were used in some SGI workstations.

The Itanium was announced by Intel on October 4th, 1999. The first Itanium chips were released in June 2001. This was intended to be the architecture to be used by Intel customers who needed a 64-bit address space. The Pentium Pro, from November 1, 1995, introduced Physical Address Extensions, so that x86-based systems could have large memories even if individual programs could only make use of a virtual memory no more than 2 gigabytes in size.
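
The sizes involved can be checked with a little arithmetic. In this sketch, the 36-bit physical address width is the figure commonly cited for Physical Address Extensions on the Pentium Pro, not something stated above, and the helper name is hypothetical.

```python
def address_space_gib(address_bits):
    """Size in GiB of a byte-addressed memory reachable with `address_bits` bits."""
    return 2 ** address_bits / 2 ** 30


if __name__ == "__main__":
    print("31 bits:", address_space_gib(31), "GiB")  # the 2 gigabyte per-program limit
    print("32 bits:", address_space_gib(32), "GiB")
    print("36 bits:", address_space_gib(36), "GiB")  # assumed PAE physical address width
    print("64 bits:", address_space_gib(64), "GiB")
```

The gap between the 31-bit and 36-bit figures is the sense in which a system could hold much more memory than any single program could address.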

AMD responded by announcing, in 1999, their plan to provide a 64-bit extension to the x86 architecture. Their first 64-bit Opteron chips were released in April, 2003. Intel then accepted that 64-bit virtual addresses were needed by their x86 customers, and adopted the AMD scheme under their own name of EM64T (with a few minor changes; nearly all programs use the subset of features common to both manufacturers), releasing chips which used it starting from 2005.

In the meantime, IBM delivered its first mainframes with the 64-bit zArchitecture, modified from the 32-bit architecture of previous mainframes derived from the IBM System/360, in 2000. The z990, the top-end machine in the first generation of System/z, is shown at right.

The z990 was built from multi-chip modules. In these modules, several dies contained one CPU each; however, they contained most of the components for two cores, because each instruction was executed in parallel twice, side by side on the halves of the chip, so that the results could be compared in order to provide an extremely low probability of errors. Going to such lengths, of course, would be considered wasteful by home computer users.
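
The idea of executing each instruction twice and comparing the results can be sketched as follows. This is only an illustration in Python with hypothetical names; it says nothing about how the z990 is actually implemented, and in particular what the real hardware does after a mismatch is not covered above, so the sketch simply raises an exception.

```python
import operator


class LockstepMismatch(Exception):
    """Raised when the two execution units disagree."""


def execute_checked(operation, a, b, unit_one, unit_two):
    """Run the same operation on two units; accept the result only if they agree."""
    first = unit_one(operation, a, b)
    second = unit_two(operation, a, b)
    if first != second:
        raise LockstepMismatch(f"units disagree: {first!r} != {second!r}")
    return first


def healthy_unit(operation, a, b):
    """An execution unit that works correctly."""
    return operation(a, b)


def faulty_unit(operation, a, b):
    """Simulates a fault by flipping the lowest bit of the result."""
    return operation(a, b) ^ 1


if __name__ == "__main__":
    print(execute_checked(operator.add, 2, 3, healthy_unit, healthy_unit))
    try:
        execute_checked(operator.add, 2, 3, healthy_unit, faulty_unit)
    except LockstepMismatch as error:
        print("detected:", error)
```

The cost is visible in the sketch itself: every operation is performed twice, which is why such a design needs most of the components of two cores for each CPU.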

The current generation of zSystem is the IBM z16, introduced on April 5, 2022. It is the ninth generation of zSystem mainframes; the CPU chip used in it has a clock frequency of 5 GHz, and has eight cores on one die.


There were still a number of RAS features (Intel used that acronym to stand for Reliability, Availability, and Security, whereas originally IBM used it to mean Reliability, Availability, and Serviceability; of course, unlike an IBM 360 mainframe, one can't open up a microchip to swap out circuit boards) provided on Itanium processors that were not available even on the high-end commercial server Xeon chips with the x86 architecture.

This changed much later, in 2014, with the introduction of the Xeon E7 v2 line of processors.


History is Still Being Made: the AMD Resurgence

October 12, 2011 was the day when the FX-4100, FX-6100, FX-8120 and FX-8150 processors from AMD were released. These were desktop processors with four, six, and (in the last two cases) eight cores respectively, based on the new Bulldozer microarchitecture.

These chips were made on a 32nm process. A pair of cores shared a single 256-bit SIMD unit, which limited the design's power for programs that made use of AVX-256 instructions.

The base clock frequency of the FX-4100 was 3.6 GHz. However, the Bulldozer design was based on individual pipeline stages with a small number of gate delays, like that of the Pentium 4 from Intel.

It should be noted, though, that this is also true of the CPU chips inside current IBM mainframes. This type of design is workable in those IBM mainframes, although not for conventional desktop microcomputers, because IBM uses advanced water cooling in those mainframes. Shorter pipeline stages are not a bad thing; they provide more throughput, since more instructions can be executing concurrently within the pipeline. This requires that the pipeline be fed with instructions that don't each depend on the result of the immediately preceding instruction, which can be achieved by means of careful programming (facilitated by certain characteristics of RISC architectures, like their large banks of registers), through out-of-order execution, or through simultaneous multi-threading (SMT).

This was the beginning of a difficult era for AMD. Chips made with the Bulldozer microarchitecture, and its successors, Piledriver, Steamroller, and Excavator, were perceived as having very disappointing performance. This led to AMD competing primarily in the lower end of the market, and having to price their chips based on the performance they achieved, which was less than expected.

On March 2, 2017, the first Ryzen chips, the Ryzen 7 1800X, the Ryzen 7 1700X, and the Ryzen 7 1700, were available from AMD. These chips were based on an all-new Zen microarchitecture which corrected the mis-steps of the Bulldozer microarchitecture.

These chips were the first ones from AMD to include SMT (simultaneous multithreading), a feature Intel had offered for some time under the brand name HyperThreading.

Although the Zen microarchitecture was a big improvement over Bulldozer and Piledriver and the rest, the performance of an individual core was not equal to that of a single core on an Intel processor.

But the Ryzen processors from AMD were still very impressive, because they had eight cores, while Intel processors had four.

Intel did also make server processors with higher core counts, and indeed during the Bulldozer years, AMD also made Opterons with 12 and 16 cores. (The 16 core one was a Piledriver, however.) Those chips, being intended for business, sold at premium prices. So software intended for the consumer, particularly computer games, generally wasn't designed to make effective use of a larger number of cores.

Thus, Intel's competitive response to the introduction of the first Ryzen chips was to come out with a six-core chip, the i7-8700K, in October of that year. Because its individual cores were more powerful, it matched the eight-core Ryzen chips in total throughput, and it performed significantly better on games that could only make use of a limited number of cores.

Also, while AMD placed a 256-bit vector unit in each core with Ryzen, rather than sharing them between cores as in Bulldozer and its related successors, Intel had increased the amount of vector processing power in its cores, leaving AMD still behind.

The next generation of AMD Ryzen chips was announced on August 13, 2018; these included significant improvements over the previous generation, but Intel had also been improving its processors.

The third generation of AMD Ryzen chips was announced on July 7, 2019. At this point, AMD had achieved near-parity with Intel. AMD had spun off its physical chipmaking functions to a separate company, GlobalFoundries, in October, 2009. Because of the high cost of building facilities to make chips at more and more advanced process nodes, GlobalFoundries eventually declined to pursue the next major node after 14nm, although they have gone somewhat beyond that later with a 12nm process.

As a result, AMD was having the CPU portion of its Ryzen multi-chip module processor chips made by TSMC on their 7nm process.

Intel's 10nm process was basically equivalent to the process TSMC called 7nm, but Intel was having trouble getting it to work. Those troubles would take longer than expected to overcome.

So with the third Ryzen generation, there was no real reason to hesitate in getting an AMD processor, if one wanted the best.

On November 5, 2020, AMD announced their fourth generation of chips. While the previous generation was close enough to Intel's chips in per-core performance that the difference was not significant, with this generation AMD could now claim leadership.

Meanwhile, Intel was finally able to produce chips in volume on its 10nm process. However, not all the problems with that process had been eliminated; they were only making laptop chips on the 10nm process, because they could not attain clock rates that would be competitive on the desktop with that process as it was.

Thus, on March 30, 2021, Intel released the i9-11900K and the i7-11700K, among other chips, based on the 14nm process. They were based on designs originally intended for 10nm chips.

Many reviews of these chips criticized them as having performance that was not much better, and perhaps even slightly worse, than their predecessors in Intel's previous generation.

However, these chips included support for AVX-512 instructions; and so I had suspected that when programs are written to make effective use of this new capability, they will turn out to be very impressive, thus giving AMD extremely serious competition. As of this writing, though, events have yet to justify my optimism.

Since the above was written, Intel has made it clear that AVX-512 was never really intended to be a feature of their 12th-generation Alder Lake chips (the generation following the 11th-generation Rocket Lake parts, the i9-11900K and i7-11700K, mentioned above), and that the ability to perform these instructions will be fused off on later production. So Intel did not intend to use AVX-512 as a feature providing a competitive advantage over AMD on a continuing basis for the time being. In fairness, the fact that many of the chips in this line-up had both P-cores and E-cores, with the E-cores not containing circuitry for AVX-512, would have made that feature awkward to use in any case.


Also present on this site is this page with a few words concerning recent events in the GPU rivalry between AMD and Nvidia.

