An Introduction to Computer Processors, Part 1


Note: This article references commands, behaviors, and outputs generated by Linux-based operating systems, such as CentOS or Ubuntu. Some information will not be relevant to other operating systems, such as Windows.

Editor's Note: This blog post provides an introduction to processor technologies, especially in the context of distributed computing environments. It is part of the in-depth series, A Study of Performance in Distributed Computing Environments. The series covers distributed computing and high-performance computing, describes the challenges of running these systems and ways of troubleshooting them, and will eventually describe how the MapR Data Platform solves key issues in these areas. You may want to read the previous blog post in this series, "An Introduction to Disk Storage," for additional context.

Computers can be generally described as having three types of resources: processor cycles, memory bytes, and input/output devices. For any particular application, performance tends to be either processing bound or I/O bound; that is, the time it takes the application to complete is primarily determined either by the rate at which the processor(s) can complete computations or by the rate at which data can be read from or written to an I/O device. In the previous blog post in this series, I discussed the characteristics of storage systems (i.e., I/O devices) and how they affect I/O bound applications. In this article, we'll take a look at the characteristics of processors and how they affect computationally intensive applications.

At MapR, the software we develop is generally compiled for x86-64. We release a fair amount of Java software, which is agnostic to the computer architecture due to Java's portability. By and large, the hardware on which our software runs is based on the x86-64 architecture. All discussions specific to an instruction set are solely in reference to x86-64, unless stated otherwise. Some content may be relevant to other architectures, but that is not a focus of this article.

Computer Processors Are Complex

Storage drives are fairly simple devices. From a functional perspective, you can read or write units of data at particular addressable offsets. The high level designs are also rather simple. A magnetic substrate is manipulated such that a particular magnetic orientation represents a 0 and a second magnetic orientation represents a 1. The substrate is formed into a circular "platter" that spins. A "head" moves radially while hovering slightly above the platter. The head is capable of writing, by setting the magnetic orientation of a small section of the platter, and reading, by checking the orientation of a small section of the platter. Certainly, the actual engineering and manufacture of the hardware is complex, but understanding the functionality and high level design of storage drives is trivial when compared to computer processors.

I'm going to preface this article by saying that we will only go through a cursory discussion of the design of computer processors. We will ultimately use some rather rudimentary abstractions when it comes time to discuss how processors affect computationally intensive applications and how to monitor processor usage and describe bottlenecks. There will be caveats to those abstractions that are too numerous to fully detail.

Instruction Set Architecture

In computing, an instruction set defines an interface between hardware and software as well as describing the general architecture of the computer. Instruction sets generally define various types of instructions for doing things like:

  • reading from and writing to memory and I/O devices
  • basic mathematical calculations like add/subtract/multiply/divide
  • bitwise operations, such as AND and OR
  • logical comparisons, such as whether a value is equal to or less than another
  • jumping to another part of a program and executing the instructions found there

The instruction set defines what you can ask the computer to do, but it does not define how the computer actually implements the instruction. So while AMD and Intel both manufacture processors that implement the x86-64 instruction set, the designs of those processors differ in significant ways. In fact, different processors from a single manufacturer may both implement the same instruction set but do so by using very different designs, with great consequence to performance. As the saying goes, "there's more than one way to …"

Instruction Set Representation

In relation to software, regardless of what language is used for development, all programs must be expressed in the instruction set for the computer on which they will run.

For instance, given a line of C code like this:

long theSum = 500 + 600;

Expressing this in x86-64 instructions, also referred to as assembly code, might look like this:

movq    $600, %rax
movq    $500, %rdx
addq    %rdx, %rax

These 3 instructions will place the value 600 into the rax register, the value 500 into the rdx register, then store the sum of rdx and rax in the rax register. Note that there is no mention of a variable named theSum in the assembly code. We might use a logical name like theSum to help us understand what the C source code does, but for the instruction set and for the computer, we just refer to a place at which a value can be stored. In this case, the rax register is where we are storing the value.

Computers can't actually use this representation, however. We must encode these lines of assembly to what is referred to as machine code. For an x86-64 computer, encoding those assembly lines to machine code will yield a representation like this (in hex):

48 c7 c0 58 02 00 00         (movq    $600,%rax)
48 c7 c2 f4 01 00 00         (movq    $500,%rdx)
48 01 d0                     (addq    %rdx,%rax)

Note: The content in parentheses was added by me to clarify which bits correspond to which assembly instructions.

For the first instruction, the 0x48 byte indicates the instruction acts upon 64-bit registers; the 0xc7 byte indicates the instruction is mov; the 0xc0 indicates the operation pertains to register rax, etc. I won't go into the full details of encoding assembly to machine code. This series of bytes can be passed to the x86-64 CPU for execution, but I also won't go into the details of how bytes are actually passed into a CPU for execution.

Part two of this topic will cover processor design and efficiencies.

These blog posts are part of a larger in-depth series called A Study of Performance in Distributed Computing Environments.

This blog post was published March 06, 2019.
