9 min read
Note: This article references commands, behaviors, and outputs generated by Linux-based operating systems, such as CentOS or Ubuntu. Some information will not be relevant to other operating systems, such as Windows.
This is a group of four blog posts that provides an introduction to processor technologies, especially in the context of distributed computing environments. Part one covered instruction set architecture and instruction set representation; part two covered processor design and efficiencies; part three covered monitoring CPU usage and bottlenecks. These blog posts are part of a larger in-depth series called A Study of Performance in Distributed Computing Environments.
When a computing task is important, we generally have certain expectations for its performance. The specific types of expectations we have are generally related to the nature or purpose of the computing task. Those expectations tend to be described using one of the following types of metrics:
Defined as the time it takes to complete a unit of work. For instance, a common pattern is to have a server that listens for requests from remote systems, processes them, and returns a response. The elapsed time from the arrival of a request to the return of a response is the service response time for this example.
Such a requirement is often expressed as requiring service response times to be below a threshold. As performance is a complex topic, these types of requirements are also often expressed as requiring service response times to be below a threshold for a specific percentage of work units – for instance, requiring a web server to respond to >99% of requests in <100 ms.
As computing resources are limited, this type of requirement is often also expressed in conjunction with an upper limit on the rate of requests that can be handled within the required response time – for instance, requiring a web server to respond to >99% of requests in <100 ms up to a rate of 1000 requests per second. When the rate of requests per second exceeds 1000, the server might not have sufficient processing cycles to generate responses in an average time of <100 ms.
This type of requirement is often associated with applications that are described as "real-time."
Defined as a capacity to complete units of work at or above a specific rate – for instance, being able to complete a million complex calculations per minute. On the surface, this type of metric may seem similar to a service response time metric. And it is, but a throughput metric tends to place more importance on the number of units of work completed in a set period of time over the time it takes to complete any one individual unit of work.
This type of requirement is often associated with applications that are described as "batch-oriented." As a further example, data may arrive over the course of an hour, then once an hour; a computing job is started that processes the data that arrived over the previous hour. On average, 10 million data points may arrive in an hour. To ensure the computing job does not fall behind on processing incoming data points, it may be designed with a requirement that it is able to process data points at a rate of 12 million per hour. In this case, the time it takes to process a single data point is not important. Rather, it is important that the computing job is able to process data points at a rate >10 million per hour.
Let's consider the example of a recommendation system. Such a system may regularly process very large amounts of historic data to generate clusters of related content using a k-means algorithm. Then, individual events may be run through a k-nearest neighbors algorithm to determine the k-means cluster to which it is most similar. With this kind of architecture, for instance, the k-means algorithm may run twice a day to rebuild the cluster definitions and is expected to complete within 30 minutes each time. The individual events may need to be processed within 200 ms. Thus, the k-means part of the application architecture may be considered batch-oriented and may need to meet a throughput metric, while the k-nearest neighbors part of the application may be considered real-time and may need to meet a service response time metric.
This isn't directly related to computer processors, but I think it's worthwhile to bring it up in this article as CPU time is a resource that can be critical to reliable functioning of real-time applications and can also be heavily utilized during batch processing. If those computing tasks are collocated, setting appropriate priorities for the real-time processes can be very important, as can be assigning isolated CPUs to run the real-time processes only. By using CPU isolation and scheduling priorities, you can generally meet service-level requirements by avoiding CPU starvation of critical processes even in the presence of CPU contention.
As I've already pointed out, the accounting performed by the Linux kernel is imprecise, and in some cases it can be wildly inaccurate. There are also a variety of metrics and details that are not easy to express and not well represented in common monitoring utilities. This creates challenges for determining when CPU bottlenecks are present and how that manifests for individual processes as CPU starvation.
While the amount of time a process was scheduled to a CPU is (roughly) accounted for by the kernel, the amount of time that process was in the running state (and hence could have been scheduled) is not tracked. Thus, we can not determine the degree of starvation a process encounters unless that process tracks its running time on its own. Further, for a process to track the time it was in the running state can be non-trivial as there can be many points in the program's source code that need to be tracked, not to mention that the program may have a reliance on other libraries that are highly unlikely to do any tracking.
It is also difficult to determine the extent to which a bottleneck exists at the CPU core level. Since each process can have its own unique set of cores on which it is allowed to be scheduled, it is challenging just to get a count of the number of processes which can run on a given core, let alone track the states of those processes and correlate it per-core.
Linux does track the total number of processes in the running state, but that is very high level. It may show 32 processes in the running state on a 32 core system, which might seem to indicate there is no CPU bottleneck. However, those 32 processes might be restricted to running on 16 cores, in which case there very much is a CPU bottleneck and resultant starvation.
I'd love to be concluding this article with some key metric that can be used to accurately and succinctly describe everything relevant to CPU bottlenecks and starvation, but that just doesn't exist. Such a solution is feasible, but I am unaware of the existence of one, likely due to the complexity.
Computer processors are complex. Hopefully, armed with the information in this article, you'll have a good understanding of how they work and what you need to look at to understand how they are being used.
These blog posts are part of a larger in-depth series called A Study of Performance in Distributed Computing Environments.
Stay ahead of the bleeding edge...get the best of Big Data in your inbox.