No, seriously! Hear me out! Would a third shoe make you run faster? No - having more shoes than feet really doesn't improve one's performance. Threads are the same way: if there are more threads than cores, your application isn't going to be any faster either.
In order to explain good and bad uses for threading, we need to first understand the design of computer systems, and how the central processing unit interacts with other subsystems. In particular, understanding what causes a CPU to stall (wait) is central to efficient threading.
To continue with the silly analogies, let's imagine you are playing chess over the phone. You can remember a few pieces in your brain, and maybe keep a few more by jotting notes on a small piece of paper, but to check the entire board, you have to talk over the phone to your partner. This is sort of how a CPU works - it has registers that are extremely quick to access, like your brain, then there is random access memory (RAM), which is like the note paper, and finally, there are IO devices like hard disks or network interfaces that are similar to the phone. Each one of these tiers takes longer to access, but can store more information. This is the concept of memory hierarchy - where each layer is further from the CPU and slower to access, but significantly larger than the previous layer.
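To put rough numbers on that hierarchy, here is a small Python sketch using commonly cited ballpark latencies. These figures are order-of-magnitude assumptions, not measurements - the exact values vary widely by hardware - but the relative gaps between tiers are the point:

```python
# Ballpark access latencies for each tier of the memory hierarchy.
# These are order-of-magnitude assumptions, not measured values.
latencies_ns = {
    "register":  0.3,         # a fraction of a clock cycle
    "L1 cache":  1,
    "L2 cache":  4,
    "RAM":       100,
    "SSD read":  100_000,     # ~100 microseconds
    "network":   10_000_000,  # ~10 ms round trip
}

for tier, ns in latencies_ns.items():
    # Express each tier relative to a register access.
    ratio = ns / latencies_ns["register"]
    print(f"{tier:>9}: {ns:>12,.1f} ns  (~{ratio:,.0f}x a register)")
```

Notice the jumps are multiplicative: RAM is hundreds of times slower than a register, and a network round trip is tens of millions of times slower. That last gap is why waiting on IO dominates everything else.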
From the perspective of our software, each CPU core can execute one machine code instruction at a time. The physical reality is more complex, as modern CPUs implement many forms of instruction parallelism, but this simplification does not detract from the overall picture. Some of these machine code instructions may execute in only a few clock cycles, whereas others may take longer. The cases we are most interested in for threading are those where the CPU is forced to stall while it waits for other, non-CPU resources to respond. The most obvious cases are disk or network IO, but at the level of the CPU core, main memory access is extremely slow, and even cache access is slow relative to the ALU and other in-core processing.
Staying with these simplified models, if a CPU core is spending its time working, and not waiting for input/output of some type, then adding more threads will not increase the throughput of one's program. In fact, it will likely reduce throughput, as there is non-trivial overhead involved in context switching between threads. If the CPU is waiting on other resources, then there is a good chance that more threads will in fact increase throughput - although this also depends on ensuring that those other non-CPU resources do not themselves become the bottleneck.
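Here is a minimal sketch of that effect in Python. It simulates an IO-bound task with time.sleep as a stand-in for a disk or network wait - while a thread sleeps, the core is free, so overlapping the waits with threads shrinks the wall time:

```python
import time
from concurrent.futures import ThreadPoolExecutor

def fake_io_task(_):
    # Simulate a blocking IO wait (disk/network); the CPU is idle
    # here, so other threads are free to run in the meantime.
    time.sleep(0.2)

# Sequential: four waits happen one after another (~0.8s total).
start = time.perf_counter()
for i in range(4):
    fake_io_task(i)
sequential = time.perf_counter() - start

# Threaded: four waits overlap, so wall time is roughly one wait (~0.2s).
start = time.perf_counter()
with ThreadPoolExecutor(max_workers=4) as pool:
    list(pool.map(fake_io_task, range(4)))
threaded = time.perf_counter() - start

print(f"sequential: {sequential:.2f}s, threaded: {threaded:.2f}s")
```

If you swap the sleep for pure computation, the threaded version shows no such speedup - the cores were already busy, so the threads just take turns (and in CPython specifically, the global interpreter lock serializes them anyway).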
Rule of thumb: If CPU usage (taskmgr.exe on Windows, ps on Linux, etc.) is high, adding more threads is unlikely to increase throughput. Conversely, if CPU usage is low and the task can be parallelized, adding more threads may increase throughput.
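One practical consequence of the rule of thumb is sizing thread pools differently for CPU-bound and IO-bound work. Here is a rough heuristic as a sketch - the 90% wait ratio is an assumption for illustration, and any real value should be tuned against your actual workload:

```python
import os

def suggested_pool_size(io_bound: bool, io_wait_ratio: float = 0.9) -> int:
    """Rough thread-pool sizing heuristic (illustrative, not gospel).

    CPU-bound: one thread per core - extra threads only add
    context-switch overhead.

    IO-bound: oversubscribe, since most threads are stalled waiting.
    If each thread waits io_wait_ratio of the time, it takes roughly
    1 / (1 - io_wait_ratio) threads to keep one core busy.
    """
    cores = os.cpu_count() or 1
    if not io_bound:
        return cores
    return max(cores, int(cores / (1 - io_wait_ratio)))

print(suggested_pool_size(io_bound=False))  # one thread per core
print(suggested_pool_size(io_bound=True))   # ~10x cores at 90% wait
```

The interesting part is the IO-bound case: with threads waiting 90% of the time, you need roughly ten of them per core before the core itself becomes the limiting factor.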
Omitted details: In order to keep this fairly straightforward, I omitted many details, and may have even lied about how things actually work. The most obvious omissions are hyper-threading and the finer points of caching. Nevertheless, the simplified models here can be used to good effect.
-randy