Sunday, 14 February 2016

Why not make one big CPU core?



I don't understand why CPU manufacturers make multi-core chips. The scaling of multiple cores is horrible. Yes, it is highly application specific, and I am sure you can point out a certain program or piece of code that runs great on many cores, but most of the time the scaling is garbage. It's a waste of silicon die space and a waste of energy.


Games, for example, almost never use more than four cores. Science and engineering simulation software like Ansys or Fluent is priced by how many cores the PC it runs on has, so you pay more because you have more cores, but the benefit of more cores becomes really poor past 16 cores. Yet you see these 64-core workstations... it's a waste of money and energy. You might as well buy a 1500 W heater for the winter; it's much cheaper.


Why don't they make a CPU with just one big core?


I think if they made a one-core equivalent of an eight-core CPU, that one core would have an 800% increase in IPC, so you would get full performance in all programs, not just those that are optimized for multiple cores. Higher IPC increases performance everywhere; it's a reliable and simple way to increase performance. Multiple cores increase performance only in a limited number of programs, and the scaling is horrible and unreliable.




Answer



The problem lies with the assumption that CPU manufacturers can just add more transistors to make a single CPU core more powerful without consequence.


To make a CPU do more, you have to plan what doing more entails. There are really three options:




  1. Make the core run at a higher clock frequency - The trouble with this is that we are already hitting the limits of what we can do.


    Power usage, and hence thermal dissipation, increases with frequency - if you double the frequency you nominally double the power dissipation. If you increase the voltage, your power dissipation goes up with the square of the voltage - and running at a higher frequency typically requires a higher voltage, compounding the cost.


    Interconnects and transistors also have propagation delays due to the non-ideal nature of the world. You can't just increase the number of transistors and expect to be able to run at the same clock frequency.


    We are also limited by external hardware - mainly RAM. To make the CPU faster, you have to increase the memory bandwidth, by either running it faster, or increasing the data bus width.
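The frequency and voltage point above can be sketched numerically with the standard dynamic-power relationship for CMOS logic, P ≈ C·V²·f. The capacitance figure below is purely illustrative, not taken from any real chip:

```python
# Rough sketch of the CMOS dynamic-power relationship P ≈ C * V^2 * f.
# The effective capacitance value here is made up for illustration.

def dynamic_power(c_farads, v_volts, f_hertz):
    """Approximate dynamic (switching) power of CMOS logic."""
    return c_farads * v_volts ** 2 * f_hertz

base   = dynamic_power(1e-9, 1.0, 3e9)  # 1 nF effective capacitance, 1.0 V, 3 GHz
faster = dynamic_power(1e-9, 1.0, 6e9)  # double the clock
hotter = dynamic_power(1e-9, 1.2, 3e9)  # raise the voltage 20%

print(round(faster / base, 2))  # 2.0  - double the frequency, double the power
print(round(hotter / base, 2))  # 1.44 - power scales with the square of voltage
```

In practice the two effects stack: pushing the clock higher usually needs more voltage, so real power cost grows faster than either term alone suggests.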








  2. Add more complex instructions - Instead of running faster, we can add a richer instruction set - common tasks like encryption can be hardened into the silicon. Rather than taking many clock cycles to calculate in software, we instead have hardware acceleration.


    This is already being done on Complex Instruction Set Computer (CISC) processors. See extensions like SSE2 and SSE3. A single CPU core today is far, far more powerful than a CPU core from even 10 years ago, even when run at the same clock frequency.


    The trouble is, as you add more complicated instructions, you add more complexity and make the chip bigger. As a direct result the CPU gets slower - the achievable clock frequencies drop as propagation delays rise.


    These complex instructions also don't help you with simple tasks. You can't harden every possible use case, so inevitably large parts of the software you are running will not benefit from new instructions, and in fact will be harmed by the resulting clock rate reduction.


    You can also make the data bus widths larger to process more data at once, however again this makes the CPU larger and you hit a tradeoff between throughput gained through larger data buses and the clock rate dropping. If you only have small data (e.g. 32-bit integers), having a 256-bit CPU doesn't really help you.
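The wide-data-bus tradeoff above can be illustrated with a toy model: treat a 256-bit register as eight 32-bit lanes and add lane-wise. This is purely a sketch in Python - real SIMD hardware (SSE, AVX and friends) does this in silicon, not in a list comprehension:

```python
# Toy model of a wide data path: a 256-bit "register" as eight 32-bit lanes.
# Illustrative only - real SIMD is implemented in hardware, not like this.

LANES = 8          # 256 bits / 32-bit integers
MASK = 0xFFFFFFFF  # wrap each lane the way 32-bit hardware would

def simd_add(a, b):
    """Lane-wise add of two 8-lane vectors, modelling one wide instruction."""
    assert len(a) == LANES and len(b) == LANES
    return [(x + y) & MASK for x, y in zip(a, b)]

# All eight lanes busy: one "instruction" performs eight additions.
print(simd_add([1] * 8, [2] * 8))  # [3, 3, 3, 3, 3, 3, 3, 3]

# Only one useful value: the other seven lanes are wasted width.
print(simd_add([5, 0, 0, 0, 0, 0, 0, 0], [7, 0, 0, 0, 0, 0, 0, 0])[0])  # 12
```

The second call shows the point in the text: if your data is only a single 32-bit integer, the remaining 224 bits of the data path do nothing for you.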








  3. Make the CPU more parallel - Rather than trying to do one thing faster, do multiple things at the same time. If the task you are doing lends itself to operating on several things at a time, then you want either a single CPU that can perform multiple calculations per instruction (Single Instruction Multiple Data, or SIMD), or multiple CPUs that can each perform one calculation in parallel.


    This is one of the key drivers for multi-core CPUs. If you have multiple programs running, or can split your single program into multiple tasks, then having multiple CPU cores allows you to do more things at once.


    Because the individual CPU cores are effectively separate blocks (barring caches and memory interfaces), each individual core is smaller than the equivalent single monolithic core. Because the core is more compact, propagation delays reduce, and you can run each core faster.


    As to whether a single program can benefit from having multiple cores, that is entirely down to what that program is doing, and how it was written.
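How much a given program benefits can be put into numbers with Amdahl's law, which caps the speedup from N cores by the fraction of the program that must run serially. The 90% figure below is an assumed parallel fraction chosen for illustration, but it shows why the questioner sees poor scaling past 16 cores:

```python
# Amdahl's law: ideal speedup from n cores when a fraction p of the
# program's work can be parallelised. p = 0.90 is an illustrative value.

def amdahl_speedup(p, n_cores):
    """Speedup = 1 / ((1 - p) + p / n), the serial part never shrinks."""
    return 1.0 / ((1.0 - p) + p / n_cores)

print(round(amdahl_speedup(0.90, 16), 2))  # 6.4
print(round(amdahl_speedup(0.90, 64), 2))  # 8.77
# Quadrupling the core count (16 -> 64) gains well under 2x, and the
# serial 10% caps the speedup at 10x no matter how many cores you add.
```

This is why the answer's conclusion holds: the value of extra cores is entirely down to what the program is doing and how much of it was written to run in parallel.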




