A processor that issues one instruction per clock cycle can never achieve an IPC greater than one. To improve performance further, we must issue more than one instruction per cycle. There are two main ways to achieve this: 1) VLIWs and 2) superscalars. So, up to this point, we know that both are multiple-issue processors. But what are the differences? Here is what came to my mind.
- In VLIWs, the compiler identifies independent instructions and bundles them so that they can be issued and executed in the same cycle. In superscalars, the hardware does this at run time. Apparently, the compiler can still help a superscalar by scheduling independent instructions close together.
- In VLIWs, every bundle contains a fixed number of instruction slots, so a fixed number of instructions (counting no-ops) is issued every cycle. In superscalars, the number of instructions issued per cycle can vary.
- In VLIWs, if the compiler cannot find enough independent instructions to fill a bundle, it must pad the bundle with no-ops, which increases code size. In superscalars, this is not the case.
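- In VLIWs, the compiler performs loop unrolling to expose more independent instructions. This increases code size, too. In superscalars, the hardware in effect unrolls loops at run time (by predicting the loop branch and overlapping successive iterations), so code size does not change.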
- In VLIWs, binary compatibility is hard to maintain, not least because most VLIWs have no hazard-detection hardware. So, if the microarchitecture changes, existing machine code may no longer execute correctly, or at best will not take full advantage of the new microarchitecture. Recompilation is therefore necessary after a microarchitectural change in VLIWs. This is not the case for superscalars.
- In VLIWs, the instructions in a bundle must execute in lockstep. For instance, if one of them stalls on a cache miss, the others must stall as well. This is not the case in superscalars.
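
To make the bundling and no-op padding points above more concrete, here is a minimal Python sketch of how a VLIW compiler might pack instructions into fixed-width bundles. Everything in it is hypothetical and only for illustration: the `Instr` type, the `pack_bundles` function, the register names, and the bundle width of 3 do not come from any real ISA or compiler. It assumes a simple greedy, in-order packer that checks only RAW and WAW dependences within a bundle (WAR is assumed harmless because operands are read before results are written); real schedulers also reorder instructions and model latencies and functional units.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Instr:
    """Hypothetical instruction: one destination register, some source registers."""
    name: str
    dest: str
    srcs: tuple = ()


# Single shared no-op used to pad unfilled bundle slots.
NOP = Instr("nop", dest="", srcs=())


def pack_bundles(program, width):
    """Greedy in-order packing into bundles of exactly `width` slots.

    The current bundle is closed (and padded with NOPs) as soon as it is
    full or the next instruction has a RAW or WAW dependence on an
    instruction already placed in it.
    """
    bundles, current = [], []
    for ins in program:
        written = {i.dest for i in current}
        conflict = (
            any(s in written for s in ins.srcs)  # RAW: reads a value produced in this bundle
            or ins.dest in written               # WAW: writes a register written in this bundle
        )
        if current and (conflict or len(current) == width):
            bundles.append(current + [NOP] * (width - len(current)))
            current = []
        current.append(ins)
    if current:
        bundles.append(current + [NOP] * (width - len(current)))
    return bundles


program = [
    Instr("load r1", "r1", ("r0",)),
    Instr("load r2", "r2", ("r0",)),
    Instr("add r3", "r3", ("r1", "r2")),  # depends on both loads
    Instr("mul r4", "r4", ("r3",)),       # depends on the add
    Instr("sub r5", "r5", ("r0",)),       # independent of the mul
]

bundles = pack_bundles(program, width=3)
nop_count = sum(i is NOP for b in bundles for i in b)

for b in bundles:
    print(" | ".join(i.name for i in b))
print(f"{len(program)} useful instructions, {nop_count} no-ops")
```

With this toy program, the five useful instructions end up in three bundles (nine slots), so four slots are wasted on no-ops. A superscalar would fetch the same five instructions without any padding in the binary and decide at run time how many to issue each cycle.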