1. Ultimately, it is because some instructions can then execute with one fewer stall (no-op).
Consider the following example on a five-stage pipeline, with no forwarding:
ADD R5, R2, R1
SW R5, 32(R1)
SUB R3, R5, R0
Let's first try the status quo: write first, read second.
<!DOCTYPE html>
<html>
<head>
<meta charset="UTF-8">
<style>
table {
border-collapse: collapse;
width: 100%;
margin-top: 20px;
}
th, td {
border: 1px solid #000;
padding: 8px;
text-align: center;
min-width: 60px;
position: relative;
}
th {
background-color: #ddd;
}
.bubble {
background-color: #fdd;
font-style: italic;
}
.circle {
display: inline-block;
padding: 5px;
border: 2px solid red;
border-radius: 50%;
font-weight: bold;
}
</style>
</head>
<body>
<table>
<tr>
<th>Instruction</th>
<th>Cycle 1</th>
<th>Cycle 2</th>
<th>Cycle 3</th>
<th>Cycle 4</th>
<th>Cycle 5</th>
<th>Cycle 6</th>
<th>Cycle 7</th>
<th>Cycle 8</th>
<th>Cycle 9</th>
</tr>
<tr>
<td>ADD R5, R2, R1</td>
<td>IF</td>
<td>ID</td>
<td>EX</td>
<td>MEM</td>
<!-- Wrap WB in a span to circle it -->
<td><span class="circle">WB</span></td>
<td></td>
<td></td>
<td></td>
<td></td>
</tr>
<!-- Two stall cycles (bubbles) inserted after ADD -->
<tr class="bubble">
<td>Stall</td>
<td></td>
<td>Stall</td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
</tr>
<tr class="bubble">
<td>Stall</td>
<td></td>
<td></td>
<td>Stall</td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
</tr>
<tr>
<td>SW R5, 32(R1)</td>
<td></td>
<td></td>
<td></td>
<td>IF</td>
<!-- Wrap ID in a span to circle it -->
<td><span class="circle">ID</span></td>
<td>EX</td>
<td>MEM</td>
<td>WB</td>
<td></td>
</tr>
<tr>
<td>SUB R3, R5, R0</td>
<td></td>
<td></td>
<td></td>
<td></td>
<td>IF</td>
<td>ID</td>
<td>EX</td>
<td>MEM</td>
<td>WB</td>
</tr>
</table>
</body>
</html>
Notice how ADD's WB (circled in red) executes in the same clock cycle as SW's ID (also circled in red). This is possible because the register file first writes the ADD result, R2+R1, into R5, and SW then reads R5 later in the same cycle, so there is no data hazard.
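To make the ordering concrete, here is a minimal Python sketch of a register file whose behaviour within a single cycle can be switched between write-first and read-first. The class `RegisterFile`, the `write_first` flag, and the helper `read_during_writeback` are my own illustrative names, not taken from any real design, and the starting values of R1 and R2 are made up.

```python
class RegisterFile:
    """Toy register file; `write_first` picks the ordering inside one cycle."""

    def __init__(self, num_regs=32, write_first=True):
        self.regs = [0] * num_regs
        self.write_first = write_first

    def cycle(self, read_addr, write_addr=None, write_data=None):
        """Simulate one clock cycle with one read and one optional write."""
        if self.write_first and write_addr is not None:
            self.regs[write_addr] = write_data   # write lands before the read
        value = self.regs[read_addr]             # the read (ID stage)
        if not self.write_first and write_addr is not None:
            self.regs[write_addr] = write_data   # write lands after the read
        return value


def read_during_writeback(write_first):
    """ADD's WB and SW's ID share one cycle; return what SW reads from R5."""
    rf = RegisterFile(write_first=write_first)
    rf.regs[1], rf.regs[2] = 10, 7               # hypothetical values of R1, R2
    add_result = rf.regs[2] + rf.regs[1]         # ADD R5, R2, R1
    return rf.cycle(read_addr=5, write_addr=5, write_data=add_result)


print(read_during_writeback(write_first=True))   # 17: SW sees the new R5
print(read_during_writeback(write_first=False))  # 0: SW sees the stale R5
```

With write-first ordering the read in the same cycle already returns the new value, which is why SW's ID can overlap ADD's WB. With read-first ordering the same overlap would return stale data.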
Now let's try read first, write second.
To avoid the data hazard, SW must read R5 only after it has been updated, so ADD has to finish WB (write-back) before SW can read R5:
<!DOCTYPE html>
<html>
<head>
<meta charset="UTF-8">
<style>
table {
border-collapse: collapse;
width: 100%;
margin-top: 20px;
}
th, td {
border: 1px solid #000;
padding: 8px;
text-align: center;
min-width: 60px;
position: relative;
}
th {
background-color: #ddd;
}
.bubble {
background-color: #fdd;
font-style: italic;
}
.circle {
display: inline-block;
padding: 5px;
border: 2px solid red;
border-radius: 50%;
font-weight: bold;
}
</style>
</head>
<body>
<table>
<tr>
<th>Instruction</th>
<th>Cycle 1</th>
<th>Cycle 2</th>
<th>Cycle 3</th>
<th>Cycle 4</th>
<th>Cycle 5</th>
<th>Cycle 6</th>
<th>Cycle 7</th>
<th>Cycle 8</th>
<th>Cycle 9</th>
</tr>
<tr>
<td>ADD R5, R2, R1</td>
<td>IF</td>
<td>ID</td>
<td>EX</td>
<td>MEM</td>
<!-- Wrap WB in a span to circle it -->
<td><span class="circle">WB</span></td>
<td></td>
<td></td>
<td></td>
<td></td>
</tr>
<!-- Three stall cycles (bubbles) inserted after ADD -->
<tr class="bubble">
<td>Stall</td>
<td></td>
<td>Stall</td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
</tr>
<tr class="bubble">
<td>Stall</td>
<td></td>
<td></td>
<td>Stall</td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
</tr>
<tr class="bubble">
<td>Stall</td>
<td></td>
<td></td>
<td></td>
<td>Stall</td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
</tr>
<tr>
<td>SW R5, 32(R1)</td>
<td></td>
<td></td>
<td></td>
<td></td>
<td>IF</td>
<!-- Wrap ID in a span to circle it -->
<td><span class="circle">ID</span></td>
<td>EX</td>
<td>MEM</td>
<td>WB</td>
</tr>
<tr>
<td>SUB R3, R5, R0</td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td>IF</td>
<td>ID</td>
<td>EX</td>
<td>MEM</td>
</tr>
</table>
</body>
</html>
Notice that this time SW's ID has to execute one clock cycle later: because R5 is not written before the read within the same cycle, an extra stall is needed so that SW reads the register in the next clock cycle.
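The stall counts in the two tables can be reproduced with a short calculation. This is only a sketch under the same assumptions as the tables (five stages IF, ID, EX, MEM, WB, no forwarding, registers read in ID); the function name `stalls_without_forwarding` and its parameters are hypothetical.

```python
def stalls_without_forwarding(distance, write_first):
    """Stall cycles for a consumer issued `distance` instructions after
    the producer of its source register.

    Assumes a 5-stage pipeline (IF, ID, EX, MEM, WB), no forwarding,
    and that no earlier stall has already delayed the consumer.
    """
    producer_if = 1
    producer_wb = producer_if + 4              # WB is the fifth stage
    natural_id = producer_if + distance + 1    # consumer's ID with no stalls
    # Write-first: ID may share the cycle with WB.
    # Read-first: ID must wait until the cycle after WB.
    earliest_id = producer_wb if write_first else producer_wb + 1
    return max(0, earliest_id - natural_id)


# SW immediately follows ADD, so distance = 1.
print(stalls_without_forwarding(1, write_first=True))   # 2
print(stalls_without_forwarding(1, write_first=False))  # 3
```

The only difference between the two cases is whether the consumer's ID may share a cycle with the producer's WB, which is exactly the one-stall saving.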
2. Let's look at the implementation of a register file.
Implementation of a 2^2 x 5 Register File
This implementation is from a UC Riverside lab.
Let's look at the circuit. There are two main paths: write and read. At the clock edge, both the write and read lines pass in their current values (data and addresses, if enabled).
Let's look at the read path first. The data from the selected registers passes through a driver and onto the 32-bit bus. Now the write path: the data is loaded into the corresponding register. But if you follow that bus, you will see that the write data travels onto the same path as the read data. Therefore, even if the read went through first, the write data overwrites it on the register file's data bus.
However, this design does not show a clock, so I cannot fully answer your question. Please let me know if you find a register file circuit that includes a clock. I can imagine AND logic between the clock and write_enable or read_enable, as you mentioned in your question. I will also do more research on how register files are implemented in real life, especially synchronous ones.
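As a rough illustration of the AND-gating idea, here is a behavioural Python sketch in which a write takes effect only while both the clock and write_enable are high, and the read is purely combinational. This is my guess at one possible arrangement, not the UC Riverside circuit or any real implementation; `GatedRegisterFile` and `step` are made-up names, and the 4 x 5-bit size simply mirrors the 2^2 x 5 title.

```python
class GatedRegisterFile:
    """Behavioural model: write is gated by (clk AND write_enable)."""

    def __init__(self, num_regs=4, width=5):
        self.mask = (1 << width) - 1      # keep values within `width` bits
        self.regs = [0] * num_regs

    def step(self, clk, write_enable, write_addr, write_data, read_addr):
        """Evaluate one clock phase and return the read port's value."""
        if clk & write_enable:            # the assumed AND gate
            self.regs[write_addr] = write_data & self.mask
        return self.regs[read_addr]       # combinational read


rf = GatedRegisterFile()
# First half of the cycle (clk high): the write goes through.
first_half = rf.step(clk=1, write_enable=1, write_addr=2,
                     write_data=17, read_addr=2)
# Second half (clk low): the write port is ignored, the read sees 17.
second_half = rf.step(clk=0, write_enable=1, write_addr=2,
                      write_data=0, read_addr=2)
print(first_half, second_half)            # 17 17
```

Under this assumption the write completes in the first half of the cycle and a read in the second half already sees the new value, which matches the write-first, read-second behaviour from part 1.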
I have found this website that animates the datapath, which helps in understanding what the pipeline does for each instruction.
Also, your textbook is very detailed, especially on this topic in chapter seven. It's a pity that it doesn't show a schematic of a register file anywhere in the book.