High Performance Computing - Charles Severance [10]
Many designs from the 1980s used a single cache for both instructions and data. But newer designs are employing what is known as the Harvard Memory Architecture, where the demand for data is segregated from the demand for instructions.
Main memory is a still a single large pool, but these processors have separate data and instruction caches, possibly of different designs. By providing two independent sources for data and instructions, the aggregate rate of information coming from memory is increased, and interference between the two types of memory references is minimized. Also, instructions generally have an extremely high level of locality of reference because of the sequential nature of most programs. Because the instruction caches don’t have to be particularly large to be effective, a typical architecture is to have separate L1 caches for instructions and data and to have a combined L2 cache. For example, the IBM/Motorola PowerPC 604e has separate 32-K four-way set-associative L1 caches for instruction and data and a combined L2 cache.
Virtual Memory*
Virtual memory decouples the addresses used by the program (virtual addresses) from the actual addresses where the data is stored in memory (physical addresses). Your program sees its address space starting at 0 and working its way up to some large number, but the actual physical addresses assigned can be very different. It gives a degree of flexibility by allowing all processes to believe they have the entire memory system to themselves. Another trait of virtual memory systems is that they divide your program’s memory up into pages — chunks. Page sizes vary from 512 bytes to 1 MB or larger, depending on the machine. Pages don’t have to be allocated contiguously, though your program sees them that way. By being separated into pages, programs are easier to arrange in memory, or move portions out to disk.
Page Tables
Say that your program asks for a variable stored at location 1000. In a virtual memory machine, there is no direct correspondence between your program’s idea of where location 1000 is and the physical memory systems’ idea. To find where your variable is actually stored, the location has to be translated from a virtual to a physical address. The map containing such translations is called a page table. Each process has a several page tables associated with it, corresponding to different regions, such as program text and data segments.
To understand how address translation works, imagine the following scenario: at some point, your program asks for data from location 1000. Figure 1.4 shows the steps required to complete the retrieval of this data. By choosing location 1000, you have identified which region the memory reference falls in, and this identifies which page table is involved. Location 1000 then helps the processor choose an entry within the table. For instance, if the page size is 512 bytes, 1000 falls within the second page (pages range from addresses 0–511, 512–1023, 1024–1535, etc.).
Therefore, the second table entry should hold the address of the page housing the value at location 1000.
Figure 1.4. Virtual-to-physical address mapping
The operating system stores the page-table addresses virtually, so it’s going to take a virtual-to-physical translation to locate the table in memory. One more virtual-to- physical translation, and we finally have the true address of location 1000. The memory reference can complete, and the processor can return to executing your program.
Translation Lookaside Buffer
As you can see, address translation through a page