Discontiguous Memory

Understanding why we have discontiguous memory requires some understanding of the chipsets that power Itanium systems. The ZX1 chipset, which powers the HP Integrity line of servers, is one of the more common ones; other chipsets follow similar principles. With the ZX1, the processor is not connected directly to the RAM chips, but goes through a memory I/O controller, or MIO for short. It looks something like

+-----------------------+ Processor Bus                         Memory Bus +----------+
| Itanium 2 Processor 1 | <-------+                              +-------> | RAM DIMM |
+-----------------------+         |      +---------------+       |         +----------+
| Itanium 2 Processor 2 | <-------+----> |    ZX1 MIO    | <-----+-------> | RAM DIMM |
+-----------------------+                +---------------+                 +----------+
                                Ropes ->  | | | | | | | |
                                         +---------------+
                                         |    ZX1 IOA    |
                                         +---------------+
                                            |    |    |
                                           agp  pci  pci-x
From a top level view, the chipset is broadly divided into the System Bus Adapter (or SBA) and the Lower Bus Adapter (LBA). You can think of the SBA as an interface between the bus that the processors sit on and the ropes bus (below). The LBA is the PCI host bus adapter, and contains the IO SAPIC (Streamlined Advanced Programmable Interrupt Controller; this chip handles interrupts from PCI devices, and can be thought of as a smart "helper" for the CPU that controls when it sees interrupts). A rope is described as a "fast, narrow point-to-point connection between the zx1 mio and a rope guest device (the zx1 ioa). The rope guest device is responsible for bridging from the I/O rope to an industry standard I/O bus (PCI, AGP, and PCI-X)". The ZX1 chipset can support up to 8 ropes, which can be bundled. This is distinct from the typical north bridge/south bridge arrangement used in 386-era architectures, where the north bridge connects to the CPU, memory and south bridge, and the south bridge connects to the north bridge and the slower I/O peripherals. It does, however, introduce some more interesting timing problems.

A simplified map of the memory layout presented by the ZX1 chipset MIO looks like

0xFFF FFFF FFFF +-------------------------------------+
                | Unused                              |
0xF80 0000 0000 +-------------------------------------+
                | Greater than 4GB MMIO:              |
                | (not currently used by anything)    |
0xF00 0000 0000 +-------------------------------------+
                | Unused                              |
                ....
                |                                     |
0x041 0000 0000 +-------------------------------------+
                | Memory 1 (3GB)                      |
                |                                     |
                |                                     |
0x040 4000 0000 +-------------------------------------+
                | Unused                              |
0x040 0000 0000 +-------------------------------------+
                | Memory 2 (252GB)                    |
                ....
                |                                     |
0x001 0000 0000 +-------------------------------------+
                | Less than 4GB MMIO:                 |
                | access to firmware, processor       |
                | reserved space, chipset registers   |
                | etc (3GB)                           |
0x000 8000 0000 +-------------------------------------+
                | Virtual I/O (1GB)                   |
0x000 4000 0000 +-------------------------------------+
                | Memory 0 (1GB)                      |
0x000 0000 0000 +-------------------------------------+
You can see it implements a 44 bit address space that is divided up into a range of different sections. The maximum memory it can theoretically support is 256GB (i.e. Memory 0 + Memory 1 + Memory 2); however, physically I think most boxes only come with enough slots for ~16GB of memory. The ZX1 has an extension called the zx1 sme, or Scalable Memory Extension, that allows more memory to be dealt with.

1.3. Looking at the layout in more depth

1.3.1. Virtual I/O

The ZX1 chipset implements an IO-MMU, or I/O Memory Management Unit, for DMA. This is very similar to the standard MMU, or memory management unit, in that it converts a virtual address into a physical address. This is particularly important for a 32 bit card that can not understand a 64 bit address. By setting up the virtual I/O region under 4GB (i.e. below the maximum address a 32 bit card can deal with) the IOMMU can translate addresses in this window back to full 64 bit addresses, giving the 32 bit card access to all memory. So the process goes something like

   1. A driver asks for a DMA mapping for a buffer that may be anywhere in physical memory.
   2. The IOMMU allocates an entry in the virtual I/O region (below 4GB) that points to the buffer's real physical address.
   3. The driver hands this low address to the card.
   4. When the card reads or writes that address, the chipset translates it back to the full 64 bit physical address.
The IOTLB in the ZX1 chipset is only big enough to map 1GB at a time, which is why this area is only 1GB. This means there can be contention for this space if many drivers are requesting large DMA transfers -- however, note that 64 bit capable cards can bypass this altogether, reducing the problem.

As you can see from the layout above, system RAM is not presented to the processor in a contiguous manner; that is, all the memory in one large block. It is split up into three different sections (Memory 0, 1 and 2) which are placed in different areas of the address space. Memory 0 needs to be around for legacy applications that expect certain things at low addresses. The rest of the < 4GB area is taken up by the Low Memory Mapped IO region; the physical memory that would have been mapped here is moved into Memory 1, which is located at 0x040 4000 0000, above the 4GB boundary.

The other thing you need to know about is how the Linux kernel looks at memory. The kernel keeps track of every single physical frame of memory in an array of struct pages, held in a global variable called mem_map. If we keep a struct page for every frame in the address space, the large unused gaps between the memory sections mean we allocate page structures for memory that does not actually exist, wasting a considerable amount of RAM.

This is also an issue for systems that may not have such a large gap imposed by the chipset, but receive one by participating in a NUMA (non-uniform memory access) system. In a NUMA system, many individual nodes "pool" their memory into one large memory image. In this case, each node of the system will have gaps in its memory for memory that actually physically resides on another, remote node. The upshot is that they will suffer from the same wasted space.