In computing, PSE-36 (36-bit Page Size Extension)[1] refers to a feature of x86 processors that extends the physical memory addressing capabilities from 32 bits to 36 bits, allowing up to 64 GB of memory to be addressed.[2] Compared to the Physical Address Extension (PAE) method, PSE-36 is a simpler way of addressing more than 4 GB of memory. It uses the Page Size Extension (PSE) mode and a modified page directory table to map 4 MB pages into a 64 GB physical address space. PSE-36's downside is that, unlike PAE, it does not have 4 KB page granularity above the 4 GB mark.[3]
PSE-36 was introduced into the x86 architecture with the Pentium II Xeon and was initially advertised as part of the "Intel Extended Server Memory Architecture"[2][4] (sometimes abbreviated ESMA[5]), a branding which also included the slightly older PAE (and thus the Pentium Pro, which only supported PAE, was advertised as having only "subset support" for ESMA).[1]
The heyday of PSE-36 was relatively brief. PSE-36's main advantage was that, unlike PAE, it required little rework of the operating system's internals, and thus PSE-36 proved a suitable stopgap measure[6] around the Windows NT 4.0 Enterprise Edition timeframe. Newer Microsoft operating systems, including Windows 2000, support only PAE.[7] Some operating systems, such as Linux, skipped PSE-36 entirely.[8] Despite this, AMD and later Intel chose to provide PSE support for up to 40 address bits in their 64-bit processors when operated in legacy mode.
Operation
Detection
Support for PSE-36 is indicated by EDX bit 17 (counting from 0) in the feature bits returned by the CPUID instruction. (This is a different bit from plain PSE support, which is indicated by bit 3 in the same register.)[9][10]
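The bit test described above can be sketched in a few lines. This is a minimal illustration that works on a raw EDX value supplied by the caller; it does not execute CPUID itself. The function name `decode_pse_features` and the example value are hypothetical.

```python
# Bit positions in the CPUID feature flags (EDX), as given in the text.
PSE_BIT = 3      # plain Page Size Extension
PSE36_BIT = 17   # 36-bit Page Size Extension


def decode_pse_features(edx: int) -> dict:
    """Report PSE and PSE-36 support from a raw CPUID feature EDX value."""
    return {
        "pse": bool((edx >> PSE_BIT) & 1),
        "pse36": bool((edx >> PSE36_BIT) & 1),
    }


# Hypothetical EDX value with only bits 3 and 17 set, for illustration.
example_edx = (1 << PSE_BIT) | (1 << PSE36_BIT)
print(decode_pse_features(example_edx))
```

On Linux, the same information is usually exposed as the `pse` and `pse36` flags in `/proc/cpuinfo`, which avoids reading the register directly.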
Activation and use
There is no separate bit for activating PSE-36 beyond the one that turns on PSE.[10] As long as the processor (as indicated by CPUID) and the chipset support PSE-36, enabling PSE alone (by setting bit 4, PSE, of the system register CR4) allows the use of large 4 MB pages (anywhere in the 64 GB range) along with normal 4 KB pages (which remain restricted to the 4 GB range).[10]
If the CPU reports the PSE-36 capability through the CPUID instruction, a page directory entry pointing to a large page uses 4 more address bits in addition to the 10 bits used in plain PSE. This allows a large page to be located anywhere in a 36-bit physical address space.[10]
The PS bit (bit 7) in the Page Directory Entry (PDE) denotes whether this entry refers to a page table (that describes 1024 4-KiB pages) or one 4 MB page. PDE structures in normal mode, PSE mode, and PSE-36 mode are as follows:
Mode | 31–22 | 21–17 | 16–13 | 12 | 11–9 | 8 | 7 | 6 | 5 | 4 | 3 | 2 | 1 | 0
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---
non-PSE | base address of page table (spans bits 31–12) | (same field) | (same field) | (same field) | avail | 0 | PS=0 | ign | A | PCD | PWT | U | W | P
PSE | bits 31..22 of page frame address | reserved (must be zero) | reserved (must be zero) | PAT[a] | avail | 0 | PS=1 | D[b] | A | PCD | PWT | U | W | P
PSE-36 | bits 31..22 of page frame address | reserved (must be zero) | bits 35..32 of page frame address | PAT | avail | 0 | PS=1 | D | A | PCD | PWT | U | W | P
- [a] Page attribute table; supported since the Pentium III, must be zero on older CPUs.
- [b] "Dirty" bit: set to 1 by the CPU if there was a write access to that page. For 4 KiB pages this flag exists in the corresponding page table entry (PTE).
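The PSE-36 row of the layout above can be turned into a short decoding sketch. It assumes the standard 32-bit paging split in which the low 22 bits of the linear address are the offset within a 4 MB page; the function names and the example entry are hypothetical.

```python
PAGE_4M_OFFSET_MASK = (1 << 22) - 1   # low 22 bits of the linear address
PDE_PS = 1 << 7                       # PS bit: entry maps a 4 MB page


def pse36_page_base(pde: int) -> int:
    """Return the 36-bit physical base of the 4 MB page a PSE-36 PDE maps."""
    if not pde & PDE_PS:
        raise ValueError("PS=0: entry points to a page table, not a 4 MB page")
    low = pde & 0xFFC00000            # PDE bits 31..22 -> address bits 31..22
    high = (pde >> 13) & 0xF          # PDE bits 16..13 -> address bits 35..32
    return (high << 32) | low


def translate(linear: int, pde: int) -> int:
    """Combine the page base with the 22-bit offset from the linear address."""
    return pse36_page_base(pde) | (linear & PAGE_4M_OFFSET_MASK)


# Hypothetical PDE: P=1, W=1, PS=1, frame bits 31..22 = 0x001,
# address bits 35..32 = 0x2.
pde = (0x001 << 22) | (0x2 << 13) | PDE_PS | 0b11
print(hex(translate(0x00012345, pde)))   # 0x200412345
```

The example page sits just above the 8 GB mark, which a plain PSE entry (with bits 16..13 reserved as zero) could not express.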
Extension up to 40 bits
AMD extends this scheme to 40 address bits by interpreting bits 20..13 of a PDE as bit 39..32 of the page base address in their AMD64 processors when operated in legacy mode, so only bit 21 is reserved (must be zero). Note however that CR4.PSE is ignored in long mode and PSE-style 4 MB pages are not available in that mode.[11] The total amount of physical memory addressable in AMD64 legacy mode using PSE 4-MB pages is, thus, 1024 GB.[6] Tom Shanley has called this extension PSE-40,[6] although such a designation does not appear in the official AMD documentation.[11]
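As a rough sketch of the wider scheme, the decoding can be parameterized by the number of physical address bits, with the upper bits taken from PDE bit 13 upward. The function name `large_page_base` and its `phys_bits` parameter are hypothetical, and the sketch does not check the PS bit or the reserved bits.

```python
def large_page_base(pde: int, phys_bits: int = 40) -> int:
    """Physical base of a PSE 4 MB page with phys_bits address bits (32..40)."""
    if not 32 <= phys_bits <= 40:
        raise ValueError("PSE large pages carry between 32 and 40 address bits")
    high_width = phys_bits - 32                    # 8 for the 40-bit scheme
    high = (pde >> 13) & ((1 << high_width) - 1)   # PDE bits 13 upward
    return (high << 32) | (pde & 0xFFC00000)


# Addressable memory with 40 bits, in GB (2**30 bytes each).
print((1 << 40) // (1 << 30))   # 1024
```

With `phys_bits=36` the function reduces to the PSE-36 case, and with `phys_bits=32` to plain PSE.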
Intel manuals (as of February 2014) also indicate support for up to 40 bits in PSE. The number of PSE address bits actually supported on an Intel CPU can be lower, however, and must be determined by invoking CPUID with function 80000008H and reading the maximum physical-address width from EAX[7:0].[12]
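The rule quoted from the Intel manual in reference [12] can be sketched as follows. The code operates on register values passed in by the caller rather than executing CPUID; the function names and sample EAX values are hypothetical.

```python
from typing import Optional


def max_phys_addr(eax_80000008: Optional[int], pae_supported: bool) -> int:
    """MAXPHYADDR from CPUID.80000008H:EAX[7:0].

    Pass None when the processor does not implement function 80000008H;
    the width is then generally 36 with PAE support and 32 otherwise.
    """
    if eax_80000008 is None:
        return 36 if pae_supported else 32
    return eax_80000008 & 0xFF


def pse_address_width(pse36_supported: bool, maxphyaddr: int) -> int:
    """Physical address bits (M) available to a PSE 4 MB page."""
    if not pse36_supported:
        return 32
    return min(40, maxphyaddr)


# Hypothetical EAX value reporting a 48-bit physical address width.
print(pse_address_width(True, max_phys_addr(0x3030, True)))   # 40
```

Keeping the two steps separate mirrors the manual's wording, where MAXPHYADDR is a property of the processor and M is derived from it only for PSE large pages.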
Usage
Practical usefulness of the PSE-36 feature depends on chipset support for more than 4 GB of RAM. Most chipsets from the Pentium II timeframe did not support this much memory: the maximum was 1 GB for the Intel 440BX, the typical desktop chipset, and 2 GB for the 440GX workstation chipset. Only the high-end Intel 450NX server chipset supported 8 GB.[2][13] Support for PSE-36 (ESMA) was thus usually advertised for servers.[4]
In 1998, Intel advertised Microsoft Windows NT Server, Enterprise Edition 4.0 and, according to the announcement, the upcoming NT 5.0 as suitable operating systems for PSE-36, both enabling its use via a PSE36 device driver.[1] The driver kept most of the operating system unaware of PSE-36 (only the PSE36 driver enabled it temporarily), and applications that wanted to access more than 4 GB had to call the driver.[6] Windows NT 4.0 Enterprise Edition thus used the PSE-36 feature essentially as a RAM disk.[3] The PSE36 driver was used by some applications on Windows NT 4.0 Enterprise Edition servers, for example SAP liveCache,[14] Microsoft SQL Server 7.0,[7] Oracle 8.1.5,[15] and IBM DB2.[16] The tuning documentation for the latter noted, however, that "Unfortunately in most cases performance gains obtained using the PSE-36 driver are not spectacular. In many cases the server will run slower with 8 GB using the PSE-36 driver than it runs with 4 GB without the driver. [...] After more than a year of experimentation and tuning, Microsoft and IBM dropped support for PSE-36 due to insufficient performance gains. The driver is still available for vendors from Intel, but it is not useful for end customer use."[16]
Windows 2000 (NT 5.0) ended up not supporting PSE-36,[7] due to its low performance compared with the alternative, PAE.[3] Windows 2000 also replaced the API of the PSE36 driver with a new API called Address Windowing Extensions (AWE), which used PAE underneath.[7][15] (AWE was only available in the Datacenter Server and Advanced Server editions of Windows 2000.) Windows applications consequently migrated to this new API, e.g. starting with Oracle 8.1.6[15] or MS SQL Server 2000.[7]
Compared to PAE
Physical Address Extension (PAE) is an alternative to PSE-36 which also allows 36-bit addressing. PSE-36 has the advantages that the hierarchy of page tables is not changed, and that page entries keep their old 32-bit format and are not extended to 64 bits. The obvious disadvantage of PSE-36 is that only large pages can be located in 64 GB of physical memory, and small pages can still be located only in the first 4 GB of physical memory.[3]
Intel Extended Server Memory Architecture
The Intel Extended Server Memory Architecture is defined to include two 36-bit addressing modes in the core processor: PAE-36 and PSE-36.[1]
References
- [1] "The Intel Extended Server Memory Architecture" (PDF). Intel Order Number: 243846-001. 1998. Retrieved 2014-03-01.
- [2] "Netfinity Performance Tuning with Windows NT 4.0" (PDF). Redbooks.ibm.com. pp. 51–52. Retrieved 2014-03-01.
- [3] "Operating Systems and PAE Support". Msdn.microsoft.com. 2006-07-14. Retrieved 2014-03-01.
- [4] Deni Connor (7 December 1998). "Here come the eight-way Xeon servers". Network World: The Leader in Network Knowledge. Network World: 19. ISSN 0887-7661.
- [5] Michael Missbach; Uwe M. Hoffmann (2000). SAP Hardware Solutions. Prentice Hall Professional. p. 62. ISBN 978-0-13-028084-8.
- [6] Tom Shanley (2009). x86 Instruction Set Architecture. MindShare Press. pp. 578–579. ISBN 9780977087853.
- [7] Sajal Dam (2004). SQL Server Query Performance Tuning Distilled. Apress. p. 28. ISBN 978-1-4302-0407-7.
- [8] Daniel P. Bovet; Marco Cesati (17 November 2005). Understanding the Linux Kernel. O'Reilly Media, Inc. p. 52. ISBN 978-0-596-55491-0.
- [9] "Intel Processor Identification and the CPUID Instruction". Intel application note AP-485. Archived 2013-07-24 at Wikiwix.
- [10] Tom Shanley (2005). The Unabridged Pentium 4: IA32 Processor Genealogy. Addison Wesley Professional. pp. 732–736. ISBN 978-0-321-24656-1.
- [11] AMD Corporation (September 2012). "Volume 2: System Programming" (PDF). AMD64 Architecture Programmer's Manual (3.22 ed.). AMD Corporation. pp. 25–26 and 125–126. Retrieved 2014-02-17.
- [12] "Intel 64 and IA-32 Architectures Software Developer's Manual, Volume 3A: System Programming Guide, Part 1" (PDF). Intel. pp. 4-5 and 4-11. Quote: "If the PSE-36 mechanism is not supported, M is 32, and this row does not apply. If the PSE-36 mechanism is supported, M is the minimum of 40 and MAXPHYADDR (this row does not apply if MAXPHYADDR = 32). See Section 4.1.4 for how to determine MAXPHYADDR and whether the PSE-36 mechanism is supported. [...] CPUID.80000008H:EAX[7:0] reports the physical-address width supported by the processor. (For processors that do not support CPUID function 80000008H, the width is generally 36 if CPUID.01H:EDX.PAE [bit 6] = 1 and 32 otherwise.) This width is referred to as MAXPHYADDR. MAXPHYADDR is at most 52."
- [13] Intel's Pentium II Xeon Processor. The New Chipsets For The Pentium II Xeon.
- [14] "How does the liveCache < 7.4 use PSE36/AWE". Stechno.net. 2003-04-04. Retrieved 2014-03-01.
- [15] Michael R. Ault (2003-02-17). "Increasing Available Memory in Linux and Windows" (PDF). ROBO Books White Paper. pp. 10–12. Retrieved 2014-03-01.
- [16] Tuning IBM xSeries Servers for Performance (PDF) (3rd ed.). IBM SG24-5287-02. June 2002. p. 97. Archived from the original (PDF) on 2014-03-03.