The default policy for the TTM page limit is 50%. As AI model usage
increases, users increasingly need to set higher limits for the amount
of memory that TTM can utilize.

This is normally done in one of two ways:
1) Increasing the carve out (VRAM) size on a UMA system.
2) Increasing the TTM pages limit module parameter.

Increasing the carve out size has the unfortunate side effect that
the memory can't be reclaimed for anything other than GPU use.

Increasing the TTM page limit works, but can be a bit clunky:
 * If you have a UKI with ttm included and all the kernel parameters
   wrapped inside it, changing the value means rebuilding the UKI.
 * If you have ttm compiled into the kernel a modprobe.d file won't work.
 * If you have ttm in the initramfs then the initramfs needs to be
   rebuilt.
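
For reference, the modprobe.d route (which only works when ttm is built
as a module) looks something like this; the value is given in pages, and
the numbers below are just a worked example:

```
# /etc/modprobe.d/ttm.conf -- no effect if ttm is built into the kernel
# pages_limit is in pages: 32 GiB / 4 KiB = 8388608 pages; 75% = 6291456
options ttm pages_limit=6291456
```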

I wanted to come up with an alternative method to set this limit that
could potentially be wrapped by tools, or even by a knob/slider in the
system firmware.

My idea was that we could allocate an EFI variable that TTM will look at
to see what value was configured.
* If the user configured the module parameter, use that.
* If the user configured the EFI variable and it's sane, use that value.
* If the user configured the EFI variable but it's insane, cap it.
* If there is no EFI variable, stick to the 50% policy.
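
A minimal sketch of that precedence in Python (the variable name, GUID,
and the u32 percentage encoding are assumptions for illustration; the
actual patch defines its own):

```python
import struct

EFIVARFS = "/sys/firmware/efi/efivars"


def read_efi_u32(name, guid):
    """Read a u32 EFI variable via efivarfs.

    efivarfs prefixes the payload with a 4-byte attributes word, so a
    u32 variable is 8 bytes on disk.  Returns None if the variable is
    absent or malformed.
    """
    try:
        with open(f"{EFIVARFS}/{name}-{guid}", "rb") as f:
            data = f.read()
    except OSError:
        return None
    if len(data) < 8:
        return None
    return struct.unpack_from("<I", data, 4)[0]


def resolve_page_limit(module_param, efi_percent,
                       default_percent=50, max_percent=100):
    """Apply the precedence above: module parameter wins, then a sane
    EFI value, an insane EFI value is capped, else the 50% default."""
    if module_param is not None:
        return module_param
    if efi_percent is None:
        return default_percent
    return min(efi_percent, max_percent)
```

So `resolve_page_limit(None, 250)` caps to 100, while any explicit
module parameter short-circuits the EFI lookup entirely.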

Another potential advantage of this is that a vendor shipping a
workstation "intended" for AI use could pre-populate this EFI variable
so that larger models can be loaded without extra effort from the user.

Mario Limonciello (2):
  drm/ttm: Add EFI variable support for page limit configuration
  tools/drm: Add TTM EFI variable configuration utility

 drivers/gpu/drm/ttm/ttm_tt.c |  95 ++++++++++-
 tools/drm/ttm_efi_config.py  | 303 +++++++++++++++++++++++++++++++++++
 2 files changed, 396 insertions(+), 2 deletions(-)
 create mode 100755 tools/drm/ttm_efi_config.py

-- 
2.53.0
