OpenMPI 0.1.1
mpool_rgpusm_module.c File Reference

This memory pool is used for getting the memory handle of remote GPU memory when using CUDA. More...

#include "ompi_config.h"
#include "opal/align.h"
#include "orte/util/name_fns.h"
#include "orte/runtime/orte_globals.h"
#include "ompi/mca/mpool/rgpusm/mpool_rgpusm.h"
#include <errno.h>
#include <string.h>
#include "ompi/mca/rcache/rcache.h"
#include "ompi/mca/rcache/base/base.h"
#include "ompi/mca/mpool/base/base.h"
#include "ompi/runtime/params.h"
#include "ompi/mca/common/cuda/common_cuda.h"

Macros

#define OPAL_DISABLE_ENABLE_MEM_DEBUG   1
 
#define SET_PAGE_ALIGNMENT_TO_ZERO()
 
#define RESTORE_PAGE_ALIGNMENT()   mca_mpool_base_page_size_log = saved_page_size;
 
#define RGPUSM_MPOOL_NREGS   100
 

Functions

static bool mca_mpool_rgpusm_deregister_lru (mca_mpool_base_module_t *mpool)
 
void mca_mpool_rgpusm_module_init (mca_mpool_rgpusm_module_t *mpool)
 
int mca_mpool_rgpusm_register (mca_mpool_base_module_t *mpool, void *addr, size_t size, uint32_t flags, mca_mpool_base_registration_t **reg)
 register block of memory
 
void mca_mpool_rgpusm_free (mca_mpool_base_module_t *mpool, void *addr, mca_mpool_base_registration_t *registration)
 free function More...
 
int mca_mpool_rgpusm_find (struct mca_mpool_base_module_t *mpool, void *addr, size_t size, mca_mpool_base_registration_t **reg)
 find registration for a given block of memory
 
static bool registration_is_cachebale (mca_mpool_base_registration_t *reg)
 
int mca_mpool_rgpusm_deregister (struct mca_mpool_base_module_t *mpool, mca_mpool_base_registration_t *reg)
 deregister memory
 
void mca_mpool_rgpusm_finalize (struct mca_mpool_base_module_t *mpool)
 finalize mpool
 
int mca_mpool_rgpusm_ft_event (int state)
 Fault Tolerance Event Notification Function. More...
 

Variables

static size_t saved_page_size
 

Detailed Description

This memory pool is used for getting the memory handle of remote GPU memory when using CUDA.

Hence the name "rgpusm", for remote CUDA GPU memory. A cache can be used to store the remote handles in case they are reused, saving the registration cost, which can be expensive (on the order of 100 microseconds). The cache can also be used simply to track how many handles are in use at a time. It is best to look at this through the four different scenarios that are possible.

  1. mpool_rgpusm_leave_pinned=0, cache_size=unlimited
  2. mpool_rgpusm_leave_pinned=0, cache_size=limited
  3. mpool_rgpusm_leave_pinned=1, cache_size=unlimited (default)
  4. mpool_rgpusm_leave_pinned=1, cache_size=limited.

Case 1: The cache is unused, and remote memory is registered and unregistered for each transaction. The amount of outstanding registered memory is unlimited.

Case 2: The cache keeps track of how much memory is registered at a time. Since leave pinned is 0, any memory that is registered is in use. If the amount to register would exceed the limit, we error out. This could be handled more gracefully, but it is not a common way to run, so we leave it as is.

Case 3: The cache is needed to track current and past transactions, but there is no limit on how many it can store. Therefore, once memory enters the cache and gets registered, it stays that way forever.

Case 4: The cache is needed to track current and past transactions. In addition, a list of most recently used (but no longer in use) registrations is kept so that entries can be evicted from the cache; evicted registrations are also deregistered (sketched below).
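For Case 4, an eviction pass over the LRU list might look roughly like the sketch below. This is a minimal illustration, not the module's actual mca_mpool_rgpusm_deregister_lru; the lru_list member and the rcache_delete call are assumptions about the surrounding structures.

    /* Sketch only: evict the least recently used, no-longer-in-use
     * registration.  The lru_list member and the rcache_delete usage
     * are assumptions for illustration, not the module's internals. */
    static bool sketch_deregister_lru(mca_mpool_rgpusm_module_t *mpool_rgpusm)
    {
        mca_mpool_base_registration_t *old_reg;

        /* The oldest unused registration sits at the head of the list. */
        old_reg = (mca_mpool_base_registration_t *)
            opal_list_remove_first(&mpool_rgpusm->lru_list);
        if (NULL == old_reg) {
            return false;               /* nothing left to evict */
        }

        /* Drop it from the cache so later lookups miss, then release
         * the underlying handle via the pool's deregister path. */
        mpool_rgpusm->super.rcache->rcache_delete(mpool_rgpusm->super.rcache,
                                                  old_reg);
        mca_mpool_rgpusm_deregister(&mpool_rgpusm->super, old_reg);
        return true;
    }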

I also want to capture how we can run into the case where we do not find something in the cache, but when we try to register it, we get an error back from the CUDA library saying the memory is in use. This can happen in the following scenario. The application mallocs a buffer of size 32K. The library loads this into the cache and registers it. The application then frees the buffer and mallocs a buffer of size 64K, and this malloc returns the same base address as the first 32K allocation. The library searches the cache, but since the size is larger than the original allocation, it does not find the registration. It then attempts to register the buffer, and the CUDA library returns an error saying it is already mapped. To handle this, we return OMPI_ERR_WOULD_BLOCK to the memory pool. The memory pool then looks for the registration based on the base address and a size of 4 bytes; the small size ensures that we find the registration. That registration is evicted, and we try to register again.
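That flow could be sketched as follows. The helpers try_cuda_register and find_and_evict are hypothetical stand-ins for the real registration and eviction paths; only the control flow mirrors the text above.

    /* Sketch only: register, and on OMPI_ERR_WOULD_BLOCK evict the stale
     * overlapping registration and retry.  try_cuda_register and
     * find_and_evict are hypothetical helpers, not module functions. */
    static int try_cuda_register(mca_mpool_base_module_t *mpool, void *addr,
                                 size_t size,
                                 mca_mpool_base_registration_t **reg);
    static void find_and_evict(mca_mpool_base_module_t *mpool, void *addr,
                               size_t size);

    static int sketch_register(mca_mpool_base_module_t *mpool, void *addr,
                               size_t size,
                               mca_mpool_base_registration_t **reg)
    {
        int rc = try_cuda_register(mpool, addr, size, reg);
        if (OMPI_ERR_WOULD_BLOCK == rc) {
            /* The base address is still mapped under an older, smaller
             * registration.  Look it up with a tiny size (4 bytes) so
             * the search cannot miss, evict it, and register again. */
            find_and_evict(mpool, addr, 4);
            rc = try_cuda_register(mpool, addr, size, reg);
        }
        return rc;
    }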

Macro Definition Documentation

#define SET_PAGE_ALIGNMENT_TO_ZERO ( )
Value:
saved_page_size = mca_mpool_base_page_size_log; \
mca_mpool_base_page_size_log = 0;
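These macros appear to exist so that a remote GPU buffer is cached at its exact base address: mca_mpool_base_page_size_log presumably drives page rounding in the base code, which would be wrong for CUDA memory handles, so it is zeroed for the duration of the operation and then restored. A usage sketch (the rcache_insert call is illustrative, not taken from the module):

    /* Sketch only: bracket a cache insert so the registration is not
     * rounded down to a page boundary. */
    SET_PAGE_ALIGNMENT_TO_ZERO();
    rc = mpool->rcache->rcache_insert(mpool->rcache,
                                      (mca_mpool_base_registration_t *) reg,
                                      0);
    RESTORE_PAGE_ALIGNMENT();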

Function Documentation

void mca_mpool_rgpusm_free ( mca_mpool_base_module_t *  mpool,
void *  addr,
mca_mpool_base_registration_t *  registration 
)

free function

free memory allocated by alloc function

References mca_mpool_rgpusm_deregister().

int mca_mpool_rgpusm_ft_event ( int  state)

Fault Tolerance Event Notification Function.

Parameters
state  Checkpoint State
Returns
OMPI_SUCCESS or failure status