OpenMPI
0.1.1
Functions specific to implementing failover support.
#include "ompi_config.h"
#include "opal_stdint.h"
#include "btl_openib.h"
#include "btl_openib_endpoint.h"
#include "btl_openib_proc.h"
#include "btl_openib_failover.h"
#include "opal/util/opal_sos.h"
Functions
static void error_out_all_pending_frags (mca_btl_base_endpoint_t *ep, struct mca_btl_base_module_t *module, bool errout)
This function will find all the pending fragments on an endpoint and call the callback function with OMPI_ERROR.
static void mca_btl_openib_endpoint_notify (mca_btl_base_endpoint_t *endpoint, uint8_t type, int index)
This function is used to send a message to the remote side indicating the endpoint is broken and telling the remote side to bring its endpoint down as well.
void mca_btl_openib_dump_all_local_rdma_frags (mca_btl_openib_device_t *device)
void mca_btl_openib_dump_all_internal_queues (bool errout)
This function is a debugging tool.
static void dump_local_rdma_frags (mca_btl_openib_endpoint_t *endpoint)
void mca_btl_openib_handle_endpoint_error (mca_btl_openib_module_t *openib_btl, mca_btl_base_descriptor_t *des, int qp, ompi_proc_t *remote_proc, mca_btl_openib_endpoint_t *endpoint)
This function is called when we get an error on the completion event of a fragment.
void mca_btl_openib_handle_btl_error (mca_btl_openib_module_t *openib_btl)
This function allows an error to map out the entire BTL.
void btl_openib_handle_failover_control_messages (mca_btl_openib_control_header_t *ctl_hdr, mca_btl_openib_endpoint_t *ep)
This function gets called when a control message is received that is one of the following types: MCA_BTL_OPENIB_CONTROL_EP_BROKEN or MCA_BTL_OPENIB_CONTROL_EP_EAGER_RDMA_ERROR. Note that we are using the working connection to send information about the broken connection.
static void mca_btl_openib_endpoint_notify_cb (mca_btl_base_module_t *btl, struct mca_btl_base_endpoint_t *endpoint, struct mca_btl_base_descriptor_t *descriptor, int status)
Functions specific to implementing failover support.
This file is conditionally compiled into the BTL when one configures it in with --enable-openib-failover. When this file is compiled in, multi-BTL configurations can handle errors. The requirement is that more than one openib BTL must be in use, so that all the traffic can move to the other BTL. Failing over to a different BTL type, such as TCP, is not supported.
void btl_openib_handle_failover_control_messages (mca_btl_openib_control_header_t *ctl_hdr, mca_btl_openib_endpoint_t *ep)
This function gets called when a control message is received that is one of the following types: MCA_BTL_OPENIB_CONTROL_EP_BROKEN or MCA_BTL_OPENIB_CONTROL_EP_EAGER_RDMA_ERROR. Note that we are using the working connection to send information about the broken connection.
That is why we have to look at the various information in the control message to figure out which endpoint is broken. It is (obviously) not the one the message was received on, because we would not have received the message in that case. In the case of the BROKEN message, the remote side is notifying us that it has brought down its half of the connection. Therefore, we need to bring our half down. This is done because it has been observed that there are cases where only one side of the connection actually sees the error. This means we can be left in a state where one side believes it has two BTLs, but the other side believes it has only one. This can cause problems. The handling of the EAGER_RDMA_ERROR message is described elsewhere in the code.
ctl_hdr | Pointer to the control header that was received |
References mca_btl_base_endpoint_t::eager_rdma_local, mca_btl_base_endpoint_t::endpoint_proc, mca_btl_base_endpoint_t::endpoint_state, mca_btl_openib_device_t::endpoints, mca_btl_openib_module_t::error_cb, error_out_all_pending_frags(), mca_btl_openib_eager_rdma_local_t::head, mca_btl_openib_component_t::ib_num_btls, mca_btl_openib_module_t::lid, mca_btl_base_endpoint_t::nbo, opal_output_verbose(), opal_pointer_array_get_item(), opal_pointer_array_get_size(), mca_btl_openib_component_t::openib_btls, ORTE_PROC_MY_NAME, ompi_proc_t::proc_name, mca_btl_elan_proc_t::proc_ompi, mca_btl_base_endpoint_t::rem_info, and orte_process_name_t::vpid.
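The lookup described above, where the message arrives on a working connection and names a different, broken one, can be sketched as follows. This is a minimal illustration, not the openib BTL code: every `sketch_` type, field, and function is hypothetical, and matching on a peer process id plus a port LID is an assumption suggested only by the `vpid`, `lid`, and `rem_info` entries in the reference list.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified stand-ins; not the real openib BTL types. */
typedef enum { SKETCH_EP_CONNECTED, SKETCH_EP_FAILED } sketch_ep_state_t;

typedef struct {
    uint32_t peer_vpid;          /* assumed: id of the remote process */
    uint16_t peer_lid;           /* assumed: LID of the remote port */
    sketch_ep_state_t state;
} sketch_peer_endpoint_t;

typedef struct {
    uint32_t vpid;               /* assumed: id of the sending process */
    uint16_t lid;                /* assumed: LID of the sender's broken port */
} sketch_broken_msg_t;

/* Return the index of the endpoint the message refers to, or -1 if none
 * matches. The endpoint the message arrived on is skipped, because a
 * message can only be delivered over a connection that still works. */
static int sketch_find_broken(const sketch_peer_endpoint_t *eps, size_t n,
                              size_t arrived_on,
                              const sketch_broken_msg_t *msg)
{
    assert(eps != NULL && msg != NULL);
    for (size_t i = 0; i < n; i++) {
        if (i == arrived_on) {
            continue;
        }
        if (eps[i].peer_vpid == msg->vpid && eps[i].peer_lid == msg->lid) {
            return (int)i;
        }
    }
    return -1;
}

/* Bring our half of the connection down so that both sides agree on
 * which connections still exist. Returns the index that was marked. */
static int sketch_handle_broken(sketch_peer_endpoint_t *eps, size_t n,
                                size_t arrived_on,
                                const sketch_broken_msg_t *msg)
{
    int idx = sketch_find_broken(eps, n, arrived_on, msg);
    if (idx >= 0) {
        eps[idx].state = SKETCH_EP_FAILED;
    }
    return idx;
}
```

With two endpoints to the same peer on LIDs 10 and 11, a message naming LID 10 that arrives on the second endpoint marks only the first endpoint as failed.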
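static void error_out_all_pending_frags (mca_btl_base_endpoint_t *ep, struct mca_btl_base_module_t *module, bool errout)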
This function will find all the pending fragments on an endpoint and call the callback function with OMPI_ERROR.
It walks through each qp with each priority and looks for both no_credits_pending_frags and no_wqe_pending_frags. It then looks for any pending_lazy_frags, pending_put_frags, and pending_get_frags. This function is only called when running with failover support enabled. Note that the errout parameter allows the function to also be used as a debugging tool to see if there are any fragments on any of the queues.
ep | Pointer to endpoint that had error |
module | Pointer to module that had error |
errout | Boolean which says whether to error them out or not |
References mca_btl_base_descriptor_t::des_cbfunc, mca_btl_base_descriptor_t::des_flags, mca_btl_base_endpoint_t::endpoint_btl, mca_btl_openib_endpoint_qp_t::no_credits_pending_frags, mca_btl_openib_endpoint_qp_t::no_wqe_pending_frags, mca_btl_openib_component_t::num_qps, opal_list_get_size(), opal_list_remove_first(), opal_output_verbose(), mca_btl_base_endpoint_t::pending_get_frags, mca_btl_base_endpoint_t::pending_lazy_frags, and mca_btl_base_endpoint_t::pending_put_frags.
Referenced by btl_openib_handle_failover_control_messages(), mca_btl_openib_dump_all_internal_queues(), mca_btl_openib_handle_btl_error(), and mca_btl_openib_handle_endpoint_error().
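The queue walk described for error_out_all_pending_frags() can be sketched as below. This is a simplified illustration under stated assumptions, not the actual implementation: the `sketch_` types replace the real endpoint, fragment, and opal_list structures with a plain singly linked list, `SKETCH_ERROR` stands in for OMPI_ERROR, and the count of two priority levels per queue pair is assumed. Only the queue names are taken from the description above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define SKETCH_ERROR   (-1)  /* stand-in for OMPI_ERROR */
#define SKETCH_NUM_PRI 2     /* assumed number of priorities per qp */

typedef struct sketch_frag {
    struct sketch_frag *next;
    void (*cbfunc)(struct sketch_frag *frag, int status);
    int status;              /* written by the example callback below */
} sketch_frag_t;

typedef struct { sketch_frag_t *head; } sketch_list_t;

typedef struct {
    sketch_list_t no_credits_pending_frags[SKETCH_NUM_PRI];
    sketch_list_t no_wqe_pending_frags[SKETCH_NUM_PRI];
} sketch_qp_t;

typedef struct {
    sketch_qp_t *qps;
    size_t num_qps;
    sketch_list_t pending_lazy_frags;
    sketch_list_t pending_put_frags;
    sketch_list_t pending_get_frags;
} sketch_queue_endpoint_t;

/* Count the fragments on one list. When errout is true, also remove each
 * fragment and complete it through its callback with an error status. */
static size_t sketch_drain(sketch_list_t *list, bool errout)
{
    size_t count = 0;
    if (!errout) {
        for (sketch_frag_t *f = list->head; f != NULL; f = f->next) {
            count++;
        }
        return count;
    }
    while (list->head != NULL) {
        sketch_frag_t *f = list->head;
        list->head = f->next;
        f->next = NULL;
        if (f->cbfunc != NULL) {
            f->cbfunc(f, SKETCH_ERROR);
        }
        count++;
    }
    return count;
}

/* Visit the per-qp, per-priority queues first, then the endpoint-level
 * queues. Returns the number of fragments found. */
static size_t sketch_error_out_all(sketch_queue_endpoint_t *ep, bool errout)
{
    size_t total = 0;
    assert(ep != NULL);
    for (size_t qp = 0; qp < ep->num_qps; qp++) {
        for (int pri = 0; pri < SKETCH_NUM_PRI; pri++) {
            total += sketch_drain(&ep->qps[qp].no_credits_pending_frags[pri], errout);
            total += sketch_drain(&ep->qps[qp].no_wqe_pending_frags[pri], errout);
        }
    }
    total += sketch_drain(&ep->pending_lazy_frags, errout);
    total += sketch_drain(&ep->pending_put_frags, errout);
    total += sketch_drain(&ep->pending_get_frags, errout);
    return total;
}

/* Example callback: remember the status the fragment completed with. */
static void sketch_record_status(sketch_frag_t *frag, int status)
{
    frag->status = status;
}
```

Calling `sketch_error_out_all()` with `errout` set to false only reports how many fragments are queued, which mirrors the debugging use mentioned above; calling it with true empties the queues.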
void mca_btl_openib_dump_all_internal_queues (bool errout)
This function is a debugging tool.
If you notice a hang, you can call this function from a debugger and see if there are any messages stuck in any of the queues. If you call it with errout=true, then it will error them out. Otherwise, it will just print out the size of the queues with data in them.
References mca_btl_openib_device_t::endpoints, error_out_all_pending_frags(), mca_btl_openib_component_t::ib_num_btls, opal_pointer_array_get_item(), opal_pointer_array_get_size(), and mca_btl_openib_component_t::openib_btls.
static void mca_btl_openib_endpoint_notify (mca_btl_base_endpoint_t *endpoint, uint8_t type, int index)
This function is used to send a message to the remote side indicating the endpoint is broken and telling the remote side to bring its endpoint down as well.
This is needed because there are cases where only one side of the connection determines that there was a problem.
endpoint | Pointer to endpoint with error |
type | Type of message to be sent, can be one of two types |
index | Nonzero when sending the RDMA error message |
References mca_btl_base_endpoint_t::endpoint_btl, mca_btl_base_endpoint_t::endpoint_proc, mca_btl_openib_device_t::endpoints, mca_btl_openib_component_t::ib_num_btls, mca_btl_base_endpoint_t::nbo, opal_output_verbose(), opal_pointer_array_get_item(), opal_pointer_array_get_size(), mca_btl_openib_component_t::openib_btls, ORTE_PROC_MY_NAME, and mca_btl_elan_proc_t::proc_ompi.
Referenced by mca_btl_openib_handle_btl_error() and mca_btl_openib_handle_endpoint_error().
void mca_btl_openib_handle_btl_error (mca_btl_openib_module_t *openib_btl)
This function allows an error to map out the entire BTL.
First a call is made up to the PML to map out all connections from this BTL. Then a message is sent to all the endpoints connected to this BTL. This function is enabled by the btl_openib_port_error_failover MCA parameter. If that parameter is not set, then this function does not do anything.
openib_btl | Pointer to BTL that had the error |
References mca_btl_base_endpoint_t::endpoint_state, mca_btl_openib_device_t::endpoints, mca_btl_openib_module_t::error_cb, error_out_all_pending_frags(), mca_btl_openib_module_t::lid, mca_btl_openib_endpoint_notify(), opal_pointer_array_get_item(), and opal_pointer_array_get_size().
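The order of operations described for mca_btl_openib_handle_btl_error() can be sketched as below. This is an illustration only: the `sketch_` types and functions are hypothetical, the global flag stands in for the btl_openib_port_error_failover MCA parameter, and both skipping endpoints that are not connected and marking notified endpoints as failed are assumptions rather than documented behavior.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

typedef enum { SKETCH_EP_CONNECTED, SKETCH_EP_FAILED } sketch_state_t;

typedef struct {
    sketch_state_t state;
    int notified;                /* how many "broken" messages were sent */
} sketch_ep_t;

typedef struct sketch_btl {
    sketch_ep_t *endpoints;
    size_t num_endpoints;
    /* Stand-in for the error callback through which the PML is reached. */
    void (*error_cb)(struct sketch_btl *btl);
    int mapped_out;              /* set by the example callback below */
} sketch_btl_t;

/* Stand-in for the btl_openib_port_error_failover MCA parameter. */
static bool sketch_port_error_failover = false;

/* Example callback: record that the upper layer mapped this BTL out. */
static void sketch_pml_map_out(sketch_btl_t *btl)
{
    btl->mapped_out = 1;
}

/* Stand-in for sending the "endpoint broken" control message. */
static void sketch_notify(sketch_ep_t *ep)
{
    ep->notified++;
}

static void sketch_handle_btl_error(sketch_btl_t *btl)
{
    assert(btl != NULL);
    if (!sketch_port_error_failover) {
        return;                  /* parameter not set: do nothing */
    }
    /* Step 1: call up so that all connections from this BTL are mapped out. */
    if (btl->error_cb != NULL) {
        btl->error_cb(btl);
    }
    /* Step 2: send a message to each endpoint connected to this BTL. */
    for (size_t i = 0; i < btl->num_endpoints; i++) {
        sketch_ep_t *ep = &btl->endpoints[i];
        if (ep->state != SKETCH_EP_CONNECTED) {
            continue;            /* assumed: nothing to do for this one */
        }
        ep->state = SKETCH_EP_FAILED;
        sketch_notify(ep);
    }
}
```

With the flag left at false the call returns without touching anything, which corresponds to the parameter being unset.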
void mca_btl_openib_handle_endpoint_error (mca_btl_openib_module_t *openib_btl, mca_btl_base_descriptor_t *des, int qp, ompi_proc_t *remote_proc, mca_btl_openib_endpoint_t *endpoint)
This function is called when we get an error on the completion event of a fragment.
We check to see what type of fragment it is and act accordingly. In most cases, we first call up into the PML and have it map out this connection for any future communication. In addition, this function will possibly send some control messages over the other openib BTL. The first control message will tell the remote side to also map out this connection. The second control message makes sure the eager RDMA connection remains in a sane state. See that function for more details.
openib_btl | Pointer to BTL that had the error |
des | Pointer to descriptor that had the error |
qp | Queue pair that had the error |
remote_proc | Pointer to process that had the error |
endpoint | Pointer to endpoint that had the error |
References mca_btl_base_descriptor_t::des_cbfunc, mca_btl_base_descriptor_t::des_flags, mca_btl_base_descriptor_t::des_src, mca_btl_base_endpoint_t::endpoint_state, mca_btl_openib_module_t::error_cb, error_out_all_pending_frags(), mca_btl_base_endpoint_t::get_tokens, mca_btl_openib_module_t::lid, mca_btl_openib_endpoint_notify(), opal_list_remove_first(), opal_output_verbose(), and OPAL_THREAD_ADD32.