
Request ordering for FI_MSG and FI_RDM endpoints

Request ordering for FI_MSG and FI_RDM endpoints. 29 April ’14. Something is needed so that consumers of libfabric stay sane: a few types of endpoints with simple ordering rules that are reasonably easy to understand.





Presentation Transcript


  1. Request ordering for FI_MSG and FI_RDM endpoints, 29 April ’14

  2. Something is needed so consumers of libfabric stay sane
     • A few types of endpoints with simple ordering rules that are reasonably easy to understand
     • The ordering rules should allow enough flexibility that different providers can deliver maximum performance while still ensuring program correctness

  3. Background – what is ordering? Ordering is an end-to-end concept and may include some or all of the following:
     • Expectations of the API consumer w.r.t. the order of execution of operations posted to the fabric provider
     • Execution of operations as expressed on the wire
     • Ordering of information as packets/flits/msgs cross the wire
     • Ordering of inbound operations (both inbound requests and inbound RDMA operations)
     • Order in which inbound data is placed on the memory bus
     • Order in which inbound data is written to memory by the memory controller
     • Expectations of the API consumer w.r.t. the order in which operations are completed/notified

  4. IB: the concise ordering rules
     • Operations on the SEND queue are transmitted on the wire in order.
     • Operations at the RESPONDER side are executed in the order received. A SEND or RDMA WRITE may be executed before an RDMA READ!
     • Operations on the SEND queue are completed in the order in which they were posted.
     [Figure: SEND (1), RDMA RD (2) and SEND (3) flow to the responder; READ DATA, ACK 1 and ACK 3 flow back, with a "???" annotation]
     www.openfabrics.org

  5. What does man -l man/fi_getinfo.3 have now?
     • FI_MSG - Provides reliable, in-order message based communication, with data transfers maintaining message boundaries.
       – Hmm. Okay, so if you bought an adaptive network, you wasted your money.
     • FI_RDM - Provides reliable datagram communication without ordering guarantees.
       – Hmm. Okay, does PSM really work this way? MPI can’t use this mode easily.

  6. It’s worse than that
     • IB provides ordering only on operations posted to a given QP.
     • The QP construct binds together operations of different types in order to provide ordering guarantees between different operations, e.g. between message and RDMA operations.
     • How is that accomplished using the fabric interfaces?

  7. What would be nicer…
     • FI_MSG - Provides reliable, message based communication, with data transfers maintaining message boundaries. Messages are ordered by default, with relaxed order optionally supported on a per-message basis.
     • FI_RDM - Provides reliable datagram communication. By default, ordering is not guaranteed, although for datagrams targeting a given network endpoint, a sequence of datagrams can be specified as an ordered sequence.

  8. Using relaxed order in MPI – rendezvous example
     [Figure: the requester issues WR0, SndCmp0, WR1 and SndCmp1 to the responder; "ordering required" links each RDMA write WRn to the SndCmpn that follows it, while two "No ordering dependency" labels mark operations that may be reordered]

  9. PCIe transaction order rules (for a given TC, source and target)
     Producer/consumer model
     [Table: ordering of first op vs. second op. Legend: strict producer/consumer model, i.e. strict order; "I’m feeling lucky and have set the RO bit in the second request".]
     Requesting relaxed order doesn’t mean you’ll get it, hence y/n. If you’re a HW designer, don’t count on relaxed order.

  10. IB transaction ordering rules (RC)
      [Table: ordering of first op vs. second op. Legend: no ordering guarantee, don’t count on order if you want correctness; "I need order and have set the fence bit in the second transaction"; strict producer-consumer model.]
      March 30 – April 2, 2014 #OFADevWorkshop

  11. Libfabric FI_MSG now – depending on interpretation of fi_getinfo.3
      [Table: ordering of first op vs. second op]

  12. Libfabric FI_MSG with optional relaxed order proposal
      [Table: ordering of first op vs. second op. Legend: a) IB RC-like behavior (default); b) relaxed order bit set in flag (SendMsg, etc.); c) fence bit set in flag.]
      Provider is free to ignore b), must observe c).

  13. Ordering bits for FI_MSG
      • Add new flag bit for sendmsg/writemsg - FI_RELAXED_ORDER
        – If this bit is set, the MSG, RMA, or AMO operation may be completed ahead of pending MSG, RMA, or AMO ops in the EP’s send queue.
        – Messages may appear to complete out of order when this bit is set.
      • Add new flag bit for sendmsg/writemsg - FI_FENCE_GLOBAL
        – If this bit is set, the operation posted to the EP will not be initiated until all previously posted MSG, RMA, and AMO ops to the EP have completed globally.
        – fi_ep_sync sounds blocking; this is a potentially non-blocking way to do a fence.

  14. Libfabric FI_RDM now – depending on interpretation of fi_getinfo.3
      [Table: ordering of first op vs. second op]
      Not enough order? fi_ep_sync seems kind of heavyweight.

  15. Libfabric FI_RDM suggestion – HyperTransport ordered sequences
      [Table: ordering of first op vs. second op]
      a) Default, as in the man page. The app must use fi_ep_sync for ordering.
      b) Applies if the second op is in the same ordered sequence. Ops must be back-to-back.

  16. Ordering bits for FI_RDM
      • Add new flag bit for sendmsg/writemsg - FI_ORDERED_SEQ
        – If this bit is set, the message, RMA, or AMO operation is treated as part of an ordered sequence. The sequence number is specified in the flow field of the fi_msg (etc.) argument.
      • MSG, RMA, and AMO requests within an ordered sequence must be posted sequentially to a given endpoint, without intervening requests that are not part of the ordered sequence. The operations must all target the same target address.
      • Some providers may be able to do this efficiently; otherwise the behavior is as if fi_ep_sync were invoked internally between each operation.
