[openib-general] Re: ib_mad: Scenarios for returning posted send MADs
Hal Rosenstock
Mon Oct 4 12:34:51 PDT 2004
On Mon, 2004-10-04 at 13:52, Sean Hefty wrote:
Hal Rosenstock <halr at voltaire.com> wrote:
> > 1. In the case that a client unregisters with the MAD layer, there is
> > code which cleans up the agent send list. However, it does not appear to
> > me that if the send completion occurs after the deregistration that this
> > completion is thrown away properly but rather a callback may be
> > performed. Did I miss something here ?
>
> A reference on the MAD agent is taken whenever a work request is
> posted to the QP. An additional reference is taken on the MAD
> agent if the MAD has a timeout, indicating that a response MAD is
> expected. When RMPP is added, a single send may result in multiple
> references being taken on the MAD agent.
>
> The reference per work request is not released until the work
> request completes. The reference for the response is not released
> until the response has been received, the request times out, or is
> canceled.
>
> When a client deregisters, MADs waiting for responses are canceled.
> This decrements their reference counts. If the MAD had no other
> references, then it is done and may be completed. If it still has
> references, this indicates that it has active work requests on the
> QP that must complete before the send MAD can complete.
>
> This is why the deregistration code decrements the reference count,
> then checks the reference count before flushing the request.
I am pretty sure there is a window here as follows:
First, deregistration cancels the MAD removing it from the agent send
list.
ib_mad_complete_send_wr is invoked some time later and never checks for
the send WR still being on the agent send list. It just assumes it is.
It potentially makes a send callback.
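
To make the window concrete, here is a rough user-space sketch. It is not
the ib_mad code: the struct, its fields, and every model_* function are
hypothetical stand-ins invented for illustration. It only models the
sequence described above (cancel unlinks the send, the work request
completes later) and one possible guard, namely checking list membership
before making the callback.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, simplified stand-in for a posted send MAD. */
struct send_wr_model {
    int refcount;       /* one per posted work request, plus one if a
                           response is expected (timeout set) */
    bool on_send_list;  /* true while linked on the agent send list */
    int callbacks;      /* send callbacks delivered to the client */
};

/* Post a send: take the work request reference, and a second one
 * when a response is expected. */
void model_post_send(struct send_wr_model *wr, bool expects_response)
{
    wr->refcount = expects_response ? 2 : 1;
    wr->on_send_list = true;
    wr->callbacks = 0;
}

/* Cancel on deregistration: unlink from the send list and drop the
 * response reference if one is held. Returns true when no references
 * remain, i.e. no work request is still outstanding on the QP. */
bool model_cancel(struct send_wr_model *wr)
{
    wr->on_send_list = false;
    if (wr->refcount > 1)
        wr->refcount--;
    return wr->refcount == 0;
}

/* Completion path without a membership check: the callback is made
 * even though the send was already canceled and unlinked. */
void model_complete_unchecked(struct send_wr_model *wr)
{
    assert(wr->refcount > 0);
    if (--wr->refcount == 0)
        wr->callbacks++;
}

/* Completion path with the check: the callback is skipped when the
 * send is no longer on the agent send list. */
void model_complete_checked(struct send_wr_model *wr)
{
    assert(wr->refcount > 0);
    if (--wr->refcount == 0 && wr->on_send_list) {
        wr->on_send_list = false;
        wr->callbacks++;
    }
}
```

Posting with a response expected, canceling, and then completing gives one
callback through the unchecked path and none through the checked path.
Whether the late completion should be dropped or reported some other way is
a separate question; the sketch only shows where the check would sit.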
> > 2. Another scenario for this is on WC errors which currently attempt to
> > restart the port. I am not sure all WC errors should do this. Perhaps
> > only IB_WC_FATAL_ERR and IB_WC_GENERAL_ERR.
>
> My thought is that work requests that result in a failure should be
> completed in error from the port layer to the MAD agent. The port
> layer _could_ then restart operations with the next work request,
> and the MAD agent would complete the send MAD to the user in error.
Aren't some errors fine grained and pertain only to the WR supplied
whereas other errors are coarser (like fatal and general) and might
apply to something larger (perhaps the port but maybe the QP) ? I wonder
whether there is any assistance in the Mellanox documentation as to
which errors should be treated how.
>
> Of course, throwing RMPP into this complicates the matter, since
> the work request immediately behind the one causing the failure
> might be another request associated with the same RMPP MAD, which
> may cause another failure...
>
> It would help in this case for the port layer code to
> just return completions for all queued work requests to the MAD
> agents, and let the MAD agent code deal with the issue.
True for most errors. Not sure about fatal and general errors yet.
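
As a strawman for the fine grained versus coarse split, something like the
following could be used. The enums and model_classify_wc are hypothetical
and defined only for this sketch; the status names merely echo the IB_WC_*
values mentioned above. Treating only fatal and general errors as port
level is the suggestion under discussion, not a settled rule.

```c
#include <assert.h>

/* Local stand-ins for a few work completion statuses. */
enum model_wc_status {
    MODEL_WC_SUCCESS,
    MODEL_WC_LOC_LEN_ERR,
    MODEL_WC_WR_FLUSH_ERR,
    MODEL_WC_FATAL_ERR,
    MODEL_WC_GENERAL_ERR
};

/* What the port layer would do with a completion. */
enum model_wc_action {
    MODEL_ACT_NONE,          /* successful completion */
    MODEL_ACT_FAIL_WR,       /* complete just this work request in error */
    MODEL_ACT_RESTART_PORT   /* coarse error: restart the port */
};

/* Assumed policy: fatal and general errors are port level, every other
 * error pertains only to the work request that reported it. */
enum model_wc_action model_classify_wc(enum model_wc_status status)
{
    /* Reject values outside the set modeled here. */
    assert(status >= MODEL_WC_SUCCESS && status <= MODEL_WC_GENERAL_ERR);

    switch (status) {
    case MODEL_WC_SUCCESS:
        return MODEL_ACT_NONE;
    case MODEL_WC_FATAL_ERR:
    case MODEL_WC_GENERAL_ERR:
        return MODEL_ACT_RESTART_PORT;
    default:
        return MODEL_ACT_FAIL_WR;
    }
}
```

With this split a flushed or length error would fail only its own work
request, and the port layer could continue with the next one.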
> > 3. The final scenario is board (not currently possible) or module
> > removal. My concern here is about potential send callbacks (indicating
> > FLUSHED) to a potentially stale MAD agent. When the module is removed
> > non-forcibly, the clients (upper layer modules) would need to be
> > removed first, which should cause the proper deregistration (and these
> > MADs would be cancelled so there would be none to clean up). I am not
> > sure what the rules for proper behavior are on forcible module removal.
> > Board removal would be similar to this (the forcible module removal
> > case).
>
> Deregistration is a synchronous process, so it will wait until all
> send MADs have completed. If this isn't happening, then the
> reference counting is off somewhere.
I think deregistration is fine (short of issue 1 which I think is
readily fixable). I was more asking about the asynchronous scenario here
(forced module (or board) removal) where that isn't the case.
-- Hal