[openib-general] Re: ib_mad: Scenarios for returning posted send MADs

Tue Oct 5 06:31:43 PDT 2004

On Mon, 2004-10-04 at 15:52, Sean Hefty wrote:
> Hal Rosenstock <halr at voltaire.com> wrote:
> 
> > I am pretty sure there is a window here as follows:
> > First, deregistration cancels the MAD removing it from the agent send
> > list.
> > ib_mad_complete_send_wr is invoked some time later and never checks for
> > the send WR still being on the agent send list. It just assumes it is.
> > It potentially makes a send callback.
> 
> The deregistration only removes the mad_send_wr from the agent send list 
> if its reference count is zero.  A reference is held on the mad_send_wr 
> from the time that a work request is posted to the port, until a completion 
> is reported.  So, you should never get a callback for a mad_send_wr, 
> unless its reference count is at least one.

Do you mean that you should never get a callback for a mad_send_wr if
(rather than unless) its reference count is at least one?

Cancelling the MAD decrements the reference count, which has the effect
of moving the callback one stage ahead. Either the send completion has
already occurred, in which case the reference count drops to 0 (or
below) and the callback is invoked immediately, or it has not yet
occurred, in which case the callback is invoked when the send completion
subsequently arrives. In the latter case the MAD stays on both the port
and agent send lists until then. The other removal scenarios also
interplay with this.

> > Aren't some errors fine grained and pertain only to the WR supplied
> > whereas other errors are coarser (like fatal and general) and might
> > apply to something larger (perhaps the port but maybe the QP) ? I wonder
> > whether there is any assistance in the Mellanox documentation as to
> > which errors should be treated how.
> 
> I was referring to errors that applied to a single work request only.  
> For fatal errors that we cannot recover from, we may need a way to report 
> such errors to the user to indicate that their mad_agent is no longer 
> operational.

In looking at the programmer's guide and the mthca driver, I found the
following:
- All CQE syndromes are converted to the appropriate WC status.
- An unknown syndrome is reported as a general error.
- Fatal error appears to me to be currently unused.

> > > It would help in this case for the port layer code 
> > > just return completions for all queued work requests to the MAD 
> > > agents, and let the MAD agent code deal with the issue.
> > 
> > True for most errors. Not sure about fatal and general errors yet.
> 
> I think it would depend on the error code that was reported in the 
> send_mad_wc. If the return code is flushed, the mad_agent could just 
> repost the send.  

Agreed.

> If the return code is fatal error, it should complete the MAD to the client.

I couldn't find any fatal errors in use. I think the same would be true
for a general error.
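
Roughly, the handling being discussed would look like the sketch below.
The enum values and the function name are made up for illustration and
are not the real WC status codes. Treating every other error as
"complete the MAD to the client with an error" is my assumption.

```c
/* Hypothetical status codes standing in for the work completion status. */
enum model_wc_status {
    MODEL_WC_SUCCESS,
    MODEL_WC_FLUSH_ERR,    /* work request was flushed */
    MODEL_WC_GENERAL_ERR,  /* e.g. an unknown CQE syndrome */
    MODEL_WC_OTHER_ERR     /* any other error tied to this work request */
};

/* What the MAD agent code does with a completed send. */
enum model_action {
    MODEL_COMPLETE_OK,     /* report success to the client */
    MODEL_REPOST,          /* post the send again */
    MODEL_COMPLETE_ERR     /* complete the MAD to the client with an error */
};

static enum model_action model_handle_send_status(enum model_wc_status status)
{
    switch (status) {
    case MODEL_WC_SUCCESS:
        return MODEL_COMPLETE_OK;
    case MODEL_WC_FLUSH_ERR:
        /* A flushed send can simply be reposted by the mad_agent. */
        return MODEL_REPOST;
    case MODEL_WC_GENERAL_ERR:
    default:
        /* Assumed: anything else goes back to the client as an error. */
        return MODEL_COMPLETE_ERR;
    }
}
```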

> > > > 3. The final scenario is board (not currently possible) or module
> > > > removal. My concern here is about potential send callbacks (indicating
> > > > FLUSHED) to a potentially stale MAD agent. When the module is removed
> > > > non-forcibly, the clients (upper layer modules) would need to be
> > > > removed first, which should cause the proper deregistration (and these
> > > > MADs would be cancelled so there would be none to cleanup). I am not
> > > > sure what the rules for proper behavior are on forced module removal.
> > > > Board removal would be similar to this (the forced module removal
> > > > case).
> > > 
> > > Deregistration is a synchronous process, so it will wait until all 
> > > send MADs have completed.  If this isn't happening, then the 
> > > reference counting is off somewhere.
> > 
> > I think deregistration is fine (short of issue 1 which I think is
> > readily fixable). I was more asking about the asynchronous scenario here
> > (forced module (or board) removal) where that isn't the case.
> 
> Unless there's a bug in the code, I don't believe that we can have send 
> callbacks to stale MAD agents.  If you're trying to have the code deregister 
> for a client, this would be impossible.  Clients should receive some sort 
> of removal notification event and would need to deregister in response 
> to that event.

It all depends on the ordering of these shutdown events. If the removal
event went to all modules simultaneously, there would need to be an
interlock to prevent this from occurring.

In any case, what should the MAD layer do when there are posted sends on
an agent list? Should it just dump them and not attempt to make a
callback? The downside is that there are legitimate scenarios where
this might occur, and not making the send callback has a number of side
effects beyond the individual client (PCI mappings left in place and
memory leaked).
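
For comparison, the alternative to dumping them would be to hand each
remaining send back to the client as flushed, so the client gets a
chance to clean up. The following is only a user-space sketch of that
idea. The list type, the status value, and the handler signature are
hypothetical, and free() stands in for whatever unmapping and freeing
the client would really do.

```c
#include <stdlib.h>

#define MODEL_STATUS_FLUSHED 1  /* hypothetical status value */

/* Hypothetical posted send on an agent list. */
struct model_send {
    struct model_send *next;
    void *buf;  /* stands in for the client's mapped send buffer */
};

typedef void (*model_send_handler)(struct model_send *send, int status,
                                   void *context);

/* Unlink every send still on the list and report it to the client as
 * flushed. Returns the number of callbacks made. */
static int model_flush_sends(struct model_send **list,
                             model_send_handler handler, void *context)
{
    int n = 0;

    while (*list) {
        struct model_send *send = *list;

        *list = send->next;
        send->next = NULL;
        handler(send, MODEL_STATUS_FLUSHED, context);
        n++;
    }
    return n;
}

/* Example client handler: count flushed sends and release what the
 * client allocated for them. */
static void model_client_handler(struct model_send *send, int status,
                                 void *context)
{
    int *flushed = context;

    if (status == MODEL_STATUS_FLUSHED)
        (*flushed)++;
    free(send->buf);
    free(send);
}
```

Whether the agent is still valid to call back into at that point is
exactly the open question above. The sketch only shows what the
callback buys the client.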

-- Hal