[openib-general] IB: I don't like what I'm seeing.

Wed Mar 31 09:50:42 PST 2004

> -----Original Message-----
> From: Troy Benjegerdes [mailto:hozer at hozed.org]
> Sent: Wednesday, March 31, 2004 8:38 AM
> To: openib-general at openib.org
> Subject: Re: [openib-general] IB: I don't like what I'm seeing.
> 
> 
> On Wed, Mar 31, 2004 at 07:12:26AM -0800, Roland Dreier wrote:
> >     ron> I'd like to see if we can put a simple non-connection-based
> >     ron> unreliable datagram stack for this thing and chunk most of
> >     ron> this crap out. My faith in it working at scale is basically
> >     ron> 0.
> > 
> > Not necessarily a bad idea if we want to start from scratch.  However,
> > using unreliable datagrams (UD) means ignoring most of the performance
> > features of IB, since you're limited to single packet messages and
> > have to take care of all the reliable delivery in software.  You'd
> > probably be lucky to get 1/3 the performance you can get by using
> > reliable connected transport.
> 
> Maybe so, but if the hardware 'reliable delivery' works intermittently
> with 896 nodes in a network, nobody cares about performance.
> 
> Has *anyone* heard of a 1000+ node infiniband network that has been able
> to be up, and running compute jobs that last more than a day?
> 
> Unreliable datagram allows you to:

I understand the desire to build and load a simplified IB stack, but if you feel an unreliable datagram service would be a better fit for an application, that service is already available and is used by things like IPoIB.  If you don't use any of the CM code, it shouldn't matter that it's loaded, and it shouldn't get in the way of using datagrams.  Having implemented IPoIB in the SourceForge stack, I agree with others who say performance will be slow compared with a connected QP if you want to transfer anything larger than 2k in a single message.  On the other hand, if your application could benefit from multicast services, UD is the only way to get them natively.
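
To make the 2k point concrete, here is a rough sketch of the bookkeeping a UD-based application would have to do itself for larger messages: split each message into single-packet datagrams, tag them, and work out on the receiving side which pieces never arrived. This is only an illustration in plain Python, not code from any IB stack. The header layout, the names fragment() and reassemble(), and the 2048-byte limit are all assumptions made for the example.

```python
import struct

# Hypothetical header: message id, fragment index, fragment count.
HEADER = struct.Struct("!IHH")
MTU = 2048  # assumed per-datagram limit, matching the 2k figure above
PAYLOAD = MTU - HEADER.size


def fragment(msg_id, data):
    """Split data into datagrams that each fit in one MTU-sized packet."""
    total = max(1, (len(data) + PAYLOAD - 1) // PAYLOAD)
    return [
        HEADER.pack(msg_id, i, total) + data[i * PAYLOAD:(i + 1) * PAYLOAD]
        for i in range(total)
    ]


def reassemble(datagrams):
    """Return (message, missing); message is None if fragments are missing."""
    parts = {}
    total = 0
    for dgram in datagrams:
        _msg_id, index, total = HEADER.unpack_from(dgram)
        parts[index] = dgram[HEADER.size:]  # arrival order does not matter
    missing = [i for i in range(total) if i not in parts]
    if missing or total == 0:
        return None, missing
    return b"".join(parts[i] for i in range(total)), missing
```

The list of missing indices is what a software retransmit scheme would act on, and counting those entries over time gives the kind of end-to-end drop statistics mentioned elsewhere in this thread. With a connected QP the hardware handles segmentation and retransmission, which is where the performance difference comes from.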

Roy

> 
> 1) determine network congestion/packet drops/corruption/retransmits on
> an end-to-end basis, making it easier to evaluate how well the hardware
> and cabling is actually doing. Taking advantage of all the 'reliable
> transport' features means it just goes REAL SLOW and nobody quite knows
> why.
> 
> 2) work around (some) network hardware/scaling problems in software,
> that YOU control, and can be tuned. And is generally much simpler than
> "vendor provided" solutions that necessarily try to solve the general
> market problems first.
> 
> I don't think we have to start from scratch, but we do need to
> concentrate an effort on scalability that throws away all the ULPs and
> connection management, etc, and just makes sure the driver and *USER*
> level access layer can scale to large node counts, and verify the
> hardware works reliably.
> 
> In the interest of parallelism ;) , we should also have another effort
> that works on different ULPs, and the 'whole stack', but Ron shouldn't
> have to know anything about it, or the complexity involved.
> 
> 
> -- 
> To unsubscribe send an email with subject unsubscribe to 
> openib-general at openib.org.
> Please contact moderator at openib.org for questions.
> 
