[openib-general] IB: I don't like what I'm seeing.

Stephen Poole
Wed Mar 31 08:50:45 PST 2004


>On Wed, Mar 31, 2004 at 07:12:26AM -0800, Roland Dreier wrote:
>>      ron> I'd like to see if we can put a simple non-connection-based
>>      ron> unreliable datagram stack for this thing and chunk most of
>>      ron> this crap out. My faith in it working at scale is basically
>>      ron> 0.
>>
>>  Not necessarily a bad idea if we want to start from scratch.  However,
>>  using unreliable datagrams (UD) means ignoring most of the performance
>>  features of IB, since you're limited to single packet messages and
>>  have to take care of all the reliable delivery in software.  You'd
>>  probably be lucky to get 1/3 the performance you can get by using
>>  reliable connected transport.
>
>Maybe so, but if the hardware 'reliable delivery' works intermittently
>with 896 nodes in a network, nobody cares about performance.

Mellanox does not offer RD (reliable datagram) in hardware, unless
that has changed in the last few weeks.

>
>Has *anyone* heard of a 1000+ node infiniband network that has been able
>to be up, and running compute jobs that last more than a day?
>
>Unreliable datagram allows you to:
>
>1) determine network congestion/packet drops/corruption/retransmits on
>an end-to-end basis, making it easier to evaluate how well the hardware
>and cabling is actually doing. Taking advantage of all the 'reliable
>transport' features means it just goes REAL SLOW and nobody quite knows
>why.
>
>2) work around (some) network hardware/scaling problems in software,
>that YOU control, and can be tuned. And is generally much simpler than
>"vendor provided" solutions that necessarily try to solve the general
>market problems first.
>
>I don't think we have to start from scratch, but we do need to
>concentrate an effort on scalability that throws away all the ULPs and
>connection management, etc, and just makes sure the driver and *USER*
>level access layer can scale to large node counts, and verify the
>hardware works reliably.
>
>In the interest of parallelism ;) , we should also have another effort
>that works on different ULPs, and the 'whole stack', but Ron shouldn't
>have to know anything about it, or the complexity involved.
>
>
>--
>To unsubscribe send an email with subject unsubscribe to 
>openib-general at openib.org.
>Please contact moderator at openib.org for questions.


-- 
Steve Poole (spoole at lanl.gov)
Los Alamos National Laboratory
CCN - Special Projects / Advanced Development



