[openib-general] IB: I don't like what I'm seeing.
Stephen Poole
Wed Mar 31 08:50:45 PST 2004
>On Wed, Mar 31, 2004 at 07:12:26AM -0800, Roland Dreier wrote:
>> ron> I'd like to see if we can put a simple non-connection-based
>> ron> unreliable datagram stack for this thing and chunk most of
>> ron> this crap out. My faith in it working at scale is basically
>> ron> 0.
>>
>> Not necessarily a bad idea if we want to start from scratch. However,
>> using unreliable datagrams (UD) means ignoring most of the performance
>> features of IB, since you're limited to single packet messages and
>> have to take care of all the reliable delivery in software. You'd
>> probably be lucky to get 1/3 the performance you can get by using
>> reliable connected transport.
>
>Maybe so, but if the hardware 'reliable delivery' works intermittently
>with 896 nodes in a network, nobody cares about performance.
Mellanox does not offer RD in hardware, unless they have changed in
the last few weeks.
>
>Has *anyone* heard of a 1000+ node infiniband network that has been able
>to be up, and running compute jobs that last more than a day?
>
>Unreliable datagram allows you to:
>
>1) determine network congestion/packet drops/corruption/retransmits on
>an end-to-end basis, making it easier to evaluate how well the hardware
>and cabling is actually doing. Taking advantage of all the 'reliable
>transport' features means it just goes REAL SLOW and nobody quite knows
>why.
>
>2) work around (some) network hardware/scaling problems in software,
>that YOU control, and can be tuned. And is generally much simpler than
>"vendor provided" solutions that necessarily try to solve the general
>market problems first.
>
>I don't think we have to start from scratch, but we do need to
>concentrate an effort on scalability that throws away all the ULPs and
>connection management, etc, and just makes sure the driver and *USER*
>level access layer can scale to large node counts, and verify the
>hardware works reliably.
>
>In the interest of parallelism ;) , we should also have another effort
>that works on different ULPs, and the 'whole stack', but Ron shouldn't
>have to know anything about it, or the complexity involved.
>
>
>--
>To unsubscribe send an email with subject unsubscribe to
>openib-general at openib.org.
>Please contact moderator at openib.org for questions.
--
Steve Poole (spoole at lanl.gov)
Los Alamos National Laboratory
CCN - Special Projects / Advanced Development