[openib-general] IB: I don't like what I'm seeing.
ron minnich
Wed Mar 31 12:19:35 PST 2004
On 31 Mar 2004, Roland Dreier wrote:
> IB reliable transports do not count on no bit errors on the network.
Neither did the other networks, the most recent victim being Quadrics. All of
them claimed to do their own version of RC, and all of them failed on a
large scale.
> The reliable connection (RC) transport operates similarly to TCP in
> that it will detect and retransmit dropped or corrupted packets.
Nothing new here. It's been done, and it has failed at scale.
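Roland's description of RC (detect a dropped or corrupted packet, then retransmit, much like TCP) can be sketched as a toy loop. This is a hypothetical Python model, not IB's actual wire protocol: `zlib.crc32` stands in for the link and transport CRCs, the simulated "wire" flips one random bit with a chosen probability, and the function name and parameters are illustrative only.

```python
import random
import zlib


def send_with_retransmit(payload: bytes, corrupt_prob: float,
                         max_tries: int = 8, seed: int = 0):
    """Toy detect-and-retransmit loop (hypothetical, not IB's wire protocol).

    Each attempt frames the payload with a trailing CRC-32, sends it over a
    simulated wire that flips one random bit with probability `corrupt_prob`,
    and lets the receiver accept (ACK) or reject (NAK) the frame.
    Returns (delivered_payload, attempts_used).
    """
    rng = random.Random(seed)
    trailer = zlib.crc32(payload).to_bytes(4, "little")
    for attempt in range(1, max_tries + 1):
        frame = bytearray(payload + trailer)
        # Simulated wire: maybe flip a single bit anywhere in the frame,
        # including inside the CRC trailer itself.
        if rng.random() < corrupt_prob:
            bit = rng.randrange(len(frame) * 8)
            frame[bit // 8] ^= 1 << (bit % 8)
        # Receiver: recompute the CRC over the body and compare with the trailer.
        body = bytes(frame[:-4])
        received_crc = int.from_bytes(frame[-4:], "little")
        if zlib.crc32(body) == received_crc:
            return body, attempt  # ACK: frame accepted
        # NAK: loop around and retransmit the same frame.
    raise RuntimeError(f"gave up after {max_tries} attempts")
```

Because CRC-32 catches every single-bit error, this loop only ever delivers the original bytes. What it cannot see is data that was already wrong before the sender computed the CRC, which is exactly the gap the end-to-end argument is about.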
> I don't think chip bugs that corrupt data with RC are any more likely
> than bugs that corrupt data with UD, or for that matter much more
> likely than data corruption bugs in the host chipset or CPU.
It's an old rule of networking that the only error detection/correction
that works is end-to-end. Because IB puts that support in the card, it
cannot, by definition, do end-to-end error detection/correction.
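The end-to-end point can be made concrete with a toy model: a check done by the adapter only covers bytes the adapter was handed, while a digest computed by the application in host memory also covers the path from memory to the card. Everything here is a hypothetical sketch: the host-side bit flip stands in for the kind of chipset or DMA fault Roland mentions, `zlib.crc32` stands in for the adapter's CRC, SHA-256 is an arbitrary choice of application-level digest, and the function name is illustrative.

```python
import hashlib
import zlib


def transfer(app_data: bytes, corrupt_on_host: bool):
    """Toy model contrasting a hop-level CRC with an end-to-end digest.

    Returns (adapter_crc_ok, end_to_end_ok). The host-side fault and the
    adapter behaviour are hypothetical stand-ins, not a model of real IB HCAs.
    `app_data` must be non-empty.
    """
    # End-to-end check, part 1: the sending application hashes its buffer
    # while it is still in host memory (the digest is assumed to reach the
    # receiver intact, e.g. carried alongside the message).
    sent_digest = hashlib.sha256(app_data).digest()

    buf = bytearray(app_data)
    if corrupt_on_host:
        # Simulated fault between host memory and the adapter (e.g. a chipset
        # or DMA bug): one bit flips before the adapter ever sees the data.
        buf[0] ^= 0x01

    # Sending adapter computes its CRC over whatever bytes it was handed...
    crc = zlib.crc32(bytes(buf))
    # ...the wire is clean in this model, so the receiving adapter's check passes.
    adapter_crc_ok = zlib.crc32(bytes(buf)) == crc

    # End-to-end check, part 2: the receiving application re-hashes the data.
    end_to_end_ok = hashlib.sha256(bytes(buf)).digest() == sent_digest
    return adapter_crc_ok, end_to_end_ok
```

With a clean host path, `transfer(b"checkpoint block", False)` returns `(True, True)`; with the simulated host fault, `transfer(b"checkpoint block", True)` returns `(True, False)`: the card's check is satisfied, and only the end-to-end digest notices the damage.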
So yeah, it may work well, but I'll believe it works at scale when I
see it at scale.
> (IB networks do tend to have low error rates but they're probably only
> in the 10^-15 to 10^-18 range)
That's too high, but thanks very much for the number. It's a very useful
one; we've had trouble getting these kinds of numbers before. This openness
is one reason I am enthusiastic about IB, even when I don't sound like it.
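The 10^-15 to 10^-18 range quoted above sounds tiny until it is multiplied by a machine's aggregate traffic. The arithmetic below is a back-of-envelope sketch with hypothetical numbers (1,000 links, each kept busy at 10 Gbit/s around the clock); the function name is illustrative. It counts raw bit errors, most of which the link-level CRCs should catch; the point is only how quickly "rare" becomes "daily" at scale.

```python
SECONDS_PER_DAY = 86_400


def expected_bit_errors_per_day(links: int, gbit_per_s: float, ber: float) -> float:
    """Expected raw bit errors per day summed over `links` links, each moving
    `gbit_per_s` gigabits per second around the clock at bit error rate `ber`."""
    bits_per_day = links * gbit_per_s * 1e9 * SECONDS_PER_DAY
    return bits_per_day * ber


# Hypothetical machine: 1,000 links, each busy at 10 Gbit/s all day.
for ber in (1e-15, 1e-18):
    print(f"BER {ber:.0e}: {expected_bit_errors_per_day(1000, 10, ber):.3g} errors/day")
```

The machine moves about 8.64 × 10^17 bits per day, so a BER of 10^-15 gives roughly 864 bit errors per day, and even 10^-18 gives close to one per day (about 0.86).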
ron