[openib-general] IB: I don't like what I'm seeing.
Clauser, Milton
Wed Mar 31 22:11:23 PST 2004
>>There is going to be a nasty tradeoff between BER and performance
>>that I really don't think the vendors have thought about much yet.
>But, infinitely fast with an infinitely large BER is a bad thing. :-)
We're not interested in generating garbage rapidly, so it's not really a
tradeoff. High performance is relevant only if we have confidence there are
no errors.
But I submit that it's not possible to do truly end-to-end error checking,
by which I mean checking for all possible errors from beginning to end of a
job. So where do you want to draw the line? What/how much do you have to
check to give you an acceptable level of confidence? And it shouldn't be
necessary to do as much checking on small jobs as on large jobs.
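To put rough numbers on the small-job versus large-job point, here is a minimal sketch. It assumes independent bit errors at a fixed BER, which real links may not obey, and the job sizes and function names are made up for illustration:

```python
import math

def expected_bit_errors(bytes_moved: float, ber: float) -> float:
    """Expected number of bit errors for a given traffic volume and BER."""
    return bytes_moved * 8 * ber

def prob_at_least_one_error(bytes_moved: float, ber: float) -> float:
    """P(at least one bit error), assuming independent bit errors."""
    bits = bytes_moved * 8
    # 1 - (1 - ber)**bits, computed with log1p/expm1 so tiny BERs do not round to 0
    return -math.expm1(bits * math.log1p(-ber))

# Hypothetical traffic volumes, for illustration only.
jobs = {"small job (1e9 bytes)": 1e9, "large job (1e18 bytes)": 1e18}

for name, size in jobs.items():
    for ber in (1e-12, 1e-21):
        print(f"{name}, BER {ber:g}: "
              f"expected errors {expected_bit_errors(size, ber):.3g}, "
              f"P(>=1 error) {prob_at_least_one_error(size, ber):.3g}")
```

Under these assumptions the exposure scales linearly with the bits moved, which is one way to argue that the amount of checking could scale with job size.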
Milt Clauser
-----Original Message-----
From: Stephen Poole
To: openib-general at openib.org
Sent: 3/31/2004 8:03 PM
Subject: Re: [openib-general] IB: I don't like what I'm seeing.
>On Wed, Mar 31, 2004 at 12:58:08PM -0700, ron minnich wrote:
>> we don't believe in HCA reliability here. It has not worked once in all
>> the years of delivered networks. We're going to assume, unless we can see
>> BER of 10^-21 app-to-app, that the network is unreliable. So, yes, toss and
>> start over is not inconceivable.
>>
>> On the other hand, if we do get a perfect network, app to app, nobody's
>> going to complain, but until we see it at scale 1024+, I am not sure we
>> can really count on it.
>>
>> Sorry if I upset anyone on this list with my comments -- forgot it was
>> this open and it was early morning. But the code still worries me.
>
>This is open-source, peer review development, if nobody is getting upset
>we're not doing it right ;)
>
>This is the first mention of bit error rates I've seen in an infiniband
>discussion. Does anyone have end-to-end BER numbers for any deployed
>infiniband installations?
Quite difficult to determine. It would be nice to see actual numbers
when they are available. The question is: *IF* it is an undetected
error, how do you know you got one if you are not looking for them?
:-) There was some nice work, when we were working on GSN (remember,
the predecessor to IB), on potential error rates based on the two
CRCs that GSN used. I will try to dig it up.
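As a rough sketch of what "looking for them" could mean app-to-app: the sender appends a checksum computed over the payload, and the receiver recomputes it after delivery. The helper names `wrap` and `unwrap` are hypothetical, and CRC-32 from Python's zlib is only a stand-in here, not the CRCs that GSN or IB actually use:

```python
import zlib

def wrap(payload: bytes) -> bytes:
    """Append a CRC-32 of the payload as a 4-byte big-endian trailer."""
    return payload + zlib.crc32(payload).to_bytes(4, "big")

def unwrap(message: bytes) -> bytes:
    """Verify and strip the trailing CRC-32; raise ValueError on mismatch."""
    if len(message) < 4:
        raise ValueError("message too short to carry a checksum")
    payload, trailer = message[:-4], message[-4:]
    if zlib.crc32(payload).to_bytes(4, "big") != trailer:
        raise ValueError("checksum mismatch: payload was corrupted in transit")
    return payload

# Simulate a single flipped bit somewhere between the two applications.
sent = wrap(b"halo exchange, timestep 42")
corrupted = bytes([sent[0] ^ 0x01]) + sent[1:]

assert unwrap(sent) == b"halo exchange, timestep 42"
try:
    unwrap(corrupted)
except ValueError as err:
    print("detected:", err)
```

A check like this only catches corruption between the point where the checksum is computed and the point where it is verified, and a 32-bit check still lets a small fraction of multi-bit errors through.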
>
>There is going to be a nasty tradeoff between BER and performance that I
>really don't think the vendors have thought about much yet.
But, infinitely fast with an infinitely large BER is a bad thing. :-)
>
>--
>To unsubscribe send an email with subject unsubscribe to
>openib-general at openib.org.
>Please contact moderator at openib.org for questions.
--
Steve Poole (spoole at lanl.gov)
Los Alamos National Laboratory
CCN - Special Projects / Advanced Development