Re: [vox-tech] ECC memory --- is it worth it? (semi-OT)
On 2007-04-11, Rick Moen wrote:
> Quoting Bill Broadley (bill@cse.ucdavis.edu):
> > Rick Moen wrote:
> > > A bad bit in memory, if indicative of a physical defect, will quickly
> > > manifest unmistakeably on Linux in the manner I described. If not thus
> > > indicative, (from empirical observation over a long period of time:)
> > > it's extremely unlikely to have detectable long-term consequences.
> >
> > You speculate that it contributes to premature httpd deaths but is
> > undetectable long term?
>
> I didn't think what I was saying was that difficult to follow, but here
> is what I said, again: "I'd speculate that some non-zero percentage of
> prematurely deceased httpd instances owed to that...." I figure that
> possibly (i.e., speculate that) some quite small number of such events
> ultimately owe to uncorrected single-bit memory errors that are not
> associated with actually bad RAM -- but, effectively, it's way down in
> the noise of undiagnosable oddities.
>
> > $10 a DIMM requires you to "pay through the nose" and the "wealth of
> > Midas"?
>
> If you assumed I was endorsing your figure, you assumed wrong. ;->
>
> Ironically, the most recent RAM I purchased _was_ ECC, because it was
> a gig for the Intel L440GX+ "Lancewood" motherboard in my old VA Linux
> Systems model 2230 server. However, let's talk about the HP ProLiant
> 380 I was working on recently: 128 MB ECC Registered is $42 at SA
> Technologies, Inc. (where I would buy such things by preference).
> Without ECC, $32. ECC thus exacts a 31% premium in that case.
>
> Now, would I pay that premium? I might, or I might put the money
> somewhere else, where it's more likely to yield significant benefit.
> (In 2007, I'd actually try not to drop cash on an 800MHz PIII that I
> didn't dearly love, but five years ago might have been different.)
Here's my perspective on that. Suppose one of those uncorrected
single-bit errors lands in the worst possible place (say, a pointer in
the kernel, or in one of PostgreSQL's in-memory journaling structures)
and causes data corruption that loses a day of work (i.e., the last good
backup is 24 hours old). Then:
- assuming a man-hour is worth $50 (that's probably low), and
- assuming that the machine is used by four people (other people's
  servers have more users),
the problem would cost $1600 to recover from (four people x one 8-hour
day x $50), plus whatever additional time is required to take the
system down, restore the backup, fsck the filesystem, etc.
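For anyone who wants to plug in their own numbers, here's a minimal
Python sketch of the same back-of-envelope arithmetic. The 8-hour
workday is an assumption (it's what makes $50 x 4 people come out to
$1600), the per-DIMM prices are the ones Rick quoted above, and the
function names are just made up for illustration.

```python
# Back-of-envelope cost of one worst-case memory error vs. the ECC premium.
# Assumption: "a day of work" means one 8-hour workday per user.

HOURLY_RATE = 50   # dollars per man-hour ("probably low")
USERS = 4          # people using the server
HOURS_LOST = 8     # one workday of lost work (assumed 8 hours)

# Quoted prices for one 128 MB registered DIMM
ECC_PRICE = 42
NON_ECC_PRICE = 32


def recovery_cost(rate: float, users: int, hours: float) -> float:
    """Labor cost of redoing lost work; excludes restore/fsck downtime."""
    return rate * users * hours


def ecc_premium(ecc: float, non_ecc: float) -> tuple[float, float]:
    """ECC surcharge in dollars and as a percentage of the non-ECC price."""
    extra = ecc - non_ecc
    return extra, 100.0 * extra / non_ecc


if __name__ == "__main__":
    cost = recovery_cost(HOURLY_RATE, USERS, HOURS_LOST)
    extra, pct = ecc_premium(ECC_PRICE, NON_ECC_PRICE)
    print(f"Recovery labor: ${cost:.0f}")
    print(f"ECC premium per DIMM: ${extra:.0f} ({pct:.0f}%)")
    print(f"One incident covers the premium on {cost / extra:.0f} DIMMs")
```

The point of the last line is the comparison: even without counting
downtime, a single bad incident pays for the ECC surcharge on a lot of
DIMMs. Whether that's worth it depends on how likely the incident is,
which is exactly what's in dispute.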
That notwithstanding, I agree with Rick that disk failures are an
order of magnitude more likely. I've experienced the pain of a failing
disk more times than I care to remember.
--
Henry House
+1 530 753 3361 ext. 13
Please don't send me HTML mail! My mail system frequently rejects it.
_______________________________________________
vox-tech mailing list
vox-tech@lists.lugod.org
http://lists.lugod.org/mailman/listinfo/vox-tech