Data loss is a huge problem. See these facts, from http://www.ontrack.com/library/rdr_2003_whitepaper.pdf
Data loss causes, according to Ontrack engineers (who seem to have lost no data to malicious intruders):
| Hardware or Systems Malfunction | 59% |
| Human Error | 28% |
| Software Program Malfunction | 9% |
| Viruses | 4% |
| Natural Disaster | 2% |
According to a Gallup poll, most businesses value 100 megabytes of data at US$ 1,000,000.
How long do you expect a disk to last before it fails? Is one brand better than another?
Who knows, and not particularly....
Disk manufacturers do studies, but they are accelerated failure tests on their own systems only under very specific conditions. Any manufacturer can have a short run of worse or better devices, and comparisons between various manufacturers' products haven't been very meaningful.
A couple of papers presented at the 5th USENIX Conference on File And Storage Technology (FAST '07) have gotten some attention:
Here is some interesting data on returns of faulty hardware, although it is limited in time coverage: http://pro.sunrise.ru/articletext.asp?reg=30&id=283
Here is my summary of the Google paper:
Their study was based on over 100,000 disk drives, a mix of PATA and SATA from a variety of manufacturers, 80-400 GB and 5400-7200 RPM. They do not provide information about the specific manufacturers, but that really isn't all that important. All manufacturers have short runs of worse and better quality, and an attempt to measure who was better would probably be overwhelmed by measurement noise.
Some SMART parameters are highly correlated with disk failures. However, SMART parameters alone are not all that useful for predicting individual drive failures.
Contrary to common assumptions, temperature and activity are not highly correlated to drive failure.
Drive manufacturers quote yearly failure rates below 2%, but user studies report up to 6%. Many apparent failures in the field don't seem to be failures in the lab — maybe the problem was with a specific controller or data cable. They cite other studies of failure rates:
Some SMART data is clearly bogus. I agree: one of my disks seems to consistently report its temperature in degrees Fahrenheit instead of the expected Celsius, and so it appears to always be somewhere above the boiling temperature of water.
A significant number of drives fail within the first 3 months. The weak ones die quickly.... Then the failure rate climbs after two years. Annualized failure rates, approximated from their Figure 2:
| Age (months) | 0-3 | 4-6 | 7-12 | 13-24 | 25-35 | 36-48 | 49-60 |
| Annualized failure rate | 2.8% | 1.7% | 1.7% | 8.1% | 8.6% | 6.0% | 7.8% |
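To get a feel for what those rates add up to, here is a rough sketch in Python that chains them into a five-year survival estimate. It rests on assumptions that are mine, not the paper's: the failure rate is treated as constant within each age bucket, the "25-35" bucket is counted as a full year, and the per-age-group rates are treated as if they described a single cohort of drives. The names `AFR_BY_AGE` and `survival_probability` are made up for this illustration.

```python
# Annualized failure rates read off the table above, as (months in bucket, AFR).
# Bucket lengths are an assumption: the "25-35" bucket is treated as a full year.
AFR_BY_AGE = [
    (3, 0.028),   # 0-3 months
    (3, 0.017),   # 4-6 months
    (6, 0.017),   # 7-12 months
    (12, 0.081),  # 13-24 months
    (12, 0.086),  # 25-35 months
    (12, 0.060),  # 36-48 months
    (12, 0.078),  # 49-60 months
]


def survival_probability(buckets):
    """Chance of surviving every bucket, assuming a constant rate within each."""
    p = 1.0
    for months, afr in buckets:
        # An annualized rate applied over a fraction (or whole) of a year.
        p *= (1.0 - afr) ** (months / 12.0)
    return p


five_year_survival = survival_probability(AFR_BY_AGE)
print(f"Estimated 5-year survival: {five_year_survival:.1%}")
```

Under those assumptions the estimate comes out a little above 70%, which is only a ballpark figure and not a result from the study.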
Four SMART parameters were significantly correlated with increased failure rates.
| Error type | Meaning | Increase in likelihood of failure within 60 days of the first such error, compared to a drive without it |
| Scan error | Drives typically scan the disk surface in the background and report errors as they are found. Large scan error counts may indicate surface defects. | 39 times more likely to fail |
| Reallocation counts | Drive's logic has remapped a faulty sector number to a new physical sector drawn from its pool of spares, because of recurring soft errors or a hard error. May indicate drive surface wear. | 14 times more likely to fail |
| Offline reallocation | Subset of the reallocation counts, counting only reallocated sectors found during background analysis. Should exclude sectors reallocated due to errors during actual I/O. | 21 times more likely to fail |
| Probational counts | Suspect bad sectors put "on probation". Weaker indication of possible problems. | 16 times more likely to fail |
But while that looks impressive, over 56% of the failed drives had zero counts in all four of those SMART parameters! So models based only on those four signals can predict fewer than half of the failed drives.
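The table and the caveat above can be combined into a small check. This is a minimal sketch, not a real SMART reader: the dictionary keys are hypothetical names for the four signals, and you would have to fill in the counts yourself from whatever tool reports them on your system.

```python
# Relative risk of failure within 60 days after the first nonzero count,
# taken from the table above. The dictionary keys are hypothetical names.
RISK_MULTIPLIER = {
    "scan_errors": 39,
    "reallocations": 14,
    "offline_reallocations": 21,
    "probational": 16,
}


def risk_signals(counts):
    """Return (signal, multiplier) pairs with a nonzero count, strongest first."""
    hits = [
        (name, RISK_MULTIPLIER[name])
        for name, count in counts.items()
        if name in RISK_MULTIPLIER and count > 0
    ]
    return sorted(hits, key=lambda hit: hit[1], reverse=True)


def assess(counts):
    """Summarize one drive's counts as a short, cautious message."""
    signals = risk_signals(counts)
    if not signals:
        # Over 56% of failed drives looked like this, so it is not an all-clear.
        return "no strong SMART signal (not a guarantee of health)"
    name, factor = signals[0]
    return f"elevated risk: {name} ({factor}x more likely to fail within 60 days)"


print(assess({"scan_errors": 0, "reallocations": 3, "offline_reallocations": 1}))
print(assess({"scan_errors": 0, "reallocations": 0}))
```

The point of the second branch is the 56% figure: a drive with zero counts in all four signals may still be about to fail, so a clean report should not replace backups.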
Data Risk Management has a very interesting model for data archiving: http://www.datariskmgmt.com/
Spam is a denial-of-service attack.
How can you tell where spam was injected? Read the "Received:" fields in reverse, from the bottom up, looking for the inconsistency where the promiscuous relay accepted the spam from its source. Here is a real example I received, with my comments inserted below the relevant lines and marked with "##":

    From Bio-Med5241_a@linux.com.pk Thu Oct 26 15:38 EST
    ## No, the message did not come from Pakistan (.pk), see below.
    Received: from sclera.ecn.purdue.edu (root@sclera.ecn.purdue.edu [128.46.144.159])
        by rvl3.ecn.purdue.edu (8.9.3/8.9.3moyman) with ESMTP id PAA16066
        for <cromwell@rvl3.ecn.purdue.edu> Thu, 26 Oct 15:38:34 -0500 (EST)
    ## Hop #3: sclera forwarded my mail to rvl3.ecn.purdue.edu.
    From: Bio-Med5241_a@linux.com.pk
    Received: from glasgow3.blackid.com ([212.250.136.251])
        by sclera.ecn.purdue.edu (8.9.3/8.9.3moyman) with ESMTP id PAA13819
        for <cromwell@sclera.ecn.purdue.edu>; Thu, 26 Oct 15:38:24 -0500 (EST)
    ## Hop #2: glasgow3.blackid.com, the spam relayer, hands the spam to
    ## sclera.ecn.purdue.edu.
    Date: Thu, 26 Oct 15:38:24 -0500 (EST)
    Message-Id: <XXXX10262038.PAA13819@sclera.ecn.purdue.edu>
    Received: from geo5 (host-216-77-220-220.fll.bellsouth.net [216.77.220.220])
        by glasgow3.blackid.com with SMTP (Microsoft Exchange Internet Mail Service Version 5.5.2650.21)
        id 449GZRTV; Thu, 26 Oct 21:33:01 +0100
    ## Hop #1: glasgow3.blackid.com, the spam relayer, accepts mail from the
    ## source, a dial-in client of bellsouth.net using the IP address
    ## 216.77.220.220. The dial-in client undoubtedly got its IP address via
    ## DHCP, and so any system using that IP address right now is not
    ## necessarily the original spam source. However, bellsouth.net should be
    ## able to figure out which of their clients used this IP address at this
    ## particular time.
    To: customer@aol.com
    ## That's odd. I'm not sure how they're getting SMTP to send it to me
    ## with this bogus address in the "To:" field. Maybe I was a blind
    ## carbon-copy recipient...
    Subject: A New Dietary Supplement That Can Change Your Life....
    MIME-Version: 1.0
    Content-Type: text/plain; charset=unknown-8bit
    Content-Length: 5463
    Status: R

    [ long pseudo-medical nonsense deleted.... ]
Further investigation could use traceroute or whois to figure out where 216.77.220.220 really is, in case the reverse resolution above either failed or was faked. Here is the output from the GNU version of whois:

    % whois 216.77.220.220
    NetRange: 216.76.0.0 - 216.79.255.255
    CIDR: 216.76.0.0/14
    NetName: BELLSNET-BLK5
    NetHandle: NET-216-76-0-0-1
    Parent: NET-216-0-0-0-0
    NetType: Direct Allocation
    NameServer: NS.BELLSOUTH.NET
    NameServer: NS.ATL.BELLSOUTH.NET
    Comment:
    Comment: For Abuse Issues, email abuse@bellsouth.net. NO ATTACHMENTS. Include IP
    Comment: address, time/date, message header, and attack logs.
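If you save whois output to a file or a string, a few lines of Python can confirm that the suspect address really falls inside the reported block and pull out the abuse contact. This is only a sketch: `summarize` and `WHOIS_TEXT` are names invented here, the text is copied from the output above, and it assumes a single network on the "CIDR:" line, which real whois output does not always guarantee.

```python
import ipaddress
import re

# Relevant lines from the whois output above.
WHOIS_TEXT = """\
NetRange: 216.76.0.0 - 216.79.255.255
CIDR: 216.76.0.0/14
NetName: BELLSNET-BLK5
Comment: For Abuse Issues, email abuse@bellsouth.net. NO ATTACHMENTS. Include IP
"""


def summarize(whois_text, suspect_ip):
    """Collect "Key: value" lines, then check the IP against the CIDR block."""
    fields = {}
    for line in whois_text.splitlines():
        key, sep, value = line.partition(":")
        if sep:
            fields.setdefault(key.strip(), []).append(value.strip())
    # Assumption: exactly one network is listed on the CIDR line.
    network = ipaddress.ip_network(fields["CIDR"][0])
    contacts = re.findall(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+", whois_text)
    return {
        "net_name": fields["NetName"][0],
        "in_block": ipaddress.ip_address(suspect_ip) in network,
        "abuse_contacts": sorted(set(contacts)),
    }


print(summarize(WHOIS_TEXT, "216.77.220.220"))
```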
Also see the great tool at http://www.samspade.org/.
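The bottom-up reading of the "Received:" fields described above can also be sketched in Python with the standard `email` module. Treat this as an illustration under assumptions: `SAMPLE` is an abbreviated copy of the three fields from my example, `HOP` and `hops` are names invented here, and the regular expression only handles the "from NAME (RDNS [IP]) by HOST" layout seen in that example. It also cannot tell you whether a spammer forged the lower fields; you still have to judge which relay is the first one you trust.

```python
import email
import re

# Abbreviated copy of the three "Received:" fields from the example above.
SAMPLE = """\
Received: from sclera.ecn.purdue.edu (root@sclera.ecn.purdue.edu [128.46.144.159])
    by rvl3.ecn.purdue.edu (8.9.3/8.9.3moyman) with ESMTP id PAA16066
Received: from glasgow3.blackid.com ([212.250.136.251])
    by sclera.ecn.purdue.edu (8.9.3/8.9.3moyman) with ESMTP id PAA13819
Received: from geo5 (host-216-77-220-220.fll.bellsouth.net [216.77.220.220])
    by glasgow3.blackid.com with SMTP id 449GZRTV
Subject: A New Dietary Supplement That Can Change Your Life....

body
"""

# from <claimed name> (<optional reverse-resolved name> [<IP>]) by <receiving host>
HOP = re.compile(
    r"from\s+(?P<helo>\S+)\s+"
    r"\((?:(?P<rdns>\S+)\s+)?\[(?P<ip>[\d.]+)\]\)\s+"
    r"by\s+(?P<by>\S+)"
)


def hops(raw_message):
    """Return the hops in chronological order (bottom "Received:" field first)."""
    msg = email.message_from_string(raw_message)
    result = []
    # Each relay prepends its own field, so the oldest hop is the last one.
    for value in reversed(msg.get_all("Received", [])):
        match = HOP.search(value)
        if match:
            result.append(match.groupdict())
    return result


for number, hop in enumerate(hops(SAMPLE), start=1):
    print(f"Hop #{number}: {hop['by']} accepted mail from "
          f"{hop['helo']} [{hop['ip']}] (reverse lookup: {hop['rdns']})")
```

The first hop printed is the candidate injection point: the relay that accepted the message, plus the IP address it recorded for the sender.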
The ridiculously capitalized eSAT iNC http://www.esatinc.com/ +1-888-895-0007 sells systems for backup data communication using bi-directional satellite links.
© Bob Cromwell Jul 2008. Created with /bin/vi, hosted on OpenBSD with Apache. Root password available here