Self-Monitoring, Analysis and Reporting Technology
Self-Monitoring, Analysis and Reporting Technology (S.M.A.R.T. or SMART) is an industry standard for monitoring hard disk drives (HDD) and solid-state drives (SSD); it is used to predict a possible failure of the storage medium. To this end, the readings of various sensors are evaluated against a set of parameters.
Overview
The monitored data is evaluated at system start by a suitably configured BIOS or other firmware, or by special software that must be installed in addition to the operating system. Microsoft, for example, has provided a driver for this purpose since Windows 95b (OSR 2), which such software then uses.
The software relies on the limit values that the drive manufacturer sets for the individual parameters, such as temperature. After observing the drive over a longer period, it can then predict expected failures.
"Switching off" SMART, for example in the BIOS settings, does not switch off data acquisition, but only switches off the warnings when the threshold values are exceeded. The collected data is saved in a reserved area of the hard disk that cannot be changed by programs.
The monitoring as a whole does not slow down the drive, because it only logs what is happening and takes no corrective action. Corrective action, for example in response to vibration, is handled by mechanisms internal to the drive that predate SMART. Everything else, such as operating hours and temperature, is recorded by dedicated sensors and chip functions. The parameters are divided into “online” parameters, which are recorded continuously, and those that are updated during idle periods, when the drive is, so to speak, “offline”.
Informative value
SMART is limited to the mass storage devices it monitors, such as hard drives or SSDs, and says nothing about the overall reliability of the computer system. The data obtained from several mass storage devices is not correlated. The system is also not standardized; it is up to each manufacturer to decide which parameters are monitored and within which limits. The accuracy of the monitoring is likewise debated among users. For example, some temperature sensors are considered to be poorly placed or calibrated too optimistically; the reported values can, for example, be well below room temperature.
An independent Google study, which ran for nine months and covered all manufacturers and a total of 100,000 hard drives, produced the following result in 2006: if all relevant parameters are taken into account, 64% of all failures can be predicted with SMART. All other warning signs, such as audible noises or noticeable data errors, were disregarded. In the remaining third of failures, the drive itself incorrectly reported that it was free of problems.
The workload of a hard disk had a far smaller impact on its durability than previously assumed. If a drive survives the first year, the proportion of idle time no longer plays a role until the drive is routinely replaced after four years. Only in the first year and after the fourth year does continuous reading and writing double the failure rate.
History
In 1992, IBM recognized that as PCs became more widespread in companies, so too did the reliance placed on them. Failures were increasingly becoming a financial problem, which IBM wanted to address with PFA (Predictive Failure Analysis). IBM hard drives with this system informed the computer of any parameter changes so that the user could react in time by replacing the drive. A little later, Compaq introduced IntelliSafe, which filters out irrelevant changes and reports only the threatening changes and limit values to the running software. Seagate, Quantum and Conner were involved in the development and adapted it to their products; Compaq did not manufacture hard drives itself.
Recognizing the potential and with an industry standard in mind, Compaq and especially Seagate pushed for the system to be opened up. Together with Conner, Quantum, Western Digital and later IBM, the two approaches were merged under the name SMART.
Since 1996 and the introduction of the ATA-3 standard, or SCSI-3 four years earlier, it has been part of the standard feature set of almost every hard drive.
The specification of the SMART parameters was removed before the ATA-3 standard was adopted (see web links). As a result, neither the meaning of the stored values nor their scaling is stipulated (for the latter, see also common parameters). Only their location is officially standardized. Strictly speaking, even under the ATA-7 standard there is no way of reading out the temperature of a disk, for example. In practice, virtually all available disks adhere to the data format from the ATA-3 draft. A read-out program attaches a designation such as "Seek Error Rate" to each parameter ID to make it easier to understand. Over the years, a reliable de facto standard has emerged.
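The fixed location of the values can be illustrated with a short sketch. The following Python example assumes the layout that drives following the ATA-3 draft commonly use (a 2-byte revision number followed by 30 entries of 12 bytes each). This layout, the function name `parse_attributes` and the small `NAMES` table are assumptions for illustration, not part of any ratified standard.

```python
import struct

# Assumed entry layout (ATA-3 draft convention, not a ratified standard):
# id (1 byte), flags (2), normalized value (1), worst (1), raw (6), reserved (1)
ENTRY = struct.Struct("<BHBB6sB")
TABLE_OFFSET = 2      # assumed: entries follow a 2-byte revision number
ENTRY_COUNT = 30      # assumed: 30 attribute slots per data block

# A few designations a read-out program might attach to the IDs
# (names taken from the parameter table in this article).
NAMES = {
    5: "Reallocated Sectors Count",
    7: "Seek Error Rate",
    9: "Power On Hours Count",
    194: "Drive Temperature",
}


def parse_attributes(sector: bytes) -> list:
    """Decode the attribute slots of one SMART data block."""
    if len(sector) < TABLE_OFFSET + ENTRY_COUNT * ENTRY.size:
        raise ValueError("SMART data block is too short")
    attributes = []
    for slot in range(ENTRY_COUNT):
        offset = TABLE_OFFSET + slot * ENTRY.size
        attr_id, flags, value, worst, raw, _reserved = ENTRY.unpack_from(sector, offset)
        if attr_id == 0:
            continue  # ID 0 marks an unused slot
        attributes.append({
            "id": attr_id,
            "name": NAMES.get(attr_id, f"Unknown attribute {attr_id}"),
            "flags": flags,
            "value": value,
            "worst": worst,
            "raw": int.from_bytes(raw, "little"),
        })
    return attributes
```

Because the meaning and scaling of each value are not stipulated, a real program would still need a per-model table to interpret the `raw` field.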
By design, solid-state drives (SSDs) no longer need many of the earlier test points, but they do need different, new ones. However, there is currently no coordination between the SSD controller manufacturers. As a result, some new parameter IDs were added, while in other cases existing IDs were simply given a new meaning. This leads to misinterpretations in all SMART programs that do not yet know the meaning used by the new drives.
A brief evaluation of important SMART parameters is also built into most BIOS versions, so warning messages about defective SSDs can appear when the computer is switched on. In this case, it is advisable to switch off the SMART self-test function in the BIOS and to carry out a manual test with an up-to-date program in the operating system (see "SMART programs in comparison").
Variants by connection type
The implementation of SMART differs depending on how the drive is connected to the PC. There are two families of standards, ATA and SCSI. Both provide a HEALTH STATUS, with which the drive firmware indicates whether it classifies the drive as "okay" or "problematic". Both standards also support reading out the temperature and several kinds of self-tests and logs.
In the case of ATA hard disks, numerous values and their limits can also be queried using running software. In this way, the software or the user can assess more precisely whether and why an error will occur. However, these parameters are not exactly standardized and differ in scope and interpretation, even between models from one manufacturer.
The commands and data formats for all these functions are, however, implemented completely differently for ATA and SCSI.
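How a program might hide this difference can be sketched in Python. The example assumes that the `smartctl` tool from smartmontools is installed and that its `-H` option prints one of the two health lines matched below (the ATA and the SCSI wording). The function names are hypothetical, and the sketch only covers these two cases.

```python
import re
import subprocess

# Health lines as printed by `smartctl -H` (assumed wording; ATA and SCSI differ).
ATA_HEALTH = re.compile(r"SMART overall-health self-assessment test result:\s*(\S+)")
SCSI_HEALTH = re.compile(r"SMART Health Status:\s*(.+)")


def parse_health(output: str):
    """Return True (okay), False (problematic) or None (no health line found)."""
    match = ATA_HEALTH.search(output)
    if match:
        return match.group(1) == "PASSED"
    match = SCSI_HEALTH.search(output)
    if match:
        return match.group(1).strip() == "OK"
    return None


def read_health(device: str):
    """Query a drive, e.g. read_health('/dev/sda'); usually needs root privileges."""
    result = subprocess.run(
        ["smartctl", "-H", device],
        capture_output=True, text=True, check=False,
    )
    return parse_health(result.stdout)
```

Only `parse_health` is pure; `read_health` depends on the local system and is shown for context.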
Basically, SCSI commands are transmitted over USB. Drives connected via USB, however, are almost without exception (S)ATA rather than SCSI drives. With the introduction of the USB 3.0 interface came the USB Attached SCSI (UAS) protocol, which can also be used on USB 2.0 at reduced speed. In contrast to the technically simpler bulk transfer used by USB memory sticks, it allows ATA commands to be tunneled over the USB bus and thus enables SMART queries via USB. Chip manufacturers such as Cypress, JMicron or SunPlusIT use manufacturer-specific commands, which some programs can use (see section "SMART programs in comparison"). There are also USB-SATA bridges that support the manufacturer-independent SCSI/ATA translation standard.
The FireWire connection, which is common on Apple computers in particular, allows the commands to be passed through natively, but Mac OS X does not make use of this.
Drives connected via eSATA, like their internal SATA counterparts, can be read without any problems.
Serial ATA disks connected via Serial Attached SCSI (SAS) can be checked if the corresponding SAT commands are available.
For tape drives, there are functions analogous to SMART called TapeAlert. They are used to warn of worn tapes.
Evaluation
Common parameters
Each value is first stored as raw data. It is then mapped onto a scale from 0 to 100, 200 or 255 to make it easier to interpret. The different scales allow finer gradations where the manufacturer considers them useful. Starting at the scale maximum, the value moves toward zero as errors occur or the drive ages. The critical limit (threshold), however, is often well above zero.
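The comparison of a normalized value with its threshold can be sketched in a few lines of Python. The function name and the three result labels are hypothetical. The sketch assumes a common convention in which a value at or below the threshold counts as failing and a threshold of 0 never triggers; individual tools may draw the line slightly differently.

```python
def assess(value: int, worst: int, threshold: int) -> str:
    """Classify one attribute from its normalized value, worst value and threshold."""
    if threshold == 0:
        return "ok"               # assumed: threshold 0 means the attribute never fails
    if value <= threshold:
        return "failing_now"      # the current value has reached the limit
    if worst <= threshold:
        return "failed_in_past"   # the limit was reached earlier, value has recovered
    return "ok"
```

For example, an attribute with value 100, worst 100 and threshold 5 is classified as "ok", however large its raw value may be.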
The following table shows the individual parameters and the evaluation of the respective raw values (not to be confused with the values of the value scale):
Column | Meaning |
---|---|
A | Failure-relevant parameter. If available, possible failures can be forecast. |
I | Informative parameter of little or no relevance for the failure forecast. |
Better | Whether a higher or a lower raw value is better. |

ID | Hex | Parameter name (English) | Parameter name (translated from German) | A | I | Better | Description |
---|---|---|---|---|---|---|---|
01 | 0x01 | (Raw) Read Error Rate | Read error rate (raw) | | | | |
02 | 0x02 | Throughput Performance | Throughput | | | | |
03 | 0x03 | Spin Up Time | Spin-up time | | | | |
04 | 0x04 | Start/Stop Count | Start/stop operations | | Yes | | |
05 | 0x05 | Reallocated Sectors Count | Reallocated sectors | Yes | | | |
07 | 0x07 | Seek Error Rate | Seek error rate | | | | |
09 | 0x09 | Power On Hours Count | Time in operation | | Yes | | |
10 | 0x0A | Spin Retry Count | Spin-up retries, only relevant for HDDs | Yes | | | |
12 | 0x0C | Power Cycle Count | Number of power-on events | | Yes | | |
184 | 0xB8 | End-to-End Error | End-to-end errors | Yes | | | |
187 | 0xBB | Reported Uncorrectable Error | Reported uncorrectable errors | Yes | | | |
188 | 0xBC | Command Timeout | Commands that could not be executed in time | Yes | | | |
193 | 0xC1 | Load Cycle Count or Load/Unload Cycle Count | Head parking operations | | Yes | | |
194 | 0xC2 | Drive Temperature | Hard drive temperature | | | | |
195 | 0xC3 | Hardware ECC Recovered | Recovered bit errors | | | | |
196 | 0xC4 | Reallocation Event Count | | Yes | | | |
197 | 0xC5 | Current Pending Sector Count | | Yes | | | |
198 | 0xC6 | Uncorrectable Sector Count | Uncorrectable sectors | Yes | | | |
199 | 0xC7 | Ultra DMA CRC Error Count | DMA CRC errors | | Yes | | |
201 | 0xC9 | Soft Read Error Rate | | Yes | | | |
There are numerous other parameters, some of which are manufacturer-exclusive. Complete lists can be found in the literature section of the web links.
Example
The evaluation of important SMART parameters, using the example of a Hitachi 250 GB hard drive connected via Serial ATA and read out with smartmontools:
Parameter ID | Parameter name | Value | Worst | Threshold | Type | Updated | Raw value | Comment |
---|---|---|---|---|---|---|---|---|
2 | Throughput Performance | 100 | 100 | 050 | Pre-fail | Offline | 0 | |
3 | Spin Up Time | 118 | 118 | 024 | Pre-fail | Always | 294 | Hitachi uses its own counting method, not (milli)seconds. |
4 | Start Stop Count | 100 | 100 | 000 | Old age | Always | 772 | The drive motor was switched on and off 772 times, including starts from standby. |
5 | Reallocated Sector Count | 100 | 100 | 005 | Pre-fail | Always | 55 | 55 sectors were replaced by reserve sectors because of defects. The drive nevertheless still rates this as problem-free (the value is still 100), perhaps wrongly. |
7 | Seek Error Rate | 100 | 100 | 067 | Pre-fail | Always | 0 | So far there have been no read/write errors. |
9 | Power On Hours | 100 | 100 | 000 | Old age | Always | 1775 | The drive has been powered for 1775 hours to date. This also includes standby phases in which the platters were not spinning. If the evaluation program does not know the drive model, you have to judge for yourself whether the value represents hours, minutes or seconds. |
10 | Spin Retry Count | 100 | 100 | 060 | Pre-fail | Always | 0 | So far there have been no failed starts; the drive always spun up without any problems. |
12 | Power Cycle Count | 100 | 100 | 000 | Old age | Always | 745 | So far, the PC with this drive has been switched on and off 745 times. |
194 | Temperature | 161 | 161 | 000 | Old age | Always | $34 + (10 \cdot 2^{16} + 49 \cdot 2^{32})$ | The current temperature here is 34 °C. The lifetime minimum and maximum of the drive so far were 10 °C and 49 °C. The value has therefore dropped from 200 to 161. |
199 | UDMA CRC Error Count | 200 | 253 | 000 | Old age | Always | 730 | So far there have been 730 transmission errors to the mainboard. The cause is either a faulty hard disk controller, a defective connection cable or a loose connection. |
Value | is a normalized measured value, which mostly counts backwards (the lower, the worse). |
---|---|
Worst | worst value so far. |
Threshold | the limit below which the value must not fall. |
Type | stands for the meaning of the parameter: "Pre-fail" is a warning of an imminent failure, while "Old age" means that it is generally a question of progressive aging (the current temperature does not necessarily fall into one of the two categories). |
Updated | indicates whether the value is updated permanently (always) or only through a self-test of the type "Offline data collection". |
RAW value | is the actual measured value, e.g. the measured temperature or the number of errors. |
Evaluation: According to the drive's own assessment, this drive is completely okay. Nowhere was a limit even close to being reached. According to the Google study, only the 55 reallocated sectors are a cause for concern, so this value should be kept under observation. However, if the "UDMA CRC Error Count" stops increasing after the cable has been replaced, and the cooling is improved so that approx. 45 °C (temperature) is no longer exceeded, the drive can in fact continue to be used without any problems.
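The raw temperature value in the example table packs three numbers into one. A minimal Python sketch of the decoding, assuming the 16-bit field layout implied by the table (current value in the lowest 16 bits, lifetime minimum and maximum in the next two 16-bit fields). Other manufacturers may encode the raw value differently, and the function name is hypothetical.

```python
def decode_temperature_raw(raw: int) -> tuple:
    """Split a packed raw temperature value into (current, minimum, maximum) in °C."""
    current = raw & 0xFFFF            # bits 0-15: current temperature
    minimum = (raw >> 16) & 0xFFFF    # bits 16-31: lowest temperature so far
    maximum = (raw >> 32) & 0xFFFF    # bits 32-47: highest temperature so far
    return current, minimum, maximum


# The raw value from the example table: 34 + (10 * 2**16 + 49 * 2**32)
example_raw = 34 + (10 << 16) + (49 << 32)
current, minimum, maximum = decode_temperature_raw(example_raw)
print(f"current {current} °C, minimum {minimum} °C, maximum {maximum} °C")
```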
Self-test and error log
In addition to the ongoing logging of the above parameters, there are further tests. Some manufacturers start these periodically while the drive is idle; others leave it to the user, who can run them with some of the available programs. What exactly is tested is also determined by the manufacturer. The usual short test checks all parameters and then spot-checks the readability of the individual platters. The long test replaces the spot check with a complete check.
ATA-6 adds two more variants. One is recommended after a drive has been transported (called Conveyance - similar to the short test), the other allows you to test areas of the drive that you can select yourself (Selective - similar to the long test).
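The four test variants can be mapped to command lines, sketched here in Python for the `smartctl` tool from smartmontools. The sketch assumes the `-t` option with the arguments `short`, `long`, `conveyance` and `select,FIRST-LAST`; the function name and the device path are hypothetical, and the function only builds the command without running it.

```python
def selftest_command(device: str, kind: str = "short", span=None) -> list:
    """Build the smartctl argument list that starts a self-test on a drive."""
    if kind in ("short", "long", "conveyance"):
        test_argument = kind
    elif kind == "selective":
        if span is None:
            raise ValueError("a selective test needs a (first_lba, last_lba) span")
        first_lba, last_lba = span
        if first_lba > last_lba:
            raise ValueError("first LBA must not exceed last LBA")
        test_argument = f"select,{first_lba}-{last_lba}"
    else:
        raise ValueError(f"unknown self-test kind: {kind}")
    return ["smartctl", "-t", test_argument, device]


# Example: a selective test of the first 1000 sectors of a hypothetical device.
print(" ".join(selftest_command("/dev/sda", "selective", (0, 999))))
```

The resulting list could be passed to `subprocess.run`; the test itself then runs inside the drive, and its result is read from the self-test log afterwards.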
Since 1999 and the ATA-5 standard, errors that have occurred are not only reflected in the parameter values (resulting, for example, in "Error rate: high") but are also recorded in detail. The log notes the error, the time since the device was last switched on and the five preceding steps. There is also a separate table for the results of the self-tests described above. In general, only recent clusters of errors are considered a cause for concern here.
If the hard disk supports updating its firmware, the error log is deleted when the firmware is rewritten (regardless of the version). The parameter values are mostly retained.
SMART programs in comparison
The following table lists well-known programs for reading out SMART data.
Program name | Operating system(s) | Price | Duration of the demo version | Target group | User interface | Connection | RAID controller support | Correct interpretation of SSDs | Display of the error log | Starting the self-tests | Failure prediction | Notification on | Notification by | Provider | Remarks |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
Argus Monitor | Windows | € 14.95 | 30 days | Beginners to advanced | Graphical | (S)ATA, USB | Yes (not for all) | Yes | No | No | Yes | Selectable parameter changes, limit value, temperature | Window, sound, e-mail, execute any command | ArgusMonitor | Additionally shows CPU and graphics card temperature as well as CPU core frequency and Intel 'Turbo Boost' status graphically; display and control of mainboard and GPU fans |
smartmontools | Windows (native or Cygwin), Linux, Darwin (Mac OS X), Free/Open/NetBSD, Solaris, OS/2, QNX | Open source | - | Professional users | Command line, optional daemon or service, graphical front end | (S)ATA, SCSI, SAT, USB | 3ware (Linux, FreeBSD, Windows), Compaq/HP (Linux, FreeBSD), HighPoint (Linux), Intel Matrix RAID (Windows) | Yes | Yes | Yes (also scheduled) | No | Selectable parameter changes, limit value, temperature | Window (Windows only), e-mail, system log, execute any command | smartmontools, GSmartControl | Manual |
HDAT2 | DOS | Freeware | - | Professional users | Text menu | (S)ATA, SCSI, USB, FireWire (some) | Yes (not for all) | - | Yes | Yes | No | - | - | Lubomir Cabla | Offers setting of AAM and other parameters, as well as surface tests. |
DriveSitter | Windows | from $ 29.69 | 30 days | Advanced | Graphical | (S)ATA | - | ? | Yes | Yes | Yes | Selectable parameter changes, limit value, temperature | Window, sound, e-mail, network message, system log, execute any command | Oliver Marr | Highly scalable; switches to idle mode at critical temperatures if required. |
EASIS Drive Check | Windows | Freeware / Pro € 19.- | - | Advanced | Graphical | (S)ATA, USB; surface test: all | - | ? | Yes | No | No | Parameter changes | Window, e-mail | EASIS | Can perform surface tests to find defective sectors |
HDD Health | Windows | Freeware | - | Beginners to advanced | Graphical | (S)ATA | - | - | Yes (in new version) | Yes (in new version) | Yes | Every parameter change, temperature | Window, sound, e-mail, network message (e-mail and network in the commercial version only) | PANTERASoft | |
Active SMART | Windows | from € 18.46 | 21 days | Beginners to advanced | Graphical | (S)ATA, SCSI, USB | Announced | - | No | No | Yes | Limit value, temperature | Window, sound, e-mail, network message | Ariolic ATA/SCSI/USB | Switches to idle mode if the temperature is critical. |
SpeedFan | Windows | Freeware | - | Beginners to advanced | Graphical | (S)ATA, SCSI | - | Yes (not for all) | No | Yes | Yes | Limit value, temperature | System notification, sound, e-mail, execute any command | Alfredo Milani Comparetti | Provides online analysis of the drive [1], monitors PC temperatures |
SMARTReporter | Mac OS X | Open source / Pro € 4.49 | - | Beginners | Graphical | (S)ATA | - | Yes (based on smartmontools) | Yes | Yes | No | Limit value | Window, e-mail, execute any command | Julian Mayer | |
HDTune | Windows | Freeware; HD Tune Pro 24.95 EUR | - | Beginners to advanced | Graphical | (S)ATA, USB (most) | - | - | No | No | No | - | - | EFD Software | Performs benchmarks and surface tests; health status for external HDDs only in the Pro version |
Norton System Doctor | Windows | Proprietary | - | Beginners | Graphical | (S)ATA, SCSI, USB | ? | ? | No | No | No | Limit value (for each data carrier individually) | Taskbar icon, sound, administrative message | Symantec weblink | Can be configured individually for each data carrier; interface to Disk Doctor / chkdsk: surface test, complete test on restart |
CrystalDiskInfo | Windows | Open source | - | Beginners to advanced | Graphical | (S)ATA, USB (some) | Intel Matrix RAID | Yes | Yes | No | Yes | Limit value, temperature (for each data carrier individually) | Taskbar icon, sound, e-mail, event log | Crystal Dew World | Offers setting of AAM and other parameters |
Acronis Drive Monitor | Windows | Freeware / proprietary | - | Beginners to advanced | Graphical | (S)ATA, USB (most), software RAID controllers (many) | Software RAID controllers: yes; hardware controller support announced | ? | Yes | ? | Yes | Hard drive problems, temperature, "critical events", backup messages | Taskbar icon, alarm message, e-mail | Acronis | Manual |
Samsung SSD Magician | Windows | Proprietary | - | Beginners to advanced | Graphical | (S)ATA | - | Yes | Yes | ? | ? | ? | - | | |
DHE Drive Info | Windows | Freeware | - | Beginners to advanced | Graphical | (S)ATA, SCSI, USB | Experimental | Yes | Yes | Yes | ? | Limit value, temperature | Window | Dirk Hauschild | Portable, no installation required |
Reading hard disks on RAID controllers
- Only the controller manufacturer has the information needed to read out the SMART status within the RAID system, so the manufacturer has to make it available through an API function of its driver. Not all manufacturers do this, and when they do, it is often manufacturer-specific and only for selected models. The table lists the manufacturers whose functions the respective program knows.
- Addressing the controller directly without using the driver functions is more successful, but also potentially unstable and therefore only acceptable under DOS.
- If SMART support is mentioned in the controller's specifications, it is often only internal to the controller. The driver then does not pass the information on to programs; some pass on only that of a single drive.
- Hard disks in so-called software RAIDs (i.e. arrays managed by the operating system) and those set up on RAID controllers as individual drives rather than as an array can always be read out. This case is therefore not counted in the table.
Sources
- Heise announcement of February 16, 2007
- http://research.google.com/archive/disk_failures.pdf
- Example of a reallocation of an existing SMART attribute on Indilinx controllers (Memento of the original from March 21, 2014 in the Internet Archive)
- Some USB devices with SMART support (smartmontools Wiki)
- Michael Schmelzle: These SMART data are important. IDG Tech Media GmbH, October 30, 2013, accessed April 5, 2017.
- http://forums.storagereview.net/index.php?showtopic=20731
- Figure: Read/write head in park position
- Ticket #20275: Add support for starting tests
Web links
- Manufacturer's own software
- Fujitsu
- Hitachi
- Maxtor ( Memento from April 15, 2007 in the Internet Archive )
- Samsung
- Seagate
- Western Digital
- Ultimate Boot CD - proprietary and other tools on a bootable CD.
- SSD tools: curse or blessing? an inventory… , pc-experience.de
- Software based on availability for operating systems
- literature
- Linux community: "Prevention instead of crash"
- Introduction (English)
- Compendium (English, PDF; 679 kB)
- Background (English)
- Failure study (English, also as PDF)
- Standards
- ATA-3 Standard, Draft 7b (English, PDF) - The SMART attributes mentioned here were removed before the standard was adopted.
- ATA-8 ACS Standard, Draft 6a ( Memento from December 11, 2009 in the Internet Archive ) (English, PDF; 2.8 MB) - Last draft of the currently valid standard, the SMART attributes are still missing.
- ATA-8 Appendix on SMART Attributes ( Memento of July 3, 2007 in the Internet Archive ) (English, PDF; 24 kB) - Unaccepted proposal for an informal appendix to the ATA-8 ACS standard.