Question: What causes this HDD bad-sector pattern?


Out of curiosity, I ran GNU ddrescue on a Seagate ST1000LM024 HN-M101MBB hard drive with known bad sectors, and then visualized the ddrescue mapfile with ddrescueview.

When I zoomed in, this interesting pattern appeared:

ddrescueview screenshot

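The green gaps (contiguous good sectors) are usually 2440 logical sectors (1249280 bytes) long, and the red parts (bad and probably physically damaged sectors) are almost always 1 physical sector (8 logical sectors, 4096 bytes) long.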
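The gap and bad-run lengths can also be read straight from the mapfile rather than measured in ddrescueview. Below is a minimal sketch, assuming the mapfile layout described in the GNU ddrescue manual: `#` comment lines, one status line, then `pos size status` data lines in hex, where `+` marks finished (good) blocks and `-` marks bad-sector blocks. The `SAMPLE` excerpt is hypothetical, built from the sizes quoted above. Note that a 4096-byte run at 512 bytes per logical sector is 8 logical sectors.

```python
# Hypothetical sketch: measure good-gap and bad-run lengths in a GNU ddrescue mapfile.
# Assumed format (GNU ddrescue manual): '#' comment lines, one status line
# ("current_pos current_status [current_pass]"), then "pos size status" data lines.
# Status '+' = finished (read OK), '-' = bad sector.
LOGICAL_SECTOR = 512  # bytes, as reported by smartctl for this drive

SAMPLE = """\
# Mapfile. Created by GNU ddrescue (hypothetical excerpt)
# current_pos  current_status  current_pass
0x00000000     +               1
#      pos        size  status
0x00000000  0x00131000  +
0x00131000  0x00001000  -
0x00132000  0x00131000  +
0x00263000  0x00001000  -
0x00264000  0x00131000  +
"""

def parse_blocks(text):
    """Return (pos, size, status) tuples for the data-block lines."""
    rows = [line.split() for line in text.splitlines()
            if line.strip() and not line.lstrip().startswith("#")]
    # rows[0] is the status line; every later row is a data block.
    return [(int(pos, 0), int(size, 0), status) for pos, size, status in rows[1:]]

def run_lengths(blocks, status):
    """Length, in logical sectors, of each block with the given status."""
    return [size // LOGICAL_SECTOR for _, size, s in blocks if s == status]

blocks = parse_blocks(SAMPLE)
bad_starts = [pos // LOGICAL_SECTOR for pos, _, s in blocks if s == "-"]
print(run_lengths(blocks, "+"))                             # [2440, 2440, 2440]
print(run_lengths(blocks, "-"))                             # [8, 8]
print([b - a for a, b in zip(bad_starts, bad_starts[1:])])  # [2448]
```

On a real mapfile, the last line gives the repeat period between bad runs directly.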

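Given this pattern, I did a calculation to estimate how many sectors are on a track. The hard drive spins at 5400 rotations per minute, the sustained read rate in that part of the drive is about 109 mebibytes per second (I measured this on a drive of the same model without bad sectors), one track goes around 360 degrees, and a logical sector is 512 bytes: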

((109MiB/s / 5400rpm * 360 degrees) / 512 bytes) = 2480

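The estimate gives about 2480 logical sectors on the affected track. The pattern repeats every 2448 logical sectors (2440 good sectors plus one 4096-byte physical sector, i.e. 8 logical sectors).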
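The track estimate above can be reproduced in a few lines. This is a sketch of the same arithmetic; 109 MiB/s and 5400 rpm are the measured values from the post, and the function name is made up for illustration. Because of zoned bit recording, the result only applies to the zone where the read rate was measured.

```python
def sectors_per_track(read_mib_per_s, rpm, sector_bytes=512):
    """Estimate sectors per track as (bytes read per second) / (revolutions per second)."""
    bytes_per_s = read_mib_per_s * 2**20
    revs_per_s = rpm / 60
    return bytes_per_s / revs_per_s / sector_bytes

track = sectors_per_track(109, 5400)
period = 2440 + 4096 // 512        # good gap + one 4096-byte physical sector
print(round(track), period)        # 2480 2448
print(round(period / track, 3))    # 0.987
```

The repeat period comes out about 1.3% shorter than the estimated track length, which is within the uncertainty of a throughput-based estimate.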

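This suggests that the physical damage looks like a very thin band running inward, almost tangent to the circle of the platter. The band fits within an arc narrower than one physical sector: 8 of roughly 2448 logical sectors per repeat, or about 0.0033 of a revolution (roughly 1.2°). Whatever this damage is, it appears to be a microscopic scratch.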
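Converting a bad run's share of a revolution into an angle is a one-liner. This sketch assumes the repeat period approximates one full revolution, which is only an estimate. For comparison, a 4-sector run over 2440 sectors is 0.001639 of a revolution (a fraction of a turn, not degrees), about 0.59°, while one 4096-byte physical sector (8 logical sectors) over 2448 sectors is about 1.18°.

```python
def arc_degrees(bad_sectors, sectors_per_rev):
    """Angle subtended by a run of bad sectors, treating sectors_per_rev as one full turn."""
    return 360.0 * bad_sectors / sectors_per_rev

print(round(4 / 2440, 6))              # 0.001639 (fraction of a turn)
print(round(arc_degrees(4, 2440), 2))  # 0.59
print(round(arc_degrees(8, 2448), 2))  # 1.18
```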

There are other characteristics of the pattern that I can't make sense of:

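  • The bad sectors exist only in the first 1/6 of the drive (the outer edge), and there is no clear pattern to where the bands appear.
  • The patterns don't seem to be one long scratch; they look more like a dashed/dotted line ("┋"):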

ddrescueview screenshot

Why would the damage show up like this? What could create such an orderly damage pattern?


Reference

SMART

# smartctl -a /dev/sdf
smartctl 6.4 2014-10-07 r4002 [x86_64-linux-4.3.0-0.bpo.1-amd64] (local build)
Copyright (C) 2002-14, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Model Family:     Seagate Momentus SpinPoint M8 (AF)
Device Model:     ST1000LM024 HN-M101MBB
Serial Number:    S314J90G121745
LU WWN Device Id: 5 0004cf 20f07d081
Firmware Version: 2BA30003
User Capacity:    1,000,204,886,016 bytes [1.00 TB]
Sector Sizes:     512 bytes logical, 4096 bytes physical
Rotation Rate:    5400 rpm
Form Factor:      2.5 inches
Device is:        In smartctl database [for details use: -P show]
ATA Version is:   ATA8-ACS T13/1699-D revision 6
SATA Version is:  SATA 3.0, 6.0 Gb/s (current: 6.0 Gb/s)
Local Time is:    Sat Mar 19 10:02:27 2016 CDT
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

General SMART Values:
Offline data collection status:  (0x00) Offline data collection activity
                                        was never started.
                                        Auto Offline Data Collection: Disabled.
Self-test execution status:      ( 121) The previous self-test completed having
                                        the read element of the test failed.
Total time to complete Offline
data collection:                (12480) seconds.
Offline data collection
capabilities:                    (0x5b) SMART execute Offline immediate.
                                        Auto Offline data collection on/off support.
                                        Suspend Offline collection upon new
                                        command.
                                        Offline surface scan supported.
                                        Self-test supported.
                                        No Conveyance Self-test supported.
                                        Selective Self-test supported.
SMART capabilities:            (0x0003) Saves SMART data before entering
                                        power-saving mode.
                                        Supports SMART auto save timer.
Error logging capability:        (0x01) Error logging supported.
                                        General Purpose Logging supported.
Short self-test routine
recommended polling time:        (   2) minutes.
Extended self-test routine
recommended polling time:        ( 208) minutes.
SCT capabilities:              (0x003f) SCT Status supported.
                                        SCT Error Recovery Control supported.
                                        SCT Feature Control supported.
                                        SCT Data Table supported.

SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x002f   100   100   051    Pre-fail  Always       -       118722
  2 Throughput_Performance  0x0026   252   252   000    Old_age   Always       -       0
  3 Spin_Up_Time            0x0023   092   092   025    Pre-fail  Always       -       2494
  4 Start_Stop_Count        0x0032   100   100   000    Old_age   Always       -       96
  5 Reallocated_Sector_Ct   0x0033   252   252   010    Pre-fail  Always       -       0
  7 Seek_Error_Rate         0x002e   252   252   051    Old_age   Always       -       0
  8 Seek_Time_Performance   0x0024   252   252   015    Old_age   Offline      -       0
  9 Power_On_Hours          0x0032   100   100   000    Old_age   Always       -       4188
 10 Spin_Retry_Count        0x0032   252   252   051    Old_age   Always       -       0
 11 Calibration_Retry_Count 0x0032   100   100   000    Old_age   Always       -       21
 12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       102
 13 Read_Soft_Error_Rate    0x003a   100   100   000    Old_age   Always       -       0
181 Program_Fail_Cnt_Total  0x0022   100   100   000    Old_age   Always       -       655675
191 G-Sense_Error_Rate      0x0022   100   100   000    Old_age   Always       -       56
192 Power-Off_Retract_Count 0x0022   100   100   000    Old_age   Always       -       25
193 Load_Cycle_Count        0x0032   089   089   000    Old_age   Always       -       111986
194 Temperature_Celsius     0x0002   058   050   000    Old_age   Always       -       42 (Min/Max 14/50)
195 Hardware_ECC_Recovered  0x003a   100   100   000    Old_age   Always       -       0
196 Reallocated_Event_Count 0x0032   252   252   000    Old_age   Always       -       0
197 Current_Pending_Sector  0x0032   037   037   000    Old_age   Always       -       10437
198 Offline_Uncorrectable   0x0030   252   252   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x0036   200   200   000    Old_age   Always       -       0
200 Multi_Zone_Error_Rate   0x002a   100   100   000    Old_age   Always       -       3688
240 Head_Flying_Hours       0x0032   100   100   000    Old_age   Always       -       4186
241 Total_LBAs_Written      0x0032   097   094   000    Old_age   Always       -       4770327
242 Total_LBAs_Read         0x0032   096   094   000    Old_age   Always       -       5931956
254 Free_Fall_Sensor        0x0032   252   252   000    Old_age   Always       -       0

SMART Error Log Version: 1
Warning: ATA error count 13458 inconsistent with error log pointer 4

ATA Error Count: 13458 (device log contains only the most recent five errors)
        CR = Command Register [HEX]
        FR = Features Register [HEX]
        SC = Sector Count Register [HEX]
        SN = Sector Number Register [HEX]
        CL = Cylinder Low Register [HEX]
        CH = Cylinder High Register [HEX]
        DH = Device/Head Register [HEX]
        DC = Device Command Register [HEX]
        ER = Error register [HEX]
        ST = Status register [HEX]
Powered_Up_Time is measured from power on, and printed as
DDd+hh:mm:SS.sss where DD=days, hh=hours, mm=minutes,
SS=sec, and sss=millisec. It "wraps" after 49.710 days.

Error 13458 occurred at disk power-on lifetime: 4188 hours (174 days + 12 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  40 51 08 28 ec cb e2  Error: UNC 8 sectors at LBA = 0x02cbec28 = 46918696

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  c8 00 08 28 ec cb e2 08      00:02:49.684  READ DMA
  b0 d5 01 00 4f c2 00 08      00:02:49.684  SMART READ LOG
  ef 10 02 00 00 00 a0 08      00:02:49.684  SET FEATURES [Enable SATA feature]
  27 00 00 00 00 00 e0 08      00:02:49.684  READ NATIVE MAX ADDRESS EXT [OBS-ACS-3]
  ec 00 00 00 00 00 a0 08      00:02:49.684  IDENTIFY DEVICE

Error 13457 occurred at disk power-on lifetime: 4188 hours (174 days + 12 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  40 51 08 28 ec cb e2  Error: UNC 8 sectors at LBA = 0x02cbec28 = 46918696

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  c8 00 08 28 ec cb e2 08      00:02:49.681  READ DMA
  b0 da 00 00 4f c2 00 08      00:02:49.681  SMART RETURN STATUS
  ef 10 02 00 00 00 a0 08      00:02:49.681  SET FEATURES [Enable SATA feature]
  27 00 00 00 00 00 e0 08      00:02:49.681  READ NATIVE MAX ADDRESS EXT [OBS-ACS-3]
  ec 00 00 00 00 00 a0 08      00:02:49.681  IDENTIFY DEVICE

Error 13456 occurred at disk power-on lifetime: 4188 hours (174 days + 12 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  40 51 08 28 ec cb e2  Error: UNC 8 sectors at LBA = 0x02cbec28 = 46918696

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  c8 00 08 28 ec cb e2 08      00:02:49.677  READ DMA
  b0 d1 01 01 4f c2 00 08      00:02:49.677  SMART READ ATTRIBUTE THRESHOLDS [OBS-4]
  ef 10 02 00 00 00 a0 08      00:02:49.677  SET FEATURES [Enable SATA feature]
  27 00 00 00 00 00 e0 08      00:02:49.677  READ NATIVE MAX ADDRESS EXT [OBS-ACS-3]
  ec 00 00 00 00 00 a0 08      00:02:49.677  IDENTIFY DEVICE

Error 13455 occurred at disk power-on lifetime: 4188 hours (174 days + 12 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  40 51 08 28 ec cb e2  Error: UNC 8 sectors at LBA = 0x02cbec28 = 46918696

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  c8 00 08 28 ec cb e2 08      00:02:49.674  READ DMA
  b0 d0 01 00 4f c2 00 08      00:02:49.674  SMART READ DATA
  ef 10 02 00 00 00 a0 08      00:02:49.674  SET FEATURES [Enable SATA feature]
  27 00 00 00 00 00 e0 08      00:02:49.674  READ NATIVE MAX ADDRESS EXT [OBS-ACS-3]
  ec 00 00 00 00 00 a0 08      00:02:49.674  IDENTIFY DEVICE

Error 13454 occurred at disk power-on lifetime: 4188 hours (174 days + 12 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  40 51 08 28 ec cb e2  Error: UNC 8 sectors at LBA = 0x02cbec28 = 46918696

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  c8 00 08 28 ec cb e2 08      00:02:49.670  READ DMA
  ec 00 01 00 00 00 00 08      00:02:49.670  IDENTIFY DEVICE
  ef 10 02 00 00 00 a0 08      00:02:49.670  SET FEATURES [Enable SATA feature]
  27 00 00 00 00 00 e0 08      00:02:49.670  READ NATIVE MAX ADDRESS EXT [OBS-ACS-3]
  ec 00 00 00 00 00 a0 08      00:02:49.670  IDENTIFY DEVICE

SMART Self-test log structure revision number 1
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Short offline       Completed: read failure       90%      4101         933752
# 2  Short offline       Completed: read failure       90%      4101         933752
# 3  Short offline       Completed: read failure       90%      4101         1417744
# 4  Short offline       Completed without error       00%      3607         -
# 5  Short offline       Completed without error       00%      3002         -
# 6  Short offline       Completed without error       00%      2338         -
# 7  Short offline       Completed without error       00%      1044         -
# 8  Short offline       Completed without error       00%       334         -
# 9  Short offline       Completed without error       00%         0         -

SMART Selective self-test log data structure revision number 0
Note: revision number not 1 implies that no selective self-test has ever been run
 SPAN  MIN_LBA  MAX_LBA  CURRENT_TEST_STATUS
    1        0        0  Completed_read_failure [90% left] (0-65535)
    2        0        0  Not_testing
    3        0        0  Not_testing
    4        0        0  Not_testing
    5        0        0  Not_testing
Selective self-test flags (0x0):
  After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.
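The error-log rows above store the failing LBA in the SN/CL/CH/DH registers; smartctl already prints the decoded value, but the assembly is easy to check. A sketch assuming the standard 28-bit LBA register layout; the helper name `lba28` is made up for illustration.

```python
def lba28(sn, cl, ch, dh):
    """Rebuild a 28-bit LBA from ATA taskfile registers in LBA mode:
    SN = bits 0-7, CL = bits 8-15, CH = bits 16-23, DH bits 0-3 = bits 24-27."""
    return ((dh & 0x0F) << 24) | (ch << 16) | (cl << 8) | sn

# Registers from the "Error: UNC 8 sectors" rows: SN=28 CL=ec CH=cb DH=e2
lba = lba28(0x28, 0xEC, 0xCB, 0xE2)
print(hex(lba), lba)  # 0x2cbec28 46918696
print(lba % 8 == 0)   # True: starts on a 4096-byte (8 x 512) physical-sector boundary
```

Together with SC=08 (8 sectors), this is consistent with each failed read covering exactly one physical sector.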

ddrescue mapfile

ddrescueview gallery

Low resolution

High resolution


2018-03-19 15:19


Source


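Part of me says "who cares, the disk is bad, just junk it and move on," but at the same time I like puzzles. Still, you may be overthinking it; the simple answer is usually the right one. I think this was caused by the head being jolted, or momentarily touching or bouncing on the spinning platter, which damaged the coating. Keep in mind that you are looking at a single flat, contiguous map of multiple spinning platters (in reality 2 or 4 platters with 4-8 r/w surfaces, and data is usually striped across the surfaces), with concentric circles that get progressively larger toward the outside. - acejavelin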
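@acejavelin I tend to agree, but I find it strange that there are thousands of bad physical sectors, almost none of them adjacent, and striping can't explain this pattern. It's like crop circles on my hard drive; I'm a bit curious. I might open the drive later to see whether I can find a definitive answer. - Deltik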
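You may or may not see any damage. The number and sparseness of the bad sectors don't surprise me, especially if the contact was "light"; it may have touched down in a fine spiral over an area without damaging the whole contact surface. Note that this is not a discussion forum but a Q&A site, and extended discussion will be frowned upon and moved. I say this because, without physically inspecting the drive under a microscope, a reasonable answer probably can't be given; we could speculate about this all day. - acejavelin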
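"What causes this HDD sector damage?": Severe vibration or shock can cause the R/W head to strike the platter (i.e., the air bearing fails). Such vibration can be caused by dragging a PC/laptop with rubber/grippy feet across a desk. BTW, you can only guess the sectors-per-track value because of zoned bit recording. - sawdust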
@sawdust That's plausible, especially since the original user now reports vibration when dragging the laptop. - Deltik
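acejavelin's point about data being striped across surfaces can be illustrated with a toy model. In this sketch, the layout (each radial band of tracks written on surface 0, then surface 1, before moving inward), the function name, and all the small numbers are hypothetical; real drives use proprietary zoned layouts with track skew, so this shows only how a scratch on one surface could look dashed in LBA space, not the exact periods measured in the question.

```python
# Toy model with a hypothetical layout: each band of `tracks_per_band` tracks is written
# on surface 0, then surface 1, ..., before moving inward. A scratch on one surface at one
# angular position then appears in LBA space as short runs of periodic bad sectors
# separated by clean stretches (a dashed line).
def scratch_lbas(sectors_per_track, tracks_per_band, surfaces, bands,
                 damaged_surface, damaged_sector):
    lba, bad = 0, []
    for _band in range(bands):
        for surface in range(surfaces):
            for _track in range(tracks_per_band):
                if surface == damaged_surface:
                    bad.append(lba + damaged_sector)
                lba += sectors_per_track
    return bad

print(scratch_lbas(10, 3, 2, 2, damaged_surface=0, damaged_sector=4))
# [4, 14, 24, 64, 74, 84]
```

The bad LBAs repeat every track within a surface-0 band, then disappear while the other surface is being written.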


Answer: