Procedure to replace a disk in a RAID array before it fails:
# cat /proc/mdstat // Display disks and RAID arrays
# fdisk -l // Display partition information
# lsblk // Show the names of the RAID arrays and disks
# smartctl -a /dev/sdX -d cciss,N // Display S.M.A.R.T. data to find the failing disk
# dmesg | grep 'sd' // Display the kernel log to find the failing disk
# badblocks -vs /dev/sdX // Highlight the old disk in the bay (the read-only scan keeps its activity LED busy)
# mdadm --detail /dev/md127 // Check the state of the array
# mdadm --manage /dev/md127 --fail /dev/sdX // Mark the disk as failed
# mdadm --manage /dev/md127 --remove /dev/sdX // Remove the disk from the RAID (software side)
➜ PHYSICALLY REPLACE THE DISK
# dmesg // Check the device letter assigned to the new disk
# sfdisk -d /dev/sdY | sfdisk /dev/sdX // Copy the partition table from a healthy disk of the same RAID (sdY) to the new one (sdX)
# mdadm --add /dev/md127 /dev/sdX // Add the new disk to the RAID
# cat /proc/mdstat // Check the rebuild
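The final check of /proc/mdstat can be scripted. The following is a minimal sketch, not part of the original procedure: `degraded_arrays` is a hypothetical helper name, and it assumes the usual /proc/mdstat layout where a missing member appears as `_` in the status brackets (for example `[_UUU]`).

```shell
#!/bin/sh
# Hypothetical helper: print the name of every md array that has a
# missing member. Reads /proc/mdstat-formatted text on stdin.
# Assumption: array lines start with "mdN :" and the status line
# contains a bracketed map made only of "U" (up) and "_" (missing).
degraded_arrays() {
    awk '
        /^md[0-9]+ :/     { name = $1 }     # remember the current array
        /\[[U_]*_[U_]*\]/ { print name }    # at least one "_" in the map
    '
}

# Example use on a live system:
#   degraded_arrays < /proc/mdstat
```

An empty result means every listed array has all its members; while the new disk is still rebuilding, the array keeps being reported.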
Some examples
Here are some examples of output that indicates an impending disk failure:
Display kernel system logs:
# dmesg | grep 'sd'
[23806611.537971] sd 0:0:3:0: [sdc] tag#238 Sense Key : Recovered Error [current]
[23806611.538821] sd 0:0:3:0: [sdc] tag#238 Add. Sense: Recovered data with linking
[23806887.066626] sd 0:0:3:0: [sdc] tag#7 Sense Key : Recovered Error [current]
[23806887.067510] sd 0:0:3:0: [sdc] tag#7 Add. Sense: Recovered data with linking
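When the kernel log is long, counting the recovered errors per disk makes the suspect stand out. This is a rough sketch under the assumption that the log lines look like the ones above; `recovered_errors_by_disk` is a hypothetical name, not an existing tool.

```shell
#!/bin/sh
# Hypothetical helper: count "Recovered Error" sense-key lines per disk.
# Reads kernel log text on stdin and prints "<disk> <count>" per disk.
# Assumption: the disk name appears in brackets, e.g. "[sdc]".
recovered_errors_by_disk() {
    awk '
        /Sense Key : Recovered Error/ {
            if (match($0, /\[sd[a-z]+\]/)) {
                # strip the surrounding brackets from the match
                disk = substr($0, RSTART + 1, RLENGTH - 2)
                count[disk]++
            }
        }
        END { for (d in count) print d, count[d] }
    ' | sort
}

# Example use on a live system:
#   dmesg | recovered_errors_by_disk
```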
Display information about SCSI devices:
# lsscsi -s
[...]
[0:0:1:0]    disk    HP    AB0300CDEFG    HPD3    /dev/sda    300GB
[0:0:2:0]    disk    HP    AB0300CDEFG    HPD3    /dev/sdb    300GB
[0:0:3:0]    disk    HP    AB0600CDHIJ    HPD2    /dev/sdc    600GB
[0:0:4:0]    disk    HP    AB0600CDHIJ    HPD2    /dev/sdd    600GB
[0:0:5:0]    disk    HP    AB0600CDHIJ    HPD2    /dev/sde    600GB
[0:0:6:0]    disk    HP    AB0600CDHIJ    HPD2    /dev/sdf    600GB
[...]
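The kernel log identifies a disk by its SCSI address (such as 0:0:3:0) as well as by name, and the lsscsi listing links that address to a device node. A small sketch of that lookup follows; `scsi_addr_to_dev` is a hypothetical name, and the parsing assumes the column layout shown above.

```shell
#!/bin/sh
# Hypothetical helper: print the /dev node listed for a SCSI address.
# Usage: lsscsi -s | scsi_addr_to_dev 0:0:3:0
# Assumption: the first column is the address in brackets and one of
# the later columns is the device node (starting with /dev/).
scsi_addr_to_dev() {
    awk -v addr="[$1]" '
        $1 == addr {
            for (i = 2; i <= NF; i++)
                if ($i ~ /^\/dev\//) print $i
        }
    '
}
```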
Display S.M.A.R.T. information:
# smartctl -a /dev/sdX -d cciss,N
smartctl 7.1 2020-04-05 r5049 [x86_64-linux-4.18.0-348.el8.x86_64] (local build)
Copyright (C) 2002-19, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Vendor:               HP
Product:              AB0600CDHIJ
Revision:             HPD2
Compliance:           SPC-4
User Capacity:        600,127,266,816 bytes [600 GB]
Logical block size:   512 bytes
Rotation Rate:        15052 rpm
Form Factor:          2.5 inches
Logical Unit id:      yyyyyyyyyyyyyyyyyy
Serial number:        xxxxxxxxxxxxxxxxxxxx
Device type:          disk
Transport protocol:   SAS (SPL-3)
Local Time is:        Fri May 24 08:11:40 2024 UTC
SMART support is:     Available - device has SMART capability.
SMART support is:     Enabled
Temperature Warning:  Enabled

=== START OF READ SMART DATA SECTION ===
SMART Health Status: HARDWARE IMPENDING FAILURE TOO MANY BLOCK REASSIGNS [asc=5d, ascq=14]

Current Drive Temperature:     38 C
Drive Trip Temperature:        60 C

Manufactured in week 01 of year 2016
Specified cycle count over device lifetime:  10000
Accumulated start-stop cycles:  42
Specified load-unload count over device lifetime:  300000
Accumulated load-unload cycles:  2940
Elements in grown defect list: 8000
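The two lines that matter most in this report are the health status and the size of the grown defect list. As an illustrative sketch only, the hypothetical `smart_health` helper below pulls both out of smartctl output for a SAS/SCSI disk and fails unless the status is exactly OK; the field labels are taken from the report above.

```shell
#!/bin/sh
# Hypothetical helper: summarize a SAS/SCSI "smartctl -a" report.
# Reads the report on stdin, prints "status=... defects=...", and
# returns 0 only when the health status is exactly "OK".
smart_health() {
    awk -F': *' '
        /^SMART Health Status:/           { status = $2 }
        /^Elements in grown defect list:/ { defects = $2 }
        END {
            sub(/[ \t]+$/, "", status)    # drop trailing whitespace
            printf "status=%s defects=%s\n", status, defects
            exit (status == "OK" ? 0 : 1)
        }
    '
}

# Example use on a live system (sdX and N as in the procedure):
#   smartctl -a /dev/sdX -d cciss,N | smart_health || echo "replace this disk"
```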
Manage RAID:
# mdadm --detail /dev/md127
/dev/md127:
           Version : 1.2
     Creation Time : Mon Jun 6 13:59:49 2016
        Raid Level : raid10
        Array Size : 1171860480 (1117.57 GiB 1199.99 GB)
     Used Dev Size : 585930240 (558.79 GiB 599.99 GB)
      Raid Devices : 4
     Total Devices : 4
       Persistence : Superblock is persistent

     Intent Bitmap : Internal

       Update Time : Fri May 24 09:06:38 2024
             State : clean
    Active Devices : 4
   Working Devices : 4
    Failed Devices : 0
     Spare Devices : 0

            Layout : near=2
        Chunk Size : 512K

Consistency Policy : bitmap

              Name : 2
              UUID : aaaaaaaa:bbbbbbbb:cccccccc:dddddddd
            Events : 193414

    Number   Major   Minor   RaidDevice State
       0       8       32        0      active sync set-A   /dev/sdc
       1       8       48        1      active sync set-B   /dev/sdd
       2       8       64        2      active sync set-A   /dev/sde
       3       8       80        3      active sync set-B   /dev/sdf
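In the device table at the end of this output, a healthy member is reported as "active sync". The sketch below lists any member that is in another state (for example faulty, or a spare that is still rebuilding). `unhealthy_members` is a hypothetical name, and the parsing assumes the table layout shown above.

```shell
#!/bin/sh
# Hypothetical helper: list array members that are not "active sync".
# Reads "mdadm --detail" output on stdin, prints one device per line.
# Assumption: table rows start with the member number (or "-") and
# end with the device node.
unhealthy_members() {
    awk '
        $1 ~ /^[0-9-]+$/ && $NF ~ /^\/dev\// && $0 !~ /active sync/ {
            print $NF
        }
    '
}

# Example use on a live system:
#   mdadm --detail /dev/md127 | unhealthy_members
```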
How to replace a RAID disk?
Procedure to replace a RAID disk before it fails: