
Disk mapping for failed drives

How to identify failing drives

When monitoring for drive failures on Nexenta and other ZFS-based storage appliances, not every failure will take the drive offline completely. If a user or technician suspects that a failing drive is slowing down I/O, it is worth checking for hardware errors that may point to an impending failure.

For a basic scan, get a list of all of the slot numbers and their corresponding GUIDs:

  • show lun slotmap

This should yield an output like this:

nmc@NS2000LEFT:/$ show lun slotmap

jbod:4:

LUN                     JBOD     Slot#   DeviceId
c0t5000C50025FE16A7d0   jbod:4   1       id1,sd@n5000c50025fe16a7
c0t5000C50025FFB0FFd0   jbod:4   2       id1,sd@n5000c50025ffb0ff
c0t5000C50040CF1B2Fd0   jbod:4   3       id1,sd@n5000c50040cf1b2f
c0t5000C50034E9800Bd0   jbod:4   4       id1,sd@n5000c50034e9800b
c0t5000C50034FBE687d0   jbod:4   6       id1,sd@n5000c50034fbe687
c0t5000C5003412E31Fd0   jbod:4   7       id1,sd@n5000c5003412e31f
c0t5000C50034130597d0   jbod:4   8       id1,sd@n5000c50034130597


The device names in the LUN column are the GUIDs. Next, you’ll want to scan for hardware and software errors on the disks:

  • iostat -en

This will yield per-device error counts, so you can see which disks are generating errors:

root@NS2000LEFT:/opt/HAC/RSF-1/log# iostat -en

  ---- errors ---
  s/w  h/w   trn   tot  device
    0    0     0     0  c3d0
    0    0     0     0  c0t5000C50031DB89A3d0
    0    0     0     0  c0t5000C50031D8FA13d0
    0    0     0     0  c0t5000C50031BB2793d0
    0    0     0     0  c0t5000C50031BB2A23d0
    0    0     0     0  c0t5000C50031D916A3d0
    0    0     0     0  c0t5000C50042B6699Fd0
    0  110  6752  6862  c0t5000C50025FE16A7d0

 

As we can see, the disk c0t5000C50025FE16A7d0 generated over 6,000 errors, 110 of which were hardware errors. Referencing that GUID back against the slotmap output shows that the disk is installed in slot 1 of the enclosure.
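The cross-reference can also be scripted. The sketch below is a hypothetical helper, not part of NMC; it parses saved copies of the two outputs (inline sample text stands in for the live commands here) and prints the enclosure slot for any disk with a nonzero total error count:

```shell
# Sample data mirroring the outputs above; on a live system, capture the text
# with `iostat -en` and `show lun slotmap` instead (these rows are illustrative).
iostat_out='0 0 0 0 c3d0
0 110 6752 6862 c0t5000C50025FE16A7d0'
slotmap_out='c0t5000C50025FE16A7d0 jbod:4 1 id1,sd@n5000c50025fe16a7
c0t5000C50025FFB0FFd0 jbod:4 2 id1,sd@n5000c50025ffb0ff'

# Column 4 of the iostat error section is the total count; column 5 is the device.
bad_disks=$(printf '%s\n' "$iostat_out" | awk '$4 > 0 {print $5}')

# Look each failing device up in the slotmap (column 2 = JBOD, column 3 = slot).
for disk in $bad_disks; do
  printf '%s\n' "$slotmap_out" |
    awk -v d="$disk" '$1 == d {print d, "->", $2, "slot", $3}'
done
```

Run against the sample data, this prints `c0t5000C50025FE16A7d0 -> jbod:4 slot 1`, matching the manual lookup above.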


You can then install a replacement drive in the slot to see if the errors subside. Check iostat again after the replacement drive has run in the enclosure for a few days, to ensure that the error count doesn’t continue to increase.
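One way to do that follow-up check, sketched below with hypothetical snapshots: save the `iostat -en` output now (e.g. `iostat -en > /var/tmp/iostat.$(date +%F)`), save it again days later, and compare the total-error column. Inline sample text stands in for the two saved files:

```shell
# Two hypothetical snapshots of the iostat error section, captured days apart.
# Identical counts mean the errors stopped after the drive was replaced.
before='0 0 0 0 c0t5000C50042B6699Fd0'
after='0 0 0 0 c0t5000C50042B6699Fd0'

before_tmp=$(mktemp); after_tmp=$(mktemp)
printf '%s\n' "$before" > "$before_tmp"
printf '%s\n' "$after"  > "$after_tmp"

# First pass records each device's old total (column 4); second pass reports
# any device whose total has grown since the first snapshot.
grew=$(awk 'NR==FNR {tot[$5]=$4; next} $4 > tot[$5] {print $5}' \
  "$before_tmp" "$after_tmp")
rm -f "$before_tmp" "$after_tmp"

if [ -z "$grew" ]; then echo "error counts stable"; else echo "still climbing: $grew"; fi
```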

In cases where significant errors are generated across all disks in an enclosure, it’s possible that a bad cable or a bad backplane is generating the errors. Swap suspect hardware as necessary.
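Summing hardware errors per enclosure makes that pattern easy to spot. The hypothetical sketch below joins the slotmap's device-to-JBOD mapping with the iostat error counts (inline sample text again stands in for the live command output):

```shell
# Sum hardware errors (column 2 of the iostat error section) per JBOD.
# Sample rows are illustrative; a single JBOD accumulating errors across
# many slots suggests shared hardware rather than individual drives.
iostat_out='0 110 6752 6862 c0t5000C50025FE16A7d0
0 95 4900 4995 c0t5000C50025FFB0FFd0'
slotmap_out='c0t5000C50025FE16A7d0 jbod:4 1 id1,sd@n5000c50025fe16a7
c0t5000C50025FFB0FFd0 jbod:4 2 id1,sd@n5000c50025ffb0ff'

slot_tmp=$(mktemp); io_tmp=$(mktemp)
printf '%s\n' "$slotmap_out" > "$slot_tmp"
printf '%s\n' "$iostat_out"  > "$io_tmp"

# First pass maps device -> JBOD; second pass accumulates h/w errors per JBOD.
per_jbod=$(awk 'NR==FNR {jbod[$1]=$2; next} $5 in jbod {hw[jbod[$5]] += $2}
  END {for (j in hw) print j, hw[j]}' "$slot_tmp" "$io_tmp")
rm -f "$slot_tmp" "$io_tmp"

echo "$per_jbod"   # prints: jbod:4 205
```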

 
