Short-Circuited External Disk Recovery

Western Digital 3.5" 1.5TB drive next to the USB controller for it.

A few mornings ago, we woke up to find the power out. A fuse had tripped, so we flicked the switch to reset it, but we weren't sure what had tripped it. My home electricity monitoring logged its last event at 02:38, so the power must have failed just after.

Prometheus graph of whether my home servers are "up".

Later, I tried to read some data off the Western Digital My Book external hard disk I'd plugged in the night before.

The Western Digital Drive (black, right), plugged into my NAS (left).

The drive had some data from a relative who passed away, so I was keen to read it. But the disk wasn't showing up on the computer any more. Why?

Is the power supply broken?

I couldn't hear the disk spin up, unless I plugged in another power supply. So I tested the power supply with a multimeter. It should output 12V but was outputting 0V:

multimeter showing 0.00V connected to power supply
-0.00 volts on the external drive power supply. Dead!

Plug it into other machines?

I was hoping it was 'just the power supply' that had shorted, and that the disk was undamaged. However, with an alternative power supply, the disk still didn't mount. macOS said the disk was 'uninitialized' (unformatted), but attempts to read the first few bytes of the disk (where the partition table is stored) failed:

$ sudo cat /dev/disk4 | xxd | less
error: resource busy
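For reference, here is a minimal Python sketch of what that check looks for when the read does succeed. It assumes 512-byte logical sectors (on a 4K-sector disk the GPT header sits at a different offset), and the function names and the default /dev/disk4 path are only illustrative. It looks for the two standard signatures: the 0x55 0xAA bytes at the end of sector 0 (MBR or protective MBR) and the ASCII string "EFI PART" at the start of sector 1 (GPT).

```python
import sys

SECTOR_SIZE = 512  # assumption: 512-byte logical sectors


def describe_partition_table(first_sectors: bytes) -> str:
    """Classify the first two sectors of a disk as GPT, MBR, or unknown."""
    if len(first_sectors) < SECTOR_SIZE:
        return "short read"
    # A GPT header starts at LBA 1 with the ASCII signature "EFI PART".
    if first_sectors[SECTOR_SIZE:SECTOR_SIZE + 8] == b"EFI PART":
        return "GPT"
    # An MBR (or a GPT disk's protective MBR) ends sector 0 with 0x55 0xAA.
    if first_sectors[510:512] == b"\x55\xaa":
        return "MBR"
    return "no recognisable partition table"


def read_first_sectors(device_path: str, count: int = 2) -> bytes:
    """Read the first `count` sectors; raises OSError if the read fails."""
    with open(device_path, "rb", buffering=0) as dev:
        return dev.read(SECTOR_SIZE * count)


if __name__ == "__main__":
    # Hypothetical default path; pass the real device as the first argument.
    path = sys.argv[1] if len(sys.argv) > 1 else "/dev/disk4"
    try:
        print(describe_partition_table(read_first_sectors(path)))
    except OSError as err:
        # A dying disk typically ends up here, as mine did.
        print(f"could not read {path}: {err}")
```

Run it with sudo and the device path as the only argument. On a healthy disk it should report GPT or MBR; on a disk like this one, the read itself fails before any signature can be checked.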

That seems to say we can't read any bytes off the disk at all! Plugging the disk into my NAS and running sudo dmesg -w, I saw the kernel hitting errors reading the disk:

USB disk detected, then... two "Buffer I/O error" errors trying to read the drive ("async page read").

I hoped: maybe it's just the enclosure that's fried, and maybe the disk inside is still good? Opening the enclosure was a last resort, because you have to snap it to get inside. So first I tried plugging the drive into one more Linux machine, but I got much the same errors. This time, there were so many that buffer_io_error said 103 callbacks suppressed. I guess there were over 100 errors?

Too many errors in dmesg.
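On the "over 100 errors" guess: as far as I understand it, the kernel rate-limits these messages, and "callbacks suppressed" reports how many were dropped rather than printed. Here is a rough Python sketch that adds the printed and suppressed counts together. The regular expressions assume the usual message shapes, which can vary between kernel versions, and the sample lines are illustrative stand-ins, not copied from my logs.

```python
import re

# Assumed message shapes; exact wording can differ between kernel versions.
ERROR_RE = re.compile(r"Buffer I/O error on dev (\S+), logical block (\d+)")
SUPPRESSED_RE = re.compile(r"buffer_io_error: (\d+) callbacks suppressed")


def count_buffer_io_errors(log_text: str) -> dict:
    """Count printed and rate-limited buffer I/O errors in dmesg output."""
    printed = 0
    suppressed = 0
    devices = set()
    for line in log_text.splitlines():
        match = ERROR_RE.search(line)
        if match:
            printed += 1
            devices.add(match.group(1))
            continue
        match = SUPPRESSED_RE.search(line)
        if match:
            # Each suppressed callback is an error message that was not printed.
            suppressed += int(match.group(1))
    return {
        "printed": printed,
        "suppressed": suppressed,
        "total": printed + suppressed,
        "devices": sorted(devices),
    }


# Illustrative sample only: the device name and block numbers are made up.
SAMPLE_LOG = """\
Buffer I/O error on dev sdb, logical block 0, async page read
Buffer I/O error on dev sdb, logical block 1, async page read
buffer_io_error: 103 callbacks suppressed
"""

if __name__ == "__main__":
    print(count_buffer_io_errors(SAMPLE_LOG))
```

To use it on a real machine, paste the output of sudo dmesg into a string (or read it from a file) and pass it to count_buffer_io_errors. The total is a lower bound on failed reads, since only logged and suppressed messages are counted.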

Bypass the USB controller?

I prised the case off with a screwdriver, breaking the clips:

Screwdriver going in to open up the disk case.

And I found the internal disk. A cheap Western Digital disk inside an expensive Western Digital case: that's vertical integration!

WD Caviar Green 1.5TB 3.5" disk inside the case.

I popped the raw disk into my Synology NAS. Its spare disk slot has been handy for reading many disks as I've been processing these inherited hard drives. Now Linux is talking directly to the disk, without a USB controller. Maybe it'll work?

Drive detected, then read errors, then a 'hard reset' loop.

The drive is returning errors when we send it ATA read commands. Linux tries to reset the drive to fix it, but the reset doesn't help. Presumably the USB controller was trying to reset the drive too, which might explain the resource busy error.

Conclusion

Unfortunately, I think this drive is toast. I e-wasted it, labelling it "shorted" with a sharpie, just in case any recycler tries to use it.

I failed to recover the data, but I hope that sharing the steps I followed might help others in this situation.

I'm not sure why the disk failed. The disk was maybe 10 years old. I'm not sure if drives can just 'short out' themselves? I hadn't heard of this failure mode.

The most fun explanation is: there were a few lightning storms in Sydney around the time this failed. Could it have been a lightning surge that ruined this? But no other electronics in the house seem to be broken. Is it really likely that lightning hit our power, and only one hard drive failed?

Next in this series: I evaluate cloud backup software options.

Mark Hansen

I'm a Software Engineering Manager working on Google Maps in Sydney, Australia. I write about software {engineering, management, profiling}, data visualisation, and transport.