2020-04-25

WD Red NAS drives use SMR and I'm not happy about it - understatement of the year

If you're into NAS drives then you may have heard that recently word got around that Western Digital's NAS drives, the WD Reds specifically, apparently use SMR - Shingled Magnetic Recording. There's been a lot of fuss about it and I'm not about to repeat it all so that's why I'll provide some links to other sources I came across myself. What this blogpost is about, though, is me documenting what happened after I contacted WD from a customer perspective. Your experience may vary, but what you're about to read is mine.

A bit of history, if you don't care about that (tl;dr) you can skip to here and if you want to get straight to the results go here. About 4 weeks ago I ordered a brand-spanking-new DS1019+ with 7 WD Red 6TB (WD60EFAX) disks. Having been a very happy user of my old DS1513+, upgrading to a new Synology NAS was an easy choice. I've been using WD disks for over 25 years, from back when capacity was expressed in megabytes, not gigabytes or terabytes. Having had some bad experiences in the '90s with some Seagate disks I've always just stuck with WD. Once I'm happy with a product I tend to be a very loyal customer. I've owned dozens of WD drives over the years and, yes, some have failed, but nothing out of the ordinary. Drives fail, it's a fact of life. Some even within warranty. Shit happens. To every manufacturer.

Aaah, the good ol' days when capacity was
expressed in MB instead of TB. Time flies...

However, my data is dear to me. I run my NAS in RAID6 (SHR2 since my new DS1019+) so that whenever a drive fails I don't need to worry about another drive failing in the meantime while I get a replacement. Usually I keep a few disks 'in stock' to be able to replace a disk immediately anyway. That's why I ordered 7 drives even though only 5 fit in the NAS. And, before you point it out, I am aware RAID (or SHR) is not backup. I do make backups to Amazon Glacier and TransIP's Stack storage on a scheduled interval. And my NAS itself serves for a big part as a backup in the first place for my workstation, laptops, VMs etc. I do have the 3-2-1 backup strategy down - and then some. As I said, my data is dear to me. That's why I don't mind investing, quite a lot I might add, in a decent storage solution.

Having said all that; when the time came to relieve my old NAS from the duties it so trustworthily fulfilled over the years I set out to find good replacements for my WD3000FYZZ "Datacenter Capacity", "Enterprise" drives. Over the course of about 8 years I had replaced all but one of them with a newer one in my NAS and I still have one sealed in its packaging. As you might notice, I don't mind paying a premium for my disks, even though I'm a "home user". I set out on my journey and after much deliberation I came to the conclusion that the WD Red 6TB drives would be my best choice. Since my old NAS was only filled up to about 75% of its (RAID6) capacity of about (5 - 2) x 3TB = 9TB I figured 8TB disks for my new NAS would most likely be overkill; especially since I planned on deleting several TBs of "data hoarded" stuff I gathered over the years. So instead I opted for the 6TB drives and decided to go for the ones that have 256MB cache instead of the marginally cheaper 64MB cache drives (WD60EFRX). I figured that might be a welcome feature since I'd be going from 7200RPM to 5400RPM. And after a little bit more research I ordered my WD60EFAX drives. Yay!
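For anyone who wants to redo the capacity math above, here's a minimal sketch. It assumes equal-sized disks and two disks' worth of parity (which is what RAID6 does, and what SHR2 comes down to when all disks are the same size); the function name is just something I made up for illustration, and the 0.75 is my rough "about 75% full" estimate, not a measured value.

```python
def usable_tb(disks: int, disk_tb: float, parity_disks: int = 2) -> float:
    """Usable capacity of an array of equal-sized disks.

    Assumes two disks' worth of parity by default (RAID6 / SHR2 with equal disks).
    """
    if disks <= parity_disks:
        raise ValueError("need more disks than parity disks")
    return (disks - parity_disks) * disk_tb


old_array = usable_tb(5, 3)  # old NAS: 5 x 3TB
new_array = usable_tb(5, 6)  # new NAS: 5 x 6TB
used = 0.75 * old_array      # rough estimate of data on the old NAS

print(f"old array: {old_array} TB usable, roughly {used} TB in use")
print(f"new array: {new_array} TB usable, roughly {used / new_array:.0%} full after migrating")
```

Filesystem overhead and the TB-vs-TiB difference are ignored here, so real-world numbers will come out a bit lower.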

I love the smell of new toys!
A few days later my drives and my new NAS came in. I couldn't wait to start playing with my new toys so that same evening I put the first 5 disks in the DS1019+ and started to initialise a RAID6 array. I gave it about 3 days, 72 hours, to initialise and by then it was still only at, I can't remember exactly, about 70%. That... was awfully slow, wasn't it? I mean... yes, the drives were twice the size of my previous ones and "only" 5400rpm as opposed to the 7200rpm but I figured something was wrong. However, when I set up my new NAS I was asked whether I'd want "classic" RAID or "SHR", Synology's RAID-alike alternative. Instinctively I had chosen RAID6 because.. why not? It had always worked just fine. But during the next few days I read up on SHR and had started to regret my choice for RAID anyway since, from what I read, SHR(2) should be just as performant and even be better in some respects than RAID(6). So it didn't hurt that much to abort the RAID initialisation and do it over so I could use SHR2. This time it went a lot quicker - it still took a day or two, can't remember exactly how long, but it wasn't as bad as it had been before.

Then I used Hyper Backup and Hyper Backup Vault as per Synology's recommendation to migrate my data which, again, took... longer than I had hoped. It wasn't horrible. It wasn't unbearable. I never expected it to be blazingly fast. But somehow I was becoming more and more disappointed by my new toy and started second guessing my choice of NAS and / or drives. After everything was restored I took my time configuring my NAS and setting up Surveillance Station, Synology's NVR solution, for the 7 cameras I have in and around the house. Together they are responsible for a pretty much constant 24/7/365 data stream of around 6 to 8MB/s according to Synology's resource monitor - which is expected - which my old trusty DS1513+ with its 5 WD3000FYZZ drives has handled for several years without even flinching.

Traffic caused by my cameras according
to Synology's Resource Monitor

I noticed my new disks 'whirring' a LOT more than my previous disks. But, hey, newer models, newer technology, newer everything so... maybe this was normal? To be expected? What do I know? But up until then I hadn't done anything to confirm my ever increasing feeling that... something was off. Later that week I copied a multi-GB ISO file and noticed that the transfer plummeted after a few seconds. My network supports Gbit transfers just fine - I even use LACP (4 NICs on my previous NAS, unfortunately the DS1019+ only has two) and up until recently I had always enjoyed my 100+ MB/s transfers. At the very least 70 to 80MB/s on a 'bad day'. My new NAS started out on a decent 100+MB/s transfer and then quickly plummeted to sub-25MB/s (and sometimes even sub-20MB/s) and stayed there.

And that's when I started testing. Copied a few files in Windows and watched the transfer graph - disappointed. Maybe... something was wrong with my SMB configuration? I tried several different tests including running hdparm in some of my Linux VMs that I mount over iSCSI on my NAS. Same disappointing figures. Tried some other NAS speed test tools and all kept confirming the same disappointing figures. What was it? What could be so wrong? Why did my drives keep whirring as if they were very busy even though I wasn't doing anything out of the ordinary? Why did they keep whirring like that even if I stopped recording my cameras' streams? What was going on?
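If you want to see the same "starts fast, then plummets" pattern without a GUI tool, something along these lines should do it. To be clear: this is a rough sketch and not one of the tools I actually used. The function name and parameters are made up for illustration, and the demo at the bottom only writes a small file to a local temp directory; to test a NAS you'd point the path at a file on a mounted share and use a total size of several GB.

```python
import os
import tempfile
import time


def sample_write_speed(path, total_mb=32, chunk_mb=4, window_s=1.0):
    """Write total_mb of random data to path; return a list of MB/s samples,
    roughly one per window_s seconds of writing."""
    chunk = os.urandom(chunk_mb * 1024 * 1024)
    samples = []
    written_in_window = 0
    window_start = time.monotonic()
    with open(path, "wb") as f:
        for _ in range(total_mb // chunk_mb):
            f.write(chunk)
            f.flush()
            os.fsync(f.fileno())  # ask the OS to push the data out instead of caching it
            written_in_window += chunk_mb
            elapsed = time.monotonic() - window_start
            if elapsed >= window_s and elapsed > 0:
                samples.append(written_in_window / elapsed)
                written_in_window = 0
                window_start = time.monotonic()
    # Record whatever is left in the last, partial window.
    elapsed = time.monotonic() - window_start
    if written_in_window and elapsed > 0:
        samples.append(written_in_window / elapsed)
    return samples


if __name__ == "__main__":
    # Local demo only; replace the path with a file on the NAS share to test the NAS.
    with tempfile.TemporaryDirectory() as tmp:
        target = os.path.join(tmp, "speedtest.bin")
        for i, mbps in enumerate(sample_write_speed(target), start=1):
            print(f"sample {i}: {mbps:.1f} MB/s")
```

A healthy setup should give a more or less flat list of samples; a list that starts high and then drops and stays low is the pattern I was seeing.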

Not the performance I was expecting from
my brand new NAS and drives...
Transfers start fine at gigabit speeds...

...and then plummet.
NASTester confirms...

...as does LAN Speed Test
(note: measurements in MegaBITS, not MegaBytes)
Even hdparm agrees after a few runs

And so I set out on figuring what was going on and after a short while I stumbled upon this video. That... couldn't be! Surely this guy was wrong! Was he? Nope. He wasn't. More and more sources popped up confirming that WD used SMR in their 2-6TB drives. And not long after, Western Digital also confirmed (or should I say confessed) that they used SMR.

I tweeted at WD to let them know how cheated I felt and how disappointed I was and I got a response pointing me to their blogpost (archived version - PDF Version) which, at that time, had been updated with a follow-up confirming my drives were most likely to use SMR. It also stated: "If you have purchased a drive, please call our customer care if you are experiencing performance or any other technical issues. We will have options for you. We are here to help." And so I did. The following is why you're probably here, and what I intend to update until my issue has been solved to my satisfaction.

Contacting Western Digital and working towards a solution

I called support and spoke to "Colin S". Colin had no idea what I was referring to when I told him I contacted support because of my WD60EFAX drives being SMR drives. I had to point him to WD's own blogpost (archived version) that had instructed me to contact support. After Colin had a quick read he put me on hold to consult with his colleagues. A short while later he came back and told me there was nothing he could do other than ask me for the serial numbers and forward my case to '3rd line support'. So he created a case so I could reply to it and report the serials of my drives without having to spell them out over the phone. So we said our goodbyes and we hung up. Not long after I received an e-mail, as promised, with my case, requesting me to report the serials of my drives. And so I did.

"Colin S" created a ticket

I gave them my serials, told them my story
(and also expressed my concerns of the WD60EFAX's
apparent 180TB/year workload which was also mentioned
nowhere in the specifications at the time)

A short while later (props to WD for the quick turnaround, credit where credit is due!) I got a call from Irvine, California from "Matthew M". We spoke a little and he told me WD would be willing to swap the disks for non-SMR disks. He asked for my address, but, living in the Netherlands I'd have to spell that out letter-by-letter to make sure it got across correctly. So I asked him to update the ticket and I would reply to it with my address and other information he needed. I also told him that the drives were in use already so sending all 7 drives in to WD before getting replacements wasn't exactly... ideal. Matt told me he would look into an "Advance Replacement" where they'd send the drives over, I'd migrate to the new drives and then return the old ones.

It turns out WD intended to swap my WD60EFAX drives with WD60EFRX drives. The cheaper ones that are PMR (also known as CMR) but only have 64MB of cache. So not only did WD lie to me (or at the very least not tell the whole story) but then they try to get me to use cheaper, older, PMR drives with lower specs? You better send in an equivalent replacement at the very least. Listen, WD, I know this is kind of blowing up (in your faces) and this is a bit of a PR nightmare but... do you really want to make that worse by sending cheaper, lesser, replacements? Are you sure?

"Matt" requests additional information

I provide additional information and express that WD60EFRX
drives are, in my opinion, not an equivalent replacement.

Matt has since replied with apologies and suggested replacing the drives with WD6003FFBX (WD Red Pro) drives, which I agreed to. Meanwhile Matt also confirmed that an Advance Replacement could be done.

WD offers WD6003FFBX as replacement drives

I agree to their offer

WD confirms Advance Replacement is an option and
forwards my case to the European RMA team.

So, this is where we stand now. I'm waiting for WD Europe to contact me about replacement disks. As this story further unfolds I will update this blogpost.

Update 2020-04-27 #1:

Just received this e-mail:
RMA confirmation e-mail

This e-mail confuses me more than anything else. WD has no credit card on file, I haven't paid any $25 Advance RMA service fee and my case is about 7 drives, not a single one, even though the e-mail only mentions a single serial, which happens to be the first of the 7 serials I reported. So maybe their system can only handle single drive cases or maybe... someone got confused or... I don't know. Clicking the "RMA Instructions" link takes me here (and yes, I am getting the same "RMA number not found" - which makes sense since there's no identifying information in the link). The "guidelines" and "packing instructions" links both take me to here. And finally the "View status" link takes me to a page that shows this:
RMA Status / instructions
Which, again, is confusing since it shows "Qty: 1" pretty explicitly and also "Pending return". It could just be me, and technically the RMA is awaiting return, but it would've made more sense if it showed "Replacement drives sent" or something. For now I'll just wait a few days and see what happens. With some luck I receive a nice pile of drives sometime in the next few days. Fingers crossed! 🤞

Update 2020-04-27 #2:

Ok, I guess I spoke (and blogged) too soon. Apparently WD decided to split my case up into 7 separate RMAs...
Over the course of an hour and 11 minutes
I received 7 RMA e-mails...

Update 2020-04-29:

I reached out to WD to clear up the confusion; especially since no drives appeared to have been sent yet (which would be a really quick turnaround, I realise that).

Trying to clear up the confusion

Confusion cleared up

Update 2020-05-06:

I just received 5 out of 7 replacement disks at the end of the day around 18:55. The other two are still awaiting shipment according to WD's RMA page. 🤪

5 UPS packages
Each disk packaged individually
5 x WD6003FFBX

So... I'm off rebuilding my SHR2 5 times... I'll update this post when the other disks arrive and with updated performance statistics. If anything else happens I'll post that too.

Update 2020-05-08:

Still no word on the other two disks, no status change on the RMA page either. I have pulled one WD60EFAX from my SHR2 set and replaced it with one of the newer WD6003FFBX disks. Rebuild started 2020-05-06 19:14:32 and ended 2020-05-08 16:21:10. Total duration: 45h 6m 38s. I just replaced the second disk. Expect another update in about 48 hours 😅

Update 2020-05-10:

Still no word on the other two disks, no status change on the RMA page either. The second round started at 2020-05-08 16:34:56 and ended 2020-05-10 17:02:47. Total duration: 48h 27m 51s. The third disk has been swapped out and the volume is rebuilding. Again.

Update 2020-05-12:

Still no word on the other two disks, no status change on the RMA page either. It's been a week since the last message from WD. The third round started at 2020-05-10 17:07:45 and ended 2020-05-12 15:53:22. Total duration: 46h 45m 37s. The fourth disk has been swapped out and the volume is rebuilding. Again.

Update 2020-05-14:

Still no word on the other two disks, no status change on the RMA page either. The fourth round started at 2020-05-12 16:16:54 and ended 2020-05-14 12:48:08. Total duration: 44h 31m 14s. The final disk has been swapped out and the volume is rebuilding for the final time.

Update 2020-05-15:

Well... here's an update earlier than expected. The final round, the fifth rebuild of my SHR2 set, started at 2020-05-14 13:34:32 and ended 2020-05-15 21:11:11. Total duration: 31h 36m 39s. That's around 30% faster than the earlier rounds, now that the last WD60EFAX has been removed from the set. This -could- be coincidence but the difference is significant. I'm going to take some measurements of transfer speeds now.

Oh, by the way... Still no word on the other two disks, no status change on the RMA page either. And if you're into bar graphs: here's all the start, end and duration per 'round' in a nice spreadsheet.
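If spreadsheets aren't your thing, the durations and the "around 30%" figure can be re-derived from the start and end timestamps in the updates above. A small sketch; the timestamps are copied straight from this post, and comparing the final round against the average of the first four is simply the way I chose to express "faster" here.

```python
from datetime import datetime

FMT = "%Y-%m-%d %H:%M:%S"

# (start, end) of each rebuild round, as reported in the updates above
ROUNDS = [
    ("2020-05-06 19:14:32", "2020-05-08 16:21:10"),
    ("2020-05-08 16:34:56", "2020-05-10 17:02:47"),
    ("2020-05-10 17:07:45", "2020-05-12 15:53:22"),
    ("2020-05-12 16:16:54", "2020-05-14 12:48:08"),
    ("2020-05-14 13:34:32", "2020-05-15 21:11:11"),
]


def duration_seconds(start: str, end: str) -> int:
    """Elapsed whole seconds between two timestamps."""
    delta = datetime.strptime(end, FMT) - datetime.strptime(start, FMT)
    return int(delta.total_seconds())


def pretty(seconds: int) -> str:
    """Format seconds as e.g. '45h 6m 38s'."""
    hours, rest = divmod(seconds, 3600)
    minutes, secs = divmod(rest, 60)
    return f"{hours}h {minutes}m {secs}s"


durations = [duration_seconds(start, end) for start, end in ROUNDS]
for number, seconds in enumerate(durations, start=1):
    print(f"round {number}: {pretty(seconds)}")

# Final round (no WD60EFAX left in the set) vs. the average of the first four
baseline = sum(durations[:4]) / 4
saving = 1 - durations[4] / baseline
print(f"final round was {saving:.0%} shorter than the average of rounds 1-4")
```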

Update 2020-05-15: Speed test results are in!

First, let me be clear: I know and I am very aware that my measurements aren't very scientific and the tools I've used aren't the most accurate. Also, besides swapping each WD Red drive for a WD Red Pro drive and each time letting the SHR set rebuild before replacing the next one until all drives were replaced, nothing else has changed. Same NAS. Same workstation. Same network connection. Surveillance Station still recording 6-8MB/sec to the same volume. Same everything except the drives. One big difference besides the drives being SMR vs PMR is of course the Pro's 7200rpm vs the 5400rpm of the non-Pro drives. However, in my (just a little, not extremely) educated opinion that shouldn't matter for multi-gigabyte sustained reads of, for example, the ISO files. The drives claim read speeds of 210MB/s and 238MB/s respectively (source / source), either of which is more than enough to saturate my gigabit network connection to my workstation. For small, random reads and writes I can see this making a bigger difference.

Having said that, let the results speak for themselves...

Before... After...
Before... After...
Before... After...
Before... After...
Before... After...

I'll leave it up to you, dear reader, to draw your own conclusions. To me the result is... pretty clear.

Subjectively (I don't have any hard data to back this up) the Pro drives are a bit louder, but this was expected since they are also specified as being louder: 23 and 27 dBA for idle and seek respectively for the WD60EFAX and 29 and 36 dBA idle/seek for the WD6003FFBX drives. But one thing I immediately noticed is that the Pro drives also appear much less 'busy'. There's most definitely a lot less head movement going on.

I don't anticipate updating this post any further, other than to report what happens with the last two drives I'm still expecting.

Update 2020-05-16:

Crap. After literally an entire week of rebuilding my SHR set 5 times I stumble upon this setting:

🤦‍♂️ I am a dumbass... I should've RTFM
or UTFG'd before I put a week into it...
So the rebuild times may actually have been (much) shorter had I used this setting, which means they may not be a good representation of what the drives can do. Having said that; I still think the last replacement being considerably faster is an indication, and all other (granted, not very scientific) tests still hold: they were (of course) performed before and after replacing the drives, not during the rebuilding.

Update 2020-05-27:

It took some time but it appears the final two disks have shipped. My ticket was closed without an update or a reason, and a follow-up ticket referring to the first one, asking why the initial ticket was closed without an update or the missing two disks, went unanswered for over a week - even after I tried to bring it to WD's attention again after a week. But, okay, the final two disks are in the mail and expected to be delivered tomorrow. As for my NAS and the replaced disks: I can most definitely say the difference is very noticeable - night and day actually. I'm not sure if my situation is unique and if results are this drastic for everyone - "your mileage may vary" - but in my case specifically it was worth the hassle. The newer drives are a tad bit louder in their operation but they sure do sound a lot less "busy" - there's the usual "heads moving around the disks" sounds but it is most definitely a lot less. But even disregarding the heads moving - the newer disks being 7200rpm vs. the old 5400rpm explains the Pros being (a tiny bit) louder. But the performance improvements more than make up for that. If only WD would've been upfront about this; that would've saved us all a long-ass blogpost 😜

Update 2020-05-28:

The last two drives have arrived 🥳 All that's left for me to do is ship the original 7 WD60EFAX drives back to WD.

Nicely packed...
...and protected
Buh bye! 👋

Update 2020-06-14:

A big thanks to ServeTheHome for covering this blog and the shoutout!

2 comments:

  1. I am very interested in knowing the outcome of this as I am debating the whole calling them thing myself.

  2. Well, you got a much better response than I did. I called today as I have a drive that is less than 30 days old. They said the only thing they can do is replace it with the same model. I asked them what good that would do and they said my only recourse is to return it to where I bought it. Newegg wants to charge shipping and a restocking fee, making this a very undesirable solution.
