Wednesday, April 8, 2009

IBM XIV Could Be Hazardous to Your Career

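The XIV's dual drive failure exposure is a real weakness. Disks will only get bigger, rebuild times will grow with them, and the odds of two drives failing are a bit too high for comfort. If the 180 drives could be protected with something like RAID-6, it would probably be a lot safer...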

So, I haven't blogged in a while. I guess I should make all of the usual excuses about being busy (which is true), etc. But the fact of the matter is that I really haven't had a whole heck of a lot that I thought would be of interest. Certainly there wasn't a lot that interested me!

But now, I have something that really gets my juices flowing. The new IBM XIV. I don't know if you've heard about this wonderful new storage platform from the folks at IBM, but I'm starting to bump into a lot of folks that are either looking seriously at one, or have one, or more, on the floor now. It's got some great pluses:

  • It's dirt cheap. On top of that, I heard that IBM is willing to do whatever it takes on price to get you to buy one of these boxes, to the point that they are practically giving them away. And, as someone I know and love once said "what part of free, isn't free"?
  • Fibre Channel performance from a SATA box. I guess that's one of the ways they keep the price so low.
  • Tier 1 performance and reliability at a significantly lower price point.

So, that's the deal, but like with everything in this world, there's no free lunch. Yes, that's right, I hate to break it to you folks, but you really can't get something for nothing. The question to ask yourself is, is the XIV really too good to be true? The answer is yes, it is.

But the title of this blog is pretty harsh, don't you think? Well, I think that once you understand that the real price you are paying for the "almost free" XIV could be your career, or at least your job, then you might start to understand where I'm coming from. How can that be? Well, I think that in most shops, if you are the person who brought in a storage array which eventually causes a multi-day outage in your most critical systems, your job is going to be in jeopardy. And that's what could happen to you if you buy into all of the above from IBM regarding the XIV.

What are you talking about, Joerg?!? IBM says that the XIV is "self healing", and that it can rebuild the lost data on a failed drive in 30 minutes or less. So how can what you said be true? Well folks, here's the dirty little secret that IBM doesn't want you to know about the XIV. Due to its architecture, if you ever lose two drives anywhere in the box (not in a shelf, not in a RAID group, in the whole box, all 180 drives) within 30 minutes of each other, you lose all of the data on the entire array. Yup, that's right, all your Tier 1 applications are now down, and you will be reloading them from tape. This is a process that could take you quite some time; I'm betting days, if not weeks, to complete. That's right, SAP down for a week, Exchange down for 3 days, etc. Again, do you think that if you brought that box in, your career at this company wouldn't be limited after something like that?

So, IBM will tell you that the likelihood of that happening is very small, almost infinitesimal. And they are right, but it's not zero, so you are the one taking on that risk. Here's another thing to keep in mind. Studies done at large data centers have shown that disk drives don't fail in a completely random way. They actually fail in clusters, so the chances of a second drive failing within the 30 minute window after that first drive failed are actually a lot higher than IBM would like you to believe. But, hey, let's keep in mind that we play the risk game all the time with RAID protected arrays, right? The big difference here is that the scope of the data loss is so much greater. If I have a double drive failure in a 4+1 RAID-5 group, I'm going to lose some LUNs, and I'm going to have to reload that data from tape. However, it's not the entire array! So I've had a much smaller impact across my Tier 1 applications, and the recovery from that should be much quicker. With the XIV, all my Tier 1 applications are down, and they all have to be reloaded from tape.
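To put some rough numbers on that risk game, here is a minimal back-of-the-envelope sketch in Python. Everything in it beyond the 180 drives, the 30 minute window, and the 4+1 group size is my own assumption: the 3% annualized failure rate is a placeholder, not an IBM figure, and the model treats drive failures as independent with a constant rate. Since real drives tend to fail in clusters, treat the results as a floor rather than an estimate. The function names are made up for illustration.

```python
import math

HOURS_PER_YEAR = 24 * 365


def hourly_rate(afr):
    """Convert an annualized failure rate (0..1) into a constant per-hour failure rate."""
    return -math.log(1.0 - afr) / HOURS_PER_YEAR


def p_fatal_second_failure(peers, afr, window_hours):
    """Chance that at least one of `peers` drives fails before the rebuild window closes."""
    return 1.0 - math.exp(-hourly_rate(afr) * peers * window_hours)


def expected_loss_events_per_year(drives, peers, afr, window_hours):
    """Expected first failures per year times the chance that each one turns fatal."""
    first_failures = drives * hourly_rate(afr) * HOURS_PER_YEAR
    return first_failures * p_fatal_second_failure(peers, afr, window_hours)


if __name__ == "__main__":
    AFR = 0.03     # ASSUMED annualized failure rate per drive, for illustration only
    WINDOW = 0.5   # hours: the 30 minute rebuild window discussed above
    DRIVES = 180

    # XIV as described above: any of the other 179 drives is a fatal second failure,
    # and the whole array is lost.
    xiv = expected_loss_events_per_year(DRIVES, DRIVES - 1, AFR, WINDOW)

    # 4+1 RAID-5: only the 4 peers in the same group are fatal, and only that
    # group's data (5 of 180 drives) is lost. Using the same window is an
    # ASSUMPTION made to isolate the effect of scope.
    raid5 = expected_loss_events_per_year(DRIVES, 4, AFR, WINDOW)

    print(f"XIV:    {xiv:.2e} loss events/year, scope 100% of the array")
    print(f"RAID-5: {raid5:.2e} loss events/year, scope {5 / DRIVES:.1%} of the array")
```

Traditional RAID-5 rebuilds generally take longer than 30 minutes, so raise the window for that case to taste. The point of the sketch is not the absolute numbers but the two knobs: how many drives can deliver the fatal second failure, and how much data goes away when they do.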

Just so you don't think that I'm entirely negative about the XIV, let me say that what I really object to here is the use of an XIV with Tier 1 applications or even Tier 2 applications. If you want to use one for Tier 3 applications (e.g. archive data), I think that makes a lot of sense. Having your archive down for a week or two won't have much in the way of a negative impact on your business, unlike having your Tier 1 or Tier 2 applications down. The one exception to that I can think of is VTL. I would never use an XIV as the disks behind a VTL. Can you imagine what would happen if you lost all of the data in your VTL? Let's hope that you have second copies of the data!

Finally, one of the responses from IBM to all of this is "just replicate the XIV if you're that worried". They're right, but that doubles the cost of storage, right?

Reposted from: http://joergsstorageblog.blogspot.com/2009/01/ibm-xiv-could-be-hazardous-to-your.html
