It seems you really do have to scrutinize published benchmark data carefully, and XIV's method of testing with nothing but zeroes is a particularly creative example.
XIV random IOPS, 20ms response times, a maximum of 22,000-27,000 cache-miss IOPS...
I almost didn’t believe it. And I still wouldn’t, had it not been corroborated by several sources.
I’ve been told that there are actually people trying to sell XIV to unsuspecting prospects using good old Hitachi Math.
That’s right. Hitachi Math. That “modernistic form of algebra that arrives at irreproducible results that also have the unique property of having absolutely no bearing on reality” that I’ve talked about here on numerous occasions. That same whacky logic that Hitachi has been using for years to mislead us all about how many meel-yun IOPS a USP can do by counting reads serviced exclusively from the buffers on the front-end Fibre Channel ports – a totally meaningless statistic.
Apparently, there are at least some who sell XIV arrays that are willing to stoop to these same lows in their quest to unseat the competition and gain footprint.
I guess given the growing market comprehension of the inarguable space and power inefficiencies of XIV’s “revolutionary” approach, coupled with the forced admissions that simultaneous dual drive failures in two separate XIV drive bays are indeed fatal and the growing realization that just because Moshe was there for the dawn of the Symmetrix era doesn’t make him all-powerful (nor the parent of today’s DMX)…well, I guess this all has proven just too much to overcome with IBM’s vaunted “trusted partner” approach to sales.
Nope, you won’t get no vendor bashing from those guys, just plain unadulterated crap-ola. When the facts get in the way, all you can do is lead with what you do best, I guess.
But I never would have guessed that anyone would attempt Hitachi Math using Roman numerals.
Apparently it has been done.
those iops aren’t real iops
Several sources have identified the sleight of hand that has been pulled by more than one XIV sales representative in various accounts. The trick centers around the seemingly simple demonstrations of the performance of the XIV array in random I/O workloads.
Given the architecture of XIV, where every host volume is actually spread across 180 1TB SATA drives in “chunks” of 1MB, sequential I/O performance can be expected to be pretty good in an XIV - as it should be with any such wide-striped configuration.
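Conceptually, the layout works something like this toy sketch (a simple round-robin mapping of my own for illustration only, not XIV's actual distribution scheme):

```python
# Toy illustration of wide striping: a logical volume carved into 1MB
# "chunks" spread across many drives. This round-robin mapping is my
# own simplification, NOT XIV's real distribution algorithm; it just
# shows why sequential I/O fans out across all the spindles.

CHUNK_SIZE = 1 * 1024 * 1024   # 1MB chunks, per the article
NUM_DRIVES = 180               # 180 x 1TB SATA drives, per the article

def chunk_location(logical_offset: int) -> tuple[int, int]:
    """Map a logical byte offset to (drive index, chunk index on that drive)."""
    chunk = logical_offset // CHUNK_SIZE
    return chunk % NUM_DRIVES, chunk // NUM_DRIVES

# A 16MB sequential read touches 16 different drives in this toy model:
touched = {chunk_location(off)[0] for off in range(0, 16 * CHUNK_SIZE, CHUNK_SIZE)}
print(sorted(touched))  # -> drives 0 through 15
```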
But logic says that running a reasonably sized random I/O workload against an XIV should quickly exceed the ability of cache to mask the slowness of the 7200 rpm SATA drives it uses. Sooner or later, random workloads will overpower the cache and start forcing read misses and write destages to meet the I/O demands.
However, many customers report that XIV pre-sales have demonstrated IOPS and response times for random workloads that exceed all logic. In fact, to anyone who understands storage performance, the results have seemed outright too good to be true.
Results like these usually set off alarms in the minds of skeptics and cynics. People started digging into these results, and were shocked at what they found.
They’d been scammed.
Here’s how: Apparently, the standard XIV performance demonstration uses the popular iometer workload generation and measurement tool (why they don’t use an SPC workload is beyond me, but that’s a story for another day). Only here’s the twist: the version and configuration of iometer used for the XIV demo has been carefully tuned to write (and read) data blocks that are entirely zero-filled – blocks of null data.
No big deal, right? I mean, it takes the same amount of time to write a block of all zeroes as it does to write a block with any other combination of 1’s and 0’s, right?
Wrong!
At least, not on an XIV storage system.
One of the (apparently overlooked) features of the XIV storage system is that it defines the default state of all untouched/unwritten/unmodified data to be all-zeroes. And it checks incoming writes to see if they contain any 1’s, and if they DON’T, then the XIV storage system DOESN’T WRITE THE DATA, neither to disk nor to cache. And it doesn’t have to, in fact, because the data is already zero!
Similarly, reads of any unmodified data blocks don’t require actually reading data off the disk – a quick check of the LUN-to-block mapping tables, and if the block is either unallocated or not yet modified, a buffer of zeroes can be returned without that annoying wait for the disk drive to actually retrieve the data.
Get the picture?
Run iometer set to write only zeros against an XIV array, and you’ll get incredibly high IOPS with amazingly low response times – because the array never has to actually move data to or from the disk drives!
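In code terms, the behavior being described would look something like this minimal sketch (Python, purely illustrative – XIV's actual front-end logic is proprietary, and the block size and data structures here are my own stand-ins):

```python
# Illustrative sketch of the zero-elision behavior described above.
# This is NOT XIV code -- just the logic: all-zero writes never touch
# cache or disk, and reads of untouched blocks are synthesized from a
# mapping-table lookup instead of a disk access.

BLOCK_SIZE = 4096                      # stand-in block size (my assumption)
written_blocks: dict[int, bytes] = {}  # stands in for the LUN-to-block mapping tables

def handle_write(block_number: int, data: bytes) -> None:
    if not any(data):                           # no 1 bits anywhere in the block
        written_blocks.pop(block_number, None)  # default state is already all-zero,
        return                                  # so nothing goes to cache or disk
    written_blocks[block_number] = data         # only non-zero data costs real I/O

def handle_read(block_number: int) -> bytes:
    if block_number not in written_blocks:      # unallocated or never modified:
        return bytes(BLOCK_SIZE)                # return zeroes, no disk wait
    return written_blocks[block_number]         # otherwise, a real (slow) disk read
```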
UPDATED 21 Jan, 2009: An acquaintance emailed me about a similar ploy he witnessed performed by XIV representatives. Instead of iometer, he was shown how fast an XIV array could write data, using the UNIX dd command to copy from /dev/zero to the XIV array.
Same trick, different tool.
In effect, you’re simply measuring how fast the front-end nodes can figure out that a block is all zeroes.
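If you suspect you're watching this demo, the honest counterpart is easy to stage yourself. Here's a rough sketch of both passes (the device path is hypothetical, this will overwrite whatever is on the target, and buffered Python I/O is no iometer – but the contrast is the point):

```python
# Rough sketch of the dd-style demo and its honest counterpart.
# TARGET is hypothetical -- point it at the LUN under test (careful:
# this overwrites it). Against an array that elides all-zero writes,
# the first pass looks miraculous and the second shows real disks.

import os
import time

TARGET = "/dev/sdX"        # hypothetical device path for the LUN under test
BLOCK_SIZE = 1024 * 1024   # 1MB blocks, like `dd bs=1M`
BLOCK_COUNT = 1024         # 1GB total

def timed_write_pass(block: bytes) -> float:
    """Write BLOCK_COUNT copies of block to TARGET; return elapsed seconds."""
    start = time.monotonic()
    with open(TARGET, "wb", buffering=0) as dev:
        for _ in range(BLOCK_COUNT):
            dev.write(block)
        os.fsync(dev.fileno())
    return time.monotonic() - start

zero_secs = timed_write_pass(bytes(BLOCK_SIZE))       # the /dev/zero version
real_secs = timed_write_pass(os.urandom(BLOCK_SIZE))  # non-zero data
print(f"all-zero: {zero_secs:.1f}s   non-zero: {real_secs:.1f}s")
```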
Hitachi Math, XIV style.
life in the real world
Now, it’s a cute trick, you gotta admit – especially if you didn’t fall prey to the trickery.
Needless to say, I seriously doubt that there are many practical applications that routinely create mounds of zero-filled I/O blocks. And the people who explained this misdirection to me noted that when the same tests were run with iometer configured to write (and read) non-zero data, random IOPS fell back in line with what anyone experienced in storage performance would expect.
Actually, much slower. It seems that for all the hype, an XIV storage array is only able to deliver somewhere between 22,000 and 27,000 cache miss IOPS maximum (dependent upon block size and referential locality). For perspective, that’s a fraction of what a CLARiiON or a Symmetrix DMX4 can deliver from 180 drives, whether they’re SATA or FC-based (XIV only supports the slower SATA drives), and whether they’re spinning hard drives or Enterprise Flash Drives.
See, there is no free lunch when it comes to storage performance. Sure, wide striping can deliver a lot of IOPS in aggregate. But in the end, when you need a specific block of data off of a SATA drive, and that block isn’t in cache, you’re going to have to wait for the disk to get the data (unless, of course, it’s all zeroes). A 7200 rpm SATA drive simply cannot deliver data as fast as a 10K or 15K rpm disk drive can – not to mention the sub-millisecond response times you can get from high-performance Enterprise Flash Drives.
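The arithmetic behind that is straightforward. A rough back-of-envelope (the seek and latency figures are typical 2009-era 7200 rpm SATA assumptions of mine, not XIV's published specs):

```python
# Back-of-envelope: why ~180 SATA spindles top out in the low tens of
# thousands of cache-miss IOPS. Seek/latency figures are typical
# 2009-era 7200 rpm SATA assumptions, not measured XIV values.

avg_seek_ms = 8.5                         # typical 7200 rpm SATA average seek
avg_rotation_ms = 0.5 * 60_000 / 7200     # half a revolution = ~4.2ms
service_time_ms = avg_seek_ms + avg_rotation_ms  # ~12.7ms per random miss

iops_per_drive = 1000 / service_time_ms          # ~79 IOPS per drive
print(f"~{iops_per_drive:.0f} IOPS/drive, ~{180 * iops_per_drive:,.0f} raw for 180 drives")
# -> roughly 14,000 IOPS raw; command queuing and favorable locality can
#    push per-drive rates higher, which is about how you land in the
#    22,000-27,000 range the sources report.
```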
So you’re going to wait.
Oh, and speaking of wide striping. Another reported application of Hitachi Math with Roman Numerals is to compare test results of 8 LUNs on an XIV vs. 8 LUNs on a Symm or a CLARiiON. But the XIV team will insist on comparing XIV’s “thin and wide” implementation against the standard “fat” allocation scheme of its competitors. For the Symm, this effectively constrains the test to using 64 drives on the Symm vs. 180 on the XIV – hardly a fair comparison. A much more accurate comparison would be to put 180 drives in both arrays, and to use Virtual Provisioning on the Symmetrix against a pool formed of all 180 drives.
you gotta wonder why?
When I started posting about the technical inefficiencies and data risks of the XIV approach to storage, I was accused by many of being scared of XIV. And even now, when other bloggers have begun to raise awareness about these same XIV issues (most recently HDS’s Claus Mikkelsen and Dimitris over at Recovery Monkey), there are those who claim it’s all just FUD and competition bashing.
Hardly.
No, we’re all just shining the lights on the issues so that people can make informed decisions – pointing out things that XIV sales folks don’t find important to share with prospects for some reason.
Things I encourage you to ask your IBM sales team to come clean on. Don’t take my word for these things – make IBM come clean with answers to the operational and technical flaws of the XIV storage systems that I discussed last year (here and there), starting even before IBM officially launched the product (did they ever?).
But first, put aside my motivations and those of Claus and Dimitris for the moment…
Ask yourself why someone selling XIV storage systems would resort to such a deplorable application of Hitachi Math, misleading prospects about the performance of their product with a hacked-up version of iometer generating an outlandishly fake I/O workload.
If the XIV array was as fast as it has been claimed, why would anyone ever have to resort to such tactics?
And if they’re offering you the XIV storage system for free, be sure you understand their motivation. Attractive as it may seem in today’s economy, we all know that there is no such thing as “free storage” in this world. No matter what promises they make, you know for sure that you’re going to have to pay for that XIV version of “free” sooner or later.
With a lot of luck, your price won’t include the unrecoverable loss of your data.
And unless your data is made up of nothing but 0’s,
with an XIV storage system, sooner or later you ARE going to pay…dearly!
Reposted from: http://thestorageanarchist.typepad.com/weblog/2009/01/1037-xiv-does-hitachi-math-with-roman-numbers.