Wednesday, May 13, 2009

Solid State Drive's MTBF

Using the architectural definitions and modeling tools for the STEC ZeusIOPS wear-leveling algorithms and assuming that the SLC NAND flash will tolerate exactly 100,000 Program/Erase (P/E) cycles, the math says that the latest version of the 256GB (raw) STEC ZeusIOPS drive will wear out below its rated usable capacity when exposed to a 100% 8KB write workload with 0% internal cache hit at a constant arrival rate of 5000 IOs per second: in 4.92 years when configured at 200GB, and in 8.91 years when configured at 146GB (yeah, I was off by .08 years).

Unfortunately, I cannot share the actual data or spreadsheet used to compute these numbers because they contain STEC proprietary information about their architecture and wear-leveling algorithms. So you'll have to trust me on this, and trust that IBM and EMC are in fact using the same STEC drives with the identical wear-leveling algorithms, just formatted at different capacities.
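Since the spreadsheet itself can't be shared, here is a deliberately simplified, back-of-the-envelope sketch of the same kind of arithmetic. The write-amplification values are made-up placeholders standing in for the proprietary wear-leveling model (they were chosen only so the results land near the figures quoted above); the other inputs are simply the workload assumptions already stated.

```python
# Simplified SSD wear-out estimate -- NOT the proprietary STEC/EMC model.
# write_amplification is an illustrative placeholder; the real factor depends on
# the wear-leveling algorithm and the over-provisioning (200GB vs. 146GB usable).

SECONDS_PER_YEAR = 365.25 * 24 * 3600

def years_to_wear_out(raw_capacity_gb, pe_cycles, io_size_kb, iops,
                      write_fraction, write_amplification):
    """Years until the NAND exhausts its rated P/E cycles under a constant workload."""
    nand_endurance_bytes = raw_capacity_gb * 1e9 * pe_cycles
    host_bytes_per_sec = iops * write_fraction * io_size_kb * 1024
    nand_bytes_per_sec = host_bytes_per_sec * write_amplification
    return nand_endurance_bytes / nand_bytes_per_sec / SECONDS_PER_YEAR

# 100% 8KB writes at 5000 IOPS against 256GB of raw NAND rated at 100,000 P/E cycles
print(years_to_wear_out(256, 100_000, 8, 5000, 1.0, 4.0))  # ~5.0 years (200GB-style WA)
print(years_to_wear_out(256, 100_000, 8, 5000, 1.0, 2.2))  # ~9.0 years (146GB-style WA)
```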

At a mix of 50/50 Read/write, the projected life of the drive is 9.84 years @ 200GB, and 17.8 years @ 146GB. And for what TonyP asserts is the "traditional business workload" (70% read / 30% write) the projected life expectancy is a healthy 16 years @200GB and 30 years @146GB.
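The scaling here is straightforward: reads don't consume P/E cycles, so under this simple view the projected life is just the 100%-write figure divided by the write fraction, which reproduces the numbers above.

```python
# Projected life vs. write fraction, anchored to the 100%-write figures quoted above
for usable_gb, life_at_100pct_write in ((200, 4.92), (146, 8.91)):
    for write_fraction in (1.0, 0.5, 0.3):
        print(usable_gb, write_fraction,
              round(life_at_100pct_write / write_fraction, 1))
# 200GB: 4.9, 9.8, 16.4 years; 146GB: 8.9, 17.8, 29.7 years
```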

Now, that's long enough for the drives to be downright ancient - more likely they will have been replaced with newer/faster technology long before the drive is even halfway through its P/E life expectancy under those conditions.

So in the Real World that we all actually live in, nothing is ever 100% write – even database logs (which are not recommended for Flash drives) will not typically generate a 100% constant write workload at max drive IOPS. And the current generation of SLC NAND has been observed to easily exceed 100,000 P/E cycles, so even the above numbers are extremely conservative.

No, the truth is, the difference between the projected life at 146GB and 200GB on a 256GB (raw) ZeusIOPS is truly insignificant...and your data is no more at risk for the expected life of the drive either way.

Unless, of course, your array can't adequately buffer writes or frequently writes blocks smaller than 8KB, which will drive up the write amplification factor...two issues I suspect the DS8K in fact suffers from. Which, of course, would explain why IBM's Distinguished Engineers wouldn't want to take the risk with the DS8K. They don't get to be DEs by leaving things to chance, to be sure.
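To make the small-write point concrete, here is a toy illustration. It assumes, purely for illustration, an 8KB internal page and no coalescing of adjacent small writes in cache; real drives and arrays do better than this, which is exactly the point about deep write buffering below.

```python
# If the host writes less than the drive's internal page, the drive still has to
# program (and eventually erase) a full page, inflating write amplification.
def min_write_amplification(page_kb, host_write_kb):
    # Simplification: assumes no coalescing of adjacent small writes in cache
    return max(1.0, page_kb / host_write_kb)

print(min_write_amplification(8, 8))   # 1.0 -- aligned full-page writes
print(min_write_amplification(8, 4))   # 2.0 -- 4KB writes burn twice the NAND
print(min_write_amplification(8, 2))   # 4.0 -- and it only gets worse from there
```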

Symmetrix, on the other hand, isn't subject to these risk factors. Writes are more deeply buffered and delayed by the larger write cache of Symmetrix (DS8K is limited to 4GB or 8GB of non-volatile write cache vs. 80% of 256GB on DMX4 and 80% of 512GB on V-Max). Symmetrix writes are always aligned to the ZeusIOPS' logical page size to minimize write amplification, and the P/E cycles experienced by the NAND in the drive are proactively monitored to enable pre-emptive replacement should a drive exhibit premature or runaway wear-out.

Not so the DS8K, apparently…hence the conservative approach.

ZZ from: http://thestorageanarchist.typepad.com/weblog/2009/05/2002-meh-ibm-really-really-doesnt-get-flash.html

Tuesday, April 14, 2009

How to make good use of technology


The following is an interesting view of orphaned storage, or stranded capacity; how much of it you have will vary depending on how you look at your own data/storage stack:

Storage Capacity Definition

  • Changing RAID levels will impact the delta between raw and usable.
  • Array virtualization will reduce the waste between usable and allocated.
  • Thin provisioning will save the space difference between allocated and used.
  • Deduplication will shrink the gap between utilized capacity and application content.
  • Archiving will reduce both allocated and utilized volumes.
  • SRM will help in managing and monitoring all the areas above.
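A quick, purely hypothetical waterfall shows how much capacity can get stranded between those layers; every percentage below is an assumption for illustration only, not a measurement.

```python
# Hypothetical capacity waterfall from raw spindles down to application content
raw_tb = 100.0
usable_tb    = raw_tb * 0.75        # after RAID parity/mirroring and hot spares
allocated_tb = usable_tb * 0.80     # carved into LUNs and presented to hosts
used_tb      = allocated_tb * 0.60  # actually written by file systems and databases
content_tb   = used_tb * 0.70       # unique application content after dedupe/archive

for label, tb in [("raw", raw_tb), ("usable", usable_tb), ("allocated", allocated_tb),
                  ("used", used_tb), ("application content", content_tb)]:
    print(f"{label:20s} {tb:6.1f} TB")
# Only ~25 TB of unique content ends up occupying 100 TB of raw capacity
```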

The right architecture is needed to right-size the total storage estate. Look at short-term and long-term options to improve ROA and overall utilization.

But the ultimate goal is to reduce TCO, so we should strike a balance among aspects such as performance, space efficiency, purchase budget, and maintenance cost.




Wednesday, April 8, 2009

Enterprise Flash Drive Cost and Technology Projections

In January a year ago, EMC surprised the IT world with the introduction of Flash drives. In the Wikibon Peer Insight this Tuesday (2/24/2009) we heard that EMC had introduced 200GB and 400GB flash drives, and reduced the price of flash drives relative to disk drives. Other leading vendors such as HDS, IBM, HP and Sun have all introduced flash drives, and most if not all storage vendors have plans to introduce them in 2009.

Flash drives have two major benefits for reducing storage and IT energy budgets.

  1. The ability to perform hundreds of times more I/O than traditional disks and replace large numbers of disks that are I/O constrained. This allows the remaining data that is I/O light to be spread across fewer high-capacity, lower-speed SATA hard drives. The impact is fewer actuators, fewer drives and more efficient storage controllers, leading to lower storage and energy costs (a quick sketch of this arithmetic follows the list).
  2. The ability to increase system throughput by reducing I/O response times. Flash can have a profound effect for workloads which are elapsed time sensitive. One EMC customer was able to avoid purchasing 1,000 system Z mainframe MIPS and software by reducing batch I/O times with flash drives. Others have placed critical database tables on flash volumes and significantly improved throughput. By eliminating a large proportion of “Wait for I/O” time, Wikibon estimates that between 2% and 7% of processor power and energy consumption can be saved.
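As a rough illustration of the first benefit, consider the IOPS arithmetic with some assumed per-device figures. Both numbers below are typical published ballparks, not measurements from the Wikibon study.

```python
# Assumed small-block random I/O capability per device (illustrative ballparks)
flash_drive_iops = 30_000     # an enterprise flash drive
fc_15k_iops      = 180        # a 15K rpm FC drive

# An I/O-bound workload needing 18,000 IOPS but only 2 TB of capacity:
print(18_000 / fc_15k_iops)       # ~100 FC spindles if sized purely for IOPS
print(18_000 / flash_drive_iops)  # <1 flash drive's worth of IOPS; capacity decides
# ...while the I/O-light data moves to a few high-capacity SATA drives sized for space.
```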

In the Peer Insight, Daryl Molitor of JC Penney articulated a clear storage strategy of replacing FC disks with flash, and meeting the rest of the storage requirements with high density SATA disks. Daryl’s objective was to reduce storage costs and energy requirements. This bold strategy raises two questions:

In what time scale will Flash drives replace FC drives?

In June 2008 I wrote a Wikibon article, "Will NAND storage obsolete FC drives?" An updated version of the projection chart from the original article is shown in Chart 1 below.

[Chart 1: Updated projection of NAND flash vs. FC drive prices]

It shows that the price of NAND storage is actually coming down at about 60% per year. At this rate of comparative reduction, FC drives will be obsolete in less than three years' time. There is already significant opportunity to move some data to flash drives, and by starting now Daryl is placing himself in a good strategic position.
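The "less than three years" figure follows from compounding the price declines. A toy version of that projection is below; the starting $/GB values and the FC decline rate are assumptions chosen only for illustration, and only the ~60%/year NAND decline comes from the chart.

```python
# Hypothetical $/GB starting points; only the NAND decline rate is from the article
nand_per_gb, fc_per_gb = 10.0, 2.0
nand_decline, fc_decline = 0.60, 0.25   # assumed annual price declines

years = 0
while nand_per_gb > fc_per_gb:
    nand_per_gb *= 1 - nand_decline
    fc_per_gb *= 1 - fc_decline
    years += 1
print(years)  # crosses over in year 3 under these assumptions
```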

What architectural, infrastructure and/or ecosystem changes must be available to implement this strategy? Some vendors and analysts have predicted that Flash technology will profoundly change the way that systems are designed, leading to flash being implemented in multiple places in the systems architecture. However, such fundamental architectural changes will also require significant changes in the operating systems, database software and even application software to exploit it. Gaining industry agreement to such changes will not happen within three years. Disk drives are currently the standard technology for non-volatile secure access to data and will remain the standard for at least the next three to five years. EMC was right to introduce flash technology as a disk drive as the simplest way to introduce the technology within the current software ecosystem.

That is not to say that technology changes are not required. Vendors and analysts have pointed out that the architectures of current array systems are not designed to cope with flash storage devices that operate at such low latencies. This leads to limited numbers of flash drives being supported within an array, and less than optimal performance from the flash drives. Vendors are moving to fix this, and this will happen within three years.

The most fundamental architectural change required is to ensure that the right data is placed on flash storage. To begin with, specific database tables and high-activity volumes are being moved to flash drives manually on an individual basis. The next stage will be to automate the dynamic movement of data to and from flash drives to optimize overall IO performance. A prerequisite is to be able to track I/O activity on blocks of data and hold the metadata. Virtualization architectures will have a head-start in providing the infrastructure to provide monitoring and automated dynamic movement of data blocks.
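The metadata-driven placement being described can be sketched very simply: count I/Os per extent over a window, then promote the hottest extents to flash. This is a toy outline only; real implementations track far richer metadata (recency, read/write mix, response time) and move data without disrupting hosts.

```python
from collections import Counter

io_counts = Counter()   # extent_id -> I/Os observed in the current measurement window

def record_io(extent_id: int) -> None:
    io_counts[extent_id] += 1

def plan_promotions(flash_extent_slots: int) -> list[int]:
    """Hottest extents that should live on the flash tier for the next window."""
    return [extent for extent, _ in io_counts.most_common(flash_extent_slots)]

def end_of_window(flash_extent_slots: int) -> list[int]:
    promotions = plan_promotions(flash_extent_slots)
    io_counts.clear()    # start a fresh window; real systems age the counts instead
    return promotions
```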

So which vendors will provide the flash technology that operates efficiently in a storage array and provides automated dynamic (second by second) data balancing? At the moment, none can. Clearly EMC have a head-start in understanding the technology, understanding customer usage and understanding the storage array requirements. The storage vendors offering virtualization are also well positioned; Compellent has probably the most versatile architecture with its unique ability to dynamically move data within a storage volume to different tiers, and IBM has broken the 1 million IOPS barrier for an SPC workload with flash storage connected to an IBM SAN Volume Controller (SVC). Other storage virtualization vendors such as 3PAR, NetApp, HP and Hitachi are also well positioned.


Action Item: The race is on. Storage executives should be exploring the use of flash drives for trouble spots in the short term on existing arrays in order to build up knowledge and confidence in the technology. For full scale implementation, storage executives should wait for solutions that provide storage arrays modified to accommodate low-latency flash drives and automated dynamic placement of data blocks to optimize the use of flash.


ZZ From: http://wikibon.org/?c=wiki&m=v&title=Enterprise_Flash_Drive_Cost_and_Technology_Projections

VMWare and how it affects Storage

1: On a VMware server, because many guest OSes share one or a few HBAs, the I/O becomes small-block random I/O; as a result, the IOPS requirement is higher than the bandwidth requirement.


"VMWare Changes Everything"


That's a lovely marketing phrase, but when it comes to storage, it does, and it doesn't. What you really need to understand is how VMWare can affect your storage environment as well as the effects that storage has on your VMWare environment. Once you do, you'll realize that it's really just a slightly different take on what storage administrators have always battled. First some background.

Some Server Virtualization Facts

  1. The trend of server virtualization is well under way and it's moving rapidly from test/dev environments into production environments. Some people are implementing in a very aggressive way. For example, I know one company whose basic philosophy is "it goes in a VM unless it absolutely can be proven it won't work, and even then we will try it there first."
  2. While a lot of people think that server consolidation is the primary motivating factor in the VMware trend, I have found that many companies are also driven by Disaster Recovery since replicating VMs is so much easier than building duplicate servers at a DR site.
  3. 85% of all virtual environments are connected to a SAN; that's down from nearly 100% a short time ago. Why? Because NFS is making a lot of headway, and that makes a lot of sense since it's easier to address some of the VMWare storage challenges with NFS than it is with traditional Fibre Channel LUNs.
  4. VMWare changes the way that servers talk to the storage. For example, it forces the use of more advanced file systems like VMFS. VMFS is basically a clustered file system, and that's needed in order to perform some of the more attractive/advanced things you want to do with VMWare like VMotion.

Storage Challenges in a VMWare Environment

  1. Application performance is dependent on storage performance. This isn't news for most storage administrators. However, what's different is that since VMWare can combine a number of different workloads all talking through the same HBA(s), the result is that the workload as seen by the storage array turns into a highly random, usually small block I/O workload. These kinds of workloads are typically much more sensitive to latency than they are demanding of bandwidth. Therefore the storage design in a VMWare environment needs to be able to provide for this type of workload across multiple servers. Again, something that storage administrators have done in the past for Exchange servers, for example, but on a much larger scale.
  2. End to end visibility from VM to physical disk is very difficult to obtain for storage admins with current SRM software tools. These tools were typically designed with the assumption that there was a one-to-one correspondence between a server and the application that ran on that server. Obviously this isn't the case with VMWare, so reporting for things like chargeback becomes a challenge. This also affects troubleshooting and change management, since the clear lines of demarcation between server administration and storage administration are now blurred by things like VMFS, VMotion, etc.
  3. Storage utilization can be significantly decreased. This is due to a couple of factors, the first of which is that VMWare requires more storage overhead to hold all of the memory, etc. so that it can perform things like VMotion. The second reason that VMWare uses more storage is that VMWare admins tend to want very large LUNs assigned to them to hold their VMFS file systems and to have a pool of storage that they can use to rapidly deploy a new VM. This means that there is a large pool of unused storage sitting around on the VMWare servers waiting to be allocated to a new VM. Finally, there is a ton of redundancy in the VMs. Think about how many copies of Windows are sitting around in all those VMs. This isn't new, but VMware sure shows it to be an issue.
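To put some numbers on the utilization point, here is the sort of arithmetic that makes the drop visible. Every figure below is an entirely hypothetical assumption, chosen only to illustrate the three factors just listed.

```python
# Entirely hypothetical numbers illustrating where VMFS capacity goes
vm_count    = 50
vmdk_gb     = 60      # assumed average virtual disk per VM
vm_mem_gb   = 4       # .vswp swap file roughly equal to configured memory
os_image_gb = 10      # assumed guest OS footprint inside each VM

committed_gb    = vm_count * (vmdk_gb + vm_mem_gb)   # 3200 GB of .vmdk + swap files
deploy_pool_gb  = 2000                               # headroom kept free for new VMs
duplicate_os_gb = (vm_count - 1) * os_image_gb       # 490 GB of near-identical OS copies

print(committed_gb, deploy_pool_gb, duplicate_os_gb)
```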

Some Solutions to these Challenges

As I see it there are three technical solutions to the challenges posed above.

  1. Advanced storage virtualization - Things like thin provisioning to help with the issue of empty storage pools on the VMWare servers. Block storage virtualization to provide the flexibility to move VMWare's underlying storage around to address issues of performance, storage array end of lease, etc. Data de-duplication to reduce the redundancy inherent in the environment.
  2. Cross domain management tools - Tools that have the ability to view storage all the way from the VM to the physical disk and to correlate issues between the VM, server, network, SAN, and storage array are beginning to come onto the market and will be a necessary part of any successful large VMWare rollout.
  3. Virtual HBAs - These are beginning to make their way onto the market and will help existing tools to work in a VMWare environment.

Conclusion

Organizations need to come to the realization that with added complexity come added management challenges and that cross domain teams that encompass VMWare Admins, Network Admins, and SAN/Storage Admins will be necessary in order for any large VMWare rollout to be successful. However, the promise of server virtualization to reduce hardware costs and make Disaster Recovery easier is just too attractive to ignore for many companies and the move to server virtualization over the last year shows that a lot of folks are being drawn in. Unfortunately, unless they understand some of the challenges I outlined above, they may be in for some tough times and learn these lessons the hard way.

--joerg

ZZ From: http://joergsstorageblog.blogspot.com/2008/06/vmware-and-how-it-effects-storage.html

IBM XIV Could Be Hazardous to Your Career

XIV's exposure to a dual drive failure is a real weakness. Drives will only get bigger and rebuild times longer, so the probability of two drives failing together is simply too high. If the 180 drives could be organized into something like RAID-6, it would be much safer...

So, I haven't blogged in a while. I guess I should make all of the usual excuses about being busy (which is true), etc. But the fact of the matter is that I really haven't had a whole heck of a lot that I thought would be of interest, certainly there wasn't a lot that interested me!

But now, I have something that really gets my juices flowing. The new IBM XIV. I don't know if you've heard about this wonderful new storage platform from the folks at IBM, but I'm starting to bump into a lot of folks that are either looking seriously at one, or have one, or more, on the floor now. It's got some great pluses:

  • It's dirt cheap. On top of that, I heard that IBM is willing to do whatever it takes on price to get you to buy one of these boxes, to the point that they are practically giving them away. And, as someone I know and love once said, "what part of free isn't free?"
  • Fiber channel performance from a SATA box. I guess that's one of the ways that they are using to keep the price so low.
  • Tier 1 performance and reliability at a significantly lower price point.

So, that's the deal, but like with everything in this world, there's no free lunch. Yes, that's right, I hate to break it to you folks, but you really can't get something for nothing. The question to ask yourself is, is the XIV really too good to be true? The answer is yes, it is.

But the title of this blog is pretty harsh, don't you think? Well, I think that once you understand that the real price you are paying for the "almost free" XIV could be your career, or at least your job, then you might start to understand where I'm coming from. How can that be? Well, I think that in most shops, if you are the person who brought in a storage array which eventually causes a multi-day outage in your most critical systems, then your job is going to be in jeopardy. And that's what could happen to you if you buy into all of the above from IBM regarding the XIV.

What are you talking about Joerg?!? IBM says that the XIV is "self healing", and that it can rebuild the lost data on a failed drive in 30 minutes or less. So how can what you said be true? Well folks, here's the dirty little secret that IBM doesn't want you to know about the XIV. Due to its architecture, if you ever lose two drives in the entire box (not a shelf, not a RAID group, the whole box all 180 drives) within 30 minutes of each other, you lose all of the data on the entire array. Yup, that's right, all your tier 1 applications are now down, and you will be reloading them from tape. This is a process that could take you quite some time, I'm betting days if not weeks to complete. That's right, SAP down for a week, Exchange down for 3 days, etc. Again, if you were the one who brought that box in, do you think that after something like that your career at this company wouldn't be limited?

So, IBM will tell you that the likelihood of that happening is very small, almost infinitesimal. And they are right, but it's not zero, so you are the one taking on that risk. Here's another thing to keep in mind. Studies done at large data centers have shown that disk drives don't fail in a completely random way. They actually fail in clusters, so the chances of a second drive failing within the 30 minute window after that first drive failed are actually a lot higher than IBM would like you to believe. But, hey, let's keep in mind that we play the risk game all the time with RAID protected arrays, right? But the big difference here is that the scope of the data loss is so much greater. If I have a failure in a 4+1 RAID-5 RAID group, I'm going to lose some LUNs, and I'm going to have to reload that data from tape. However, it's not the entire array! So I've had a much smaller impact across my Tier 1 applications, and the recovery from that should be much quicker. With the XIV, all my Tier 1 applications are down, and they have to all be reloaded from tape.
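For the skeptics, here is a toy version of the risk arithmetic. It assumes independent, exponentially distributed failures and a hypothetical per-drive MTBF; as noted above, real failures tend to cluster, so treat this as a floor rather than an estimate.

```python
import math

drives        = 180
mtbf_hours    = 1_000_000   # assumed per-drive MTBF; vendor datasheets vary
rebuild_hours = 0.5         # the claimed 30-minute rebuild window

fail_rate = 1 / mtbf_hours
# Probability that any of the other 179 drives fails during one rebuild window
p_second = 1 - math.exp(-(drives - 1) * fail_rate * rebuild_hours)
# Expected number of first-drive failures per year across the whole box
first_failures_per_year = drives * 8766 / mtbf_hours

print(p_second, first_failures_per_year, p_second * first_failures_per_year)
# ~9e-5 per rebuild, ~1.6 rebuilds/year, so ~1.4e-4 box-loss events per year --
# small, but not zero, and correlated failures push it higher.
```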

Just so you don't think that I'm entirely negative about the XIV, let me say that what I really object to here is the use of a XIV with Tier 1 applications or even Tier 2 applications. If you want to use one for Tier 3 applications (i.e. archive data) I think that makes a lot of sense. Having your archive down for a week or two won't have much in the way of a negative impact on your business, unlike having your Tier 1 or Tier 2 applications down. The one exception to that I can think of is VTL. I would never use a XIV as the disks behind a VTL. Can you imagine what would happen if you lost all of the data in your VTL? Let's hope that you have second copies of the data!

Finally, one of the responses from IBM to all of this is "just replicate the XIV if you're that worried". They're right, but that doubles the cost of storage, right?

ZZ From: http://joergsstorageblog.blogspot.com/2009/01/ibm-xiv-could-be-hazardous-to-your.html

xiv does hitachi math with roman numerals

It seems you really have to scrutinize published benchmark data, especially XIV's write-nothing-but-zeroes test methodology; that is truly creative.
XIV random IOPS at a 20ms response time: cache-miss IOPS top out at roughly 22,000 to 27,000...



I almost didn’t believe it.

And I still wouldn’t, if it wasn’t corroborated from several sources.

I’ve been told that there are actually people trying to sell XIV to unsuspecting prospects using good old Hitachi Math.

That’s right. Hitachi Math. That “modernistic form of algebra that arrives at irreproducible results that also have the unique property of having absolutely no bearing on reality” that I’ve talked about here on numerous occasions. That same whacky logic that Hitachi has been using for years to mislead us all about how many meel-yun IOPS a USP can do by counting reads serviced exclusively from the buffers on the front-end Fibre Channel ports – a totally meaningless statistic.

Apparently, there are at least some who sell XIV arrays that are willing to stoop to these same lows in their quest to unseat the competition and gain footprint.

I guess given the growing market comprehension of the inarguable space and power inefficiencies of XIV’s “revolutionary” approach, coupled with the forced admissions that simultaneous dual drive failures in two separate XIV drive bays are indeed fatal and the growing realization that just because Moshe was there for the dawn of the Symmetrix era doesn’t make him all-powerful (nor the parent of today’s DMX)…well, I guess this all has proven just too much to overcome with IBM’s vaunted “trusted partner” approach to sales.

Nope, you won’t get no vendor bashing from those guys, just plain unadulterated crap-ola. When the facts get in the way, all you can do is lead with what you do best, I guess.

But I never would have guessed that anyone would attempt Hitachi Math using roman numerals.

Apparently it has been done.

those iops aren’t real iops

Several sources have identified the sleight of hand that has been pulled by more than one XIV sales representative in various accounts. The trick centers around the seemingly simple demonstrations of the performance of the XIV array in random I/O workloads.

Given the architecture of XIV, where every host volume is actually spread across 180 1TB SATA drives in “chunks” of 1MB, sequential I/O performance can be expected to be pretty good in an XIV - as it should be with any such wide-striped configuration.

But logic says that running a reasonably sized random I/O workload against an XIV should quickly exceed the capability of cache to mask the slowness of the 7200 rpm SATA drives it uses. Sooner or later random workloads will overpower cache and start forcing read misses and write destages to meet the I/O demands.

However, many customers report that XIV pre-sales have demonstrated IOPS and response times for random workloads that exceed all logic. In fact, to anyone who understands storage performance, the results have seemed outright too good to be true.

Results like these usually set alarms off in the minds of skeptics and cynics. People started digging into these results, and were shocked at what they found.

They’d been scammed.

Here’s how: Apparently, the standard XIV performance demonstration uses the popular iometer workload generation and measurement tool (why they don’t use an SPC workload is beyond me, but that’s a story for another day). Only here’s the twist: the version and configuration of iometer used for the XIV demo has been carefully tuned to write (and read) data blocks that are entirely zero-filled – blocks of null data.

No big deal, right? I mean, it takes the same amount of time to write a block of all zeroes as it does to write a block with any other combinations of 1’s and 0’s, right?

Wrong!

At least, not on an XIV storage system.

One of the (apparently overlooked) features of the XIV storage system is that it defines the default state of all untouched/unwritten/unmodified data to be all-zeroes. And it checks incoming writes to see if they contain any 1’s, and if they DON’T, then the XIV storage system DOESN’T WRITE THE DATA, neither to disk nor to cache. And it doesn’t have to, in fact, because the data is already zero!

Similarly, reads of any unmodified data blocks don’t require actually reading data off the disk – a quick check of the LUN-to-block mapping tables, and if the block is either unallocated or not yet modified, a buffer of zeroes can be returned without that annoying wait for the disk drive to actually retrieve the data.

Get the picture?

Run iometer set to write only zeros against an XIV array, and you’ll get incredibly high IOPS with amazingly low response times – because the array never has to actually move data to or from the disk drives!
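A minimal sketch of the shortcut being described, with invented class and method names; this is an illustration of the behavior, not IBM's actual code path.

```python
class ZeroAwareVolume:
    """Toy model of a store that never touches media for all-zero blocks."""

    BLOCK_SIZE = 8192

    def __init__(self):
        self.blocks = {}   # lba -> non-zero data actually stored

    def write(self, lba: int, data: bytes) -> str:
        if not any(data):                 # all zeroes: record nothing at all
            self.blocks.pop(lba, None)
            return "ack (no disk, no cache)"
        self.blocks[lba] = data           # real data takes the normal, slower path
        return "ack (destaged to disk)"

    def read(self, lba: int) -> bytes:
        # Unwritten or zeroed regions are synthesized without any disk access
        return self.blocks.get(lba, bytes(self.BLOCK_SIZE))
```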

UPDATED 21 Jan, 2009: An acquaintance emailed me about a similar ploy he witnessed performed by XIV representatives. Instead of iometer, he was shown how fast an XIV array could write data, using the UNIX dd command to copy from /dev/zero to the XIV array.

Same trick, different tool.

In effect, you’re simply measuring how fast the front-end nodes can figure out that a block is all zeroes.

Hitachi Math, XIV style.

life in the real world

Now, it’s a cute trick, you gotta admit – especially if you didn’t fall prey to the trickery.

Needless to say, I seriously doubt that there are many practical applications that routinely create mounds of zero-filled I/O blocks. And the people who explained this misdirection to me noted that when they ran the same tests using iometer configured to write (and read) non-zero data, random IOPS fell back in line with what someone experienced in storage performance would expect.

Actually, much slower. It seems that for all the hype, an XIV storage array is only able to deliver somewhere between 22,000 and 27,000 cache miss IOPS maximum (dependent upon block size and referential locality). For perspective, that’s a fraction of what a CLARiiON or a Symmetrix DMX4 can deliver from 180 drives, whether they’re SATA or FC-based (XIV only supports the slower SATA drives), and whether they’re spinning hard drives or Enterprise Flash Drives.

See, there is no free lunch when it comes to storage performance. Sure, wide striping can deliver a lot of IOPS in aggregate. But in the end, when you need a specific block of data off of a SATA drive, and that block isn’t in cache, you’re going to have to wait for the disk to get the data (unless, of course, it’s all zeroes). A 7200 rpm SATA drive simply cannot deliver data as fast as a 10K or 15K rpm disk drive can – not to mention the sub-millisecond response times you can get from high-performance Enterprise Flash Drives.

So you’re going to wait.
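The rotational arithmetic alone makes the point; seek time and queueing come on top of this.

```python
# Average rotational latency is half a revolution
for rpm in (7200, 10_000, 15_000):
    avg_rotational_ms = (60_000 / rpm) / 2
    print(rpm, round(avg_rotational_ms, 2))   # 4.17 ms, 3.0 ms, 2.0 ms
```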

Oh, and speaking of wide striping. Another reported application of Hitachi Math with Roman Numerals is to compare test results of 8 LUNs on an XIV vs. 8 LUNs on a Symm or a CLARiiON. But the XIV team will insist on comparing XIV’s “thin and wide” implementation against the standard “fat” allocation scheme of its competitors. For the Symm, this effectively constrains the test to using 64 drives on the Symm vs. 180 on the XIV – hardly a fair comparison. A much more accurate comparison would be to put 180 drives in both arrays, and to use Virtual Provisioning on the Symmetrix against a pool formed of all 180 drives.

you gotta wonder why?

When I started posting about the technical inefficiencies and data risks of the XIV approach to storage, I was accused by many of being scared of XIV. And even now, when other bloggers have begun to raise awareness about these same XIV issues (most recently HDS’s Claus Mikkelsen and Dimitris over at Recovery Monkey) there are those who claim it’s all just FUD and competition bashing.

Hardly.

No, we’re all just shining the lights on the issues so that people can make informed decisions – pointing out things that XIV sales folks don’t find important to share with prospects for some reason.

Things I encourage you to ask your IBM sales team to come clean on. Don’t take my word for these things – make IBM come clean with answers to the operational and technical flaws of the XIV storage systems that I discussed last year (here and there) starting even before IBM officially launched the product (did they ever)?

But first, put aside my motivations and those of Claus and Dimitris for the moment…

Ask yourself why someone selling XIV storage systems would resort to such a deplorable application of Hitachi Math as to mislead prospects about the performance of their product using a hacked up version of iometer to generate an outlandishly fake IO workload?

If the XIV array was as fast as it has been claimed, why would anyone ever have to resort to such tactics?

And if they’re offering you the XIV storage system for free, be sure you understand their motivation. Attractive as it may seem in today’s economy, we all know that there is no such thing as “free storage” in this world. No matter what promises they make, you know for sure that you’re going to have to pay for that XIV version of “free” sooner or later.

With a lot of luck, your price won’t include the unrecoverable loss of your data.

And unless your data is made up of nothing but 0’s,
with an XIV storage system, sooner or later you ARE going to pay…dearly!

ZZ from: http://thestorageanarchist.typepad.com/weblog/2009/01/1037-xiv-does-hitachi-math-with-roman-numbers.html

Wide striping is a two edged sword

When an XIV salesperson tries to sweet-talk you with the claim that XIV can deliver the same IOPS as an FC array, just remember: that is achieved at the cost of storage utilization.

I have spent a lot of time lately talking with some of my coworkers, friends, etc. on the topic of wide striping. This topic keeps coming up since there are now a number of vendors selling storage arrays with SATA drives that claim to have "the same performance as fiber channel". Some of the Sales folks I work with keep asking how we are supposed to dissuade people from that idea, or if it's true. One of the prime offenders in this regard is IBM with their new XIV array. The XIV uses wide striping and SATA drives and they claim to have "enterprise performance" at a very low price point. But they aren't the only ones; you have Dell telling people the same thing about their EqualLogic line of storage as well, and there are others too. For an excellent article about the XIV and its performance claims, take a look at http://thestorageanarchist.typepad.com/weblog/2009/01/1037-xiv-does-hitachi-math-with-roman-numbers.html.

What I usually tell them is that the statement is true; you can get fiber channel performance by striping across a large number of SATA drives. The only problem is that you have to give up a lot of usable disk space in order to keep it that way. A quick example usually illustrates the point quite well. Let's say that for the sake of easy math the average application in your environment uses about 5TB of space (I'm sure some are a lot more, and some a lot less, but we are talking average here). Let's also say that you need about 2,000 IOPS per application in order to maintain the 20ms max response time you need in order to meet the SLAs you have with your customers. Finally, let's also assume that your SATA array has about 90TB of useable space using 180 750GB SATA drives and you can get about 20,000 IOPS in total from the array. So, let's do some basic math here. That means that you can run about 10 applications at 5 TB apiece which will take up about 50TB. So, your array will perform well, right up until you cross the ½ full barrier. After that, performance will slowly decline as you add more application/data to the array.
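The example above, worked through in code with the post's own numbers:

```python
array_usable_tb = 90.0    # 180 x 750GB SATA, after protection overhead
array_iops      = 20_000
app_iops        = 2_000
app_tb          = 5.0

apps_by_iops = array_iops // app_iops            # IOPS budget runs out at 10 apps
tb_consumed  = apps_by_iops * app_tb             # ...which occupy only 50 TB
utilization  = tb_consumed / array_usable_tb     # ~56% of usable space
effective_cost_multiplier = 1 / utilization      # $/usable-GB nearly doubles

print(apps_by_iops, tb_consumed, utilization, effective_cost_multiplier)
```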

So, what does this mean? It means that the cost per GB of these arrays is really about twice what the vendors would have you believe. OK, but considering how much cheaper SATA drives are than 15K fiber channel drives, that's still OK, right? Sure, as long as you are willing to run your XIV at ½ capacity. In today's economic climate, that's going to be tough to do. I can just imagine the conversation between your typical CIO and his Storage Manager.

Storage Manager – "I need to buy some more disk space."

CIO – "What are you talking about, you're only at 50% used in theses capacity reports you send me and we didn't budget for a storage expansion in the first year after purchase!"

Storage Manager – "Well, you know all that money we are saving by using SATA drives? Well, it means I can't fill up the array; I have to add space once I reach 50% or performance will suffer."

CIO – "So let performance suffer! We don't have budget for more disk this year. Why didn't you tell me this when you came to me with that 'great idea' of replacing our 'enterprise' arrays with a XIV?!?!"

Storage Manager – "Ahhh … ummmmm … gee, I didn't know, IBM didn't tell me! But we had some performance issues early on, and figured this out. Do you really want to tell the SAP folks that their response time is going to double over the next year?"

CIO – "WHAT! We can't let that happen, we have an SLA with the SAP folks and my bonus is tied to keeping our SLAs! How could you let something like this happen! Maybe I should use the money for your raise to pay for the disks!"

Storage Manager – "Um, well, actually, we need to buy an entire new XIV, the one we have is already full."

OK, enough fun, you get the idea … make sure you understand what wide striping really buys you and if you decide that the TCO and ROI make sense, make sure you communicate that up the management tree in the clearest possible terms. Look at the applications that you currently run, see how much space they require, but don't base the sizing of your EqualLogic (see, I'm not just bashing the XIV) just on your space requirements. Base them more on your IOPS requirements. With SATA drives chances are pretty good that if you size for IOPS, you'll have more than enough space.


ZZ from: http://joergsstorageblog.blogspot.com/2009/01/wide

The Real Cost of Storage

The price of storage needs to be measured along two dimensions:

1: The per-capacity purchase price at acquisition time.

2: The ongoing maintenance cost, including the cost of the people who maintain it, vendor maintenance fees, and so on...

What's the real cost of storage? I get asked this question all the time, and it's so difficult to answer because it really does depend on so many factors from storage team to storage team. What's really surprising to me is that I'm being asked the question at all. You would think that everyone who runs a storage organization would know exactly what that number is. Some people simply look at their budget and say "here you go, this is what it costs". But can you break it down? Do you know where all of that money is going, and why? I think that's really what people are asking. They know how much they are spending, but they want to know why and how they can save money. Certainly in these economic times storage managers are asked to do more with less while the data continues to grow. So that leaves them asking, how? How do I manage to address this growing pile of data with fewer people, less CAPEX budget, and more demands from the business around things like disaster recovery?

So how do you address the question? How do you do more with less? A lot of storage managers are looking at the cost per GB of their disks and asking, can I get this number down? I think that they can, but it may mean doing some things in different ways than they have in the past. Specifically, here are some things to look at.

Tiered Storage

Yup, I'm recycling that idea again. Getting data off expensive spinning disk and onto cheaper disk saves money, I think that's been well established and taking another look at how you are classifying your data is a worthwhile endeavor at this point in time. Why? Because things have changed in the last year or two, and those changes might have an impact on your data classification policies, so I think a review might be in order. For example, a few years ago when I was classifying data I used SATA disk pretty much just for dev/test and archive data. But things have changed, and now there's technology out there that will allow you to use SATA disk for some of your production workload. Some technologies, such as IBM's new XIV for that matter, will even allow you to use SATA drives for all but your most demanding workloads. So, another look at your tiering policies and the SATA technology that's available today is probably a good use of your time if you're looking to save some money.

Cost of Managing Your Storage

What does it really cost to manage your storage on a per GB basis? This is really the age old question of "how many TB of storage can a single storage admin administer?" that we have been asking for a long time. The answer to this question is critical since you probably aren't getting a whole lot more headcount right now, and you might even be asked to give some up. So how do you manage more disk space with the same or fewer people? First, you have to keep in mind all of the things that go into managing a TB of space. There's a lot more to it than just provisioning a TB to an application and then walking away, right? Here are a few examples of the kinds of things that go into managing a TB of space based on my experience:

  1. Provisioning – This one is obvious, right? But you would be surprised how many people have immature processes and procedures around disk provisioning. How many people still manage their disks based on spreadsheets and command-line scripts, making the process time consuming and error prone.
  2. Backup/recovery – So you have to make sure that your data is protected, and that you can get it back should the need arise. This can be a time consuming effort, and one place that you can look for efficiencies that will save you money. It's also a place that people sometimes forget to account for when they are buying more disks. Don't forget that as you add disk capacity, you also have to add backup/restore capacity, and that means more tape, or backup disk, etc., but it also means that you have to account for the increased load on your backup admins as well.
  3. Disaster recovery – All of the same things I talked about above with backup/recovery also apply to DR.
  4. Data migration – Sooner or later you're going to have to move this data around. Whether it's because the lease is up on an array, or you need to re-tier the data doesn't matter; what matters is that this can be a costly process in terms of people time, and sooner or later you're going to have to do it.
  5. Performance management – At some point you always get that call "hey, our database is slow and we've looked at everything else and haven't found the problem, can you look and see if it's the disks?" Unless you have some very mature performance management processes in place, this tends to turn into a huge people time sink.
  6. Capacity management – We all know that our data is growing, that's a given, so that means that we need to spend some time planning how we are going to address that growth. When are we going to have to make those new disk purchases, when will we have to buy a whole new array? What about the switches? Are we going to need to expand that environment when we bring in that new array as well?
  7. Documentation – Yes, that's right, I said it, documentation is an important part of managing your storage, and it can take up quite a bit of the storage admin's time, but it has to be done.

So the question I always ask is, "how mature and efficient are your processes?" Do you have a high degree of automation around all of the above? What use are you making of technology to help you manage the processes above? If you have very mature processes, employ a high degree of automation, and make good use of technology to help you automate as many of those processes as possible, then you probably have done everything you can to drive down the cost of managing your storage. But now is a good time to take a look and see if you can improve any of those areas. For example, does my disk vendor really provide tools to make managing my disk arrays easier? Not just from a provisioning standpoint, but from the standpoint of all of the above. If not, maybe it's time to consider looking at another vendor, one that has better tools.

Let me leave you with a final thought in this area based on my experience. What I found when I was managing storage was that the cost of managing a TB of disk could easily meet or exceed the cost of buying that disk over the 3-4 year life of that disk. So, a myopic focus on who has the cheapest disks on a per GB basis may not make much sense. Perhaps what we should focus on is how much it costs to manage a TB of a particular vendor's disk. In other words, the 3-4 year TCO for any storage acquisition needs to include the cost of management, not just the per GB cost of the space.

SSD vs. Wide Striping

So, what's this got to do with the topic at hand? Well, I think that a lot of the argument around this is really an argument around the cost of managing disks. Both technologies have their places, and both can help you address certain performance issues, and both can help you save money. The difference is that SSDs only help with a very small percentage of cases, whereas wide striping can help you with the vast majority of cases. What's more, wide striping can help you address those management costs and drive down that 3-4 year TCO I keep talking about, whereas SSDs really don't help there at all, and in a lot of cases, I believe that the 3-4 year TCO goes way up with SSDs. That's not to say that for those cases where you need the performance, that using SSDs in a targeted way isn't a good idea. But just keep in mind what I said about the cost of managing a TB of storage perhaps exceeding the cost of purchasing it in the first place. In the end, I think we need both, but I think that the bulk of your storage should be on a wide striped array where your storage admins don't have to spend a lot of time trying to figure out exactly where they should place the data so that the new LUNs will perform, and the added load doesn't negatively impact existing applications.

My vision

So, ideally, I think that the storage team should have a vast majority of their data on an array that does wide striping, manage that space through some kind of virtualization engine, and purchase SSDs very tactically to address specific performance issues, again managing everything through the virtualization engine thus allowing re-tiering of the data should that be necessary, and making migrations when they are needed quicker, easier, and less impactful to the business. You also need to deploy software to help you with performance management as well as capacity management, and something to help automate the documentation process. This means that there is very likely not a single vendor that can provide all of the technology, but rather you will need to put together a "best of breed" approach to your storage environment. Here's an example of one set of technologies that I think can help get you to where you want to be.

IBM XIV storage – The XIV provides wide striped storage on SATA disks and makes it all very easy to manage. This is where I would put the bulk of my data since my admins wouldn't have to sit there and try and figure out where to place the data, etc.

EMC CLARiiON – Put some flash drives in a CLARiiON and I think you have a great platform for those few LUNs that require the kind of performance that SSDs offer, if you have that kind of need.

Datacore SANSymphony - A software approach to SAN virtualization which allows you to move data around to different arrays without the users being aware that it's going on. This is the way that you address things like re-tiering of your data as well.

Akorri – This is a software tool that helps you to manage your entire storage infrastructure, find the bottlenecks, and generally free up storage admin time.

Quantum DXi 7500 – This is a deduplicating VTL that will help you reduce the amount of time that your backup admins spend troubleshooting failed backups.

Aptare Storage Console – This is software that will help you manage your backups. It will report on things like what backups failed, which of those were on SOX systems, etc.


The above are just a few examples of what's available out there to help you to create a more mature, automated, easier to manage storage environment, but they certainly aren't the only ones, just some good examples of what's available, and why you should be looking at that kind of technology. In the end, whatever you choose, just making sure that you are truly addressing the 3-4 year TCO of your environment is the key to getting those management costs under control and allowing your storage/backup admins to manage larger and larger environments.

ZZ from: http://joergsstorageblog.blogspot.com/2009/04/real-cost-of-storage.html

Symmetrix Virtual Provisioning

Organizations continually search for ways to both simplify storage management processes and improve storage capacity utilization. Several products have been released over the past few years that promise efficient use of storage space. One of the technologies that is quickly catching up is thin provisioning. 3PAR was one of the first vendors to introduce the concept while the rest quickly followed suit.

When provisioning storage for a new application, administrators must consider that application’s future capacity requirements rather than simply its current requirements. In order to reduce the risk that storage capacity will be exhausted, disrupting application and business processes, organizations often have allocated more physical storage to an application than is needed for a significant amount of time. This allocated but unused storage introduces operational costs. Even with the most careful planning, it often is necessary to provision additional storage in the future, which could potentially require an application outage.

EMC Virtual Provisioning, introduced with Enginuity 5773, addresses some of these challenges. It builds on the base “thin provisioning” functionality, which is the ability to have a large “thin” device (volume) configured and presented to the host while consuming physical storage from a shared pool only as needed. Symmetrix Virtual Provisioning can improve storage capacity utilization and simplify storage management by presenting the application with sufficient capacity for an extended period of time, reducing the need to provision new storage frequently and avoiding costly allocated but unused storage. Symmetrix Management Console and the command line interface (CLI) are the primary management and monitoring tools.
Symmetrix Virtual Provisioning introduces a new type of host accessible device called a “thin device” that can be used in the same way that a regular device has traditionally been used. Unlike regular Symmetrix devices, thin devices do not need to have physical storage completely allocated at the time the devices are presented to a host. The physical storage that is used to supply disk space for a thin device comes from a shared thin storage pool that has been associated with the thin device. A thin storage pool is comprised of a new type of internal Symmetrix device called a data device that is dedicated to the purpose of providing the actual physical storage used by thin devices. When they are first created, thin devices are not associated with any particular thin pool. An operation referred to as “binding” must be performed to associate a thin device with a thin pool.

When a write is performed to a portion of the thin device, the Symmetrix allocates a minimum allotment of physical storage from the pool and maps that storage to a region of the thin device. The storage allocation operations are performed in small units of storage called “thin device extents.” A round-robin mechanism is used to balance the allocation of data device extents across all of the data devices in the pool that are enabled and that have remaining unused capacity. The thin device extent size is 12 tracks (768 KB). That means that the initial bind of a thin device to a pool causes one thin device extent, or 12 tracks, to be allocated per thin device. So a four-member thin meta would cause 48 tracks (3072 KB) to be allocated when the device is bound to a thin pool.
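The extent arithmetic above, spelled out; the 64 KB track size is simply what the 12-track / 768 KB extent figure implies.

```python
extent_tracks = 12
extent_kb     = 768
track_kb      = extent_kb // extent_tracks       # 64 KB per track

meta_members   = 4
initial_tracks = meta_members * extent_tracks    # 48 tracks allocated at bind time
initial_kb     = meta_members * extent_kb        # 3072 KB

print(track_kb, initial_tracks, initial_kb)
```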

When a read is performed on a thin device, the data being read is retrieved from the appropriate data device in the storage pool to which the thin device is bound. When more storage is required to service existing or future thin devices, data devices can be added to existing thin storage pools. New thin devices can also be created and associated with existing thin pools.
It is possible for a thin device to be presented for host-use before all of the reported capacity of the device has been mapped. It is also possible for the sum of the reported capacities of the thin devices using a given pool to exceed the available storage capacity of the pool. Such a thin device configuration is said to be oversubscribed.

The storage is allocated from the pool using a round-robin approach that tends to stripe data across the data devices in the pool. Storage administrators should keep in mind that when implementing Virtual Provisioning, it is important that realistic utilization objectives are set. Generally, organizations should target no higher than 60 percent to 80 percent capacity utilization per pool. A buffer should be provided for unexpected growth or a “runaway” application that consumes more physical capacity than was originally planned for.
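A small sketch of the kind of pool check that guidance implies; the pool sizes below are hypothetical, and only the 60-80 percent band comes from the recommendation above.

```python
# Hypothetical thin pool figures
pool_physical_tb  = 40.0     # sum of data device capacity in the pool
thin_presented_tb = 100.0    # sum of the thin devices' reported capacities
allocated_tb      = 27.0     # extents actually written so far

subscription_ratio = thin_presented_tb / pool_physical_tb   # 2.5x oversubscribed
pool_utilization   = allocated_tb / pool_physical_tb        # 0.675

if pool_utilization > 0.80:
    print("add data devices to the pool now")
elif pool_utilization > 0.60:
    print("inside the 60-80% target band -- plan the next expansion")
else:
    print("comfortable headroom remains")
```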

Benefits of Virtual Provisioning:

Less expensive to pre-provision storage
With traditional provisioning, the entire amount of physical storage has to be configured and dedicated at the time of pre-provisioning. With thin LUNs, the capacity presented to the host can exceed the physical storage actually installed, and since the cost of physical storage drops consistently over time, deferring those purchases saves dollars.

Easy implementation of wide stripes
A configured thin pool ensures that a thin device will be widely striped across the back end in 768 KB extents. Thus a single thin device requires no layout planning on the part of the administrator.

Performance
The performance for certain random IO workloads can be improved due to the fact that thin devices are widely striped across the back end. Typically in a thin device implementation there is a modest response time overhead incurred the first time a write is performed on an unallocated region of a thin device. This overhead tends to disappear once the working set of the thin device has been written to.

ZZ from the following link:
http://www.emcstorageinfo.com/2009/04/symmetrix-virtual-provisioning.html