- Published on Tuesday, 21 June 2011 09:59
- Written by Marc Hebert
If you read my previous post, you’re already familiar with my vision for the corporate archive repository. Here’s a quick refresher:
A corporate archive repository is not just a drawer full of data backup tapes. Rather, it’s a default storage location for archived and retired data for the enterprise. The ideal corporate repository will provide a common platform for archiving data, a common structure for storing it, a common set of reporting tools for retrieving it, and common tools for managing it.
Some might even call this “common” sense, and rightly so.
But what does this data storage structure look like? And how is it more efficient than the ways in which companies have traditionally stored data? As it turns out, data storage may be the area in which the corporate repository can deliver the greatest possible savings.
Addressing Sloppy Disk Allocation
Everyone seems to be talking about the corporate data storage crunch. What’s behind this disk-space crisis? You might instinctively assume that it’s all a matter of companies churning out more data – but you’d be wrong.
Yes, companies have become more data-intensive than ever before, and this data certainly takes up space. But in most cases, sloppy disk allocation takes up far more space.
When database administrators (DBAs) allocate disk space for business applications and their databases, the last thing they want to do is see an application run out of space at a critical moment for the business. So, right from the outset, they’ll typically allocate much more disk space than the application needs. Over time, the application’s burgeoning database will slowly grow into its disk – and the DBA won’t have to worry about reallocating space for a while.
It’s hard to fault DBAs for this approach – especially at a time when the price of storage hardware continues to fall. But the “cushions” they create can be alarmingly large. In a major enterprise, it’s not unusual to see a 2 TB production database sitting in the middle of a 10 TB swath of disk, thus tying up an extra 8 TB of perfectly good disk space.
But it gets worse. Large companies tend to allocate the same amount of disk space for non-production activities related to these applications. In this case, that would mean another 10 TB each for testing, development, training, and performance tuning. You may see 5 or even 10 copies of the production database used for non-production purposes. Ten 2 TB copies, each sitting on its own 10 TB allocation, add up to 20 TB of data and 80 TB of tied-up disk space.
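The arithmetic behind these figures is simple multiplication, and a small helper makes it easy to plug in your own numbers. This is a minimal sketch; the function name and its parameters are hypothetical, and it assumes every copy gets the same allocation as production.

```python
def storage_footprint(db_size_tb: float, allocation_tb: float, copies: int) -> tuple[float, float]:
    """Return (data actually stored, disk reserved but unused) across `copies`
    databases, each holding db_size_tb of data on allocation_tb of disk."""
    data = db_size_tb * copies
    idle = (allocation_tb - db_size_tb) * copies
    return data, idle

# Hypothetical scenario from the text: a 2 TB database on a 10 TB allocation.
prod_data, prod_idle = storage_footprint(2.0, 10.0, copies=1)
copy_data, copy_idle = storage_footprint(2.0, 10.0, copies=10)
print(f"production: {prod_data:g} TB data, {prod_idle:g} TB idle")
print(f"10 non-production copies: {copy_data:g} TB data, {copy_idle:g} TB idle")
```

Running it prints 2 TB of data and 8 TB idle for production, and 20 TB of data and 80 TB idle for the ten copies.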
Compressing Databases, Cutting Costs
The good news is that although disk space cushions tend to sit unused during production, you can easily recover this space when you use a corporate repository to archive data or retire an application. IBM Optim makes this process highly efficient by using a proprietary file format – a type of flat file – that's designed to squeeze the unused space out of the allocated disk.
Optim can also squeeze inefficiencies out of disk space that's currently being used in production, saving you even more space. Picture a massive Oracle database. Its thousands of tables and columns will typically be defined with a fixed column width that isn't fully used. For example, the comment field on customer service records may allow 1,000 characters, but the typical field may contain only 50 to 100 characters of input. That leaves 900 or more characters of unused space per field, and that wasted storage quickly adds up across an enterprise.
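To see how quickly that padding adds up, here is a minimal sketch that multiplies a column's unused width by its row count. It assumes every value is stored at the full declared width and that each character takes one byte; the function name and the one-million-row table are hypothetical. (Many databases store variable-length columns without padding, so treat this as a model of the fixed-width case described above.)

```python
def padding_waste(rows: int, declared_width: int, avg_used: int, bytes_per_char: int = 1) -> int:
    """Bytes reserved but unused when every value occupies the full declared width."""
    return rows * (declared_width - avg_used) * bytes_per_char

# Hypothetical table: one million customer service records with a 1,000-character comment field.
for used in (50, 100):
    wasted = padding_waste(1_000_000, 1_000, used)
    print(f"{used} characters used -> {wasted / 1e9:.2f} GB of padding in one column")
```

A single column in one table already accounts for roughly 0.9 to 0.95 GB of padding under these assumptions.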
When you archive this database into a flat file format, Optim compresses out this extra space. It's not uncommon for an Optim archive to be 90% smaller than the source data: what was a 100 GB database in production becomes a 10 GB archive file.
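The general principle of stripping padding before compressing can be sketched with Python's standard `zlib` module. This is not Optim's proprietary format, whose internals aren't covered here. It is a hypothetical illustration that assumes a single space-padded text column whose values contain no newlines and no meaningful trailing spaces; `WIDTH`, `padded_rows`, and `archive` are illustrative names.

```python
import zlib

WIDTH = 1_000  # hypothetical declared width of the comment column

def padded_rows(comments: list[str]) -> list[str]:
    # Model fixed-width storage: each value is right-padded with spaces to WIDTH.
    return [c.ljust(WIDTH) for c in comments]

def archive(rows: list[str]) -> bytes:
    # Strip the trailing padding, write one value per line as a flat file,
    # then compress the whole file. Assumes values contain no newlines.
    flat = "\n".join(r.rstrip(" ") for r in rows).encode("utf-8")
    return zlib.compress(flat, 9)

comments = [f"Customer {i} called about invoice {i * 7}; issue resolved." for i in range(10_000)]
rows = padded_rows(comments)
raw_size = sum(len(r.encode("utf-8")) for r in rows)
archived = archive(rows)
reduction = 1 - len(archived) / raw_size
print(f"{raw_size:,} bytes -> {len(archived):,} bytes ({reduction:.1%} smaller)")
```

Trimming alone removes over 90% of this synthetic column, and compression shrinks the remaining text further. Real-world ratios depend heavily on the data, so the output here says nothing about any particular product.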
As you can see, archiving data to your corporate repository leaves you with a relatively small archive that’s centrally accessible and efficient to manage. In terms of cost and convenience, there’s no comparison between the corporate repository and a drawer full of tapes.
But one caveat: there’s no such thing as “set it and forget it” archiving. Even after your production data is safely in the archive, you’ll need to make data governance an ongoing activity. We’ll discuss that next.