Tuesday, November 26, 2013

Data Deduplication – A Perfect Tool for Database Management

Efficient management and storage of data is a problem that most organizations face these days, and various methods and technologies exist to address it. The space available for storage must be used efficiently so that the maximum amount of data can be kept in the minimum space. Data deduplication is a method that looks for repetition or redundancy in sequences of bytes across a large collection of data: the first uniquely stored version of a data sequence is referenced at every further occurrence rather than being stored again. Data deduplication is also known as intelligent compression or the single-instance storage method.
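
To make the idea concrete, here is a minimal Python sketch of that reference-based storage: each piece of data is hashed, the bytes are kept only for the first occurrence, and later occurrences are recorded as references to the stored hash. The class and method names are illustrative assumptions, not taken from any particular product.

import hashlib

class DedupStore:
    def __init__(self):
        self.chunks = {}      # hash -> bytes, each unique sequence stored once
        self.references = []  # ordered hashes standing in for the full data stream

    def write(self, data):
        digest = hashlib.sha256(data).hexdigest()
        if digest not in self.chunks:
            self.chunks[digest] = data    # first occurrence: store the bytes
        self.references.append(digest)    # every occurrence: store only a reference
        return digest

    def read_all(self):
        # Rebuild the original stream by following the references.
        return b"".join(self.chunks[h] for h in self.references)

store = DedupStore()
store.write(b"weekly report")
store.write(b"weekly report")    # identical data: no second copy is kept
print(len(store.chunks))         # 1 unique chunk despite two writes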

File Level Deduplication

In its most common form, deduplication is done at the file level: no identical file is stored twice. Incoming data is filtered and processed so that the same file is not stored again unnecessarily. This level of deduplication is known as the single-instance storage (SIS) method. Another level of deduplication occurs at the block level, where blocks of data that are identical in two non-identical files are detected and only one copy of each block is stored. This method frees up more space than the former because it analyzes and compares data at a deeper level.
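
The difference is easier to see in code. The short Python sketch below splits files into fixed-size blocks and stores each distinct block only once, so two non-identical files that share most of their content also share most of their stored blocks. The 4 KiB block size and the helper names are assumptions chosen for illustration.

import hashlib

BLOCK_SIZE = 4096          # assumed fixed block size for the example
block_store = {}           # block hash -> block bytes, each unique block stored once

def store_file_block_level(data):
    # Split the file into fixed-size blocks and keep only unseen blocks;
    # the file itself becomes a "recipe" of block hashes.
    recipe = []
    for offset in range(0, len(data), BLOCK_SIZE):
        block = data[offset:offset + BLOCK_SIZE]
        digest = hashlib.sha256(block).hexdigest()
        block_store.setdefault(digest, block)   # stored once, reused thereafter
        recipe.append(digest)
    return recipe

# Two non-identical files sharing most of their content: file-level dedup
# would keep both in full, while block-level dedup shares the common blocks.
common = b"A" * BLOCK_SIZE * 3
store_file_block_level(common + b"ending one")
store_file_block_level(common + b"ending two")
print(len(block_store))    # 5 unique blocks instead of 8 block slots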

Target Level Deduplication

The second type of implementation is at the target level, that is, the backup system. Deployment here is easier than at the source. There are two modes of implementation: inline and post-process. In inline deduplication, the data is deduplicated before it is written to the backup disk. This requires less storage, which is an advantage, but more time, because the backup can complete only after the deduplication filtering is done. With post-process deduplication, the storage requirement is higher, but the backup itself finishes much faster because the filtering happens afterwards. The choice between the two depends on the system, the amount of data to be handled, the storage space available for both the system and the backup, the processor capacity, and the time constraints.
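
As a rough illustration of the two modes, the Python sketch below deduplicates chunks before they reach the store in the inline case, and lands the full backup in a staging area first in the post-process case, removing duplicates in a later pass. The function names and the in-memory structures are hypothetical simplifications, not any vendor's design.

import hashlib

def inline_backup(stream, chunk_store):
    # Inline: filter duplicates before anything is written to the backup disk.
    # Less storage is used, but every write pays the hashing and lookup cost.
    recipe = []
    for chunk in stream:
        digest = hashlib.sha256(chunk).hexdigest()
        chunk_store.setdefault(digest, chunk)   # only new chunks are stored
        recipe.append(digest)
    return recipe

def post_process_backup(stream, staging_area):
    # Post-process, step 1: land the raw backup first so ingest finishes fast.
    staging_area.extend(stream)                 # full copy written immediately

def post_process_dedupe(staging_area, chunk_store):
    # Post-process, step 2: a later pass removes the duplicates.
    recipe = []
    while staging_area:
        chunk = staging_area.pop(0)
        digest = hashlib.sha256(chunk).hexdigest()
        chunk_store.setdefault(digest, chunk)
        recipe.append(digest)
    return recipe

chunks = [b"block1", b"block2", b"block1"]
store, staging = {}, []
inline_backup(chunks, store)           # 2 unique chunks stored immediately
post_process_backup(chunks, staging)   # 3 chunks land in staging first
post_process_dedupe(staging, store)    # duplicates removed afterwards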


The greatest advantage is the reduced storage requirement, which also improves bandwidth efficiency. As primary data storage has become inexpensive over the years, organizations tend to keep backup data from a project for a longer period so that new employees can reuse it in future projects. These data storehouses need cooling and proper maintenance, and hence consume a lot of electric power. The number of disks or tapes the organization needs to buy and maintain for data storage also drops, reducing the total cost of storage. Deduplication can reduce the bandwidth required for backup, and in some cases it can also speed up both the backup and the recovery process.
