Improving Data Warehouse Performance Using Filesystem Technology with GZIP, LZJB and ZLE Compression

Suharjito Suharjito

Abstract


Data warehouse applications are commonly used by corporations as an analytical platform to develop marketing strategy and give their business a competitive edge. A data warehouse platform often has to manage huge amounts of data, because it must extract, load and transform enterprise data from many sources. To improve the performance of data warehouse systems, compression has been used in various ways, at either the database layer or the filesystem layer. Compression brings another benefit: it reduces the storage capacity required and therefore reduces cost. However, compression also adds data processing that may affect overall application response time. In this research, three compression algorithms (zle, lzjb and gzip) were tested on a data warehouse filesystem to understand their performance impact and capacity savings. Using Swingbench as the benchmark tool and Oracle Database 12c, the results show that zle gives the largest performance improvement at 92%, followed by lzjb at 78% and gzip at 53%. In terms of compression ratio, gzip delivers the highest at 3.31, followed by lzjb at 2.17 and zle at 1.55.
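
To make the capacity figures concrete, the reported compression ratios can be converted into an approximate storage saving. The sketch below is illustrative only: it assumes the ratio is defined as uncompressed size divided by compressed size (the abstract does not state the definition), and the helper names `space_saving` and `compressed_size` are hypothetical, not taken from the paper.

```python
# Compression ratios as reported in the abstract.
# Assumption: ratio = uncompressed size / compressed size.
RATIOS = {"gzip": 3.31, "lzjb": 2.17, "zle": 1.55}


def space_saving(ratio: float) -> float:
    """Return the fraction of storage saved for a given compression ratio."""
    if ratio <= 0:
        raise ValueError("compression ratio must be positive")
    return 1.0 - 1.0 / ratio


def compressed_size(original_gb: float, ratio: float) -> float:
    """Return the on-disk size in GB after compression at the given ratio."""
    if ratio <= 0:
        raise ValueError("compression ratio must be positive")
    return original_gb / ratio


if __name__ == "__main__":
    # The 1000 GB starting size is an arbitrary example value.
    for name, ratio in RATIOS.items():
        saved = space_saving(ratio)
        size = compressed_size(1000.0, ratio)
        print(f"{name}: {saved:.1%} saved, 1000 GB -> {size:.0f} GB")
```

Under that assumption, gzip would save roughly 70% of the space, lzjb roughly 54% and zle roughly 35%, which shows the trade-off in the results: the algorithm with the largest performance gain saves the least capacity.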


