Use this identifier to cite or link to this record: https://hdl.handle.net/1822/66781

Title: Evaluating partitioning and bucketing strategies for Hive-based Big Data Warehousing systems
Author(s): Costa, Eduarda; Costa, Carlos A. P.; Santos, Maribel Yasmina
Keywords: Big Data; Big Data Warehouse; Buckets; Hive; Partitions
Date: 2019
Publisher: SpringerOpen
Journal: Journal of Big Data
Abstract: Hive has long been one of the industry-leading systems for Data Warehousing in Big Data contexts. It organizes data mainly into databases, tables, partitions and buckets, stored on top of an unstructured distributed file system such as HDFS. Several studies have examined how to optimize the performance of storage systems for Big Data Warehousing. However, few of them explore the impact of data organization strategies on query performance when Hive is the storage technology used to implement Big Data Warehousing systems. Therefore, this paper evaluates the impact of data partitioning and bucketing in Hive-based systems, testing different data organization strategies and verifying their effect on query performance. The results demonstrate the advantages of implementing Big Data Warehouses based on denormalized models and the potential benefit of adequate partitioning strategies. Defining partitions aligned with the attributes frequently used in query conditions/filters can significantly increase the efficiency of the system in terms of response time. In the more intensive workload benchmarked in this paper, overall decreases of about 40% in processing time were verified. The same does not hold for bucketing strategies, which show potential benefits only in very specific scenarios, suggesting a more restricted use of this functionality, namely bucketing two tables by their join attribute.
Type: Article
URI: https://hdl.handle.net/1822/66781
DOI: 10.1186/s40537-019-0196-1
Publisher version: https://journalofbigdata.springeropen.com/articles/10.1186/s40537-019-0196-1
Peer reviewed: yes
Access: Open access
Appears in collections: CAlg - Artigos em revistas internacionais / Papers in international journals

Files in this record:
File: Costa2019_Article_EvaluatingPartitioningAndBucke.pdf
Size: 2.08 MB
Format: Adobe PDF

This work is licensed under a Creative Commons License.
