Partitioning and bucketing in hive-based big data warehouses

doi:10.1007/978-3-319-77712-2_72

Please use this identifier to cite or link to this record: https://hdl.handle.net/1822/55212

Full record

DC Field	Value	Language
dc.contributor.author	Costa, Eduarda	por
dc.contributor.author	Costa, Carlos A.	por
dc.contributor.author	Santos, Maribel Yasmina	por
dc.date.accessioned	2018-07-02T08:59:56Z	-
dc.date.issued	2018	-
dc.identifier.isbn	9783319777115	por
dc.identifier.issn	2194-5357	-
dc.identifier.uri	https://hdl.handle.net/1822/55212	-
dc.description.abstract	Hive is a tool that allows the implementation of Data Warehouses for Big Data contexts, organizing data into tables, partitions and buckets. Some studies have been conducted to understand ways of optimizing the performance of data storage and processing techniques/technologies for Big Data Warehouses. However, few of these studies explore whether the way data is structured has any influence on how Hive responds to queries. Thus, this work investigates the impact of creating partitions and buckets on the processing times of Hive-based Big Data Warehouses. The results obtained with the application of different modelling and organization strategies in Hive reinforce the advantages associated with the implementation of Big Data Warehouses based on denormalized models and, also, the potential benefit of adequate partitioning that, once aligned with the filters frequently applied to the data, can significantly decrease processing times. In contrast, there is no evidence of significant advantages from the use of bucketing techniques.	por
dc.description.sponsorship	This work is supported by COMPETE: POCI-01-0145-FEDER-007043 and FCT – Fundação para a Ciência e Tecnologia within the Project Scope: UID/CEC/00319/2013, and by European Structural and Investment Funds in the FEDER component, through the Operational Competitiveness and Internationalization Programme (COMPETE 2020) [Project nº 002814; Funding Reference: POCI-01-0247-FEDER-002814].	por
dc.language.iso	eng	por
dc.publisher	Springer Verlag	por
dc.relation	info:eu-repo/grantAgreement/FCT/5876/147280/PT	por
dc.rights	restrictedAccess	por
dc.subject	Big data	por
dc.subject	Big data warehouse	por
dc.subject	Buckets	por
dc.subject	Hive	por
dc.subject	Partitions	por
dc.title	Partitioning and bucketing in hive-based big data warehouses	por
dc.type	conferencePaper	por
dc.peerreviewed	yes	por
oaire.citationStartPage	764	por
oaire.citationEndPage	774	por
oaire.citationVolume	746	por
dc.date.updated	2018-06-30T18:32:02Z	-
dc.identifier.doi	10.1007/978-3-319-77712-2_72	por
dc.description.publicationversion	info:eu-repo/semantics/publishedVersion	por
sdum.export.identifier	5167	-
sdum.journal	Advances in Intelligent Systems and Computing	por
Appears in collections:	CAlg - Artigos em livros de atas/Papers in proceedings