Use this identifier to cite or link to this record: https://hdl.handle.net/1822/73976

Full record
DC Field | Value | Language
dc.contributor.author | Pereira, Pedro José | por
dc.contributor.author | Pereira, Adriana | por
dc.contributor.author | Cortez, Paulo | por
dc.contributor.author | Pilastri, André Luiz | por
dc.date.accessioned | 2021-09-09T11:18:34Z | -
dc.date.available | 2021-09-09T11:18:34Z | -
dc.date.issued | 2021-09 | -
dc.identifier.citation | Pereira P.J., Pereira A., Cortez P., Pilastri A. (2021) A Comparison of Machine Learning Methods for Extremely Unbalanced Industrial Quality Data. In: Marreiros G., Melo F.S., Lau N., Lopes Cardoso H., Reis L.P. (eds) Progress in Artificial Intelligence. EPIA 2021. Lecture Notes in Computer Science, vol 12981. Springer | por
dc.identifier.isbn | 978-3-030-86229-9 | -
dc.identifier.issn | 0302-9743 | por
dc.identifier.uri | https://hdl.handle.net/1822/73976 | -
dc.description.abstract | The Industry 4.0 revolution is impacting manufacturing companies, which need to adopt more data intelligence processes in order to compete in the markets in which they operate. In particular, quality control is a key manufacturing process that has been addressed by Machine Learning (ML), aiming to improve productivity (e.g., reduce costs). However, modern industries produce a tiny portion of defective products, which results in extremely unbalanced datasets. In this paper, we analyze recent big data collected from a major automotive assembly manufacturer and related to the quality of eight products. The eight datasets include millions of records but only a tiny percentage of failures (less than 0.07%). To handle such datasets, we perform a two-stage ML comparison study. Firstly, we consider two products and explore four ML algorithms, Random Forest (RF), two Automated ML (AutoML) methods and a deep Autoencoder (AE), and three balancing training strategies, namely None, Synthetic Minority Oversampling Technique (SMOTE) and Gaussian Copula (GC). When considering both classification performance and computational effort, interesting results were obtained by RF. Then, the selected RF was further explored by considering all eight datasets and five balancing methods: None, SMOTE, GC, Random Undersampling (RU) and Tomek Links (TL). Overall, competitive results were achieved by the combination of GC with RF. | por
dc.description.sponsorship | This work is supported by: European Structural and Investment Funds in the FEDER component, through the Operational Competitiveness and Internationalization Programme (COMPETE 2020) [Project no. 39479; Funding Reference: POCI-01-0247-FEDER-39479]. | por
dc.language.iso | eng | por
dc.publisher | Springer | por
dc.rights | openAccess | por
dc.rights.uri | http://creativecommons.org/licenses/by/4.0/ | por
dc.subject | Anomaly Detection | por
dc.subject | Industrial Data | por
dc.subject | Random Forest | por
dc.title | A comparison of machine learning methods for extremely unbalanced industrial quality data | por
dc.type | conferencePaper | por
dc.peerreviewed | yes | por
dc.relation.publisherversion | https://link.springer.com/chapter/10.1007/978-3-030-86230-5_44 | por
oaire.citationStartPage | 561 | por
oaire.citationEndPage | 572 | por
oaire.citationVolume | LNCS 12981 | por
dc.identifier.doi | 10.1007/978-3-030-86230-5_44 | por
dc.identifier.eisbn | 978-3-030-86230-5 | -
dc.subject.fos | Natural Sciences::Computer and Information Sciences | por
dc.subject.wos | Science & Technology | por
sdum.journal | Lecture Notes in Computer Science (including Subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) | por
sdum.conferencePublication | 20th EPIA Conference on Artificial Intelligence (EPIA 2021) | por
oaire.version | AM | por
dc.subject.ods | Industry, innovation and infrastructure | por
Appears in collections: CAlg - Artigos em livros de atas/Papers in proceedings

Files in this record:
File | Description | Size | Format
paper88.pdf | | 226.13 kB | Adobe PDF

This work is licensed under a Creative Commons License.
