Use this identifier to cite or link to this record: https://hdl.handle.net/1822/64073

Title: ProPythia, an automated platform for the classification of peptides/proteins using machine learning
Author(s): Sequeira, Ana Marta Fernandes Tavares; Pereira, S.; Lousa, Diana; Rocha, Miguel
Keywords: Machine learning; Peptide classification; Viral fusion peptides
Date: 19-Feb-2020
Citation: Sequeira, Ana; Pereira, Sara; Lousa, Diana; Rocha, Miguel, ProPythia, an automated platform for the classification of peptides/proteins using machine learning. BOD 2020 - IX Bioinformatics Open Days (Conference Book). Braga, Feb 19-21, 2020.
Abstract(s): One of the most challenging problems in bioinformatics is to computationally characterize the sequences, structures and functions of proteins. Sequence-derived structural and physicochemical properties of proteins have been used to develop machine learning models for protein-related problems. However, tools and platforms to calculate features and perform machine learning (ML) with proteins are scarce and are limited in effectiveness, user-friendliness and applicability. Here, a generic, modular, automated ML-based platform for the classification of proteins based on their physicochemical properties is proposed. ProPythia, developed as a Python package, facilitates the major tasks of ML and includes modules to read and alter sequences, calculate protein features, pre-process datasets, execute feature reduction and selection, perform clustering, train and optimize ML models and make predictions. The platform was validated by testing its ability to classify anticancer and antimicrobial peptides and was further used to explore viral fusion peptides. Membrane-interacting peptides play a crucial role in several biological processes. Fusion peptides are a subclass found in enveloped viruses that are particularly relevant for membrane fusion. Determining which properties characterize fusion peptides, and distinguishing them from other proteins, is a highly relevant scientific question with important technological implications. Using three different datasets composed of well-annotated sequences, different feature extraction techniques and feature selection methods, ML models were trained, tested and used to predict the location of a known fusion peptide in a protein sequence from the Dengue virus. Feature importance was also analysed. The models obtained will be useful in future research and provide biological insight into the distinctive physicochemical characteristics of fusion peptides.
This work presents a freely available tool to perform ML-based protein classification and the first global analysis and prediction of viral fusion peptides using ML, reinforcing the usability and importance of ML in protein classification problems.
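As an illustration of the kind of workflow the abstract describes (computing sequence-derived features, then scanning a protein for the most peptide-like region), here is a minimal sketch in plain Python. It does not use or reproduce the ProPythia API: the function names `aa_composition`, `sliding_windows` and `best_window`, the example sequence, and the toy scoring function are all hypothetical and chosen only for illustration. In a real setting the scoring function would be a trained classifier's predicted probability.

```python
from collections import Counter

# The 20 standard amino acids, in a fixed order so feature vectors align.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def aa_composition(seq):
    """Return the fraction of each standard amino acid in seq (20 values)."""
    seq = seq.upper()
    if not seq:
        raise ValueError("sequence must not be empty")
    counts = Counter(seq)
    return [counts[aa] / len(seq) for aa in AMINO_ACIDS]


def sliding_windows(seq, size, step=1):
    """Yield (start, window) pairs for every full-length window in seq."""
    for start in range(0, len(seq) - size + 1, step):
        yield start, seq[start:start + size]


def best_window(seq, size, score_fn, step=1):
    """Score each window's feature vector and return (start, score) of the best.

    score_fn is a stand-in for a trained model: it takes a feature vector
    and returns a number, higher meaning "more likely the target class".
    """
    scored = [
        (start, score_fn(aa_composition(window)))
        for start, window in sliding_windows(seq, size, step)
    ]
    if not scored:
        raise ValueError("sequence is shorter than the window size")
    return max(scored, key=lambda item: item[1])


def toy_score(features):
    """Hypothetical score: combined fraction of glycine and alanine.

    This is NOT a real fusion peptide predictor, only a placeholder.
    """
    return features[AMINO_ACIDS.index("G")] + features[AMINO_ACIDS.index("A")]


if __name__ == "__main__":
    example = "DDDDKKKKGAGAGAEEEE"  # made-up sequence, not real data
    start, score = best_window(example, size=6, score_fn=toy_score)
    print(f"best window starts at {start}: {example[start:start + 6]} (score {score:.2f})")
```

Amino acid composition is only one of many possible sequence-derived descriptors; it is used here because it needs no external data. Replacing `toy_score` with the probability output of a fitted model would turn the scan into a model-based localisation step of the kind described for the Dengue virus sequence.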
Type: Abstract in conference proceedings
URI: https://hdl.handle.net/1822/64073
Publisher version: http://www.bioinformaticsopendays.com/
Peer reviewed: yes
Access: Open access
Appears in collections: CEB - Resumos em Livros de Atas / Abstracts in Proceedings

Files in this record:
File                  Description  Size    Format
document_53524_1.pdf               202 kB  Adobe PDF
