Similarity-based predictive models: Sensitivity analysis and a biological application with multi-attributes

doi:10.3390/biology12070959

Please use this identifier to cite or link to this record: https://hdl.handle.net/1822/85507

Full record

DC Field	Value	Language
dc.contributor.author	Sanchez, Jeniffer D.	por
dc.contributor.author	Rêgo, Leandro C.	por
dc.contributor.author	Ospina, Raydonal	por
dc.contributor.author	Leiva, Víctor	por
dc.contributor.author	Chesneau, Christophe	por
dc.contributor.author	Castro, Cecília	por
dc.date.accessioned	2023-07-12T09:43:40Z	-
dc.date.available	2023-07-12T09:43:40Z	-
dc.date.issued	2023-07	-
dc.identifier.issn	2079-7737	por
dc.identifier.uri	https://hdl.handle.net/1822/85507	-
dc.description.abstract	Predictive models based on empirical similarity are instrumental in biology and data science, where the premise is to measure the likeness of one observation with others in the same dataset. Biological datasets often encompass data that can be categorized. When using empirical similarity-based predictive models, two strategies for handling categorical covariates exist. The first strategy retains categorical covariates in their original form, applying distance measures and allocating weights to each covariate. In contrast, the second strategy creates binary variables, representing each variable level independently, and computes similarity measures solely through the Euclidean distance. This study performs a sensitivity analysis of these two strategies using computational simulations, and applies the results to a biological context. We use a linear regression model as a reference point, and consider two methods for estimating the model parameters, alongside exponential and fractional inverse similarity functions. The sensitivity is evaluated by determining the coefficient of variation of the parameter estimators across the three models as a measure of relative variability. Our results suggest that the first strategy excels over the second one in effectively dealing with categorical variables, and offers greater parsimony due to the use of fewer parameters.	por
dc.description.sponsorship	ANCD - Agenția Națională pentru Cercetare și Dezvoltare (UIDB/00013/2020)	por
dc.language.iso	eng	por
dc.publisher	MDPI	por
dc.relation	info:eu-repo/grantAgreement/FCT/6817 - DCRRNI ID/UIDB%2F00013%2F2020/PT	por
dc.relation	info:eu-repo/grantAgreement/FCT/6817 - DCRRNI ID/UIDP%2F00013%2F2020/PT	por
dc.rights	openAccess	por
dc.subject	Biological data	por
dc.subject	Coefficient of variation	por
dc.subject	Data science	por
dc.subject	Distance measures	por
dc.subject	Estimation methods	por
dc.subject	Predictive modeling	por
dc.subject	Monte Carlo simulation	por
dc.subject	Similarity functions	por
dc.title	Similarity-based predictive models: Sensitivity analysis and a biological application with multi-attributes	por
dc.type	article	por
dc.peerreviewed	yes	por
dc.relation.publisherversion	https://www.mdpi.com/2079-7737/12/7/959	por
oaire.citationIssue	7	por
oaire.citationVolume	12	por
dc.identifier.doi	10.3390/biology12070959	por
dc.subject.fos	Ciências Naturais::Matemáticas	por
sdum.journal	Biology	por
oaire.version	VoR	por
dc.subject.ods	Parcerias para a implementação dos objetivos	por
Appears in collections:	CMAT - Artigos em revistas com arbitragem / Papers in peer review journals

Files in this record:

File	Description	Size	Format
biology-12-00959.pdf		1.54 MB	Adobe PDF
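
The two strategies contrasted in the abstract can be illustrated with a minimal sketch. It assumes the usual empirical-similarity predictor (a similarity-weighted average of observed responses), a weighted Euclidean-type distance, a 0/1 mismatch distance for categorical covariates under the first strategy, and fixed placeholder weights. The function names, the toy data, and the weights are hypothetical; the article's exact distance definitions and estimation methods are not given in this record and may differ.

```python
import math


def exponential_similarity(d):
    """Exponential similarity: s = exp(-d)."""
    return math.exp(-d)


def fractional_inverse_similarity(d):
    """Fractional inverse similarity: s = 1 / (1 + d)."""
    return 1.0 / (1.0 + d)


def weighted_distance(x, z, weights, is_categorical):
    """Weighted distance with one weight per covariate.

    Assumption: a categorical covariate contributes 0 on a match and 1 on a
    mismatch; a numeric covariate contributes its plain difference.
    """
    total = 0.0
    for xj, zj, wj, cat in zip(x, z, weights, is_categorical):
        diff = (0.0 if xj == zj else 1.0) if cat else (xj - zj)
        total += wj * diff * diff
    return math.sqrt(total)


def one_hot(value, levels):
    """Binary indicator vector with one entry per level (second strategy)."""
    return [1.0 if value == level else 0.0 for level in levels]


def predict(x_new, X, y, distance, similarity):
    """Similarity-weighted average of the observed responses."""
    sims = [similarity(distance(x_new, xi)) for xi in X]
    return sum(s * yi for s, yi in zip(sims, y)) / sum(sims)


# Hypothetical toy data: one numeric and one categorical covariate.
X = [(1.0, "a"), (2.0, "b"), (3.0, "a")]
y = [10.0, 20.0, 30.0]
x_new = (1.5, "a")

# Strategy 1: keep the categorical covariate as is (two weights).
def d1(u, v):
    return weighted_distance(u, v, [1.0, 1.0], [False, True])

pred1 = predict(x_new, X, y, d1, exponential_similarity)

# Strategy 2: one binary variable per level, plain Euclidean form (three weights).
levels = ["a", "b"]

def encode(row):
    return [row[0]] + one_hot(row[1], levels)

def d2(u, v):
    return weighted_distance(u, v, [1.0, 1.0, 1.0], [False, False, False])

X_enc = [encode(row) for row in X]
pred2 = predict(encode(x_new), X_enc, y, d2, fractional_inverse_similarity)

print(pred1, pred2)
```

In this sketch the first strategy needs one weight per original covariate, while the second needs one weight per binary variable, which is the parsimony difference the abstract refers to. In practice the weights would be estimated from the data rather than fixed.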
