Development of human body pose detection algorithms for in-car scenario, and validation with suitable ground-truth system

Use this identifier to cite this record: https://hdl.handle.net/1822/75969

Title:	Development of human body pose detection algorithms for in-car scenario, and validation with suitable ground-truth system
Author(s):	Silva, João Pedro Borges Araújo Oliveira
Advisor(s):	Fonseca, Jaime C.
Date:	29-May-2020
Abstract(s):	Automated driving cars are emerging, increasing the need for advanced occupant monitoring applications. A transversal need for such systems is the detection of the occupants’ posture. Discriminative approaches have received increased focus in the past decade, due to their automated detection and the growth in Machine Learning (ML) applications and frameworks. One of their downsides is the need for a large training dataset to achieve high accuracy. To allow robust algorithmic training and validation, an algorithmic development pipeline able to generate both real and synthetic datasets in the in-car scenario needs to be established, together with adequate evaluation procedures. This thesis addresses such development. The approach focuses first on two toolchains for in-car human body pose dataset generation: (1) real, and (2) synthetic. The first toolchain uses two types of sensors for the data generation: (1) image data is captured through a Time-of-Flight (ToF) sensor, and (2) human body pose data (ground-truth) is captured through an inertial suit and optical system. Besides quantifying the inertial suit's inherent sensitivity and accuracy, the feasibility of the overall system for human body pose capture in the in-car scenario was demonstrated. Finally, the feasibility of using system-generated data (which was made publicly available) to train ML algorithms is demonstrated. The second toolchain uses the same features and labels as the first, but in this case both sensors are synthetically rendered. The toolchain creates a customized synthetic environment, comprising human models, car, and camera. Poses are automatically generated for each human, taking into account a per-joint axis Gaussian or incremental distribution, constrained by anthropometric and Range of Motion measurements. Scene validation is done through collision detection. Rendering is focused on vision data, supporting ToF and RGB cameras, generating synthetic images from these sensors. 
The feasibility of using synthetic data (which was made publicly available), combined with real data, to train distinct machine learning algorithms is demonstrated. Finally, several algorithms were evaluated, and a Deep Learning (DL) based algorithm, namely Part Affinity Fields, was selected, customized and trained with datasets generated with the previously mentioned toolchains, ultimately aiming to improve accuracy for the in-car scenario.
Fully autonomous vehicles are emerging, increasing the need for advanced occupant detection systems. A transversal need of these systems is the detection of the occupants' posture. Discriminative approaches have attracted growing interest in the last decade, largely due to their automated detection and to the growth in Machine Learning (ML) applications. However, they require a large training dataset in order to increase accuracy. To allow robust training and validation, an algorithmic development pipeline able to generate real and synthetic datasets for the automotive scenario needs to be established, together with adequate evaluation procedures. This thesis addresses that development. It focuses initially on the development of two toolchains for generating in-vehicle human pose datasets: (1) real, and (2) synthetic. The first toolchain uses two types of sensors for data generation: (1) images are captured through a Time-of-Flight (ToF) sensor, and (2) the human pose (ground-truth) is captured through an inertial suit and an optical system. Besides quantifying the inherent sensitivity and accuracy of the inertial system, the feasibility of the complete system for human pose capture inside the vehicle was demonstrated. Finally, the feasibility of using real data (made publicly available) to train ML algorithms is demonstrated. The second toolchain uses the same features and labels as the first; in this case both sensors are synthetic.
It creates a customizable scenario, comprising human models, car, and camera. Poses are automatically generated for each human, taking into account a Gaussian or incremental distribution, constrained by anthropometric measurements. Different types of collision detection are evaluated in order to validate the data, namely body-body, human-human, and human-car. Rendering is focused on ToF and RGB cameras, generating synthetic images from these sensors. The feasibility of using synthetic data (made publicly available) to train ML algorithms is demonstrated. Finally, several algorithms were evaluated, and one algorithm, namely Part Affinity Fields, was selected, modified and trained with the datasets generated through the previously mentioned toolchains, in order to increase its accuracy for the in-vehicle scenario.
Type:	Doctoral thesis
Description:	Doctoral Thesis in Electronics and Computers Engineering
URI:	https://hdl.handle.net/1822/75969
Access:	Open access
Appears in collections:	BUM - Teses de Doutoramento; CAlg - Teses de doutoramento/PhD theses