Prediction of human actions in assembly process by a spatial-temporal end-to-end learning model

Zhujun Zhang, Weitian Wang, Yi Chen, Yunyi Jia, Gaoliang Peng

Research output: Contribution to journal › Conference article › peer-review


Predicting human actions in the industrial assembly process is important: foreseeing future actions before they happen is essential for flexible human-robot collaboration and crucial for safety. Vision-based human action prediction from videos provides intuitive and adequate knowledge for many complex applications. The problem can be framed as inferring a person's next action from a short video clip. Historical information must be taken into account to learn the relations among time steps and predict future ones; however, extracting this history and using it to infer the future is difficult with traditional methods. In this scenario, a model is needed that handles the spatial and temporal details stored in past human motions and constructs the future action from the limited accessible human demonstrations. In this paper, we apply an autoencoder-based deep learning framework for human action construction and merge it into an RNN pipeline for human action prediction. This contrasts with traditional approaches, which rely on hand-crafted features and different domain outputs. We implement the proposed framework on a model vehicle seat assembly task. Our experimental results indicate that the proposed model effectively captures the historical details necessary for future human action prediction. In addition, the proposed model successfully synthesizes the prior information from human demonstrations and generates the corresponding future action from those spatial-temporal features.
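The abstract describes an autoencoder for spatial feature extraction merged into an RNN pipeline that rolls the latent features forward in time. The paper itself is not reproduced here, so the sketch below is only an illustrative, untrained NumPy toy of that general spatial-temporal pattern: all dimensions, weight names, and the single-layer encoder/decoder/RNN-cell choices are assumptions, not the authors' architecture.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative sizes (assumptions, not from the paper)
FRAME_DIM = 64    # flattened video-frame feature size
LATENT_DIM = 16   # autoencoder bottleneck
HIDDEN_DIM = 32   # RNN hidden state

# Spatial part: linear autoencoder weights (random, untrained)
W_enc = rng.normal(scale=0.1, size=(LATENT_DIM, FRAME_DIM))
W_dec = rng.normal(scale=0.1, size=(FRAME_DIM, LATENT_DIM))

# Temporal part: vanilla RNN cell plus a readout to latent space
W_xh = rng.normal(scale=0.1, size=(HIDDEN_DIM, LATENT_DIM))
W_hh = rng.normal(scale=0.1, size=(HIDDEN_DIM, HIDDEN_DIM))
W_hz = rng.normal(scale=0.1, size=(LATENT_DIM, HIDDEN_DIM))

def encode(frame):
    """Compress one flattened frame into the latent space."""
    return np.tanh(W_enc @ frame)

def decode(z):
    """Reconstruct a frame-sized vector from a latent code."""
    return W_dec @ z

def predict_next_frame(frames):
    """Encode each past frame, accumulate history in the RNN
    hidden state, then decode a predicted next frame."""
    h = np.zeros(HIDDEN_DIM)
    for frame in frames:
        z = encode(frame)
        h = np.tanh(W_xh @ z + W_hh @ h)  # fold history into h
    z_next = np.tanh(W_hz @ h)            # predict next latent code
    return decode(z_next)

# A short "video clip" of 8 random frames stands in for real input
clip = [rng.normal(size=FRAME_DIM) for _ in range(8)]
pred = predict_next_frame(clip)
print(pred.shape)  # (64,)
```

In a trained end-to-end version, the encoder/decoder and RNN weights would be learned jointly from demonstration videos, so the reconstruction and prediction losses shape a shared latent space.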

Original language: English
Journal: SAE Technical Papers
Issue number: April
State: Published - 2 Apr 2019
Event: SAE World Congress Experience, WCX 2019 - Detroit, United States
Duration: 9 Apr 2019 - 11 Apr 2019

