Abstract: Human pose estimation is an important supporting technology for Industrial Manufacturing 5.0 and has already been applied in scenarios such as action recognition, human-computer interaction, and digital twins. However, in complex industrial scenes, objects such as notice boards, pipes, and columns can easily occlude workers partially or entirely, causing human pose estimation models to mislocalize joints and degrading their performance. To address this problem, this article proposes a method for enhancing human pose estimation performance in complex industrial scenes. First, the method structurally models the key points of the human body with a VQ-VAE model, mapping joint features to a quantized latent space to improve pose estimation accuracy under occlusion. Second, to address the shortage of worker occlusion data, it introduces a dynamic data augmentation and training method: during training, the model's pose estimation results are evaluated, and industrial scene-specific worker occlusion images are generated dynamically from real industrial occluding objects for use in the next round of training, further improving the model's robustness. Experimental results show that, on the self-constructed dataset, the proposed method improves average precision (AP) by 3.8% and average recall (AR) by 2.7% over the advanced method PCT and copes effectively with human occlusion in complex industrial scenes.
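The mapping of joint features to a quantized latent space can be illustrated with the generic nearest-codebook lookup used in VQ-VAE models. The following is a minimal sketch under stated assumptions, not the architecture of the proposed method: the function name `quantize`, the toy two-dimensional features, and the three-entry codebook are all hypothetical and chosen only for illustration.

```python
import numpy as np

def quantize(features, codebook):
    """Map each feature vector to its nearest codebook entry (squared L2 distance).

    features: array of shape (N, D), e.g. one feature vector per joint (assumed layout)
    codebook: array of shape (K, D), the discrete latent vocabulary
    """
    # Pairwise differences via broadcasting: (N, 1, D) - (1, K, D) -> (N, K, D)
    diffs = features[:, None, :] - codebook[None, :, :]
    dists = np.sum(diffs ** 2, axis=-1)   # squared distances, shape (N, K)
    indices = np.argmin(dists, axis=1)    # index of the closest entry per feature
    return indices, codebook[indices]     # discrete codes and quantized vectors

# Hypothetical toy data: 3 codebook entries and 3 feature vectors in 2-D.
codebook = np.array([[0.0, 0.0], [1.0, 1.0], [4.0, 0.0]])
features = np.array([[0.1, -0.2], [0.9, 1.2], [3.5, 0.3]])
indices, quantized = quantize(features, codebook)
print(indices.tolist())
```

Because every feature vector is snapped to one of a finite set of learned entries, a noisy or partially occluded input is still decoded from codes drawn from that learned set, which is the intuition behind using a quantized latent space here.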