3D Hand Pose Estimation from RGB Images (Graduation Thesis)

2021-10-26 21:58:58

Abstract

In recent years, functions closely tied to hand gesture analysis, including hand pose estimation and gesture recognition, have become increasingly common in human-machine interaction, with applications such as sign language recognition, computer games, and augmented reality (AR). As research has advanced, deep neural networks have made great progress, and gesture-related techniques have matured accordingly. However, no gesture-related system is yet practical enough for real-world deployment. Early gesture recognition and hand pose estimation methods first had to crop the hand region using depth information before they could classify gestures or estimate pose. Current hand pose estimation techniques are still mainly based on depth images, for example the time-of-flight (ToF) technology now common on mobile phones, which measures the distance between the device and an object. Because such techniques require a depth camera, their cost is relatively high, which hinders widespread adoption.

With the continued progress of convolutional neural networks, hand pose estimation based on RGB images has emerged in recent years. This approach needs only a single color image to estimate the hand pose. This thesis presents a deep network that, together with keypoints detected in the image, yields good estimates of the 3D hand pose. It also introduces a large-scale 3D pose dataset based on synthetic hand models, which is used to train the networks involved. The network consists of three sub-networks: the first localizes and segments the hand, the second localizes the hand joints (keypoints), and the third uses a learned prior to lift the 2D keypoints to 3D coordinates, from which the hand pose is estimated. Experiments on a variety of test sets, including sign language recognition, demonstrate the feasibility of 3D hand pose estimation from a single color image.

Keywords: hand pose estimation; gesture recognition; depth image; RGB image

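Chapter 1 Introduction

1.1 Research Purpose and Significance

1.2 Research Status at Home and Abroad

1.3 Main Research Content

Chapter 2 Traditional Hand Pose Estimation Methods

2.1 Overview of Hand Pose Estimation

2.2 Hand Pose Estimation Methods

2.2.1 Generative Methods

2.2.2 Discriminative Methods

2.2.3 Hybrid Methods

2.3 Hand Pose Estimation Based on Depth Images

2.3.1 Hand Localization

2.3.2 Hand Segmentation

2.3.3 2D Keypoint Detection

2.4 Chapter Summary

Chapter 3 3D Hand Pose Estimation from a Single RGB Image

3.1 Hand Pose Representation

3.2 3D Hand Pose Estimation

3.2.1 Hand Segmentation with HandSegNet

3.2.2 Keypoint Score Maps with PoseNet

3.2.3 3D Hand Pose with the PosePrior Network

3.2.4 Network Training

3.3 Chapter Summary

Chapter 4 Simulation and Analysis

4.1 Hand Pose Estimation Datasets

4.2 Simulation and Analysis

4.2.1 Simulation Environment Configuration

4.2.2 2D Keypoint Detection

4.2.3 Lifting the Estimate to 3D

4.3 Chapter Summary

Chapter 5 Conclusion and Outlook

References

Acknowledgments

Appendix

training_handsegnet.py

training_posenet.py

training_lifting.py

ColorHandPose3DNetwork.py

PosePriorNetwork.py

run.py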
