2020-04-29 18:50:39
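As early as 2001, Viola and Jones [4] proposed a face detection framework that uses Haar-like features and AdaBoost to train a cascade of classifiers. It is generally regarded as the first real-time face detection algorithm and achieves good performance at real-time speed. However, a number of studies [5][6][7] indicate that such detectors may degrade significantly in real-world applications, where faces show larger visual variations, even when more advanced features and classifiers are used. Besides the cascade structure, Mathias, Benenson, and others [8][9][10] introduced deformable part models (DPM) for face detection and achieved remarkable performance. These models, however, are computationally expensive and usually require costly annotation in the training stage.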
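Face alignment has also attracted extensive interest. Regression-based methods [13][14][15] and template-fitting approaches [16][17][10] are the two popular categories. Recently, Z. Zhang et al. [18] proposed using facial attribute recognition as an auxiliary task to improve face alignment performance with a deep convolutional neural network.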
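The core technique in this design is the multi-task cascaded convolutional network [21], a framework that uses unified cascaded convolutional neural networks with multi-task learning to integrate the two tasks of face detection and face alignment. The purpose of this design is to build a real-time, lightweight convolutional neural network architecture that reduces manual work, improves performance, and further simplifies the face recognition pipeline. Such a network can learn the regularities of face images in detail, trains quickly, takes less time to run, and offers high recognition efficiency, so it can be applied widely in the field of face recognition.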
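To make the idea of multi-task learning concrete, the following is a minimal sketch of how one network's training loss might combine face classification, bounding-box regression, and landmark regression. The function names, the sample layout, and the task weights are illustrative assumptions and are not taken from this proposal; a loss term is applied only when the sample carries a label for that task.

```python
import math

def cross_entropy(p_face, is_face):
    """Binary cross-entropy for the face / non-face classification task."""
    eps = 1e-12  # guard against log(0)
    p = min(max(p_face, eps), 1.0 - eps)
    return -(is_face * math.log(p) + (1 - is_face) * math.log(1.0 - p))

def squared_l2(pred, target):
    """Squared Euclidean distance, used here for both regression tasks."""
    return sum((a - b) ** 2 for a, b in zip(pred, target))

def multitask_loss(sample, weights):
    """Weighted sum of the task losses that are labelled for this sample.

    sample:  dict with optional keys 'cls', 'box', 'landmark';
             each value is a (prediction, target) pair (hypothetical layout).
    weights: dict with one weight per task (assumed values, see below).
    """
    total = 0.0
    if "cls" in sample:
        p_face, is_face = sample["cls"]
        total += weights["cls"] * cross_entropy(p_face, is_face)
    if "box" in sample:
        total += weights["box"] * squared_l2(*sample["box"])
    if "landmark" in sample:
        total += weights["landmark"] * squared_l2(*sample["landmark"])
    return total

# Assumed weights: classification dominates, regression tasks count half.
WEIGHTS = {"cls": 1.0, "box": 0.5, "landmark": 0.5}

# A face sample with a class label and a box label but no landmark label.
example = {
    "cls": (0.5, 1),
    "box": ((0.0, 0.0, 0.0, 0.0), (1.0, 0.0, 0.0, 0.0)),
}
loss = multitask_loss(example, WEIGHTS)
```

Skipping unlabelled tasks lets a single network be trained on mixed data, for example non-face crops that only have a class label alongside landmark-annotated faces.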
3. References
[1] A. Krizhevsky, I. Sutskever, and G. E. Hinton, “Imagenet classification with deep convolutional neural networks,” in Advances in neural information processing systems, 2012, pp. 1097-1105.
[2] I. Goodfellow and Y. Bengio, Deep Learning (Chinese edition). Posts & Telecom Press, Aug. 2017.
[3] Y. Sun, Y. Chen, X. Wang, and X. Tang, “Deep learning face representation by joint identification-verification,” in Advances in Neural Information Processing Systems, 2014, pp. 1988-1996.
[4] P. Viola and M. J. Jones, “Robust real-time face detection,” International Journal of Computer Vision, vol. 57, no. 2, pp. 137-154, 2004.
[5] B. Yang, J. Yan, Z. Lei, and S. Z. Li, “Aggregate channel features for multi-view face detection,” in IEEE International Joint Conference on Biometrics, 2014, pp. 1-8.
[6] M. T. Pham, Y. Gao, V. D. D. Hoang, and T. J. Cham, “Fast polygonal integration and its application in extending haar-like features to improve object detection,” in IEEE Conference on Computer Vision and Pattern Recognition, 2010, pp. 942-949.
[7] Q. Zhu, M. C. Yeh, K. T. Cheng, and S. Avidan, “Fast human detection using a cascade of histograms of oriented gradients,” in IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 2006, pp. 1491-1498.
[8] M. Mathias, R. Benenson, M. Pedersoli, and L. Van Gool, “Face detection without bells and whistles,” in European Conference on Computer Vision, 2014, pp. 720-735.
[9] J. Yan, Z. Lei, L. Wen, and S. Li, “The fastest deformable part model for object detection,” in IEEE Conference on Computer Vision and Pattern Recognition, 2014, pp. 2497-2504.
[10] X. Zhu, and D. Ramanan, “Face detection, pose estimation, and landmark localization in the wild,” in IEEE Conference on Computer Vision and Pattern Recognition, 2012, pp. 2879-2886.
[11] S. Yang, P. Luo, C. C. Loy, and X. Tang, “From facial parts responses to face detection: A deep learning approach,” in IEEE International Conference on Computer Vision, 2015, pp. 3676-3684.
[12] H. Li, Z. Lin, X. Shen, J. Brandt, and G. Hua, “A convolutional neural network cascade for face detection,” in IEEE Conference on Computer Vision and Pattern Recognition, 2015, pp. 5325-5334.
[13] X. P. Burgos-Artizzu, P. Perona, and P. Dollar, “Robust face landmark estimation under occlusion,” in IEEE International Conference on Computer Vision, 2013, pp. 1513-1520.
[14] X. Cao, Y. Wei, F. Wen, and J. Sun, “Face alignment by explicit shape regression,” International Journal of Computer Vision, vol. 107, no. 2, pp. 177-190, 2012.
[15] J. Zhang, S. Shan, M. Kan, and X. Chen, “Coarse-to-fine auto-encoder networks (CFAN) for real-time face alignment,” in European Conference on Computer Vision, 2014, pp. 1-16.
[16] T. F. Cootes, G. J. Edwards, and C. J. Taylor, “Active appearance models,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 23, no. 6, pp. 681-685, 2001.
[17] X. Yu, J. Huang, S. Zhang, W. Yan, and D. Metaxas, “Pose-free facial landmark fitting via optimized part mixtures and cascaded deformable shape model,” in IEEE International Conference on Computer Vision, 2013, pp. 1944-1951.
[18] Z. Zhang, P. Luo, C. C. Loy, and X. Tang, “Facial landmark detection by deep multi-task learning,” in European Conference on Computer Vision, 2014, pp. 94-108.
[19] D. Chen, S. Ren, Y. Wei, X. Cao, and J. Sun, “Joint cascade face detection and alignment,” in European Conference on Computer Vision, 2014, pp. 109-122.
[20] C. Zhang and Z. Zhang, “Improving multiview face detection with multi-task deep convolutional neural networks,” in IEEE Winter Conference on Applications of Computer Vision, 2014, pp. 1036-1041.
[21] K. Zhang, Z. Zhang, Z. Li, and Y. Qiao, “Joint face detection and alignment using multi-task cascaded convolutional networks,” IEEE Signal Processing Letters, vol. 23, no. 10, Oct. 2016.
2. Research Content and Plan
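This design uses Multi-task Cascaded Convolutional Networks (MTCNN) to detect faces in images and video. The work covers: the functions and structures of the three networks that make up the MTCNN model (P-Net, R-Net, and O-Net); building and training the MTCNN model; and designing face detection and face alignment algorithms on top of the exported model.
On this basis, the system will be designed and implemented by drawing on C and other programming knowledge acquired during the course of study.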
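Two supporting steps of a cascaded detector of this kind can be sketched independently of the networks themselves: choosing the image pyramid scales at which the first-stage network scans the image, and merging overlapping candidate windows with non-maximum suppression (NMS). The sketch below is a hedged illustration only. The 12-pixel network input, the minimum face size of 20 pixels, the scale factor 0.709, and the IoU threshold 0.5 are assumed defaults, and the function names are hypothetical; none of them come from this proposal.

```python
def pyramid_scales(width, height, min_face=20, net_input=12, factor=0.709):
    """Scales at which to resize the image so that a face of min_face
    pixels maps onto the net_input-pixel window of the first network."""
    scale = net_input / min_face
    min_side = min(width, height) * scale
    scales = []
    # Keep shrinking until the image is smaller than one network window.
    while min_side >= net_input:
        scales.append(scale)
        scale *= factor
        min_side *= factor
    return scales

def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2, ...)."""
    x1, y1 = max(a[0], b[0]), max(a[1], b[1])
    x2, y2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

def nms(boxes, threshold=0.5):
    """Greedy NMS over (x1, y1, x2, y2, score) tuples, best score first."""
    remaining = sorted(boxes, key=lambda b: b[4], reverse=True)
    kept = []
    while remaining:
        best = remaining.pop(0)
        kept.append(best)
        # Drop every remaining box that overlaps the kept box too much.
        remaining = [b for b in remaining if iou(best, b) <= threshold]
    return kept

scales = pyramid_scales(100, 100)
candidates = [
    (0, 0, 10, 10, 0.9),    # strongest candidate
    (1, 1, 11, 11, 0.8),    # heavily overlaps the first one
    (20, 20, 30, 30, 0.7),  # separate face
]
faces = nms(candidates)
```

In a full pipeline, the boxes kept after the first stage would be cropped and passed to R-Net and then O-Net, with NMS applied again after each stage.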