
J Shanghai Jiaotong Univ Sci, 2025, Vol. 30, Issue (6): 1103-1113. doi: 10.1007/s12204-023-2658-z
• Automation & Computer Technologies •
TAHIR Rizwan, CAI Yunze (蔡云泽)
Received: 2022-10-28 | Accepted: 2023-02-10 | Online: 2025-11-21 | Published: 2025-11-26
TAHIR Rizwana, CAI Yunze. Multi-Human Pose Estimation by Deep Learning-Based Sequential Approach for Human Keypoint Position and Human Body Detection[J]. J Shanghai Jiaotong Univ Sci, 2025, 30(6): 1103-1113.
| [1] | YE Jihua, JIANG Lu, XIAO Shunjie, ZONG Yi, JIANG Aiwen. Multi-Label Image Classification Model Based on Multiscale Fusion and Adaptive Label Correlation[J]. J Shanghai Jiaotong Univ Sci, 2025, 30(5): 889-898. |
| [2] | LIN Xiao, LU Meichen, GAO Mufeng, LI Yan. Lightweight Human Pose Estimation Based on Multi-Attention Mechanism[J]. J Shanghai Jiaotong Univ Sci, 2025, 30(5): 899-910. |
| [3] | DING Leqi, WANG Biyun, YAO Lixiu, CAI Yunze. MAGPNet: Multi-Domain Attention-Guided Pyramid Network for Infrared Small Object Detection[J]. J Shanghai Jiaotong Univ Sci, 2025, 30(5): 935-951. |
| [4] | JIANG Wenbo, ZHENG Hangbin, BAO Jinsong. Novel Multi-Step Deep Learning Approach for Detection of Complex Defects in Solar Cells[J]. J Shanghai Jiaotong Univ Sci, 2025, 30(5): 1050-1064. |
| [5] | LIU Mengge, LIU Hao, HE Xin, JIN Shaohui, CHEN Pengyun, XU Mingliang. Research Advances on Non-Line-of-Sight Imaging Technology[J]. J Shanghai Jiaotong Univ Sci, 2025, 30(5): 833-854. |
| [6] | Fu Zeyu, Fu Zhuang, Guan Yisheng. Vascular Interventional Surgery Path Planning and 3D Visual Navigation[J]. J Shanghai Jiaotong Univ Sci, 2025, 30(3): 472-481. |
| [7] | Wang Baomin, Ding Hewei, Teng Fei, Liu Hongqin. Damage Detection of X-ray Image of Conveyor Belts with Steel Rope Cores Based on Improved FCOS Algorithm[J]. J Shanghai Jiaotong Univ Sci, 2025, 30(2): 309-318. |
| [8] | Wang Gang, Guan Yaonan, Li Dewei. Two-Stream Auto-Encoder Network for Unsupervised Skeleton-Based Action Recognition[J]. J Shanghai Jiaotong Univ Sci, 2025, 30(2): 330-336. |
| [9] | Diao Zijian, Cao Shuai, Li Wenwei, Liang Jianan, Wen Guilin, Huang Weixi, Zhang Shouming. Person Re-Identification Based on Spatial Feature Learning and Multi-Granularity Feature Fusion[J]. J Shanghai Jiaotong Univ Sci, 2025, 30(2): 363-374. |
| [10] | ZHOU Su (周苏), ZHONG Zebin (钟泽滨). Real-Time Ranging of Vehicles and Pedestrians for Mobile Application on Smartphones[J]. J Shanghai Jiaotong Univ Sci, 2024, 29(6): 1081-1090. |
| [11] | YAN Congqiang (鄢丛强), GUO Zhengyun (郭正玉), CAI Yunze (蔡云泽). Data Augmentation of Ship Wakes in SAR Images Based on Improved CycleGAN[J]. J Shanghai Jiaotong Univ Sci, 2024, 29(4): 702-711. |
| [12] | LONARE Savita, BHRAMARAMBA Ravi. Federated Approach for Privacy-Preserving Traffic Prediction Using Graph Convolutional Network[J]. J Shanghai Jiaotong Univ Sci, 2024, 29(3): 509-517. |
| [13] | LV Feng (吕峰), WANG Xinyan (王新彦), LI Lei (李磊), JIANG Quan (江泉), YI Zhengyang (易政洋). Tree Detection Algorithm Based on Embedded YOLO Lightweight Network[J]. J Shanghai Jiaotong Univ Sci, 2024, 29(3): 518-527. |
| [14] | SONG Libo (宋立博), FEI Yanqiong (费燕琼). New Lite YOLOv4-Tiny Algorithm and Application on Crack Intelligent Detection[J]. J Shanghai Jiaotong Univ Sci, 2024, 29(3): 528-536. |
| [15] | SHEN Ao (沈傲), HU Jisu (胡冀苏), JIN Pengfei (金鹏飞), ZHOU Zhiyong (周志勇), QIAN Xusheng (钱旭升), ZHENG Yi (郑毅), BAO Jie (包婕), WANG Ximing (王希明), DAI Yakang (戴亚康). Ensemble Attention Guided Multi-SEANet Trained with Curriculum Learning for Noninvasive Prediction of Gleason Grade Groups from MRI[J]. J Shanghai Jiaotong Univ Sci, 2024, 29(1): 109-119. |