PODALA: Power-Efficient Object Detection Accelerator With Customized Layer Fusion Engine

TING YUE¹, LIANG CHANG¹, HAOBO XU², CHENGZHI WANG³, SHUISHENG LIN¹, and JUN ZHOU¹

¹ School of Information and Communication Engineering, University of Electronic Science and Technology of China, Chengdu 611731, China
² Institute of Computing Technology, Chinese Academy of Sciences, Beijing 100190, China
³ National Innovation Institute of Defense Technology, Academy of Management Studies, Beijing 100071, China

CORRESPONDING AUTHOR: LIANG CHANG; JUN ZHOU (e-mail: liangchang@uestc.edu.cn; zhouj@uestc.edu.cn).

Integrated Circuits and Systems (ISSN 2995-1968), vol. 1, no. 4, pp. 196-205, 2024. DOI: 10.23919/ICS.2024.3506511
Received 19 October 2024; revised 8 November 2024; accepted 15 November 2024; first published 9 January 2025.

This work was supported by the National Natural Science Foundation of China under Grants 62104025, 62104229, and 62104259.

Keywords: Customized layer fusion, lightweight network, object detection, streaming architecture.

Abstract: Object detection algorithms based on convolutional neural networks (CNNs) have improved accuracy largely by scaling up the network. As the parameter count grows, large networks demand substantial memory, which makes hardware deployment challenging. Most neural network accelerators rely on off-chip storage, but frequent external memory access limits processing speed and makes it hard to meet the frame-rate requirements of embedded systems. Embedded object detection accelerators therefore face a trade-off in which speed and accuracy cannot be optimized simultaneously. In this paper, we propose PODALA, an energy-efficient accelerator developed through algorithm-hardware co-design. On the algorithm side, we develop an optimized detection network that combines the inverted-residual structure with depthwise separable convolution, reducing the number of network parameters while preserving high detection accuracy. On the hardware side, we develop a customized layer fusion technique that minimizes memory access. The overall design adopts a streaming architecture that pairs a computing array with a refined ping-pong output buffer to execute different layer fusion computing modes efficiently. Together, the algorithmic and hardware optimizations substantially reduce memory usage. Evaluated on the Xilinx ZCU102 FPGA platform, PODALA achieves 78 frames per second (FPS) and an energy efficiency of 79.73 GOPS/W, outperforming state-of-the-art solutions.
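The parameter saving from depthwise separable convolution can be checked with a few lines of arithmetic. A standard k × k convolution needs k²·C_in·C_out weights. A depthwise k × k convolution followed by a 1 × 1 pointwise convolution needs k²·C_in + C_in·C_out weights, so the ratio is 1/C_out + 1/k². The sketch below is illustrative only. The helper names and the 128-channel, 3 × 3 layer are hypothetical choices, not PODALA's actual layer configuration, and biases are ignored.

```python
def standard_conv_params(c_in: int, c_out: int, k: int) -> int:
    """Weights in a standard k x k convolution (bias ignored)."""
    return k * k * c_in * c_out


def depthwise_separable_params(c_in: int, c_out: int, k: int) -> int:
    """Weights in a k x k depthwise convolution (one filter per input
    channel) followed by a 1 x 1 pointwise convolution that mixes channels
    (bias ignored)."""
    depthwise = k * k * c_in
    pointwise = c_in * c_out
    return depthwise + pointwise


# Hypothetical layer, chosen only for illustration: 128 -> 128 channels, 3 x 3 kernel.
std = standard_conv_params(128, 128, 3)
sep = depthwise_separable_params(128, 128, 3)
print(std, sep, round(sep / std, 4))  # prints: 147456 17536 0.1189
```

For this hypothetical layer, the separable version needs about 12% of the weights. In an inverted-residual block (1 × 1 expansion, k × k depthwise, 1 × 1 projection), the k × k filtering is applied depthwise on the expanded channels, which is where a standard convolution would be most expensive.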

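A first-order model of off-chip (DRAM) feature-map traffic helps show why layer fusion reduces memory access. Without fusion, each intermediate feature map is written to DRAM by one layer and read back by the next. With fusion, intermediates stay in on-chip buffers, and only the first layer's input and the last layer's output cross the off-chip interface.

This is a simplified sketch under stated assumptions, not PODALA's fusion engine. It assumes 8-bit activations and ignores weight traffic. It also does not model the halo overlap or recomputation that tiled fusion introduces. The class, functions, and feature-map shapes are hypothetical.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class FeatureMap:
    h: int
    w: int
    c: int
    bytes_per_elem: int = 1  # assumption: 8-bit activations

    @property
    def size(self) -> int:
        return self.h * self.w * self.c * self.bytes_per_elem


def unfused_traffic(maps: Sequence[FeatureMap]) -> int:
    """maps[0] is the input of the first layer, maps[-1] the output of the
    last layer. Each layer reads its input from DRAM and writes its output
    back to DRAM."""
    reads = sum(m.size for m in maps[:-1])
    writes = sum(m.size for m in maps[1:])
    return reads + writes


def fused_traffic(maps: Sequence[FeatureMap]) -> int:
    """All intermediates stay on chip: read the input once, write the output once."""
    return maps[0].size + maps[-1].size


# Hypothetical three-layer chain, chosen only for illustration.
chain = [FeatureMap(56, 56, 32), FeatureMap(56, 56, 64),
         FeatureMap(56, 56, 64), FeatureMap(28, 28, 128)]
unfused = unfused_traffic(chain)
fused = fused_traffic(chain)
print(unfused, fused, unfused / fused)  # prints: 1003520 200704 5.0
```

For these hypothetical shapes, fusing the three layers reduces feature-map traffic by 5×. The trade-off is on-chip storage for intermediate tiles. A double-buffered (ping-pong) output buffer is a common way to manage that storage: the compute array fills one bank while the next stage reads the other.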