Object detection algorithms based on convolutional neural networks (CNNs) significantly improve accuracy by expanding network scale. As the number of network parameters grows, large-scale networks demand substantial memory resources, which makes hardware deployment challenging. Although most neural network accelerators rely on off-chip storage, frequent access to external memory limits processing speed, making it difficult to meet the frame-rate requirements of embedded systems. The result is a trade-off in which the speed and accuracy of embedded object detection accelerators cannot be optimized simultaneously. In this paper, we propose PODALA, an energy-efficient accelerator developed through an algorithm-hardware co-design methodology. On the algorithm side, we develop an optimized detection network that combines the inverted-residual structure with depthwise separable convolution, effectively reducing network parameters while preserving high detection accuracy. On the hardware side, we develop a custom layer fusion technique for PODALA to minimize memory access. The overall design employs a streaming hardware architecture that combines a computing array with a refined ping-pong output buffer to execute the different layer fusion computing modes efficiently. Together, the algorithmic and hardware optimizations substantially reduce memory usage. Evaluated on the Xilinx ZCU102 FPGA platform, PODALA achieves 78 frames per second (FPS) and an energy efficiency of 79.73 GOPS/W, underscoring its advantage over state-of-the-art solutions.
Original article. Revised: 2024-11-08. DOI: 10.23919/ICS.2024.3506511
PODALA: Power-Efficient Object Detection Accelerator With Customized Layer Fusion Engine
TING YUE, LIANG CHANG, HAOBO XU, CHENGZHI WANG, SHUISHENG LIN, JUN ZHOU
Integrated Circuits and Systems, 2024, Vol. 1, Issue 4: 196-205.