A 32×32 16-bit integer matrix multiplication hardware acceleration design and evaluation on 45nm PDK

Authors

  • Phan Hong Minh (Corresponding Author) Academy of Military Science and Technology
  • Nguyen Manh Cuong Academy of Military Science and Technology
  • Nguyen Truong Son Information Technology Office, Military Region 2

DOI:

https://doi.org/10.54939/1859-1043.j.mst.IITE.2025.45-53

Keywords:

MAC; Hardware acceleration; Cadence EDA tools; FreePDK45nm CMOS; VLSI; RTL to GDSII.

Abstract

This paper presents the design and implementation of a 32×32 matrix multiplier using 16-bit integer data, targeted for hardware acceleration applications. The design is described in VHDL and synthesized using Cadence EDA toolchain with a FreePDK45nm CMOS process for ASIC implementation. The proposed architecture employs pipelining and parallelism techniques to optimize speed and power consumption. Post-layout Place and Rout results demonstrate that the design achieves a maximum operating frequency of 200 MHz, occupies an area of 107,240 μm², and consumes 350.24 mW of power under typical conditions. The experimental results validate the feasibility of the design for high-performance embedded systems, edge devices and digital signal processing applications.

References

[1]. Thejaswini, Gautham Suresh, Chiraag, and Sukumar Nandi. “Approximate CNN Hardware Accelerators for Resource Constrained Devices.” IEEE Access, (2025). DOI: 10.1109/ACCE-SS.2025.3529668

[2]. Bhajantri, and Hiremath. “Design of Area and Power Efficient MAC Architecture Using CNN for DSP Applications.” International Journal of Intelligent Systems and Applications in Engineering (IJISAE), 12(14s), 141–147, (2024).

[3]. Zhi-Gang Liu, Paul N. Whatmough, and Matthew Mattina. “Sparse Systolic Tensor Array for Efficient CNN Hardware Acceleration.” arXiv preprint arXiv:2009.02381v2 [cs.AR], (2020).

[4]. Kevin Kiningham, Michael Graczyk, and Athul Ramkumar. “Design and Analysis of a Hardware CNN Accelerator.” Computer Science, Engineering, Stanford University, (2017).

[5]. Cristina Silvano, et al. “A Survey on Deep Learning Hardware Accelerators for Heterogeneous HPC Platforms.” arXiv preprint arXiv:2306.15552v3 [cs.AR], (2025).

[6]. James Garland and David Gregg. “Low Complexity Multiply Accumulate Unit for Weight-Sharing Convolutional Neural Networks.” IEEE Computer Architecture Letters, 16(2), (2017).

[7]. Hongxiang Fan, Martin Feriancy, et al. “Algorithm and Hardware Co-design for Reconfigurable CNN Accelerator.” arXiv preprint arXiv:2111.12787v1 [cs.LG], (2021).

[8]. Ehab M. Ibrahim, Linyan Mei, and Marian Verhelst. “Survey and Benchmarking of Precision-Scalable MAC Arrays for Embedded DNN Processing.” arXiv preprint arXiv:2108.04773v1 [cs.DC], (2021).

[9]. Cadence Design Systems, Inc. “Innovus User Guide.” United States of America, (2022).

[10]. Cadence Design Systems, Inc. “Genus User Guide for Legacy UI.” United States of America, (2018).

[11]. FreePDK45 Process Design Kit. [Online]. Available: https://eda.ncsu.edu/freepdk/freepdk45/

Downloads

Published

30-10-2025

How to Cite

[1]
Phan Hong Minh, Nguyen Manh Cuong, and Nguyen Truong Son, “A 32×32 16-bit integer matrix multiplication hardware acceleration design and evaluation on 45nm PDK”, JMST, no. IITE, pp. 45–53, Oct. 2025.

Issue

Section

Electronic Engineering