Design and Implementation of the PLRAM Compute-in-Memory Chip for High-Precision Inference

Enhancing Inference Accuracy through Linear-Region Optimization in the PLRAM Compute-in-Memory Chip

  • Abstract: Novel compute-in-memory architectures, which execute computation directly within the memory array, promise to break through the energy-efficiency bottleneck of the von Neumann architecture. This work focuses on improving the computational precision of programmable linear random-access memory (PLRAM) in neural network inference. By optimizing the device structure, the linear range of the transistor's Id–Vd output characteristics is substantially widened. Experimental data show that, at comparable energy consumption, the improved flash chip achieves a more concentrated inference-error distribution than a conventional flash chip, and every layer of the chip delivers higher inference precision than the previous generation; the final inference accuracy reaches 94.6%, an improvement of about 4.5% over the conventional flash chip. The results indicate that extending the transistor's linear operating region effectively suppresses the accumulation of nonlinear errors during forward inference, markedly improving the reliability of analog computation and the chip's inference accuracy, and pointing to a new device-level optimization direction for the design of future high-precision, low-power neural network accelerators.


    Abstract: Novel compute-in-memory architectures, which execute computations directly within memory arrays, present a promising route to overcoming the von Neumann bottleneck. This study focuses on enhancing the computational precision of PLRAM for neural network inference. By refining the device structure, the linear region of the transistor Id–Vd output characteristics has been substantially extended. Experimental evaluations on neural network tasks indicate that, under comparable energy consumption, the improved flash chip achieves a more concentrated inference error distribution and consistently superior inference accuracy across all network layers. The overall inference accuracy attains 94.6%, marking an enhancement of approximately 4.5% over conventional flash chips. These findings demonstrate that extending the linear operating region effectively mitigates nonlinear error accumulation during forward inference, substantially enhances the reliability of analog computation, and improves inference accuracy. This research offers a novel device-level optimization approach for the development of future high-accuracy and low-power neural network accelerators.
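The mechanism summarized above, nonlinear error accumulating layer by layer when analog multipliers drift out of their linear region, can be illustrated with a toy numerical sketch. The soft-saturation model, the layer sizes, and the ±v_lin linear range below are illustrative assumptions for demonstration only, not the paper's device physics or network:

```python
import numpy as np

rng = np.random.default_rng(0)

def analog_mac(x, w_row, v_lin):
    """Dot product in which each elementwise product comes from a
    hypothetical analog multiplier: linear for |product| <= v_lin,
    softly saturating (tanh roll-off) beyond that range."""
    drive = x * w_row
    if v_lin is not None:
        mag = np.abs(drive)
        drive = np.where(mag <= v_lin,
                         drive,
                         np.sign(drive) * (v_lin + np.tanh(mag - v_lin)))
    return drive.sum()

def forward(x, weights, v_lin):
    """Forward pass through a small ReLU network built on analog MACs.
    v_lin=None gives the ideal, fully linear reference computation."""
    for w in weights:
        x = np.array([analog_mac(x, row, v_lin) for row in w])
        x = np.maximum(x, 0.0) / np.sqrt(x.size)  # ReLU + rescale
    return x

# Four random 16x16 layers (illustrative, not the paper's network).
weights = [rng.normal(0.0, 0.5, (16, 16)) for _ in range(4)]
x0 = rng.normal(0.0, 1.0, 16)

ref = forward(x0, weights, None)  # ideal linear reference
for v_lin in (0.5, 2.0):          # narrow vs. widened linear region
    err = np.linalg.norm(forward(x0, weights, v_lin) - ref)
    print(f"linear range +/-{v_lin}: accumulated output error {err:.3f}")
```

A wider linear range leaves fewer products in the saturated regime, so the deviation from the ideal linear forward pass shrinks with each layer instead of compounding, mirroring at a toy scale the error-suppression effect the abstract reports at the device level.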

