Abstract:
Novel compute-in-memory architectures, which execute computations directly within memory arrays, offer a promising route around the von Neumann bottleneck. This study focuses on enhancing the computational precision of PLRAM for neural network inference. By refining the device structure, the linear region of the transistor's Id–Vd output characteristics has been substantially extended. Experimental evaluations on neural network tasks indicate that, at comparable energy consumption, the improved flash chip achieves a more concentrated inference error distribution and consistently higher inference accuracy across all network layers. The overall inference accuracy reaches 94.6%, an improvement of approximately 4.5% over conventional flash chips. These findings demonstrate that extending the linear operating region mitigates the accumulation of nonlinearity-induced errors during forward inference and thereby substantially improves the reliability and accuracy of analog computation. This work offers a novel device-level optimization approach for developing future high-accuracy, low-power neural network accelerators.
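As a rough illustration of the mechanism described above (nonlinear distortion compounding across layers during forward inference), the following Python sketch compares a narrow and an extended linear region using an idealized soft-saturating transfer curve. The curve shape, layer count, matrix sizes, and limit values are illustrative assumptions for the sketch, not the paper's device model or measured data.

```python
import numpy as np

rng = np.random.default_rng(0)

def device_readout(x, linear_limit):
    # Idealized device transfer curve: exactly linear up to
    # `linear_limit`, then soft saturation (tanh) beyond it.
    # Both the curve and the limit values are assumptions.
    return np.where(
        np.abs(x) <= linear_limit,
        x,
        np.sign(x) * (linear_limit + np.tanh(np.abs(x) - linear_limit)),
    )

def forward(weights, x, linear_limit):
    # Each layer's matrix-vector product passes through the device
    # transfer curve, so any distortion compounds layer by layer.
    for w in weights:
        x = device_readout(w @ x, linear_limit)
    return x

# Weights scaled so pre-activations stay roughly unit-variance.
dim, depth = 64, 4
layers = [rng.normal(0.0, 1.0 / np.sqrt(dim), (dim, dim)) for _ in range(depth)]
x0 = rng.normal(0.0, 1.0, dim)

# Ideal digital reference: same weights, no device nonlinearity.
exact = x0
for w in layers:
    exact = w @ exact

for limit in (1.0, 3.0):  # narrow vs. extended linear region
    err = np.linalg.norm(forward(layers, x0, limit) - exact)
    print(f"linear region ±{limit}: accumulated error = {err:.3f}")
```

Under these assumptions, widening the linear window sharply reduces the deviation from the ideal result, consistent with the abstract's claim that extending the linear operating region suppresses error accumulation in analog forward inference.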