Abstract:
Spiking Neural Networks (SNNs), recognized as the third generation of neural network models, show substantial promise for object detection owing to their biologically inspired spiking mechanisms, event-driven asynchronous computation, and low power consumption. This paper presents a systematic review of SNN methods for visual object detection, covering biological foundations, neuron models, neural encoding techniques, dataset categories, and mainstream algorithmic frameworks. For neuron models, the trade-off between biological plausibility and computational efficiency is analyzed across the IF, LIF, Izhikevich, and Hodgkin-Huxley models. For encoding mechanisms, input encoding schemes such as Poisson encoding and intensity-latency encoding are discussed, together with decoding strategies including rate, temporal, and population decoding. At the dataset level, the characteristics and limitations of four dataset categories (static, neuromorphic-converted, neuromorphic-captured, and simulation-generated) are examined. For algorithmic frameworks, methods based on ANN-to-SNN conversion and on direct training with surrogate gradients are reviewed in detail, and their accuracy, energy efficiency, latency, and hardware compatibility are compared. Finally, the paper outlines future directions for SNN development, including training-hardware co-optimization, novel architecture design, multimodal extension, and toolchain ecosystem development, offering guidance for the research and application of low-power, high-efficiency spiking vision systems.