Research on Bio-inspired Self-balancing Control Based on LIF Network
Keywords:
Spiking Neural Network, Dopamine-modulated Synaptic Plasticity, Autonomous Learning, Reward
Abstract
Human balance is a skill established gradually through a sensory-action-feedback loop, relying on repetitive practice, trial and error, and the dynamic plasticity of synaptic connections. In this process, sensory signals are continuously relayed to the central nervous system, where learning forms stable motor pathways that allow actions to be reused without explicit computation. Inspired by this mechanism, this paper proposes a balance-learning method based on a brain-like spiking neural network of leaky integrate-and-fire (LIF) neurons and dopamine-modulated synaptic plasticity, applied to self-learning control of the classic inverted pendulum system. The method connects a one-hot-encoded population of sensory neurons to motor neurons and uses a reward-driven synaptic weight update to gradually master stable control of the pendulum, with no prior model or training data required. Unlike traditional control algorithms such as PID or LQR, the approach offers biological plausibility, strong adaptability, and self-organizing behavior, providing a new perspective on bio-inspired learning strategies for artificial intelligence in continuous control tasks.
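To make the mechanism concrete, the sketch below is a minimal illustration of the idea, not the paper's implementation: the pole state is one-hot encoded into a sensory population, a two-neuron winner-take-all motor layer selects the applied torque, and a reward-modulated Hebbian rule with a decaying dopamine trace updates the active sensory-to-motor synapses. The simplified pendulum dynamics and all constants (N_BINS, DA_DECAY, the torque magnitude, the reward thresholds) are illustrative assumptions.

```python
# Minimal sketch (illustrative assumptions, not the paper's code):
# one-hot sensory coding + reward-modulated Hebbian plasticity on an
# inverted pendulum with simplified dynamics.
import numpy as np

rng = np.random.default_rng(0)

N_BINS = 12                      # bins per state variable (angle, angular velocity)
N_SENSORY = N_BINS * N_BINS      # one-hot sensory population: one neuron per state cell
N_MOTOR = 2                      # motor neurons: torque left / torque right
LR, DA_DECAY = 0.01, 0.9         # learning rate and dopamine-trace decay (assumed)

W = rng.normal(0.0, 0.1, size=(N_SENSORY, N_MOTOR))  # plastic synapses

def encode(theta, omega):
    """One-hot encode (theta, omega) as a single active sensory neuron."""
    i = np.clip(int((theta + 0.5) / 1.0 * N_BINS), 0, N_BINS - 1)
    j = np.clip(int((omega + 2.0) / 4.0 * N_BINS), 0, N_BINS - 1)
    x = np.zeros(N_SENSORY)
    x[i * N_BINS + j] = 1.0
    return x

def motor_spike(x):
    """Winner-take-all motor layer: noisy input current, strongest neuron fires."""
    current = W.T @ x + rng.normal(0.0, 0.2, size=N_MOTOR)  # noise drives exploration
    return int(np.argmax(current))                          # index of the firing neuron

# Simplified pendulum: theta'' = (g/l) sin(theta) + u / (m l^2), Euler integration.
g, l, m, dt = 9.8, 1.0, 1.0, 0.02
for episode in range(300):
    theta, omega, dopamine = rng.normal(0.0, 0.05), 0.0, 0.0
    for t in range(500):
        x = encode(theta, omega)
        a = motor_spike(x)
        u = (-1.0, 1.0)[a] * 5.0                 # torque chosen by the firing neuron
        omega += (g / l * np.sin(theta) + u / (m * l * l)) * dt
        theta += omega * dt
        reward = 1.0 if abs(theta) < 0.2 else -1.0
        dopamine = DA_DECAY * dopamine + reward  # phasic dopamine trace
        # Reward-modulated Hebbian update on the active sensory->motor synapse:
        # only the one-hot-active presynaptic neuron's weight to the fired
        # motor neuron changes, gated by the dopamine signal.
        W[:, a] += LR * dopamine * x
        if abs(theta) > 0.5:                     # pole fell; end the episode
            break
    if episode % 50 == 0:
        print(f"episode {episode}: balanced for {t} steps")
```

In this sketch exploration comes only from the injected motor noise, and the LIF membrane dynamics of the full model are abstracted to a single threshold comparison per step; a faithful implementation would simulate membrane potentials and spike timing, e.g. in a simulator such as NEST.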
