Reinforcement learning for closed-loop optimisation of spatiotemporal stimulation in patterned neuronal networks

This paper is a preprint and has not been certified by peer review.


Authors

Maurer, B.; Vasiliauskaite, V.; Hengsteler, J.; Cathomen, G.; Ruff, T.; Schmid, C.; Vörös, J.; Ihle, S. J.

Abstract

Understanding how neuronal circuits transform inputs into outputs requires systematic perturbation under controlled conditions. In vitro neuronal networks can be cultured on microelectrode arrays (MEAs) that allow stimulation and recording, and microfluidic patterning can constrain network topology to yield stable stimulation-evoked responses. Yet the space of possible spatiotemporal stimulation patterns remains too large for exhaustive exploration, and the evoked responses depend on prior stimulation history. Here, we embedded topologically constrained biological neuronal networks in a closed-loop reinforcement learning (RL) framework that sends electrical stimuli to the MEA and evaluates the evoked responses to efficiently identify stimulation patterns that evoke specific target activity motifs. We extended inkube, a low-cost, open-source electrophysiology system built from off-the-shelf components, with closed-loop functionality. This enables reliable delivery of stimulation at single-sample precision with millisecond-range round-trip times, and allows independent RL agents to control multiple networks simultaneously. We first demonstrated that stimulation-evoked responses in engineered recurrent networks were stable and separable across the action space over hours of continuous operation. We characterised the dependence of responses on prior stimulation history, finding state dependence in a subset of stimulus pairs. We then benchmarked different RL agents, multi-armed bandits (MABs) and linear contextual bandits (LCBs), on the task of identifying stimulation patterns that maximise the length of clockwise-circular firing sequences. All agents improved reward during training relative to random stimulation. Agents converged on non-trivial stimulation patterns that span the full action space rather than mirroring the target motif.
LCBs exploited the identified state dependence through action switching, yielding measurable reward benefits for specific action pairs, though this did not translate into overall performance gains over MABs. All hardware designs and software are publicly available, providing an accessible platform for goal-directed functional characterisation of engineered neuronal networks at single-spike resolution.
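To make the bandit component of the closed-loop setup concrete, the following is a minimal, illustrative sketch of an epsilon-greedy multi-armed bandit choosing among discrete stimulation patterns. It is not the paper's implementation: the action set, the reward signal (e.g. the length of an evoked clockwise firing sequence), and the `deliver_and_score` function are placeholders for the real stimulation/recording loop.

```python
import random

class EpsilonGreedyMAB:
    """Epsilon-greedy multi-armed bandit over a discrete set of
    stimulation patterns. Maintains a running mean reward per action."""

    def __init__(self, n_actions, epsilon=0.1):
        self.epsilon = epsilon
        self.counts = [0] * n_actions
        self.values = [0.0] * n_actions  # running mean reward per action

    def select(self):
        # Explore with probability epsilon, otherwise pick the
        # action with the highest estimated reward.
        if random.random() < self.epsilon:
            return random.randrange(len(self.values))
        return max(range(len(self.values)), key=lambda a: self.values[a])

    def update(self, action, reward):
        # Incremental update of the running mean for this action.
        self.counts[action] += 1
        self.values[action] += (reward - self.values[action]) / self.counts[action]

# Placeholder for the closed-loop step: stimulate the network with the
# chosen pattern and score the evoked response (hypothetical rewards here).
def deliver_and_score(action, true_means=(0.2, 0.8, 0.5)):
    return true_means[action]

bandit = EpsilonGreedyMAB(n_actions=3, epsilon=0.1)
for _ in range(2000):
    a = bandit.select()
    bandit.update(a, deliver_and_score(a))
```

A linear contextual bandit would extend this by conditioning the value estimate on a context vector, here e.g. the previously delivered stimulus, which is how action switching could exploit state dependence.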
