Teaching intelligent robots the art of delegation

Leveraging parallelization for reinforcement learning systems in distributed environments.

The University of Antwerp has developed a novel procedure for deploying parallel reinforcement learning systems on distributed computing infrastructure with no prior domain knowledge and low communication overhead. The procedure can be used by companies active in the smart cities, smart cars, smart communication, robotics, and smart industry sectors.

Software agents that can learn how to solve a given problem by interacting only with a physical or simulated environment are a fundamental step towards the creation of smart and autonomous systems. This approach is called Reinforcement Learning (RL). However, single-agent RL algorithms are often impractical for problems with many system states and actions. To solve this problem, a new procedure developed by the University of Antwerp leverages parallelization and scalability of the learning task by using multiple agents, with no prior knowledge of the problem structure and with reduced communication overhead when running on distributed infrastructure.


Situation before

The number of iterations that a reinforcement learning algorithm requires to solve a given task can be reduced by partitioning the problem (the state-action space) into smaller sub-problems and then using independent software agents to solve each sub-problem in parallel. If the partition is optimal, i.e., the number of states shared by multiple sub-problems is minimal, then the agents rarely need to interact with each other, and the actual execution time is also reduced thanks to the low communication overhead. However, finding an optimal partitioning is a computationally hard problem that is traditionally simplified by having experts with domain knowledge perform the partitioning manually. As a result, the usability of parallel reinforcement learning algorithms for problems with a large state-action space is currently very limited.
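To make the role of the partition concrete, the toy sketch below (an illustration, not the patented procedure) splits a small grid-world's states into four blocks and counts how many states sit on a partition boundary. Agents assigned to neighbouring partitions would need to exchange information about exactly these shared states, which is why a partition with few of them keeps communication low.

```python
# Illustrative sketch only: partition a 6x6 grid-world's states into
# four 3x3 blocks and count the "shared" states, i.e., states that have
# a neighbour lying in a different partition.

def partition_of(x, y, block=3):
    """Map a grid state (x, y) to a partition id (one 3x3 block)."""
    return (x // block, y // block)

def shared_states(width=6, height=6, block=3):
    """Return the set of states with a 4-connected neighbour in another partition."""
    shared = set()
    for x in range(width):
        for y in range(height):
            for nx, ny in ((x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)):
                if 0 <= nx < width and 0 <= ny < height:
                    if partition_of(nx, ny, block) != partition_of(x, y, block):
                        shared.add((x, y))
    return shared

# Only the states along the block borders are shared:
print(len(shared_states()))  # 20 of the 36 states
```

Even in this crude block partition, the agents only ever need to coordinate on the border states; a better partition (or a smarter state space) shrinks that border further.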


State-action space Dynamic Partitioning for Parallel Reinforcement Learning (DynPaRL) is a novel procedure that enables automatic partitioning of large Q-tables in distributed environments, requires no a priori knowledge of the problem structure, and minimizes the agents' communication overhead over time. The Q-table, i.e., the centralized storage where software agents record their shared experience while learning in parallel, becomes a bottleneck as the number of software agents updating it increases.

DynPaRL is the result of an intelligent combination of a dynamic partitioning strategy with an efficient heuristic that co-allocates processing, i.e., the reinforcement learning (RL) software agent, and storage, i.e., partitions of the Q-table, in a way that minimizes communication among agents. The DynPaRL procedure uses the agents' exploration capabilities to build the necessary domain knowledge automatically, divide the state-action space into multiple small and loosely coupled partitions, and assign each partition to the agent with the best affinity to it, i.e., the one that exploits it the most and thereby reduces the communication overhead.
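A minimal sketch of such an affinity-based assignment could look as follows, under the simplifying assumption that an agent's affinity for a partition is just the number of times it visited that partition during exploration (the actual DynPaRL heuristic is not detailed in this text):

```python
# Hedged sketch, not the patented heuristic: assign each Q-table
# partition to the agent that visited it most often, so that most
# subsequent updates stay local to that agent.

from collections import Counter, defaultdict

def assign_partitions(visit_log):
    """visit_log: iterable of (agent_id, partition_id) visit events.

    Returns {partition_id: agent_id}, mapping each partition to the
    agent with the highest visit count, i.e., its best "affinity".
    """
    counts = defaultdict(Counter)
    for agent, partition in visit_log:
        counts[partition][agent] += 1
    return {p: c.most_common(1)[0][0] for p, c in counts.items()}

log = [("A", 0), ("A", 0), ("B", 0), ("B", 1), ("B", 1), ("A", 2)]
print(assign_partitions(log))  # {0: 'A', 1: 'B', 2: 'A'}
```

Because exploration statistics change over time, a dynamic scheme would rerun this assignment periodically and migrate partitions as agents' affinities shift.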

DynPaRL can be integrated transparently into any parallel implementation of table-based reinforcement learning, since it does not affect the convergence guarantee of the RL algorithm running in each software agent. In fact, DynPaRL ensures that software agents running in distributed environments and solving a single RL task in parallel, represented by a unique Q-table, incur almost no communication once they converge to a solution, making the approach scalable in the number of RL agents solving the problem.
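For reference, the update that each agent in such a table-based implementation runs is the standard tabular Q-learning rule shown below; a procedure like DynPaRL changes where Q-table entries are stored, not the update itself, which is why the underlying convergence guarantee is preserved:

```python
# Standard tabular Q-learning update on a dict-backed Q-table
# (textbook rule, independent of how the table is partitioned).

def q_update(Q, s, a, r, s_next, actions, alpha=0.1, gamma=0.9):
    """Apply one Q-learning step: Q(s,a) += alpha * (r + gamma * max_a' Q(s',a') - Q(s,a))."""
    best_next = max(Q.get((s_next, a2), 0.0) for a2 in actions)
    Q[(s, a)] = Q.get((s, a), 0.0) + alpha * (r + gamma * best_next - Q.get((s, a), 0.0))
    return Q

Q = {}
q_update(Q, s=0, a="right", r=1.0, s_next=1, actions=["left", "right"])
print(Q[(0, "right")])  # 0.1
```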

A patent application was published under number WO2020/074689.

DynPaRL is especially advantageous for any company interested in developing new smart applications where learning by interacting with an environment has to be fast and scalable in distributed environments, e.g., smart cities, smart cars, smart communication, robotics, and smart industry.

About the researchers - research group

DynPaRL is the result of a combined team effort of IDLab (researchers: Miguel Camelo, Maxim Claeys & Prof. Steven Latré) to develop state-of-the-art techniques that enhance the efficiency of RL algorithms. IDLab (http://idlab.technology/) is a core research group of imec. Part of IDLab's research activities are embedded in the University of Antwerp within the departments of Computer Science and Electronics-ICT (Faculty of Applied Engineering). IDLab-Antwerp has an extensive track record in the domains of wireless networking and artificial intelligence.

More information

University of Antwerp

Valorisation Office

Middelheimlaan 1

2020 Antwerp - Belgium



Christiaan Leysen

T +32 3 265 97 75