Abstract: To improve the efficiency and convergence speed of reinforcement learning for optimal behavior decision-making control in multi-robot systems, this paper studies distributed Markov modeling and a control strategy for multi-robot groups. Given the robots' limited perception ability, an individual-cooperative trigger perception function is designed: each robot computes an individual-cooperative trigger response probability from its environment observations, and the joint-strategy calculation starts only after a trigger event, which reduces the communication load and computing resources required among robots. The Q-learning algorithm is improved by introducing a dual-learning-rate strategy and is applied to the behavior decision-making of the robots. Simulation results show that the proposed algorithm achieves high cooperative efficiency when the group contains about 20 robots, with a unit time step ratio of 1.0850. The distance adjustment parameter η also influences the robots' cooperative search efficiency: when η is 0.008, the required moving time step ratio and the average moving distance reach their minimum. By introducing the dual learning rate, the proposed algorithm achieves higher learning efficiency and applicability than a reinforcement learning algorithm based on an environment model, with an average performance improvement of about 35%. The proposed algorithm has theoretical significance and application value for improving the autonomous cooperative ability of multi-robot systems.
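The abstract does not give the exact form of the dual-learning-rate update, so the following is only a minimal sketch of one plausible realisation: a standard tabular Q-learning step that applies a faster learning rate when the temporal-difference error is positive and a slower one when it is negative. The rates `alpha_pos` and `alpha_neg` and the sign-based switching rule are assumptions for illustration, not the paper's method.

```python
def dual_rate_q_update(q, state, action, reward, next_state,
                       alpha_pos=0.5, alpha_neg=0.1, gamma=0.9):
    """One tabular Q-learning step with two learning rates.

    Hypothetical dual-rate rule: use the faster rate alpha_pos when the
    TD error is non-negative, the slower rate alpha_neg otherwise.
    """
    best_next = max(q[next_state])                      # max_a' Q(s', a')
    td_error = reward + gamma * best_next - q[state][action]
    alpha = alpha_pos if td_error >= 0 else alpha_neg   # rate selection
    q[state][action] += alpha * td_error                # standard update
    return q[state][action]

# Tiny deterministic example: from state 0, action 1 yields reward 1
# and moves to state 1, whose action values are still all zero.
q = [[0.0, 0.0], [0.0, 0.0]]
v = dual_rate_q_update(q, state=0, action=1, reward=1.0, next_state=1)
# TD error = 1 + 0.9*0 - 0 = 1 (positive), so alpha_pos applies: v = 0.5
```

Splitting the rate by TD-error sign is one common way to bias value estimates toward faster uptake of good outcomes; the paper's actual rule may differ.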