Keywords

Adversarial system, Task (project management), Reinforcement learning, Imitation, Popularity, Computer science, Artificial intelligence, Baseline (sea), Process (computing), Function (biology), Machine learning, Psychology, Social psychology, Engineering, Oceanography, Operating system, Biology, Geology, Evolutionary biology, Systems engineering
Authors

Guangyu Xiang, Shaodong Li, Feng Shuang, Fang Gao, Xiaogang Yuan
                    
Source

Journal: IEEE Robotics and Automation Letters
Date: 2024-02-14
Volume/issue: 9 (4): 3179-3186
        
    
            
Identifiers

DOI: 10.1109/lra.2024.3366023
         
        
                
Abstract

Adversarial Inverse Reinforcement Learning (AIRL) has gained popularity as an alternative to supervised imitation learning, addressing the distributional bias issue of the latter. However, it still faces significant challenges in long-horizon tasks due to the lack of effective exploration. In this letter, we demonstrate that standard AIRL strategies end exploration prematurely during online reinforcement learning and fail to learn the entire task because they cannot fully conform to the expert distribution, which is particularly detrimental to real-world robots. To address these challenges, we introduce the SC-AIRL approach. It decomposes long-horizon tasks into logical subtasks, which reduces the agent's need for rich exploration. SC-AIRL utilizes expert demonstrations for multiple subtasks and shares a single critic and an identical reward function across the training of the different subtasks. Additionally, we incorporate a human intervention mechanism during subtask learning to keep exploration from ending prematurely. Our experiments on challenging robot manipulation tasks demonstrate that SC-AIRL significantly outperforms our baselines. Furthermore, we conduct an exploratory experiment and an empirical analysis, highlighting the model's potential to manage complex tasks and the advantages of SC-AIRL over the baseline, respectively.
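The abstract describes SC-AIRL only at a high level, so the sketch below is purely illustrative: a minimal PyTorch rendering of the shared-critic idea, in which one discriminator-derived reward and one critic are created once and trained across subtasks in sequence rather than being re-initialized per subtask. The class names, network sizes, loss terms, and the subtask_demos format are assumptions made for this sketch; the paper's actual policy update, subtask decomposition, and human-intervention mechanism are not reproduced here.

    # Illustrative sketch only -- not the authors' implementation.
    import torch
    import torch.nn as nn

    class AIRLReward(nn.Module):
        """Discriminator network whose logit is used as the learned reward r(s, a)."""
        def __init__(self, obs_dim, act_dim, hidden=64):
            super().__init__()
            self.net = nn.Sequential(
                nn.Linear(obs_dim + act_dim, hidden), nn.Tanh(),
                nn.Linear(hidden, 1))

        def forward(self, obs, act):
            return self.net(torch.cat([obs, act], dim=-1)).squeeze(-1)

    class Critic(nn.Module):
        """Single value network reused, unmodified, by every subtask."""
        def __init__(self, obs_dim, hidden=64):
            super().__init__()
            self.net = nn.Sequential(
                nn.Linear(obs_dim, hidden), nn.Tanh(),
                nn.Linear(hidden, 1))

        def forward(self, obs):
            return self.net(obs).squeeze(-1)

    def train_subtasks(subtask_demos, obs_dim, act_dim, steps=200):
        reward_fn = AIRLReward(obs_dim, act_dim)  # one reward shared by all subtasks
        critic = Critic(obs_dim)                  # one critic shared by all subtasks
        opt = torch.optim.Adam(
            list(reward_fn.parameters()) + list(critic.parameters()), lr=3e-4)
        bce = nn.BCEWithLogitsLoss()
        for exp_obs, exp_act in subtask_demos:
            for _ in range(steps):
                # Stand-in for rollouts of the current subtask policy; a real
                # agent would collect these from the environment.
                pol_obs = torch.randn_like(exp_obs)
                pol_act = torch.randn_like(exp_act)
                # Discriminator update: label expert transitions 1, policy 0.
                d_loss = (bce(reward_fn(exp_obs, exp_act), torch.ones(len(exp_obs)))
                          + bce(reward_fn(pol_obs, pol_act), torch.zeros(len(pol_obs))))
                # Critic regression toward the learned reward (one-step target;
                # a full method would bootstrap with discounted next-state values).
                with torch.no_grad():
                    target = reward_fn(pol_obs, pol_act)
                c_loss = ((critic(pol_obs) - target) ** 2).mean()
                opt.zero_grad()
                (d_loss + c_loss).backward()
                opt.step()
            # Networks are carried over to the next subtask without resetting.
        return reward_fn, critic

    # Toy usage: two subtasks, each with a batch of fake expert (obs, act) pairs.
    demos = [(torch.randn(32, 8), torch.randn(32, 2)) for _ in range(2)]
    train_subtasks(demos, obs_dim=8, act_dim=2)

The structural point is that reward_fn and critic are constructed once, outside the subtask loop, so whatever they learn on earlier subtasks is carried into the training of later ones.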
         
            
 
                 
                
                    