Authors:
Theodora-Augustina Drăgan (1); Maureen Monnet (1); Christian Mendl (2, 3) and Jeanette Lorenz (1)
        
Affiliations:
(1) Fraunhofer Institute for Cognitive Systems IKS, Munich, Germany
(2) Technical University of Munich, Department of Informatics, Boltzmannstraße 3, 85748 Garching, Germany
(3) Technical University of Munich, Institute for Advanced Study, Lichtenbergstraße 2a, 85748 Garching, Germany
        
Keywords:
Quantum Reinforcement Learning, Proximal Policy Optimization, Parametrizable Quantum Circuits, Frozen Lake, Expressibility, Entanglement Capability, Effective Dimension.
            
                Abstract: 
Quantum reinforcement learning (QRL) models augment classical reinforcement learning schemes with quantum-enhanced kernels. Different proposals for constructing such models show promising empirical performance. In particular, these models might offer a reduced parameter count and shorter times to reach a solution than classical models. It is, however, presently unclear how these quantum-enhanced kernels, used as subroutines within a reinforcement learning pipeline, need to be constructed to actually improve performance over classical models. In this work, we address exactly this question. First, we propose a hybrid quantum-classical reinforcement learning model that solves a slippery stochastic frozen lake, an environment considerably more difficult than the deterministic frozen lake. Second, we study different quantum architectures as options for this hybrid quantum-classical reinforcement learning model, all of them well motivated by the literature. They all show very promising performance with respect to similar classical variants. We further characterize these choices by metrics that are relevant for benchmarking the power of quantum circuits, such as the entanglement capability, the expressibility, and the information density of the circuits. However, we find that these typical metrics do not directly predict the performance of a QRL model.
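
As an illustration of the kind of circuit metric mentioned above, the following minimal sketch computes the Meyer-Wallach entanglement measure of a pure state with NumPy. It assumes that entanglement capability is defined, as is common in the literature, as the average of this measure over states produced by randomly sampled circuit parameters; the abstract does not state which definition the authors use, and the function names `single_qubit_purity` and `meyer_wallach` are hypothetical.

```python
import numpy as np


def single_qubit_purity(state, qubit, n_qubits):
    """Return Tr(rho_k^2) for the reduced state of one qubit of a pure state."""
    psi = np.asarray(state, dtype=complex).reshape([2] * n_qubits)
    # Move the chosen qubit to the first axis and flatten all other qubits.
    psi = np.moveaxis(psi, qubit, 0).reshape(2, -1)
    rho = psi @ psi.conj().T  # 2x2 reduced density matrix of that qubit
    return float(np.real(np.trace(rho @ rho)))


def meyer_wallach(state, n_qubits):
    """Meyer-Wallach measure Q = 2 * (1 - mean_k Tr(rho_k^2)), in [0, 1]."""
    purities = [single_qubit_purity(state, k, n_qubits) for k in range(n_qubits)]
    return float(2.0 * (1.0 - np.mean(purities)))


# Two reference states (assumed examples, not taken from the paper):
product_state = np.array([1.0, 0.0, 0.0, 0.0])            # |00>
bell_state = np.array([1.0, 0.0, 0.0, 1.0]) / np.sqrt(2)  # (|00> + |11>) / sqrt(2)

q_product = meyer_wallach(product_state, 2)
q_bell = meyer_wallach(bell_state, 2)
```

Under this definition a product state scores 0 and a maximally entangled Bell state scores 1. An estimate of entanglement capability for a given ansatz would average `meyer_wallach` over the output states of many random parameter draws, which requires a circuit simulator and is left out here.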