Authors: Georges Lebboss (1); Gilles Bernard (1); Noureddine Aliane (1) and Mohammad Hajjar (2)
        
        
Affiliations: (1) LIASD and Paris 8 University, France; (2) Lebanese University and IUT, Lebanon
        
        
        
        
        
Keyword(s): Semantic Relations, Semantic Arabic Resources, Arabic WordNet, Synsets, Arabic Corpus, Data Preprocessing, Word Vectors, Word Classification, Self Organizing Maps.
        
        
            
Related Ontology Subjects/Areas/Topics: Artificial Intelligence; Biomedical Engineering; Biomedical Signal Processing; Computational Intelligence; Health Engineering and Technology Applications; Human-Computer Interaction; Learning Paradigms and Algorithms; Methodologies and Methods; Neural Networks; Neurocomputing; Neurotechnology, Electronics and Informatics; Pattern Recognition; Physiological Computing Systems; Self-Organization and Emergence; Sensor Networks; Signal Processing; Soft Computing; Theory and Methods
            
        
        
            
Abstract:
This paper presents a method for enriching Arabic WordNet with semantic clusters extracted from a large general corpus. Because Arabic has few open digital linguistic resources, we built such a corpus (more than 7.5 billion words) with ad hoc tools. We then applied GraPaVec, a new word vectorization method based on automatically generated frequency patterns, alongside the state-of-the-art Word2Vec and GloVe methods. The word vectors were fed to a Self-Organizing Map neural network model, and the resulting clusterings were evaluated against the existing synsets (sets of synonymous words) of Arabic WordNet. The evaluation yields an F-score of 82.1% for GraPaVec, 55.1% for Word2Vec's Skip-gram, 52.2% for CBOW and 56.6% for GloVe, which at least shows the value of the context that GraPaVec takes into account. We conclude by discussing parameters and possible biases.
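The abstract does not spell out how the clusterings are scored against the synsets. As an illustration only, the sketch below shows one common way to compute an F-score for a clustering against gold synonym sets: a pairwise comparison, in which a word pair counts as correct when both the clustering and the gold standard group the two words together. The function names and the toy words are hypothetical and are not taken from the paper; the authors' actual evaluation protocol may differ.

```python
from itertools import combinations


def cooccurring_pairs(groups):
    """Return the set of unordered word pairs that share at least one group."""
    pairs = set()
    for group in groups:
        # Sorting makes (a, b) and (b, a) map to the same tuple.
        for a, b in combinations(sorted(set(group)), 2):
            pairs.add((a, b))
    return pairs


def pairwise_f_score(clusters, synsets):
    """Pairwise precision, recall and F1 of clusters against gold synsets.

    Assumption: this pairwise scheme is a generic clustering metric,
    not necessarily the one used in the paper.
    """
    predicted = cooccurring_pairs(clusters)
    gold = cooccurring_pairs(synsets)
    if not predicted or not gold:
        return 0.0, 0.0, 0.0
    hits = len(predicted & gold)
    precision = hits / len(predicted)
    recall = hits / len(gold)
    if hits == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Toy data (hypothetical transliterated words, not from the paper).
clusters = [["kitab", "sifr", "qalam"], ["bayt", "dar"]]
synsets = [["kitab", "sifr"], ["bayt", "dar", "manzil"]]

precision, recall, f1 = pairwise_f_score(clusters, synsets)
print(f"precision={precision:.2f} recall={recall:.2f} f1={f1:.2f}")
```

In this toy case the clustering proposes four word pairs and the gold synsets contain four, with two in common, so precision, recall and F1 all come out at 0.5.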