Authors: Bilel Elayeb (1, 2); Mohamed Firas Ettih (3) and Raja Ayed (2, 4)

Affiliations:
(1) Liwa College of Technology, P.O. Box 41009, Abu Dhabi, U.A.E.
(2) RIADI Research Laboratory, ENSI, Manouba University, Tunisia
(3) Université Paris-Est Créteil, Paris 12 Val de Marne, France
(4) Faculty of Economics and Management of Nabeul, Carthage University, Tunisia

Keyword(s): Morphological Disambiguation, Arabic Text, Machine-Learning Algorithms, Data Transformation, Morphological Feature, Classification.

Abstract: The Arabic language is characterized by its complexity and by its morphological and orthographic variations, including the syntactic and semantic diversity of a word. This specificity can cause morphological ambiguity. In this paper, we present a new architecture for the morphological disambiguation of Arabic texts. Disambiguation can be treated as a classification problem in which the values of the morphological features represent classes, and a classification algorithm assigns a class to each occurrence of a word based on its context. The first step consists of identifying the correct morphological analysis of a non-vocalized Arabic word using the morphological dependencies extracted from a corpus of vocalized texts. We then propose a method for transforming imperfect training datasets into perfect data with precise attributes and certain classes. We evaluate this architecture with a set of machine-learning classifiers on a corpus of classic Arabic texts. The results highlight some statistically significant improvements in the disambiguation rate of the SVM and Naïve Bayes classifiers.
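
The classification framing described in the abstract can be illustrated with a minimal sketch. This is not the authors' architecture: the class name `NaiveBayesDisambiguator`, the feature names (`word`, `prev`, `next`), the coarse context tags, and the four toy training samples are all hypothetical. The sketch assumes that each occurrence of an ambiguous non-vocalized form (here the transliterated form `ktb`, which can be read as a verb or as a noun) is described by precise categorical context attributes and labeled with one certain class, and it uses a small categorical Naïve Bayes classifier with add-one smoothing as a stand-in for the classifiers named in the abstract.

```python
import math
from collections import Counter, defaultdict


class NaiveBayesDisambiguator:
    """Categorical Naive Bayes with add-one smoothing (toy illustration only)."""

    def __init__(self):
        self.class_counts = Counter()
        # feature_counts[label][feature_name][feature_value] -> count
        self.feature_counts = defaultdict(lambda: defaultdict(Counter))
        # feature_values[feature_name] -> set of values seen during training
        self.feature_values = defaultdict(set)

    def fit(self, samples):
        """Count classes and (feature, value) pairs per class."""
        for features, label in samples:
            self.class_counts[label] += 1
            for name, value in features.items():
                self.feature_counts[label][name][value] += 1
                self.feature_values[name].add(value)
        return self

    def predict(self, features):
        """Return the class with the highest smoothed log-posterior."""
        total = sum(self.class_counts.values())
        best_label, best_score = None, -math.inf
        for label, count in self.class_counts.items():
            score = math.log(count / total)  # log prior
            for name, value in features.items():
                seen = self.feature_counts[label][name][value]
                vocab = len(self.feature_values[name] | {value})
                score += math.log((seen + 1) / (count + vocab))
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Hypothetical training data: each occurrence of the ambiguous form "ktb"
# is described by the coarse tags of its neighbours and labeled with the
# value of one morphological feature (here, part of speech).
TRAIN = [
    ({"word": "ktb", "prev": "PRON", "next": "NOUN"}, "VERB"),
    ({"word": "ktb", "prev": "PART", "next": "NOUN"}, "VERB"),
    ({"word": "ktb", "prev": "PREP", "next": "ADJ"}, "NOUN"),
    ({"word": "ktb", "prev": "NUM", "next": "ADJ"}, "NOUN"),
]

model = NaiveBayesDisambiguator().fit(TRAIN)
print(model.predict({"word": "ktb", "prev": "PREP", "next": "ADJ"}))  # NOUN
```

In this framing, one such classifier would be trained per morphological feature, and the predicted values would then be used to select among the candidate analyses of a word. How the candidate analyses are produced and how imperfect training data are made precise are left out of the sketch, since the abstract does not specify them.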