Exact and Approximate Rule Extraction from Neural Networks with Boolean Features

Fawaz Mereani, Jacob Howe


Rule extraction from classifiers treated as black boxes is an important topic in explainable artificial intelligence (XAI). It is concerned with finding rules that describe classifiers and that are understandable to humans, having the form of (I f...Then...Else). Neural network classifiers are one type of classifier where it is difficult to know how the inputs map to the decision. This paper presents a technique to extract rules from a neural network where the feature space is Boolean, without looking at the inner structure of the network. For such a network with a small feature space, a Boolean function describing it can be directly calculated, whilst for a network with a larger feature space, a sampling method is described to produce rule-based approximations to the behaviour of the network with varying granularity, leading to XAI. The technique is experimentally assessed on a dataset of cross-site scripting (XSS) attacks, and proves to give very high accuracy and precision, comparable to that given by the neural network being approximated.


Paper Citation