Explainable Artificial Intelligence (XAI) and Interpretable Machine Learning with LIME, ELI5, SHAP, and InterpretML
In machine learning, complex models have a serious transparency problem: we have no strong evidence of why a model made a given prediction, which features influenced it, which features contributed positively, and which contributed negatively. A feature-importance graph shows which features matter when we pass the complete training and test datasets, but for a single row of features, or any given instance, it is very difficult to understand why and how the model produced its output.
Linear models are simple and easy for humans to understand, but complex models such as deep neural networks are truly black boxes. In such cases the model's predictions cannot be blindly accepted, even when accuracy is high; we should always ask: why should I trust this model's prediction?
This article is divided into two parts:
- Interpretable machine learning theory
- Hands-on work with the interpretability frameworks
Explainability vs Interpretability
Interpretability: you can predict what will happen to the output when you change the inputs. An interpretable framework can go deep into an ML model and see what is going on behind the black box.
Explainability (XAI): explaining a deep ML model in human-understandable terms. Explainability is able to quite literally explain what is happening.
In other words, the machine learning models available today are capable of great accuracy but unable to give any proper justification for their predictions. Hence the following questions arise when dealing with these machine learning algorithms:
- Why should I trust the model's prediction?
- How do we interpret our model's failures, and when should we disregard its output?
- Why did our model make this particular prediction as opposed to another?
- What are the key factors that made our ML model a success?
Basically, most machine learning models are referred to as black boxes in terms of interpretability.
So model explainability, in terms of human understanding, is a high-priority challenge in today's machine learning community. If, as an ML/AI model builder, you don't know why your model predicts what it does, it is very difficult to sell that product: if you don't trust the model, how will your client trust it and invest money? To improve accuracy we can use k-fold cross-validation, feature engineering, hyperparameter tuning, and bias-variance tuning, but none of this helps us understand why some predictions are correct and others go wrong, or which feature values drive a correct prediction.
The top 10 most popular explainability and interpretability techniques.
This article covers the top four interpretability frameworks hands-on, with Python code, and compares their advantages and disadvantages.
What is LIME?
LIME (Local Interpretable Model-agnostic Explanations) is an explanation technique that explains the prediction of any machine learning classifier in an interpretable and faithful manner, by learning an interpretable model locally around the prediction.
Why is LIME so popular?
We want an explainer that is faithful and able to explain model output even to a non-expert. LIME has the following properties that make it faithful:
- Local
- Interpretable
- Model-agnostic
- Human-friendly explanations
LIME is a local surrogate model: it normally uses a linear regression or decision tree model to explain a prediction within a local boundary. In advance, you need to select K, the number of features to use in the explanation: the lower the K value, the easier it is to interpret the model, while a higher K value produces models with higher fidelity. LIME currently uses an exponential smoothing kernel to define the neighbourhood. A smoothing kernel is a function that takes two data instances and returns a proximity measure. The kernel width determines how large the neighbourhood is: a small kernel width means that an instance must be very close to influence the local model, while a larger kernel width means that instances farther away also influence it. The default kernel width is 0.75 times the square root of the number of columns in the training data.
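As a small sketch of the kernel just described (a hypothetical helper written for illustration, not LIME's internal code), the similarity between two instances and the default kernel width can be computed like this:

```python
import numpy as np

def exponential_kernel(a, b, kernel_width):
    # Exponential smoothing kernel: maps the distance between two
    # instances to a similarity in (0, 1]; identical points score 1.0.
    d = np.linalg.norm(a - b)
    return np.exp(-(d ** 2) / (kernel_width ** 2))

# Default width from the text: 0.75 * sqrt(number of training columns)
n_columns = 16
kernel_width = 0.75 * np.sqrt(n_columns)  # 3.0 for 16 columns

near = exponential_kernel(np.zeros(4), np.full(4, 0.1), kernel_width)
far = exponential_kernel(np.zeros(4), np.full(4, 5.0), kernel_width)
# near > far: closer instances carry more weight in the local model
```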
The general approach LIME takes to achieve this goal is as follows:
- For each prediction to explain, permute the observation n times.
- Let the complex model predict the outcome of all permuted observations.
- Calculate the distance from each permutation to the original observation.
- Convert the distances to similarity scores.
- Select the m features that best describe the complex model's outcome from the permuted data.
- Fit a simple model to the permuted data, explaining the complex model's outcome with those m features, weighted by each permutation's similarity to the original observation.
- Extract the feature weights from the simple model and use them as explanations for the complex model's local behaviour.
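The steps above can be sketched in plain NumPy. This is an illustrative re-implementation, not the `lime` library itself: the Gaussian sampling scheme and the step of keeping all features (rather than selecting m of them) are simplifying assumptions.

```python
import numpy as np

def lime_explain(predict_fn, x, X_train, n_samples=1000, seed=0):
    # Sketch of LIME: permute, predict, weight by similarity,
    # then fit a weighted linear surrogate model around x.
    rng = np.random.default_rng(seed)
    scale = X_train.std(axis=0)
    kernel_width = 0.75 * np.sqrt(X_train.shape[1])  # default width

    # 1. Permute the observation around x, using the training-data scale
    Z = x + rng.normal(size=(n_samples, x.size)) * scale
    # 2. Let the complex model predict all permuted observations
    y = predict_fn(Z)
    # 3-4. Distance to x, converted to a similarity via the kernel
    d = np.linalg.norm((Z - x) / scale, axis=1)
    w = np.exp(-(d ** 2) / (kernel_width ** 2))
    # 5-6. Fit a similarity-weighted linear model (all features kept)
    A = np.hstack([np.ones((n_samples, 1)), Z]) * np.sqrt(w)[:, None]
    coef, *_ = np.linalg.lstsq(A, y * np.sqrt(w), rcond=None)
    # 7. The surrogate's feature weights are the local explanation
    return coef[1:]
```

With the real library, the equivalent tabular workflow goes through `lime.lime_tabular.LimeTabularExplainer` and its `explain_instance` method, which also handles the feature-selection step.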
LIME Advantages and Disadvantages

| # | Advantages | Disadvantages |
|---|------------|---------------|
| 1 | LIME is model-agnostic: it can be applied to any machine learning model. | For each application you must try different kernel settings and judge for yourself whether the explanations make sense. |
| 2 | Local interpretability: LIME's output shows the contribution of each feature to the prediction for a data sample, visualized as a list. It also lets you determine which feature changes will have the most impact on the prediction. | In the current implementation, only linear models are used to explain local behaviour. This is reasonable in a very small region around the data sample, but as the region grows, a linear model may not be powerful enough to capture the complex model's behaviour. |
| 3 | LIME is one of the only interpretation techniques that works for tabular, text, and image data. | Lack of stability and consistency: the kernel width affects the feature contributions. |
| 4 | LIME output is very fast, even for large datasets. | If you repeat the sampling process, the explanation may differ even for very close data points. |
LIME and SHAP are surrogate models: they still use the black-box machine learning model, but tweak the input slightly and test the change in the prediction. The tweak must be small so that the perturbed point stays close to the original data point. In effect, both model how the prediction changes as the input changes. For example, if the model's prediction changes little when a variable's value is tweaked, that variable may not be an important predictor for that particular data point.
SHAP
SHAP stands for SHapley Additive exPlanations. SHAP has an additive property: the SHAP values of all the variables for a data point, plus the base value (the mean prediction), sum to the final prediction. SHAP builds on and improves the ideas behind LIME. Instead of fitting a local linear model as LIME does, SHAP uses functions to calculate Shapley values, which guarantee a fair distribution of credit among the variables. A SHAP value is NOT the difference between the prediction with and without a variable; it is the contribution of a variable to the difference between the actual prediction and the mean prediction. SHAP uses Kernel SHAP and Tree SHAP, inspired by local surrogate models, to estimate Shapley values.
SHAP's goal is to explain the prediction of a given instance x by computing the contribution of each feature to that prediction. The feature values of a data instance act as players in a coalitional game. The SHAP output is a fair distribution of the prediction among the features' Shapley values. A Shapley value is, in fact, an average: the mean marginal contribution of a player (feature) over all permutations of players (features). The baseline for Shapley values is the average of all predictions. In the plot, each Shapley value is an arrow that pushes the prediction up (positive value) or down (negative value).
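To make the additive property concrete, here is a brute-force sketch that computes exact Shapley values by enumerating all feature coalitions. This is illustrative only; the real `shap` package uses Kernel SHAP and Tree SHAP approximations precisely because this enumeration is exponential in the number of features.

```python
import itertools
import math
import numpy as np

def shapley_values(f, x, background):
    # Value of a coalition S: mean prediction with the features in S
    # fixed to x and the remaining features drawn from background data.
    n = x.size
    def value(S):
        Z = background.copy()
        Z[:, list(S)] = x[list(S)]
        return f(Z).mean()
    phi = np.zeros(n)
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for k in range(n):
            for S in itertools.combinations(others, k):
                # Shapley weight for a coalition of size k
                w = math.factorial(k) * math.factorial(n - k - 1) / math.factorial(n)
                phi[i] += w * (value(S + (i,)) - value(S))
    return phi
```

By construction, `phi.sum()` equals the actual prediction minus the mean prediction over the background data, which is exactly the additivity described above.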
LIME vs SHAP

| # | LIME | SHAP |
|---|------|------|
| 1 | LIME is model-agnostic and can be applied to any machine learning model. | SHAP guarantees properties such as consistency and local accuracy. |
| 2 | LIME assumes the local model is linear (or a decision tree). | Shapley values consider all possible predictions for an instance, using all possible combinations of inputs. |
| 3 | LIME output is very fast, even for large datasets. | SHAP value calculation is very expensive in time, since it checks all possible combinations. |
| 4 | A LIME value is the difference between the prediction with and without a variable. | A SHAP value is the contribution of a variable to the difference between the actual prediction and the mean prediction. |
ELI5:
ELI5 helps to debug machine learning classifiers and explain their top predictions in an easy-to-understand, well-visualized way. However, it doesn't support truly model-agnostic interpretations; its support is mostly limited to tree-based and other parametric/linear models. ELI5 explains a prediction by showing a weight for each feature, depicting how influential it may have been in the final prediction decision across all trees. ELI5 provides an independent implementation of this algorithm for XGBoost and most scikit-learn tree ensembles, which is definitely a step toward model-agnostic interpretation, but not purely model-agnostic like LIME.
The prediction can be described as the sum of the feature contributions plus the "bias" (i.e., the mean given by the topmost region, which covers the entire training set).
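This decomposition can be illustrated on a tiny hand-built regression tree. It is a toy sketch of the idea that ELI5 and treeinterpreter implement for real tree ensembles; the node values below are made-up means, not output from a real library.

```python
# Hand-built regression tree: each internal node stores the split feature,
# threshold, the mean target of the samples reaching it, and two children.
tree = {
    "feature": 0, "threshold": 0.5, "value": 10.0,  # root mean = the "bias"
    "left": {"feature": 1, "threshold": 2.0, "value": 6.0,
             "left": {"value": 4.0}, "right": {"value": 8.0}},
    "right": {"value": 14.0},
}

def decompose(node, x):
    # Walk the decision path; each split's contribution is the change in
    # node mean, attributed to the split feature. The prediction is then
    # bias + sum of feature contributions, as stated above.
    bias = node["value"]
    contributions = {}
    while "feature" in node:
        child = node["left"] if x[node["feature"]] <= node["threshold"] else node["right"]
        f = node["feature"]
        contributions[f] = contributions.get(f, 0.0) + child["value"] - node["value"]
        node = child
    return bias, contributions, node["value"]

bias, contributions, prediction = decompose(tree, [0.0, 3.0])
# prediction (8.0) == bias (10.0) + contributions (feature 0: -4, feature 1: +2)
```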
InterpretML
InterpretML is an open-source Python package developed by a Microsoft team, built on Plotly, scikit-learn, LIME, SHAP, SALib, treeinterpreter, joblib, and other packages, for training interpretable machine learning models and explaining black-box models. So far we have seen that accuracy is the main concern with interpretable models, and that the most accurate models are not well explainable. Microsoft therefore developed an algorithm called the Explainable Boosting Machine (EBM), which uses bagging and boosting techniques to achieve both high accuracy and interpretability.
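The core EBM idea, round-robin boosting of one-feature models so that the final model stays an additive sum of per-feature functions, can be sketched as follows. This is a heavily simplified toy: the real `interpret` package (used via `interpret.glassbox.ExplainableBoostingClassifier`) adds binning, bagging, and pairwise interaction terms.

```python
import numpy as np

def fit_ebm_sketch(X, y, n_rounds=60, lr=0.1):
    # Round-robin gradient boosting of one-feature regression stumps:
    # the model remains bias + sum_j f_j(x_j), so each feature's learned
    # shape function can be plotted and inspected on its own.
    n, d = X.shape
    bias = float(y.mean())
    stumps = []                           # (feature, threshold, left, right)
    pred = np.full(n, bias)
    for _ in range(n_rounds):
        for j in range(d):                # cycle over features
            r = y - pred                  # current residuals
            best = None
            for t in np.unique(X[:, j])[:-1]:
                mask = X[:, j] <= t
                lv, rv = r[mask].mean(), r[~mask].mean()
                sse = ((r[mask] - lv) ** 2).sum() + ((r[~mask] - rv) ** 2).sum()
                if best is None or sse < best[0]:
                    best = (sse, t, lv, rv)
            if best is None:
                continue                  # constant feature: nothing to split
            _, t, lv, rv = best
            stumps.append((j, t, lr * lv, lr * rv))
            pred = pred + np.where(X[:, j] <= t, lr * lv, lr * rv)
    return bias, stumps

def predict_ebm(bias, stumps, X):
    # Prediction is the bias plus every stump's additive contribution.
    out = np.full(X.shape[0], bias)
    for j, t, lv, rv in stumps:
        out += np.where(X[:, j] <= t, lv, rv)
    return out
```

Because each stump touches only one feature, summing the stumps per feature recovers an interpretable shape function for that feature, which is what EBM plots in its global explanations.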