Table of Contents
1. Intro to SHAP
2. SHAP Summary Plot
3. SHAP Force Plot
4. SHAP Dependence Plot
5. SHAP Waterfall Plot
6. SHAP Interaction Plot
7. SHAP Decision Plot
8. SHAP Beeswarm Plot
# Import required libraries
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
import shap
import lightgbm as lgb
import xgboost as xgb
from sklearn.model_selection import train_test_split
from sklearn.feature_selection import SelectKBest, f_classif, mutual_info_regression
from sklearn.feature_selection import mutual_info_classif
# Load and prepare data
df = pd.read_csv('D:/1_Work/4_DataScience/3_ML/1_Data/cleand_df.csv')
df = df.dropna()
X = df.drop('TenYearCHD', axis=1)
y = df['TenYearCHD']
# Set random seed
random_state = 42
np.random.seed(random_state)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=random_state)
1. Intro to SHAP
SHAP (SHapley Additive exPlanations) is a powerful framework for interpreting machine learning models, helping data scientists and machine learning practitioners understand the predictions of complex models. It provides a unified approach to explain the output of any machine learning model, making it easier to interpret and trust predictions, especially in high-stakes fields such as healthcare, finance, and legal systems.
What is SHAP?
SHAP is based on Shapley values, a concept from cooperative game theory introduced by Lloyd Shapley in 1953. The idea behind Shapley values is to fairly distribute a "payout" (in machine learning, the difference between a model's prediction and the average prediction) among the contributing features. Each feature is assigned a value that reflects its contribution to the final prediction, averaged over all possible combinations of features.
In machine learning, SHAP values are used to explain the output of any model by breaking down a prediction into the contribution of each individual feature, making it more interpretable. SHAP extends this concept beyond simple models to complex machine learning algorithms such as decision trees, random forests, and neural networks.
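To make the idea concrete, here is a minimal sketch that computes exact Shapley values by brute force for a tiny, made-up model with three features, treating a "missing" feature as one replaced by a fixed baseline value. The toy model f, the baseline, and the instance x are all hypothetical and chosen only for illustration.
# Illustrative only: exact Shapley values by enumerating all feature coalitions.
# The toy model f, the baseline, and the instance x are hypothetical.
from itertools import combinations
from math import factorial
import numpy as np

def f(x):
    # A made-up model: a weighted sum plus one interaction term
    return 2 * x[0] + 1 * x[1] - 3 * x[2] + 0.5 * x[0] * x[2]

background = np.array([0.0, 0.0, 0.0])   # "missing" features are set to this baseline
x = np.array([1.0, 2.0, 3.0])            # instance to explain
n = len(x)

def value(subset):
    # Model output when only the features in `subset` take their real values
    z = background.copy()
    for i in subset:
        z[i] = x[i]
    return f(z)

shapley = np.zeros(n)
for i in range(n):
    others = [j for j in range(n) if j != i]
    for size in range(n):
        for S in combinations(others, size):
            weight = factorial(len(S)) * factorial(n - len(S) - 1) / factorial(n)
            shapley[i] += weight * (value(S + (i,)) - value(S))

# The contributions sum exactly to f(x) minus the baseline prediction
print(shapley, shapley.sum(), f(x) - f(background))
This enumeration is exponential in the number of features; the SHAP library's explainers exist precisely to approximate or compute these values efficiently for real models.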
Why is SHAP Important?
1. Model Transparency
SHAP helps to demystify black-box machine learning models. While models like deep learning or ensemble methods can be powerful, they often lack interpretability. SHAP provides a transparent explanation for each prediction, which is crucial for understanding why a model made a specific decision.
2. Feature Importance
SHAP values give insights into which features are driving the predictions. By analyzing the magnitude and direction of SHAP values, practitioners can understand which features are most influential and how they impact the model's outcomes. This helps with feature selection and improves model trustworthiness.
3. Fairness and Bias Detection
SHAP helps to identify and mitigate biases in the model. By examining how different features influence predictions, especially in sensitive areas like hiring, lending, or healthcare, users can detect potential biases against certain groups or features, ensuring fairness.
4. Regulatory Compliance
In sectors like healthcare, finance, and law, decision-making must be explainable to meet regulatory requirements. SHAP provides a clear, quantifiable explanation for predictions, which is essential for demonstrating compliance with legal and ethical standards.
5. Improved Model Debugging
SHAP can also help with debugging machine learning models. If a model is behaving unexpectedly, SHAP values can highlight which features are contributing to incorrect predictions, making it easier to identify and fix issues.
Key Benefits of Using SHAP
Model Agnostic: SHAP can be used with any machine learning model, making it a versatile tool for model interpretability (a model-agnostic sketch follows this list).
Local Accuracy: For each prediction, SHAP values sum exactly to the difference between the model's output and the average (base) output, so explanations are exact rather than approximate.
Fairness Auditing: By understanding how each feature affects the model's predictions, users can audit the model for fairness and correct any unintended biases.
Ease of Use: SHAP integrates seamlessly with popular machine learning libraries like Scikit-learn, XGBoost, and LightGBM, making it accessible for users with varying levels of expertise.
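To illustrate the model-agnostic point above, here is a minimal sketch using shap.KernelExplainer, which needs only a prediction function and a background sample, so it can wrap any fitted estimator. The LogisticRegression model is an assumption chosen just for illustration, and KernelExplainer is much slower than the tree-specific explainer used in the next section.
# Model-agnostic sketch: KernelExplainer only needs a predict function and background data.
# Assumes the X_train/X_test split defined above; the LogisticRegression choice is illustrative.
from sklearn.linear_model import LogisticRegression

log_reg = LogisticRegression(max_iter=1000)
log_reg.fit(X_train, y_train)

# A small background sample keeps the kernel estimation tractable
background = shap.sample(X_train, 100, random_state=42)
kernel_explainer = shap.KernelExplainer(log_reg.predict_proba, background)

# Explain a handful of test rows; for a binary classifier this is typically
# a list with one SHAP array per class
kernel_shap_values = kernel_explainer.shap_values(X_test.iloc[:5, :])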
2. SHAP Summary Plot
# Train a LightGBM model
lgb_model = lgb.LGBMRegressor(random_state=random_state)
lgb_model.fit(X_train, y_train)
# Calculate SHAP values
explainer = shap.TreeExplainer(lgb_model)
shap_values = explainer.shap_values(X_test)
[LightGBM] [Info] Auto-choosing row-wise multi-threading, the overhead of testing was 0.000169 seconds.
You can set `force_row_wise=true` to remove the overhead.
And if memory is not enough, you can set `force_col_wise=true`.
[LightGBM] [Info] Total Bins 968
[LightGBM] [Info] Number of data points in the train set: 2926, number of used features: 14
[LightGBM] [Info] Start training from score 0.148667
plt.figure(figsize=(12, 8))
shap.summary_plot(shap_values, X_test, show=False)
plt.title('SHAP Summary Plot')
plt.tight_layout()
plt.show()
Key Elements to Explain
1. Feature Importance Ranking
Features are ordered by their mean absolute SHAP value (reproduced in the snippet at the end of this section).
Top features have the largest influence on model predictions.
Position on the y-axis reflects their relative importance.
2. Impact Distribution
Each point represents a single prediction instance.
The spread shows the range of impact for each feature.
Wider spreads indicate variable impacts across different predictions.
3. Color Coding
Red: Represents high feature values.
Blue: Represents low feature values.
Color gradients reveal the relationship direction (positive/negative).
4. Business Insights
Pinpoint key features driving predictions.
Analyze feature value relationships and their effects on outcomes.
Use insights to guide feature engineering and selection strategies.
Common Patterns to Look For:
Clustered colors: Indicate strong monotonic relationships.
Mixed colors: Suggest complex or non-linear relationships.
Wide vs. narrow distributions: Highlight consistent vs. variable impacts.
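The ranking described in point 1 can be reproduced directly from the SHAP matrix computed above: the summary plot sorts features by aggregate absolute SHAP value, which the short sketch below recomputes and then shows with the library's built-in bar-style summary plot.
# Reproduce the summary plot's ranking: mean |SHAP value| per feature
mean_abs_shap = np.abs(shap_values).mean(axis=0)
importance = pd.Series(mean_abs_shap, index=X_test.columns).sort_values(ascending=False)
print(importance.head(10))

# Equivalent built-in bar chart of global importance
shap.summary_plot(shap_values, X_test, plot_type='bar', show=False)
plt.title('Mean |SHAP| Feature Importance')
plt.tight_layout()
plt.show()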
3. SHAP Force Plot
The Force Plot shows how each feature contributes to pushing the prediction from the base value to the final prediction for a single instance.
# Create force plot
shap.force_plot(explainer.expected_value, shap_values[0, :], X_test.iloc[0, :],
                matplotlib=True, show=False, figsize=(20, 3))
plt.title('SHAP Force Plot')
plt.tight_layout()
plt.show()
Key Elements to Explain
1. Base Value
Starting point: The average model output.
Reference point: Helps in understanding feature impacts.
Model's default prediction: Reflects what the model predicts without any feature contributions.
2. Feature Contributions
Red arrows: Push prediction higher.
Blue arrows: Push prediction lower.
Arrow width: Represents the magnitude of impact.
Order: Features are arranged by their absolute impact on the prediction.
3. Final Prediction
Sum of base value and feature contributions (verified numerically in the snippet at the end of this section).
Shows how the final prediction is derived.
Helps to validate model logic and understand the reasoning behind predictions.
Using Force Plots Effectively
Explain individual predictions to stakeholders with clarity.
Debug unexpected model behavior and identify potential issues.
Validate feature importance for specific cases or predictions.
Show how different features either compete or complement each other in influencing the model outcome.
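The additivity mentioned in point 3 can be checked numerically: the base value plus the instance's SHAP values should match the model's raw prediction. A minimal check for the first test row, using the objects created above:
# Additivity check: base value + SHAP contributions == model prediction (raw output)
reconstructed = explainer.expected_value + shap_values[0, :].sum()
predicted = lgb_model.predict(X_test.iloc[[0]])[0]
print(f'base + contributions = {reconstructed:.4f}, model prediction = {predicted:.4f}')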
4. SHAP Dependence Plot
Dependence Plots show how the SHAP value of a feature changes based on the feature's value, revealing non-linear relationships and interaction effects.
# Select most important feature
feature_idx = np.argmax(np.abs(shap_values).mean(0))
feature_name = X_test.columns[feature_idx]
fig, ax = plt.subplots(figsize=(10, 6))
shap.dependence_plot(feature_name, shap_values, X_test, ax=ax, show=False)
plt.title(f'SHAP Dependence Plot for {feature_name}')
plt.tight_layout()
plt.show()
Key Insights to Extract
1. Relationship Pattern
Overall trend: Can be linear, non-linear, or monotonic.
Turning points or thresholds: Indicate critical changes in model behavior.
Regions of high/low impact: Highlight areas with significant influence on predictions.
2. Interaction Effects
Color gradient: Shows the value of a second, automatically chosen interacting feature (it can be set explicitly; see the sketch at the end of this section).
Vertical spread at any x-value: Reflects the strength of the interaction.
Patterns in color distribution: Reveal underlying relationships between features.
3. Value Ranges
Distribution of feature values: Illustrates how values are spread across predictions.
Sparse vs. dense data areas: Identifies areas with fewer or more concentrated data points.
Outliers or unusual patterns: Pinpoints anomalies that might need further investigation.
Business Applications
Identify optimal feature value ranges to maximize model accuracy.
Understand feature interaction effects to enhance decision-making.
Guide feature transformation decisions to improve performance.
Support business rule development with insights from interactions and trends.
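The coloring feature mentioned in point 2 is chosen automatically by default; it can be fixed via the interaction_index argument. The sketch below colors the same dependence plot by the second most important feature, an arbitrary choice made only for illustration.
# Pick a specific feature to color by instead of the automatic choice
second_idx = np.argsort(np.abs(shap_values).mean(0))[-2]
second_name = X_test.columns[second_idx]
shap.dependence_plot(feature_name, shap_values, X_test,
                     interaction_index=second_name, show=False)
plt.title(f'{feature_name} colored by {second_name}')
plt.tight_layout()
plt.show()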
5. SHAP Waterfall Plot
The Waterfall Plot shows how we build up to the final prediction one feature at a time, starting from the base value.
plt.figure(figsize=(10, 8))
shap.waterfall_plot(shap.Explanation(values=shap_values[0,:],
base_values=explainer.expected_value,
data=X_test.iloc[0,:],
feature_names=X_test.columns.tolist()),
show=False)
plt.title('SHAP Waterfall Plot')
plt.tight_layout()
plt.show()
Key Components to Explain
1. Base Value
Starting point: Represents the initial prediction.
Average model output: Derived from the training data's baseline.
Reference: Provides context for understanding the impact of individual features.
2. Feature Contributions
Each bar: Represents the contribution of one specific feature.
Red: Indicates a positive impact on prediction.
Blue: Indicates a negative impact on prediction.
Order: Features are arranged in descending order of impact.
3. Cumulative Effect
Running total: Shown as features are added sequentially (recomputed in the snippet at the end of this section).
Final value: Represents the model's ultimate prediction.
Build-up process: Demonstrates how each feature contributes to the final outcome.
Effective Usage
Explain prediction build-up step by step to stakeholders.
Identify key decision points along the way.
Show feature contribution magnitude to assess feature importance.
Validate model logic sequence to ensure coherence in decision-making.
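The running total from point 3 can also be recomputed outside the plot: sorting the instance's SHAP values by magnitude and taking a cumulative sum from the base value reproduces the build-up the waterfall displays. A small sketch for the first test row:
# Recreate the waterfall's build-up numerically for the first test instance
contribs = pd.Series(shap_values[0, :], index=X_test.columns)
ordered = contribs.reindex(contribs.abs().sort_values(ascending=False).index)
running_total = explainer.expected_value + ordered.cumsum()
print(running_total)  # the final value equals the model's prediction for this row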
6. SHAP Interaction Plot
The Interaction Plot reveals how pairs of features work together to affect predictions.
# Calculate interaction values
interaction_values = explainer.shap_interaction_values(X_test.iloc[:100,:])
shap.summary_plot(interaction_values, X_test.iloc[:100, :], max_display=10, show=False)
plt.title('SHAP Interaction Plot')
plt.tight_layout()
plt.show()
Key Elements to Highlight
1. Main Effects vs. Interactions
Diagonal entries: Show each feature's main effect with interaction effects removed.
Off-diagonal entries: Show the pairwise interaction effect shared between two features.
Additivity: A feature's main effect plus its interaction effects sums back to its total SHAP value.
2. Interaction Strength
Spread of points: A wider spread in a panel indicates a stronger interaction for that pair.
Color coding: As in the summary plot, red marks high feature values and blue marks low values.
Ranking: Features are ordered by overall importance, with max_display limiting how many appear in the grid.
3. Model Behavior
Reveals which feature combinations jointly drive predictions.
Separates a feature's standalone contribution from effects that only appear in combination.
Flags pairs worth inspecting in more detail with a dependence plot (see the sketch after this list).
Practical Applications
Discover feature pairs that may deserve explicit interaction terms during feature engineering.
Check whether domain-suspected interactions actually influence the model.
Validate that the model is not relying on spurious feature combinations.
Prioritize which pairwise relationships to communicate to stakeholders.
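To dig one level deeper than the grid, the pairwise interaction strengths can be ranked directly from the interaction tensor and the strongest off-diagonal pair inspected with a dependence plot. The sketch below uses the interaction_values computed above; the ranking criterion (mean absolute interaction value) is one reasonable choice, not the only one.
# Rank feature pairs by mean absolute interaction value (off-diagonal entries only)
mean_interactions = np.abs(interaction_values).mean(axis=0)
np.fill_diagonal(mean_interactions, 0)  # ignore main effects on the diagonal
i, j = np.unravel_index(np.argmax(mean_interactions), mean_interactions.shape)
print(f'Strongest interaction: {X_test.columns[i]} x {X_test.columns[j]}')

# Dependence plot of one feature colored by its strongest interaction partner
shap.dependence_plot(X_test.columns[i], shap_values, X_test,
                     interaction_index=X_test.columns[j], show=False)
plt.tight_layout()
plt.show()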
7. SHAP Decision Plot
The Decision Plot traces how each instance moves from the base value to its final prediction as feature contributions are added one at a time, making it easy to compare several predictions on the same axes.
plt.figure(figsize=(12, 8))
shap.decision_plot(explainer.expected_value, shap_values[:10], X_test.iloc[:10], show=False)
plt.title('SHAP Decision Plot')
plt.tight_layout()
plt.show()
Key Elements to Highlight
1. Path Analysis
Shows prediction path for each instance.
Reveals feature contribution sequence.
Demonstrates cumulative effects as features are added.
2. Feature Impacts
Line slopes: Indicate the magnitude of each feature’s impact.
Color: Represents the identity of the feature.
Crossing lines: Show how feature importance changes over the course of the prediction.
3. Model Behavior
Highlights overall prediction patterns.
Identifies key decision points where the model’s outcome changes.
Marks feature value thresholds that influence predictions.
Practical Applications
Analyze decision paths to understand how the model reaches its conclusions.
Compare similar instances to identify patterns.
Identify critical features that influence predictions.
Validate model logic by examining how feature contributions evolve.
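As one way to apply the points above, the decision plot can be restricted to the rows the model predicts worst. The sketch below picks the five test instances with the largest absolute error, an arbitrary debugging-oriented selection made only for illustration.
# Focus the decision plot on the worst-predicted test rows (illustrative debugging step)
errors = np.abs(lgb_model.predict(X_test) - y_test.values)
worst_idx = np.argsort(errors)[-5:]
shap.decision_plot(explainer.expected_value, shap_values[worst_idx], X_test.iloc[worst_idx],
                   show=False)
plt.title('Decision Plot: Largest Prediction Errors')
plt.tight_layout()
plt.show()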
8. SHAP Beeswarm Plot
The Beeswarm Plot provides a detailed view of how feature values affect SHAP values across the dataset.
plt.figure(figsize=(12, 8))
shap.plots.beeswarm(shap.Explanation(values=shap_values,
base_values=np.repeat(explainer.expected_value, len(X_test)),
data=X_test,
feature_names=X_test.columns.tolist()),
show=False)
plt.title('SHAP Beeswarm Plot')
plt.tight_layout()
plt.show()
Key Insights to Extract
1. Distribution Analysis
Examine feature value distributions to identify patterns and variations.
Analyze SHAP value ranges to understand the range of feature impacts.
Detect outliers that may indicate unusual or problematic data points.
2. Feature Relationships
Assess value-impact correlations to determine how feature values influence predictions.
Identify non-linear patterns that may suggest complex relationships.
Observe clustering effects that highlight group behaviors or shared characteristics.
3. Global Patterns
Rank features by importance based on SHAP value magnitudes.
Evaluate impact consistency across different predictions.
Investigate how value ranges affect model outputs.
Business Applications
Identify value thresholds for critical decision-making.
Detect anomalies to improve data quality and model reliability.
Guide feature binning to optimize feature representation.
Support data quality analysis by uncovering hidden patterns or errors.
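As a side note on the manual Explanation construction above: in recent versions of the shap library the explainer object can be called directly on the data to obtain an Explanation, which avoids wrapping shap_values by hand. The exact behavior is version-dependent, so treat this as a sketch rather than a drop-in replacement.
# Newer shap API (version-dependent): calling the explainer returns an Explanation object
explanation = explainer(X_test)
shap.plots.beeswarm(explanation, max_display=15, show=False)
plt.title('SHAP Beeswarm Plot (Explanation API)')
plt.tight_layout()
plt.show()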