Ensuring AI Quality and Fairness: A Comprehensive Guide to Giskard's Testing Framework - Part 2

AI is driving critical decisions, but is your model fair, secure, and reliable? Giskard, the open-source testing framework, ensures your machine learning models meet the highest standards. Let's dive in.

Ensuring the quality, fairness, and security of artificial intelligence (AI) models is paramount as they become increasingly integral to decision-making processes across various industries. Giskard, an open-source testing framework, offers a comprehensive solution to evaluate and enhance machine learning (ML) models. This article provides an in-depth guide to Giskard's framework, its benefits, and a step-by-step tutorial to get started.

What is Giskard?

Giskard is an open-source Python library designed to automatically detect performance, bias, and security issues in AI applications. It supports a wide range of models, including those based on Large Language Models (LLMs) and traditional ML models for tabular data.

Key Features:

  • Automated Vulnerability Detection: Giskard scans models to identify issues such as hallucinations, harmful content generation, prompt injection, robustness problems, sensitive information disclosure, and discrimination.
  • Retrieval Augmented Generation Evaluation Toolkit (RAGET): For Retrieval Augmented Generation (RAG) applications, Giskard offers RAGET to automatically generate evaluation datasets and assess the accuracy of RAG responses (see the sketch after this list).
  • Seamless Integration: Giskard is compatible with various models and environments, integrating smoothly with existing tools and workflows.
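
For RAG pipelines, RAGET can bootstrap a synthetic test set directly from your document base. Below is a minimal sketch, assuming your documents live in a pandas DataFrame with a "text" column; the KnowledgeBase constructor and generate_testset parameters follow the RAGET documentation, but check your installed version for the exact signatures.

import pandas as pd
from giskard.rag import KnowledgeBase, generate_testset

# Assumption: one document per row in the "text" column
docs = pd.DataFrame({"text": ["Giskard is an open-source ML testing framework.",
                              "RAGET generates evaluation datasets for RAG apps."]})
knowledge_base = KnowledgeBase.from_pandas(docs, columns=["text"])

# Generate question/answer pairs grounded in the knowledge base
testset = generate_testset(
    knowledge_base,
    num_questions=10,
    agent_description="A chatbot answering questions about Giskard",
)
testset.save("raget_testset.jsonl")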

Benefits of Giskard

  • Comprehensive Model Evaluation: Beyond traditional metrics, Giskard assesses models for fairness, robustness, and explainability, ensuring a holistic evaluation.
  • Built-in Bias Detection: Giskard identifies and helps mitigate biases across different demographic or categorical groups, promoting equitable model performance.
  • Automated Testing Pipelines: It integrates seamlessly with Continuous Integration/Continuous Deployment (CI/CD) workflows, enabling continuous testing and monitoring of models.
  • Interpretability: Giskard provides insights into model predictions, enhancing transparency and trust in AI systems.

Getting Started with Giskard

Installation and Setup

To install Giskard, ensure you have Python 3.9, 3.10, or 3.11. Use pip to install the latest version:

pip install "giskard[llm]" -U

This command installs Giskard together with the optional extras needed to scan LLM-based models.
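
Note that Giskard's LLM-assisted detectors need access to an LLM provider. One common setup is sketched below; the exact configuration call and model name depend on your Giskard version and provider, so treat these as assumptions and check the documentation.

import os
import giskard

os.environ["OPENAI_API_KEY"] = "sk-..."   # placeholder key
giskard.llm.set_llm_model("gpt-4o-mini")  # assumption: any supported provider/model name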

First Steps

  1. Import the Library: Begin by importing Giskard into your Python environment:
import giskard
  2. Wrap Your Model and Dataset: Recent Giskard releases (2.x) use the generic giskard.Model and giskard.Dataset wrappers rather than framework-specific classes, and a local scan needs no server-side project.
  • For Scikit-Learn Models:
import giskard

# your_model: a fitted scikit-learn estimator
# your_df: a pandas DataFrame that includes the target column
wrapped_model = giskard.Model(model=your_model, model_type="classification")
wrapped_dataset = giskard.Dataset(df=your_df, target="target")
  • For PyTorch Models, wrap a prediction function that maps a pandas DataFrame to class probabilities (this sketch assumes your_model returns logits):
import torch
import pandas as pd
import giskard

def predict_proba(df: pd.DataFrame):
    # Convert the incoming DataFrame to a tensor and return class probabilities
    batch = torch.tensor(df.values, dtype=torch.float32)
    with torch.no_grad():
        return torch.softmax(your_model(batch), dim=1).numpy()

wrapped_model = giskard.Model(
    model=predict_proba,
    model_type="classification",
    classification_labels=[0, 1],   # adapt to your classes
    feature_names=feature_columns,  # the feature columns your model consumes
)
wrapped_dataset = giskard.Dataset(df=your_df, target="target")
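
Once the model and dataset are wrapped, a single call runs the full scan:

# Scan the wrapped model for performance, robustness, and bias issues
results = giskard.scan(wrapped_model, wrapped_dataset)
print(results)  # or results.to_html("scan_report.html") for a shareable report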

Step-by-Step Example: Building and Testing a Simple Classifier

Here’s a practical example of using Giskard with a logistic regression model for binary classification.

Step 1: Train a Simple Model

from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.datasets import load_iris
import pandas as pd

# Load the dataset
iris = load_iris()
data = pd.DataFrame(iris.data, columns=iris.feature_names)
labels = (iris.target == 0).astype(int)  # Binary classification for class 0

# Split into training and testing
X_train, X_test, y_train, y_test = train_test_split(data, labels, test_size=0.2, random_state=42)

# Train a logistic regression model
model = LogisticRegression()
model.fit(X_train, y_train)

Step 2: Initialize Giskard

import giskard

# Giskard expects a single DataFrame that contains the target column
test_df = X_test.copy()
test_df["target"] = y_test

# Wrap the model and dataset for Giskard
wrapped_model = giskard.Model(model=model, model_type="classification")
dataset = giskard.Dataset(df=test_df, target="target")

Step 3: Bias Detection

Detect biases in the model’s predictions.

import giskard

# Run only the bias-related detectors locally; no server upload is required.
# ("ethical_bias" is the detector name used in Giskard's scan documentation;
# accepted tags can vary across versions, so check the giskard.scan docs.)
bias_report = giskard.scan(wrapped_model, dataset, only="ethical_bias")

# Print bias results
print(bias_report)

(If you run the separate Giskard Hub, models, datasets, and scan reports can also be uploaded there for team review; see the Hub documentation.)

Step 4: Quality Testing

Run automated quality tests to ensure the model meets performance standards.

# Run performance tests
performance_report = giskard.scan(wrapped_model, dataset, only="performance")

# Display quality metrics
print(performance_report)

Step 5: Generate a Report

Create a comprehensive report to share with stakeholders.

# Run the full scan and export a shareable HTML report
full_report = giskard.scan(wrapped_model, dataset)
full_report.to_html("giskard_report.html")
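
The same report can be converted into a reusable test suite, which is convenient for regression testing:

# Turn the scan findings into a test suite and re-run it on demand
test_suite = full_report.generate_test_suite("Iris scan suite")
test_suite.run()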

Advanced Features and Customization

Giskard offers a range of advanced functionalities that allow for tailored testing and evaluation of machine learning models.

1. Customized Scan Configurations:

Giskard's scanning capabilities can be fine-tuned to focus on specific aspects of your model.

  • Limiting to Specific Detectors:

If you want to run only a specific detector or a group of detectors, you can use the only argument.

import giskard as gsk

# Run only robustness detectors
report = gsk.scan(my_model, my_dataset, only="robustness")

# Run multiple detectors
report = gsk.scan(my_model, my_dataset, only=["robustness", "performance"])

  • Focusing on Selected Features:

To limit the scan to specific features of your model, use the features argument.

report = gsk.scan(my_model, my_dataset, features=["feature_1", "feature_2"])

  • Advanced Detector Configuration:

Customize the configuration of specific detectors using the params argument.

params = {
    "performance_bias": {"threshold": 0.04, "metrics": ["accuracy", "f1"]},
    "ethical_bias": {"output_sensitivity": 0.5},
}
report = gsk.scan(my_model, my_dataset, params=params)

These configurations allow for a more focused and efficient evaluation process.

2. Creating Custom Tests:

Beyond built-in tests, Giskard enables the creation of custom tests to address domain-specific requirements.

  • Defining a Custom Test:

You can create a custom test by decorating a Python function with the @test decorator, imported from the giskard package root.

from giskard import test, TestResult, Model, Dataset

@test(name="Custom Test for Feature Range")
def test_feature_range(model: Model, dataset: Dataset,
                       column: str = "sepal length (cm)",
                       low: float = 0.0, high: float = 10.0):
    # Pass when every value of the column falls inside [low, high]
    in_range = dataset.df[column].between(low, high).mean()
    return TestResult(passed=bool(in_range == 1.0), metric=float(in_range))

This approach allows you to incorporate domain knowledge into your testing framework.
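
A decorated test can also be executed on its own, outside a suite: calling it returns a parametrized test whose execute() method runs it immediately.

# Run the custom test directly on the wrapped model and dataset from earlier steps
result = test_feature_range(model=wrapped_model, dataset=dataset).execute()
print(result.passed, result.metric)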

3. Integration with CI/CD Pipelines:

Integrating Giskard into your Continuous Integration/Continuous Deployment (CI/CD) pipelines ensures continuous monitoring and validation of your models.

  • Automated Testing:

Set up automated tests that run during your CI/CD processes to catch issues early.

Example with GitHub Actions:

Create a workflow that installs dependencies and runs Giskard tests.

name: Giskard Tests

on: [push, pull_request]

jobs:
  test:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - name: Set up Python
        uses: actions/setup-python@v5
        with:
          python-version: '3.9'
      - name: Install dependencies
        run: |
          pip install giskard
          pip install -r requirements.txt  # your project's other dependencies
      - name: Run Giskard tests
        run: |
          python run_giskard_tests.py  # hypothetical driver script, sketched below
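
The workflow above assumes a small driver script. Here is a minimal sketch of one; the file name and the two loader helpers are hypothetical placeholders for your own code.

# run_giskard_tests.py -- fail the CI job when the scan finds issues
import sys
import giskard

from my_project import load_model, load_dataset  # hypothetical helpers

wrapped_model = load_model()  # returns a wrapped giskard.Model
dataset = load_dataset()      # returns a giskard.Dataset

results = giskard.scan(wrapped_model, dataset)
results.to_html("scan_report.html")  # can be stored as a CI artifact

sys.exit(1 if results.has_issues() else 0)
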
4. Test Suites and Data Slicing:

Giskard allows the creation of test suites and data slices to evaluate model performance on specific subsets of data.

  • Creating Data Slices:

Define slicing functions to focus tests on particular segments of your dataset.

import pandas as pd
from giskard import slicing_function

@slicing_function(name="High Income Group")
def high_income_slice(row: pd.Series) -> bool:
    return row["income"] > 50000

  • Building Test Suites:

Combine multiple tests into a suite for comprehensive evaluation.

from giskard import Suite

# Register tests on the suite; chain further .add_test(...) calls to combine
# multiple tests. The model and dataset are supplied at run time.
suite = Suite(name="Income Model Test Suite").add_test(test_feature_range())
results = suite.run(model=wrapped_model, dataset=dataset)
print(results.passed)

This structured approach ensures thorough testing across various data segments.
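
Slices and tests compose naturally. For instance, a built-in performance test can be evaluated on the high-income slice alone (assuming the dataset contains the "income" column used above):

from giskard import testing

# Evaluate accuracy on the high-income subset only
high_income_ds = dataset.slice(high_income_slice)
result = testing.test_accuracy(model=wrapped_model, dataset=high_income_ds, threshold=0.8).execute()
print(result.passed)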

By leveraging these advanced features, Giskard empowers data scientists and machine learning engineers to conduct thorough and customized evaluations of their models, ensuring robustness, fairness, and reliability in AI applications.

Real-World Applications

  1. L'Oréal's AI Model Evaluation:
    • L'Oréal collaborated with Giskard to enhance their facial landmark detection models. By evaluating multiple models under diverse conditions, Giskard helped ensure reliable and inclusive predictions across different user demographics, improving the accuracy and robustness of L'Oréal's digital services.
  2. Citibeats' Ethical AI Implementation:
    • Citibeats utilized Giskard to test their Natural Language Processing (NLP) models for ethical biases. This proactive approach allowed them to identify and mitigate potential biases, maintaining trust with their clients and the public.

Best Practices for Utilizing Giskard

  • Regular Testing: Incorporate Giskard's testing routines into the regular development cycle to promptly identify and address issues.
  • Collaborative Evaluation: Use Giskard's reporting features to facilitate discussions among stakeholders, ensuring that model evaluations consider diverse perspectives.
  • Stay Informed: Keep abreast of updates to Giskard and its documentation to leverage new features and improvements.

By delving into these advanced features and applications, teams can harness Giskard's full potential, ensuring that AI models are not only effective but also fair, secure, and aligned with ethical standards.

Final thoughts

In conclusion, Giskard offers a comprehensive, open-source framework for evaluating and enhancing the quality, fairness, and security of machine learning models. By integrating Giskard into your ML workflows, you can proactively detect and mitigate biases, validate performance across diverse scenarios, and ensure your models adhere to ethical standards. The platform's user-friendly interface and robust reporting tools facilitate collaboration among stakeholders, promoting transparency and trust in AI systems. As AI continues to play a pivotal role in decision-making processes, leveraging tools like Giskard becomes essential for developing responsible and reliable AI applications.

Cohorte Team

January 27, 2025