Ensuring AI Quality and Fairness with Giskard’s Testing Framework
As artificial intelligence continues to impact decision-making processes across industries, ensuring the quality and fairness of AI models is critical. Giskard’s testing framework empowers teams to assess and improve the performance and fairness of machine learning (ML) models effectively. This article outlines Giskard’s framework, its benefits, and provides a detailed, step-by-step guide to get started with Giskard.
What is Giskard?
Giskard is an open-source Python library designed to automatically detect performance, bias, and security issues in AI applications. It caters to a wide range of models, including those based on Large Language Models (LLMs) and traditional ML models for tabular data.
Key Features:
- Automated Vulnerability Detection: Giskard scans models to identify issues such as hallucinations, harmful content generation, prompt injection, robustness problems, sensitive information disclosure, and discrimination.
- Retrieval Augmented Generation Evaluation Toolkit (RAGET): For Retrieval Augmented Generation applications, Giskard offers RAGET to automatically generate evaluation datasets and assess the accuracy of RAG application responses.
- Seamless Integration: Giskard is compatible with various models and environments, integrating smoothly with existing tools and workflows.
Benefits of Giskard
- Comprehensive Model Evaluation: Beyond traditional metrics, Giskard assesses models for fairness, robustness, and explainability, ensuring a holistic evaluation.
- Built-in Bias Detection: Giskard identifies and helps mitigate biases across different demographic or categorical groups, promoting equitable model performance.
- Automated Testing Pipelines: It integrates seamlessly with Continuous Integration/Continuous Deployment (CI/CD) workflows, enabling continuous testing and monitoring of models.
- Interpretability: Giskard provides insights into model predictions, enhancing transparency and trust in AI systems.
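To make the bias-detection idea concrete, here is a hand-rolled sketch of the per-slice performance comparison that Giskard automates, using only scikit-learn and pandas; the slicing feature and the two bins are arbitrary choices for illustration:

```python
import pandas as pd
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score

# Binary target: setosa vs. the rest
iris = load_iris(as_frame=True)
X = iris.data
y = (iris.target == 0).astype(int)

model = LogisticRegression(max_iter=1000).fit(X, y)

# Compare accuracy across two slices of a feature: if one slice scores much
# worse than the other, the model performs unevenly across subgroups.
df = X.copy()
df["target"] = y
df["pred"] = model.predict(X)
df["slice"] = pd.cut(df["sepal length (cm)"], bins=2, labels=["short", "long"])

slice_accuracy = {
    name: accuracy_score(group["target"], group["pred"])
    for name, group in df.groupby("slice", observed=True)
}
print(slice_accuracy)
```

Giskard generalizes this pattern: it searches many candidate slices automatically and flags the ones where performance degrades significantly.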
Getting Started
Installation and Setup
To install Giskard, ensure you have Python 3.9, 3.10, or 3.11, then install the latest version with pip:
pip install giskard
This installs the core library for tabular and NLP models. To also scan LLM-based applications, install the LLM extra:
pip install "giskard[llm]"
First Steps
- Import the Library: Start by importing Giskard into your Python environment.
import giskard
- Wrap Your Model and Dataset: Giskard evaluates models through lightweight wrappers around your trained model and a pandas DataFrame that includes the target column.
import pandas as pd
import giskard
# Example: wrapping a scikit-learn classifier and a pandas DataFrame
# (your_model and your_dataframe stand in for your own objects)
wrapped_model = giskard.Model(model=your_model, model_type="classification")
wrapped_dataset = giskard.Dataset(your_dataframe, target="target")
Step-by-Step Example: Building and Testing a Simple Model
Let's walk through an example of using Giskard with a logistic regression model for binary classification.
Step 1: Train a Simple Model
We'll use the Iris dataset for this example, focusing on binary classification.
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.datasets import load_iris
# Load the dataset
iris = load_iris()
data = iris.data
labels = (iris.target == 0).astype(int) # Binary classification for class 0
# Split into training and testing
X_train, X_test, y_train, y_test = train_test_split(data, labels, test_size=0.2, random_state=42)
# Train a logistic regression model
model = LogisticRegression()
model.fit(X_train, y_train)
Step 2: Initialize Giskard
Wrap the trained model and the test data for Giskard. Note that the target column must be included in the DataFrame passed to Dataset.
import pandas as pd
import giskard
# Build a DataFrame with the test features and the target column
df_test = pd.DataFrame(X_test, columns=iris.feature_names)
df_test["target"] = y_test
# Wrap the model and dataset for Giskard
wrapped_model = giskard.Model(model=model, model_type="classification")
dataset = giskard.Dataset(df_test, target="target")
Step 3: Bias Detection
Giskard's automated scan checks the model for vulnerabilities, including performance bias across data slices.
import giskard
# Run the automated scan (performance bias, robustness, spurious correlations, ...)
scan_results = giskard.scan(wrapped_model, dataset)
# Display the results in a notebook, or export them to HTML
print(scan_results)
This scan will identify potential biases and other issues in the model.
Step 4: Quality Testing
Run automated quality tests to ensure the model meets performance standards.
from giskard import Suite
from giskard.testing import test_accuracy, test_f1
# Build a small test suite with explicit performance thresholds
suite = (
    Suite()
    .add_test(test_accuracy(model=wrapped_model, dataset=dataset, threshold=0.9))
    .add_test(test_f1(model=wrapped_model, dataset=dataset, threshold=0.9))
)
suite_results = suite.run()
# Display the overall outcome
print("Suite passed:", suite_results.passed)
This process validates the model's performance across various scenarios.
Step 5: Generate a Report
Create a comprehensive report to share with stakeholders.
import giskard
# Run the scan and export a standalone HTML report to share
giskard.scan(wrapped_model, dataset).to_html("giskard_report.html")
Advanced Features of Giskard
Beyond the basics covered above, Giskard offers advanced features, real-world applications, and best practices that help teams get the most from the framework.
- Customized Test Suites:
- Giskard allows the creation of tailored test suites to address specific challenges in AI models, such as misinformation prevention, harmful content detection, and prompt injection assessment. This customization ensures that models are evaluated against criteria most relevant to their intended applications.
- Continuous Integration/Continuous Deployment (CI/CD) Integration:
- Integrating Giskard into CI/CD pipelines facilitates continuous monitoring and validation of models. This ensures that any updates or changes to the model are automatically tested, maintaining consistent quality and performance standards.
- Advanced Scan Customization:
- Users can fine-tune the scanning process by selecting specific detectors, limiting the scan to particular features, and adjusting parameters like minimum slice size. This level of control enables more focused and efficient testing.
Real-World Applications
- L'Oréal's AI Model Evaluation:
- L'Oréal collaborated with Giskard to enhance their facial landmark detection models. By evaluating multiple models under diverse conditions, Giskard helped ensure reliable and inclusive predictions across different user demographics, improving the accuracy and robustness of L'Oréal's digital services.
- Citibeats' Ethical AI Implementation:
- Citibeats utilized Giskard to test their Natural Language Processing (NLP) models for ethical biases. This proactive approach allowed them to identify and mitigate potential biases, maintaining trust with their clients and the public.
Best Practices for Utilizing Giskard
- Regular Testing: Incorporate Giskard's testing routines into the regular development cycle to promptly identify and address issues.
- Collaborative Evaluation: Use Giskard's reporting features to facilitate discussions among stakeholders, ensuring that model evaluations consider diverse perspectives.
- Stay Informed: Keep abreast of updates to Giskard and its documentation to leverage new features and improvements.
By delving into these advanced features and applications, teams can harness Giskard's full potential, ensuring that AI models are not only effective but also fair, secure, and aligned with ethical standards.
Final Thoughts
Giskard stands out as a comprehensive solution for ensuring the quality, fairness, and security of AI models. Its open-source nature, combined with features like automated vulnerability detection, bias mitigation, and seamless integration into existing workflows, makes it an invaluable tool for data scientists and AI practitioners. By adopting Giskard, organizations can proactively address potential issues, foster collaboration among stakeholders, and align their AI systems with ethical standards and regulatory requirements. As the AI landscape continues to evolve, leveraging tools like Giskard will be essential in building trustworthy and responsible AI applications.
You can also check out our previous article on using Giskard for testing RAG systems.
Cohorte Team
January 24, 2025