Documentation

Everything you need to detect, track, and eliminate flaky tests, from installation to CI/CD integration and dashboard setup.

Installation

Install the DeFlaky CLI globally with npm. Requires Node.js 18 or later.

Terminal

$ npm install -g deflaky-cli

# Verify installation

$ deflaky-cli --version

You can also use npx to run without installing globally:

$ npx deflaky-cli --help

Quick Start

Wrap your existing test command with deflaky-cli run. DeFlaky runs the command multiple times and compares the results across runs: a test that passes in some runs and fails in others is reported as flaky.

Terminal

# Run Playwright tests 5 times (default)

$ deflaky-cli run -- npx playwright test

# Run 10 times with a custom threshold

$ deflaky-cli run --runs 10 --threshold 95 -- npx playwright test
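To make "comparing results" concrete, here is a minimal TypeScript sketch of the idea. It is not DeFlaky's actual implementation: the RunResult shape and the classifyTests function are hypothetical names, and the sketch assumes each run can be reduced to a pass/fail outcome per test name.

```typescript
// Hypothetical data model, for illustration only.
type Outcome = "pass" | "fail";
type RunResult = Record<string, Outcome>; // test name -> outcome in one run
type Verdict = "stable-pass" | "stable-fail" | "flaky";

// Count passes per test across all runs, then compare against the run total.
function classifyTests(runs: RunResult[]): Record<string, Verdict> {
  const passes: Record<string, number> = {};
  const totals: Record<string, number> = {};
  for (const run of runs) {
    for (const name of Object.keys(run)) {
      totals[name] = (totals[name] || 0) + 1;
      passes[name] = (passes[name] || 0) + (run[name] === "pass" ? 1 : 0);
    }
  }
  const verdicts: Record<string, Verdict> = {};
  for (const name of Object.keys(totals)) {
    if (passes[name] === totals[name]) verdicts[name] = "stable-pass";
    else if (passes[name] === 0) verdicts[name] = "stable-fail";
    else verdicts[name] = "flaky"; // mixed outcomes across runs
  }
  return verdicts;
}

const exampleVerdicts = classifyTests([
  { "login works": "pass", "checkout total": "pass", "broken test": "fail" },
  { "login works": "pass", "checkout total": "fail", "broken test": "fail" },
  { "login works": "pass", "checkout total": "pass", "broken test": "fail" },
]);
console.log(exampleVerdicts["checkout total"]); // flaky
```

Note that a test failing in every run is consistently broken rather than flaky, which is why the sketch keeps a separate "stable-fail" verdict.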

Basic Usage & Flags

The most common flags you will use day to day:

Terminal

# Specify number of runs

$ deflaky-cli run --runs 5 -- npx playwright test

# Set flakiness threshold (fail if FlakeScore is below 90)

$ deflaky-cli run --threshold 90 -- npx playwright test

# Output as JSON

$ deflaky-cli run --format json -- npx playwright test

# Output as JUnit XML

$ deflaky-cli run --format junit -- pytest

# Save report to a file

$ deflaky-cli run --format json --output report.json -- npx jest

CLI Commands

deflaky-cli run

Run your test command N times and detect flaky tests. This is the primary command you will use.

$ deflaky-cli run --runs 5 -- npx playwright test

deflaky-cli push

Push a previously generated report to the DeFlaky dashboard. Useful when you want to separate detection from reporting.

$ deflaky-cli push --file report.json --token df_abc123

deflaky-cli config

View or update your local DeFlaky configuration. Settings are stored in a .deflaky.config.json file in your project root.

$ deflaky-cli config set runs 10

Flags & Options

Flag	Description	Default
--runs, -r	Number of test iterations	5
--threshold, -t	Minimum FlakeScore to pass (0-100)	disabled
--format, -f	Report format: json, junit, auto	auto
--output, -o	Save report to file	stdout
--push	Push results to the dashboard after run	false
--token	Dashboard API token	$DEFLAKY_API_TOKEN
--project	Project slug for dashboard	auto-detected
--verbose	Show detailed output per run	false
--parallel	Max parallel test executions	1
--fail-on-flaky	Exit with code 1 if any flaky test found	false
--help, -h	Show help	—
--version, -v	Show CLI version	—

Environment Variables

Environment variables can be used instead of CLI flags. Flags always take precedence over environment variables.

Variable	Description
DEFLAKY_API_TOKEN	Dashboard API token for authentication
DEFLAKY_API_URL	Custom API endpoint (self-hosted instances)
DEFLAKY_RUNS	Default number of test iterations
DEFLAKY_THRESHOLD	Default FlakeScore threshold
DEFLAKY_FORMAT	Default report format
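The precedence rule can be pictured as a first-defined-wins lookup. The sketch below is illustrative only: resolveOption and parseRuns are hypothetical helpers, and ranking the config file below environment variables is an assumption, since this page only states that flags beat environment variables.

```typescript
// Hypothetical helper: returns the first defined value in precedence order.
// Only "flag beats environment variable" is stated in the docs; the position
// of the config file in this chain is an assumption.
function resolveOption<T>(
  flag: T | undefined,
  env: T | undefined,
  config: T | undefined,
  fallback: T
): T {
  if (flag !== undefined) return flag;
  if (env !== undefined) return env;
  if (config !== undefined) return config;
  return fallback;
}

// Environment variables are strings, so DEFLAKY_RUNS must be parsed first.
// Anything that is not a positive integer is treated as unset.
function parseRuns(raw: string | undefined): number | undefined {
  if (raw === undefined || raw.trim() === "") return undefined;
  const n = Number(raw);
  return n > 0 && n % 1 === 0 ? n : undefined;
}

// --runs 10 on the command line, DEFLAKY_RUNS=3 in the environment:
const effectiveRuns = resolveOption<number>(10, parseRuns("3"), 5, 5);
console.log(effectiveRuns); // 10
```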

Playwright

Playwright is a first-class citizen in DeFlaky. Results are parsed automatically from Playwright's built-in JSON reporter.

Terminal

# Basic detection

$ deflaky-cli run -- npx playwright test

# Run specific test file 10 times

$ deflaky-cli run --runs 10 -- npx playwright test tests/login.spec.ts

# With a specific project (e.g. chromium only)

$ deflaky-cli run -- npx playwright test --project=chromium

# Push results to dashboard

$ deflaky-cli run --push -- npx playwright test

Cypress

DeFlaky works with Cypress in headless mode. Make sure you are using cypress run (not cypress open).

Terminal

# Detect flaky Cypress tests

$ deflaky-cli run -- npx cypress run

# Specific spec file

$ deflaky-cli run --runs 5 -- npx cypress run --spec cypress/e2e/checkout.cy.ts

# With a specific browser

$ deflaky-cli run -- npx cypress run --browser chrome

Selenium (Java / Maven)

For Java-based Selenium projects using Maven and JUnit/TestNG, DeFlaky parses the Surefire XML reports automatically.

Terminal

# Run Maven tests

$ deflaky-cli run -- mvn test

# Specific test class

$ deflaky-cli run -- mvn test -Dtest="LoginTest"

# With Gradle

$ deflaky-cli run -- gradle test

Jest

DeFlaky supports Jest out of the box. Use the --forceExit flag if your Jest tests hang after completion.

Terminal

# Run all Jest tests

$ deflaky-cli run -- npx jest

# Specific test file

$ deflaky-cli run --runs 10 -- npx jest src/__tests__/api.test.ts

# With coverage disabled for speed

$ deflaky-cli run -- npx jest --no-coverage

Pytest

For Python projects, DeFlaky wraps your pytest command and parses JUnit XML output.

Terminal

# Run all pytest tests

$ deflaky-cli run -- pytest

# Specific test module

$ deflaky-cli run --runs 5 -- pytest tests/test_auth.py

# With JUnit XML output for richer reports

$ deflaky-cli run -- pytest --junitxml=report.xml

# Run in verbose mode

$ deflaky-cli run -- pytest -v

Mocha

Mocha tests work seamlessly with DeFlaky. Use the --exit flag to ensure Mocha exits cleanly.

Terminal

# Run all Mocha tests

$ deflaky-cli run -- npx mocha

# With specific test directory

$ deflaky-cli run -- npx mocha "test/**/*.spec.js" --exit

# With TypeScript

$ deflaky-cli run -- npx mocha --require ts-node/register 'test/**/*.spec.ts'

TestNG

For TestNG projects with Maven, point Surefire to your testng.xml suite file.

Terminal

# Run TestNG suite

$ deflaky-cli run -- mvn test -Dsurefire.suiteXmlFiles=testng.xml

# Specific test group

$ deflaky-cli run -- mvn test -Dgroups="smoke"

GitHub Actions

Add a flaky test check to every pull request. The workflow installs DeFlaky, runs your tests multiple times, and fails the check if the FlakeScore drops below your threshold.

For the complete guide with PR comments, framework-specific examples, reusable workflows, and troubleshooting, see the dedicated GitHub Actions documentation.

.github/workflows/flaky-check.yml

name: Flaky Test Check

on: [pull_request]

jobs:
  flaky-check:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4

      - name: Setup Node.js
        uses: actions/setup-node@v4
        with:
          node-version: 20

      - name: Install dependencies
        run: npm ci

      - name: Install Playwright browsers
        run: npx playwright install --with-deps

      - name: Run DeFlaky
        run: npx deflaky-cli run --runs 3 --threshold 90 --push -- npx playwright test
        env:
          DEFLAKY_API_TOKEN: ${{ secrets.DEFLAKY_API_TOKEN }}

GitLab CI

.gitlab-ci.yml

flaky_check:
  stage: test
  image: mcr.microsoft.com/playwright:v1.44.0-jammy
  script:
    - npm ci
    - npx deflaky-cli run --runs 3 --threshold 90 --push -- npx playwright test
  variables:
    DEFLAKY_API_TOKEN: $DEFLAKY_API_TOKEN
  only:
    - merge_requests

Jenkins Pipeline

Jenkinsfile

pipeline {
  agent { docker { image 'node:20' } }
  environment {
    DEFLAKY_API_TOKEN = credentials('deflaky-token')
  }
  stages {
    stage('Install') {
      steps {
        sh 'npm ci'
      }
    }
    stage('Flaky Check') {
      steps {
        sh 'npx deflaky-cli run --runs 3 --threshold 90 --push -- npx playwright test'
      }
    }
  }
}

Generic CI Setup

DeFlaky works in any CI environment that supports Node.js. The general pattern is:

Any CI

# 1. Install your project dependencies

$ npm ci

# 2. Run DeFlaky with your test command

$ npx deflaky-cli run --runs 3 --threshold 90 --push -- <your-test-command>

# Make sure DEFLAKY_API_TOKEN is set in your CI environment

# DeFlaky exits with code 1 if FlakeScore is below --threshold
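The exit-code behavior that CI relies on can be summarized in a few lines. This is a sketch under stated assumptions, not the CLI's source: GateOptions and exitCodeFor are hypothetical names, a score exactly equal to the threshold is assumed to pass (the threshold is described as the minimum score to pass), and other failure causes such as a crashing test command are not modeled.

```typescript
// Hypothetical shape mirroring the --threshold and --fail-on-flaky flags.
interface GateOptions {
  threshold?: number; // undefined means the threshold check is disabled
  failOnFlaky: boolean;
}

// Returns 1 when either gate trips, otherwise 0.
function exitCodeFor(
  flakeScore: number,
  flakyCount: number,
  opts: GateOptions
): 0 | 1 {
  if (opts.threshold !== undefined && flakeScore < opts.threshold) return 1;
  if (opts.failOnFlaky && flakyCount > 0) return 1;
  return 0;
}

console.log(exitCodeFor(92, 2, { threshold: 90, failOnFlaky: false })); // 0
console.log(exitCodeFor(92, 2, { threshold: 90, failOnFlaky: true })); // 1
console.log(exitCodeFor(85, 0, { threshold: 90, failOnFlaky: false })); // 1
```

The two gates are independent: a suite can clear the threshold and still fail the build when --fail-on-flaky is set and at least one flaky test was found.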

Connecting CLI to Dashboard

The DeFlaky dashboard gives you a visual overview of your test suite's reliability over time. To connect the CLI, you need an API token.

Terminal

# Option 1: Pass token as a flag

$ deflaky-cli run --push --token df_abc123 -- npx playwright test

# Option 2: Set as environment variable (recommended for CI)

$ export DEFLAKY_API_TOKEN=df_abc123

$ deflaky-cli run --push -- npx playwright test

# Option 3: Save in config file

$ deflaky-cli config set token df_abc123

$ deflaky-cli run --push -- npx playwright test

Creating Projects & API Tokens

Each project in the dashboard has its own API token. To create a new project:

1. Go to the Dashboard and sign in.
2. Click New Project and enter a name.
3. Copy the generated API token. This is your DEFLAKY_API_TOKEN.
4. Store the token securely in your CI secrets (never commit it to source control).

Understanding FlakeScore

FlakeScore is a 0-100 metric that represents the overall reliability of your test suite:

Score	Rating	Meaning
95 - 100	Excellent	Minimal flakiness. Ship with confidence.
80 - 94	Needs Attention	Some flaky tests. Prioritize fixing them.
0 - 79	Critical	High flakiness. Tests are unreliable.

The score is calculated as: (stable tests / total tests) * 100, weighted by run count and recency.
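As a worked example of the base formula, the TypeScript sketch below computes the unweighted score and maps it to a rating. It deliberately leaves out the run-count and recency weighting, which this page does not specify. The function names are hypothetical, and treating every score below 95 as "Needs Attention" (including fractional values such as 94.5) is an assumption about how the band edges are handled.

```typescript
type Band = "Excellent" | "Needs Attention" | "Critical";

// Unweighted base formula only: (stable tests / total tests) * 100.
// Multiplying before dividing keeps the example free of floating-point noise.
function baseFlakeScore(stableTests: number, totalTests: number): number {
  if (totalTests <= 0) throw new Error("totalTests must be positive");
  return (stableTests * 100) / totalTests;
}

// Band edges follow the ratings listed above.
function bandFor(score: number): Band {
  if (score >= 95) return "Excellent";
  if (score >= 80) return "Needs Attention";
  return "Critical";
}

// 47 of 50 tests behaved consistently across runs.
const exampleScore = baseFlakeScore(47, 50);
console.log(`${exampleScore} -> ${bandFor(exampleScore)}`); // 94 -> Needs Attention
```

With a threshold of 95, this suite would fail the check; with a threshold of 90, it would pass.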

Viewing Flaky Test History

The dashboard tracks every test run and shows you trends over time. You can filter by date range, test name, or status. Each flaky test entry shows:

Test name and file path
Pass rate across runs (e.g., 3/5 passed)
First seen and last seen dates
FlakeScore trend (improving or degrading)
Stack traces from failed runs

AI Root Cause Analysis (Pro)

Available on the Pro plan, AI Root Cause Analysis automatically analyzes your flaky test failures and sorts each one into one of four categories:

Infrastructure — network timeouts, resource limits, environment drift
Application bug — race conditions, state leaks, timing issues
Test code — poor selectors, missing waits, shared state between tests
Non-deterministic — random data, date/time dependencies, order-dependent tests

The AI also suggests concrete fixes with code examples. Bring your own API key from Anthropic, OpenAI, Groq, OpenRouter, or Ollama.

Configuration File

Create a .deflaky.config.json file in your project root to set default options; the CLI picks it up automatically. If you store a token in this file, keep the file out of source control.

.deflaky.config.json

{
  "runs": 5,
  "threshold": 90,
  "format": "json",
  "push": true,
  "token": "df_abc123",
  "project": "my-app",
  "apiUrl": "https://api.deflaky.com",
  "parallel": 1,
  "verbose": false,
  "failOnFlaky": false
}

You can also generate this file interactively:

$ deflaky-cli config init

Available Options

Key	Type	Description	Default
runs	number	Number of test iterations	5
threshold	number	Minimum FlakeScore to pass (0-100)	disabled
format	string	Report format: json, junit, auto	auto
push	boolean	Push results to dashboard	false
token	string	Dashboard API token	—
project	string	Project slug for dashboard	auto-detected
apiUrl	string	Custom API endpoint	https://api.deflaky.com
parallel	number	Max parallel test executions	1
verbose	boolean	Show detailed output	false
failOnFlaky	boolean	Exit code 1 if any flaky test found	false
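If you generate or edit the config file from a script, the option table translates directly into a type. The sketch below is illustrative: DeflakyConfig and validateConfig are hypothetical names, not something the CLI exports, and the rule that runs and parallel must be positive integers is an assumption. Only the 0-100 range for threshold and the three format values come from the table.

```typescript
// Mirrors the options table; every key is optional because each has a default.
interface DeflakyConfig {
  runs?: number;
  threshold?: number;
  format?: "json" | "junit" | "auto";
  push?: boolean;
  token?: string;
  project?: string;
  apiUrl?: string;
  parallel?: number;
  verbose?: boolean;
  failOnFlaky?: boolean;
}

// Returns a list of problems; an empty list means no issue was found.
function validateConfig(cfg: DeflakyConfig): string[] {
  const problems: string[] = [];
  const isPositiveInt = (n: number): boolean => n > 0 && n % 1 === 0;
  if (cfg.runs !== undefined && !isPositiveInt(cfg.runs)) {
    problems.push("runs must be a positive integer");
  }
  if (cfg.parallel !== undefined && !isPositiveInt(cfg.parallel)) {
    problems.push("parallel must be a positive integer");
  }
  if (cfg.threshold !== undefined && (cfg.threshold < 0 || cfg.threshold > 100)) {
    problems.push("threshold must be between 0 and 100");
  }
  return problems;
}

console.log(validateConfig({ runs: 5, threshold: 90, format: "json" }).length); // 0
console.log(validateConfig({ runs: 0, threshold: 120 }).length); // 2
```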

Need help? Open an issue on GitHub or email support@deflaky.com.