Documentation
Everything you need to detect, track, and eliminate flaky tests, from installation to CI/CD integration and dashboard setup.
Installation
Install the DeFlaky CLI globally with npm. Requires Node.js 18 or later.
$ npm install -g deflaky-cli
# Verify installation
$ deflaky-cli --version
You can also use npx to run without installing globally:
$ npx deflaky-cli --help
Quick Start
Wrap your existing test command with deflaky-cli run. DeFlaky runs it multiple times and compares the results across runs: a test that passes in some runs and fails in others is flagged as flaky.
# Run Playwright tests 5 times (default)
$ deflaky-cli run -- npx playwright test
# Run 10 times with a custom threshold
$ deflaky-cli run --runs 10 --threshold 95 -- npx playwright test
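The comparison step can be pictured with a small sketch. It assumes each run has already been reduced to a map from a test ID to "passed" or "failed". A test is labeled stable if it passed every run, flaky if it both passed and failed, and failing if it never passed. The names `classifyRuns` and `TestVerdict` are hypothetical, this is not DeFlaky's internal implementation, and how DeFlaky treats consistently failing or skipped tests is not covered here.

```typescript
type RunOutcome = "passed" | "failed";
type Verdict = "stable" | "flaky" | "failing";

interface TestVerdict {
  verdict: Verdict;
  passed: number; // runs in which the test passed
  total: number; // runs in which the test appeared
}

// Hypothetical helper: runs[i] maps each test ID to its outcome in run i.
function classifyRuns(runs: Array<Map<string, RunOutcome>>): Map<string, TestVerdict> {
  // Count passes and appearances per test across all runs.
  const tally = new Map<string, { passed: number; total: number }>();
  for (const run of runs) {
    run.forEach((outcome, testId) => {
      const entry = tally.get(testId) ?? { passed: 0, total: 0 };
      entry.total += 1;
      if (outcome === "passed") entry.passed += 1;
      tally.set(testId, entry);
    });
  }

  // Mixed results across runs mark a test as flaky.
  const verdicts = new Map<string, TestVerdict>();
  tally.forEach(({ passed, total }, testId) => {
    const verdict: Verdict = passed === total ? "stable" : passed === 0 ? "failing" : "flaky";
    verdicts.set(testId, { verdict, passed, total });
  });
  return verdicts;
}

// Usage: three runs of a three-test suite.
const runs = [
  new Map<string, RunOutcome>([["login", "passed"], ["checkout", "passed"], ["search", "failed"]]),
  new Map<string, RunOutcome>([["login", "passed"], ["checkout", "failed"], ["search", "failed"]]),
  new Map<string, RunOutcome>([["login", "passed"], ["checkout", "passed"], ["search", "failed"]]),
];
const verdicts = classifyRuns(runs);
console.log(verdicts.get("checkout")?.verdict); // flaky
```

The `passed` and `total` counts are also what a "3/5 passed" style pass rate is built from.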
Basic Usage & Flags
The most common flags you will use day to day:
# Specify number of runs
$ deflaky-cli run --runs 5 -- npx playwright test
# Set a FlakeScore threshold (fail if the FlakeScore is below 90)
$ deflaky-cli run --threshold 90 -- npx playwright test
# Output as JSON
$ deflaky-cli run --format json -- npx playwright test
# Output as JUnit XML
$ deflaky-cli run --format junit -- pytest
# Save report to a file
$ deflaky-cli run --format json --output report.json -- npx jest
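Repeating a test command comes down to a loop over child processes. The sketch below runs a command a fixed number of times with at most `parallel` runs in flight, which mirrors what the `--runs` and `--parallel` flags describe. It collects each run's exit code using Node's `child_process.spawn`. The function name `runRepeated` is hypothetical. A real runner would also have to keep each run's test report separate, which this sketch does not attempt.

```typescript
import { spawn } from "node:child_process";

// Hypothetical helper: run `command args` `runs` times, keeping at most
// `parallel` processes alive at once. Resolves with exit codes in run order.
async function runRepeated(
  command: string,
  args: string[],
  runs: number,
  parallel = 1,
): Promise<number[]> {
  const exitCodes: number[] = new Array<number>(runs).fill(0);
  let nextRun = 0;

  // Each worker claims the next run index until none are left. The claim
  // (read + increment) is synchronous, so two workers never get the same index.
  const worker = async (): Promise<void> => {
    while (nextRun < runs) {
      const index = nextRun++;
      exitCodes[index] = await new Promise<number>((resolve, reject) => {
        const child = spawn(command, args, { stdio: "inherit" });
        child.on("error", reject);
        // A null exit code means the process was killed by a signal; count it as a failure.
        child.on("close", (code) => resolve(code ?? 1));
      });
    }
  };

  const workerCount = Math.max(1, Math.min(parallel, runs));
  await Promise.all(Array.from({ length: workerCount }, () => worker()));
  return exitCodes;
}

// Usage: run a trivial Node.js command three times, two at a time.
runRepeated(process.execPath, ["-e", "process.exit(0)"], 3, 2).then((codes) => {
  console.log(codes); // [ 0, 0, 0 ]
});
```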
CLI Commands
deflaky-cli run
Run your test command N times and detect flaky tests. This is the primary command you will use.
$ deflaky-cli run --runs 5 -- npx playwright test
deflaky-cli push
Push a previously generated report to the DeFlaky dashboard. Useful when you want to separate detection from reporting.
$ deflaky-cli push --file report.json --token df_abc123
deflaky-cli config
View or update your local DeFlaky configuration. Creates a .deflaky.config.json file in your project root.
$ deflaky-cli config set runs 10
Flags & Options
| Flag | Description | Default |
|---|---|---|
| --runs, -r | Number of test iterations | 5 |
| --threshold, -t | Minimum FlakeScore to pass (0-100) | disabled |
| --format, -f | Report format: json, junit, auto | auto |
| --output, -o | Save report to file | stdout |
| --push | Push results to the dashboard after run | false |
| --token | Dashboard API token | $DEFLAKY_API_TOKEN |
| --project | Project slug for dashboard | auto-detected |
| --verbose | Show detailed output per run | false |
| --parallel | Max parallel test executions | 1 |
| --fail-on-flaky | Exit with code 1 if any flaky test found | false |
| --help, -h | Show help | — |
| --version, -v | Show CLI version | — |
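The two flags that affect the exit code combine as follows. `--threshold` fails the run when the FlakeScore falls below it, and the Generic CI section documents exit code 1 for that case. `--fail-on-flaky` fails the run when any flaky test is found. The sketch below encodes only those two documented rules. The function name `exitCodeFor` is hypothetical, and how the real CLI treats consistently failing tests or internal errors is not modeled.

```typescript
interface ExitPolicy {
  threshold?: number; // undefined means the threshold check is disabled (the default)
  failOnFlaky: boolean; // mirrors --fail-on-flaky
}

// Hypothetical helper: decide the process exit code after a detection run.
function exitCodeFor(flakeScore: number, flakyCount: number, policy: ExitPolicy): 0 | 1 {
  if (policy.threshold !== undefined && flakeScore < policy.threshold) {
    return 1; // FlakeScore is below --threshold
  }
  if (policy.failOnFlaky && flakyCount > 0) {
    return 1; // --fail-on-flaky is set and at least one flaky test was found
  }
  return 0;
}

// Usage
console.log(exitCodeFor(92, 2, { threshold: 90, failOnFlaky: false })); // 0
console.log(exitCodeFor(92, 2, { threshold: 90, failOnFlaky: true })); // 1
console.log(exitCodeFor(85, 0, { threshold: 90, failOnFlaky: false })); // 1
```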
Environment Variables
Environment variables can be used instead of CLI flags. Flags always take precedence over environment variables.
| Variable | Description |
|---|---|
| DEFLAKY_API_TOKEN | Dashboard API token for authentication |
| DEFLAKY_API_URL | Custom API endpoint (self-hosted instances) |
| DEFLAKY_RUNS | Default number of test iterations |
| DEFLAKY_THRESHOLD | Default FlakeScore threshold |
| DEFLAKY_FORMAT | Default report format |
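Resolving a setting from a flag first, then an environment variable, then a built-in default fits in a single helper. Environment variables always arrive as strings, so each one must be parsed. This sketch falls back to the default when the value is missing or cannot be parsed. `resolveOption` and `parseRuns` are hypothetical names. The environment map is passed in explicitly; in Node.js it would be `process.env`. Where the config file sits in this order is not covered here.

```typescript
type Env = Record<string, string | undefined>;

// Hypothetical helper: a CLI flag wins, then the environment variable, then the default.
function resolveOption<T>(
  flagValue: T | undefined,
  env: Env,
  envName: string,
  parse: (raw: string) => T | undefined,
  fallback: T,
): T {
  if (flagValue !== undefined) return flagValue;
  const raw = env[envName];
  if (raw !== undefined && raw.trim() !== "") {
    const parsed = parse(raw.trim());
    if (parsed !== undefined) return parsed;
  }
  return fallback;
}

// Accept only positive integers for DEFLAKY_RUNS; anything else is ignored.
const parseRuns = (raw: string): number | undefined => {
  const n = Number(raw);
  return Number.isInteger(n) && n > 0 ? n : undefined;
};

// Usage: --runs 10 beats DEFLAKY_RUNS=7; without the flag, the variable applies.
const env: Env = { DEFLAKY_RUNS: "7" };
console.log(resolveOption(10, env, "DEFLAKY_RUNS", parseRuns, 5)); // 10
console.log(resolveOption(undefined, env, "DEFLAKY_RUNS", parseRuns, 5)); // 7
console.log(resolveOption(undefined, {}, "DEFLAKY_RUNS", parseRuns, 5)); // 5
```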
Playwright
Playwright is a first-class citizen in DeFlaky. Results are parsed automatically from Playwright's built-in JSON reporter.
# Basic detection
$ deflaky-cli run -- npx playwright test
# Run specific test file 10 times
$ deflaky-cli run --runs 10 -- npx playwright test tests/login.spec.ts
# With a specific project (e.g. chromium only)
$ deflaky-cli run -- npx playwright test --project=chromium
# Push results to dashboard
$ deflaky-cli run --push -- npx playwright test
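For reference, Playwright's JSON reporter nests data as suites (which can contain further suites), specs, tests (one per project), and results (one per attempt). Each result carries a status such as `passed`, `failed`, or `timedOut`. The sketch below walks one run's report and reduces every test to a single outcome using its last attempt, so Playwright's own retries collapse into one value. It ignores the test-level `flaky` status that Playwright reports when a retry passes. The interfaces declare only the fields the sketch reads. The function name `outcomesFromPlaywrightReport` is hypothetical, and this is not DeFlaky's actual parser.

```typescript
// Only the fields this sketch reads from Playwright's JSON report.
interface PwResult { status?: "passed" | "failed" | "timedOut" | "skipped" | "interrupted"; }
interface PwTest { projectName: string; results: PwResult[]; }
interface PwSpec { title: string; tests: PwTest[]; }
interface PwSuite { title: string; specs: PwSpec[]; suites?: PwSuite[]; }
interface PwReport { suites: PwSuite[]; }

type RunOutcome = "passed" | "failed" | "skipped";

// Hypothetical helper: map "file > describe > test [project]" to one outcome for this run.
function outcomesFromPlaywrightReport(report: PwReport): Map<string, RunOutcome> {
  const outcomes = new Map<string, RunOutcome>();

  const walk = (suite: PwSuite, path: string[]): void => {
    const suitePath = [...path, suite.title];
    for (const spec of suite.specs) {
      for (const test of spec.tests) {
        // The last result is the final attempt after any Playwright retries.
        const status = test.results[test.results.length - 1]?.status;
        let outcome: RunOutcome;
        if (status === "passed") outcome = "passed";
        else if (status === "skipped" || status === undefined) outcome = "skipped"; // no result: assumption
        else outcome = "failed"; // failed, timedOut, interrupted
        outcomes.set(`${[...suitePath, spec.title].join(" > ")} [${test.projectName}]`, outcome);
      }
    }
    for (const child of suite.suites ?? []) walk(child, suitePath);
  };

  for (const suite of report.suites) walk(suite, []);
  return outcomes;
}

// Usage: a file-level suite containing one describe block with two specs.
const report: PwReport = {
  suites: [{
    title: "login.spec.ts",
    specs: [],
    suites: [{
      title: "login",
      specs: [
        { title: "accepts valid credentials", tests: [{ projectName: "chromium", results: [{ status: "failed" }, { status: "passed" }] }] },
        { title: "rejects bad password", tests: [{ projectName: "chromium", results: [{ status: "timedOut" }] }] },
      ],
    }],
  }],
};
console.log(outcomesFromPlaywrightReport(report).get("login.spec.ts > login > rejects bad password [chromium]")); // failed
```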
Cypress
DeFlaky works with Cypress in headless mode. Make sure you are using cypress run (not cypress open).
# Detect flaky Cypress tests
$ deflaky-cli run -- npx cypress run
# Specific spec file
$ deflaky-cli run --runs 5 -- npx cypress run --spec cypress/e2e/checkout.cy.ts
# With a specific browser
$ deflaky-cli run -- npx cypress run --browser chrome
Selenium (Java / Maven)
For Java-based Selenium projects using JUnit or TestNG, DeFlaky automatically parses the JUnit-style XML reports written by Maven Surefire (or Gradle).
# Run Maven tests
$ deflaky-cli run -- mvn test
# Specific test class
$ deflaky-cli run -- mvn test -Dtest="LoginTest"
# With Gradle
$ deflaky-cli run -- gradle test
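Surefire, Gradle, and pytest's `--junitxml` option all write JUnit-style XML. In that format each `<testcase>` element is one test, and a nested `<failure>`, `<error>`, or `<skipped>` element marks a non-passing result. The sketch below pulls those outcomes out with regular expressions to show the shape of the data. It is not a real XML parser: it does no entity decoding and no CDATA handling, and attribute values containing `>` will break it. The name `outcomesFromJUnitXml` is hypothetical, and this is not how DeFlaky itself parses reports.

```typescript
type RunOutcome = "passed" | "failed" | "skipped";

// Hypothetical, regex-based sketch. Production code should use a real XML parser.
function outcomesFromJUnitXml(xml: string): Map<string, RunOutcome> {
  const outcomes = new Map<string, RunOutcome>();
  // Matches both <testcase .../> and <testcase ...>...</testcase>.
  const testcasePattern = /<testcase\b([^>]*?)(?:\/>|>([\s\S]*?)<\/testcase>)/g;

  let match: RegExpExecArray | null;
  while ((match = testcasePattern.exec(xml)) !== null) {
    const attributes = match[1];
    const body = match[2] ?? ""; // undefined for self-closing test cases
    const className = /\bclassname="([^"]*)"/.exec(attributes)?.[1] ?? "";
    const name = /\bname="([^"]*)"/.exec(attributes)?.[1] ?? ""; // \b skips "classname"

    let outcome: RunOutcome = "passed";
    if (/<(?:failure|error)\b/.test(body)) outcome = "failed";
    else if (/<skipped\b/.test(body)) outcome = "skipped";

    outcomes.set(`${className}#${name}`, outcome);
  }
  return outcomes;
}

// Usage: a trimmed-down Surefire-style report with one passing, one failing, one skipped test.
const sample = `
<testsuite name="com.example.LoginTest" tests="3">
  <testcase classname="com.example.LoginTest" name="validLogin" time="1.2"/>
  <testcase classname="com.example.LoginTest" name="lockedAccount" time="0.8">
    <failure message="expected error banner">AssertionError</failure>
  </testcase>
  <testcase classname="com.example.LoginTest" name="sso" time="0"><skipped/></testcase>
</testsuite>`;
console.log(outcomesFromJUnitXml(sample).get("com.example.LoginTest#lockedAccount")); // failed
```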
Jest
DeFlaky supports Jest out of the box. Use the --forceExit flag if your Jest tests hang after completion.
# Run all Jest tests
$ deflaky-cli run -- npx jest
# Specific test file
$ deflaky-cli run --runs 10 -- npx jest src/__tests__/api.test.ts
# With coverage disabled for speed
$ deflaky-cli run -- npx jest --no-coverage
Pytest
For Python projects, DeFlaky wraps your pytest command and parses JUnit XML output.
# Run all pytest tests
$ deflaky-cli run -- pytest
# Specific test module
$ deflaky-cli run --runs 5 -- pytest tests/test_auth.py
# With JUnit XML output for richer reports
$ deflaky-cli run -- pytest --junitxml=report.xml
# Run in verbose mode
$ deflaky-cli run -- pytest -v
Mocha
Mocha tests work seamlessly with DeFlaky. Use the --exit flag to ensure Mocha exits cleanly.
# Run all Mocha tests
$ deflaky-cli run -- npx mocha
# Specific test files (glob pattern)
$ deflaky-cli run -- npx mocha "test/**/*.spec.js" --exit
# With TypeScript
$ deflaky-cli run -- npx mocha --require ts-node/register 'test/**/*.spec.ts'
TestNG
For TestNG projects with Maven, point Surefire to your testng.xml suite file.
# Run TestNG suite
$ deflaky-cli run -- mvn test -Dsurefire.suiteXmlFiles=testng.xml
# Specific test group
$ deflaky-cli run -- mvn test -Dgroups="smoke"
GitHub Actions
Add a flaky test check to every pull request. The workflow installs DeFlaky, runs your tests multiple times, and fails the check if the FlakeScore drops below your threshold.
name: Flaky Test Check
on: [pull_request]
jobs:
  flaky-check:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - name: Setup Node.js
        uses: actions/setup-node@v4
        with:
          node-version: 20
      - name: Install dependencies
        run: npm ci
      - name: Install Playwright browsers
        run: npx playwright install --with-deps
      - name: Run DeFlaky
        run: npx deflaky-cli run --runs 3 --threshold 90 --push -- npx playwright test
        env:
          DEFLAKY_API_TOKEN: ${{ secrets.DEFLAKY_API_TOKEN }}
GitLab CI
flaky_check:
  stage: test
  image: mcr.microsoft.com/playwright:v1.44.0-jammy
  script:
    - npm ci
    - npx deflaky-cli run --runs 3 --threshold 90 --push -- npx playwright test
  variables:
    DEFLAKY_API_TOKEN: $DEFLAKY_API_TOKEN
  only:
    - merge_requests
Jenkins Pipeline
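pipeline {
    // The Playwright image ships Node.js plus the browsers and system
    // dependencies that a plain node image lacks.
    agent { docker { image 'mcr.microsoft.com/playwright:v1.44.0-jammy' } }
    environment {
        DEFLAKY_API_TOKEN = credentials('deflaky-token')
    }
    stages {
        stage('Install') {
            steps {
                sh 'npm ci'
            }
        }
        stage('Flaky Check') {
            steps {
                sh 'npx deflaky-cli run --runs 3 --threshold 90 --push -- npx playwright test'
            }
        }
    }
}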
Generic CI Setup
DeFlaky works in any CI environment that supports Node.js. The general pattern is:
# 1. Install your project dependencies
$ npm ci
# 2. Run DeFlaky with your test command
$ npx deflaky-cli run --runs 3 --threshold 90 --push -- <your-test-command>
# Make sure DEFLAKY_API_TOKEN is set in your CI environment
# DeFlaky exits with code 1 if FlakeScore is below --threshold
Connecting CLI to Dashboard
The DeFlaky dashboard gives you a visual overview of your test suite's reliability over time. To connect the CLI, you need an API token.
# Option 1: Pass token as a flag
$ deflaky-cli run --push --token df_abc123 -- npx playwright test
# Option 2: Set as environment variable (recommended for CI)
$ export DEFLAKY_API_TOKEN=df_abc123
$ deflaky-cli run --push -- npx playwright test
# Option 3: Save in config file
$ deflaky-cli config set token df_abc123
$ deflaky-cli run --push -- npx playwright test
Creating Projects & API Tokens
Each project in the dashboard has its own API token. To create a new project:
- Go to the Dashboard and sign in.
- Click New Project and enter a name.
- Copy the generated API token. This is your DEFLAKY_API_TOKEN.
- Store the token securely in your CI secrets (never commit it to source control).
Understanding FlakeScore
FlakeScore is a 0-100 metric that represents the overall reliability of your test suite:
| FlakeScore | Rating | Meaning |
|---|---|---|
| 95-100 | Excellent | Minimal flakiness. Ship with confidence. |
| 80-94 | Needs Attention | Some flaky tests. Prioritize fixing them. |
| 0-79 | Critical | High flakiness. Tests are unreliable. |
The score is calculated as: (stable tests / total tests) * 100, weighted by run count and recency.
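The unweighted part of that formula, together with the rating bands, fits in a few lines. The exact run-count and recency weighting is not specified, so this sketch leaves it out. `baseFlakeScore` and `rateFlakeScore` are hypothetical names. The sketch multiplies before dividing so exact ratios such as 47/50 come out as whole numbers. It assumes any score of 95 or more is Excellent and any score of 80 or more is Needs Attention, which matters only for non-integer scores.

```typescript
type Rating = "Excellent" | "Needs Attention" | "Critical";

// Unweighted FlakeScore: (stable tests / total tests) * 100.
// The documented run-count and recency weighting is NOT modeled here.
function baseFlakeScore(stableTests: number, totalTests: number): number {
  if (totalTests <= 0) throw new Error("totalTests must be positive");
  if (stableTests < 0 || stableTests > totalTests) {
    throw new Error("stableTests must be between 0 and totalTests");
  }
  return (stableTests * 100) / totalTests;
}

// Map a score to the documented bands: 95-100, 80-94, 0-79.
function rateFlakeScore(score: number): Rating {
  if (score >= 95) return "Excellent";
  if (score >= 80) return "Needs Attention";
  return "Critical";
}

// Usage: 47 of 50 tests stable.
const score = baseFlakeScore(47, 50);
console.log(score, rateFlakeScore(score)); // 94 Needs Attention
```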
Viewing Flaky Test History
The dashboard tracks every test run and shows you trends over time. You can filter by date range, test name, or status. Each flaky test entry shows:
- Test name and file path
- Pass rate across runs (e.g., 3/5 passed)
- First seen and last seen dates
- FlakeScore trend (improving or degrading)
- Stack traces from failed runs
AI Root Cause Analysis (Pro)
Available on the Pro plan, AI Root Cause Analysis automatically analyzes your flaky test failures and categorizes them into:
- Infrastructure — network timeouts, resource limits, environment drift
- Application bug — race conditions, state leaks, timing issues
- Test code — poor selectors, missing waits, shared state between tests
- Non-deterministic — random data, date/time dependencies, order-dependent tests
The AI also suggests concrete fixes with code examples. Bring your own API key from Anthropic, OpenAI, Groq, or OpenRouter, or use a local Ollama model.
Configuration File
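Create a .deflaky.config.json file in your project root to set default options. The CLI picks it up automatically. If you commit this file, leave out the token key and set DEFLAKY_API_TOKEN in your environment instead.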
{
  "runs": 5,
  "threshold": 90,
  "format": "json",
  "push": true,
  "token": "df_abc123",
  "project": "my-app",
  "apiUrl": "https://api.deflaky.com",
  "parallel": 1,
  "verbose": false,
  "failOnFlaky": false
}
You can also generate this file interactively:
$ deflaky-cli config init
Available Options
| Key | Type | Description | Default |
|---|---|---|---|
| runs | number | Number of test iterations | 5 |
| threshold | number | Minimum FlakeScore to pass (0-100) | disabled |
| format | string | Report format: json, junit, auto | auto |
| push | boolean | Push results to dashboard | false |
| token | string | Dashboard API token | — |
| project | string | Project slug for dashboard | auto-detected |
| apiUrl | string | Custom API endpoint | https://api.deflaky.com |
| parallel | number | Max parallel test executions | 1 |
| verbose | boolean | Show detailed output | false |
| failOnFlaky | boolean | Exit code 1 if any flaky test found | false |
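Filling unset keys from the documented defaults can be sketched as a typed merge followed by a few sanity checks. The interface mirrors the keys in the table above. `parseDeflakyConfig` is a hypothetical name. The checks, such as requiring `runs` to be a positive integer, are assumptions rather than documented CLI behavior. `project` is left undefined because its default is auto-detected.

```typescript
interface DeflakyConfig {
  runs: number;
  threshold?: number; // undefined = disabled
  format: "json" | "junit" | "auto";
  push: boolean;
  token?: string;
  project?: string; // undefined = auto-detected
  apiUrl: string;
  parallel: number;
  verbose: boolean;
  failOnFlaky: boolean;
}

// Defaults taken from the Available Options table.
const DEFAULTS: DeflakyConfig = {
  runs: 5,
  format: "auto",
  push: false,
  apiUrl: "https://api.deflaky.com",
  parallel: 1,
  verbose: false,
  failOnFlaky: false,
};

// Hypothetical helper: parse .deflaky.config.json contents and fill in defaults.
function parseDeflakyConfig(jsonText: string): DeflakyConfig {
  const fromFile = JSON.parse(jsonText) as Partial<DeflakyConfig>;
  const config: DeflakyConfig = { ...DEFAULTS, ...fromFile };

  // Illustrative checks only; the real CLI's validation rules are not documented here.
  if (!Number.isInteger(config.runs) || config.runs < 1) {
    throw new Error("runs must be a positive integer");
  }
  if (config.threshold !== undefined && (config.threshold < 0 || config.threshold > 100)) {
    throw new Error("threshold must be between 0 and 100");
  }
  if (!["json", "junit", "auto"].includes(config.format)) {
    throw new Error(`unknown format: ${config.format}`);
  }
  if (!Number.isInteger(config.parallel) || config.parallel < 1) {
    throw new Error("parallel must be a positive integer");
  }
  return config;
}

// Usage: only "runs" and "project" are set; everything else comes from DEFAULTS.
const config = parseDeflakyConfig('{ "runs": 10, "project": "my-app" }');
console.log(config.runs, config.format, config.apiUrl); // 10 auto https://api.deflaky.com
```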
Need help? Open an issue on GitHub or email support@deflaky.com.