Permutation Test and Statistical Significance

Add an extra layer of statistical confidence with non-parametric permutation testing.


The Problem with Standard P-Values

In a Correlation Sweep, each lag's significance is tested using a standard t-distribution p-value, then corrected for multiple testing via Benjamini-Hochberg FDR. This is solid statistical practice, but the t-based p-value rests on assumptions that may not hold: that the data are approximately normally distributed and that observations are independent of one another.
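
To make the correction step concrete, here is a minimal sketch of the Benjamini-Hochberg step-up rule applied to a list of per-lag p-values. The function name and the example p-values are illustrative assumptions, not SentiLab's actual implementation.

```python
def benjamini_hochberg(p_values, alpha=0.05):
    """Return one boolean per p-value: True if it is significant at FDR level alpha."""
    m = len(p_values)
    # Indices of the p-values, ordered from smallest to largest p-value.
    order = sorted(range(m), key=lambda i: p_values[i])

    # Step-up rule: find the largest rank k with p_(k) <= (k / m) * alpha.
    largest_k = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank / m * alpha:
            largest_k = rank

    # Every p-value ranked at or below largest_k is declared significant.
    significant = [False] * m
    for idx in order[:largest_k]:
        significant[idx] = True
    return significant


# Hypothetical per-lag p-values for lags 0..3 (illustrative numbers only).
lag_p_values = [0.01, 0.04, 0.03, 0.50]
print(benjamini_hochberg(lag_p_values))
```

With these made-up inputs only the first lag survives: 0.03 and 0.04 would each pass an uncorrected 5% test, but not once the number of lags tested is taken into account.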

Financial data is famously non-normal: it has fat tails (extreme moves happen more often than a bell curve predicts) and skewness (more sharp drops than sharp rallies). It is also serially correlated (today's return is somewhat dependent on yesterday's), which violates the independence that parametric tests assume. These properties can make parametric p-values either too lenient or too strict.

What the Permutation Test Does

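The permutation test provides a non-parametric alternative that does not assume any particular distribution for the data. Its main requirement is exchangeability: if sentiment and price are truly unrelated, reordering the sentiment values should make no difference. Strongly autocorrelated series can strain that assumption, so the test is a safeguard rather than a guarantee. Here is the process: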

Shuffle sentiment data randomly — the time series of sentiment scores is randomly reordered, breaking any real relationship with price while preserving the distribution of sentiment values.
Compute maximum |correlation| across ALL lags — for each shuffled version, find the strongest absolute correlation at any lag. This represents what chance alone can produce.
Repeat 50 times — collect 50 baseline max|r| values from 50 different random shuffles.
Take the 95th percentile — this becomes the null distribution threshold. If your observed correlation exceeds this threshold, it is unlikely to have arisen by chance.
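
The four steps above can be sketched in plain Python as follows. This is a minimal illustration under stated assumptions, not SentiLab's code: the function names are hypothetical, a lag of k pairs sentiment at time t with the return at time t + k, and the 95th percentile uses the nearest-rank method (the source does not say which percentile method is used).

```python
import math
import random


def pearson_r(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / math.sqrt(var_x * var_y)


def max_abs_corr(sentiment, returns, lags):
    """Strongest |r| over the given non-negative lags (sentiment leads returns)."""
    n = len(sentiment)
    return max(abs(pearson_r(sentiment[:n - lag], returns[lag:])) for lag in lags)


def permutation_null(sentiment, returns, lags, n_shuffles=50, seed=0):
    """Max |r| across all lags for each of n_shuffles random reorderings of sentiment."""
    rng = random.Random(seed)  # fixed seed only so the sketch is reproducible
    shuffled = list(sentiment)
    null_max_r = []
    for _ in range(n_shuffles):
        rng.shuffle(shuffled)  # breaks any real link between sentiment and price
        null_max_r.append(max_abs_corr(shuffled, returns, lags))
    return null_max_r


def percentile_threshold(values, q=0.95):
    """Nearest-rank percentile: smallest value with at least a fraction q at or below it."""
    ordered = sorted(values)
    rank = math.ceil(q * len(ordered))
    return ordered[rank - 1]
```

An observed result would then be compared against the threshold, for example `max_abs_corr(sentiment, returns, range(0, 6)) > percentile_threshold(permutation_null(sentiment, returns, range(0, 6)))`. Taking the maximum inside each shuffle is what makes the threshold reflect every lag tested, not just the best one.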

Empirical P-Value

The empirical p-value is the fraction of shuffled datasets where the maximum |correlation| was greater than or equal to your observed maximum |correlation|.

For example: if only 2 out of 50 shuffles produced a correlation as strong as your result, the empirical p-value is 2/50 = 0.04 — significant at the 5% level.

0/50 shuffles match or beat your result → empirical p < 0.02 (the smallest value 50 shuffles can resolve). Very strong evidence.
2/50 shuffles match or beat your result → empirical p = 0.04. Significant.
10/50 shuffles match or beat your result → empirical p = 0.20. Not significant: chance alone produces a correlation this strong 20% of the time.
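
As a worked illustration of the calculation, the sketch below counts how many null values reach the observed one. The function name, the `add_one` option, and the example numbers are assumptions made for illustration; `add_one` shows a commonly used alternative estimator, (count + 1) / (shuffles + 1), which the text above does not describe SentiLab as using.

```python
def empirical_p_value(observed_max_r, null_max_r, add_one=False):
    """Fraction of shuffles whose max |r| is at least as large as the observed max |r|."""
    n_shuffles = len(null_max_r)
    n_as_extreme = sum(1 for r in null_max_r if r >= observed_max_r)
    if add_one:
        # Alternative estimator: treats the observed data as one more
        # arrangement, so the result can never be exactly zero.
        return (n_as_extreme + 1) / (n_shuffles + 1)
    return n_as_extreme / n_shuffles


# Hypothetical null values: 48 weak shuffles and 2 that reach the observed 0.30.
null_max_r = [0.10] * 48 + [0.30, 0.35]
print(empirical_p_value(0.30, null_max_r))  # 0.04
```

When no shuffle reaches the observed value, the plain fraction is 0.0. With 50 shuffles that should be reported as p < 0.02 rather than p = 0, because 50 shuffles cannot resolve anything smaller.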

Why Use It Over Standard P-Values?

The permutation test has several advantages:

No distributional assumptions — works correctly whether your data is normal, skewed, fat-tailed, or anything else.
Accounts for multiple testing inherently — because it uses the maximum |correlation| across all lags in each shuffle, it naturally penalizes the fact that you tested many lags.
More conservative for financial data — in practice, it tends to reject weak signals that parametric tests might let through, which is desirable when you're making financial decisions.

The Trade-off

The main cost is computation time. Instead of computing one set of correlations, the system computes 50 additional sets. This adds approximately 10 seconds to the analysis time — a modest cost for the added confidence.

When to Enable It

Enable the permutation test via the "Use Permutation Test" toggle in the advanced configuration panel. Recommended scenarios:

Before acting on a signal — if you found a promising correlation and plan to incorporate it into a trading strategy, run the permutation test to validate it.
When publishing an experiment — permutation-validated results are more credible and more likely to pass the quality gate for published experiments.
When FDR-corrected p-value is borderline — if your result is significant at p = 0.04 but you want extra confidence, the permutation test provides an independent check.

The Gold Standard: Double Validation

When both validation methods agree (the FDR-corrected p-value is significant and the permutation test confirms it), you have the strongest statistical support SentiLab can offer for your result. This double validation means:

The parametric test found the correlation significant after correcting for multiple testing.
The non-parametric test confirmed that random shuffling rarely produces a correlation this strong (in fewer than 5% of shuffles).

Results that survive both tests are the ones most worth acting on.

Interpreting Combined Results

Both significant → highest confidence. The signal is unlikely to be a chance artifact.
FDR significant, permutation not → the parametric assumptions may be inflating significance. Treat with caution.
Permutation significant, FDR not → unusual, but possible if the FDR correction is very strict (many lags tested). The signal may be real but marginal.
Neither significant → no reliable signal at this time. Try different lag ranges, filters, or a longer time window.
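
The four cases above amount to a small decision table. A minimal sketch, assuming each test has already been reduced to a yes/no significance flag (the function name and the returned wording are illustrative, not part of SentiLab):

```python
def interpret_results(fdr_significant, permutation_significant):
    """Map the two significance flags to the guidance for that combination."""
    if fdr_significant and permutation_significant:
        return "Highest confidence: unlikely to be a chance artifact."
    if fdr_significant:
        return "Caution: parametric assumptions may be inflating significance."
    if permutation_significant:
        return "Possibly real but marginal: the FDR correction may be very strict."
    return "No reliable signal: try other lag ranges, filters, or a longer window."


print(interpret_results(fdr_significant=True, permutation_significant=False))
```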

Why This Matters

In quantitative finance, the difference between a real signal and a statistical artifact can be worth substantial money. The permutation test is your safeguard against false discovery — a 10-second computation that can save you from acting on noise. When combined with FDR correction, it represents the most rigorous validation available in SentiLab.
