Permutation Test and Statistical Significance
Add an extra layer of statistical confidence with non-parametric permutation testing.
The Problem with Standard P-Values
In a Correlation Sweep, each lag's significance is tested using a standard t-distribution p-value, then corrected for multiple testing via Benjamini-Hochberg FDR. This is solid statistical practice — but it rests on assumptions that may not hold: the t-based p-value assumes the paired observations are independent and approximately normally distributed.
Financial data is famously non-normal. It has fat tails (extreme moves happen more often than a bell curve predicts), skewness (more sharp drops than sharp rallies), and serial correlation (today's return is somewhat dependent on yesterday's). These properties can make parametric p-values either too lenient or too strict.
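The parametric pipeline described above can be sketched roughly as follows. This is a minimal illustration, not SentiLab's actual implementation; the function names are hypothetical:

```python
import numpy as np
from scipy import stats

def pearson_pvalue(x, y):
    """Two-sided t-distribution p-value for a Pearson correlation."""
    n = len(x)
    r = np.corrcoef(x, y)[0, 1]
    t = r * np.sqrt(n - 2) / np.sqrt(1 - r**2)
    return r, 2 * stats.t.sf(abs(t), df=n - 2)

def bh_fdr(pvals, alpha=0.05):
    """Benjamini-Hochberg: boolean mask of lags kept after FDR correction."""
    p = np.asarray(pvals)
    m = len(p)
    order = np.argsort(p)
    thresholds = alpha * np.arange(1, m + 1) / m
    below = p[order] <= thresholds
    reject = np.zeros(m, dtype=bool)
    if below.any():
        k = below.nonzero()[0].max()  # largest rank passing its threshold
        reject[order[: k + 1]] = True
    return reject
```

Both pieces assume roughly normal, independent observations — exactly the assumptions the permutation test below avoids.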
What the Permutation Test Does
The permutation test provides a non-parametric alternative that makes no distributional assumptions. Here is the process:
- Shuffle sentiment data randomly — the time series of sentiment scores is randomly reordered, breaking any real relationship with price while preserving the distribution of sentiment values.
- Compute maximum |correlation| across ALL lags — for each shuffled version, find the strongest absolute correlation at any lag. This represents what chance alone can produce.
- Repeat 50 times — collect 50 baseline max|r| values from 50 different random shuffles.
- Take the 95th percentile — this becomes the null distribution threshold. If your observed correlation exceeds this threshold, it is unlikely to have arisen by chance.
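The steps above can be sketched in a few lines. This is a simplified illustration under assumed inputs (aligned sentiment and return series as numpy arrays); the function names are hypothetical, not SentiLab's API:

```python
import numpy as np

def max_abs_lag_corr(sentiment, returns, max_lag):
    """Strongest |Pearson r| between sentiment and returns over lags 0..max_lag."""
    best = 0.0
    for lag in range(max_lag + 1):
        x = sentiment[: len(sentiment) - lag] if lag else sentiment
        y = returns[lag:]
        best = max(best, abs(np.corrcoef(x, y)[0, 1]))
    return best

def permutation_threshold(sentiment, returns, max_lag, n_shuffles=50, seed=42):
    """95th-percentile threshold of max |r| under random sentiment shuffles."""
    rng = np.random.default_rng(seed)
    null_max = [
        # shuffling breaks any real link with price, keeps the value distribution
        max_abs_lag_corr(rng.permutation(sentiment), returns, max_lag)
        for _ in range(n_shuffles)
    ]
    return float(np.percentile(null_max, 95)), null_max
```

An observed max |r| above the returned threshold is unlikely to have arisen by chance.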
Empirical P-Value
The empirical p-value is the fraction of shuffled datasets where the maximum |correlation| was greater than or equal to your observed maximum |correlation|.
For example: if only 2 out of 50 shuffles produced a correlation as strong as your result, the empirical p-value is 2/50 = 0.04 — significant at the 5% level.
- 0/50 shuffles matched or beat your result → empirical p < 0.02 (below the resolution of 50 shuffles). Very strong evidence.
- 2/50 shuffles matched or beat your result → empirical p = 0.04. Significant.
- 10/50 shuffles matched or beat your result → empirical p = 0.20. Not significant — chance alone produces correlations this strong 20% of the time.
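The empirical p-value is a simple counting exercise. A sketch, assuming you already have the 50 null max |r| values from the shuffling procedure (names are illustrative):

```python
import numpy as np

def empirical_p(observed_max_r, null_max):
    """Fraction of shuffles whose max |r| matched or beat the observed value."""
    return float(np.mean(np.asarray(null_max) >= observed_max_r))

# Example: 2 of 50 shuffled datasets reach the observed correlation of 0.40
null = [0.10] * 48 + [0.45, 0.50]
empirical_p(0.40, null)  # → 0.04, significant at the 5% level
```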
Why Use It Over Standard P-Values?
The permutation test has several advantages:
- No distributional assumptions — works correctly whether your data is normal, skewed, fat-tailed, or anything else.
- Accounts for multiple testing inherently — because it uses the maximum |correlation| across all lags in each shuffle, it naturally penalizes the fact that you tested many lags.
- More conservative for financial data — in practice, it tends to reject weak signals that parametric tests might let through, which is desirable when you're making financial decisions.
The Trade-off
The main cost is computation time. Instead of computing one set of correlations, the system computes 50 additional sets. This adds approximately 10 seconds to the analysis time — a modest cost for the added confidence.
When to Enable It
Enable the permutation test via the "Use Permutation Test" toggle in the advanced configuration panel. Recommended scenarios:
- Before acting on a signal — if you found a promising correlation and plan to incorporate it into a trading strategy, run the permutation test to validate it.
- When publishing an experiment — permutation-validated results are more credible and more likely to pass the quality gate for published experiments.
- When FDR-corrected p-value is borderline — if your result is significant at p = 0.04 but you want extra confidence, the permutation test provides an independent check.
The Gold Standard: Double Validation
When both validation methods agree — FDR-corrected p-value is significant AND the permutation test confirms significance — you have the strongest possible statistical confidence in your result. This double validation means:
- The parametric test found the correlation significant after correcting for multiple testing.
- The non-parametric test confirmed that random shuffling almost never produces a correlation this strong.
Results that survive both tests are the ones most worth acting on.
Interpreting Combined Results
- Both significant → highest confidence. The signal is real.
- FDR significant, permutation not → the parametric assumptions may be inflating significance. Treat with caution.
- Permutation significant, FDR not → unusual, but possible if the FDR correction is very strict (many lags tested). The signal may be real but marginal.
- Neither significant → no reliable signal at this time. Try different lag ranges, filters, or a longer time window.
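The four-way interpretation above reduces to a small decision helper. A sketch with hypothetical names, mirroring the matrix rather than any SentiLab API:

```python
def interpret(fdr_significant: bool, perm_significant: bool) -> str:
    """Map the two validation outcomes to a confidence label."""
    if fdr_significant and perm_significant:
        return "highest confidence: the signal is real"
    if fdr_significant:
        return "caution: parametric assumptions may inflate significance"
    if perm_significant:
        return "marginal: possible if the FDR correction is very strict"
    return "no reliable signal"
```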
Why This Matters
In quantitative finance, the difference between a real signal and a statistical artifact can be worth substantial money. The permutation test is your safeguard against false discovery — a 10-second computation that can save you from acting on noise. When combined with FDR correction, it represents the most rigorous validation available in SentiLab.