Source Predictability — Which Sources Lead the Market
Rank news sources by their predictive power and identify which ones to follow for early signals.

What Is Source Predictability?
Source Predictability is the second analysis type in SentiLab. While Correlation Sweep asks "Does sentiment lead price?", Source Predictability asks a more targeted question: "Which specific news sources are the most predictive?"
Not all news sources are equal. Some consistently publish stories before the market reacts (leading sources), while others primarily report on moves that already happened (lagging sources). This analysis ranks them.
How the Algorithm Works
The Source Predictability algorithm follows these steps:
- Group news by source — collects all articles from each news source within the time window.
- Compute hourly sentiment averages — for each source, calculates the average sentiment per hour.
- Align with hourly prices — matches each source's hourly sentiment with hourly price data.
- Test lags -24h to +24h — for each source, tests all 49 time offsets to find where sentiment-price correlation peaks.
- Rank by maximum correlation — sources with the highest peak correlation are ranked first.
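The steps above can be sketched in plain Python. This is a minimal illustration, not SentiLab's actual implementation: the function names, the use of integer hour indices, the choice to rank by the highest signed Pearson r, and the requirement of at least three aligned hours are all assumptions made for the example.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson r for two equal-length sequences; None if it is undefined."""
    n = len(xs)
    if n < 3:  # assumption: too few aligned hours to be meaningful
        return None
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0 or syy == 0:  # a constant series has no defined correlation
        return None
    return sxy / sqrt(sxx * syy)

def hourly_average(articles):
    """articles: list of (hour_index, sentiment). Returns {hour: mean sentiment}."""
    buckets = {}
    for hour, score in articles:
        buckets.setdefault(hour, []).append(score)
    return {hour: sum(v) / len(v) for hour, v in buckets.items()}

def best_lag(sentiment_by_hour, price_by_hour, max_lag=24):
    """Return (lag, r) with the highest r over lags -max_lag..+max_lag, or None.

    A positive lag pairs sentiment at hour t with price at hour t + lag,
    so a positive best lag means sentiment comes first.
    """
    best = None
    for lag in range(-max_lag, max_lag + 1):  # 49 offsets when max_lag is 24
        pairs = [(s, price_by_hour[h + lag])
                 for h, s in sentiment_by_hour.items()
                 if h + lag in price_by_hour]
        r = pearson([p[0] for p in pairs], [p[1] for p in pairs])
        if r is not None and (best is None or r > best[1]):
            best = (lag, r)
    return best

def rank_sources(articles_by_source, price_by_hour, max_lag=24):
    """Return rows of (source, article_count, best_lag, r), strongest first."""
    rows = []
    for source, articles in articles_by_source.items():
        result = best_lag(hourly_average(articles), price_by_hour, max_lag)
        if result is not None:
            rows.append((source, len(articles), result[0], result[1]))
    return sorted(rows, key=lambda row: row[3], reverse=True)
```

Here `price_by_hour` is whatever hourly price series you choose to align against, keyed by the same hour index as the articles. Hours in which a source published nothing are simply skipped rather than filled with a neutral value, which is one of several reasonable choices.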
Reading the Results
The output table shows key metrics for each source:
- Source Name — the news publisher.
- Article Count — how many articles from this source were in the dataset.
- Best Lag (hours) — the time offset with the strongest correlation.
- Correlation Strength — the Pearson r value at that optimal lag.
Interpreting the Lag Value
The lag value is the most important output to understand:
- Positive lag (e.g., +8h): the source publishes before price moves in the direction of its sentiment. This is a LEADING source, so it is predictive. Its sentiment in a given hour correlates most strongly with price 8 hours later.
- Negative lag (e.g., -4h): the source publishes after price has already moved. This is a LAGGING source, so it is reactive. It reports on what already happened.
- Zero lag: sentiment and price move together. The source may be reporting on events in real time.
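The sign convention is easy to get backwards, so it can help to encode it once. The helper below is a small sketch of the three cases described above; the function names and label strings are illustrative and not part of SentiLab.

```python
def classify_lag(best_lag_hours):
    """Map a best-lag value in hours to the role described in the text."""
    if best_lag_hours > 0:
        return "leading"     # sentiment precedes the price move
    if best_lag_hours < 0:
        return "lagging"     # sentiment follows the price move
    return "coincident"      # sentiment and price move together

def describe(source, best_lag_hours, r):
    """Render one result row as a short human-readable summary."""
    role = classify_lag(best_lag_hours)
    if role == "leading":
        return f"{source}: leads price by {best_lag_hours}h (r = {r:.2f})"
    if role == "lagging":
        return f"{source}: follows price by {-best_lag_hours}h (r = {r:.2f})"
    return f"{source}: moves with price (r = {r:.2f})"
```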
Example Results
A typical Source Predictability output might look like:
- CoinDesk — r = 0.52 at +8h lag → strong predictor (publishes 8 hours before price reacts).
- The Block — r = 0.41 at +6h lag → good predictor (6-hour lead time).
- Decrypt — r = 0.28 at -4h lag → follows price (publishes 4 hours after the move).
In this example, prioritizing CoinDesk and The Block over Decrypt would give you earlier exposure to market-relevant information.
Minimum Article Count
Sources must have a minimum number of articles (default: 15) to be included in the ranking. This guards against unreliable estimates: a source with only 3 articles might show a perfect correlation purely by chance. The minimum count filter keeps such spurious results off your leaderboard.
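As a rough sketch, the filter amounts to dropping low-volume sources before they are ranked. The function name and the input shape (a mapping of source name to article count) are assumptions for illustration; only the default of 15 comes from the text, and the threshold is treated here as inclusive.

```python
MIN_ARTICLES = 15  # default stated in the text

def eligible_sources(article_counts, min_articles=MIN_ARTICLES):
    """Keep only sources with at least `min_articles` articles.

    article_counts: dict mapping source name -> number of articles.
    """
    return {source: count
            for source, count in article_counts.items()
            if count >= min_articles}
```

The extreme case shows why this matters: with only two data points, Pearson r is always exactly +1 or -1 whenever it is defined, regardless of any real relationship.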
Configuration Parameters
Source Predictability accepts the same parameters as Correlation Sweep:
- Asset — which asset to analyze.
- Time range — the analysis window (90+ days recommended for reliable results).
- Quality filters — minimum quality and credibility scores.
- Predictive Only — filter to forward-looking articles only (see Predictive Only Filter).
Practical Use
The actionable takeaway from Source Predictability is a priority reading list. Once you identify which sources consistently lead price for the assets you trade, you can:
- Set up alerts or RSS feeds for those specific sources.
- Weight their articles more heavily when forming trading decisions.
- Ignore or de-prioritize lagging sources that only add noise after the move is done.
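One way to turn a results table into a priority reading list is to keep only leading sources and order them by correlation strength. The sketch below uses the illustrative figures from the Example Results section; the `SourceResult` type, the `reading_list` function, and the optional `min_correlation` cutoff are hypothetical names introduced for this example.

```python
from typing import NamedTuple

class SourceResult(NamedTuple):
    source: str
    best_lag_hours: int
    correlation: float

def reading_list(results, min_correlation=0.0):
    """Return leading sources (positive lag), strongest correlation first."""
    leading = [r for r in results
               if r.best_lag_hours > 0 and r.correlation >= min_correlation]
    return sorted(leading, key=lambda r: r.correlation, reverse=True)

# Figures taken from the example output earlier in this article.
example = [
    SourceResult("The Block", 6, 0.41),
    SourceResult("Decrypt", -4, 0.28),
    SourceResult("CoinDesk", 8, 0.52),
]

for row in reading_list(example):
    print(f"{row.source}: +{row.best_lag_hours}h, r = {row.correlation:.2f}")
```

With the example figures, CoinDesk and The Block make the list and Decrypt is dropped because its best lag is negative.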
Why This Matters
Information asymmetry is one of the few genuine edges in markets. If you know which sources tend to publish actionable information earliest, you gain a time advantage. Source Predictability turns an abstract concept — "some sources are better than others" — into a quantified, ranked leaderboard.