Course: SentiLab Basics

Source Predictability — Which Sources Lead the Market

Rank news sources by their predictive power and identify which ones to follow for early signals.

What Is Source Predictability?

Source Predictability is the second analysis type in SentiLab. While Correlation Sweep answers "does sentiment lead price?", Source Predictability answers a more targeted question: "Which specific news sources are the most predictive?"

Not all news sources are equal. Some consistently publish stories before the market reacts (leading sources), while others primarily report on moves that already happened (lagging sources). This analysis ranks them.

SentiLab Quick Analysis — Source Predictability selected

How the Algorithm Works

The Source Predictability algorithm follows these steps:

  1. Group news by source — collects all articles from each news source within the time window.
  2. Compute hourly sentiment averages — for each source, calculates the average sentiment per hour.
  3. Align with hourly prices — matches each source's hourly sentiment with hourly price data.
  4. Test lags -24h to +24h — for each source, tests all 49 time offsets to find where sentiment-price correlation peaks.
  5. Rank by maximum correlation — sources with the highest peak correlation are ranked first.
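The five steps above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not SentiLab's implementation. Articles are assumed to arrive as hypothetical `(source, hour, sentiment)` tuples with integer hour indices, and prices as a hypothetical `{hour: value}` dict. The ranking uses the largest signed Pearson r. The lesson does not say whether SentiLab ranks by signed or absolute r, or whether it correlates sentiment against price levels or hourly returns, so pass whichever price series you want to test. All function names are illustrative.

```python
from collections import defaultdict
from math import sqrt


def pearson(xs, ys):
    """Pearson r of two equal-length sequences, or None if undefined."""
    n = len(xs)
    if n < 2:
        return None
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        return None
    return cov / sqrt(vx * vy)


def hourly_sentiment(articles):
    """Steps 1-2: group by source, then average sentiment per hour.

    Assumed input: iterable of (source, hour, sentiment) with integer hours.
    Returns ({source: {hour: mean_sentiment}}, {source: article_count}).
    """
    buckets = defaultdict(lambda: defaultdict(list))
    counts = defaultdict(int)
    for source, hour, score in articles:
        buckets[source][hour].append(score)
        counts[source] += 1
    averages = {
        src: {h: sum(v) / len(v) for h, v in hours.items()}
        for src, hours in buckets.items()
    }
    return averages, counts


def best_lag(sentiment_by_hour, price_by_hour, max_lag=24):
    """Steps 3-4: pair sentiment at hour t with price at hour t + lag for
    every lag in [-max_lag, +max_lag] (49 offsets when max_lag=24) and
    return (lag, r) where r peaks, or None if no lag had enough pairs."""
    best = None
    for lag in range(-max_lag, max_lag + 1):
        pairs = [(s, price_by_hour[h + lag])
                 for h, s in sentiment_by_hour.items()
                 if h + lag in price_by_hour]
        if len(pairs) < 3:  # assumption: skip lags with too few overlapping hours
            continue
        r = pearson([s for s, _ in pairs], [p for _, p in pairs])
        if r is not None and (best is None or r > best[1]):
            best = (lag, r)
    return best


def rank_sources(articles, price_by_hour, min_articles=15, max_lag=24):
    """Step 5: return (source, article_count, best_lag, r) rows,
    highest peak correlation first."""
    averages, counts = hourly_sentiment(articles)
    rows = []
    for src, series in averages.items():
        if counts[src] < min_articles:  # minimum article count filter
            continue
        found = best_lag(series, price_by_hour, max_lag)
        if found is not None:
            rows.append((src, counts[src], found[0], found[1]))
    rows.sort(key=lambda row: row[3], reverse=True)
    return rows
```

With real data you would first convert article and price timestamps into the same hourly index before calling `rank_sources`.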

Reading the Results

Source Predictability results for the SOL asset

The output table shows these key metrics for each source:
  • Source Name — the news publisher.
  • Article Count — how many articles from this source were in the dataset.
  • Best Lag (hours) — the time offset with the strongest correlation.
  • Correlation Strength — the Pearson r value at that optimal lag.

Interpreting the Lag Value

The lag value is the most important output to understand:

  • Positive lag (e.g., +8h) — the source publishes content before price moves in the direction its sentiment points. This is a LEADING source — it is predictive. Sentiment from this source at a given hour correlates with price 8 hours later.
  • Negative lag (e.g., -4h) — the source publishes content after price has already moved. This is a LAGGING source — it is reactive. It reports on what already happened.
  • Zero lag — sentiment and price move within the same hour. The source may be reporting on events in real time.
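The sign convention can be checked on synthetic data. In this sketch (hypothetical helper names, plain hourly lists instead of real feeds), price is built so that price at hour t equals sentiment at hour t - 8. Sentiment at hour t therefore matches price 8 hours later, so sweeping lags from -24 to +24 should peak at +8, which classifies the source as leading.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / sqrt(vx * vy)


def corr_at_lag(sentiment, price, lag):
    """Correlate sentiment[t] with price[t + lag]; both are hourly lists."""
    pairs = [(sentiment[t], price[t + lag])
             for t in range(len(sentiment))
             if 0 <= t + lag < len(price)]
    return pearson([s for s, _ in pairs], [p for _, p in pairs])


def classify_lag(lag):
    """Positive lag = leading, negative = lagging, zero = coincident."""
    if lag > 0:
        return "leading"
    if lag < 0:
        return "lagging"
    return "coincident"


# Synthetic hourly sentiment: a deterministic, irregular-looking sequence.
sentiment = [((t * t * 37 + t * 11) % 97) / 97 for t in range(200)]
# Build price so that price[t] = sentiment[t - 8]: the source is 8 hours early.
price = [0.0] * 8 + sentiment[:-8]

best = max(range(-24, 25), key=lambda lag: corr_at_lag(sentiment, price, lag))
print(best, classify_lag(best))  # 8 leading
```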

Example Results

A typical Source Predictability output might look like:

  • CoinDesk — r = 0.52 at +8h lag → strong predictor (publishes 8 hours before price reacts).
  • The Block — r = 0.41 at +6h lag → good predictor (6-hour lead time).
  • Decrypt — r = 0.28 at -4h lag → follows price (publishes 4 hours after the move).

In this example, prioritizing CoinDesk and The Block over Decrypt would give you earlier exposure to market-relevant information.

Complete source leaderboard showing all ranked sources

Minimum Article Count

Sources must have a minimum number of articles (default: 15) to be included in the ranking. This improves statistical reliability: a source with only 3 articles can show a near-perfect correlation purely by chance, because a handful of data points is easy to line up with almost any price series. The minimum count filter screens these small-sample results out of your leaderboard.
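A quick Monte Carlo run shows why the minimum article count matters. The sketch below, with hypothetical function names, draws two independent random series (so the true correlation is zero) and counts how often |r| still exceeds 0.9. For 3 points the theoretical null distribution of r puts that rate at roughly 29%; for 15 points it is vanishingly small. SentiLab's threshold applies to articles, which are then averaged into hourly buckets, so treat this as an analogy rather than an exact model of the filter.

```python
import random
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / sqrt(vx * vy)


def spurious_rate(n, trials=2000, threshold=0.9, seed=42):
    """Share of trials in which two independent random series of length n
    still show |r| > threshold, i.e. a 'strong' correlation by pure chance."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        xs = [rng.gauss(0, 1) for _ in range(n)]
        ys = [rng.gauss(0, 1) for _ in range(n)]
        if abs(pearson(xs, ys)) > threshold:
            hits += 1
    return hits / trials


def apply_min_count(article_counts, min_articles=15):
    """Keep only sources with at least min_articles articles."""
    return {src: n for src, n in article_counts.items() if n >= min_articles}


for n in (3, 15):
    print(n, spurious_rate(n))
```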

Configuration Parameters

Source Predictability accepts the same parameters as Correlation Sweep:

  • Asset — which asset to analyze.
  • Time range — the analysis window (90+ days recommended for reliable results).
  • Quality filters — minimum quality and credibility scores.
  • Predictive Only — filter to forward-looking articles only (see Predictive Only Filter).
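These parameters could be collected in a small config object. The field names, the default values for the quality and credibility filters, and the warning below are hypothetical. Only the 15-article default, the -24h to +24h lag window, and the 90-day recommendation come from this lesson.

```python
import warnings
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass
class SourcePredictabilityConfig:
    """Hypothetical container for the Source Predictability parameters."""
    asset: str                     # e.g. "SOL"
    start: date                    # start of the analysis window
    end: date                      # end of the analysis window
    min_quality: float = 0.0       # hypothetical: 0.0 means no quality filter
    min_credibility: float = 0.0   # hypothetical: 0.0 means no credibility filter
    predictive_only: bool = False  # keep only forward-looking articles
    min_articles: int = 15         # documented default minimum per source
    max_lag_hours: int = 24        # lags tested run from -24h to +24h

    def __post_init__(self):
        if self.end <= self.start:
            raise ValueError("end must be after start")
        if self.min_articles < 1:
            raise ValueError("min_articles must be positive")
        if self.end - self.start < timedelta(days=90):
            warnings.warn("time range under 90 days; rankings may be unreliable")

    @property
    def lag_offsets(self):
        """All lag offsets to test: 49 values when max_lag_hours=24."""
        return list(range(-self.max_lag_hours, self.max_lag_hours + 1))


cfg = SourcePredictabilityConfig("SOL", date(2024, 1, 1), date(2024, 6, 1))
print(len(cfg.lag_offsets))  # 49
```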

Practical Use

The actionable takeaway from Source Predictability is a priority reading list. Once you identify which sources consistently lead price for the assets you trade, you can:

  • Set up alerts or RSS feeds for those specific sources.
  • Weight their articles more heavily when forming trading decisions.
  • Ignore or de-prioritize lagging sources that only add noise after the move is done.
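A priority reading list can be derived mechanically from the results table. This sketch uses the illustrative figures from the Example Results section; the `min_r` cutoff of 0.3 and the tie-break on lead time are hypothetical choices, not SentiLab settings.

```python
from dataclasses import dataclass


@dataclass
class SourceResult:
    name: str
    best_lag_hours: int  # positive = publishes before price moves
    r: float             # Pearson r at the best lag


def priority_reading_list(results, min_r=0.3):
    """Keep leading sources (positive lag) with r >= min_r, strongest first;
    ties on r are broken by the longer lead time."""
    leaders = [s for s in results if s.best_lag_hours > 0 and s.r >= min_r]
    leaders.sort(key=lambda s: (s.r, s.best_lag_hours), reverse=True)
    return [s.name for s in leaders]


example = [
    SourceResult("CoinDesk", 8, 0.52),
    SourceResult("The Block", 6, 0.41),
    SourceResult("Decrypt", -4, 0.28),
]
print(priority_reading_list(example))  # ['CoinDesk', 'The Block']
```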

Why This Matters

Information asymmetry is one of the few genuine edges in markets. If you know which sources tend to publish actionable information earliest, you gain a time advantage. Source Predictability turns an abstract concept — "some sources are better than others" — into a quantified, ranked leaderboard.