Course: SentiLab, Advanced

Predictive Only — Filtering for Forward-Looking News

Use LLM intent classification to isolate future-oriented articles and improve predictive signal strength.

Article Intent Classification

During deep analysis, SentiSignal's LLM classifies each article's intent, meaning its orientation in time: whether it looks ahead, looks back, or does neither. This classification happens automatically for every processed article and is stored as metadata.

The five intent categories are:

  • future_oriented — Forward-looking statements: guidance, forecasts, plans, predictions, upcoming events. Example: "Fed signals rate cuts in September" or "Ethereum 2.0 upgrade scheduled for Q3."
  • past_oriented — Historical analysis: post-mortems, summaries, what already happened. Example: "Bitcoin rallied 12% last week amid ETF inflows" or "Q2 earnings recap."
  • noise — Promotional content, gossip, speculation without substance. Example: "Top 5 altcoins to buy NOW!" or celebrity endorsement fluff.
  • mixed — Contains both forward and backward-looking elements. Example: an article that recaps last week's events and then forecasts next week.
  • unknown — Classification uncertain. The LLM couldn't confidently assign an intent.

The "Predictive Only" Toggle

In SentiLab's advanced configuration, the "Predictive Only" toggle filters your dataset to include only future_oriented articles. Everything else — past-oriented, noise, mixed, unknown — is excluded.
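
Conceptually, the toggle is a filter over the stored intent metadata. The following is a minimal sketch of that idea, not SentiSignal's actual API: the `Intent` enum, the `Article` record, and the `predictive_only` function are hypothetical names introduced here for illustration, and only the five label strings come from the list above.

```python
from dataclasses import dataclass
from enum import Enum


class Intent(str, Enum):
    # The five labels from the category list; the enum itself is illustrative.
    FUTURE_ORIENTED = "future_oriented"
    PAST_ORIENTED = "past_oriented"
    NOISE = "noise"
    MIXED = "mixed"
    UNKNOWN = "unknown"


@dataclass(frozen=True)
class Article:
    # Hypothetical record shape; the real metadata fields may differ.
    title: str
    intent: Intent


def predictive_only(articles: list[Article]) -> list[Article]:
    """Keep only future_oriented articles; drop every other intent."""
    return [a for a in articles if a.intent is Intent.FUTURE_ORIENTED]


articles = [
    Article("Fed signals rate cuts in September", Intent.FUTURE_ORIENTED),
    Article("Bitcoin rallied 12% last week amid ETF inflows", Intent.PAST_ORIENTED),
    Article("Top 5 altcoins to buy NOW!", Intent.NOISE),
    Article("Weekly recap and outlook", Intent.MIXED),
]

kept = predictive_only(articles)
print([a.title for a in kept])
```

Note that mixed articles are dropped as well, even though they contain some forward-looking material. That is the behavior described above, and it is one reason the filter shrinks the dataset so sharply.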

Why Forward-Looking News Matters

The logic is straightforward: if you want to test whether sentiment predicts price, you should focus on articles that are themselves predictive in nature. An article forecasting Fed rate cuts is more likely to lead price than an article summarizing last week's market action.

  • Future-oriented articles are more likely to lead price — they discuss events that haven't happened yet, giving the market time to react.
  • Past-oriented articles tend to lag price — they report on events the market has already digested.

Effect on Correlation Sweep

Enabling "Predictive Only" in a Correlation Sweep often has a measurable effect:

  • Improved correlation strength at positive lags: the sentiment → price signal gets cleaner because reactive articles, which mostly track moves that have already happened, no longer dilute the sentiment series.
  • The best lag may shift: without past-oriented articles, the optimal lead time may stand out more clearly as a peak in the sweep.
  • Statistical significance may improve: a stronger correlation can lower the p-value even with fewer data points, but only if the gain in correlation outweighs the loss in sample size.
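
To make the lag convention concrete, here is a minimal sketch of a lag sweep in plain Python. It is an illustration under stated assumptions, not SentiLab's implementation: the function names are hypothetical, the series are toy values, and a positive lag is taken to mean that sentiment leads price, matching the bullets above. In practice you would run the same sweep twice, once on the sentiment series built from all articles and once on the series built from future_oriented articles only.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / sqrt(vx * vy)


def lagged_correlation(sentiment, returns, lag):
    """Correlate sentiment[t] with returns[t + lag].

    Positive lag: sentiment leads price. Negative lag: price leads sentiment.
    """
    if lag >= 0:
        xs, ys = sentiment[: len(sentiment) - lag], returns[lag:]
    else:
        xs, ys = sentiment[-lag:], returns[: len(returns) + lag]
    return pearson(xs, ys)


def sweep(sentiment, returns, max_lag):
    """Return {lag: correlation} for every lag in [-max_lag, max_lag]."""
    return {
        lag: lagged_correlation(sentiment, returns, lag)
        for lag in range(-max_lag, max_lag + 1)
    }


# Toy daily series: returns are the sentiment series delayed by two days,
# so by construction the sweep peaks at lag +2.
sentiment = [0.1, -0.3, 0.5, 0.2, -0.4, 0.6, -0.1, 0.3, -0.5, 0.4]
returns = [0.0, 0.0] + sentiment[:-2]

result = sweep(sentiment, returns, max_lag=3)
best = max(result, key=result.get)
print(best, round(result[best], 3))
```

Each lag shortens the overlapping window by one point per step, so very large lags on short series produce correlations from only a handful of observations.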

Effect on Source Predictability

When applied to Source Predictability, the "Predictive Only" filter restricts each source to its forward-looking articles, so the ranking reflects how well a source's forecasts lead price rather than how much it publishes overall. A source that ranks high in a predictive-only analysis is one whose forward-looking content consistently leads price.

The Trade-off

Typically, only 30–40% of articles are classified as future_oriented. Enabling this filter therefore reduces your dataset by 60–70%. For assets with sparse news coverage, this can push the dataset below the minimum threshold needed for reliable statistics.
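
The arithmetic is easy to check before enabling the filter. The sketch below applies the 30–40% share quoted above to two made-up article counts. The `MIN_ARTICLES` floor is a hypothetical placeholder, since the actual minimum threshold is not stated here, and the function names are illustrative.

```python
# Hypothetical floor for a usable sample; the real threshold is not given
# in this article, so treat this value as a placeholder.
MIN_ARTICLES = 100


def articles_after_filter(total: int, future_share_pct: int) -> int:
    """Articles remaining if future_share_pct percent are future_oriented."""
    return total * future_share_pct // 100


def filtered_range(total: int) -> tuple[int, int]:
    """Remaining articles at the 30% and 40% ends of the typical range."""
    return articles_after_filter(total, 30), articles_after_filter(total, 40)


# Made-up article counts, for illustration only.
for label, total in [("high-volume asset", 5000), ("low-volume asset", 200)]:
    low, high = filtered_range(total)
    verdict = "enough" if low >= MIN_ARTICLES else "too few"
    print(f"{label}: {total} articles -> {low} to {high} after filtering ({verdict})")
```

Checking the conservative (30%) end of the range is a deliberate choice: if the sample is still large enough there, the filter is unlikely to starve the analysis.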

Recommendation: use "Predictive Only" for high-volume assets (Bitcoin, Ethereum, Gold) where you have thousands of articles to filter from. For lower-volume assets, the dataset reduction may hurt more than the signal improvement helps.

When NOT to Use Predictive Only

There are valid use cases for analyzing non-predictive content:

  • Studying media reaction patterns — if you want to understand how media responds to price moves (lag behavior), use past_oriented instead.
  • Narrative Clustering — media convergence events often include a mix of forward and backward-looking content. Filtering to predictive-only may break apart clusters that are meaningful in aggregate.
  • Baseline comparisons — run the same analysis with and without the filter to quantify exactly how much the signal improves.
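
A baseline comparison can be summarized with a few lines of code once both runs are available. This is a minimal sketch that assumes each run yields a mapping from lag to correlation; the function names are hypothetical and the numbers are invented purely to show the shape of the comparison, not results from any real analysis.

```python
def best_lag(sweep_result: dict[int, float]) -> tuple[int, float]:
    """Return (lag, correlation) for the largest absolute correlation."""
    lag = max(sweep_result, key=lambda k: abs(sweep_result[k]))
    return lag, sweep_result[lag]


def compare_runs(baseline: dict[int, float], filtered: dict[int, float]) -> dict:
    """Summarize how the best lag and its strength change with the filter."""
    b_lag, b_r = best_lag(baseline)
    f_lag, f_r = best_lag(filtered)
    return {
        "baseline_lag": b_lag,
        "filtered_lag": f_lag,
        "delta_abs_r": round(abs(f_r) - abs(b_r), 4),
    }


# Invented values for illustration only; positive lag = sentiment leads price.
baseline = {-1: 0.20, 0: 0.10, 1: 0.15, 2: 0.12}
filtered = {-1: 0.05, 0: 0.08, 1: 0.18, 2: 0.30}

summary = compare_runs(baseline, filtered)
print(summary)
```

Comparing absolute correlations keeps the summary meaningful when the strongest relationship is negative. A larger correlation on a much smaller sample is not automatically more significant, so the sample sizes of both runs should be reported alongside this summary.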

Why This Matters

Intent classification is one of SentiSignal's unique features. Most sentiment tools treat all articles equally — a forecast and a recap get the same weight. By separating predictive from reactive content, you can isolate the subset of news that is most likely to carry forward-looking signal, making your analyses more precise and actionable.