What Is VADER and How It Differs from the LLM

Understand the rule-based VADER sentiment engine — how it works, where it excels, and when to trust it over (or alongside) the LLM.

VADER: Rule-Based Sentiment Analysis

VADER stands for Valence Aware Dictionary and sEntiment Reasoner. Unlike the LLM (which is an AI that "reads" text), VADER is a rule-based NLP algorithm that uses a pre-built lexicon of words and their associated sentiment valences.

It was created by researchers at Georgia Tech and is one of the most widely used sentiment analysis tools in the Python ecosystem. SentiSignal runs VADER as a local Python process — it requires no external API calls, no tokens, and no cloud infrastructure.

How VADER Scores Text

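VADER looks up each word in its lexicon, adjusts the valences with rules for negation, intensifiers, capitalization, and punctuation, and returns four scores: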

  • neg (0–1) — Proportion of text with negative sentiment
  • neu (0–1) — Proportion of text with neutral sentiment
  • pos (0–1) — Proportion of text with positive sentiment
  • compound (-1 to +1) — An aggregate score computed from the individual word valences, normalized to the -1 to +1 range

SentiSignal uses the compound score as the VADER sentiment value without further transformation. Because it sits on the same -1.0 to +1.0 scale as the LLM's generalSentiment score, the two values can be compared one to one.
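To make the compound score less of a black box, here is a minimal sketch of the normalization step. It assumes the formula used by the open-source vaderSentiment package, where the summed word valences x are squashed as x / sqrt(x² + α) with α = 15. The per-word valences below are hypothetical illustration values, not entries from the real lexicon, and the sketch skips the negation and intensifier rules that adjust valences before they are summed.

```python
import math


def normalize(valence_sum: float, alpha: float = 15.0) -> float:
    """Squash an unbounded valence sum into the -1..+1 range."""
    score = valence_sum / math.sqrt(valence_sum * valence_sum + alpha)
    # Clamp defensively so rounding can never push the result out of range.
    return max(-1.0, min(1.0, score))


# Hypothetical valences for two positive words (illustration only).
valences = [2.1, 1.5]
compound = normalize(sum(valences))
print(f"compound = {compound:.4f}")
```

With the real package, calling SentimentIntensityAnalyzer().polarity_scores(text) returns a dictionary with the same four keys listed above (neg, neu, pos, compound). Whether SentiSignal calls it exactly this way is not stated here.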

LLM vs VADER: Side-by-Side Comparison

Dimension        | VADER                                   | LLM (Gemma-3)
Speed            | Instant (milliseconds)                  | 2–5 seconds per article
Accuracy         | Moderate: lexicon-based, misses context | Higher: context-aware, understands nuance
Cost             | Free (runs locally)                     | API tokens per article
Asset extraction | Regex pattern matching                  | Prompt-based extraction (learned)
Use case         | Fast supplementary signal               | Primary deep analysis

Where VADER Excels

VADER performs well on text with clear, explicit sentiment language:

  • "Bitcoin surges to new all-time high" → VADER correctly identifies strong positive sentiment
  • "Market crashes amid panic selling" → VADER correctly identifies strong negative sentiment
  • "Gold prices rally on safe-haven demand" → Clear positive sentiment detected

Where VADER Struggles

VADER's lexicon-based approach has blind spots:

  • Context-dependent meaning: "Bitcoin crashed through resistance" is bullish (breakout), but VADER sees "crashed" and rates it negative.
  • Sarcasm and irony: "Great, another hack" → VADER sees "great" as positive.
  • Domain-specific jargon: "Liquidation cascade" is very bearish in crypto but may not be in VADER's standard lexicon.
  • Per-symbol differentiation: VADER cannot distinguish that "Fed rate cut" is positive for gold but negative for USD. It assigns one compound score to the entire text.
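The blind spots above follow from how a lexicon lookup works, which a toy scorer can illustrate. The sketch below is not VADER itself: the two-word lexicon and its valences are hypothetical, and the function name toy_compound is made up for this example. It only shows why a word-level lookup misreads context and returns a neutral score for unknown jargon.

```python
import math

# Hypothetical mini-lexicon (illustration only); the real VADER lexicon
# contains thousands of human-rated entries.
TOY_LEXICON = {"crashed": -2.5, "great": 3.1}


def toy_compound(text: str, alpha: float = 15.0) -> float:
    """Sum the lexicon valences of the words and squash the sum into -1..+1."""
    words = [w.strip('.,!?"').lower() for w in text.split()]
    total = sum(TOY_LEXICON.get(w, 0.0) for w in words)
    return total / math.sqrt(total * total + alpha)


breakout = toy_compound("Bitcoin crashed through resistance")  # bullish breakout
sarcasm = toy_compound("Great, another hack")                  # sarcastic complaint
jargon = toy_compound("Liquidation cascade")                   # bearish crypto jargon

print(f"breakout={breakout:.2f} sarcasm={sarcasm:.2f} jargon={jargon:.2f}")
```

The breakout headline scores negative because of "crashed", the sarcastic line scores positive because of "great", and the jargon scores exactly zero because neither word is in the toy lexicon. In all three cases the function returns one number for the whole text, so it has no way to give gold and USD different scores for the same headline.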

Data Storage

VADER results are stored in news_items_unified.vader_general_sentiment. Unlike the LLM, VADER does not write per-symbol scores to asset_sentiments_unified. It produces only one compound score per article.

When VADER Is Most Useful

Think of VADER as a "second opinion" rather than a primary signal:

  • Agreement signals confidence: When the LLM and VADER rate an article similarly (e.g., both at +0.5), you can be more confident in the sentiment assessment.
  • Divergence signals investigation: When the LLM says +0.4 but VADER says -0.2, something interesting is happening. The headline might sound negative (VADER catches that) while the full article content is actually positive (the LLM catches that), or vice versa.
  • Speed advantage: In high-volume news periods, VADER scores appear instantly while LLM processing may queue up. VADER gives you an early read.
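The three cases above can be sketched as a small helper. This is an illustration only: the function name compare_signals, the labels it returns, and the 0.3 tolerance are all assumptions chosen for the example, not SentiSignal's actual logic or thresholds.

```python
from typing import Optional


def compare_signals(vader: float, llm: Optional[float], tolerance: float = 0.3) -> str:
    """Label a pair of scores that share the -1.0..+1.0 scale.

    llm is None while the LLM result is still queued.
    tolerance is a hypothetical cutoff for "rated similarly".
    """
    if llm is None:
        return "vader_only"   # early read: only the fast score exists so far
    if abs(llm - vader) <= tolerance:
        return "agreement"    # both engines tell the same story
    return "divergence"       # worth opening the article to see why


print(compare_signals(vader=0.5, llm=0.5))
print(compare_signals(vader=-0.2, llm=0.4))
print(compare_signals(vader=0.3, llm=None))
```

A gap-based check is the simplest choice. A stricter variant could also require both scores to have the same sign before calling it agreement.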

On the chart, you can toggle both LLM and VADER sentiment lines to compare them visually. When the purple VADER line and the blue LLM line move together, the sentiment signal is reinforced. When they diverge, it is worth digging into the articles to understand why.

Next, learn how sentiment scores translate into an intuitive visual rating: The Sentiment Star Rating System.