In practice, these three data sources can be understood as three types of “evidence”:
News: typical examples include regulatory statements, macro data, exchange announcements, project upgrades, and funding and partnership disclosures. Their value is that they provide identifiable time points and event boundaries, which makes them suitable as “narrative starting points.”
Social media: typical examples include discussion volume, retweet structure, KOL concentration, sentiment polarity, and topic clustering. Their value lies in measuring how fast and how crowded narrative diffusion is, which makes them suitable as a gauge of “narrative intensity and risk temperature.”
On-chain data: typical examples include large transfers, exchange net inflows/outflows, stablecoin supply changes, derivatives open interest and funding rates, and trade distribution. Their value lies in verifying whether a narrative actually translates into capital action, which makes them suitable as the “realization validation layer.”
The key to narrative trading is not to rely solely on one data type but to let all three complement each other: news provides the starting point, social media provides the temperature, on-chain provides the validation.
News signals have clear event boundaries, which makes time-series research easier. Their common pitfalls, however, are just as apparent:
Therefore, news data is better suited as the foundation for an “Event Calendar” and “Narrative Tag Library,” rather than as a direct high-frequency trading trigger.
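A minimal sketch of what an event calendar might look like, assuming each news item has already been given a timezone-aware publication time and a set of narrative tags. The class names, source labels, and tag values are hypothetical, not a prescribed schema.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import List, Optional


@dataclass(frozen=True)
class NewsEvent:
    published_at: datetime  # timezone-aware publication time
    source: str             # hypothetical source label, e.g. "exchange_announcement"
    headline: str
    tags: frozenset         # narrative tags drawn from a (hypothetical) tag library


class EventCalendar:
    """Minimal in-memory event calendar, kept sorted by publication time."""

    def __init__(self) -> None:
        self._events: List[NewsEvent] = []

    def add(self, event: NewsEvent) -> None:
        # Reject naive timestamps so every event sits on one comparable timeline.
        if event.published_at.tzinfo is None:
            raise ValueError("published_at must be timezone-aware")
        self._events.append(event)
        self._events.sort(key=lambda e: e.published_at)

    def window(self, start: datetime, end: datetime, tag: Optional[str] = None) -> List[NewsEvent]:
        """Events with start <= published_at < end, optionally filtered by one tag."""
        return [
            e for e in self._events
            if start <= e.published_at < end and (tag is None or tag in e.tags)
        ]


# Usage with made-up events.
utc = timezone.utc
calendar = EventCalendar()
calendar.add(NewsEvent(datetime(2024, 3, 1, 14, tzinfo=utc), "exchange_announcement",
                       "New spot listing", frozenset({"listing"})))
calendar.add(NewsEvent(datetime(2024, 2, 28, 8, tzinfo=utc), "regulator",
                       "Consultation paper published", frozenset({"regulation"})))
calendar.add(NewsEvent(datetime(2024, 3, 1, 9, tzinfo=utc), "project_blog",
                       "Mainnet upgrade date set", frozenset({"upgrade"})))
march_first = calendar.window(datetime(2024, 3, 1, tzinfo=utc), datetime(2024, 3, 2, tzinfo=utc))
```

Keeping the calendar as queryable, tagged records (rather than as live triggers) is what lets later lessons count, align, and backtest events by narrative.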
In practice, news is usually processed into three types of tags:
Social media data is especially relevant to narrative trading because it directly captures shifts in attention. Its noise structure, however, is more complex:
Therefore, social media data is better used to build “diffusion structure indicators” than simple sentiment scores.
More valuable structural dimensions include:
These dimensions are closer to how capital behavior actually forms than simple positive/negative word frequency is.
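KOL concentration and diffusion speed, both named earlier as social media dimensions, can be quantified in many ways. The sketch below is one possibility, assuming that for a single narrative you already have one author ID per mention and a timestamp per mention. Concentration is measured with the Herfindahl-Hirschman index (a standard concentration measure swapped in here, not necessarily the author's choice), and speed as the ratio of mentions in the latest window to the window before it. Function names are hypothetical.

```python
from collections import Counter
from datetime import datetime, timedelta
from typing import List


def kol_concentration(author_ids: List[str]) -> float:
    """Herfindahl-Hirschman index of mention share per author.

    Ranges from 1/N (N authors with equal share) up to 1.0 (one author posts everything).
    """
    if not author_ids:
        raise ValueError("no mentions in window")
    total = len(author_ids)
    return sum((n / total) ** 2 for n in Counter(author_ids).values())


def diffusion_velocity(mention_times: List[datetime], now: datetime, window: timedelta) -> float:
    """Mentions in [now - window, now) divided by mentions in the window before it."""
    recent = sum(1 for t in mention_times if now - window <= t < now)
    previous = sum(1 for t in mention_times if now - 2 * window <= t < now - window)
    if previous == 0:
        # No baseline: report unbounded growth if anything appeared, else no activity.
        return float("inf") if recent else 0.0
    return recent / previous
```

A narrative driven by a handful of accounts shows a high index even at large volume, which is exactly the crowding signal a plain sentiment score would miss.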
The biggest advantages of on-chain data are verifiability and resistance to forgery at the statistical level, which make it suitable as the “realization layer” of narratives. The challenge lies in interpreting the causal chain:
Therefore, on-chain data is better suited to answering “Is capital actually moving?” than “Why must the price rise?”
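Exchange net inflows/outflows, one of the on-chain examples listed earlier, are a direct way to ask whether capital is moving. The sketch below assumes you already have decoded transfers and a set of addresses labelled as exchange-controlled; in practice such labels come from an external source and are incomplete. The names `Transfer` and `exchange_net_flow` are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Set


@dataclass(frozen=True)
class Transfer:
    sender: str
    receiver: str
    amount: float


def exchange_net_flow(transfers: List[Transfer], exchange_addresses: Set[str]) -> float:
    """Net token flow into exchange-labelled addresses.

    Positive: more moved onto exchanges than off them; negative: net outflow.
    Exchange-to-exchange transfers are treated as internal and ignored, as are
    transfers that touch no exchange address at all.
    """
    inflow = outflow = 0.0
    for t in transfers:
        to_exchange = t.receiver in exchange_addresses
        from_exchange = t.sender in exchange_addresses
        if to_exchange and not from_exchange:
            inflow += t.amount
        elif from_exchange and not to_exchange:
            outflow += t.amount
    return inflow - outflow
```

The function says only that tokens moved, not why; reading a net inflow as sell pressure is an interpretation layered on top, which is the causality gap noted above.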
In narrative trading frameworks, on-chain indicators typically serve three validation tasks:
To reduce noise and make signals more actionable, the three data types can be arranged as a three-layer evidence pyramid:
On-chain layer: used to verify whether narratives translate into capital behavior.
Social media layer: used to measure narrative intensity, crowding, and persistence.
News layer: used to locate narrative starting points and the cadence of updates.
The point of this structure is that every trading action should aim for “resonance across at least two layers of evidence.” Single-layer evidence (especially social media hype alone) usually serves only as something to observe, not as a stable strategy input.
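The “at least two layers” rule can be expressed as a small gate. This sketch assumes each layer has already been reduced upstream to a yes/no signal (how that happens is outside this lesson); the `Decision` labels, the function name, and the `min_layers` parameter are hypothetical.

```python
from enum import Enum


class Decision(Enum):
    IGNORE = "ignore"        # no layer fired
    OBSERVE = "observe"      # evidence below the threshold: watch only
    CANDIDATE = "candidate"  # enough layers resonate: eligible as a strategy input


def evidence_gate(news: bool, social: bool, onchain: bool, min_layers: int = 2) -> Decision:
    """Count how many evidence layers fired and map the count to a decision."""
    fired = sum((news, social, onchain))  # bools count as 0/1
    if fired >= min_layers:
        return Decision.CANDIDATE
    if fired >= 1:
        return Decision.OBSERVE
    return Decision.IGNORE
```

For example, social hype alone (`evidence_gate(False, True, False)`) only reaches `OBSERVE`, while news confirmed by on-chain flows reaches `CANDIDATE`.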
The three data types differ in time granularity: news arrives on a scale of minutes to hours, social media in second-level pulses, and on-chain data in block time.
If timestamps are not rigorously aligned, spurious correlations easily arise:
In practice, a unified timeline needs to be established:
Time alignment is a prerequisite for every subsequent scoring model and the critical threshold narrative research must clear before it enters live trading.
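One way to make a unified timeline concrete, sketched under assumptions: every timestamp is converted to UTC at ingestion (naive timestamps are rejected rather than guessed), and a data point is joined to a decision moment only through the time it became available, via an “as-of” lookup so that no later value can leak in. The helper names `to_utc` and `asof_lookup` are hypothetical; pandas users would typically reach for `pandas.merge_asof` for the same idea.

```python
from bisect import bisect_right
from datetime import datetime, timedelta, timezone
from typing import Optional, Sequence, TypeVar

T = TypeVar("T")


def to_utc(ts: datetime) -> datetime:
    """Normalize a timezone-aware timestamp to UTC; reject naive timestamps."""
    if ts.tzinfo is None:
        raise ValueError("naive timestamp: source timezone is unknown")
    return ts.astimezone(timezone.utc)


def asof_lookup(decision_time: datetime,
                available_times: Sequence[datetime],
                values: Sequence[T]) -> Optional[T]:
    """Latest value whose availability time is <= decision_time (no look-ahead).

    available_times must be sorted ascending and aligned index-by-index with values.
    Returns None if nothing was available yet at decision_time.
    """
    i = bisect_right(available_times, decision_time)
    return values[i - 1] if i > 0 else None
```

Normalizing first also matters mechanically: Python refuses to compare naive and aware datetimes, so mixed sources would otherwise fail or, worse, be silently misread.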
Scenario: a token project releases positive news
Actual timeline (aligned)
Common Errors
Treating “data appearance time” as “actual occurrence time”
Resulting misjudgment: the price appears to rise first and capital to enter afterward, leading to the incorrect conclusion that on-chain flows are not the driving factor.
Time travel (using future information to explain the past)
The problem is that information unavailable at decision time leaks into the analysis, distorting backtest results.
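The first error can be reproduced with a toy timeline. All times below are hypothetical: the on-chain inflow happens before the price move, but the pipeline only ingests it eight minutes later. Ordering by occurrence time recovers the true sequence; ordering by appearance time produces exactly the misjudgment described above. The `Observation` fields and `sequence_by` helper are illustrative names.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import List


@dataclass(frozen=True)
class Observation:
    label: str
    occurred_at: datetime  # when it actually happened (e.g. block time, publish time)
    appeared_at: datetime  # when our pipeline first saw it (e.g. indexer ingestion)


def sequence_by(observations: List[Observation], time_field: str) -> List[str]:
    """Labels ordered by the chosen timestamp field."""
    return [o.label for o in sorted(observations, key=lambda o: getattr(o, time_field))]


# Hypothetical timeline: inflow at +2 min, but only ingested at +10 min.
t0 = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)
observations = [
    Observation("news", t0, t0 + timedelta(minutes=1)),
    Observation("onchain_inflow", t0 + timedelta(minutes=2), t0 + timedelta(minutes=10)),
    Observation("price_rise", t0 + timedelta(minutes=5), t0 + timedelta(minutes=5)),
]

print(sequence_by(observations, "occurred_at"))  # ['news', 'onchain_inflow', 'price_rise']
print(sequence_by(observations, "appeared_at"))  # ['news', 'price_rise', 'onchain_inflow']
```

Both fields are worth keeping: occurrence time answers what drove what, while appearance time decides what a backtest is allowed to use at a given moment, which is how the second error is avoided.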
Correct Approach
A unified timeline must be established with clear definitions for each time type:
Without time alignment, only superficial correlations can be obtained; real driving relationships can be identified only within a unified time framework. This is also the key prerequisite for moving narrative trading from research into live trading.
Before modeling begins, it’s recommended to define data admission rules such as:
Stacking data sources without admission rules only amplifies the risk of overfitting.
The long-term competitiveness of narrative trading depends largely on whether data governance is properly engineered, not on how sophisticated the indicators are.
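Admission rules can be written down as explicit, testable checks rather than left as habits. The criteria and thresholds below (history length, missing-data ratio, median latency, explicit timezones) are hypothetical examples chosen for illustration, not the lesson's own list; all class and function names are likewise hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class SourceProfile:
    name: str
    history_days: int        # length of usable history
    missing_ratio: float     # fraction of expected intervals with no data
    median_latency_s: float  # median delay between occurrence and availability
    timezone_aware: bool     # whether timestamps carry explicit timezones


@dataclass(frozen=True)
class AdmissionRules:
    # Hypothetical thresholds; tune them per strategy and data source.
    min_history_days: int = 180
    max_missing_ratio: float = 0.05
    max_median_latency_s: float = 300.0


def admission_failures(src: SourceProfile, rules: AdmissionRules) -> List[str]:
    """Return the reasons a source fails admission; an empty list means admitted."""
    failures = []
    if not src.timezone_aware:
        failures.append("timestamps lack an explicit timezone")
    if src.history_days < rules.min_history_days:
        failures.append(f"history {src.history_days}d < {rules.min_history_days}d")
    if src.missing_ratio > rules.max_missing_ratio:
        failures.append(f"missing ratio {src.missing_ratio:.0%} > {rules.max_missing_ratio:.0%}")
    if src.median_latency_s > rules.max_median_latency_s:
        failures.append(
            f"median latency {src.median_latency_s:.0f}s > {rules.max_median_latency_s:.0f}s"
        )
    return failures
```

Returning every failed reason, instead of a single boolean, leaves an audit trail of why a source was kept out of the model.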
This lesson laid out the core division of roles within the data layer:
It also proposed two engineering principles, the “evidence pyramid” and “time alignment,” as boundary conditions for the structured modeling that follows.
The next lesson covers methodology essentials: narrative tags, sentiment scoring, and event mapping, with a focus on turning unstructured text and on-chain behavior into computable, backtestable, and monitorable indicator systems.