Nine Pounds of Ore for an Ounce of Gold
Reading the daily financial news is like mining gold
Last night the pipeline pulled 1,199 financial news articles tagged across nine GICS sectors. It started at 9 PM Mountain Time and finished around 1 AM. By morning we had sorted the catch. One hundred and six articles carried direction. The other one thousand ninety-three were ore.
If you mine gold in the western United States, you process about a ton of rock for every five grams of metal you recover. A financial news feed is a much richer ore than that. About one headline in eleven contains something a portfolio manager might act on. The rest is gravel and dust. But the structure is the same. Most of what you process is not what you came for.
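The ratio is simple arithmetic on the two counts reported for last night's run. A minimal sketch, using only those counts (the variable names are ours, for illustration):

```python
# Counts reported for last night's run.
total_articles = 1199
directional = 106
neutral = total_articles - directional  # 1,093

directional_share = directional / total_articles   # fraction of the feed with direction
headlines_per_hit = total_articles / directional   # headlines read per directional one
ore_per_hit = neutral / directional                # neutral headlines per directional one

print(f"directional share: {directional_share:.1%}")
print(f"headlines per directional article: {headlines_per_hit:.1f}")
print(f"neutral articles per directional article: {ore_per_hit:.1f}")
```

On these counts the directional share is 8.8%, or about one headline in eleven.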
We have been running this pipeline for several months. We know the ratio holds across days and sectors. What we want to write about here is what happens when you ask a sentiment model to do the sorting for you, and what the model sees when it looks at the rock.
What “neutral” means
Both FinBERT and Claude read each headline and output one of three labels: positive, neutral, or negative. The labels look identical on paper. They are different judgments.
For FinBERT, neutral means the language of the headline does not lean in either direction. The model was trained on financial text to recognize tone. A headline with charged vocabulary, like soars or plunges or crisis or opportunity, gets a direction. A headline with flat vocabulary, like reports or announces or files, gets neutral. The judgment is about the words.
For Claude, neutral means a portfolio manager could not plausibly act on this single headline today. The judgment is about the event. A neutral verdict says nothing happened here that should change a position. The vocabulary of the headline is incidental. What matters is whether there is a real action, a real number, or a named analyst direction behind the words.
These two definitions overlap. They also disagree on roughly one article in five. That disagreement is what this post is about.
What 1,093 neutral articles look like
When you read through the day’s neutral pool, the articles sort themselves into recognizable types:
Stock-page listings and options chain pages. “WULF Jun 2026 12.000 call Interactive Stock Chart” is not a news article. It is a quote page.
Post-event recaps and analyst-question summaries. “Five Revealing Analyst Questions From Snap’s Q1 Earnings Call” is commentary on something that already happened. The news was the call. This is what someone wrote afterward.
Wrong-ticker confusion. Bill Ackman discloses a stake in “a megacap tech stock” and Google News routes the article to BILL Holdings, the small-cap fintech. The headline is not about BILL.
Multi-company sector features. “Five AI Stocks Worth Watching This Quarter” name-checks a dozen tickers without saying anything specific about any of them.
Vague analyst mentions. “Five-star analyst resets Nvidia stock price target ahead of earnings” with no figure and no direction.
Rating reiterations. “Barclays Maintains an Overweight Rating on Arista Networks.” Same rating as yesterday. No new information.
Routine insider activity at immaterial dollar amounts. A $6,575 CFO sale at PSQ Holdings does not move PSQH.
None of this is bad reporting. Most of it is the substrate of financial media. Quote pages, opinion columns, retrospective analysis, aggregation, syndication. It serves a purpose. But it is not the kind of news a PM acts on, and a sentiment model that assigns it a direction is mostly modeling chatter.
What 106 non-neutral articles look like
The directional pool is small but substantial.
Some of it is concrete corporate actions. Cisco laying off 4,000 employees in an AI overhaul. Tesla raising Model Y prices for the first time in two years. Boeing’s 200-jet China deal reopening a previously closed market. The FTC opening an antitrust probe into Arm Holdings. BlackRock weighing an investment of billions of dollars in a SpaceX IPO.
Some are specific numbers against expectations. CF Industries up 8.9% after a Q1 earnings beat. Suncor up 7.3% on strong Q1 results. EOSE producing a surprise profit when analysts had modeled a loss. Salesforce committing $300 million to Anthropic.
Some are named analyst actions with stated direction. Oppenheimer naming Oracle a top pick and raising the target. BTIG cutting MercadoLibre as margins reset near 7%. Goldman Sachs raising Enterprise Products Partners on stronger-than-expected results.
These are the headlines a PM reads first thing in the morning. They are the headlines that prompt phone calls. They are 8.8% of what arrived in the feed.
What FinBERT sees instead
FinBERT is the standard. It is the model most people reach for when they want financial sentiment scoring, and it has earned that position. It is fast, free, runs on a laptop, and gives interpretable probabilities. We use it ourselves as the first stage of our pipeline because it is good at one specific thing: telling neutral apart from non-neutral on a large stream of articles.
The question is whether FinBERT’s definition of neutral matches Claude’s. We can check.
Among Claude’s 1,093 neutral articles, FinBERT also calls 900 neutral, or 82.3%. It calls 103 positive (9.4%) and 90 negative (8.2%). The alignment is high but not perfect. About one neutral article in six gets a directional FinBERT label.
Among Claude’s 106 non-neutral articles, FinBERT calls 44 neutral, or 41.5%. It calls 39 positive (36.8%) and 23 negative (21.7%). FinBERT misses the direction on more than two of every five articles Claude judged actionable.
Two errors, going in opposite directions. They are not symmetric. We had to think about why.
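The two row breakdowns above can be reproduced from the raw counts. The sketch below is one way to lay out the cross-tab; the dictionary layout and function name are illustrative assumptions, and only the six counts come from the run.

```python
# Cross-tab counts from the post: rows are Claude's label group,
# columns are FinBERT's label.
crosstab = {
    "claude_neutral":     {"neutral": 900, "positive": 103, "negative": 90},
    "claude_non_neutral": {"neutral": 44,  "positive": 39,  "negative": 23},
}

def row_shares(row):
    """Return each FinBERT label's share of one Claude row."""
    total = sum(row.values())
    return {label: count / total for label, count in row.items()}

for name, row in crosstab.items():
    shares = row_shares(row)
    cells = ", ".join(f"{label} {share:.1%}" for label, share in shares.items())
    print(f"{name} (n={sum(row.values())}): {cells}")

# Articles where one model says neutral and the other does not.
# This is a lower bound on total disagreement: the counts above do not say
# how often the two models pick opposite directions.
total = sum(sum(row.values()) for row in crosstab.values())
neutral_boundary_disagreements = (
    crosstab["claude_neutral"]["positive"]
    + crosstab["claude_neutral"]["negative"]
    + crosstab["claude_non_neutral"]["neutral"]
)
print(f"disagree on neutral vs. non-neutral: "
      f"{neutral_boundary_disagreements}/{total} = "
      f"{neutral_boundary_disagreements / total:.1%}")
```

The neutral-boundary disagreements alone come to 237 of 1,199 articles, which is where the "roughly one article in five" figure comes from.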
What FinBERT misses
Here is a sample of headlines Claude labeled positive or negative and FinBERT called neutral:
“Cisco Lays Off 4,000 Employees In AI Overhaul”
“Tesla raises prices of Model Y cars in the US for the first time in two years”
“Eli Lilly Reports New Late-Stage Weight Loss Trial Results”
“Oracle Named Top Pick as Oppenheimer Lifts Price Target”
“Teva Pharmaceutical Reports Q1 Beat, Plans Amylyx Deal”
“BlackRock weighs investing billions in SpaceX IPO”
A person reads these and immediately recognizes them as news. A layoff of 4,000. A first price increase in two years. Late-stage trial results. A top-pick designation with a target lift. An earnings beat plus a deal announcement. Billions into a private-market IPO.
FinBERT sees these and shrugs. The language of corporate news is tonally flat. “Reports Q1 beat” is informationally rich and emotionally bland. “Lifts price target” sounds procedural. “Lays off 4,000 employees” reads as fact rather than as alarm. FinBERT was trained on financial text, and the writers it learned from used the same register that the headlines we want to classify still use: dry, declarative, unemphatic. The directional content lives in the proper nouns and the numbers, and FinBERT is not built to extract that.
What FinBERT hallucinates
The reverse error is just as systematic. Headlines Claude called neutral and FinBERT assigned a direction:
“Comcast’s Rural Broadband Push And Ad Tech Shift Reshape Growth Story” (FinBERT: positive)
“Coinbase Faces New Rules As DeFi And USDC Partnerships Reshape Outlook” (FinBERT: positive)
“Carnival Valuation Check After FTSE Index Removal And Recent Share Price Weakness” (FinBERT: negative)
“Evaluating Gilead Sciences Valuation After Q1 Beat Guidance Shift And New HIV Priority Review” (FinBERT: positive)
Notice the pattern. These are Seeking Alpha and Yahoo Finance opinion pieces. The headlines contain words that sound directional. Reshape. Push. Shift. Weakness. But the articles themselves rehash known information and arrive at a “valuation check” rather than a recommendation. There is no new event, no new number, no named-firm action. They are commentary.
FinBERT picks up on the tonally charged vocabulary and assigns a label. The label is consistent with the headline’s mood. It is not predictive of anything that will happen to the stock tomorrow, because nothing happened today.
Why the errors are not symmetric
The two failure modes look like opposite mistakes but they share a cause. FinBERT is reading the words, not the events.
On the misses, the events are real but the language is bureaucratic. An earnings release reads like an earnings release. A layoff announcement reads like a memo. The words carry no charge. FinBERT, looking for tone, finds none, and concludes there is no signal.
On the hallucinations, the events are absent but the language is loaded. Opinion writers use evaluative vocabulary. Sector commentary leans on directional verbs. Valuation pieces argue for or against the stock. The words carry charge but reference no new fact. FinBERT, looking for tone, finds it, and concludes there is a signal.
Both errors flow from the same place. FinBERT is a tone classifier applied to a domain where tone and information have decoupled. Much directional news now comes in flat language: in this sample, 41.5% of the articles Claude judged directional read as neutral to FinBERT. Much of the charged language attaches to non-directional commentary: 193 of the 255 articles FinBERT gave a direction were neutral by Claude’s reading. The correlation between how a sentence sounds and what it implies for a stock has weakened, and a model trained on the older correlation is going to be wrong in both directions.
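A deliberately crude illustration of the mechanism: the toy scorer below labels a headline purely by counting charged words, using vocabulary quoted in this post. It is not FinBERT, which is a neural model and does not work from a word list, and the polarity assigned to each word is our assumption. It only shows how a reader that attends to tone alone can produce both failure modes on headlines from this run.

```python
import re

# Toy word lists built from vocabulary mentioned in the post. The polarity
# given to each word is an assumption made for illustration only.
POSITIVE = {"soars", "opportunity", "push", "reshape", "shift"}
NEGATIVE = {"plunges", "crisis", "weakness"}

def toy_tone(headline: str) -> str:
    """Label a headline by counting charged words, ignoring the event."""
    words = re.findall(r"[a-z]+", headline.lower())
    score = sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"

headlines = [
    "Cisco Lays Off 4,000 Employees In AI Overhaul",
    "Teva Pharmaceutical Reports Q1 Beat, Plans Amylyx Deal",
    "Comcast's Rural Broadband Push And Ad Tech Shift Reshape Growth Story",
    "Carnival Valuation Check After FTSE Index Removal And Recent Share Price Weakness",
]
for h in headlines:
    print(f"{toy_tone(h):8s} {h}")
```

The first two headlines describe real events in flat language and come out neutral. The last two describe no new event and come out with a direction.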
A constructive use for FinBERT
The cross-tab suggests something the “FinBERT is wrong” reading misses. FinBERT’s neutral calls are reliable on the bulk of news. It agrees with 82% of Claude’s neutral verdicts, and of the 944 articles FinBERT itself calls neutral, 900 (95%) are also neutral by Claude’s reading. The errors concentrate in the non-neutral verdicts, where FinBERT picks up tone in articles that lack actionable content: 193 of its 255 directional labels land on articles Claude calls neutral. A workflow that trusts FinBERT’s neutral verdicts and reroutes its non-neutral verdicts to a more careful reader captures most of the value of either model at a fraction of the cost of running the careful reader on everything. We have built something along those lines and will describe it in a follow-up.
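To put rough numbers on that trade-off, the sketch below applies the simplest possible version of the rule to last night's counts. The function `needs_careful_reader` is a hypothetical stand-in, not the workflow the follow-up will describe, and Claude's label is treated as the reference, which is an assumption: the cross-tab measures agreement between two models, not ground truth.

```python
# Same counts as the cross-tab in the post, keyed by FinBERT's verdict.
by_finbert = {
    "neutral":  {"claude_neutral": 900, "claude_non_neutral": 44},
    "positive": {"claude_neutral": 103, "claude_non_neutral": 39},
    "negative": {"claude_neutral": 90,  "claude_non_neutral": 23},
}

def needs_careful_reader(finbert_label: str) -> bool:
    """Hypothetical routing rule: trust 'neutral', re-read everything else."""
    return finbert_label != "neutral"

total = sum(sum(col.values()) for col in by_finbert.values())
rerouted = sum(sum(col.values()) for label, col in by_finbert.items()
               if needs_careful_reader(label))
trusted = total - rerouted

# Share of FinBERT's neutral calls that Claude also calls neutral.
neutral_agreement = by_finbert["neutral"]["claude_neutral"] / trusted

# Claude-directional articles that would still reach the careful reader.
directional_kept = sum(col["claude_non_neutral"]
                       for label, col in by_finbert.items()
                       if needs_careful_reader(label))
directional_total = sum(col["claude_non_neutral"] for col in by_finbert.values())

print(f"rerouted to careful reader: {rerouted}/{total} = {rerouted / total:.1%}")
print(f"FinBERT-neutral also Claude-neutral: {neutral_agreement:.1%}")
print(f"Claude-directional articles reaching the careful reader: "
      f"{directional_kept}/{directional_total}")
```

Under this bare rule the careful reader sees 255 of 1,199 articles, about a fifth of the feed. The 44 Claude-directional articles that FinBERT labels neutral never reach it, so any real workflow has to decide how much of that leakage it can tolerate.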
But is neutral really useless?
Reading this back, we notice that we have been treating the 1,093 neutral articles as if they were chatter to be filtered out. That is the right framing for the question we have been asking, which is which articles predict next-day return direction at the ticker level. The neutral pool does not predict that. By construction, neither model thought it should.
But there is a different question we have not asked, and it deserves its own analysis. Is the volume and composition of neutral coverage carrying information of its own?
Consider three sub-questions. First, do tickers that generate a lot of neutral coverage on a given day behave differently than tickers that generate little? Coverage volume tracks attention, and attention is correlated with both volatility and trading volume regardless of tone. A name with thirty neutral articles today is in a different state than a name with three, even if no individual article would prompt a trade.
Second, does a heavy neutral day precede a non-neutral day? It is at least plausible that commentary clusters before catalysts. The chatter about “what to watch” and “valuation check” may reflect analyst attention that resolves into action a day or two later. We do not know if this is true. We have not looked.
Third, within the neutral pool, are there sub-types that carry signal? “Five Reasons to Watch X” articles cluster differently than “X Valuation Check” articles. A second-pass classifier on the neutral pool would tell us whether some of the chatter is informative after all.
None of these questions have answers yet. They are the natural next research direction after this post. The version of the cross-sector matrix we have published treats the neutral pool as a filter target. There is probably a version that treats it as a feature.
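As a starting point for the first two sub-questions, neutral coverage can be counted per ticker per day and lined up against whether the next day carried a directional article. This is only a sketch of the bookkeeping. The records, tickers, and day labels below are made-up placeholders, not pipeline output, and the function names are hypothetical.

```python
from collections import Counter

# Hypothetical records: (day, ticker, label). Placeholder values only.
articles = [
    ("day-1", "AAA", "neutral"),
    ("day-1", "AAA", "neutral"),
    ("day-1", "BBB", "neutral"),
    ("day-2", "AAA", "positive"),
    ("day-2", "BBB", "neutral"),
]

def neutral_volume(records):
    """Count neutral articles per (day, ticker)."""
    return Counter((day, ticker) for day, ticker, label in records
                   if label == "neutral")

def directional_days(records):
    """Set of (day, ticker) pairs with at least one non-neutral article."""
    return {(day, ticker) for day, ticker, label in records
            if label != "neutral"}

volume = neutral_volume(articles)
hits = directional_days(articles)

# Pair each ticker's neutral count on one day with whether the next listed
# day carried a directional article for the same ticker.
days = sorted({day for day, _, _ in articles})
tickers = sorted({ticker for _, ticker, _ in articles})
for today, tomorrow in zip(days, days[1:]):
    for ticker in tickers:
        print(today, ticker, volume[(today, ticker)], (tomorrow, ticker) in hits)
```

Whether the resulting pairs show any relationship is exactly the open question. The sketch only shows how the neutral pool becomes a feature instead of a filter target.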
What we want to flag here is that “neutral does not predict direction” is a narrower claim than “neutral is useless.” We have been making the narrower claim. The wider one deserves its own piece.
What this means
Three things follow from the cross-tab and the FinBERT error pattern.
First, the directional pool is much smaller than the universe of articles. If you are reading every headline that arrives in a ticker’s feed expecting all of it to inform a position, you are spending about eleven times as much attention as the directional content justifies. The discipline is filtering, not aggregating.
Second, sentiment models built before 2023 are doing well at a slightly different task than the one we now want to solve. FinBERT was a remarkable piece of work when it was released, and we use it daily. But the test set has shifted. The headline-writing register has flattened. Opinion content has grown faster than reporting. A classifier trained to map words to mood is doing well at mapping words to mood, and that is no longer the same thing as mapping news to action.
Third, the right way to use a tone classifier is as a filter, not as a signal. The asymmetric error pattern, accurate on neutrals and unreliable on non-neutrals, is exactly what makes FinBERT useful in combination with a careful reader and limited on its own.
We have been running this pipeline on our watchlist for several months. We are preparing to make it available to others. Same daily dashboard, same three model labels per article, same Claude reasoning, scoped to a watchlist of your choosing. If you would like to know when that happens, leave a comment.

