Why alternative data trials fail

The reality of selling alt data to quant funds

Almost every data provider’s pitch includes a chart that looks like this:

[Chart: Dataset Backtest Performance]

Unfortunately, live results rarely match backtested performance. In our conversations with quant teams at funds large and small, we find that promising datasets often stall on packaging, not signal quality.

The quant fund data evaluation funnel

The core challenge for funds that use alternative data isn’t finding signal; it’s developing enough conviction to act on it.

Many strategies that look promising in-sample fail very expensively in live trading (out-of-sample). Even sound strategies with strong live records experience periods of underperformance, making short-term results difficult to distinguish from noise.

What quants fear most

Funds use backtests to evaluate potential trading strategies, and backtests are only as reliable as the historical inputs they depend on. Those inputs are rarely fixed. Methodologies evolve. Records are revised. Companies enter and exit the coverage universe. Timestamps fail to reflect when funds could have traded on the data. LLM-driven workflows make this problem worse, not better: commercial LLMs tested on Look-Ahead-Bench produce impressive in-sample results that collapse out-of-sample because their training data already contains the test period.

This is why the first questions quant funds ask about third-party data are often about provenance: how methodology has changed over time, how revisions are tracked, and what timestamps mean in practice.

The limits of data trials

Given these challenges — and many investors’ history of false positives from underperforming strategies — funds use live trials to build confidence in third-party data. But trials are a blunt and expensive tool.

Evaluating a dataset via trial requires scarce engineering, research, and operational resources. For large funds, the direct and opportunity cost of a data trial can be substantial — often comparable to the cost of the data itself.

Even when trials occur, they are often too brief to separate signal from noise, causing many datasets to fail trials for reasons unrelated to actual dataset quality.

Why broken causality matters more than messy data

Quant investors are used to working with imperfect data. In fact, cleaning messy data can be a source of competitive advantage. Broken causality, however, cannot be easily repaired and is the core failure mode behind many stalled data evaluations.

Broken causality means a fund can’t be sure which version of the data was available at the time of a backtested trade. Even perfect earnings forecasts are unusable if a fund can’t confirm the data was available before the earnings release. A one-minute timing lag can make the difference between millions in profit and millions in losses.

The problem often stems from routine maintenance — seemingly beneficial data pipeline operations such as historical revisions, methodology updates, versioning, and timestamp adjustments. The same issue occurs when coverage expands, identifiers are remapped, or backfilled values appear in the data before they were actually known.

When causality in third-party data is ambiguous, the chain of cause-and-effect collapses, and investors lose confidence in all derived data and signals.

A recent post from Glassnode shows the problem clearly. Two identical strategies are run on the same underlying input data: one uses the point-in-time series, the other the as-revised series. The strategy using as-revised data returns 120%, while the point-in-time version returns 40%.

This isn’t an anomaly — backtests built on as-revised data regularly outperform those using point-in-time data.
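
To make the distinction concrete, here is a minimal sketch in pandas, assuming a hypothetical revision log with one row per published or revised value. Column names and figures are illustrative; the point is that the as-revised view bakes in every later correction, while the point-in-time view uses only what was knowable on a given date.

```python
# A minimal sketch (pandas) of the gap between an as-revised series and a
# point-in-time series. The revision log is hypothetical: one row per published
# or revised value, with the time that value became known.
import pandas as pd

revisions = pd.DataFrame({
    "value_date":     pd.to_datetime(["2024-01-31", "2024-01-31", "2024-02-29"]),
    "knowledge_time": pd.to_datetime(["2024-02-05", "2024-03-10", "2024-03-05"]),
    "value":          [100.0, 112.0, 98.0],   # the January figure is later revised up
})

def as_revised(log: pd.DataFrame) -> pd.Series:
    """Latest value per value_date -- what the history looks like today."""
    return log.sort_values("knowledge_time").groupby("value_date")["value"].last()

def point_in_time(log: pd.DataFrame, as_of: str) -> pd.Series:
    """Only values that were actually known on `as_of` -- what a trader saw."""
    known = log[log["knowledge_time"] <= pd.Timestamp(as_of)]
    return known.sort_values("knowledge_time").groupby("value_date")["value"].last()

print(as_revised(revisions))                   # January = 112.0 (revised)
print(point_in_time(revisions, "2024-02-15"))  # January = 100.0 (as originally published)
```

A backtest fed the first series trades on information that did not yet exist; a backtest fed the second trades only on what was knowable at the time.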

For quants, this creates a critical analytical bias and reproducibility problem, and it is why funds routinely walk away from promising high-signal datasets without even a trial.

Case study: when good data fails for the wrong reason

You’ve spent months building a dataset of historical sales indicators for large technology companies. The data is well-constructed, professionally maintained, and shows strong backtest performance across multiple historical periods. A mid-sized systematic fund agrees to trial it.

Three months in, the fund’s live results are modest but inconsistent. Their research team asks about historical revisions and methodology changes after noticing small discrepancies between the historical and live data. You know the discrepancies are benign, but proving it is difficult.

The fund labels the trial inconclusive and passes on the dataset. You’ve spent months supporting the evaluation and received a polite no with no actionable feedback. You don’t know if the signal disappointed or if the fund simply couldn’t develop conviction. And you’re unlikely to get another shot with this fund for years.

How verifiability changes the conversation

Now imagine you could have pointed them to an independently verifiable revision record. The same discrepancies exist, but now the fund can see for themselves that the historical data hasn’t changed.
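
Mechanically, "independently verifiable" can be as simple as the sketch below: hash each delivery when it is produced, record the hashes somewhere the provider cannot silently rewrite, and let the buyer recompute them later. The file name and tiny payload are illustrative, and this is a sketch of the idea rather than any particular vendor's implementation.

```python
# A minimal sketch of verifying that historical deliveries haven't changed.
# File name and contents are illustrative.
import hashlib
from pathlib import Path

def snapshot_hash(path: Path) -> str:
    """SHA-256 of a delivered file; identical bytes yield an identical hash."""
    return hashlib.sha256(path.read_bytes()).hexdigest()

# --- at delivery time ---------------------------------------------------------
delivery = Path("sales_indicators_2023-06-30.csv")
delivery.write_text("entity_id,period,indicator\nE-0001,2023-Q2,1.07\n")
published = {delivery.name: snapshot_hash(delivery)}  # recorded externally, outside
                                                       # the provider's control

# --- during a later evaluation ------------------------------------------------
# The buyer recomputes the hash of the historical file they hold and compares it
# to the published record; any silent rewrite of history changes the hash.
assert snapshot_hash(delivery) == published[delivery.name]
```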

This also lets the fund evaluate data using the full historical record, not just a narrow trial window. Performance during the trial becomes a small component of the overall purchase decision, rather than the critical indicator of live performance.

Even with identical data and identical live trial performance, the fund’s confidence in its own evaluation of the data is completely different. And that confidence is often the difference between an inconclusive trial and sustained adoption.

The Replay Test

If you’re not sure where your dataset falls on these dimensions, there is a simple diagnostic we call the Replay Test.

Can a buyer quickly reconstruct what your dataset looked like on any past date — including schema, entity mappings, and what changed since then — and begin credible backtesting?

If the answer is “not reliably,” that’s where many evaluations stall. If your dataset passes the Replay Test, you’ve addressed one of the most common reasons that promising data evaluations fail to convert.
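
Concretely, passing the test means being able to do something like the sketch below: given an append-only log of record versions (the schema here is hypothetical), return both the view a buyer would have had on any past date and everything revised since.

```python
# A minimal sketch of an as-of "replay", assuming a hypothetical append-only log
# with one row per record version and the time each version became known.
import pandas as pd

log = pd.DataFrame({
    "entity_id":      ["E-0001", "E-0002", "E-0001"],
    "field":          ["q_sales", "q_sales", "q_sales"],
    "value":          [10.2, 7.5, 10.9],                 # E-0001 is revised later
    "knowledge_time": pd.to_datetime(["2023-04-02", "2023-04-03", "2023-07-15"]),
})

def replay(log: pd.DataFrame, as_of: str):
    """Return (view on `as_of`, rows revised after `as_of`)."""
    cutoff = pd.Timestamp(as_of)
    view = (log[log["knowledge_time"] <= cutoff]
            .sort_values("knowledge_time")
            .groupby(["entity_id", "field"])["value"].last())
    changed_since = log[log["knowledge_time"] > cutoff]
    return view, changed_since

view, changed = replay(log, "2023-05-01")
print(view)     # what was knowable on 2023-05-01
print(changed)  # the later revision of E-0001 -- what changed since then
```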

Other practical steps that reduce evaluation friction

In systematic workflows, you’re not just selling a dataset; usability and reproducibility often matter as much as the underlying signal.

A few practical tips for providers:

  1. Make coverage and taxonomy unambiguous. Ticker changes, delistings, and corporate actions create hidden failure modes unless entity mapping is explicit and point-in-time (see the sketch after this list).
  2. Treat delivery as part of the product. Keep the schema stable, version any changes, and publish a clear change log.
  3. Be explicit about revisions. If historical values, methodologies, or universes can change, say so, and specify what changes, why, and when.
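
On the first point, point-in-time entity mapping is the piece most often missing. Here is a minimal sketch using a hypothetical mapping table with validity windows; the dates are approximate and reference Meta's 2022 ticker change from FB to META.

```python
# A minimal sketch of point-in-time entity mapping: a ticker resolves to a
# stable entity_id only within its validity window. Dates are approximate and
# the table layout is illustrative.
import pandas as pd

ticker_map = pd.DataFrame({
    "ticker":     ["FB", "META"],
    "entity_id":  ["E-0001", "E-0001"],          # same underlying company
    "valid_from": pd.to_datetime(["2012-05-18", "2022-06-09"]),
    "valid_to":   pd.to_datetime(["2022-06-08", "2099-12-31"]),  # 2099 = open-ended
})

def resolve(ticker: str, on_date: str):
    """Return the entity_id a ticker referred to on a given date, or None."""
    d = pd.Timestamp(on_date)
    rows = ticker_map[(ticker_map["ticker"] == ticker)
                      & (ticker_map["valid_from"] <= d)
                      & (ticker_map["valid_to"] >= d)]
    return rows["entity_id"].iloc[0] if len(rows) else None

print(resolve("FB", "2021-03-01"))    # E-0001
print(resolve("FB", "2023-03-01"))    # None -- a naive ticker join silently drops this
print(resolve("META", "2023-03-01"))  # E-0001
```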

None of this guarantees a deal, but it removes the most common reasons evaluations stall.

At validityBase, we build infrastructure that helps data providers pass the Replay Test and package data to maximize the chance of successful trials.

That includes audit trails for funds to verify what was delivered and when — as well as the packaging, documentation, and delivery infrastructure that funds expect before they’ll commit research resources. The result is shorter evaluation cycles, fewer stalled trials, and a dataset that’s easier for research teams to defend internally.

Get Started
To assess your own readiness for systematic evaluations, see the one-page checklist we use with providers. Or reach out at trials@vbase.com to walk through it together.

Dan Averbukh is the co-founder and CEO of validityBase, where he works on data reliability and evaluation infrastructure for systematic investing.

Dan Averbukh
Dan Averbukh is the co-founder and CEO of vBase. He previously founded Clerkenwell Asset Management, a systematic hedge fund, where he worked directly with external datasets, signals, and strategy evaluation. Today, that experience shapes vBase’s practical infrastructure for teams making credible predictive claims.