COMPUTATIONAL MATERIALS DISCOVERY · Negatives / eval-data license

Lattice Graph × SandboxAQ

AI + quantum for materials & chemistry simulation

SandboxAQ's large quantitative models for chemistry/materials want benchmark-grade negatives and uncertainty calibration.

Why now

SandboxAQ is actively converting LQM credibility into paid enterprise contracts, and the gap between positives-only benchmarks and real-world accuracy is precisely the vulnerability that surfaces during customer pilots. Licensing the negatives corpus and trust signals now, before a model confidently proposes a known dead end to a paying customer, is the difference between a calibrated product story and a reputational correction.

What our platform does for SandboxAQ

Lattice Graph operates a materials knowledge graph spanning millions of compositions, knitting together experimental records, computed properties, synthesis conditions, and patent claims into a single governed structure. Every composition in the graph carries provenance (the source, the method, the conditions), so that any property value can be traced to the evidence that supports it rather than treated as an undifferentiated number. That architecture makes the graph useful not just for lookup but for systematic disagreement analysis: where multiple independent physics engines or literature sources report conflicting values for the same material, we flag the tension and quantify it rather than resolving it arbitrarily.

Multi-engine validation sits at the core of how we establish confidence in new candidate materials. We run predictions through multiple machine-learning interatomic potentials, including MACE and CHGNet, alongside density functional theory calculations, requiring phonon stability and thermodynamic consistency to hold across independent methods before a candidate earns a positive signal. The consensus criterion is the key detail: agreement between methods is treated as evidence of reliability, and disagreement is surfaced as a calibrated uncertainty signal rather than hidden. For an organization building enterprise simulation products, that distinction is the difference between a trustworthy engineering tool and a research demo.

The asset that most organizations cannot replicate is our labeled negative corpus: more than 23,000 failed-experiment and kill edges catalogued from internal programs and sources that never appear in the published DFT literature. Positive results get published; negative results do not. Our negatives atlas directly addresses the selection bias baked into every model trained on the open computational record, giving practitioners a ground truth for what the field has already tried and retired. Paired with cross-source trust signals and the full knowledge-graph API, this constitutes a calibration and benchmark infrastructure that is difficult to assemble from public data alone.
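The multi-engine consensus criterion described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not Lattice Graph's production logic: the `EnginePrediction` record, the `consensus_signal` function, and the 0.05 eV/atom tolerance are hypothetical, thermodynamic consistency is reduced here to a simple spread in formation energy, and the demo numbers are toy values.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass(frozen=True)
class EnginePrediction:
    """One engine's result for a single candidate (hypothetical record layout)."""
    engine: str              # e.g. "MACE", "CHGNet", "DFT"
    formation_energy: float  # eV/atom
    phonon_stable: bool      # True if no imaginary phonon modes were found


def consensus_signal(preds, max_spread=0.05):
    """Return (label, mean_energy, spread) for one candidate.

    max_spread is an assumed tolerance in eV/atom, not a published threshold.
    """
    if len(preds) < 2:
        raise ValueError("consensus needs at least two independent engines")
    energies = [p.formation_energy for p in preds]
    spread = max(energies) - min(energies)
    if all(p.phonon_stable for p in preds) and spread <= max_spread:
        label = "positive"        # every engine agrees the candidate holds up
    elif not any(p.phonon_stable for p in preds):
        label = "negative"        # every engine agrees it is unstable
    else:
        label = "disagreement"    # surfaced as an uncertainty flag, not hidden
    return label, mean(energies), spread


# Toy values for illustration only.
demo = [
    EnginePrediction("MACE", -1.20, True),
    EnginePrediction("CHGNet", -1.22, True),
    EnginePrediction("DFT", -1.21, True),
]
label, energy, spread = consensus_signal(demo)
print(label, round(energy, 3), round(spread, 3))
```

The design point is that the function never averages away a conflict: any split between engines yields a "disagreement" label together with the measured spread, so downstream code can treat it as an uncertainty signal.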

Why Lattice Graph × SandboxAQ

SandboxAQ's decision to build Large Quantitative Models for materials and chemistry simulation represents a bet that quantum-informed, physics-grounded AI can meet enterprise credibility standards. That is a higher bar than academic benchmarks, because enterprise customers will eventually test model confidence against real experimental programs. The company has the physics depth and computational infrastructure to generate predictions at scale; what is harder to accumulate is the labeled failure record that tells a model which chemistries have already been ruled out, and an independent uncertainty layer that lets a customer trust the model's own reported confidence. Those two signals are structural gaps in the public data ecosystem, not gaps specific to SandboxAQ.

Lattice Graph addresses both gaps directly, without competing with SandboxAQ's modeling stack. We are a data and calibration supplier, not a simulation vendor. Our negatives corpus provides the labeled failure record that publication bias systematically excludes, and our cross-source trust and disagreement signals provide a source-aware calibration target against which SandboxAQ can score its LQM's uncertainty estimates. The knowledge graph ties everything to provenance, so when SandboxAQ's enterprise customers ask how a prediction was validated, the answer is auditable rather than opaque.

The strategic fit is precise: SandboxAQ is at the point in its commercial trajectory where a model's benchmark performance needs to hold up under scrutiny from sophisticated buyers in pharma, energy, and advanced materials. A positives-only training and evaluation record looks strong in the lab and becomes fragile the first time a customer asks about a composition that the experimental record has already killed. Licensing the negatives moat and trust layer before paid pilots, rather than after a credibility event, is the cleanest path to an enterprise-grade accuracy story.

SandboxAQ business lines

  • Large quantitative models (LQMs)
  • AI + quantum simulation for materials/chemistry
  • Enterprise simulation products

Where we fit

Benchmark-grade evals need labeled negatives and calibrated uncertainty. License the negatives/eval atlas + trust & disagreement signals; ground predictions with the KG API. Bespoke data license.

The challenge

Name a computational feat you think we can't do.

Here is the specific problem we would take on. Give us the chemistry lane where your LQM's predicted formation energies diverge most from experimental or held-out DFT values, and tell us the composition range. We will then:

  • pull every labeled negative and cross-source disagreement flag from that lane in our knowledge graph;
  • score your training distribution against the failure record;
  • quantify what fraction of your benchmark compositions are adjacent to known dead ends that publication bias would have excluded from your training data;
  • deliver a calibrated uncertainty map showing which predicted values are likely to hold and which sit in regions where independent sources disagree by more than a defensible threshold.

Finally, we will tell you how much your measured benchmark accuracy changes when the evaluation is run on a negatives-inclusive split rather than the positives-only split your current eval likely rests on.
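The final comparison, positives-only versus negatives-inclusive accuracy, can be sketched as below. This is an illustration only, assuming the benchmark can be reduced to a viable / not-viable call per composition; the row layout, the field names `viable` and `predicted_viable`, the function names, and the toy rows are all hypothetical rather than the atlas schema or real results.

```python
def accuracy(rows):
    """Fraction of rows where the model's call matches the label."""
    if not rows:
        raise ValueError("empty evaluation split")
    return sum(r["predicted_viable"] == r["viable"] for r in rows) / len(rows)


def split_delta(benchmark):
    """Return (positives_only_acc, negatives_inclusive_acc, delta)."""
    positives_only = [r for r in benchmark if r["viable"]]
    pos_acc = accuracy(positives_only)
    full_acc = accuracy(benchmark)
    return pos_acc, full_acc, full_acc - pos_acc


# Toy benchmark: four published positives plus four labeled dead ends.
# The model calls everything viable except one of the dead ends.
toy_benchmark = (
    [{"id": f"pos-{i}", "viable": True, "predicted_viable": True} for i in range(4)]
    + [{"id": f"neg-{i}", "viable": False, "predicted_viable": i != 0} for i in range(4)]
)

pos_acc, full_acc, delta = split_delta(toy_benchmark)
print(f"positives-only: {pos_acc:.3f}, negatives-inclusive: {full_acc:.3f}, delta: {delta:+.3f}")
```

A model that never says "no" looks perfect on the positives-only split, and the delta makes the cost of that habit visible once labeled failures are included.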

Data & eval products for SandboxAQ

Live data and API products running on our production platform — licensed to your team, with full schemas and access terms on request.

The Negatives and Eval-Data Atlas is the entry point for most SandboxAQ use cases. The atlas catalogs more than 23,000 failed-experiment and kill edges: the labeled negative results that are largely absent from public DFT repositories and open literature, because negative results do not get published. SandboxAQ can use this corpus to build benchmark splits that include realistic failure cases for specific chemistry families such as electrolytes, cathode materials, or heterogeneous catalysts, and to augment LQM training sets so the model has seen labeled dead ends rather than only successful compositions. Wired into a candidate-generation loop, the atlas can screen LQM proposals against the known failure record before surfacing results to a customer, which addresses the most damaging version of the publication-bias failure mode: a confident, wrong recommendation about a chemistry the field has already retired.

The Trust and Disagreement Signals product is the uncertainty calibration layer. It returns cross-source disagreement flags and calibrated prediction bounds derived from comparing independent computational and experimental sources. SandboxAQ can use this in two ways: as a training-data QA gate that filters rows where sources disagree significantly, which avoids teaching the model to fit noise, and as an external calibration target against which the LQM's own uncertainty estimates can be scored. For an enterprise simulation product, a model's reported confidence is only as credible as the external reference it is calibrated against, and self-reported confidence calibrated on a positives-only corpus will overfit to the easy cases.

The Knowledge-Graph API completes the picture by providing the full provenance, composition-to-structure-to-property-to-patent evidence graph for any material the LQM proposes, along with natural-language query capability for research and eval teams doing triage.

Together, these three products transform an opaque benchmark number into a defensible, source-attributed accuracy claim, which is exactly what an enterprise simulation buyer will eventually demand to see.
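The candidate-generation screen mentioned above might look roughly like the following sketch. It assumes the licensed kill edges have been loaded into a plain dictionary keyed by composition string; `normalize`, `screen_candidates`, the dictionary layout, and the placeholder identifiers are hypothetical, and exact string matching stands in for whatever composition matching the atlas actually supports.

```python
def normalize(composition):
    """Crude key normalization: drop all whitespace (assumption for this sketch)."""
    return "".join(composition.split())


def screen_candidates(candidates, kill_edges):
    """Split proposals into (cleared, flagged) before they reach a customer.

    kill_edges maps a normalized composition string to the recorded reason
    the composition was retired.
    """
    cleared, flagged = [], []
    for comp in candidates:
        reason = kill_edges.get(normalize(comp))
        if reason is None:
            cleared.append(comp)
        else:
            flagged.append((comp, reason))
    return cleared, flagged


# Placeholder identifiers, not real chemistries or real atlas entries.
toy_kill_edges = {"comp-B": "failed synthesis in internal program"}
toy_proposals = ["comp-A", " comp-B ", "comp-C"]

cleared, flagged = screen_candidates(toy_proposals, toy_kill_edges)
print("cleared:", cleared)
print("flagged:", flagged)
```

Flagged proposals are returned with their recorded reason rather than silently dropped, so an eval team can review why a candidate was held back.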

Negatives & Eval-Data Atlas

23,196 failed-experiment / kill edges plus the honest-negatives set — the labeled negative results most models never see. License for training, eval, and benchmark hardening.

Trust & Disagreement Signals

Cross-source disagreement flags and calibrated prediction bounds — the uncertainty layer for eval pipelines and model QA.
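One common way to score a model's own uncertainty estimates against an external reference is an empirical coverage check: count how often the reference value falls inside the interval the model reported, and compare that with the nominal confidence level. The sketch below assumes intervals arrive as `(low, high)` pairs; `empirical_coverage`, `calibration_gap`, and the toy numbers are hypothetical and do not describe the product's actual output format.

```python
def empirical_coverage(intervals, references):
    """Fraction of reference values that fall inside the model's intervals."""
    if not intervals or len(intervals) != len(references):
        raise ValueError("need equally sized, non-empty inputs")
    hits = sum(low <= ref <= high for (low, high), ref in zip(intervals, references))
    return hits / len(intervals)


def calibration_gap(intervals, references, nominal=0.9):
    """Nominal minus empirical coverage; a positive gap suggests over-confidence."""
    return nominal - empirical_coverage(intervals, references)


# Toy example: the model claims 90% intervals, but one of four references misses.
toy_intervals = [(0.0, 1.0), (0.0, 1.0), (0.0, 1.0), (0.0, 1.0)]
toy_references = [0.5, 0.5, 2.0, 0.5]

print(empirical_coverage(toy_intervals, toy_references))
print(round(calibration_gap(toy_intervals, toy_references), 3))
```

A gap near zero means the stated confidence matches how often the intervals actually hold; a large positive gap means the intervals are too narrow for the confidence claimed.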

Knowledge-Graph API

Provenance, composition-360, evidence neighborhoods, and natural-language graph queries across the materials knowledge graph.

In the platform for SandboxAQ

For SandboxAQ's research and machine-learning engineering teams, the most practical daily surfaces are the knowledge-graph explorer and the composition-intelligence reports. The graph explorer lets researchers visually navigate the evidence neighborhood around a predicted material, seeing which sources agree, which disagree, and what the disagreement flags look like for the chemistries an LQM is being benchmarked on. Composition-intelligence reports surface a full 360-degree view of a composition: structure variants, property distributions across sources, synthesis conditions, and patent encumbrance, all in one place. These are the tools for triage work: when a benchmark number looks suspicious or an LQM output diverges from graph evidence, the explorer and the composition report are where a researcher goes to understand why.

At the workflow level, SandboxAQ's teams can run batches of LQM candidate proposals through the platform's batch screening interface, scoring each against the negatives atlas and the cross-source trust signals in a single pass. The platform's formation-energy predictions serve as an independent cross-check against LQM outputs, and the negatives coverage statistics give eval leads an at-a-glance view of how much of a given chemistry lane is represented in the labeled failure record before they stake a benchmark claim on it. The result is an integrated eval-ops workflow rather than a collection of one-off API calls, which matters as SandboxAQ's eval surface grows across more chemistry families and customer verticals.

How an engagement works

This engagement is structured as a data and evaluation license, not a patent or asset transaction. The natural entry point is a time-boxed scoped evaluation: SandboxAQ licenses the Negatives and Eval-Data Atlas and Trust and Disagreement Signals for one or two chemistry lanes (for example, electrolytes and cathode materials), runs them against a current LQM benchmark, and quantifies how labeled negatives and disagreement flags shift measured accuracy and calibration scores. The goal of the scoped evaluation is to produce a documented, lane-specific accuracy delta that SandboxAQ can use internally to justify expansion and, over time, share with enterprise customers as part of its model validation story.

Given SandboxAQ's scale and the breadth of chemistry lanes across its enterprise products, the pricing shape is a bespoke data license rather than a fixed-fee audit. A reasonable starting estimate for an initial scoped engagement is in the low-to-mid six figures, with an expansion path to a recurring full-atlas plus knowledge-graph API license priced against usage volume and lane coverage. The expansion structure would bundle negatives atlas data, trust and disagreement signals, and knowledge-graph API access into a recurring license with refreshed negatives drops as internal programs add to the failure record.

Lattice Graph does not co-develop models or take IP positions in SandboxAQ's stack. The relationship is a supplier arrangement: we provide the labeled failure corpus, the calibration signals, and the provenance infrastructure, and SandboxAQ incorporates them into its own training, evaluation, and enterprise product workflows.

Build the SandboxAQ package

Request a sample of the negatives/eval set, the data dictionary, and license terms.

Company names, logos, and trademarks are the property of their respective owners and are referenced here for identification and illustrative purposes only. Their inclusion reflects Lattice Graph's own analysis of where its portfolio may be relevant and does not imply any partnership, endorsement, affiliation, sponsorship, or existing commercial relationship.
Results are informational and should be validated by qualified professionals. See Terms of Service