Catalyst-Q

How Catalyst-Q Compares

A buyer-readable scoreboard for proof quality: Freight RouteOps proof packets first, Exact Chemistry as the premium second room, then flight, grid, and ATC evidence layers as benchmarks mature.


Why buyers trust it.

Evidence-first verification layer: Catalyst-Q is the system of record for expensive scientific, routing, grid, flight, and safety decisions that teams need to trust before they act.

Bring us the disputed catalyst, material, route plan, dispatch scenario, or safety decision; Catalyst-Q returns a replayable proof packet with baselines, assumptions, ROI math, and the next evidence gate.

What Catalyst-Q gives you.

Prove the decision before you automate it.
Independent verification for black-box scientific and operational AI.
From candidate generation to evidence-backed go/no-go.
Buyer-safe proof packets for teams that cannot afford a wrong answer.

Competitor landscape.

A clear view of incumbent strengths, where Catalyst-Q fits, and which proof packet comes next.

Formal verification

Cajal Technologies / Lean/Coq/Isabelle-based proof-agent stacks

Their strength: They own the hot proof-certificate narrative: AI agents produce machine-checkable mathematical or software correctness artifacts.

Catalyst-Q angle: Specialize in scientific result verification and operational proof packets, not compiled-binary correctness: verify chemistry scope, solver baselines, replay ids, and ROI evidence.

Customer leverage:

Adopt proof-certificate language around packet.verify() while keeping the proof scoped to scientific consistency and replay.
Publish examples where the packet catches missing active-space metadata, broken charge/spin assumptions, or unproven ROI claims.
Make verification understandable to a VP R&D or VP Ops without requiring them to read proof-assistant code.

Next customer proof: Tamper-evident packet demo: original verifies, modified active space or energy claim fails.
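The tamper-evident behavior described above can be sketched with a plain SHA-256 digest over a canonical JSON encoding. This is a minimal illustration, not the Catalyst-Q packet format or the `packet.verify()` API: the helper names (`packet_digest`, `seal_packet`, `verify_packet`), the field names, and the placeholder values are all assumptions. A bare digest only detects tampering if the digest itself is stored or signed somewhere the editor cannot reach.

```python
import hashlib
import hmac
import json


def packet_digest(payload: dict) -> str:
    """SHA-256 over a canonical JSON encoding (sorted keys, fixed separators)."""
    canonical = json.dumps(payload, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def seal_packet(payload: dict) -> dict:
    """Attach the digest to the payload (hypothetical packet layout)."""
    return {"payload": payload, "sha256": packet_digest(payload)}


def verify_packet(packet: dict) -> bool:
    """Recompute the digest and compare it with the recorded one."""
    return hmac.compare_digest(packet_digest(packet["payload"]), packet["sha256"])


# Placeholder values for illustration only; not a real chemistry result.
original = seal_packet({
    "replay_id": "demo-0001",
    "active_space": {"electrons": 6, "orbitals": 6},
    "charge": 0,
    "spin": 0,
    "energy_hartree": -1.0,
})

# Same recorded digest, but the declared active space was edited afterwards.
tampered = {
    "payload": {**original["payload"], "active_space": {"electrons": 8, "orbitals": 8}},
    "sha256": original["sha256"],
}

print("original verifies:", verify_packet(original))
print("tampered verifies:", verify_packet(tampered))
```

Canonical encoding matters here: without sorted keys and fixed separators, two semantically identical payloads could hash differently.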

Quantum chemistry platforms

Quantinuum InQuanto / QunaSys Qamuy / Phasecraft / Qiskit Nature

Their strength: They have deep quantum-algorithm credibility, hardware/cloud ecosystems, and researcher trust.

Catalyst-Q angle: Win the active-space verification workflow around disputed results: parse customer assumptions, run scoped checks, compare PySCF/OpenFermion-style references, and emit an auditable proof packet.

Customer leverage:

Benchmark small active-space systems against PySCF/OpenFermion/Psi4 and publish exact fixtures.
Package each result as a customer-ready report instead of only a notebook or circuit artifact.
Make the API feel like procurement-safe verification: inputs, assumptions, proof, evidence scope, scientist review.

Next customer proof: Exact-chemistry benchmark packet set for small molecules and transition-metal fragments.

AI materials discovery

CuspAI / Orbital Materials / Periodic Labs / Microsoft MatterGen/MatterSim

Their strength: They are funded and talent-dense candidate-generation engines with strong AI-for-science narratives.

Catalyst-Q angle: Verify the top 0.1% candidates from generated pipelines before wet-lab spend, partner review, or investor diligence.

Customer leverage:

Offer verification-as-a-service to materials teams that already have candidate generators.
Show how a packet reduces false-positive wet-lab spend and flags DFT-sensitive assumptions.
Integrate as an MCP/API verifier that an AI scientist can call before promoting a candidate.

Next customer proof: Candidate triage demo: generated material enters, verification packet ranks evidence gaps and go/no-go confidence.

Enterprise AQ / physics AI

SandboxAQ / Schrodinger / XtalPi / Iambic

Their strength: They bring enterprise trust, domain PhDs, proprietary datasets, and pharma/materials partnerships.

Catalyst-Q angle: Attach an independent proof and replay layer to high-value disputed outputs, especially where the buyer needs a second opinion before committing capital.

Customer leverage:

Lead with narrow paid verification packets instead of trying to match enterprise platform breadth.
Use transparent baselines and claim ledgers as a trust wedge against broad black-box platforms.
Target smaller teams inside large enterprises who need a fast external challenge report.

Next customer proof: Third-party-style report comparing customer approximate methods against a Catalyst-Q verification packet.

Freight route optimization

Google OR-Tools / PyVRP / VROOM / GraphHopper / Onfleet

Their strength: They have mature solvers, routing APIs, dispatch UX, and production integrations.

Catalyst-Q angle: Sell baseline-vs-Catalyst ROI proof: replay current routes, compare public solver baselines, quantify miles, lateness, capacity, route stability, fuel, and emissions proxy.

Customer leverage:

Finish OR-Tools/PyVRP/VROOM benchmark evidence and publish the comparison packet.
Make CSV upload and ROI packet generation effortless for ops leaders.
Price around verified savings and fast paid pilots instead of generic route planning seats.

Next customer proof: Freight proof packet: customer baseline vs OR-Tools/PyVRP/VROOM vs Catalyst-Q with savings math.
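The savings math in such a packet can be sketched as a row-by-row comparison against the customer baseline. The sketch below is illustrative only: the `RouteMetrics` fields, the `savings_rows` helper, the cost-per-mile rate, and every number are hypothetical placeholders, not measured results from any solver.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RouteMetrics:
    label: str
    miles: float
    late_minutes: float
    vehicles: int


def savings_rows(baseline, candidates, cost_per_mile):
    """Compare each candidate plan with the customer's current plan."""
    if baseline.miles <= 0:
        raise ValueError("baseline miles must be positive")
    rows = []
    for plan in candidates:
        miles_saved = baseline.miles - plan.miles
        rows.append({
            "plan": plan.label,
            "miles_saved": miles_saved,
            "pct_miles_saved": 100.0 * miles_saved / baseline.miles,
            "dollars_saved": miles_saved * cost_per_mile,
            "late_minutes_delta": plan.late_minutes - baseline.late_minutes,
            "vehicles_delta": plan.vehicles - baseline.vehicles,
        })
    return rows


# Placeholder inputs; a real packet would replay the customer's route file.
baseline = RouteMetrics("customer baseline", 1000.0, 30.0, 10)
candidates = [
    RouteMetrics("open-solver baseline", 900.0, 0.0, 9),
    RouteMetrics("candidate plan", 880.0, 0.0, 9),
]
rows = savings_rows(baseline, candidates, cost_per_mile=2.0)
for row in rows:
    print(row)
```

Reporting every plan against the same baseline keeps the ROI claim auditable: a reader can recompute each percentage from the two mileage figures alone.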

Grid, flight, and ATC incumbents

GE Vernova GridOS / Siemens Spectrum Power ADMS / Boeing Jeppesen / Thales TopSky / Frequentis

Their strength: They own regulated workflows, integrations, procurement trust, and safety/compliance posture.

Catalyst-Q angle: Start as advisory proof and simulator/offline replay: verify scenarios, rank options, expose assumptions, and preserve human approval.

Customer leverage:

Lead with offline replay and advisory workflows while building benchmark credibility.
Publish PGLib/MATPOWER, OpenSky/BlueSky, and simulator replay packets.
Sell to innovation teams as a decision-evidence layer that complements installed systems.

Next customer proof: Offline replay packet showing operator-approved alternatives and explicit no-control-action boundaries.

How we prove value.

Separate local synthetic eval signals from public benchmark proof.
Name incumbent platforms and open baselines so buyers see exactly what Catalyst-Q must beat or complement.
Use inspection policies as product guardrails before marketing or sales language is promoted.
Prefer recorded baseline fixtures first, then pinned live solver/simulator runners in CI.

What is ready now.

Freight RouteOps is the first PMF wedge because buyers can quickly inspect one route file, one proof packet, and one approval decision. Exact chemistry remains the premium second room. Flight ops, grid, and ATC should stay advisory/offline until public benchmark and integration gates mature.

freight-field-routing
exact-chemistry-verification
flight-ops-routing
grid-optimization
atc-decision-support

Evidence buyers can inspect.

https://github.com/CrewRiz/catalyst-q-benchmarks/blob/main/docs/claims_policy.md
https://github.com/CrewRiz/catalyst-q-benchmarks/blob/main/results/full_evidence_package.md
https://github.com/CrewRiz/catalyst-q-benchmarks/blob/main/results/high_qubit_exactness/high_qubit_exactness.md

What it proves today.

Live SDK/API QUBO and Max-Cut smoke records match exact references on named bundled instances
High-qubit targeted exactness is artifact-scoped to named supported query families
Vertical ROI packets and public baseline campaigns are the next evidence batch

Vertical offers.

Each vertical has a clear buyer wedge, a comparison set, and the proof packet required for confident adoption.

verification beachhead

Exact Chemistry Verification

Self-verifying active-space packet now returns a SHA-256 consistency proof, replay id, declared active space, DFT comparison, small-subsystem checks, and a buyer-ready evidence scope.

Competitor baselines:

PySCF: Open quantum-chemistry reference for molecular integrals, active-space workflows, and exact-diagonalization comparisons on small systems.
OpenFermion: Reference toolchain for fermionic Hamiltonian construction and quantum chemistry problem encodings.
CuspAI / AI materials-discovery platforms: AI science/materials companies generate candidates; Catalyst-Q should position as the verification oracle for disputed physics.
Cajal Technologies: Formal-verification-agent trend reference; Catalyst-Q should emulate the trust posture for quantum simulation outputs.

Where Catalyst-Q wins now:

A premium verification packet is easier to monetize than generic self-serve quantum compute.
packet.verify(), .rain replay, and small-subsystem checks directly address black-box skepticism.
The wedge complements AI materials-discovery companies instead of competing with their candidate-generation engines.

What we validate before expansion:

Current packet is a consistency-proof product surface, not yet a published chemistry accuracy benchmark suite.
External PySCF/OpenFermion/DMRG/FCIQMC reference campaigns are needed before making strong accuracy claims.
Enterprise chemistry buyers will require scientist review, data-security review, and careful active-space scoping.

Benchmark campaigns underway:

Run small active-space molecules and transition-metal fragments against PySCF/OpenFermion references.
Record symmetry, charge, spin, and small-subsystem replay checks for every packet.
Publish the chemistry evidence policy that defines the exact active-space scope, references, review role, and promotion gates.
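A charge, spin, and active-space consistency check of the kind listed above could look like the following sketch. It assumes the common complete-active-space convention that all inactive (core) electrons are paired, and it takes `spin` to mean 2S, the number of unpaired electrons. The function name and message strings are hypothetical and are not part of any Catalyst-Q interface.

```python
def check_charge_spin_active_space(atomic_numbers, charge, spin,
                                   active_electrons, active_orbitals):
    """Return a list of problems; an empty list means the inputs are consistent.

    spin is 2S (n_alpha - n_beta). Inactive electrons are assumed to be
    doubly occupied, which is an assumption of this sketch.
    """
    problems = []
    n_electrons = sum(atomic_numbers) - charge
    if n_electrons < 0:
        return ["charge removes more electrons than the atoms carry"]
    if spin < 0 or spin > n_electrons:
        problems.append("spin is outside the range allowed by the electron count")
    if (n_electrons - spin) % 2 != 0:
        problems.append("spin parity does not match the electron count")
    if active_electrons < 0 or active_electrons > n_electrons:
        problems.append("active electrons exceed the total electron count")
    elif (n_electrons - active_electrons) % 2 != 0:
        problems.append("inactive electrons cannot all be paired")
    if active_orbitals <= 0 or active_electrons > 2 * active_orbitals:
        problems.append("active electrons do not fit in the active orbitals")
    elif active_electrons + spin > 2 * active_orbitals:
        problems.append("too many alpha electrons for the active orbitals")
    if spin > active_electrons:
        problems.append("unpaired electrons must sit inside the active space")
    return problems


water = [8, 1, 1]  # O, H, H: 10 electrons when neutral
print(check_charge_spin_active_space(water, charge=0, spin=0,
                                     active_electrons=6, active_orbitals=6))
print(check_charge_spin_active_space(water, charge=0, spin=1,
                                     active_electrons=6, active_orbitals=6))
```

The first call returns an empty list. The second flags a parity problem, because ten electrons cannot leave exactly one unpaired.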


paid pilot ready

Freight & Field RouteOps

Local synthetic CVRPTW-shaped eval passes with 39.9% objective improvement versus nearest-neighbor baseline, no capacity violation, no lateness, and minimal-change replan passing.
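For readers unfamiliar with the nearest-neighbor baseline, the sketch below shows the heuristic and the percent-improvement arithmetic on a toy single-vehicle instance. It deliberately ignores capacity and time windows, so it is far simpler than a CVRPTW eval. The coordinates and the hand-picked alternative route are made up for illustration and do not reproduce the 39.9% figure.

```python
import math


def route_length(points, route):
    """Total Euclidean length of a route given as a list of point indices."""
    return sum(math.dist(points[a], points[b]) for a, b in zip(route, route[1:]))


def nearest_neighbor_route(points, depot=0):
    """Greedy baseline: always drive to the closest unvisited stop."""
    unvisited = set(range(len(points))) - {depot}
    route = [depot]
    while unvisited:
        here = points[route[-1]]
        # Ties are broken by the lower index so the result is deterministic.
        nxt = min(unvisited, key=lambda i: (math.dist(here, points[i]), i))
        route.append(nxt)
        unvisited.remove(nxt)
    route.append(depot)
    return route


def percent_improvement(baseline_objective, candidate_objective):
    """Positive when the candidate is shorter than the baseline."""
    return 100.0 * (baseline_objective - candidate_objective) / baseline_objective


points = [(0, 0), (1, 0), (0, 3), (2, 4)]  # index 0 is the depot
baseline_route = nearest_neighbor_route(points)
candidate_route = [0, 2, 3, 1, 0]  # hand-picked alternative for illustration

baseline_len = route_length(points, baseline_route)
candidate_len = route_length(points, candidate_route)
improvement = percent_improvement(baseline_len, candidate_len)
print(baseline_route, round(baseline_len, 3))
print(candidate_route, round(candidate_len, 3))
print(round(improvement, 1), "% improvement")
```

Nearest-neighbor is a weak baseline by design, which is why an improvement against it is an eval signal rather than evidence of superiority over tuned solvers.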

Competitor baselines:

Google OR-Tools: Pinned public VRP baseline for capacity, time-window, pickup-delivery, and routing-constraint feasibility.
PyVRP: Strong open-source VRP solver baseline for CVRPLIB and Solomon-style instances.
VROOM: Fast open-source vehicle-routing optimization engine baseline for operational routing APIs.
GraphHopper Route Optimization API: Commercial routing API reference for routing constraints, integrations, and route optimization user expectations.

Where Catalyst-Q wins now:

Fastest path to a paid ROI pilot because the buyer can measure miles, lateness, vehicles, fuel proxy, and dispatcher workload.
.rain replay and Catalyst Brain memory give the buyer a repeatable proof trail instead of only a black-box route answer.
Cloudflare Browser Run and Pipelines can capture authorized portal evidence and live route/run telemetry for finance-grade ROI packets.

What we validate before expansion:

OR-Tools, PyVRP, and VROOM are live in the Cloudflare benchmark runner; the current production mini fixture uses them as ensemble seeds rather than a best-in-class superiority claim.
The current 39.9% synthetic improvement and live 142.27 open-solver objective need larger CVRPLIB/Solomon evidence before any broad solver-superiority language is used.
Production competitors have mature TMS, driver-app, mapping, and dispatch integrations.

Benchmark campaigns underway:

Use the Worker-native WASM benchmark fallback for no-Docker local smoke only; do not treat it as external solver evidence.
Use the live Cloudflare benchmark runner OR-Tools/PyVRP/VROOM rows as ensemble seeds for freight-routeops-mini-v1, then require Catalyst-Q to beat, match, or explain gaps before dispatcher review.
Run Solomon VRPTW and CVRPLIB cases with normalized distance, lateness, capacity, vehicle, and minimal-change metrics.
Gate marketing language on feasible routes within an agreed objective gap against the best feasible public baseline.
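The objective-gap gate in the last item can be sketched as a small check against the best feasible public baseline. The record layout, the function name, the threshold, and the objective values below are hypothetical; lower objectives are assumed to be better.

```python
def passes_objective_gap_gate(candidate, baselines, max_gap_pct):
    """Return (passed, gap_pct) against the best feasible baseline.

    Each record is a dict with 'objective' (lower is better) and 'feasible'.
    """
    feasible = [b for b in baselines if b["feasible"]]
    if not candidate["feasible"] or not feasible:
        return False, None  # nothing comparable, so the gate stays closed
    best = min(feasible, key=lambda b: b["objective"])
    gap_pct = 100.0 * (candidate["objective"] - best["objective"]) / best["objective"]
    return gap_pct <= max_gap_pct, gap_pct


# Placeholder rows; real rows would come from pinned benchmark runs.
baselines = [
    {"name": "solver-a", "objective": 100.0, "feasible": True},
    {"name": "solver-b", "objective": 95.0, "feasible": False},  # infeasible, ignored
    {"name": "solver-c", "objective": 102.0, "feasible": True},
]
candidate = {"name": "candidate", "objective": 101.0, "feasible": True}

passed, gap = passes_objective_gap_gate(candidate, baselines, max_gap_pct=2.0)
print(passed, gap)
```

Filtering out infeasible baselines first prevents a gate from being failed, or passed, against a route plan that violates capacity or time windows.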


offline analysis pilot

Flight Ops Route Intelligence

Aviation decision-support eval passes with synthetic OpenSky/BlueSky-shaped data, ranked resolution options, fuel/delay proxy, and no autonomous clearance language.

Competitor baselines:

NAVBLUE N-Flight Planning: Incumbent flight-planning benchmark for dispatcher workflows, weather/NOTAM integration, and operational scale.
Boeing Jeppesen flight planning: Reference for certified aviation planning expectations and airline operations integrations.
Lufthansa Systems Lido Flight 4D: Reference for flight-plan optimization, airline dispatch integration, and regulated aviation buyer expectations.
OpenSky Network: Public aviation-state data source for offline scenario replay and non-operational benchmark fixtures.

Where Catalyst-Q wins now:

Strong advisory and audit posture: every recommendation remains dispatcher/controller reviewed.
Good fit for offline what-if analysis where route, weather, fuel, and delay tradeoffs need to be explained.
Shared Catalyst-Q scenario packets can reuse freight ROI proof and replay infrastructure.

What we validate before expansion:

Incumbents are certified, integrated, and trusted in operational dispatch environments.
Current eval is synthetic and does not use real airline flight-plan archives.
No weather, NOTAM, overflight-fee, aircraft-performance, or dispatch-system adapter is production-ready yet.

Benchmark campaigns underway:

Add OpenSky-shaped historical replay fixtures and BlueSky simulation scenarios.
Score fuel/time/risk deltas against dispatcher-approved baselines, not only synthetic proxy scenarios.
Keep outputs as offline analysis until legal/safety review approves a narrower operational scope.


advisory proof

GridOps Dispatch Intelligence

Mini OPF-shaped eval passes with the full 220 MW served, zero unserved load, zero renewable curtailment, zero line/voltage violations, and required operator approval. The PGLib/MATPOWER live lane parses selected public cases up to 2,000 buses, runs the pinned catalyst-dcopf-cg-v1 DC power-flow runner, and includes the bounded catalyst-dcopf-cut-v1 line-constrained redispatch gate.
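The pass criteria named in the eval summary can be expressed as a set of explicit checks. The sketch below is an assumption about how such a gate might be structured: the result layout, field names, tolerance, and the line and voltage values are hypothetical, and only the 220 MW demand figure comes from the text.

```python
def grid_advisory_gate(result, demand_mw, tol=1e-6):
    """Return (passed, checks) for one dispatch scenario result."""
    checks = {
        "full_load_served": abs(result["served_mw"] - demand_mw) <= tol,
        "zero_unserved_load": result["unserved_mw"] <= tol,
        "zero_renewable_curtailment": result["curtailed_renewable_mw"] <= tol,
        "no_line_violations": all(
            abs(flow) <= limit + tol for flow, limit in result["line_flows_mw"]
        ),
        "no_voltage_violations": all(
            low - tol <= v <= high + tol for v, low, high in result["bus_voltages_pu"]
        ),
        # Advisory only: the packet must demand human sign-off.
        "operator_approval_required": result["requires_operator_approval"] is True,
    }
    return all(checks.values()), checks


# Placeholder scenario; line flows and voltages are illustrative values.
scenario = {
    "served_mw": 220.0,
    "unserved_mw": 0.0,
    "curtailed_renewable_mw": 0.0,
    "line_flows_mw": [(80.0, 100.0), (-60.0, 100.0)],    # (flow, limit)
    "bus_voltages_pu": [(1.00, 0.95, 1.05), (0.98, 0.95, 1.05)],  # (v, min, max)
    "requires_operator_approval": True,
}

passed, checks = grid_advisory_gate(scenario, demand_mw=220.0)
print(passed)
for name, ok in checks.items():
    print(name, ok)
```

Returning the named checks alongside the overall verdict lets a reviewer see which boundary failed rather than only a pass/fail flag.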

Competitor baselines:

PGLib-OPF: Public AC optimal-power-flow cases for MATPOWER-compatible benchmark gates.
MATPOWER: Power-system simulation and OPF reference for case parsing, dispatch feasibility, and solver comparisons.
GE Vernova GridOS ADMS: Incumbent ADMS/DERMS platform reference for utility integration and control-room expectations.
Siemens Spectrum Power ADMS: Incumbent grid operations reference for ADMS capabilities, reliability posture, and utility procurement standards.

Where Catalyst-Q wins now:

Clear advisory-plus-evidence wedge for utilities that need replayable dispatch, contingency, and curtailment analysis.
Catalyst-Q can package scenario exploration and approval gates without trying to control SCADA directly.
Strong ROI narrative around curtailment, congestion, operating cost, and avoided violations.

What we validate before expansion:

Current PGLib/MATPOWER proof is selected-case DC screening, not full DCOPF or ACOPF optimization superiority.
Incumbents own ADMS/DERMS/SCADA integrations, utility support processes, and compliance workflows.
No production state-estimation, telemetry-quality, N-1 security-constrained OPF, or utility data adapter is complete yet.

Benchmark campaigns underway:

Expand eval:grid:pglib across larger PGLib-OPF cases and scheduled benchmark runs.
Scale the line-constrained redispatch gate into full larger-case DCOPF, then add ACOPF validation for voltage, reactive power, and nonlinear feasibility.
Require operator-approval and no-control-action language in every grid prompt and API output.


simulator training wedge

ATC Simulator Decision Support

Aviation safety eval now passes with 2 detected separation conflicts, 2 ranked resolution-option packets, 3 precise human handoffs, fuel/delay proxy, and no autonomous clearance.
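One simple way to detect a separation conflict in a simulator scenario is a closest-point-of-approach test on straight-line tracks. The sketch below assumes constant velocities, a flat two-dimensional plane, and no altitude check. The 5 NM threshold is a configurable assumption of this example, not a statement of the eval's actual separation rule, and the output is advisory only.

```python
import math


def closest_approach(pos_a, vel_a, pos_b, vel_b, horizon_min):
    """Return (time_min, distance_nm) of closest approach within the horizon.

    Positions are in NM, velocities in NM per minute, tracks are straight lines.
    """
    rx, ry = pos_b[0] - pos_a[0], pos_b[1] - pos_a[1]
    vx, vy = vel_b[0] - vel_a[0], vel_b[1] - vel_a[1]
    speed_sq = vx * vx + vy * vy
    # With zero relative velocity the separation never changes.
    t = 0.0 if speed_sq == 0 else -(rx * vx + ry * vy) / speed_sq
    t = max(0.0, min(horizon_min, t))
    return t, math.hypot(rx + vx * t, ry + vy * t)


def is_conflict(pos_a, vel_a, pos_b, vel_b, horizon_min=10.0, min_sep_nm=5.0):
    """Flag a predicted loss of separation for human review (no clearance issued)."""
    _, distance = closest_approach(pos_a, vel_a, pos_b, vel_b, horizon_min)
    return distance < min_sep_nm


# Head-on pair, 40 NM apart, each flying 8 NM/min toward the other.
head_on = closest_approach((0, 0), (8, 0), (40, 0), (-8, 0), horizon_min=10.0)
# Parallel pair, 20 NM apart, same velocity.
parallel = closest_approach((0, 0), (8, 0), (0, 20), (8, 0), horizon_min=10.0)
print(head_on, is_conflict((0, 0), (8, 0), (40, 0), (-8, 0)))
print(parallel, is_conflict((0, 0), (8, 0), (0, 20), (8, 0)))
```

Clamping the time to the look-ahead horizon keeps the check from flagging encounters that lie far in the future or already in the past.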

Competitor baselines:

BlueSky Open Air Traffic Simulator: Open simulator loop for replayable conflict-detection and resolution-option benchmark scenarios.
Adacel MaxSim: Commercial ATC simulator/training reference for buyer expectations and scenario-training workflows.
Thales TopSky ATC: Operational ATC platform reference for why Catalyst-Q must stay simulator/training first.
Frequentis OneATM: ATM automation and communications incumbent reference for safety-case and integration expectations.

Where Catalyst-Q wins now:

Clear safety boundary: ranked options, false-positive discipline, and human authority are explicit in the eval contract.
Compelling training/simulator add-on if Catalyst-Q can generate replayable scenarios and explain missed/false conflict behavior.
.rain replay can become a strong safety-case artifact for simulation and research programs.

What we validate before expansion:

Not suitable for live ATC operations.
No certified integration with operational ATM systems or controller workstations.
Needs BlueSky-style scenario replay, workload scoring, and false-positive/false-negative measurement at scale.

Benchmark campaigns underway:

Add BlueSky scenario replay fixtures for conflict detection and resolution-option quality.
Measure missed conflicts, false positives, workload proxy, fuel/delay proxy, and human handoff precision.
Keep all commercial language anchored to simulator/training and offline what-if analysis.
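The missed-conflict and false-positive measurements above reduce to set comparisons between scenario ground truth and detector output. The following sketch uses hypothetical callsign pairs and a hypothetical function name; the workload and fuel/delay proxies are out of scope here.

```python
def score_conflict_detection(true_pairs, detected_pairs):
    """Compare detected aircraft pairs with ground-truth conflict pairs.

    Pairs are unordered, so ("A", "B") and ("B", "A") count as the same conflict.
    """
    truth = {frozenset(p) for p in true_pairs}
    found = {frozenset(p) for p in detected_pairs}
    hits = truth & found
    return {
        "true_positives": len(hits),
        "missed_conflicts": len(truth - found),
        "false_positives": len(found - truth),
        "precision": len(hits) / len(found) if found else None,
        "recall": len(hits) / len(truth) if truth else None,
    }


# Made-up callsigns for illustration.
truth = [("AC1", "AC2"), ("AC3", "AC4")]
detected = [("AC2", "AC1"), ("AC5", "AC6")]
scores = score_conflict_detection(truth, detected)
print(scores)
```

Reporting missed conflicts separately from false positives matters in a safety context, because the two errors carry very different costs for a controller or trainee.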
