Engineering

Building the Agentic Procurement Pipeline

James Reed | March 25, 2026 | 12 min read

Key Takeaways

  • A minimum of seven specialist agents run in parallel on every procurement query — sanctions, compliance, CO₂, FX, commodities, shipping, and supplier verification. A sentiment agent and an air-freight agent activate when the query pattern makes them material (roster of nine).
  • A predictive scoring layer fuses their outputs using a confidence ensemble tuned on backtested procurement decisions.
  • The feature store is two-layer: Redis hot cache for sub-second query latency, Postgres for durable historical analysis.
  • Every decision is auditable — we log the agent outputs, scores, and model weights that produced the recommendation.
  • The MCP server exposes the same agentic engine to AI assistants, so any LLM can query procurement intelligence directly.

Serving a complete procurement decision in seconds requires orchestrating a minimum of seven specialist AI agents simultaneously, drawn from a nine-agent roster. This post walks through the architecture of Sentinel's agentic pipeline — from natural-language query to ranked recommendation.

The pipeline has four stages. First, natural-language parsing: a specialised LLM resolves free-text queries like '15kg raw aluminium' into canonical HS codes, quantity, typical origins, and seed supplier candidates. Parsed results are cached in Redis with a 24-hour TTL so repeat queries return instantly.
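Here is a minimal sketch of that parse-and-cache step, assuming redis-py's asyncio client; the `parse_with_llm` helper, key prefix, and output fields are illustrative placeholders rather than our production names:

```python
import hashlib
import json

import redis.asyncio as redis

r = redis.Redis()                 # hot cache; connection details are deployment-specific
PARSE_TTL = 60 * 60 * 24          # 24-hour TTL for parsed queries


async def parse_with_llm(raw: str) -> dict:
    """Hypothetical call to the specialised parsing LLM (illustrative output shape)."""
    return {"hs_code": None, "quantity": None, "origins": [], "seed_suppliers": []}


async def parse_query(raw: str) -> dict:
    """Resolve free text into canonical HS code, quantity, origins, and seed suppliers."""
    key = "parsed:" + hashlib.sha256(raw.strip().lower().encode()).hexdigest()
    cached = await r.get(key)
    if cached:
        return json.loads(cached)                       # repeat queries return instantly
    parsed = await parse_with_llm(raw)
    await r.set(key, json.dumps(parsed), ex=PARSE_TTL)  # cache for 24 hours
    return parsed
```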

Second, parallel agent fan-out. Seven always-on agents run simultaneously against a shared context:

  • Sanctions: scans consolidated sanctions lists for every candidate origin and supplier.
  • Compliance: maps the buyer's regime against CBAM, CPTPP, and jurisdiction-specific trade rules.
  • CO₂: computes lifecycle carbon using live emission-factor data and country grid-carbon multipliers.
  • FX: scores currency pairs using live ECB and forward-curve data.
  • Commodity: pulls spot and forecast pricing for the canonical SKU.
  • Shipping: scores sea routes based on observed AIS lane data and chokepoint risk.
  • Supplier: verifies named suppliers against global company registries.

Two more activate when the query makes them material: a sentiment agent clusters GDELT and RSS signals by commodity class, and an air-freight agent evaluates time-critical cargo via IATA spot rates and hub capacity.
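A sketch of the fan-out itself, assuming each agent object exposes an async `run(context)` coroutine and an `is_material(context)` check; those method names and the result shape are illustrative. The `return_exceptions=True` flag is the error boundary: one failing agent is recorded as a failure instead of taking down the batch.

```python
import asyncio

ALWAYS_ON = ["sanctions", "compliance", "co2", "fx", "commodity", "shipping", "supplier"]
OPTIONAL = ["sentiment", "air_freight"]


async def fan_out(agents: dict, context: dict) -> dict:
    """Run every always-on agent, plus any activated optional agents, concurrently."""
    active = ALWAYS_ON + [name for name in OPTIONAL if agents[name].is_material(context)]
    tasks = {name: asyncio.create_task(agents[name].run(context)) for name in active}
    results = await asyncio.gather(*tasks.values(), return_exceptions=True)

    outputs = {}
    for name, result in zip(tasks, results):
        if isinstance(result, Exception):
            # Error boundary: record the failure, keep every other agent's signal.
            outputs[name] = {"status": "failed", "error": str(result)}
        else:
            outputs[name] = result
    return outputs
```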

Third, predictive scoring. Each agent's output is normalised into a confidence signal between 0 and 1. A weighted ensemble — tuned on backtested procurement decisions — fuses the signals into a composite score. The weights are not static; they shift based on sort-by preference (cost, CO₂, balanced) and market regime (stable, volatile, crisis). A VOLATILE regime reduces overall confidence by 15%; a CRISIS regime reduces it by 30%. Dissenting agent signals are surfaced explicitly to the buyer.
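A minimal sketch of the fusion step under those rules; the weight tables, signal names, and dissent threshold are illustrative, not the backtested production values:

```python
REGIME_MULTIPLIER = {"STABLE": 1.0, "VOLATILE": 0.85, "CRISIS": 0.70}  # -15% / -30%

# Illustrative weights per sort-by preference; production weights come from backtesting.
WEIGHTS = {
    "cost":     {"fx": 0.30, "commodity": 0.30, "shipping": 0.20, "co2": 0.05, "sanctions": 0.15},
    "co2":      {"co2": 0.45, "shipping": 0.20, "fx": 0.10, "commodity": 0.10, "sanctions": 0.15},
    "balanced": {"fx": 0.20, "commodity": 0.20, "shipping": 0.20, "co2": 0.20, "sanctions": 0.20},
}


def composite_score(signals: dict[str, float], sort_by: str, regime: str) -> tuple[float, list[str]]:
    """Fuse normalised agent signals (0..1) into one score; surface dissenting agents."""
    weights = WEIGHTS[sort_by]
    score = sum(weights[name] * signals[name] for name in weights if name in signals)
    score *= REGIME_MULTIPLIER[regime]
    dissenters = [name for name, value in signals.items() if value < 0.4]  # illustrative cut-off
    return score, dissenters
```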

Fourth, structured output. The final response includes ranked origins with verified suppliers, optimal route with leg-by-leg risk scores, total landed cost broken down by component (product, shipping, insurance, tariff, FX, re-export), CO₂ rating with A-E grade and CBAM compliance notes, and a plain-English explanation generated from the signal weights.
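Since the API layer is FastAPI, Pydantic models are a natural way to picture that response shape. The field names below are illustrative, not the exact schema:

```python
from pydantic import BaseModel


class LandedCost(BaseModel):
    product: float
    shipping: float
    insurance: float
    tariff: float
    fx: float
    re_export: float
    total: float


class RouteLeg(BaseModel):
    origin: str
    destination: str
    mode: str
    risk_score: float             # 0..1, from the shipping agent


class RankedOrigin(BaseModel):
    country: str
    suppliers: list[str]          # verified against global company registries
    route: list[RouteLeg]
    landed_cost: LandedCost
    co2_grade: str                # A-E, with CBAM notes alongside
    cbam_notes: str
    explanation: str              # plain-English summary generated from the signal weights
```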

The feature store is two-layer. Redis holds the hot path: latest FX rates, commodity spots, lane risk scores, parsed-query cache, sentiment snapshots, and anomaly flags. A feature recomputer job runs every 5 minutes for lane risk and every hour for FX and commodities, reading from Postgres and updating the Redis cache. The Postgres side keeps the full historical record for backtesting, model retraining, and audit trail reconstruction.
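As a sketch of one recompute pass (lane risk only), assuming asyncpg for the Postgres read and redis-py for the cache write; the table name and Redis key layout are illustrative:

```python
import json

import asyncpg
import redis.asyncio as redis

r = redis.Redis()


async def recompute_lane_risk(pg: asyncpg.Pool) -> None:
    """Read the durable Postgres record and refresh the Redis hot path for lane risk."""
    rows = await pg.fetch("SELECT lane_id, risk_score FROM lane_risk_latest")  # illustrative view
    async with r.pipeline(transaction=False) as pipe:
        for row in rows:
            pipe.set(f"lane_risk:{row['lane_id']}", json.dumps({"risk": float(row["risk_score"])}))
        await pipe.execute()
```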

Every decision is auditable. We persist the agent outputs, their confidence signals, the ensemble weights used, and the final score into a decisions table. If a buyer asks 'why did you rank Canada first?' the audit log shows which agent surfaced which signal, how it was weighted, and what the alternative scores were. This is defensible in a procurement review.
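A sketch of that audit write, again assuming asyncpg; the decisions table columns mirror what the post describes (agent outputs, confidence signals, ensemble weights, final score), and the column names themselves are illustrative:

```python
import json

import asyncpg


async def persist_decision(pg: asyncpg.Pool, query_id: str, agent_outputs: dict,
                           signals: dict, weights: dict, final_score: float) -> None:
    """Persist everything needed to reconstruct 'why did you rank Canada first?'."""
    await pg.execute(
        """
        INSERT INTO decisions (query_id, agent_outputs, signals, weights, final_score, created_at)
        VALUES ($1, $2, $3, $4, $5, now())
        """,
        query_id,
        json.dumps(agent_outputs),
        json.dumps(signals),
        json.dumps(weights),
        final_score,
    )
```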

The ingestion side is designed for independent failure. Each feed is a self-contained collector. If OpenSanctions times out, the sanctions agent uses the last known good dataset from Redis with an explicit staleness flag in the response. If the ECB feed returns 503, FX falls back to hourly cached rates. Collectors auto-disable after five consecutive failures so a broken feed can't poison the agent pipeline.
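A sketch of the collector contract with the failure counter and staleness fallback; `fetch_feed`, the key names, and the snapshot shape are hypothetical stand-ins for the real feed clients:

```python
import json
import time

import redis.asyncio as redis

r = redis.Redis()
MAX_FAILURES = 5                  # auto-disable threshold


class Collector:
    def __init__(self, name: str):
        self.name = name
        self.failures = 0
        self.disabled = False

    async def collect(self) -> dict:
        if self.disabled:
            return await self._stale()
        try:
            data = await fetch_feed(self.name)            # hypothetical upstream call
            self.failures = 0
            await r.set(f"feed:{self.name}", json.dumps({"data": data, "ts": time.time()}))
            return {"data": data, "stale": False}
        except Exception:
            self.failures += 1
            if self.failures >= MAX_FAILURES:
                self.disabled = True                      # broken feed can't poison the pipeline
            return await self._stale()

    async def _stale(self) -> dict:
        """Fall back to the last known good snapshot, flagged as stale."""
        cached = await r.get(f"feed:{self.name}")
        snapshot = json.loads(cached) if cached else {"data": None, "ts": None}
        return {"data": snapshot["data"], "stale": True, "as_of": snapshot["ts"]}
```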

On top of the REST API, we expose the same agentic engine via the Model Context Protocol. An MCP client running inside Claude, ChatGPT, or any compliant assistant can issue procurement queries directly — "find me the best aluminium supplier for a UK buyer with CBAM exposure" — and receive the same structured response the dashboard renders.
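A minimal sketch of that MCP-side wiring, assuming the official MCP Python SDK's FastMCP helper; the server name, tool name, parameters, and `run_pipeline` entry point are illustrative:

```python
from mcp.server.fastmcp import FastMCP

mcp = FastMCP("sentinel-procurement")


@mcp.tool()
async def procurement_query(query: str, sort_by: str = "balanced") -> dict:
    """Run the full agentic pipeline and return the same structured response as the dashboard."""
    return await run_pipeline(query=query, sort_by=sort_by)   # hypothetical engine entry point


if __name__ == "__main__":
    mcp.run()
```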

The entire pipeline is async Python on FastAPI, with a custom scheduler managing the background recompute jobs and ingest collectors. No Celery. No external queue for the hot path. Async/await throughout, with error boundaries between agents, so a single failed feed does not propagate errors through the agent pipeline.
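A sketch of the kind of in-process scheduler that replaces Celery here: plain asyncio tasks on fixed intervals, with each job wrapped so one failed tick does not kill the loop. The job names mirror the recompute cadence above and are otherwise illustrative:

```python
import asyncio
import logging

log = logging.getLogger("scheduler")


async def every(seconds: int, job) -> None:
    """Run an async job on a fixed interval, isolating failures to a single tick."""
    while True:
        try:
            await job()
        except Exception:
            log.exception("job %s failed; next tick continues", getattr(job, "__name__", job))
        await asyncio.sleep(seconds)


def start_background_jobs(jobs: dict) -> list[asyncio.Task]:
    """Kick off the recompute loops, e.g. on FastAPI startup."""
    return [
        asyncio.create_task(every(5 * 60, jobs["recompute_lane_risk"])),    # every 5 minutes
        asyncio.create_task(every(60 * 60, jobs["recompute_fx"])),           # hourly
        asyncio.create_task(every(60 * 60, jobs["recompute_commodities"])),  # hourly
    ]
```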

In a future post, we'll cover how the predictive scoring model is trained — the backtesting framework, the historical decision corpus, and how we handle regime shifts in the training data.