Small AI, Large Impact

A Fusion-GEOINT Pipeline for Critical-Infrastructure Protection.
April 2026

Over the past few decades, the amount of data our society produces has increased by more than 2 million percent, from 25 million GB per month in 1999 to more than 522 billion GB per month in 2025. This exponential jump has profound implications for all data-intensive fields. GEOINT, with use cases ranging from urban planning and consumer analytics to disease modeling and defense, is no exception. Today, the challenge of fusion GEOINT is less about access and more about precision. Analysts increasingly grapple with a paradox: more data leads to less clarity. Massive datasets spanning mobile telemetry, open-source infrastructure, and network activity hold critical insights, but extracting them efficiently seems harder than ever. AI approaches have clear promise in breaking down these barriers, and their capabilities are growing by the day. But over-reliance on “black box” systems risks trading one form of opacity (analytic endpoints) for another (decision processes).

With that dilemma in mind, our team recently piloted a continent-scale approach to break through two persistent fusion-GEOINT bottlenecks in a pipeline for critical-infrastructure protection: identifying the subset of locations that matter most across vast areas, and translating raw activity near those locations into actionable intelligence at the individual-entity level. Instead of building a general AI layer, we deployed explainable AI within deterministic systems to surgically cut through these barriers while safeguarding decision transparency.

Continent-Scale Facility Protection as a Test Case

For today’s continent-scale GEOINT applications, emerging challenges loom large:

Fusion datasets are sprawling and noisy, with wide variance in data quality
Storage and compute costs can be enormous
Heavyweight processing introduces lag that slows time-to-decision
Over-reliance on AI middleware risks runaway costs without a clear return in value

Critical-infrastructure protection provides a robust test case to build extensible processes that meet these challenges. Because of their outsized impact, critical-infrastructure attacks like pipeline sabotage, industrial arson, and port bombings cut across ideologies, occur at all points of the conflict continuum, and can take place anywhere. Indeed, these attacks are on the rise (notes 1 and 2).

But the attack is not the signal. Overwhelmingly, these attacks are enabled by physical reconnaissance, often conducted months, even years, in advance. Using a modular approach that can be adapted to model any set of observable behaviors, our team modeled physical reconnaissance activity as an entity engaging in all of the following behaviors near data centers, ports, or energy facilities:

Multi-site — Was the entity active near 2+ facilities?
Repeat visits — Did the entity visit a facility 3+ times?
Dwell time — Were visits sustained for 15+ minutes?
Off hours — Did visits occur at unusual times?
Multi-industry — Did this behavior span 2+ domains (e.g. ports and energy facilities)?
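The five criteria above can be read as a deterministic rule over a device's visit history. The following is a minimal sketch under stated assumptions, not the team's implementation: the `Visit` record, the function names, and the off-hours window (22:00 to 05:00) are hypothetical, and the dwell-time and off-hours checks are assumed to pass if any single visit qualifies.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class Visit:
    """One contiguous stay near a facility (hypothetical record layout)."""
    device_id: str
    facility_id: str
    industry: str  # e.g. "port", "energy", "data_center"
    start: datetime
    end: datetime


# Thresholds taken from the criteria list; OFF_HOURS is an assumed definition.
MIN_FACILITIES = 2
MIN_REPEAT_VISITS = 3
MIN_DWELL = timedelta(minutes=15)
MIN_INDUSTRIES = 2
OFF_HOURS = (22, 5)  # assumed: 22:00 up to, but not including, 05:00 local time


def is_off_hours(ts):
    start_h, end_h = OFF_HOURS
    return ts.hour >= start_h or ts.hour < end_h


def recon_flags(visits):
    """Return the five behavior flags for a single device's visits."""
    per_facility = defaultdict(int)
    for v in visits:
        per_facility[v.facility_id] += 1
    return {
        "multi_site": len(per_facility) >= MIN_FACILITIES,
        "repeat_visits": any(n >= MIN_REPEAT_VISITS for n in per_facility.values()),
        "dwell_time": any(v.end - v.start >= MIN_DWELL for v in visits),
        "off_hours": any(is_off_hours(v.start) for v in visits),
        "multi_industry": len({v.industry for v in visits}) >= MIN_INDUSTRIES,
    }


def matches_recon_model(visits):
    """A device matches only when all five behaviors are present."""
    return all(recon_flags(visits).values())
```

Returning the individual flags alongside the final verdict keeps the rule inspectable: an analyst can see which criterion a device failed rather than only a yes or no.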

Building a system capable of detecting these reconnaissance signals in and around critical infrastructure requires analysts to overcome the emerging challenges described above. This is immediately tricky: acquiring reliable facility information is often a choice between purchasing costly vendor data or relying on public datasets. In the latter, facilities are globally distributed and inconsistently labeled. Narrowing the field to the subset of locations that matter most is daunting.

One-day shutdowns at major commercial ports, like Felixstowe, UK, can cause up to $2B in first-order economic impact alone, underscoring the outsized effect of critical-infrastructure attacks (note 3).

Our approach began with high-recall extraction of critical infrastructure locations from sources like OpenStreetMap. This approach returns too much data, by design. As the first of two minimalist AI processes, we engineered a local-instance large language model (LLM) to normalize the data, resolve ambiguities, and screen for user-defined criteria (e.g., “facilities of national significance”) — reducing initial returns by more than 90 percent.
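To illustrate what a deliberately over-inclusive first pass might look like, here is a minimal sketch that flags OpenStreetMap-style elements by tag or by name hint. The tag pairs and name hints are illustrative assumptions rather than the authors' criteria, and the input is assumed to be a list of already-downloaded element dictionaries with `id` and `tags` fields. The LLM screening step is not shown.

```python
# Illustrative tag filters and name hints (assumptions, not the authors' list).
CANDIDATE_TAGS = {
    "data_center": [("telecom", "data_center")],
    "energy": [("power", "plant"), ("power", "substation")],
    "port": [("industrial", "port"), ("harbour", "yes")],
}
NAME_HINTS = {
    "data_center": ("data center", "data centre", "datacenter"),
    "energy": ("power station", "refinery"),
    "port": ("port of", "container terminal"),
}


def candidate_domains(tags):
    """Return every domain an element might belong to, favoring recall."""
    name = tags.get("name", "").lower()
    hits = set()
    for domain, pairs in CANDIDATE_TAGS.items():
        if any(tags.get(key) == value for key, value in pairs):
            hits.add(domain)
    for domain, hints in NAME_HINTS.items():
        if any(hint in name for hint in hints):
            hits.add(domain)
    return sorted(hits)


def extract_candidates(elements):
    """Keep any element that matches at least one domain; screening comes later."""
    candidates = []
    for element in elements:
        tags = element.get("tags", {})
        domains = candidate_domains(tags)
        if domains:
            candidates.append(
                {"osm_id": element["id"], "domains": domains, "tags": tags}
            )
    return candidates
```

Matching on either a tag or a loose name hint trades precision for recall on purpose, which is why a later normalization and screening step is needed to cut the returns down.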

The output is not just a collection of location data; rather, it is an organized representation of infrastructure at continental scale.

Pre-index once, then enrich everywhere

Next, we use fast, deterministic operations to build geographic footprints around each facility, and convert those footprints to hierarchical tiles. Every mobile ping, POI, and critical-infrastructure facility now shares a common spatial key. The critical-infrastructure dataset is then ready for fast, efficient integration with downstream data and analytics.
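The article does not name the tiling scheme, so the sketch below uses Web Mercator quadkeys as a stand-in for any hierarchical grid. It shows how a point and a facility footprint can be reduced to the same kind of string key, where a parent tile is simply a prefix of its children. The function names, the default zoom level, and the square-footprint approximation are assumptions for illustration, and antimeridian wrapping is ignored.

```python
import math

MAX_LAT = 85.05112878  # Web Mercator latitude limit


def tile_xy(lat, lon, zoom):
    """Convert WGS84 coordinates to Web Mercator tile indices at a zoom level."""
    lat = max(min(lat, MAX_LAT), -MAX_LAT)
    n = 1 << zoom
    x = int((lon + 180.0) / 360.0 * n)
    y = int((1.0 - math.asinh(math.tan(math.radians(lat))) / math.pi) / 2.0 * n)
    return min(max(x, 0), n - 1), min(max(y, 0), n - 1)


def quadkey(x, y, zoom):
    """Interleave tile x/y bits into a base-4 string, one digit per zoom level."""
    digits = []
    for level in range(zoom, 0, -1):
        mask = 1 << (level - 1)
        digits.append(str((1 if x & mask else 0) + (2 if y & mask else 0)))
    return "".join(digits)


def point_key(lat, lon, zoom=16):
    """Spatial key for a single point, such as a mobile ping."""
    x, y = tile_xy(lat, lon, zoom)
    return quadkey(x, y, zoom)


def footprint_keys(lat, lon, radius_m, zoom=16):
    """Keys of all tiles covering a square footprint around a facility centroid."""
    dlat = radius_m / 111_320.0  # approximate meters per degree of latitude
    dlon = radius_m / (111_320.0 * math.cos(math.radians(lat)))
    x0, y0 = tile_xy(lat + dlat, lon - dlon, zoom)  # north-west corner
    x1, y1 = tile_xy(lat - dlat, lon + dlon, zoom)  # south-east corner
    return {
        quadkey(x, y, zoom)
        for x in range(x0, x1 + 1)
        for y in range(y0, y1 + 1)
    }
```

Because coarser tiles are prefixes of finer ones, the same key supports both exact lookups at one resolution and roll-ups to larger areas by truncating the string.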

Continent-scale tiling operations transform messy OpenStreetMap data into hierarchical tiles. Resulting data is suitable for constant-time spatial joins at billion-datapoint scale across multiple pre-indexed enrichment datasets.

Importantly, this process can be applied to any region, and any type of facility, for any purpose. Whether analysts are interested in building infrastructure to protect data centers in Chile, or to analyze consumer foot-traffic in coffee shops throughout the Philippines, the approach remains the same. What was previously a manual, region-by-region effort becomes a scalable, repeatable, efficient process. Every AI-augmented decision is cross-validated by repeat-run convergence, and explained in plain language in a verbose audit log.
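One way to picture repeat-run convergence with an audit trail is sketched below. This is an assumption-laden illustration, not the team's implementation: `classify` is a hypothetical stand-in for a call to the local LLM that returns a `(label, rationale)` pair, and the run count, agreement threshold, and log layout are invented for the example.

```python
import json
from collections import Counter


def converged_decision(classify, record, runs=5, min_agreement=0.8):
    """Run a classifier several times and accept the label only if runs agree.

    `classify(record)` is a hypothetical callable returning (label, rationale).
    """
    outputs = [classify(record) for _ in range(runs)]
    counts = Counter(label for label, _ in outputs)
    label, votes = counts.most_common(1)[0]
    agreement = votes / runs
    accepted = agreement >= min_agreement
    return {
        "record_id": record.get("id"),
        "label": label if accepted else None,  # None means "route to human review"
        "accepted": accepted,
        "agreement": agreement,
        "votes": dict(counts),
        "rationales": [why for lab, why in outputs if lab == label],
    }


def append_audit(path, entry):
    """Append one decision to a JSON Lines audit log."""
    with open(path, "a", encoding="utf-8") as handle:
        handle.write(json.dumps(entry, ensure_ascii=False) + "\n")
```

Recording the vote split and the plain-language rationales for every decision, including rejected ones, is what lets an analyst reconstruct why a facility was kept or dropped.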

This shift is critical. Fusion analytics depends on context, and that begins with knowing where to look. By compressing the search space from “everywhere” to a prioritized set of locations, we create the ideal conditions for high-fidelity downstream analytics. And because every LLM-generated decision is made explicit, analysts can scrutinize the process fully.

Now, the big payoff: We enrich our pre-indexed critical-infrastructure tiles with fusion mobile-device data that passes through our deterministic reconnaissance model.
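Once facilities and pings carry the same tile key, enrichment reduces to a dictionary lookup per ping. The sketch below assumes pings arrive as dictionaries with a precomputed `tile` field and that facility footprints are given as sets of tile keys. These field names and both function names are hypothetical.

```python
from collections import defaultdict


def build_tile_index(facility_tiles):
    """Invert {facility_id: set_of_tile_keys} into {tile_key: [facility_id, ...]}."""
    index = defaultdict(list)
    for facility_id, keys in facility_tiles.items():
        for key in keys:
            index[key].append(facility_id)
    return dict(index)


def enrich_pings(pings, tile_index):
    """Yield (ping, facility_id) for every ping that falls on an indexed tile.

    Each ping costs one average-case constant-time dictionary lookup,
    so pings far from any facility are discarded almost for free.
    """
    for ping in pings:
        for facility_id in tile_index.get(ping["tile"], ()):
            yield ping, facility_id
```

The index is built once per facility set and can be reused across any number of ping batches or enrichment datasets that share the same keying scheme.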

From Datapoints to Actionable Intelligence

Raw mobility data provides excellent spatiotemporal resolution, but poor behavioral insight, often leaving fusion-GEOINT analysts to reconstruct behavior manually, across millions of devices. At today’s data volumes, this puts even the best analysts in an impossible position—especially in defense, where the consequences of missing a signal can be severe.

A proactive system designed to detect the subtleties of physical reconnaissance needs to be blazing fast and deterministic, producing consistent outputs that form a basis for human-in-the-loop intelligence analysis.

With our behavioral model of reconnaissance activity in place, and our critical-infrastructure tile indices at the ready, we begin enriching those tiles with raw mobility data. This allows us to isolate devices that demonstrate specific, non-random behaviors near those facilities. This triage reduces noise by more than 99.9% while preserving analytically relevant signals, using deterministic, millisecond-speed operations. Results from Türkiye are shown:

Even with a >99.9% reduction in noisy device data, at continent scale this approach may still yield millions of datapoints across hundreds of devices. Coupled with enriched spatial context from our infrastructure dataset, pattern-of-life information for a single device may comprise thousands of latitudes, longitudes, timestamps, OpenStreetMap identifiers, and more: unwieldy for analysts, but ideal for surgical AI.

Machine-readable pattern-of-life data comprising thousands of technical elements is passed into a bespoke LLM for rapid synthesis, validation, and summarization—delivering rapid insight at scale.

We pass those complex pattern-of-life datasets into an offline LLM engineered to generate concise, structured vignettes describing each device’s behavior in plain language. Movement patterns, visit frequencies, international travel, and atypical behaviors are validated, logged, and delivered as analyst-ready narratives.
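Before any model sees the data, thousands of raw records per device can be collapsed deterministically into a compact digest. The sketch below shows one hedged way to do that and to wrap the digest in a prompt. The record fields (`facility_id`, `country`, `timestamp`), the digest layout, and the prompt wording are all assumptions, and the model call itself is intentionally left out.

```python
import json
from collections import Counter
from datetime import datetime


def summarize_pattern_of_life(device_id, records):
    """Collapse raw visit records into a small, structured digest."""
    per_facility = Counter(r["facility_id"] for r in records)
    countries = sorted({r["country"] for r in records})
    times = sorted(datetime.fromisoformat(r["timestamp"]) for r in records)
    return {
        "device_id": device_id,
        "n_records": len(records),
        "first_seen": times[0].isoformat() if times else None,
        "last_seen": times[-1].isoformat() if times else None,
        "visits_per_facility": dict(per_facility.most_common()),
        "countries": countries,
        "international_travel": len(countries) > 1,
    }


def build_prompt(digest):
    """Wrap the digest in an instruction that restricts the model to given facts."""
    return (
        "Describe this device's behavior in plain language. "
        "Use only the facts in the JSON below.\n"
        + json.dumps(digest, indent=2, sort_keys=True)
    )
```

Keeping the aggregation deterministic means every figure in a generated vignette can be checked against the digest, which supports the validation and logging the text describes.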

The result is a scalable layer of interpretation. Instead of reviewing raw telemetry, analysts can quickly understand the individual behind the device, and how they relate to broader patterns in the environment. What once required days of manual analysis can be produced in seconds, across entire populations of devices, with improved accuracy, accountability, and repeatability.

Continent-Scale Precision at Mission Speed

These two capabilities—location triage and narrative generation—are tightly coupled. One defines where to focus, and the other explains what’s happening there. Together, they address a core limitation in fusion GEOINT: the inability to efficiently connect place and behavior at scale. In our example use case for critical-infrastructure protection, how we approach these limitations can be the difference between adjusting security posture around target facilities at the right times, or reacting to an unforeseen attack.

This is where surgical AI proves its value. By applying explainable AI systems selectively—at the largest bottlenecks—we avoid the pitfalls of opaque, overgeneralized AI stacks while delivering significant gains in speed and clarity.

A more subtle implication here is the shift in how analysts interact with data. Instead of navigating raw datasets, our approach allows them to operate on curated, structured, and interpretable layers of information. This enables faster decision-making, improves analytic consistency, and enhances outcomes in mission-critical contexts.

As GEOINT continues to evolve, the organizations that succeed won’t be those with the largest datasets or the most costly token burns, but those that can rapidly deliver meaning, with precision, at scale. Surgical AI within deterministic pipelines offers a path toward this, transforming overwhelming data from a liability into an operational advantage.

Our team is bringing these ideas to life. Let’s connect: dholley@clarityinnovates.com.

1 Committee Executive Directorate, & Organization for Security and Co-operation in Europe. (2025, November). Physical protection of critical infrastructure against terrorist attacks: CTED-OSCE trends report update.

2 CBS News. (2023, March 8). Physical attacks on power grid rose by 71% last year, compared to 2021.

3 UK Dept. for Transport, UNCTAD modeling

Dan Holley, PhD
Director, AI/ML for Products