US Military Trials LLMs Trained on Classified Data

The US military has always been interested in AI, but the speed at which they've jumped on the generative AI bandwagon is quite surprising to me -- they're typically known to be a slow-moving behemoth and very cautious around new tech.

Bloomberg reports that the US military is currently trialing 5 separate LLMs, all trained on classified military data, through July 26.

Expect this to be the first of many forays militaries around the world make into generative AI.

Why this matters:

  • The US military is traditionally slow to test new tech: it's been such a problem that the Defense Innovation Unit was reorganized in April to report directly to the Secretary of Defense.
  • There's a tremendous amount of proprietary data for LLMs to digest: information retrieval and analysis is a major challenge -- going from boolean search to natural-language queries is already a big step up (see the sketch after this list).
  • Long-term, the US wants AI to empower military planning, sensor analysis, and firepower decisions. So think of this as just a first step in their broader goals for AI over the next decade.
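
That jump from boolean search to natural-language queries is the core retrieval upgrade here, and it's easy to see in miniature. Below is a minimal sketch of the difference using the open-source sentence-transformers library; the model name and documents are illustrative stand-ins, not anything the DoD actually uses:

```python
# Minimal sketch: boolean keyword search vs. embedding-based retrieval.
# Assumes the open-source `sentence-transformers` package; documents and
# model are made up for illustration, nothing to do with real DoD systems.
from sentence_transformers import SentenceTransformer, util

docs = [
    "Logistics report: fuel reserves at forward bases are at 40% capacity.",
    "Intelligence summary: increased naval activity observed near the strait.",
    "Maintenance log: two transport aircraft grounded for engine repairs.",
]

# Boolean search: every keyword must literally appear in the document.
def boolean_search(query_terms, documents):
    return [d for d in documents
            if all(t.lower() in d.lower() for t in query_terms)]

print(boolean_search(["naval", "strait"], docs))  # hits only on exact terms

# Natural-language retrieval: rank documents by semantic similarity instead.
model = SentenceTransformer("all-MiniLM-L6-v2")
query = "Are ships massing near the chokepoint?"  # shares no keywords above
scores = util.cos_sim(model.encode(query), model.encode(docs))[0]
best = max(range(len(docs)), key=lambda i: float(scores[i]))
print(docs[best])  # finds the naval-activity doc despite different wording
```

The point of the toy example: the boolean query fails the moment an analyst phrases something differently from the source document, while the embedding-based query still lands on the right report.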

What are they testing?

Details are scarce, but here's what we do know:

  • ScaleAI's Donovan platform is one of them. Donovan is a defense-focused AI platform, and ScaleAI divulged in May that the XVIII Airborne Corps would trial their LLM.
  • The four other LLMs are unknown, but expect all the typical players, including OpenAI. Microsoft has a $10B Azure contract with DoD already in place.
  • In this trial phase, the LLMs are being evaluated on military response planning: they'll be asked to help plan a response to an escalating global crisis that starts small and then shifts into the Indo-Pacific region.
  • Early results are promising: a colonel has revealed that plans which would normally take hours to days can be completed in "10 minutes."

What the DoD is especially mindful of:

  • Bias compounding: small initial biases could snowball until one strategy irrationally gains preference over others.
  • Incorrect information: hallucination would clearly be detrimental if LLMs are making up intelligence and facts.
  • Overconfidence: we've all seen this ourselves with ChatGPT; LLMs like to sound confident in all their answers (a cheap consistency check is sketched after this list).
  • AI attacks: poisoned training data and other publicly known methods of degrading LLM outputs could be exploited by adversaries.
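
Some of these risks have cheap partial mitigations. One well-known technique for catching hallucination and overconfidence is self-consistency sampling: ask the model the same question several times at nonzero temperature and flag the answer if the samples disagree. Here's a minimal sketch; `ask_llm` is a hypothetical stand-in for whatever model client you'd use, and nothing here reflects how the DoD trial actually evaluates its models:

```python
# Minimal sketch of a self-consistency check for hallucination/overconfidence.
import random
from collections import Counter

def ask_llm(prompt: str, temperature: float = 0.8) -> str:
    # Hypothetical stand-in: replace with a real model API call.
    # Here we fake a model that wavers between two answers.
    return random.choice(["42 battalions", "7 battalions"])

def flag_if_inconsistent(prompt: str, n_samples: int = 5,
                         min_agreement: float = 0.6):
    """Sample the model repeatedly; flag the answer when samples disagree."""
    answers = [ask_llm(prompt).strip().lower() for _ in range(n_samples)]
    top_answer, count = Counter(answers).most_common(1)[0]
    agreement = count / n_samples
    # Low agreement is a cheap proxy for "the model is guessing confidently."
    return top_answer, agreement, agreement < min_agreement

answer, agreement, flagged = flag_if_inconsistent(
    "How many battalions are in the region?")
print(f"{answer!r} agreement={agreement:.0%} flagged_for_review={flagged}")
```

It's not a fix -- a model can be consistently wrong -- but disagreement across samples is a useful signal to route an answer to a human analyst instead of trusting it outright.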

The broader picture:

LLMs aren't the only place the US military is testing AI.

  • Two months ago, a US Air Force officer described a test of autonomous drones in which one drone fired on its operator after the operator refused to let it complete its mission. The story gained traction and was then quickly retracted.
  • Last December, DARPA also revealed it had AI-piloted F-16s that could dogfight on their own.

P.S. If you like this kind of analysis, I write a free newsletter that tracks the biggest issues and implications of generative AI tech. It's sent once a week and helps you stay up-to-date in the time it takes to have your morning coffee.
