The case for federally funded AI research  

Governments around the world are eager to take action on AI. “We must act with urgency,” said Senate Majority Leader Chuck Schumer last fall. “We cannot afford to wait.” Yet the question of what, exactly, Congress should do is far from clear. Many of the most common AI concerns, such as cybercrime and defamation, are already unlawful, and it is not clear that new legislation would add value. Risks such as the spread of misinformation are unlikely to be addressed successfully through the law because of the First Amendment. Other risks, such as the creation of novel bioweapons, remain highly speculative, and it is not clear that new laws should be passed to address concerns that are, as yet, purely hypothetical. What’s more, AI is a rapidly evolving field even by Silicon Valley standards, with new model architectures, new approaches to data collection, and new business models emerging constantly.

Thus, while policymakers want to show voters that they are on top of things, humility, as is so often the case, is probably the best path forward. But that does not mean Congress has nothing to do in the short term. Quite the contrary: above all else, it should facilitate scientific research into AI and ensure transparency in AI systems.

At first glance, more AI research may seem unnecessary. The website arXiv, the primary repository for public AI research, receives thousands of AI-related papers per month. The AI research community is known for its embrace of preprints and free, open-access venues, a far cry from the pricey subscriptions charged by journals in other fields. Indeed, this openness is probably one of the reasons that AI research advances so quickly.

But in recent years, two things have happened that threaten this rich ecosystem. First, the cost of training frontier models like OpenAI’s GPT-4 has skyrocketed; few academic institutions can give researchers even a small fraction of the computing power that firms like Microsoft, Anthropic, and DeepMind have at their disposal. Second, over the past decade the field has transformed from the academic community it once was into a high-stakes competitive market with hundreds of billions of dollars, if not more, on the line. Even many AI companies struggle to access the computing power they need. Thus, the best research is now feasible only at the largest firms, and those companies have an incentive to keep much of that work secret. Dwarkesh Patel, a well-connected industry observer, recently wrote that, in the opinion of one employee at a top lab, “nothing worth reading is published.”

This matters because we have yet to answer many foundational scientific questions about AI. To take one example from recent headlines, the New York Times’ lawsuit against OpenAI relies on the notion that ChatGPT “memorizes” articles from the Times. Yet the nature of memorization in large language models, including the conditions under which it happens and the extent to which it happens, is poorly understood even by leading academic researchers. OpenAI has vaguely suggested that it has a plan to eliminate memorization, but it has not shared any scientific insights. One can imagine that if OpenAI had learned that memorization was impossible to eliminate fully, it might not want to publicize such news.
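To make the question concrete, here is a minimal sketch of how a researcher might probe memorization in an openly available model. It assumes the Hugging Face transformers library and the small GPT-2 model, and the passage being tested is a public-domain stand-in, not drawn from any newspaper. The idea is simple: show the model the first half of a text and check whether it reproduces the second half verbatim.

```python
# Minimal memorization probe (a sketch, not a rigorous test): feed the model
# the first half of a passage and check whether greedy decoding reproduces
# the held-out second half token for token.
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

# Public-domain stand-in text; a real study would use documents known
# (or suspected) to appear in the model's training data.
passage = (
    "Four score and seven years ago our fathers brought forth on this "
    "continent a new nation, conceived in liberty, and dedicated to the "
    "proposition that all men are created equal."
)

ids = tokenizer(passage, return_tensors="pt").input_ids[0]
split = len(ids) // 2
prefix, suffix = ids[:split], ids[split:]

# Greedy decoding: a fully memorized passage should come back verbatim.
output = model.generate(
    prefix.unsqueeze(0),
    max_new_tokens=len(suffix),
    do_sample=False,
)
continuation = output[0][split:]

n = min(len(continuation), len(suffix))
overlap = (continuation[:n] == suffix[:n]).float().mean().item()
print(f"Held-out tokens reproduced verbatim: {overlap:.0%}")
```

Even a toy probe like this makes the open questions visible: how verbatim reproduction varies with how often a passage appeared in training data, how long the prompt is, and how large the model is remains only partially characterized in public research.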

More broadly, AI researchers do not fully understand how even smaller models actually work at a detailed level: we do not understand how they learn or why they fail. More troublingly, we do not know how to predict what specific capabilities next-generation models will have when their training is complete. The largest labs are no doubt exploring all of these questions feverishly, but, again, they have little incentive to publish foundational new findings consistently.

These questions, and many others, will be essential to answer for safe, trustworthy AI. What is needed, then, is a publicly funded research infrastructure for exploring foundational topics in AI such as interpretability (understanding how these systems work), alignment (ensuring that AI systems operate in accord with human values), and safety. This cannot simply take the form of handing grants to academic researchers: their binding constraint is access to GPUs (the specialized chips on which most AI research is performed), and more money without more GPUs will simply exacerbate the shortage.

The federal government already builds and operates some of the world’s most advanced supercomputers, so it has experience with the complex task of making such facilities secure and reliable. Indeed, because the equipment needed for AI computing centers tends to be off-the-shelf, these facilities may be less expensive to build than highly customized federal supercomputers like the Summit system managed by Oak Ridge National Laboratory. AI-specific facilities could be used by academic researchers and government agencies alike, with the single stipulation that all resulting work be made publicly available.

Fundamentally, what we need from government on AI is not prescriptive legislation; we are simply too early in the development of AI for that to be productive. Instead, we need insight into the deep questions AI poses for all of us. Ambitious, pioneering, and open scientific research is one key piece of the puzzle.

Dean W. Ball is the Senior Program Manager for the Hoover Institution’s State and Local Governance Initiative.  
