There is significant interest in the development and application of foundation models for scientific discovery. Foundation models can generate findings and discern patterns within data sets whose volumes overwhelm classical modes of inquiry. Efforts are under way to use these models to accelerate many aspects of scientific workflows (including literature review, experiment planning, data analysis, and code development) and to generate novel findings and hypotheses that can spur further research directions. However, significant challenges remain in the effective use of these models in scientific applications, including flawed or limited training data and limited verification, validation, and uncertainty quantification capabilities.
This report of the Committee on Foundation Models for Scientific Discovery and Innovation explores many of these opportunities and challenges and describes key gaps and potential future directions. It examines the use of foundation models both independently and in concert with traditional modeling, exemplar use cases of foundation models, and the challenges associated with their use. While much of this report applies broadly to the use of foundation models for scientific discovery, the discussion focuses specifically on strategic considerations and directions for the Department of Energy (DOE) and its unique mission.
The current definition of foundation models varies across communities. This study uses the following definition:
Today, foundation models are large-scale neural networks trained on vast amounts of heterogeneous data with the capability of learning new representations via fine-tuning on additional data. They represent a shift from traditional artificial intelligence (AI) systems designed for specific tasks. They possess the capacity to generate findings and discern patterns within extensive data sets with data volumes that exceed by orders of magnitude the computing and storage capacities of traditional solvers and even previous machine learning models.
Some of the key characteristics defining foundation models include massive scale, self-supervised pretraining, adaptability, emergent capabilities, ability to work in multiple modalities and be task agnostic, and a multipurpose architecture. These characteristics position foundation models as a potential paradigm shift for scientific research.
Despite the emergence of foundation models, traditional modeling (large-scale computational science solvers as well as statistical models) often retains critical advantages, particularly in interpretability, reliability, and strict adherence to physical laws. The fusion of traditional modeling approaches with foundation models is a promising direction.
Conclusion 2-1: Integrating traditional models with foundation models is proving to be increasingly powerful and has significant potential to advance computational findings in the physical sciences. These hybrid methods can be viewed as algorithmic alloys that can leverage the physical interpretability and structures of classical computational approaches alongside the data-driven adaptability of foundation models. This fusion enables the modeling of complex multiphysics, multiscale, and partially observed (understood) systems that challenge traditional approaches both computationally and mathematically.
The fusion of foundation models with traditional numerical methods represents more than a computational advance; it constitutes a paradigm shift in the conduct of scientific discovery.
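A toy sketch can make such an "algorithmic alloy" concrete. In the hypothetical example below (all names and numbers are illustrative, not drawn from any DOE code), a coarse classical solver is paired with a learned correction fitted to reference data, so the hybrid retains the solver's structure while a data-driven term absorbs unresolved physics:

```python
# Toy sketch of a hybrid ("algorithmic alloy") model: a classical solver
# plus a learned correction fitted to reference data. All names and
# numbers are hypothetical.

def coarse_solver(v0, dt, steps, decay=0.4):
    """Classical explicit-Euler integration of dv/dt = -decay * v."""
    v, traj = v0, [v0]
    for _ in range(steps):
        v = v + dt * (-decay * v)
        traj.append(v)
    return traj

def hybrid_solver(v0, dt, steps, decay=0.4, correction=0.0):
    """Same solver plus a learned per-step correction to the decay rate."""
    v, traj = v0, [v0]
    for _ in range(steps):
        v = v + dt * (-(decay + correction) * v)
        traj.append(v)
    return traj

# "Reference" data from the true dynamics (decay = 0.5, unknown to the solver).
reference = coarse_solver(1.0, 0.1, 10, decay=0.5)

def misfit(correction):
    pred = hybrid_solver(1.0, 0.1, 10, correction=correction)
    return sum((p - r) ** 2 for p, r in zip(pred, reference))

# Fit the correction by grid search (a stand-in for gradient-based training);
# the learned correction should recover the missing decay (about 0.1).
learned = min((i * 0.001 for i in range(300)), key=misfit)
```

In a realistic setting, the classical solver would be a large-scale multiphysics code and the correction a neural network trained by gradient descent, but the division of labor is the same: the solver supplies physical structure, and the learned term supplies data-driven adaptability.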
Recommendation 2-1: The Department of Energy (DOE) should invest in foundation model development, particularly in areas of strategic importance where DOE can leverage its unique strengths. DOE should also prioritize the hybridization of foundation models and traditional modeling. Such hybrid modeling strategies can fuse the physical interpretability and robustness of classical solvers with the efficiency and learning capabilities of foundation models, particularly in multiscale, multiphysics applications where traditional approaches have limitations in capturing the heterogeneity, complexity, and dynamics of the physical system. DOE should not, however, abandon its expertise in numerical and computational methods and should continue investing strategically in software and infrastructure.
Although traditional modeling remains superior today in terms of interpretability and adherence to physical laws, integrating it with foundation models offers powerful new capabilities. These hybrid approaches enable better modeling of complex systems, and DOE should prioritize the use of these integrated methods.
In framing potential DOE efforts in foundation models, the strategic focus remains a subject of debate: how best to balance the department’s broad application space, navigate the trade-offs between leveraging past industry advancements and addressing DOE’s unique national security imperatives, and ensure responsible stewardship of taxpayer resources. A primary concern is that DOE cannot compete with the head start in technology maturation and the large market share currently held by large companies, such as Microsoft and Google, that back their efforts with large financial and workforce investments.
Conclusion 3-1: Commercial industry has driven rapid progress in developing large language model–based foundation models, yielding a robust ecosystem of tools and capabilities. As demonstrated by, for example, the collaboration between Los Alamos National Laboratory and OpenAI, DOE can leverage these industry advances and findings as it develops foundation models for science and conducts coordinated DOE-wide assessments to identify appropriate opportunities.
This raises the fundamental question of whether DOE should compete in the foundation model space at all and, if so, whether it should focus on collaborations with industry or on the complementary space where DOE’s unique mission lies. The committee believes that DOE needs to develop these tools internally, alongside the private sector’s efforts, because the needs of the government, whether for national security or continued scientific preeminence, will not be met by private interests. The two endeavors (private and public) do not compete; they complement each other. Despite the mismatch in funding compared to industry leaders, DOE holds clear strategic advantages in several areas.
Conclusion 3-2: DOE retains clear strategic advantages in five areas: (1) a world-class scientific workforce in computational science; (2) access to large-scale, science-focused, and experimental computing hardware; (3) stewardship of unique experimental facilities and of open, controlled, and classified scientific data; (4) capability to tackle long-term, high-risk, high-reward scientific problems; and (5) access to unique scientific data that may not be easily reproduced and that can be expanded with synthetic data as needed for training future foundation models.
With DOE’s advantages and foundation model capabilities in mind, the committee developed a series of recommendations addressing the potential role DOE can play in the development and implementation of foundation models.
Keeping humans in the loop is important for foundation models for a number of reasons, including accountability and oversight, error detection and correction, interpretability and trust, and contextual judgment. The human counterpart can help determine the suitability and reliability of the foundation model. These considerations motivate the following conclusion and recommendation from the committee regarding the importance of including humans in foundation model processes.
Conclusion 3-3: While AI systems can exceed human performance in many ways, they can also fail in ways a human likely never would. For this reason, the qualification of foundation models will be necessary for decision making and prediction in the presence of uncertainty.
Recommendation 3-1: The Department of Energy (DOE) should study and develop the fusion of artificial intelligence (AI) and human capabilities. At present, AI systems handle repetitive, manual, or routine tasks and are starting to show abilities to reason. As AI becomes more capable, deep analysis and strategy recommendations become feasible, but humans should maintain oversight and validation, particularly for qualification and other aspects of DOE’s mission.
Agentic AI has surged as a means of using large language models (LLMs) to launch external agents that explore hypotheses or improve or verify responses, and DOE has a unique opportunity to explore these capabilities. DOE codes could, for example, expose automatic differentiation “hooks” in their open-source libraries: to train foundation models against simulation, a software interface must connect the computational graph of a machine learning library to the adjoint calculations of a scientific code, allowing gradients to backpropagate seamlessly between the two. The majority of DOE codes are written in Fortran, C, or C++ and do not expose the necessary computational graph to pass adjoint information. If such hooks or interfaces were exposed, LLMs could couple directly to production codes, integrating robust numerical prediction into the training process. An LLM could then both perform simulation and calculate a loss function, enabling complete end-to-end training with DOE’s reliable, mature physics-based simulators. For example, a text prompt (e.g., “Why is the drag high on this vehicle?”) could directly evaluate sensitivities to components of a scientific simulator (e.g., “The mesh facets of the tail fins are in a high-shear layer”) rather than attempting to glean answers from text in scientific reports. By a similar process, DOE could apply LLMs and foundation models to help operate user facilities, leading to autonomous “self-driving laboratories.”
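The hook-and-adjoint pattern can be sketched in a few lines. The example below is purely illustrative: the “legacy solver” and its adjoint stand in for a production Fortran/C++ code, and the hook mimics how a reverse-mode framework records the forward pass and replays the solver’s adjoint on the backward pass so gradients can flow through it:

```python
# Illustrative sketch (hypothetical interface): a legacy solver exposes a
# forward call and an adjoint call, and a thin "hook" lets gradient
# information flow through it as if it were a differentiable layer.

def legacy_solver(x):
    """Stand-in for a Fortran/C++ production code: y = f(x) = x**3."""
    return x ** 3

def legacy_solver_adjoint(x, y_bar):
    """Adjoint of the solver: x_bar = (df/dx) * y_bar = 3*x**2 * y_bar."""
    return 3.0 * x ** 2 * y_bar

class SolverHook:
    """Minimal reverse-mode hook: records the input on the forward pass
    and replays the solver's adjoint on the backward pass."""
    def __init__(self):
        self.saved_x = None

    def forward(self, x):
        self.saved_x = x
        return legacy_solver(x)

    def backward(self, y_bar):
        return legacy_solver_adjoint(self.saved_x, y_bar)

# Backpropagate the loss L = 0.5 * y**2 through the wrapped solver:
# dL/dy = y, so the adjoint call receives y_bar = y.
hook = SolverHook()
y = hook.forward(2.0)      # forward: y = 2**3 = 8.0
grad = hook.backward(y)    # backward: dL/dx = 3 * 2**2 * 8 = 96.0
```

This is the same contract that frameworks such as PyTorch (custom autograd functions) and JAX (custom vector-Jacobian products) formalize; the missing piece in most legacy DOE codes is the adjoint call itself.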
Recommendation 3-2: The Department of Energy should evaluate the capabilities and risks of agentic artificial intelligence (AI) systems for its core applications. In particular, the committee advocates exploring agentic AI for developing autonomous laboratories for scientific discovery, decision making, and action planning for high-stakes applications.
With the rapid development of foundation models and other AI systems, there is additional potential for security risks from these systems. The adversarial use of foundation models poses security risks in two main ways. First, attackers could target the model itself to subvert its function or steal intellectual property through methods such as prompt injection (jailbreaking), data poisoning, and model stealing. Second, adversaries could leverage foundation models as weapons to accelerate traditional cybercrime, enabling the mass production of highly effective phishing campaigns and deepfakes, lowering the barrier to writing malicious code, and introducing new supply chain vulnerabilities when models are integrated with external systems.
Processes must be developed to verify that foundation models are reliable, safe, and trustworthy throughout their life cycles. Additional measures should also be developed to protect against adversarial applications of foundation models. These efforts could be assisted by proactive cybersecurity strategies such as red teaming, in which real-world attacks are simulated to help identify and address security weaknesses. The committee therefore states the following recommendation.
Recommendation 3-3: To address potential security risks arising from the adversarial use of foundation models, the Department of Energy should explore strategies for artificial intelligence assurance, red teaming, and development of countermeasures.
Although industry leaders may have a head start with foundation models, there is value in DOE focusing on areas where it holds strategic advantages. Using these capabilities to help develop and direct foundation models can solidify DOE’s leadership in foundation models for scientific discovery and innovation.
The national laboratories hold deep institutional expertise, embedded in their workforce, legacy data sets, and extensive experimental and modeling infrastructure. Yet the sheer scale of the DOE system, characterized by siloed specialized knowledge and the complexity of coordinating a large, distributed workforce, can be fundamentally misaligned with the speed and flexibility required for rapid decision making.
DOE invested early in materials informatics and high-throughput experimental data curation campaigns to build unique access to data sets, through the Materials Genome Initiative and other efforts. By combining advanced foundation models, high-performance computing, and curated experimental data, materials informatics can dramatically reduce the search space for viable material substitutes or processes. This is an example of a DOE effort to advance an aspect of computational science and of how such research and development leads to important new capabilities.
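The search-space reduction at the heart of materials informatics can be sketched schematically. In the toy example below (the candidate data, target property, and scoring function are all hypothetical), a cheap surrogate score prunes a large candidate pool so that only a shortlist would proceed to expensive simulation or experiment:

```python
# Schematic sketch of surrogate-based screening (all data hypothetical):
# a cheap learned score ranks candidates so only the most promising ones
# go on to expensive first-principles simulation or experiment.

def surrogate_score(candidate):
    """Stand-in for a trained property predictor (higher is better);
    here we target a band gap near 1.4 eV, purely for illustration."""
    return -abs(candidate["band_gap"] - 1.4)

# A hypothetical candidate pool of 100 materials.
candidates = [{"name": f"mat-{i}", "band_gap": 0.1 * i} for i in range(100)]

# Keep only the top 5 by surrogate score.
shortlist = sorted(candidates, key=surrogate_score, reverse=True)[:5]
# Only the shortlist would be passed to DFT-level simulation or experiment.
```

In practice the surrogate would be a foundation model trained on curated experimental and simulation data, and the pool would contain millions of compositions, but the funnel structure is the same.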
Conclusion 4-1: Many DOE missions demand rapid analysis and decision making under urgent national security or economic constraints. Although the national laboratories hold deep institutional expertise—embedded in their workforce, legacy data sets, and extensive experimental and modeling infrastructure—the sheer scale of the DOE system, characterized by siloed specialized knowledge and the complexity of coordinating a large, distributed workforce, can be misaligned with the agility required for decisive action. Development of foundation models for this purpose poses a unique opportunity to address rapid analysis and decision making.
Recommendation 4-1: The Department of Energy should explore the use of foundation models to accelerate situational understanding by unifying dispersed, siloed, and diverse multimodal data sources as input to decision-making frameworks across heterogeneous environments.
Additionally, the needs of a DOE foundation model arguably pose more stringent requirements than those in academic or industrial settings. For stockpile stewardship, simulation of critical components has matured over decades to the point that simulations calibrated by extensive testing are viewed as capable of replacing full-scale, experiment-based design. This points to further opportunities for DOE.
Conclusion 4-2: DOE is uniquely positioned to shape the future of AI-driven science. Materials informatics and near-autonomous scientific platforms highlight the power of combining curated experimental data, simulation, and advanced AI to accelerate discovery. Federated computing and facility integration extend this vision by enabling distributed use of DOE’s infrastructure.
The curation and integration of specialized knowledge coupled with emerging multimodal and agentic AI approaches underscore the importance of preserving expertise, reasoning across diverse scientific data streams, and directly linking foundation models to DOE’s mature simulation ecosystem.
Recommendation 4-2: The Department of Energy should both modernize existing infrastructure and invest in new infrastructure to generate, curate, and facilitate the large data corpus necessary to build a scientific foundation model, including simulations to create data, high-throughput and/or autonomous experimental facilities, and facilities to host data. Additionally, it should create interfaces (e.g., agentic, retrieval-augmented generation tools) through which large foundation models may easily access these sources. A successful strategy will provide holistic access to multimodal or heterogeneous infrastructure across the entire DOE complex, mitigating the “stove-piping” of assets between different laboratories or departments.
A strength of DOE is its ability to retain scientific talent, which should be reinforced with AI expertise as well. The success of any DOE-wide foundation model initiative depends on attracting and retaining top AI talent, which requires overcoming hurdles such as slow funding cycles. DOE does, however, already have excellent infrastructure and expertise as well as well-defined, mission-driven research.
Conclusion 4-3: DOE struggles to compete with the private sector for AI talent due to lower salaries and slow, traditional funding cycles. However, DOE’s unique strengths, such as its mission-driven work, long-term career paths, and powerful supercomputing infrastructure, can be leveraged to attract talent. Building a strong academic pipeline through closer collaboration with universities is also essential for its long-term success.
Recommendation 4-3: To maintain a top-tier workforce, the Department of Energy (DOE) should design leadership-scale scientific research programs and provide staff with opportunities to rapidly adapt to a quickly evolving technological landscape. To attract early-career scientists, DOE should be perceived as the best place to become a leader in scientific machine learning; although industry may lead in the large language model space, unique access to state-of-the-art science can attract top talent. To be competitive with large-scale development efforts in industry, it is important to avoid fracturing scientists’ time and attention. DOE should create mechanisms by which medium-to-large teams can mount coordinated, focused efforts targeting mission-critical developments in fundamental research into, and applications of, foundation models for science.
One of DOE’s major strengths is its data collection and generation capabilities. Leaning on this strength can benefit the development and use of foundation models for scientific discovery and innovation. To realize this potential, the data generated need to be readable and usable by both human users and foundation models. This will enhance operational efficiency and productivity and boost communication and collaboration. Standardization can help make DOE data readable and usable.
Conclusion 4-4: Although DOE curates many data sets of high value for the construction of foundation models, they are typically developed in an ad hoc manner, with heterogeneous file formats and data curation strategies that currently pose a barrier to high-throughput processing. Foundation models present a unique opportunity to address this issue.
Recommendation 4-4: To increase the success of future foundation models for science, the Department of Energy should invest in large-scale data user facilities (classified and unclassified), leveraged by artificial intelligence’s growing capability to interpret heterogeneous scientific data, similar to the successes experienced with previous investments in supercomputers and open-source scientific computing libraries.
Applying foundation models within DOE missions presents a multilayered set of scientific and operational challenges. These models, which emerged from success in domains such as natural language processing and vision, struggle to transfer directly into computational science workflows that demand physical fidelity, mesh-aware representations, and scalable performance across multiscale, multiphysics problems described by partial differential equations.
Verification, validation, and uncertainty quantification (VVUQ) are essential components of trustworthy scientific computing, ensuring that models are mathematically sound (verification), accurately represent the real-world systems they simulate (validation), and provide a clear understanding of uncertainties in their predictions (uncertainty quantification). These practices are well established in traditional modeling and simulation but are not yet adequately developed or standardized for AI systems, particularly foundation models. AI models often operate as
black boxes, lacking transparency in how outputs are generated or how reliable they are under different conditions. Establishing VVUQ standards for foundation models is critical to ensure that these systems can be safely and effectively used in scientific discovery.
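One common ingredient of uncertainty quantification, ensembling, can be sketched in a few lines. The example below is a hypothetical stand-in (the model and its perturbation are invented): several perturbed copies of a model are run, and the spread of their predictions serves as a crude uncertainty estimate:

```python
# Minimal sketch of ensemble-based uncertainty quantification: run several
# perturbed models and report the spread of predictions as a crude
# uncertainty estimate. The model here is a hypothetical stand-in.
import random
import statistics

def model(x, seed):
    """Hypothetical surrogate whose weight differs per ensemble member."""
    rng = random.Random(seed)
    w = 2.0 + rng.gauss(0.0, 0.1)  # perturbed weight around a nominal 2.0
    return w * x

x = 3.0
preds = [model(x, seed) for seed in range(32)]
mean = statistics.mean(preds)      # ensemble prediction
spread = statistics.stdev(preds)   # wider spread -> lower confidence
```

Real VVUQ for foundation models would go far beyond this (calibration, physical-consistency checks, out-of-distribution detection), but even this simple pattern turns a single opaque prediction into a prediction with an attached uncertainty.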
Conclusion 5-1: VVUQ methods analogous to those for traditional computational modeling do not exist for, or map directly onto, foundation models.
Conclusion 5-2: VVUQ, interpretability, and reproducibility are critical for establishing and maintaining trust in systems that are inherently complex, opaque, and increasingly deployed in high-stakes situations. Integrating VVUQ into foundation models would increase their trustworthiness, reliability, and fitness for purpose, which is essential for future scientific discovery and innovation.
Recommendation 5-1: The Department of Energy (DOE) should lead the development of verification, validation, and uncertainty quantification frameworks tailored to foundation models, with built-in support for physical consistency, structured uncertainty quantification, and reproducible benchmarking in DOE-relevant settings.
There have been successes in validating model outputs with experimental data, as the data provide a real-world benchmark against which the models’ accuracy can be determined. Without high-quality and robust experimental data, it is difficult to determine if a model’s predictions are valid or merely artifacts of its assumptions or training data. This is especially important for foundation models and hybrid models, which may generalize well in theory but fail under specific conditions or in untested regimes. Therefore, the committee states the following conclusion and recommendation.
Conclusion 5-3: Foundation models for science will demand more, and different, physical experiments to validate the veracity of AI predictions. Empirical grounding ensures that foundation model outputs reflect physical laws and domain-specific behavior. This is especially critical in high-stakes DOE applications, where simulations alone cannot guarantee correctness and where physical experiments provide the only definitive test of predictive validity.
Recommendation 5-2: In line with Recommendation 4-2, the Department of Energy should place high priority on data collection efforts to support reproducible foundation model training and validation, analogous to traditional efforts in verification, validation, and uncertainty quantification.
DOE is in a unique position for the development and use of foundation models for scientific discovery. It is a leader with the capacity to tackle long-term, high-risk, high-reward scientific problems. A current issue with foundation models for science is the lack of standards for their development and use. Leveraging these key resources, DOE can contribute to the development and establishment of such standards. Concrete standards ensure compatibility and interoperability and improve the reliability of a system.
Recommendation 5-3: The Department of Energy should establish and enforce standardized protocols and develop benchmarks for training, documenting, and reproducing foundation models for science and should participate in defining software standards, addressing randomness, hardware variability, and data access across its laboratories and high-performance computing infrastructure.
Although many of the technical challenges associated with foundation models can be addressed through internal research and development, deployment at DOE scale will increasingly involve external partnerships. Collaboration with industry introduces more constraints. Proprietary model weights, restricted data access, and closed-source infrastructure often prevent rigorous VVUQ and reproducibility practices, especially when security, transparency, or auditability is required. These collaborations demand careful planning and coordination to bridge institutional differences in mission, priorities, and operational practices, particularly in areas such as contracting mechanisms, responsible AI standards, intellectual property frameworks, data-sharing protocols, and alignment on VVUQ expectations.
Conclusion 5-4: Partnering of DOE laboratories with industry on foundation models will require deliberate effort, including flexible contracting mechanisms, clear intellectual property agreements, data-sharing processes, alignment on VVUQ approaches, responsible AI practices, and a shared understanding of respective missions, objectives, and constraints.
Recommendation 5-4: The Department of Energy should deliberately pursue partnerships with industry and academia to address national mission goals, governed by flexible contracts, responsible artificial intelligence standards, and alignment on reproducibility, verification, validation, and uncertainty quantification approaches and data sharing.