As part of the effort to translate the Code Principles and Code Commitments into actionable, real-world practice, it was important to provide a framework for artificial intelligence (AI) that could be referenced throughout this document and that encompasses the full lifecycle of activities relevant to the consideration, development, implementation, and sustainment of these technologies. Although software development processes have been subject to research and improvement science since the 1980s, multiple development lifecycle models have been proposed that provide differing levels of granularity and emphasis relative to their purpose, including models developed specifically for AI. For these reasons, a review of the literature on software development lifecycles, with special attention to those proposed for AI, was conducted. Well-known lifecycles were described, evaluated, and compared to establish a lifecycle definition that could be used throughout this work.
“A software process can be defined as the coherent set of policies, organizational structures, technologies, procedures, and artifacts that are needed to conceive, develop, deploy, and maintain a software product” (Fuggetta, 2000). The most basic AI lifecycle model identifies three stages: design, development, and deployment (U.S. General Services Administration, n.d.). Others provide more granular detail on certain subprocesses of interest. For example, in the seminal National Academy of Medicine (NAM) publication on AI, Artificial Intelligence in Health Care: The Hope, the Hype, the Promise, the Peril (NAM, 2022a), the design phase (which the authors felt had been inadequately addressed in prior literature) was elaborated within a lifecycle comprising the following steps: “(1) identify or re-assess needs, (2) describe existing workflows, (3) define the desired target state, (4) acquire or develop AI system, (5) implement AI system in target setting, (6) monitor ongoing performance, and (7) maintain, update, or de-implement” (Salwei and Carayon, 2022). The Organisation for Economic Co-operation and Development (OECD) placed significant emphasis on the development stage, with substages
that included collecting and processing data, building and/or adapting model(s), and testing, evaluation, verification, and validation (OECD, 2024). Other models should be acknowledged (Data Science Process Alliance, n.d.; De Silva and Alahakoon, 2022; Shearer, 2000); however, the author group did not feel that they added key elements beyond the proposed AI Code of Conduct (AICC) model that were needed for this work. The prior NAM and OECD frameworks (in blue), as well as the AI lifecycle used throughout this document, are aligned and shown in Table 4-1.
As part of the review, harmonization, and gap closure process, the author group also reviewed the definitions of each of the stages and substages from a number of prior frameworks to establish definitions for each of the AICC AI lifecycle categories. A summary of these definitions is provided in Table 4-2.
It is important to note that AI lifecycle considerations have some significant conceptual, technical, and operational differences relative to those for traditional technologies. Here, we seek to generalize considerations broadly for both discriminative and generative AI rather than discuss specific nuances between them. One way to understand these differences is to compare them across stages in the health AI lifecycle.
The design of health care AI applications involves a deep understanding of clinical workflows, data, and specific health care needs (Greenhalgh et al., 2017) and is more nuanced than the design of traditional health information applications (e.g., electronic health record [EHR] billing tools), requiring identification of precise problems amenable to AI solutions (e.g., detecting breast cancer). It also involves deeper collaboration across a range of stakeholders, including clinicians, patients, family and community members, and implementation and data scientists and practitioners (Hogg et al., 2023; Scott et al., 2021). One example is the definition of use cases and objectives, including specifying how AI will improve patient outcomes, assist with diagnostics, or optimize workflows. While any technology tool has stated goals and objectives for implementation and use, for health AI it is more challenging to ensure that the design goals are met, because training data may result in biased operation, produce other unexpected behavior, or preclude transparency or explainability in situations where they are needed (Ferrara, 2023). Early planning for health care AI must also address ethical issues and privacy concerns from the outset. This includes considerations around informed consent, data anonymization, and the potential for algorithmic biases (Khalid et al., 2023).
TABLE 4-1 | A Comparison of Relevant AI Lifecycles with Stage and Content Alignment
| Source | AI Lifecycle Stages | | | | | | |
|---|---|---|---|---|---|---|---|
| US GSA | Design | Develop | Deploy | | | | |
| Hope, Hype, Promise, and Peril | Identify or Re-assess Needs | Describe Existing Workflow | Define Desired Target State | System Design and Planning | Acquire or Develop AI System | Implement AI System in Target Setting | Maintain, Update, or De-implement |
| OECD | Plan and Design | Collect and Process Data | Build and/or Adapt Model | Test, Evaluate, Verify, and Validate | Make Available for Use/Deploy | Operate and Monitor | Retire/Decommission |
| AICC | Problem Scoping, System Design, and Planning | Data Acquisition, Management, and Linkage | Model Development, Testing, and Procurement | Implement and Scale | Post-Implementation Monitoring, Feedback Systems, and Decommissioning | | |
NOTES: The lifecycle used for the AICC project and in this document is in yellow at the bottom. AICC = AI Code of Conduct; OECD = Organisation for Economic Co-operation and Development; US GSA = U.S. General Services Administration.
TABLE 4-2 | Description of the AI Development Lifecycle
| AICC AI Lifecycle Stage | Description |
|---|---|
| Design: problem scoping, system design, and planning | In this phase, the goals and stakeholders of the AI system are identified, as are the data, tools, and technologies needed to achieve those goals. Standard project management approaches, including timelines, milestones, and proactive risk identification and mitigation strategies, are employed. This includes defining use cases, incorporating user-centered design principles to engage end users, and establishing standardized, measurable metrics for successful implementation and outcome evaluation. These metrics refer not only to statistical measures of model success but also to clinical and economic outcomes. Planning for seamless data collection for metrics is also included in this phase, as is planning for interoperability and system scalability. |
| Data Management: data acquisition, management, and linkage | In this phase, data needs identified during planning are addressed. Data may be directly extracted from existing systems or developed using synthetic techniques (with due consideration of the associated benefits and limitations). These data must be securely transmitted and stored with appropriate access controls. The data are assessed for integrity, and transformed to meet model requirements, including assurance for data quality characterization as well as representativeness and generalizability of the data to ensure that they are fit for intended purposes and application to diverse populations. As applicable, datasets may be linked to create a comprehensive repository for the AI system. |
| Development: model development, testing, and procurement | This phase includes the process of developing new AI models and/or procuring existing AI systems and then adapting them to the local environment. This phase is iterative until the AI system performs to the goals and metrics set out in the planning phase, which includes ongoing algorithm development/training, testing, and refining until the system demonstrates the required accuracy, robustness, and sub-population equity. |
| Implementation: implementation and scaling | In this phase, AI systems are incorporated into the desired workflow through a user-centered design process to ensure that the predefined goals are met and performance in the setting of use is achieved. Typically, this involves initial deployment in a test environment using real-world data, with monitoring and acceptance testing occurring before implementation in live systems, whereupon additional assessment of model performance is completed. |
| Maintenance: post-implementation monitoring, feedback systems, and decommissioning | The setting in which AI systems are deployed may change in a number of ways. The clinical care processes and data collected may change over time, and the AI tool itself may change the workflow and outcomes within which it is embedded. For these reasons, AI systems may produce variable outputs, which may be desirable or may reduce accuracy over time and even result in harm. As such, monitoring of AI system results, including the use of feedback systems, is essential to ensure that the systems are achieving their stated goals and not causing harm. Decommissioning is also critical to consider; it includes removing systems that are not performing as expected or are no longer required or desired. |
Digital health applications might not face these ethical challenges to the same extent. The design phase must also consider regulatory requirements specific to health care AI. This involves understanding evolving guidelines and requirements from federal agencies and governing bodies and incorporating these into the design process. It is also important to consider cognitive heuristic biases that may be generated by how AI is integrated into workflow; mechanisms to expose and prevent these biases are important (Jabbour et al., 2023). Unlike rule-based systems, AI solutions may be designed to augment users, requiring greater clarity about roles and accountabilities as well as the associated mechanisms to track and report on feedback loops. Finally, interoperability and scalability require careful consideration and planning during the design phase of AI systems (Oikonomou and Khera, 2024). Of the utmost importance during the design phase, developers, in collaboration with end users and recipients of AI (e.g., patients), must plan for comprehensive evaluation of AI tools and products throughout the entirety of the AI lifecycle. As outlined in the proposed IMPACTS Framework, the evaluation should include the assessment of Integration; Monitoring, governance, and accountability; Performance quality metrics; Acceptability, trust, and training; Cost and economic evaluation; Technology safety and transparency; and Scalability and impact (Jacob et al., 2025). This model addresses the widely used approach of assessing the statistical accuracy of a model but also encompasses issues such as interoperability and workflow integration, data privacy and security, usability, safety, clinical effectiveness, clinical efficiency, and clinical utility.
Data management for AI systems differs from that for digital health application development in that AI requires large, high-quality datasets for training and validation. Ensuring data quality and mitigating bias are critical during model development (Ahmed et al., 2023a). To avoid creating new biases or exacerbating existing ones, AI models require datasets that are diverse and representative.
AI system development also differs from digital health system development in its use of advanced machine learning and related AI techniques. Once developed, health care AI models require rigorous validation and testing to ensure their accuracy, reliability, generalizability, and applicability across different local patient populations. This involves comparing AI outputs with expert clinical judgments and assessing performance metrics such as sensitivity, specificity, and predictive value. AI models often require iterative development based on feedback from clinical trials or pilot studies (Siontis et al., 2021). This iterative process helps refine the model and improve its performance before full-scale deployment. Developing AI models also involves promoting and facilitating transparency and explainability in the design, inputs, processes, and outputs where possible, sometimes through tools and techniques used in parallel to the
core AI. This is especially important in health care contexts, where the ability to comprehend the rationale for AI-generated guidance is crucial for clinical decision making (Albahri et al., 2023). Traditional digital health systems might not face such stringent requirements regarding data bias; they typically have less intensive validation processes, follow a more linear development path, and are much easier to explain, which has led to variability in AI development evaluations (Tornero-Costa et al., 2023).
Additionally, procurement of health AI systems often requires specialized expertise to evaluate their appropriateness for a given setting. Unlike with digital health solutions, assessing an AI application involves understanding its underlying algorithmic performance, data requirements, and potential biases, all within the context of use. As such, procurement teams may need to include data scientists or AI specialists in addition to the clinical, administrative, or technology expertise used for traditional health information procurement. Regulatory compliance is another factor in this phase, as health AI applications and their use are subject to evolving regulatory requirements. Procurement processes must also ensure that data privacy and security measures are robust from technological, legal, and contractual perspectives. While these issues are present in digital health procurement, the changing regulatory landscape and the expansion in the quantity of data used in AI systems require additional attention in AI procurements.
Implementation also presents important differences for health AI solutions, which often need to be integrated with existing EHRs or other digital health information systems, frequently in ways that are more complex and that can change over time due to the evolving nature of AI algorithmic solutions (Greenhalgh et al., 2017). AI applications often require customization and calibration based on the specific needs of the health care setting (Brady et al., 2024). This can involve fine-tuning algorithms to local data and practices, which is less common with digital health applications. During implementation of health AI, it is important to assess workflow changes, decision support, and the performance of the AI-enabled clinical team. Implementing AI tools may also require additional training for health care professionals to understand and effectively use the AI system.
Once implemented, AI applications require substantial ongoing maintenance. Indeed, continuous monitoring of AI applications, or algorithmovigilance, is crucial to ensure their accuracy and effectiveness (Embí, 2021), including assessment of real-world outcome performance, as planned in the design phase. Unlike previous IT applications, AI systems and their effects and impacts might evolve over time as they are exposed to new data and environments, so ongoing performance evaluation is necessary to detect and correct issues (Davis et al., 2017). AI applications need to be regularly assessed for biases and accuracy (Davis et al.,
2024). This is especially important in health care where inequities in data can lead to or exacerbate existing disparities in health outcomes. Health AI systems may require regular updates and retraining as they change or “drift,” to adapt to new data, variations in patient populations, and changes in medical knowledge. Finally, monitoring health AI includes ensuring ethical use and addressing concerns about transparency, accountability, and consent in ways that might differ from earlier digital health applications.
In summary, health AI applications share many similarities with digital health applications, but they also differ in important ways. Key differences are depicted in Figure 4-1. In the design phase, there must be a focus on understanding clinical needs, defining specific use cases, addressing ethical and privacy issues, navigating regulatory requirements, and planning for data management. Development of health AI involves selecting and developing algorithms, ensuring data quality and bias mitigation, rigorous model validation, iterative development, and striving to promote explainability and transparency. Procurement requires specialized expertise, regulatory compliance, and careful consideration of data handling. Implementation involves complex integration, extensive training, and customization. Finally, maintenance focuses on performance accuracy, bias, fairness, adaptability, and ethical concerns. These differences reflect the challenges and considerations associated with health AI applications, necessitating tailored approaches compared with earlier digital health solutions.