During a vendor demonstration, a health system chief medical officer watches a model predict patient deterioration six hours earlier than the current early warning score. The system is designed to be customized, and her data science team is pushing to scale it from two units to forty. Nursing staff on the current pilot units have expressed skepticism, but the vendor has a contract ready, and the choice is ultimately hers.
No algorithm can decide whether to scale the system to forty units. What happens to clinical trust if the alerts are wrong is also a question no model can answer. Only people can answer it, and that is the real challenge health systems will have to tackle this year: whether their leaders are ready to make that call.
Every clinical model comes with an output and a question, and the hardest part comes after the output is generated. Responsible use, as described in the AI Code of Conduct published by the National Academy of Medicine, is a human commitment that lasts for as long as the software is in use. It will never be a property of the software.
Coming from a different perspective, the American Medical Association reaches a similar position. In describing this technology as augmented intelligence, the association deliberately chose "augmented" over "artificial." Its augmented intelligence framework rests on the belief that these tools are designed to be used in combination with clinical judgment and that clinicians are ultimately responsible for the care provided. A model may, for example, suggest that a patient be discharged, but responsibility for the discharge order belongs to the clinician who signs it.
This responsibility is not distributed evenly. It falls most heavily on the clinician at the bedside and on the nurse who has to decide whether to act on the alert. It also falls on the purchasing executive, who has probably never been to the unit. When the model was purchased, each of these people was handed an unanswered question. The responsibility to answer it does not disappear just because no one has addressed it. It stays unassigned until the moment an answer is needed.
What Tooling Cannot Settle
A model can be evaluated for precision. No metric tells a leader how precise is precise enough before 40 units rely on its output. The 2025 recommendations from the Joint Commission and the Coalition for Health AI on the responsible use of AI in healthcare show why this is a matter for leadership rather than for engineers alone. They call for named oversight, with validation and monitoring that continue for extended periods after launch. Each of these is a leadership responsibility before it is an engineering one. Someone has to decide who monitors the model and what metric threshold means it should be switched off.
For the past two years, ECRI (originally the Emergency Care Research Institute) has listed AI among its top 10 health technology hazards. The concern is not a model that fails in the controlled environment of a lab. It is a model that performs well in the lab and then, in the messy, uncontrolled environment after launch, begins to drift with no one assigned to watch for it. The Agency for Healthcare Research and Quality has provided evidence on how models drift when they are built on sub-optimal data, including models meant to close care gaps for the patients a health system most wants to reach. No contract can mitigate these issues. A leader must own them, even when no alarm is sounding.
One example makes clear what is at stake. In an article published in 2019 in the journal Science, researchers found that a population health algorithm, which was applied to millions of patients, systematically underestimated the care needs of Black patients. The algorithm learned from past health expenditure, and past expenditure reflected unequal access to care rather than actual need. The tool did what it was built to do. Judging whether that purpose was fair was the responsibility of a human, and for quite some time none of the key stakeholders took up that responsibility. The FDA, for its part, appears to be limiting how far AI-driven medical software can have the final say. Its oversight extends across the software's lifecycle, with the aim of keeping a human involved at all times.
The Work Only a Leader Can Do
The response emerging from health systems is structural. A number of organizations have appointed a Chief AI Officer (CAIO) and put oversight committees in place to set the boundaries for AI-powered health solutions. These structures may bring improvement, but their effectiveness is always limited by the judgment of the people who hold those positions.
Professional associations tend to agree. The American College of Healthcare Executives identifies judgment under uncertainty as the critical component of its leadership competency model. Current US federal regulations require that decision support tools delivered through certified health IT include a description of their key attributes. This requirement gives leaders the raw material to ask more sophisticated questions, but raw material is of no use unless a leader picks it up. That may mean examining the disclosure, re-engaging the nurse who no longer trusts the alerts, and treating that loss of trust as a cost that counts. Tools will improve, but the decision to trust a tool and defer to it remains a human judgment.
Done correctly, this discipline is not glamorous. A named owner is responsible for each model and is expected to review it regularly to ensure it is not broken. A threshold is written down in advance, before any warning sign appears, that requires the owner to pause the model if it is not in a healthy state. A clinician's distrust of the alerts is noted as a comment in the same record where the regularly reported metrics are logged. This is standard management practice applied to a new, strange, and very powerful class of tools.
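For readers who want to see how little machinery this discipline needs, the practice can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not a description of any health system's actual tooling: the class names, the `alert_ppv` metric, the 0.30 threshold, the owner name, and the sample comments are all hypothetical.

```python
from dataclasses import dataclass, field
from datetime import date


@dataclass
class ReviewEntry:
    """One scheduled review: the reported metric plus any clinician comments."""
    review_date: date
    metric_value: float
    clinician_comments: list[str] = field(default_factory=list)


@dataclass
class ModelOversight:
    """Hypothetical oversight record for a single clinical model."""
    model_name: str
    owner: str               # a named person, not a committee
    metric_name: str         # illustrative choice of metric
    pause_threshold: float   # agreed before launch, not after a failure
    log: list[ReviewEntry] = field(default_factory=list)
    paused: bool = False

    def record_review(self, entry: ReviewEntry) -> bool:
        """Log the review; pause the model if the metric is below threshold.

        Returns True if the model is paused after this review.
        """
        self.log.append(entry)
        if entry.metric_value < self.pause_threshold:
            self.paused = True
        return self.paused


# Illustrative values only; none of these numbers come from a real deployment.
oversight = ModelOversight(
    model_name="deterioration-alert",
    owner="Dr. A. Example",
    metric_name="alert_ppv",
    pause_threshold=0.30,
)
first = oversight.record_review(
    ReviewEntry(date(2025, 1, 6), 0.41, ["Night shift reports too many alerts"])
)
second = oversight.record_review(ReviewEntry(date(2025, 2, 3), 0.27))
```

In this sketch a paused model stays paused until someone deliberately clears the flag. No later review resets it automatically, so resuming the model is a decision the owner has to make.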
What Health Systems Must Determine
Over the next 24 months, healthcare systems will purchase more of these tools. The harder question is who will purchase them, and whether that person has both the authority and the willingness to say no. A healthcare system that purchases the best-performing tool on the market can still fail its patients if no one is willing to say no when the evidence justifies it. Implementing the tool is the easy part. Using it well is a judgment that is developed over time, and many systems have not begun to develop it.
The healthcare systems that have come through this best are those that developed this judgment intentionally. Each clinical model has an owner identified by name and title. The committee answers to that owner, and accountability does not dissolve into the committee. The owner has both the authority and the obligation to stop the clinical tool from being used, and is also obligated to engage the reluctant clinician as part of the decision. The most effective leaders treat every output of the tool as a question awaiting an answer from a human.