Using Claude AI to replicate decision models

Introduction

Health Technology Assessment (HTA) requires the inclusion of economic analysis so that decision-makers have information about the impact their decisions will have on the health of the population: both the direct health impact and the health foregone by not choosing to fund something else. Health economists use decision models to quantify these trade-offs. Building these models in Microsoft Excel is the industry standard for global HTA, and is explicitly required in some countries. Critics of Excel-based models point to the meaningful improvements offered by applying best practices from computer science and writing models as scripts in code rather than in spreadsheets.

There is growing momentum toward the adoption of script-based methods for health economic decision models within HTA. One frequently cited barrier to adoption is the ‘steep learning curve’ health economists face in learning to write code. This, combined with the overwhelming diversity of possible methods for developing a model in script rather than in spreadsheet software, means that uptake lags years behind the technology’s availability.

A potential break in this logjam comes in the form of a different and more recent technology: Artificial Intelligence (AI). While AI is a diabolically complicated technology, a brief explanation may suffice for the purposes of this work. AI chatbots generate responses to user prompts using a complex prediction system called a Large Language Model (LLM). Based on these predictions, AI agents can perform complex tasks at high speed, producing output, including computer code, that appears reasonable and clear.

AI chatbots are designed and optimized for human-like and helpful responses. This optimization frequently comes at the expense of accuracy, a phenomenon known as ‘hallucination’. AIs can, and frequently do, provide users with responses that are very confident and totally inaccurate. This severely limits their usefulness in performing tasks that require a high degree of accuracy, such as building functioning code for health economic decision models. The random predictive nature of the LLMs that underlie these agents, combined with their tendency to hallucinate, means that a great deal of verification work is needed to yield any sufficiently useful code, if that is possible at all.

In October of 2025, Anthropic released an update to their AI, Claude, that allows users to provide it with customized sets of operating instructions called ‘Skills’. With Skills, users can direct Claude to approach responses through specified methods and formats. In so doing, Skills allow Claude to produce more consistent outputs, and quality checks can be explicitly programmed into the AI’s process. In this way, properly designed Skills could theoretically reduce the impact of AI errors to a manageable level.

In the following posts, I’m going to describe a process by which I built a Claude Skill for replicating health economic decision models. I’ll also be describing some case studies in which the Skill was used to replicate a model from information taken from public sources.

Methods

Building Skills in Claude

A typical interaction between a user and Claude (or any AI) involves the user inputting a prompt - a question, an instruction, an idea - into a chat box. Claude analyses the prompt and algorithmically generates a response. The nature of the response is a product of the nature of the prompt, any previous instructions given by the user, and Claude’s underlying programming. Identical prompts may yield different responses. Claude can therefore be understood as a highly skilled but naive assistant who needs detailed explanation to perform even simple tasks consistently. The role of the user is to provide that explanation and then to inspect Claude’s work and to challenge its assumptions. Active and thoughtful engagement by the user is a critical part of developing any code in Claude.

Figure 1 - Using the Claude interface

In the development of this Skill, the user (me) asked Claude to read methods guidance for how to build models in R and to translate that guidance into a step-by-step process for model development. The user designed the general flow of the process (i.e., the order in which steps are performed and the activities within each step) and Claude filled in the content. At each step, the user reviewed Claude’s responses, identified errors, and provided clarification and feedback before proceeding to the next step. The result of this iterative process is a series of specifically formatted files containing instructions, templates, and references. These files can then be loaded into Claude’s interface and called upon to perform the desired set of tasks in the prescribed order using the required methods.

The Replicator Skill

I worked with Claude to develop a Skill that reads the description of a health economic decision model and generates a working version of that model in R. The Skill does not require access to the source materials or any model files. Instead, it reads the methods used to describe the model, including its structure, assumptions, inputs, and transition logic, and produces an R-based model fitting those specifications.

The Skill guides the replication process through six progressive steps:

  1. Model Structure and Configuration: Claude reads the model description and designs its framework and internal logic. Parameters describing the model’s general behaviour (called ‘Control Parameters’), like cycle length, time horizon, discount rates, and the number of probabilistic iterations, are defined here as well.

  2. Parameterization: Claude uses an adapted version of the Batch Importer to define the values for all model parameters. It also performs probabilistic draws for each parameter value using its error term and the method of moments. These draws can be used for probabilistic analysis.

  3. Create Probability and Payoff Functions: Claude writes mathematical functions to describe the transition probabilities between model states, health care system costs, and health state utilities, then creates matrices to hold this information.

  4. Generate Markov Trace: Claude programs the model to calculate state membership over the course of the model’s time horizon based on the values generated in the previous steps. Estimates for total costs and quality-adjusted life years (QALYs) are calculated in this step using matrix multiplication. Depending on the user’s specifications, the analysis may be deterministic or probabilistic.

  5. Perform Cost-Effectiveness Analysis: Incremental cost-effectiveness is calculated for the experimental and comparator treatments.

  6. Probabilistic Analysis and Data Visualization: Claude runs the probabilistic analysis and writes scripts to calculate outcomes like the proportion of ICERs that lie below a user-specified willingness-to-pay (WTP) threshold, generating cost-effectiveness acceptability curves (CEACs).
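To give a sense of what steps 4 and 5 produce, here is a minimal two-state sketch. All names, states, and values are hypothetical placeholders, not taken from the Skill itself:

```r
# Hypothetical two-state sketch (Healthy, Dead) of steps 4 and 5.
# All values are illustrative, not taken from the Skill.
n_cycles <- 10
states   <- c("Healthy", "Dead")

# Transition matrix: each row must sum to 1
m_P <- matrix(c(0.9, 0.1,
                0.0, 1.0),
              nrow = 2, byrow = TRUE,
              dimnames = list(states, states))

# Markov trace: state membership each cycle, via matrix multiplication
m_trace <- matrix(NA, nrow = n_cycles + 1, ncol = 2,
                  dimnames = list(0:n_cycles, states))
m_trace[1, ] <- c(1, 0)                 # everyone starts Healthy
for (t in 1:n_cycles) {
  m_trace[t + 1, ] <- m_trace[t, ] %*% m_P
}

# Payoffs per state per cycle, then totals (annual cycles assumed)
v_cost <- c(Healthy = 1000, Dead = 0)
v_util <- c(Healthy = 0.8,  Dead = 0)
total_cost  <- sum(m_trace %*% v_cost)
total_qalys <- sum(m_trace %*% v_util)

# Step 5: given the same totals for a comparator arm,
# ICER <- (cost_new - cost_comp) / (qalys_new - qalys_comp)
```

The actual Skill adds discounting, half-cycle correction, and validation on top of this core logic.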

Each step contains validation checks to evaluate whether the code is producing outputs in the expected way. The user reviews these validation checks and, if they pass, instructs Claude to continue. The final output of this process is a series of R scripts that power the model. The user can run these scripts to estimate cost-effectiveness.
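The CEAC described in step 6 can be sketched in a few lines. The incremental costs and QALYs below are simulated placeholders standing in for real probabilistic output:

```r
# Hypothetical sketch of a CEAC: the proportion of probabilistic iterations
# that are cost-effective at each willingness-to-pay threshold.
# d_cost and d_qalys are simulated placeholders, not Skill output.
set.seed(1)
n_psa   <- 500
d_cost  <- rnorm(n_psa, mean = 10000, sd = 3000)  # incremental costs
d_qalys <- rnorm(n_psa, mean = 0.30,  sd = 0.15)  # incremental QALYs

wtp  <- seq(0, 100000, by = 5000)
ceac <- sapply(wtp, function(lambda) {
  mean(lambda * d_qalys - d_cost > 0)  # net monetary benefit > 0
})
# Plotting ceac against wtp gives the cost-effectiveness acceptability curve.
```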

Parts of the Skill were developed in Claude Sonnet version 4.5, with final development conducted using Haiku version 4.5. I find the Skill generally works better with Sonnet than with Haiku. A package called ClaudeR, which allows Claude to execute R commands on the user’s computer, was also used to validate code and resolve error messages.

Modeling Methods Approach

The Claude Skill used template code described in R for Health Technology Assessment, a methods textbook edited by Baio, Thom, and Pechlivanoglou. This book is a collection of coding practices and approaches for building HTA-relevant analyses, including cost-effectiveness models, in R, with more than 30 co-authors from around the world. This textbook likely represents the most authoritative source for R-based model specification in the available literature. Specifically for this exercise, Chapter 9 (authors: Thom, Soares, Krijkamp, and Lamrock) was used to instruct Claude how to develop a Markov model.

The instructions from the book were supplemented by code I developed to handle parameters and conduct probabilistic analysis. In this code, parameter values are stored for both deterministic and probabilistic analysis. Each probabilistic run is an instance of the underlying logic of the deterministic model with a set of probabilistically sampled parameter values. Probabilistic sampling is conducted through the method of moments, using the mean and error terms for each parameter and making standard assumptions about parameter uncertainty distributions (gamma for costs, beta for probabilities, etc.).
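As a concrete illustration, the method of moments maps each parameter’s mean and standard error onto distribution parameters before sampling. A minimal sketch with illustrative values (not my actual parameter-handling code):

```r
# Method-of-moments sketch: map a mean and standard error onto distribution
# parameters, then draw probabilistic samples. Values are illustrative.
set.seed(42)
n_psa <- 1000

# Beta distribution for a probability with mean mu_p and standard error se_p:
#   alpha = mu * nu, beta = (1 - mu) * nu, where nu = mu*(1-mu)/se^2 - 1
mu_p <- 0.10
se_p <- 0.02
nu    <- mu_p * (1 - mu_p) / se_p^2 - 1
p_draws <- rbeta(n_psa, shape1 = mu_p * nu, shape2 = (1 - mu_p) * nu)

# Gamma distribution for a cost with mean mu_c and standard error se_c:
#   shape = mu^2 / se^2, rate = mu / se^2
mu_c <- 5000
se_c <- 800
c_draws <- rgamma(n_psa, shape = mu_c^2 / se_c^2, rate = mu_c / se_c^2)
```

Each of the `n_psa` rows of draws then parameterizes one probabilistic run of the deterministic model logic, as described above.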

Step-by-step Skill development

Working with an AI is an iterative process where the user uses natural language to prompt the AI to answer questions or follow directives. Building a Skill with an AI followed this same approach. The user (me) asked Claude to build a Skill that I could use to replicate models, and Claude jumped into action. The ensuing design of the Skill followed a back-and-forth process where I gave guidance and direction.

For transparency’s sake I should say that this was my second attempt at building a Skill. I had previously tried another model-building method that Claude really struggled with. Although that first attempt failed, Claude has access to previous chats, and I had primed it with some sample code. While the process outlined in the table looks very short, it was only this straightforward because I had done a fair amount of previous work.

Table 1: Building the Skill with Claude AI

Step 1
  • User prompt: We are going to learn how to replicate ICER models using R. First, read this document and save as an artifact (but not the Discussion): [User provided full markdown content of Markov chapter directly]
  • Claude’s action: Saved the complete Markov models chapter as an artifact (markov_models_chapter.md); extracted key sections covering cohort Markov models in discrete time, health states, transition matrices, and half-cycle correction basics.
  • Claude’s prompt to the user: Now that I have the chapter, what skill structure do you propose? Should it be: (1) a 6-step workflow (Extract/Parameterize/Structure/Trace/Results/PSA), (2) progressive disclosure with core concepts, or (3) example-driven with reference materials?

Step 2
  • User prompt: [Implicit: proceed with skill structuring]
  • Claude’s action: Reviewed the existing r-hta-markov-replicator skill to identify its structure; began analyzing how to organize the reference chapter into skill format; created an initial skill draft with a 6-step workflow structure matching the reference methodology.
  • Claude’s prompt to the user: Should I transform this into a complete Claude skill? I need to clarify: does your existing skill match the reference chapter’s methodology exactly, or are there discrepancies?

Step 3
  • User prompt: Conduct a thorough review comparing the skill to the reference document and identify any discrepancies.
  • Claude’s action: Performed a detailed line-by-line comparison of the skill against the reference document; identified 10 key discrepancies: (1) the discount factors pattern in the Results section, (2) the half-cycle correction explanation, (3) the trace validation logic, (4) the PSA sampling approach, and 6 others; created a COMPREHENSIVE_DISCREPANCY_REVIEW.md document listing all 10 with explanations.
  • Claude’s prompt to the user: Most importantly, I found something in the deterministic model that looks like a bug: rep(c(0:(n_cycles/2-1)), each=2) for discount exponents. Is this intentional, or should it be 0:(n_cycles-1)?

Step 4
  • User prompt: Investigate the discount factor pattern. It appears in line 191 of the reference but NOT in line 353 (PSA section). Is this a mistake or intentional?
  • Claude’s action: Conducted a deep investigation into the half-cycle correction methodology; concluded it is NOT a bug but a deliberate implementation of standard HTA practice; created a detailed analysis showing: (1) what half-cycle correction is (reducing bias from the discrete-time assumption), (2) how the pattern works (repeating exponents to approximate mid-cycle timing), (3) why the reference uses it (standard NICE DSU guidance), and (4) why the PSA section doesn’t use it (a simplification for the probabilistic approach).
  • Claude’s prompt to the user: The pattern rep(c(0:(n_cycles/2-1)), each=2) is actually half-cycle correction. Should I update the skill to explain this clearly and implement it as the default approach?

Step 5
  • User prompt: Update the skill
  • Claude’s action: Completely revised the Results section of the skill to: (1) explain the half-cycle correction concept and rationale, (2) show both approaches (with and without correction), (3) implement half-cycle correction as the default, (4) provide detailed code examples with comments, (5) create validation checks, and (6) document why NICE DSU recommends this approach.
  • Claude’s prompt to the user: Skill updated. Should I also create a separate reference document (results-calculation.md) that goes deeper into half-cycle correction mechanics?

Step 6
  • User prompt: [Implicit: yes, create comprehensive documentation]
  • Claude’s action: Created detailed reference documents: (1) results-calculation with a complete half-cycle correction explanation, (2) trace-validation with validation checks and common issues, and (3) an updated comprehensive discrepancy review showing all 10 resolutions.
  • Claude’s prompt to the user: All 10 discrepancies have been resolved. The skill now matches the reference document’s methodology. Should I continue developing the remaining sections, or would you like to shift focus?

In the above conversation, you see an exchange between me and Claude. I ask Claude to read a document (the chapter from the R for HTA textbook) and then build a Skill following the methods in the text. Claude gives that a try and tests it out. There is some back and forth as I work with Claude to identify the cause of a misalignment (which turns out to be the half-cycle correction). With that solved, the updated Skill is created.
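To make the debated pattern concrete, here is a quick way to inspect the two discount-exponent vectors side by side. The cycle count and discount rate are arbitrary, and this is purely an illustration of what the two expressions compute:

```r
# Illustrative comparison of the two discount-exponent patterns discussed
# above, for a small even number of cycles. Values are arbitrary.
n_cycles <- 6

standard_exponents <- 0:(n_cycles - 1)                        # 0 1 2 3 4 5
reference_pattern  <- rep(c(0:(n_cycles / 2 - 1)), each = 2)  # 0 0 1 1 2 2

# The reference pattern applies each exponent to two consecutive cycles.
# With a 3.5% discount rate, the resulting discount factors are:
r <- 0.035
v_disc_standard <- 1 / (1 + r)^standard_exponents
v_disc_pattern  <- 1 / (1 + r)^reference_pattern
```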

I am not going to include the full details about the first version of the Skill I tried to develop. Suffice it to say it followed the same sort of process described above, but took way longer. Claude had to be guided over every bump in the road. I needed to feed it lots of example code. It tried things that it insisted should have worked but that ultimately didn’t. It was sort of like mentoring a very skilled research assistant who was also a bit of a space case. The details are therefore not particularly interesting.

You will also see references to “discrepancies” in the chat summary. Some recommended steps in the R for HTA textbook (validation checks, something to do with the half-cycle correction, something to do with how discounting is applied) were different from what was in the Skill. In other words, Claude had read the text chapter and made its own ‘choice’ about how to implement a few things. I went through that list and chose which things to align and which were unimportant. They were generally all minor, and most simply resulted in adding some quality checks to the code.

Results

The Replicator Skill is a directory comprising the following elements:

SKILL.md (8.4 KB)

The core skill file with:

  • 6-step workflow overview (Extract → Parameterize → Structure → Trace → Results → Probabilistic)
  • Quick start code examples for each step
  • Half-cycle correction explanation
  • Validation checkpoints
  • Common pitfalls and best practices
  • File structure and timing guidance

Reference Files (6 files, 27 KB total)

  1. extraction-checklist.md (3.7 KB)

    • Checklist for identifying all model components
    • Parameter extraction templates
    • Documentation completeness verification
  2. parameterization-guide.md (7.7 KB)

    • External parameter organization
    • Rate-to-probability conversions
    • Hazard ratio application formulas
    • Probability function structure
    • Payoff function examples
    • PSA distribution guidance
  3. structure-validation.md (3.6 KB)

    • Transition matrix creation
    • 6 comprehensive validation checks
    • Time-dependent matrices
    • Troubleshooting guide
  4. trace-validation.md (3.2 KB)

    • Markov trace generation
    • Validation checks and interpretation
    • Common issues and fixes
  5. results-calculation.md (5.1 KB)

    • Half-cycle correction implementation (detailed)
    • Discounting formulas
    • Incremental analysis
    • ICER and net health benefit calculations
    • Sensitivity analysis examples
  6. psa-guide.md (6 KB)

    • PSA setup and sample sizing
    • Distribution types and formulas
    • Correlated parameter sampling
    • Visualizations (cost-effectiveness plane, CEAC)
    • Result interpretation
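The rate-to-probability conversions and hazard ratio application mentioned in the parameterization guide follow standard identities. A minimal sketch with illustrative values (not the guide’s actual code):

```r
# Standard rate <-> probability conversions used in Markov modeling, with a
# hazard ratio applied on the rate scale. Values are illustrative.
p_annual <- 0.15   # annual probability of an event
t_cycle  <- 0.25   # cycle length in years (quarterly cycles)

rate_annual <- -log(1 - p_annual)           # probability -> rate
p_cycle <- 1 - exp(-rate_annual * t_cycle)  # rate -> per-cycle probability

# Applying a hazard ratio (e.g., a treatment effect) on the rate scale:
hr <- 0.7
p_cycle_tx <- 1 - exp(-rate_annual * hr * t_cycle)
```

Converting on the rate scale rather than rescaling probabilities directly avoids a common modeling error, since probabilities are not additive across time.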

Script (1.7 KB)

validate_markov_model.R

  • Basic validation checks for model correctness
  • Check transition matrix row sums
  • Verify trace conservation
  • Monitor absorbing states
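A hypothetical sketch of what checks of this kind look like (this is not the actual validate_markov_model.R, and the function and argument names are illustrative):

```r
# Hypothetical sketch of basic Markov model validation checks; not the
# actual validate_markov_model.R script.
check_markov_model <- function(m_P, m_trace, absorbing = "Dead", tol = 1e-8) {
  # 1. Transition matrix rows must sum to 1
  stopifnot(all(abs(rowSums(m_P) - 1) < tol))
  # 2. Trace conservation: cohort membership sums to 1 in every cycle
  stopifnot(all(abs(rowSums(m_trace) - 1) < tol))
  # 3. Absorbing state membership must never decrease
  stopifnot(all(diff(m_trace[, absorbing]) >= -tol))
  invisible(TRUE)
}
```

A check like this can be run after the trace is generated in step 4, before any results are calculated.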

Skill Structure

markov-models-skill.skill (zip file)
├── markov-models-skill/
│   ├── SKILL.md (required)
│   ├── references/ (6 files)
│   │   ├── extraction-checklist.md
│   │   ├── parameterization-guide.md
│   │   ├── structure-validation.md
│   │   ├── trace-validation.md
│   │   ├── results-calculation.md
│   │   └── psa-guide.md
│   └── scripts/ (1 file)
│       └── validate_markov_model.R

In another post, I’m going to describe a case study where I use the Replicator Skill to build a functioning version of a Markov model described in a public-facing document. I’ll follow that with a longer discussion of the work and its implications. For now, I will close this post by saying that this process really highlighted for me that AI technologies are tools that still very much need human guidance. With Skills, you can automate some of that guidance, to great effect.