The Digital Benefits Network and Massive Data Institute Experiment with AI to Translate Public Benefits Policies into Software Code
Georgetown University’s Digital Benefits Network (DBN), housed at the Beeck Center for Social Impact + Innovation, and the Massive Data Institute (MDI) are among the first teams to research extensively how a Rules as Code framework can be applied to the U.S. public benefits system. We draw on international research and examples, and we have identified numerous U.S.-based projects that could inform a national strategy, along with a shared syntax and data standard.
On March 24, 2025, we released a new report, AI-Powered Rules as Code: Experiments with Public Benefits Policy, which documents four experiments exploring whether artificial intelligence (AI) can be used to expedite the translation of policies into software code for implementation in public benefits eligibility and enrollment systems under a Rules as Code approach. We also offer a summary + key takeaways version.
To increase access for eligible individuals and ensure system accuracy, it is imperative to quickly build a better understanding of how generative AI technologies, in particular commercially available large language models (LLMs), can or cannot aid in the translation and standardization of eligibility rules into software code for these vital public benefits programs. We see a critical opportunity to create open, foundational resources for benefits eligibility logic that can inform new AI technologies and ensure that they correctly interpret benefits policies.
We conducted initial experiments from June to September 2024 during the Policy2Code Prototyping Challenge, which the DBN and the MDI hosted as part of the Rules as Code Community of Practice. Twelve teams from the U.S. and Canada participated in the resulting Policy2Code Demo Day at BenCon 2024. We completed the remaining experiments and analysis between October 2024 and February 2025. Our research team includes students, faculty, and staff from the DBN, MDI, and the Center for Security and Emerging Technology (CSET).

Our team experimented with multiple LLMs and methodologies to determine how well the LLMs could translate Supplemental Nutrition Assistance Program (SNAP) and Medicaid policies across seven different states. We found that LLMs are capable of supporting the process of generating code from policy, but they still require external knowledge and human oversight within an iterative process for any policy containing complex logic.
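To make the translation task concrete, here is a minimal, hand-written sketch of what a single eligibility rule can look like once encoded as software. It is not drawn from the report: the function name and the income limits are hypothetical placeholders, and real SNAP eligibility logic involves many more factors, deductions, and state-specific variations.

```python
# Illustrative sketch only: what a "policy rule as code" might look like.
# The limits below are hypothetical placeholders, not actual SNAP figures,
# and this example is not taken from the report.

HYPOTHETICAL_GROSS_INCOME_LIMITS = {
    1: 1_500,  # placeholder monthly limit for a 1-person household
    2: 2_000,  # placeholder monthly limit for a 2-person household
    3: 2_500,  # placeholder monthly limit for a 3-person household
}

def passes_gross_income_test(household_size: int, monthly_gross_income: float) -> bool:
    """Return True if the household's gross income is at or below the
    (hypothetical) limit for its size."""
    limit = HYPOTHETICAL_GROSS_INCOME_LIMITS.get(household_size)
    if limit is None:
        raise ValueError(f"No limit defined for household size {household_size}")
    return monthly_gross_income <= limit
```

A Rules as Code approach aims to express many such rules in a shared, testable form; our experiments asked how much of that encoding work LLMs can reliably take on.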
Our experiments addressed the following questions:
Experiment 1: How well can LLM chatbots answer general SNAP and Medicaid eligibility questions based on their training data and/or resources available on the internet? What factors affect their responses?

Experiment 2: How well can an LLM generate accurate, complete, and logical summaries of benefits policy rules when provided with official policy documents?


Experiment 4: How effectively can an LLM generate software code to determine eligibility for public benefits programs?

In the report and summary, you will learn more about key takeaways, including:
- LLMs can help support the Rules as Code pipeline. LLMs can extract programmable rules from policy by leveraging expert knowledge retrieved from policy documents and employing well-crafted templates (see the prompt sketch after this list).
- LLMs achieve better policy-to-code conversion when prompts are detailed and the policy logic is simple.
- State governments can make it easier for LLMs to use their policies by making them digitally accessible.
- Humans must be in the loop to review outputs from LLMs. Accuracy and equity considerations must outweigh efficiency in high-stakes benefits systems.
- Current web-based chatbots produce mixed results, often presenting incorrect information in a confident tone.
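To illustrate the "well-crafted templates" and detailed-prompt takeaways above, the sketch below shows one way a structured policy-to-code prompt could be framed. This is not the template used in our experiments; the wording, the `{policy_excerpt}` placeholder, and the requested `is_eligible` function name are assumptions for demonstration only.

```python
# Illustrative sketch of a structured prompt template for policy-to-code
# conversion. This is not the template used in the report; the wording and
# placeholders are assumptions for demonstration.

POLICY_TO_CODE_TEMPLATE = """You are translating public benefits policy into code.

Policy excerpt (from an official source document):
---
{policy_excerpt}
---

Task:
1. List every eligibility condition stated in the excerpt, citing the sentence it came from.
2. Flag any condition that is ambiguous or depends on information not in the excerpt.
3. Write a single Python function `is_eligible(applicant: dict) -> bool` that implements
   only the conditions you listed, with one comment per condition.

Do not invent thresholds or rules that are not in the excerpt."""

def build_prompt(policy_excerpt: str) -> str:
    """Fill the template with a policy excerpt before sending it to a chat-based LLM."""
    return POLICY_TO_CODE_TEMPLATE.format(policy_excerpt=policy_excerpt)
```

Prompts like this pair the model with authoritative source text and a constrained output format; a human reviewer still needs to check the listed conditions and the generated function against the policy itself.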

Ongoing Open Experimentation
We encourage ongoing, open experimentation to test the application of LLMs in public benefits eligibility and the development of Rules as Code. In the report, we offer ideas on how to repeat and build upon our experiments.
The report includes methodology details, results in tables and visualizations, and findings, including considerations for public benefits use cases. Each experiment includes openly accessible materials in a linked appendix. We share the rubrics we developed to evaluate the accuracy, completeness, and relevance of the responses, along with our prompts, responses, and scores.
Get in Touch
We’re eager to hear from you. This work, like any public-facing project, improves as more stakeholders offer their input. We welcome your thoughts, questions, or potential collaboration ideas. Email us at rulesascode@georgetown.edu.