SDTM Automation Agent v2.0 — AI-Powered SDTM Dataset + SAS Program Generation
The SDTM Automation Agent is an AI-powered pipeline that takes raw clinical data and SDTM mapping specifications, then automatically generates CDISC-compliant SDTM datasets and the SAS programs to reproduce them.
The SDTM Automation Agent uses a multi-agent architecture. Each agent handles one stage:
| # | Agent | Input | Output |
|---|---|---|---|
| 1 | Spec Agent | SDTMIG specs (JSON/Excel) | 100+ structured mapping rules |
| 2 | Data Agent | Raw clinical CSV/XPT/SAS7BDAT | Parsed, validated raw datasets |
| 3 | Build Agent | Raw data + spec rules | CDISC-compliant SDTM datasets |
| 4 | Code Agent | SDTM variable mappings | SAS programs (DATA step, SORT, LABEL) |
| 5 | QC Agent | SDTM datasets + spec rules | Validation findings and issues |
| 6 | Report Agent | All outputs | Audit trail markdown report |
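The staged design in the table can be pictured as a chain of functions that each read and extend a shared context. The following is a minimal sketch under stated assumptions, not the project's orchestrator: `run_pipeline`, the context keys, and all six stub agents are hypothetical stand-ins that only show how outputs flow from one stage to the next and how an audit trail accumulates.

```python
from typing import Any, Callable, Dict, List, Tuple

Context = Dict[str, Any]
Stage = Callable[[Context], Context]


def run_pipeline(stages: List[Tuple[str, Stage]], context: Context) -> Context:
    """Run each (name, stage) pair in order; every stage reads and extends the shared context."""
    context.setdefault("audit", [])
    for name, stage in stages:
        context = stage(context)
        context["audit"].append(name)  # record completed stages for the audit trail
    return context


# Hypothetical stub agents; real agents would parse specs, read files, generate SAS, etc.
def spec_agent(ctx: Context) -> Context:
    ctx["rules"] = [{"variable": "STUDYID", "method": "hardcoded", "value": "CDISCPILOT01"}]
    return ctx


def data_agent(ctx: Context) -> Context:
    ctx["raw"] = [{"SUBJID": "001"}, {"SUBJID": "002"}]
    return ctx


def build_agent(ctx: Context) -> Context:
    constants = {rule["variable"]: rule["value"] for rule in ctx["rules"]}
    ctx["sdtm"] = [{**constants, "SUBJID": row["SUBJID"]} for row in ctx["raw"]]
    return ctx


def code_agent(ctx: Context) -> Context:
    ctx["sas_code"] = "data dm; set raw_dm; run;"
    return ctx


def qc_agent(ctx: Context) -> Context:
    expected = [rule["variable"] for rule in ctx["rules"]]
    ctx["findings"] = [
        f"record {i}: {var} is missing"
        for i, row in enumerate(ctx["sdtm"])
        for var in expected
        if not row.get(var)
    ]
    return ctx


def report_agent(ctx: Context) -> Context:
    # Runs last, so the audit list holds every earlier stage at this point.
    ctx["report"] = "# Audit trail\n" + "\n".join(f"- {name}" for name in ctx["audit"])
    return ctx


result = run_pipeline(
    [("spec", spec_agent), ("data", data_agent), ("build", build_agent),
     ("code", code_agent), ("qc", qc_agent), ("report", report_agent)],
    {},
)
print(result["audit"])
```

Passing one mutable context keeps each stub independent of the others' internals; a production design might instead pass typed, immutable results between stages.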
Each SDTM variable is derived using one of four methods:
| Type | Description | Example |
|---|---|---|
| direct | Direct copy from raw source variable | AGE = dm_raw.AGE |
| decode | Value-level mapping with lookup table | ARMCD: TRT→"TRT", PBO→"PBO" |
| hardcoded | Static value assigned to variable | STUDYID = "CDISCPILOT01" |
| computed | Derived via calculation or logic | AESTDY = AESTDTC - RFSTDTC + 1 if AESTDTC >= RFSTDTC, else AESTDTC - RFSTDTC (no day 0) |
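The four derivation types can be expressed as a small rule interpreter. This is a sketch only: the rule dictionary shape (`method`, `source`, `codelist`, `value`, `func`) and the `apply_rule` helper are hypothetical, not the agent's actual spec format. The USUBJID rule follows the `CDISCPILOT01-001` pattern seen in the sample AE data.

```python
from typing import Any, Dict

Record = Dict[str, Any]


def apply_rule(rule: Dict[str, Any], raw: Record) -> Any:
    """Derive one SDTM variable from a raw record using a hypothetical rule format."""
    method = rule["method"]
    if method == "direct":
        # Copy the raw source variable unchanged.
        return raw[rule["source"]]
    if method == "decode":
        # Look the raw value up in a codelist; unmapped values become None so QC can flag them.
        return rule["codelist"].get(raw[rule["source"]])
    if method == "hardcoded":
        # Same static value for every record.
        return rule["value"]
    if method == "computed":
        # The rule carries a function of the raw record.
        return rule["func"](raw)
    raise ValueError(f"unknown derivation method: {method}")


rules = {
    "STUDYID": {"method": "hardcoded", "value": "CDISCPILOT01"},
    "AGE": {"method": "direct", "source": "AGE"},
    "ARMCD": {"method": "decode", "source": "ARMCD", "codelist": {"TRT": "TRT", "PBO": "PBO"}},
    "USUBJID": {"method": "computed", "func": lambda r: f"CDISCPILOT01-{r['SUBJID']}"},
}

raw_row = {"SUBJID": "001", "AGE": 45, "ARMCD": "TRT"}
dm_row = {var: apply_rule(rule, raw_row) for var, rule in rules.items()}
print(dm_row)
```

With this codelist, a raw `ARMCD` of `SCRNFAIL` (subject 010 in the sample DM data) decodes to `None`, which is the kind of gap a QC step would need to report or map explicitly.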
Want to test with your own clinical data? Use the /api/upload endpoint:
```bash
# Multipart form upload
curl -X POST https://sdtm-demo.clincoder.cloud/api/upload \
  -F "file=@your_raw_data.csv" \
  -F "domain=DM"

# Raw CSV body, with the target domain passed in a header
curl -X POST https://sdtm-demo.clincoder.cloud/api/upload \
  -H "Content-Type: text/csv" \
  -H "X-Domain: DM" \
  -d 'SUBJID,AGE,SEX,RACE,ARMCD,COUNTRY
001,45,M,WHITE,TRT,USA
002,62,F,ASIAN,PBO,CAN'
```
Sample raw DM data (10 subjects):

```csv
SUBJID,AGE,SEX,RACE,ARMCD,COUNTRY
001,45,M,WHITE,TRT,USA
002,62,F,ASIAN,PBO,CAN
003,38,F,BLACK OR AFRICAN AMERICAN,TRT,USA
004,55,M,WHITE,PBO,USA
005,49,M,OTHER,TRT,USA
006,28,F,ASIAN,PBO,CAN
007,71,M,WHITE,TRT,USA
008,33,F,BLACK OR AFRICAN AMERICAN,PBO,USA
009,44,M,WHITE,TRT,USA
010,56,F,ASIAN,SCRNFAIL,USA
```

Sample AE data (4 records, 2 subjects):

```csv
USUBJID,AETERM,AESEV,AESER,AEACN,AEREL,AESTDTC,AEENDTC
CDISCPILOT01-001,HEADACHE,MILD,N,DOSE NOT CHANGED,NOT RELATED,2025-03-15,2025-03-16
CDISCPILOT01-001,NAUSEA,MODERATE,N,DOSE NOT CHANGED,POSSIBLE,2025-03-20,2025-03-22
CDISCPILOT01-002,DIZZINESS,SEVERE,Y,DRUG WITHDRAWN,PROBABLE,2025-04-01,2025-04-05
CDISCPILOT01-002,FATIGUE,MILD,N,DOSE NOT CHANGED,NOT RELATED,2025-04-10,2025-04-12
```
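The computed study-day rule can be tried on a subset of the sample AE records. This sketch uses only the standard library; the `study_day` helper is illustrative, and the `RFSTDTC` values are invented for the example (in practice the reference start date comes from the DM domain, and the sample data above does not include it). Days on or after the reference date start at 1, days before it are negative, and there is no day 0.

```python
import csv
import io
from datetime import date

# Subset of the sample AE records (two rows, four columns).
AE_CSV = """USUBJID,AETERM,AESTDTC,AEENDTC
CDISCPILOT01-001,HEADACHE,2025-03-15,2025-03-16
CDISCPILOT01-002,DIZZINESS,2025-04-01,2025-04-05
"""

# Hypothetical reference start dates, for illustration only.
RFSTDTC = {"CDISCPILOT01-001": "2025-03-10", "CDISCPILOT01-002": "2025-04-03"}


def study_day(dtc: str, rfstdtc: str) -> int:
    """SDTM-style study day from ISO 8601 dates: +1 on/after the reference date, no day 0."""
    # Only the date part (first 10 characters) is used, so full datetimes also work.
    delta = (date.fromisoformat(dtc[:10]) - date.fromisoformat(rfstdtc[:10])).days
    return delta + 1 if delta >= 0 else delta


records = []
for rec in csv.DictReader(io.StringIO(AE_CSV)):
    ref = RFSTDTC[rec["USUBJID"]]
    rec["AESTDY"] = study_day(rec["AESTDTC"], ref)
    rec["AEENDY"] = study_day(rec["AEENDTC"], ref)
    records.append(rec)
    print(rec["USUBJID"], rec["AETERM"], rec["AESTDY"], rec["AEENDY"])
```

With these assumed reference dates, the headache record gets AESTDY 6 and AEENDY 7, while the dizziness record starts two days before the reference date (AESTDY -2) and ends on day 3. Partial dates such as `2025-03` would raise an error here; handling them needs explicit imputation rules.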
All endpoints return JSON unless noted.
Health check. Returns service status.
```json
{"status": "ok", "service": "sdtm-automation-agent", "version": "2.0.0", "domains": 8}
```
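Since the service runs on Python's `http.server`, a health check like the one above can be served with a few lines of standard library code. This is a hedged sketch, not the project's `http_server.py`: the `/api/health` path and the `Handler` class are assumptions, and only the JSON payload is taken from the response shown.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer

# Payload copied from the documented health-check response.
HEALTH = {"status": "ok", "service": "sdtm-automation-agent", "version": "2.0.0", "domains": 8}


class Handler(BaseHTTPRequestHandler):
    def do_GET(self) -> None:
        # "/api/health" is an assumed route; the real path may differ.
        if self.path == "/api/health":
            body = json.dumps(HEALTH).encode("utf-8")
            self.send_response(200)
            self.send_header("Content-Type", "application/json")
            self.send_header("Content-Length", str(len(body)))
            self.end_headers()
            self.wfile.write(body)
        else:
            self.send_error(404)

    def log_message(self, format, *args) -> None:
        pass  # silence per-request logging for the demo


def serve(port: int = 8933) -> None:
    """Block forever serving requests on the given port (8933 matches the deployment table)."""
    HTTPServer(("0.0.0.0", port), Handler).serve_forever()
```

Call `serve()` to run it, then query it with `curl http://localhost:8933/api/health`. Computing `Content-Length` from the encoded bytes keeps the header correct even if the payload contains non-ASCII text.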
Run full demo pipeline (5 subjects, 3 domains: DM, AE, LB). Returns plain text report.
Download a sample SDTM DM dataset as CSV.
Download a sample SAS program for AE domain.
SDTMIG 3.4-based specification JSON covering 8 domains and 100+ mapping rules.
List available SDTM domains with variable counts and structure.
```json
{
  "DM": {"label": "Demographics", "class": "Special Purpose", "variables": 16},
  "AE": {"label": "Adverse Events", "class": "Events", "variables": 14},
  "LB": {"label": "Lab Results", "class": "Findings", "variables": 16}
}
```
Upload raw CSV data. Returns SDTM dataset + SAS code.
Parameters:

- file (multipart): the CSV file
- domain (multipart): target SDTM domain (DM, AE, LB, etc.)

Response: JSON with sdtm_csv, sas_code, mapping_summary, validation.
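For scripted use, the same multipart upload can be built with the standard library instead of curl. This sketch only relies on what is documented here (the URL, the `file` and `domain` fields, and the four response keys); `build_upload_request`, `send_upload`, and `check_response` are hypothetical helpers, and nothing is assumed about the contents of each response field.

```python
import json
import urllib.request
import uuid

UPLOAD_URL = "https://sdtm-demo.clincoder.cloud/api/upload"
EXPECTED_KEYS = ("sdtm_csv", "sas_code", "mapping_summary", "validation")


def build_upload_request(csv_text: str, domain: str, filename: str = "raw_data.csv") -> urllib.request.Request:
    """Build (but do not send) a multipart/form-data POST carrying the file and domain fields."""
    boundary = uuid.uuid4().hex
    body = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="file"; filename="{filename}"\r\n'
        "Content-Type: text/csv\r\n\r\n"
        f"{csv_text}\r\n"
        f"--{boundary}\r\n"
        'Content-Disposition: form-data; name="domain"\r\n\r\n'
        f"{domain}\r\n"
        f"--{boundary}--\r\n"
    ).encode("utf-8")
    return urllib.request.Request(
        UPLOAD_URL,
        data=body,
        method="POST",
        headers={"Content-Type": f"multipart/form-data; boundary={boundary}"},
    )


def send_upload(req: urllib.request.Request) -> dict:
    """Send the request and parse the JSON response (needs network access)."""
    with urllib.request.urlopen(req, timeout=60) as resp:
        return json.load(resp)


def check_response(payload: dict) -> list:
    """List any documented response keys that are missing from a parsed response."""
    return [key for key in EXPECTED_KEYS if key not in payload]
```

A typical call is `check_response(send_upload(build_upload_request(open("raw_dm.csv").read(), "DM")))`, which returns an empty list when all four documented keys are present.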
Quick test data you can paste into the upload endpoint:
```bash
# Multipart upload, reading the CSV from a heredoc on stdin
curl -s -X POST https://sdtm-demo.clincoder.cloud/api/upload \
  -F "file=@-;filename=raw_dm.csv" \
  -F "domain=DM" << 'EOF'
SUBJID,AGE,SEX,RACE,ARMCD,COUNTRY
001,45,M,WHITE,TRT,USA
002,62,F,ASIAN,PBO,CAN
EOF

# Plain-text body with the domain on the first line
curl -s -X POST https://sdtm-demo.clincoder.cloud/api/upload \
  -H "Content-Type: text/plain" \
  -d 'domain: DM
SUBJID,AGE,SEX,RACE
001,35,M,WHITE
002,42,F,ASIAN'
```
| Component | Technology | Details |
|---|---|---|
| HTTP Server | Python http.server | Runs on port 8933, managed by systemd |
| SDTM Engine | pandas + numpy | Pure Python, no external DB needed |
| SAS Generator | Template engine | Generates SAS DATA step with LENGTH, LABEL, SORT |
| MCP Server | JSON-RPC over stdio | 7 tools for LLM orchestration |
| Reverse Proxy | nginx | HTTPS with Let's Encrypt |
| Monitoring | systemd | Auto-restart on failure |
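The template-engine approach to SAS generation can be illustrated with plain string formatting. This is a hypothetical sketch, not the project's generator: `generate_sas`, the `(name, type, length, label)` metadata tuple, and the `raw_dm` source name are assumptions, and derivation statements are omitted so the output shows only the LENGTH, LABEL, and SORT scaffolding.

```python
from typing import List, Tuple

# (name, type, length, label); type is "char" or "num". Hypothetical variable metadata.
Variable = Tuple[str, str, int, str]


def generate_sas(domain: str, source: str, variables: List[Variable], sort_keys: List[str]) -> str:
    """Render a SAS DATA step with LENGTH, LABEL, and KEEP statements, followed by PROC SORT."""
    # Character variables get a "$" length; numeric ones get a plain byte length.
    lengths = " ".join(
        f"{name} {'$' if vtype == 'char' else ''}{length}" for name, vtype, length, _ in variables
    )
    labels = "\n".join(f'    {name} = "{label}"' for name, _, _, label in variables)
    keep = " ".join(name for name, *_ in variables)
    ds = domain.lower()
    return (
        f"data {ds};\n"
        f"  length {lengths};\n"  # LENGTH before SET so these lengths take effect
        f"  set {source};\n"
        f"  label\n{labels};\n"
        f"  keep {keep};\n"
        "run;\n\n"
        f"proc sort data={ds};\n"
        f"  by {' '.join(sort_keys)};\n"
        "run;\n"
    )


print(generate_sas(
    "DM",
    "raw_dm",
    [("STUDYID", "char", 12, "Study Identifier"), ("AGE", "num", 8, "Age")],
    ["STUDYID"],
))
```

Placing the LENGTH statement before SET matters in SAS: lengths are fixed when a variable is first encountered, so declaring them afterwards would not override the source attributes.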
```ini
# /etc/systemd/system/sdtm-agent.service
[Unit]
Description=SDTM Automation Agent Demo
After=network.target

[Service]
ExecStart=/root/hermes-venv/bin/python3 /root/sdtm-automation-agent/http_server.py
Restart=always
WorkingDirectory=/root/sdtm-automation-agent

[Install]
WantedBy=multi-user.target
```
CSV is the primary input format. The agent also supports XPT (SAS Transport) and SAS7BDAT via pyreadstat.
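Format support like this is usually a dispatch on file extension. A minimal sketch, assuming a hypothetical `load_raw` helper that returns a list of row dicts: CSV goes through the standard `csv` module, while XPT and SAS7BDAT use `pyreadstat.read_xport` and `pyreadstat.read_sas7bdat`, which return a `(DataFrame, metadata)` pair. pyreadstat is imported lazily so CSV-only use needs no extra dependency.

```python
import csv
from pathlib import Path
from typing import Dict, List


def load_raw(path: str) -> List[Dict[str, object]]:
    """Load a raw dataset as a list of row dicts, choosing the reader by file extension."""
    suffix = Path(path).suffix.lower()
    if suffix == ".csv":
        # CSV values come back as strings; type conversion is left to later stages.
        with open(path, newline="", encoding="utf-8") as fh:
            return list(csv.DictReader(fh))
    if suffix in (".xpt", ".sas7bdat"):
        import pyreadstat  # lazy import: only needed for SAS formats

        reader = pyreadstat.read_xport if suffix == ".xpt" else pyreadstat.read_sas7bdat
        df, _meta = reader(path)  # _meta holds column labels and formats
        return df.to_dict(orient="records")
    raise ValueError(f"unsupported raw data format: {suffix}")
```

Returning plain dicts for every format keeps downstream code independent of the reader; the trade-off is that CSV values stay as strings while SAS files return typed values.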
8 domains: DM (Demographics), AE (Adverse Events), LB (Lab Tests), CM (Concomitant Meds), VS (Vital Signs), EX (Exposure), DS (Disposition), MH (Medical History).
The spec parser supports custom JSON specs. Upload a spec JSON file and the agent will build your custom domain.
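Before building a custom domain, it helps to check that the uploaded spec is internally consistent. The schema below is entirely hypothetical (a `domain` code plus a `variables` list whose entries name one of the four derivation methods), since the agent's actual spec format is not documented here, and the two-uppercase-letter domain check is a simplification.

```python
import json
from typing import List

# Fields each derivation method needs in this hypothetical spec format.
REQUIRED = {
    "direct": {"source"},
    "decode": {"source", "codelist"},
    "hardcoded": {"value"},
    "computed": {"expression"},
}


def validate_spec(text: str) -> List[str]:
    """Return human-readable problems found in a custom domain spec (empty list if none)."""
    spec = json.loads(text)
    problems = []
    domain = spec.get("domain", "")
    if not (len(domain) == 2 and domain.isalpha() and domain.isupper()):
        problems.append(f"domain code {domain!r} should be two uppercase letters")
    for i, var in enumerate(spec.get("variables", [])):
        method = var.get("method")
        if method not in REQUIRED:
            problems.append(f"variables[{i}]: unknown method {method!r}")
            continue
        missing = REQUIRED[method] - set(var)
        if missing:
            problems.append(f"variables[{i}] ({var.get('name')}): missing {sorted(missing)}")
    return problems


spec_text = json.dumps({
    "domain": "XZ",
    "variables": [{"name": "STUDYID", "method": "hardcoded", "value": "CDISCPILOT01"}],
})
print(validate_spec(spec_text))
```

Collecting every problem instead of stopping at the first one lets a spec author fix all issues in a single pass.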
The output follows SDTMIG 3.4 conventions but is intended as a starting point. Final submission requires Pinnacle 21 validation and human review.
Pinnacle 21 validates existing SDTM datasets. This agent creates SDTM datasets from raw data — different tool for a different problem.
Yes. The MCP server exposes 7 tools that any MCP-compatible client, and the LLM behind it, can call. Configure it in Claude Desktop's claude_desktop_config.json.
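Over the stdio transport, MCP messages are JSON-RPC 2.0 objects written one per line, and Claude Desktop registers servers under an `mcpServers` key with a `command` and `args`. The sketch below shows both shapes. The server name `sdtm-agent`, the script path, and the tool name `generate_sdtm` are hypothetical (`tools/list` returns the real tool names), and a real client must complete the `initialize` handshake before sending these requests.

```python
import json
from typing import Any, Dict


def mcp_message(msg_id: int, method: str, params: Dict[str, Any]) -> str:
    """Serialize one JSON-RPC 2.0 request as a single newline-terminated line."""
    # json.dumps without indent emits no raw newlines, which stdio framing requires.
    return json.dumps({"jsonrpc": "2.0", "id": msg_id, "method": method, "params": params}) + "\n"


list_tools = mcp_message(1, "tools/list", {})
call_tool = mcp_message(2, "tools/call", {"name": "generate_sdtm", "arguments": {"domain": "DM"}})

# Shape of a claude_desktop_config.json entry; name and path are placeholders.
config = {
    "mcpServers": {
        "sdtm-agent": {
            "command": "python3",
            "args": ["/path/to/mcp_server.py"],
        }
    }
}

print(list_tools, call_tool, sep="")
print(json.dumps(config, indent=2))
```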
This is a portfolio project for the AI Agent Developer — SDTM Automation role at Redbock/NES Fircroft ($80-105/hr, Remote US). It demonstrates the exact skill set requested: AI agent development + SDTM/CDISC expertise + Python + SAS code generation + MCP.