Help & Documentation

SDTM Automation Agent v2.0 — AI-Powered SDTM Dataset + SAS Program Generation

Overview

The SDTM Automation Agent is an AI-powered pipeline that takes raw clinical data and SDTM mapping specifications, then automatically generates CDISC-compliant SDTM datasets and the SAS programs to reproduce them.

Target Role: AI Agent Developer — SDTM Automation (Redbock/NES Fircroft)
Rate: $80-105/hr • Location: Remote US • Contract: 12 months

Key Capabilities

How It Works: 6-Agent Pipeline

The SDTM Automation Agent uses a multi-agent architecture. Each agent handles one stage:

| # | Agent | Input | Output |
|---|-------|-------|--------|
| 1 | Spec Agent | SDTMIG specs (JSON/Excel) | 100+ structured mapping rules |
| 2 | Data Agent | Raw clinical CSV/XPT/SAS7BDAT | Parsed, validated raw datasets |
| 3 | Build Agent | Raw data + spec rules | CDISC-compliant SDTM datasets |
| 4 | Code Agent | SDTM variable mappings | SAS programs (DATA step, SORT, LABEL) |
| 5 | QC Agent | SDTM vs spec comparison | Validation findings + issues |
| 6 | Report Agent | All outputs | Audit trail markdown report |
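
The stage ordering in the table can be sketched as a chain of functions that pass a shared context along. This is a minimal illustration under stated assumptions, not the project's actual code: every function name, the `ctx` dictionary, and the rule shape are hypothetical, and each stage is reduced to a stub.

```python
# Hypothetical six-stage pipeline; names and data shapes are assumptions.

def spec_agent(ctx: dict) -> dict:
    # Assumed rule shape: target variable -> (mapping type, argument)
    ctx["rules"] = {"STUDYID": ("hardcoded", "CDISCPILOT01"), "AGE": ("direct", "AGE")}
    return ctx

def data_agent(ctx: dict) -> dict:
    # Stand-in for parsing and validating a raw CSV/XPT/SAS7BDAT file
    ctx["raw"] = [{"SUBJID": "001", "AGE": "45"}]
    return ctx

def build_agent(ctx: dict) -> dict:
    # Apply each rule to each raw row to build SDTM records
    ctx["sdtm"] = [
        {var: (arg if kind == "hardcoded" else row[arg])
         for var, (kind, arg) in ctx["rules"].items()}
        for row in ctx["raw"]
    ]
    return ctx

def code_agent(ctx: dict) -> dict:
    # Emit a very small SAS DATA step that mirrors the hardcoded rules
    lines = ["data dm;", "  set dm_raw;"]
    for var, (kind, arg) in ctx["rules"].items():
        if kind == "hardcoded":
            lines.append(f'  {var} = "{arg}";')
    lines.append("run;")
    ctx["sas"] = "\n".join(lines)
    return ctx

def qc_agent(ctx: dict) -> dict:
    # Flag any spec variable that is missing from a built record
    ctx["findings"] = [
        f"record {i}: missing {var}"
        for i, rec in enumerate(ctx["sdtm"], start=1)
        for var in ctx["rules"] if var not in rec
    ]
    return ctx

def report_agent(ctx: dict) -> dict:
    ctx["report"] = f"{len(ctx['sdtm'])} records, {len(ctx['findings'])} findings"
    return ctx

PIPELINE = [spec_agent, data_agent, build_agent, code_agent, qc_agent, report_agent]

def run_pipeline() -> dict:
    ctx: dict = {}
    for stage in PIPELINE:
        ctx = stage(ctx)
    return ctx

result = run_pipeline()
print(result["report"])
```

Keeping each stage as a function over one context object makes it easy to run a single stage in isolation or to insert a new one between two existing stages.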

Mapping Types

Each SDTM variable is derived using one of four methods:

| Type | Description | Example |
|------|-------------|---------|
| direct | Direct copy from raw source variable | AGE = dm_raw.AGE |
| decode | Value-level mapping with lookup table | ARMCD: TRT→"TRT", PBO→"PBO" |
| hardcoded | Static value assigned to variable | STUDYID = "CDISCPILOT01" |
| computed | Derived via calculation or logic | AESTDY = AESTDTC - RFSTDTC + 1 (for dates on or after RFSTDTC; there is no study day 0) |
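
The four mapping types can be illustrated with a small dispatcher. This is a sketch only: the rule dictionary layout, the `apply_rule` and `study_day` names, and the reference date used in the example are assumptions made for illustration. The computed example follows the usual SDTM study-day convention, in which there is no day 0 and dates before the reference date get negative values.

```python
from datetime import date

def study_day(event_dtc: str, ref_dtc: str) -> int:
    """Study day relative to a reference date (complete ISO 8601 dates only)."""
    delta = (date.fromisoformat(event_dtc) - date.fromisoformat(ref_dtc)).days
    # On or after the reference date: add 1. Before it: no adjustment.
    return delta + 1 if delta >= 0 else delta

def apply_rule(rule: dict, raw: dict):
    """Derive one SDTM value from a raw record using a hypothetical rule dict."""
    kind = rule["type"]
    if kind == "direct":
        return raw[rule["source"]]
    if kind == "decode":
        return rule["lookup"].get(raw[rule["source"]])
    if kind == "hardcoded":
        return rule["value"]
    if kind == "computed":
        return rule["func"](raw)
    raise ValueError(f"unknown mapping type: {kind}")

RULES = {
    "AGE": {"type": "direct", "source": "AGE"},
    "ARMCD": {"type": "decode", "source": "ARMCD",
              "lookup": {"TRT": "TRT", "PBO": "PBO"}},
    "STUDYID": {"type": "hardcoded", "value": "CDISCPILOT01"},
    "AESTDY": {"type": "computed",
               "func": lambda r: study_day(r["AESTDTC"], r["RFSTDTC"])},
}

# Illustrative raw record; the RFSTDTC value is made up for this example.
raw = {"AGE": "45", "ARMCD": "TRT", "AESTDTC": "2025-03-15", "RFSTDTC": "2025-03-10"}
record = {var: apply_rule(rule, raw) for var, rule in RULES.items()}
print(record)
```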

Upload Your Own Data

Want to test with your own clinical data? Use the /api/upload endpoint:

Option 1: Via cURL

curl -X POST https://sdtm-demo.clincoder.cloud/api/upload \
  -F "file=@your_raw_data.csv" \
  -F "domain=DM"

Option 2: Raw CSV Body

curl -X POST https://sdtm-demo.clincoder.cloud/api/upload \
  -H "Content-Type: text/csv" \
  -H "X-Domain: DM" \
  -d 'SUBJID,AGE,SEX,RACE,ARMCD,COUNTRY
001,45,M,WHITE,TRT,USA
002,62,F,ASIAN,PBO,CAN'
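
The raw-body variant can also be called from Python using only the standard library. This is a minimal sketch that mirrors the cURL call above; the helper names `build_upload_request` and `upload_csv` are hypothetical, and the assumption that the response parses as JSON comes from the API Reference section.

```python
import json
import urllib.request

UPLOAD_URL = "https://sdtm-demo.clincoder.cloud/api/upload"

def build_upload_request(csv_text: str, domain: str) -> urllib.request.Request:
    """Build a POST request with the CSV as the body and the domain in X-Domain."""
    return urllib.request.Request(
        UPLOAD_URL,
        data=csv_text.encode("utf-8"),
        headers={"Content-Type": "text/csv", "X-Domain": domain},
        method="POST",
    )

def upload_csv(csv_text: str, domain: str, timeout: float = 30.0) -> dict:
    """Send the request and decode the JSON response (performs a network call)."""
    request = build_upload_request(csv_text, domain)
    with urllib.request.urlopen(request, timeout=timeout) as resp:
        return json.load(resp)

# Usage (not executed here because it contacts the demo server):
# result = upload_csv("SUBJID,AGE,SEX\n001,45,M\n", "DM")
```

Splitting request construction from sending keeps the first half testable without network access.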

Sample CSV: DM Domain

SUBJID,AGE,SEX,RACE,ARMCD,COUNTRY
001,45,M,WHITE,TRT,USA
002,62,F,ASIAN,PBO,CAN
003,38,F,BLACK OR AFRICAN AMERICAN,TRT,USA
004,55,M,WHITE,PBO,USA
005,49,M,OTHER,TRT,USA
006,28,F,ASIAN,PBO,CAN
007,71,M,WHITE,TRT,USA
008,33,F,BLACK OR AFRICAN AMERICAN,PBO,USA
009,44,M,WHITE,TRT,USA
010,56,F,ASIAN,SCRNFAIL,USA

Sample CSV: AE Domain

USUBJID,AETERM,AESEV,AESER,AEACN,AEREL,AESTDTC,AEENDTC
CDISCPILOT01-001,HEADACHE,MILD,N,DOSE NOT CHANGED,NOT RELATED,2025-03-15,2025-03-16
CDISCPILOT01-001,NAUSEA,MODERATE,N,DOSE NOT CHANGED,POSSIBLE,2025-03-20,2025-03-22
CDISCPILOT01-002,DIZZINESS,SEVERE,Y,DRUG WITHDRAWN,PROBABLE,2025-04-01,2025-04-05
CDISCPILOT01-002,FATIGUE,MILD,N,DOSE NOT CHANGED,NOT RELATED,2025-04-10,2025-04-12
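
Before uploading AE data like the sample above, a few basic checks can catch obvious problems. The sketch below is an illustration of the kind of check a QC step might run, not the QC Agent's actual logic: the `check_ae` name and the finding messages are hypothetical, the allowed values are limited to those appearing in the sample plus the standard severity and Y/N terms, and only complete `YYYY-MM-DD` dates are handled (SDTM also permits partial dates).

```python
import csv
import io
from datetime import date

AE_CSV = """USUBJID,AETERM,AESEV,AESER,AEACN,AEREL,AESTDTC,AEENDTC
CDISCPILOT01-001,HEADACHE,MILD,N,DOSE NOT CHANGED,NOT RELATED,2025-03-15,2025-03-16
CDISCPILOT01-002,DIZZINESS,SEVERE,Y,DRUG WITHDRAWN,PROBABLE,2025-04-01,2025-04-05
"""

# Assumed value lists for this sketch
ALLOWED = {
    "AESEV": {"MILD", "MODERATE", "SEVERE"},
    "AESER": {"Y", "N"},
}

def check_ae(csv_text: str) -> list:
    """Return a list of human-readable findings for an AE CSV."""
    findings = []
    for n, row in enumerate(csv.DictReader(io.StringIO(csv_text)), start=1):
        for var, allowed in ALLOWED.items():
            if row[var] not in allowed:
                findings.append(f"row {n}: {var}={row[var]!r} is not an allowed value")
        try:
            start = date.fromisoformat(row["AESTDTC"])
            end = date.fromisoformat(row["AEENDTC"])
        except ValueError:
            findings.append(f"row {n}: start or end date is not a complete ISO 8601 date")
            continue
        if end < start:
            findings.append(f"row {n}: AEENDTC is before AESTDTC")
    return findings

print(check_ae(AE_CSV))
```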

API Reference

All endpoints return JSON unless noted.

GET /health

Health check. Returns service status.

{"status": "ok", "service": "sdtm-automation-agent", "version": "2.0.0", "domains": 8}
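
For readers curious how such an endpoint could be served with Python's built-in `http.server`, here is a minimal sketch. It is not the project's `http_server.py`: the `Handler` class, the `make_server` helper, and the 404 payload are assumptions, and only the `/health` route is implemented.

```python
import json
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer

HEALTH = {"status": "ok", "service": "sdtm-automation-agent",
          "version": "2.0.0", "domains": 8}

class Handler(BaseHTTPRequestHandler):
    def do_GET(self):
        if self.path == "/health":
            self._send_json(200, HEALTH)
        else:
            # Assumed error shape for unknown routes
            self._send_json(404, {"error": "not found"})

    def _send_json(self, status: int, payload: dict) -> None:
        body = json.dumps(payload).encode("utf-8")
        self.send_response(status)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # keep this sketch quiet; the default logs each request to stderr

def make_server(port: int = 8933) -> ThreadingHTTPServer:
    """Create a server bound to localhost; call serve_forever() to run it."""
    return ThreadingHTTPServer(("127.0.0.1", port), Handler)

# To run: make_server().serve_forever()
```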

GET /api/demo

Runs the full demo pipeline (5 subjects, 3 domains: DM, AE, LB). Returns a plain-text report rather than JSON.

GET /api/dm.csv

Download a sample SDTM DM dataset as CSV.

GET /api/ae.sas

Download a sample SAS program for AE domain.

GET /api/specs

Returns the full SDTMIG 3.4 mapping specification as JSON: 8 domains, 100+ mapping rules.

GET /api/domains

List available SDTM domains with variable counts and structure.

{"DM": {"label": "Demographics", "class": "Special Purpose", "variables": 16},
 "AE": {"label": "Adverse Events", "class": "Events", "variables": 14},
 "LB": {"label": "Lab Results", "class": "Findings", "variables": 16}}

POST /api/upload

Upload raw CSV data. Returns SDTM dataset + SAS code.

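Parameters: file (the raw CSV, sent as a multipart form field) and domain (the target SDTM domain, e.g. DM). When the CSV is sent as the raw request body instead, pass the domain in the X-Domain header.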

Response: JSON with sdtm_csv, sas_code, mapping_summary, validation
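
Once the response has been decoded, the generated dataset and program can be written to disk. The sketch below assumes that `sdtm_csv` and `sas_code` are plain strings; the source only names the keys, so that typing, the `save_outputs` helper, and the output file names are all assumptions.

```python
from pathlib import Path

def save_outputs(response: dict, out_dir: str, domain: str) -> list:
    """Write the CSV and SAS text from an upload response; return the paths."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    written = []
    for key, suffix in (("sdtm_csv", ".csv"), ("sas_code", ".sas")):
        path = out / f"{domain.lower()}{suffix}"
        # Assumes the value is text; adjust if the API returns another shape
        path.write_text(response[key], encoding="utf-8")
        written.append(path)
    return written

# Usage with a decoded response dict:
# save_outputs(result, "out", "DM")
```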

Sample Data & Test Cases

Quick test data you can paste into the upload endpoint:

Test 1: Basic DM Upload

curl -s -X POST https://sdtm-demo.clincoder.cloud/api/upload \
  -F "file=@-;filename=raw_dm.csv" \
  -F "domain=DM" << 'EOF'
SUBJID,AGE,SEX,RACE,ARMCD,COUNTRY
001,45,M,WHITE,TRT,USA
002,62,F,ASIAN,PBO,CAN
EOF

Test 2: Minimal Upload (inline)

curl -s -X POST https://sdtm-demo.clincoder.cloud/api/upload \
  -H "Content-Type: text/plain" \
  -d 'domain: DM
SUBJID,AGE,SEX,RACE
001,35,M,WHITE
002,42,F,ASIAN'

Deployment Architecture

| Component | Technology | Details |
|-----------|------------|---------|
| HTTP Server | Python http.server | Runs on port 8933, managed by systemd |
| SDTM Engine | pandas + numpy | Pure Python, no external DB needed |
| SAS Generator | Template engine | Generates SAS DATA step with LENGTH, LABEL, SORT |
| MCP Server | JSON-RPC over stdio | 7 tools for LLM orchestration |
| Reverse Proxy | nginx | HTTPS with Let's Encrypt |
| Monitoring | systemd | Auto-restart on failure |
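
To give a feel for the SAS Generator row, here is a small sketch of rendering a DATA step with LENGTH and LABEL statements followed by a PROC SORT. It is an illustration only: the real generator is described as template-based, whereas this version uses plain string building, and the `render_sas` function and the variable metadata layout are assumptions.

```python
def render_sas(domain: str, source: str, variables: list, sort_keys: list) -> str:
    """Render a SAS program from variable metadata (name, type, length, label)."""
    ds = domain.lower()
    length_parts = []
    for v in variables:
        prefix = "$" if v["type"] == "char" else ""
        length_parts.append(f"{v['name']} {prefix}{v['length']}")
    keep = " ".join(v["name"] for v in variables)

    lines = [f"data {ds};"]
    # LENGTH comes before SET so the declared lengths take effect
    lines.append("  length " + " ".join(length_parts) + ";")
    lines.append(f"  set {source};")
    lines.append("  label")
    for v in variables:
        lines.append(f"    {v['name']} = \"{v['label']}\"")
    lines.append("  ;")
    lines.append(f"  keep {keep};")
    lines.append("run;")
    lines.append("")
    lines.append(f"proc sort data={ds};")
    lines.append("  by " + " ".join(sort_keys) + ";")
    lines.append("run;")
    return "\n".join(lines)

DM_VARS = [
    {"name": "STUDYID", "type": "char", "length": 12, "label": "Study Identifier"},
    {"name": "USUBJID", "type": "char", "length": 20, "label": "Unique Subject Identifier"},
    {"name": "AGE", "type": "num", "length": 8, "label": "Age"},
]

sas = render_sas("DM", "dm_raw", DM_VARS, ["STUDYID", "USUBJID"])
print(sas)
```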

Systemd Service

# /etc/systemd/system/sdtm-agent.service
[Unit]
Description=SDTM Automation Agent Demo
After=network.target

[Service]
ExecStart=/root/hermes-venv/bin/python3 /root/sdtm-automation-agent/http_server.py
Restart=always
WorkingDirectory=/root/sdtm-automation-agent

[Install]
WantedBy=multi-user.target

Frequently Asked Questions

What data formats are supported?

CSV is the primary input format. The agent also supports XPT (SAS Transport) and SAS7BDAT via pyreadstat.
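
A loader that picks a reader by file extension might look like the sketch below. The `load_raw` name and the list-of-dicts return shape are assumptions; `pyreadstat.read_xport` and `pyreadstat.read_sas7bdat` are real functions that each return a DataFrame and a metadata object, and the library is imported lazily so CSV-only use does not require it.

```python
import csv
from pathlib import Path

def load_raw(path: str) -> list:
    """Load a raw dataset as a list of row dicts, choosing a reader by extension."""
    suffix = Path(path).suffix.lower()
    if suffix == ".csv":
        with open(path, newline="", encoding="utf-8") as fh:
            return list(csv.DictReader(fh))
    if suffix in (".xpt", ".sas7bdat"):
        import pyreadstat  # third-party; only needed for SAS formats
        reader = pyreadstat.read_xport if suffix == ".xpt" else pyreadstat.read_sas7bdat
        df, _meta = reader(path)
        return df.to_dict("records")
    raise ValueError(f"unsupported file type: {suffix or path}")
```

Note that the CSV branch returns every value as a string, while the SAS branches keep the numeric types stored in the file.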

Which SDTM domains are available?

8 domains: DM (Demographics), AE (Adverse Events), LB (Lab Tests), CM (Concomitant Meds), VS (Vital Signs), EX (Exposure), DS (Disposition), MH (Medical History).

Can I add my own domain?

The spec parser supports custom JSON specs. Upload a spec JSON file and the agent will build your custom domain.

Does this produce submission-ready SDTM?

The output follows SDTMIG 3.4 conventions but is intended as a starting point. Final submission requires Pinnacle 21 validation and human review.

How is this different from Pinnacle 21?

Pinnacle 21 validates existing SDTM datasets; this agent creates SDTM datasets from raw data. They are different tools for different problems.

Can I connect this to Claude/ChatGPT?

Yes. The MCP server exposes 7 tools that any MCP-compatible LLM client can call. For Claude Desktop, register the server in claude_desktop_config.json.
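
As a rough guide, a Claude Desktop entry for a stdio MCP server consists of a command and its arguments under the `mcpServers` key. The snippet below generates such an entry; the server name `sdtm-agent` and the script path `mcp_server.py` are hypothetical placeholders, since the source does not name the MCP entry point, and the interpreter path is taken from the systemd unit above.

```python
import json

def mcp_config(name: str, command: str, script: str) -> str:
    """Return JSON text for a single stdio MCP server entry."""
    config = {"mcpServers": {name: {"command": command, "args": [script]}}}
    return json.dumps(config, indent=2)

# The script path below is a placeholder; substitute the real MCP entry point.
print(mcp_config(
    "sdtm-agent",
    "/root/hermes-venv/bin/python3",
    "/root/sdtm-automation-agent/mcp_server.py",
))
```

Merge the printed block into an existing claude_desktop_config.json rather than overwriting the file, so other configured servers are preserved.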

What is the intended use case?

This is a portfolio project for the AI Agent Developer — SDTM Automation role at Redbock/NES Fircroft ($80-105/hr, Remote US). It demonstrates the exact skill set requested: AI agent development + SDTM/CDISC expertise + Python + SAS code generation + MCP.