⚛️ Human Digital Twin Knowledge Graph

Multilevel, multi‑omics graph connecting genomics, transcriptomics, proteomics, metabolomics, phenotypes, and clinical data. Built by Karute for predictive, personalized medicine.

13 entity types · 11+ relationship types · Integrated LLM (Cypher/RAG) · Digital Twin
Central node: DigitalTwin (patient_id, age, sex), linked to omics layers, diseases, and biomarkers.

📦 Core entities (13)

1. Patient/Digital Twin — twin_id, name, age
2. Diseases (diabetes, cancer)
3. Genomic variants (DNA)
4. Transcriptomic (RNA)
5. Proteomic (proteins)
6. Metabolomic
7. Epigenomic
8. HPO terms (phenotypes)
9. GO terms (BP, MF, CC)
10. Signaling pathways
11. Biomarkers
12. Hormones
13. Target cells

🔗 11 relationship types

  • HAS_DISEASE
  • HAS_VARIANT
  • AFFECTS (variant→transcript)
  • TRANSCRIBED_TO
  • TRANSLATES_TO
  • PRODUCES (protein→metabolite)
  • REGULATES (epigenomic)
  • ASSOCIATED_WITH (disease→HPO)
  • ANNOTATED_WITH (protein→GO)
  • BINDS_TO (hormone→target cell)
  • INDICATES (biomarker→disease)

Example (breast cancer): Alice (P1) HAS_DISEASE Breast Cancer; Alice HAS_VARIANT BRCA1; the variant AFFECTS transcript BRCA1-001; the transcript TRANSLATES_TO protein BRCA1; the protein is ANNOTATED_WITH GO:DNA repair; biomarker CA-15-3 INDICATES Breast Cancer.
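To make the path in this example concrete, here is a minimal sketch that stores it as an in-memory edge list and follows a chain of relationship types, much as a Cypher pattern would. The node identifiers ("BRCA1 variant", "BRCA1 protein") are hypothetical labels chosen to keep the variant and protein nodes distinct; a real deployment would query Neo4j or a similar store instead.

```python
from collections import defaultdict

# Hypothetical in-memory stand-in for the property graph:
# each edge is (source, relationship_type, target).
EDGES = [
    ("Alice", "HAS_DISEASE", "Breast Cancer"),
    ("Alice", "HAS_VARIANT", "BRCA1 variant"),
    ("BRCA1 variant", "AFFECTS", "BRCA1-001"),
    ("BRCA1-001", "TRANSLATES_TO", "BRCA1 protein"),
    ("BRCA1 protein", "ANNOTATED_WITH", "GO:DNA repair"),
    ("CA-15-3", "INDICATES", "Breast Cancer"),
]


def build_index(edges):
    """Map (source, relationship_type) -> list of target nodes."""
    index = defaultdict(list)
    for src, rel, dst in edges:
        index[(src, rel)].append(dst)
    return index


def follow(index, start, path):
    """Follow a sequence of relationship types from `start`; return the nodes reached."""
    frontier = [start]
    for rel in path:
        # .get avoids inserting empty keys into the defaultdict
        frontier = [dst for node in frontier for dst in index.get((node, rel), [])]
    return frontier


index = build_index(EDGES)
go_terms = follow(index, "Alice", ["HAS_VARIANT", "AFFECTS", "TRANSLATES_TO", "ANNOTATED_WITH"])
print(go_terms)  # ['GO:DNA repair']
```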

🗂️ Graph schema (label & property highlights)

Node examples & properties

  • GenomicVariant: variant_id, chromosome, position, ref/alt
  • Protein: uniprot_id, abundance, activity_state
  • Metabolite: metabolite_id, chemical_formula, concentration
  • HPOTerm: hpo_id, name, definition
  • SignalingPathway: reactome_id, description

Relationship properties

  • ASSOCIATED_WITH {odds_ratio, risk_allele, pmid}
  • BINDS_TO {affinity, kd}
  • HAS_GLUCOSE_STATE {fasting, timestamp}
  • INDICATES {sensitivity, specificity}
  • HAS_PHENOTYPE {onset_age, severity}
▶ Temporal versioning: DigitalTwin -[HAS_STATE_AT {timestamp}]-> GraphState (snapshot of biomarkers, expression)
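Temporal versioning means a query can ask for the twin "as of" a given time. A minimal sketch, assuming snapshots keyed by ISO-8601 date strings (so string order equals time order), keeps GraphState snapshots sorted and uses binary search to return the latest one at or before the requested time. The class names mirror the schema, but the fields and the biomarker values below are hypothetical.

```python
import bisect
from dataclasses import dataclass, field


@dataclass
class GraphState:
    """Snapshot reached via HAS_STATE_AT {timestamp}; fields are illustrative."""
    timestamp: str  # ISO-8601 date, e.g. "2024-03-05"
    biomarkers: dict = field(default_factory=dict)
    expression: dict = field(default_factory=dict)


class DigitalTwinHistory:
    """Hypothetical container for one twin's versioned states."""

    def __init__(self):
        self._timestamps = []
        self._states = []

    def add_state(self, state):
        # Insert in timestamp order so lookups can use binary search.
        i = bisect.bisect_right(self._timestamps, state.timestamp)
        self._timestamps.insert(i, state.timestamp)
        self._states.insert(i, state)

    def state_at(self, timestamp):
        """Return the most recent snapshot at or before `timestamp`, or None."""
        i = bisect.bisect_right(self._timestamps, timestamp)
        return self._states[i - 1] if i > 0 else None


history = DigitalTwinHistory()
# New lab results arrive out of order; values are hypothetical.
history.add_state(GraphState("2024-03-05", {"CA-15-3": 37.5}))
history.add_state(GraphState("2024-01-10", {"CA-15-3": 30.0}))
print(history.state_at("2024-02-01").biomarkers)  # {'CA-15-3': 30.0}
```

Appending a snapshot instead of overwriting properties keeps earlier states queryable, which is what makes trajectories over time possible.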

🧬 Predictive applications

Early breast cancer prediction

Alice's twin

  • Genomic: BRCA1 pathogenic variant
  • Proteomic: CA‑15‑3 25% above baseline
  • Metabolomic: dysregulated X pathway
  • Integrated KG traversal across layers: estimated risk 12% → 48%
🔍 LLM: “Schedule breast MRI & repeat proteomics in 6 weeks.”
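The text reports the jump from 12% to 48% without saying how the layers are combined. As an illustration only (a swapped-in textbook technique, not Karute's model), Bayes' rule in odds form shows what combined likelihood ratio such a jump implies: posterior odds = prior odds × LR. The sequential `update` assumes conditionally independent evidence, which real omics layers often violate.

```python
def odds(p):
    """Convert a probability to odds."""
    return p / (1.0 - p)


def prob(o):
    """Convert odds back to a probability."""
    return o / (1.0 + o)


def implied_likelihood_ratio(prior, posterior):
    """Combined LR that moves `prior` to `posterior` under Bayes' rule in odds form."""
    return odds(posterior) / odds(prior)


def update(prior, likelihood_ratios):
    """Multiply prior odds by each LR (assumes conditionally independent evidence)."""
    o = odds(prior)
    for lr in likelihood_ratios:
        o *= lr
    return prob(o)


lr = implied_likelihood_ratio(0.12, 0.48)
print(round(lr, 2))  # 6.77
```

In other words, the integrated evidence would need to be about 6.8 times more likely in women who go on to develop breast cancer than in those who do not.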

Type 2 diabetes prediction

Inputs: TCF7L2 risk variant + HbA1c 6.0% + elevated BCAAs

5‑year risk: HIGH (40%)

Modifiable factors: BCAAs, TNF‑alpha

Subtype: severe insulin-deficient diabetes (SIDD), inferred from postprandial glucose spikes

“GLP‑1 agonist may reduce glucose variability by 25%.”
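As one hedged example of turning a raw lab value into a graph feature, HbA1c can be bucketed with the American Diabetes Association cutoffs (5.7% to 6.4% prediabetes range, 6.5% or higher diabetes range); by those cutoffs the 6.0% in this scenario falls in the prediabetes range. The edge layout and the extra `hba1c` and `category` properties are hypothetical, and a single HbA1c reading is not a diagnosis on its own.

```python
def hba1c_category(hba1c_percent: float) -> str:
    """Map an HbA1c value (%, NGSP units) to the ADA diagnostic ranges."""
    if hba1c_percent >= 6.5:
        return "diabetes range"
    if hba1c_percent >= 5.7:
        return "prediabetes range"
    return "normal range"


def glucose_state_edge(twin_id: str, hba1c_percent: float, timestamp: str, fasting: bool = False):
    """Build a hypothetical HAS_GLUCOSE_STATE edge as (source, type, properties).

    `fasting` and `timestamp` follow the documented schema; `hba1c` and
    `category` are illustrative extra properties.
    """
    return (
        twin_id,
        "HAS_GLUCOSE_STATE",
        {
            "fasting": fasting,
            "timestamp": timestamp,
            "hba1c": hba1c_percent,
            "category": hba1c_category(hba1c_percent),
        },
    )


edge = glucose_state_edge("P1", 6.0, "2024-01-10")
print(edge[2]["category"])  # prediabetes range
```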

⏱️ Dynamic updates & LLM query

Each new lab result creates a new graph state. The LLM is fine‑tuned to translate natural-language questions into Cypher:

“What biomarkers indicate breast cancer?” → MATCH (b:Biomarker)-[:INDICATES]->(d:Disease {name:'Breast Cancer'}) RETURN b.name
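To show what that Cypher pattern actually matches, here is a plain-Python equivalent over a hypothetical node table and edge list: keep INDICATES edges whose source is labeled Biomarker and whose target is a Disease with the requested name, then return the biomarker names. The data (including HbA1c as a diabetes biomarker) is illustrative.

```python
# Hypothetical node table: id -> (label, properties)
NODES = {
    "b1": ("Biomarker", {"name": "CA-15-3"}),
    "b2": ("Biomarker", {"name": "HbA1c"}),
    "d1": ("Disease", {"name": "Breast Cancer"}),
    "d2": ("Disease", {"name": "Type 2 Diabetes"}),
}
EDGES = [("b1", "INDICATES", "d1"), ("b2", "INDICATES", "d2")]


def biomarkers_indicating(disease_name):
    """Python equivalent of:
    MATCH (b:Biomarker)-[:INDICATES]->(d:Disease {name: $disease_name}) RETURN b.name
    """
    names = []
    for src, rel, dst in EDGES:
        src_label, src_props = NODES[src]
        dst_label, dst_props = NODES[dst]
        if (rel == "INDICATES"
                and src_label == "Biomarker"
                and dst_label == "Disease"
                and dst_props.get("name") == disease_name):
            names.append(src_props["name"])
    return names


print(biomarkers_indicating("Breast Cancer"))  # ['CA-15-3']
```

When LLM-generated queries are executed against a real database, passing values such as the disease name as Cypher parameters (`$disease_name`) rather than splicing them into the query string limits injection risk.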

🧅 Multi‑omics data layers

📀 DISEASE layer: Cancer · Diabetes · CVD · Neurological disorders
🧬 PHENOTYPE layer: Hyperglycemia · Weight loss · Polyuria
🧪 METABOLOMICS: Glucose · BCAAs · Lipids · Ketones
⚛️ PROTEOMICS: Insulin · GLUT4 · Receptors · Cytokines
📝 TRANSCRIPTOMICS: INS · TCF7L2 · PPARG expression
🔬 EPIGENOMICS: DNA methylation · histone marks
🧬 GENOMICS: SNPs · CNVs · indels · WGS
🧍 FOUNDATION: Digital Twin profile · baseline

🔄 Cross‑layer integration example (diabetes)

🧬 GENOMICS: TCF7L2 risk variant
EPIGENOMICS: DNA methylation (PPARGC1A)
TRANSCRIPTOMICS: altered insulin receptor substrate (IRS) expression
PROTEOMICS: ↓ adiponectin, ↑ cytokines
METABOLOMICS: elevated BCAAs, ceramides
PHENOTYPE: impaired glucose tolerance
DISEASE: Type 2 diabetes (high risk)

⚙️ Technology framework

Graph database

Neo4j / Amazon Neptune · TigerGraph. Property graph model with versioned states.

ETL & integration

Public ontologies and databases: HPO, GO, Reactome, ClinVar, DisGeNET, HMDB, GTEx, ENCODE.
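HPO and GO are distributed in the OBO flat-file format, where each `[Term]` stanza carries tags such as `id`, `name`, and `is_a`. A minimal ETL parser sketch that keeps only those three tags might look like this; production pipelines would more likely use an existing OBO/OWL library, and the sample is a two-term excerpt written for illustration.

```python
SAMPLE_OBO = """format-version: 1.2

[Term]
id: HP:0000001
name: All

[Term]
id: HP:0000118
name: Phenotypic abnormality
is_a: HP:0000001 ! All

[Typedef]
id: part_of
"""


def parse_obo_terms(text):
    """Minimal parser for [Term] stanzas; keeps id, name and is_a parents only."""
    terms = []
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith("["):
            # A new stanza begins; only [Term] stanzas are collected.
            current = {"id": None, "name": None, "is_a": []} if line == "[Term]" else None
            if current is not None:
                terms.append(current)
            continue
        if current is None or ":" not in line:
            continue  # header lines, blank lines, non-Term stanzas
        tag, value = line.split(":", 1)
        value = value.strip()
        if tag == "id":
            current["id"] = value
        elif tag == "name":
            current["name"] = value
        elif tag == "is_a":
            # Drop the trailing "! label" comment and keep the parent ID.
            current["is_a"].append(value.split("!", 1)[0].strip())
    return terms


terms = parse_obo_terms(SAMPLE_OBO)
print([t["id"] for t in terms])  # ['HP:0000001', 'HP:0000118']
```

Each parsed term maps naturally to an HPOTerm node, and each `is_a` entry to a parent edge in the graph.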

HDT‑LLM (specialized medical AI)

Base model: LLaMA or Mistral, fine‑tuned on biomedical text and Cypher generation. RAG pipeline: natural-language question → graph query → retrieved subgraph → answer.
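The four-stage RAG pipeline can be sketched as a function that wires the stages together. The three callables are hypothetical stand-ins for the fine-tuned Cypher generator, the graph database client, and the answer-writing LLM call; the stubs in the usage example exist only to make the sketch runnable.

```python
from typing import Callable, Dict, List


def rag_answer(question: str,
               to_cypher: Callable[[str], str],
               run_query: Callable[[str], List[Dict]],
               summarize: Callable[[str, List[Dict]], str]) -> str:
    """natural language -> graph query -> subgraph -> answer (all stages injected)."""
    cypher = to_cypher(question)       # stage 1: LLM generates Cypher
    rows = run_query(cypher)           # stage 2: retrieve subgraph as result rows
    if not rows:
        # Refuse rather than let the LLM answer without graph evidence.
        return "No supporting facts found in the knowledge graph."
    return summarize(question, rows)   # stage 3: answer grounded in the rows


# Usage with hypothetical stubs in place of the LLM and the database.
QUERY = "MATCH (b:Biomarker)-[:INDICATES]->(d:Disease {name:'Breast Cancer'}) RETURN b.name"
answer = rag_answer(
    "What biomarkers indicate breast cancer?",
    to_cypher=lambda q: QUERY,
    run_query=lambda c: [{"b.name": "CA-15-3"}] if c == QUERY else [],
    summarize=lambda q, rows: "Biomarkers: " + ", ".join(r["b.name"] for r in rows),
)
print(answer)  # Biomarkers: CA-15-3
```

Injecting the stages keeps each one swappable (for example a different base model or graph store) and testable in isolation.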

Cloud: AWS/GCP, Kubernetes
Privacy: anonymization, federated learning
Graph algorithms: community detection, link prediction
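Of the graph algorithms listed, link prediction is the easiest to show in a few lines. The sketch below scores candidate edges by the Jaccard coefficient of neighbor sets, one common neighborhood-based method; the document does not say which method Karute uses, and the toy graph is hypothetical. networkx provides the same score as `networkx.jaccard_coefficient`.

```python
def jaccard_scores(adjacency, node):
    """Score missing links from `node` by Jaccard similarity of neighbor sets.

    `adjacency` maps each node to the set of its neighbors (undirected graph).
    Returns (candidate, score) pairs, highest score first, excluding existing edges.
    """
    neighbors = adjacency[node]
    scores = []
    for other, other_neighbors in adjacency.items():
        if other == node or other in neighbors:
            continue  # skip self and already-linked nodes
        union = neighbors | other_neighbors
        if not union:
            continue
        score = len(neighbors & other_neighbors) / len(union)
        if score > 0:
            scores.append((other, score))
    return sorted(scores, key=lambda pair: (-pair[1], pair[0]))


# Hypothetical toy graph (symmetric adjacency).
adjacency = {
    "A": {"B", "C"},
    "B": {"A", "C", "D"},
    "C": {"A", "B", "D"},
    "D": {"B", "C"},
}
print(jaccard_scores(adjacency, "A"))  # [('D', 1.0)]
```

A and D share every neighbor, so the A–D edge is the strongest candidate for a missing link.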

📁 Data sources layer

dbSNP · gnomAD · GTEx · UniProt · HMDB · ENCODE · HPO · Reactome · EHR · TCGA · ClinVar · KEGG

📐 Simplified graph snippet

DigitalTwin "Alice"
├─ HAS_VARIANT → TCF7L2 (rs7903146) → LOCATED_IN → Gene(TCF7L2)
│ └─ ASSOCIATED_WITH → Type2Diabetes
├─ HAS_GLUCOSE_STATE → HbA1c 6.5%
├─ HAS_PHENOTYPE → Hyperglycemia
└─ HAS_DISEASE_HISTORY → Type2Diabetes

Gene(TCF7L2) → TRANSCRIBED_TO → Transcript(TCF7L2 RNA) → TRANSLATES_TO → Protein(TCF7L2)
Protein(TCF7L2) → PARTICIPATES_IN → Wnt signaling pathway
Hormone(Insulin) → ACTIVATES → Insulin signaling pathway → OCCURS_IN → Beta cell
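The snippet can be loaded as a directed edge list and searched breadth-first to recover the chain that links Alice to the Wnt pathway. Relationship names follow the schema's relationship list (TRANSCRIBED_TO); node strings are illustrative, and attaching ASSOCIATED_WITH to the variant is one reading of the tree above.

```python
from collections import deque

# Directed edges transcribed from the snippet above.
EDGES = [
    ("Alice", "HAS_VARIANT", "rs7903146"),
    ("rs7903146", "LOCATED_IN", "Gene(TCF7L2)"),
    ("rs7903146", "ASSOCIATED_WITH", "Type2Diabetes"),  # read as variant-level
    ("Alice", "HAS_GLUCOSE_STATE", "HbA1c 6.5%"),
    ("Alice", "HAS_PHENOTYPE", "Hyperglycemia"),
    ("Alice", "HAS_DISEASE_HISTORY", "Type2Diabetes"),
    ("Gene(TCF7L2)", "TRANSCRIBED_TO", "Transcript(TCF7L2 RNA)"),
    ("Transcript(TCF7L2 RNA)", "TRANSLATES_TO", "Protein(TCF7L2)"),
    ("Protein(TCF7L2)", "PARTICIPATES_IN", "Wnt signaling pathway"),
    ("Hormone(Insulin)", "ACTIVATES", "Insulin signaling pathway"),
    ("Insulin signaling pathway", "OCCURS_IN", "Beta cell"),
]


def shortest_path(edges, start, goal):
    """Breadth-first search over directed edges.

    Returns the list of (source, relationship, target) hops, or None if unreachable.
    """
    out = {}
    for src, rel, dst in edges:
        out.setdefault(src, []).append((rel, dst))
    previous = {start: None}  # node -> (parent, relationship) on the BFS tree
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if node == goal:
            break
        for rel, nxt in out.get(node, []):
            if nxt not in previous:
                previous[nxt] = (node, rel)
                queue.append(nxt)
    if goal not in previous:
        return None
    chain, node = [], goal
    while previous[node] is not None:
        parent, rel = previous[node]
        chain.append((parent, rel, node))
        node = parent
    return list(reversed(chain))


path = shortest_path(EDGES, "Alice", "Wnt signaling pathway")
print(" -> ".join(rel for _, rel, _ in path))
# HAS_VARIANT -> LOCATED_IN -> TRANSCRIBED_TO -> TRANSLATES_TO -> PARTICIPATES_IN
```

Because edges are directed, the insulin branch is not reachable from Alice here; connecting it would need an explicit edge from the twin's state to the hormone or pathway.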

🕒 Temporal trajectory (diabetes risk)

🟢 Low (5%) 🟡 Medium (25%) 🟠 High (45%) 🔴 Very high (75%)

🧠 Vision & challenges

Vision: a shift from reactive to predictive medicine. Challenges: causality vs. association, data volume (~100 GB per whole genome sequence), interpretability, and ethics.

Karute's HDT-KG enables digital twin trajectory modeling, early disease warning, and simulation of personalized interventions.


Application 2 in detail: diabetes scenarios (5‑year risk, subtyping, and treatment what‑if analysis such as semaglutide vs. metformin) are all encoded as KG paths.