Methods | Psychedelics Knowledge Graph

Why

We are entering the era of agentic science, where AI agents can support many steps of knowledge generation. This creates an opportunity to address long-standing inefficiencies in academic workflows: findings are scattered across papers, evidence is split across disciplinary silos, and reviews require substantial expert labor, become static snapshots once published, and are difficult to query or reuse. As a result, human researchers struggle to see relationships, trends, gaps, and the overall shape of a field, while AI agents cannot easily use the evidence directly.

Agentic tools make it possible to build a new type of living evidence system, where literature discovery, screening, extraction, visualization, and updating are part of a continuous workflow. Findings can be converted into structured, provenance-rich records that remain linked to their source papers and evidence locators, so the evidence base can be searched, corrected, reused, and extended as the literature changes.

The Psychedelics Knowledge Graph applies this model to psychedelic research, where a fast-growing literature spans clinical, mechanistic, and translational work that is often read separately. Screened papers are converted into structured evidence records that power an interactive graph and dashboard. Human researchers can move visually from field-level patterns and gaps to the source studies, while agents and analytic tools can query the same provenance-rich evidence directly.

Pipeline Overview

The workflow is designed to keep literature discovery broad while making graph inclusion conservative. Searches cast a wide net across clinical, biological, brain, behavioral, subjective, treatment-context, and real-world evidence. Each paper is then interpreted according to what it actually contains and how much source text is available, so the graph shows evidence that can be traced back to specific papers.

Define the Evidence Scope

The project starts with explicit vocabularies for psychedelic compounds and evidence domains: molecular targets, molecular pathways and cellular readouts, brain systems, cognitive and behavioral function, subjective experience, pharmacokinetics and exposure, intervention context, real-world use, clinical outcomes, functioning, and safety. These vocabularies define what the graph can represent and what the search needs to cover.

Discover Candidate Papers

PubMed and OpenAlex searches combine broad domain queries, focused compound-topic queries, and supplementary direct-pair checks. Results are matched by DOI and merged into a single paper library. Metadata enrichment uses different sources for different needs: PubMed, PMC, OpenAlex, Crossref, and Semantic Scholar for bibliographic records and abstracts; PubMed for publication-type labels; and Unpaywall, OpenAlex, and PMC for open-access full-text or PDF links.
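
The DOI matching step can be sketched as follows. This is a minimal illustration, not the project's actual code: the record fields, the function names, and the "earlier source wins" rule are assumptions, and the DOIs and titles are placeholders.

```python
import re

def normalize_doi(raw):
    """Lowercase a DOI and strip common URL or 'doi:' prefixes; None if empty."""
    if not raw:
        return None
    doi = raw.strip().lower()
    doi = re.sub(r"^(https?://(dx\.)?doi\.org/|doi:\s*)", "", doi)
    return doi or None

def merge_by_doi(*sources):
    """Merge record lists into one library keyed by normalized DOI.

    Assumed policy: earlier sources win on conflicting fields and later
    sources only fill gaps. Records without a DOI are returned separately.
    """
    library, unmatched = {}, []
    for records in sources:
        for record in records:
            doi = normalize_doi(record.get("doi"))
            if doi is None:
                unmatched.append(record)
                continue
            merged = library.setdefault(doi, {"doi": doi})
            for key, value in record.items():
                if key != "doi" and value and not merged.get(key):
                    merged[key] = value
    return library, unmatched

# Placeholder records standing in for PubMed and OpenAlex results.
pubmed = [{"doi": "10.1000/EXAMPLE.1", "title": "Example paper", "pmid": "000001"}]
openalex = [
    {"doi": "https://doi.org/10.1000/example.1", "title": "Example paper",
     "abstract": "Placeholder abstract."},
    {"doi": None, "title": "Record without a DOI"},
]
library, unmatched = merge_by_doi(pubmed, openalex)
```

Here both records for the same paper collapse into one library entry that carries the PubMed identifier and the OpenAlex abstract, while the record without a DOI is set aside instead of being matched by guesswork.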

Screen and Route

Candidate papers are screened for clear psychedelic relevance using their titles and abstracts. Papers that remain in scope are routed by evidence domain, publication type, and available source text. This separates, for example, primary studies from reviews and meta-analyses, and lets the extraction step use different expectations for full-text and abstract-based evidence.
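
A routing rule of this kind might look like the sketch below. The route names, the record fields, and the publication-type labels are hypothetical; the actual routing rules are not given on this page.

```python
def route_paper(paper):
    """Return a (study_route, text_route) pair for a screened, in-scope paper.

    All labels here are illustrative assumptions, not the project's schema.
    """
    pub_types = {t.lower() for t in paper.get("publication_types", [])}
    if pub_types & {"meta-analysis", "systematic review"}:
        study_route = "evidence_synthesis"
    elif "review" in pub_types:
        study_route = "review"
    else:
        study_route = "primary_study"

    # The text route tells the extraction step how much source text it can rely on.
    if paper.get("full_text_available"):
        text_route = "full_text"
    elif paper.get("abstract"):
        text_route = "abstract_only"
    else:
        text_route = "metadata_only"
    return study_route, text_route

example = {
    "publication_types": ["Randomized Controlled Trial"],
    "abstract": "Placeholder abstract.",
    "full_text_available": False,
}
routes = route_paper(example)
```

Keeping the two decisions separate lets the extraction step apply stricter expectations to abstract-only papers without changing how study types are classified.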

Extract Structured Evidence

Eligible papers are processed with LLM-based, route-specific extraction instructions. The model identifies candidate structured evidence: compounds, evidence domains, study type, assay or outcome details, result direction, and source locators. When PDFs are available, they are first converted with GROBID into structured TEI full-text artifacts so the extraction step can use article sections, tables, figures, and references as auditable evidence anchors. Abstract-only records are handled more conservatively because they expose less of the underlying evidence.
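
As an illustration of how TEI full text can yield evidence anchors, the sketch below reads section headings and paragraphs from a small TEI fragment with Python's standard library. The sample document, the locator format (`sec1.p1`), and the function name are assumptions made for this example; only the TEI namespace and the `div`/`head`/`p` layout reflect the TEI format that GROBID emits.

```python
import xml.etree.ElementTree as ET

TEI_NS = {"tei": "http://www.tei-c.org/ns/1.0"}

# Tiny placeholder document; real GROBID output is far richer.
SAMPLE_TEI = """<TEI xmlns="http://www.tei-c.org/ns/1.0">
  <text><body>
    <div><head>Methods</head><p>Placeholder methods paragraph.</p></div>
    <div><head>Results</head>
      <p>Placeholder results paragraph one.</p>
      <p>Placeholder results paragraph two.</p>
    </div>
  </body></text>
</TEI>"""

def section_anchors(tei_xml):
    """List paragraphs with a section title and a hypothetical locator string."""
    root = ET.fromstring(tei_xml)
    anchors = []
    for i, div in enumerate(root.findall(".//tei:body/tei:div", TEI_NS), start=1):
        head = div.find("tei:head", TEI_NS)
        title = head.text.strip() if head is not None and head.text else "(untitled)"
        for j, para in enumerate(div.findall("tei:p", TEI_NS), start=1):
            anchors.append({
                "locator": f"sec{i}.p{j}",
                "section": title,
                "text": "".join(para.itertext()).strip(),
            })
    return anchors

anchors = section_anchors(SAMPLE_TEI)
```

Each anchor pairs a piece of text with a stable position in the article, which is what allows an extracted record to point back to where its evidence was found.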

Validate and Publish

Extracted evidence is checked for completeness, consistency, and source support before it appears in the public graph. Accepted records become graph relationships linking psychedelic compounds to targets, pathways, brain systems, tasks, clinical outcomes, safety outcomes, and study contexts. Records that are ambiguous or insufficiently supported are held back for review.
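
A completeness and consistency check could be sketched like this. The required fields are taken loosely from the extraction description above, but the exact field names and the allowed direction values are assumptions, and the records are placeholders rather than real findings.

```python
# Hypothetical schema: field names and allowed values are illustrative only.
REQUIRED_FIELDS = ("doi", "compound", "evidence_domain", "study_type",
                   "result_direction", "source_locator")
ALLOWED_DIRECTIONS = {"increase", "decrease", "no_change", "mixed"}

def validate_record(record):
    """Return a list of problems; an empty list means the record passes."""
    problems = [f"missing {name}" for name in REQUIRED_FIELDS if not record.get(name)]
    direction = record.get("result_direction")
    if direction and direction not in ALLOWED_DIRECTIONS:
        problems.append(f"unknown result_direction: {direction}")
    return problems

def triage(records):
    """Split records into accepted ones and ones held back for review."""
    accepted, held = [], []
    for record in records:
        problems = validate_record(record)
        if problems:
            held.append((record, problems))
        else:
            accepted.append(record)
    return accepted, held

records = [
    {"doi": "10.1000/example.1", "compound": "compound-A",
     "evidence_domain": "molecular_target", "study_type": "in_vitro",
     "result_direction": "increase", "source_locator": "sec2.p1"},
    {"doi": "10.1000/example.2", "compound": "compound-B",
     "evidence_domain": "clinical_outcome", "study_type": "trial",
     "result_direction": "unclear", "source_locator": ""},
]
accepted, held = triage(records)
```

The second record is held back with two reasons attached (a missing source locator and an unrecognized direction), so a reviewer can see why it did not reach the graph.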

Maintain the Living Graph

The graph is designed to evolve as the literature grows. New papers, corrected metadata, improved extraction, and community feedback can all update the evidence base. Each public build is versioned, and release notes summarize what changed so readers can understand how the graph evolves over time.
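
One way to summarize what changed between two builds is to compare their records by identifier, as in the sketch below. The record identifiers, the version label, and the note format are hypothetical; the page does not describe how release notes are actually generated.

```python
def diff_builds(previous, current):
    """Compare two builds given as {record_id: record} dictionaries."""
    prev_ids, curr_ids = set(previous), set(current)
    return {
        "added": sorted(curr_ids - prev_ids),
        "removed": sorted(prev_ids - curr_ids),
        "changed": sorted(rid for rid in prev_ids & curr_ids
                          if previous[rid] != current[rid]),
    }

def release_notes(version, diff):
    """Render a short plain-text summary of a build diff."""
    lines = [f"Build {version}"]
    for kind in ("added", "removed", "changed"):
        lines.append(f"- {kind}: {len(diff[kind])} record(s)")
    return "\n".join(lines)

# Placeholder builds with made-up record identifiers.
build_old = {"rec-1": {"direction": "increase"}, "rec-2": {"direction": "mixed"}}
build_new = {"rec-1": {"direction": "increase"}, "rec-2": {"direction": "decrease"},
             "rec-3": {"direction": "no_change"}}
diff = diff_builds(build_old, build_new)
notes = release_notes("example-2", diff)
```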

Literature Search Strategy

The search is organized around evidence domains: molecular targets, molecular pathways and cellular readouts, brain systems, cognitive and behavioral functions, subjective experience, pharmacokinetics and exposure, intervention delivery context, real-world use and public health, clinical outcomes and safety, and clinical studies that measure biological or behavioral endpoints.

PubMed provides curated biomedical indexing, and OpenAlex provides broader scholarly coverage across journals, books, and preprints. All searches share a three-block structure: compound terms, domain-specific entity or outcome terms, and evidence-context terms. Terms inside each block are joined with OR; the blocks are joined with AND. Broad modules cover domain families, while focused modules target well-studied compound-topic combinations so that important papers are not captured only through broad queries.
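
The three-block structure can be sketched as a small query builder. This is an illustration under assumptions: the function names are hypothetical, multi-word terms are simply quoted, and the provider-specific syntax that PubMed or OpenAlex may require (field tags, filters) is left out.

```python
def or_block(terms):
    """Join one block's terms with OR, quoting multi-word phrases."""
    quoted = [f'"{term}"' if " " in term else term for term in terms]
    return "(" + " OR ".join(quoted) + ")"

def build_query(compound_terms, entity_terms, evidence_terms):
    """Combine the compound, entity, and evidence blocks with AND."""
    blocks = (compound_terms, entity_terms, evidence_terms)
    return " AND ".join(or_block(block) for block in blocks)

# A few terms taken from the molecular target module listed on this page.
query = build_query(
    ["psilocybin", "LSD"],
    ["5-HT2A", "sigma-1 receptor"],
    ["binding", "functional assay"],
)
print(query)
```

A paper matches only if it mentions at least one term from every block, which is how the compound block keeps a broad entity or evidence block from pulling in unrelated literature.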

Molecular targets (10 grouped term combinations)

Broad target-family modules

Modules
serotonin receptors; monoamine transporters; glutamate/NMDA/AMPA/mGluR2 targets; opioid, sigma, and TAAR targets; plasticity, TrkB, and BDNF target evidence

Compound block
classic psychedelic, entactogen, dissociative, psychoplastogen, and compound-specific terms including psilocybin, psilocin, LSD, DMT, 5-MeO-DMT, mescaline, MDMA, ketamine, salvinorin A, ibogaine, and noribogaine

Entity block
5-HT receptor families; SERT, DAT, NET, and VMAT2; NMDA, AMPA, and mGluR2 receptors; kappa and mu opioid receptors; sigma-1 receptor; TAAR1; TrkB, BDNF, and neuroplasticity targets

Evidence block
binding; affinity; Ki; Kd; IC50; EC50; radioligand; functional assay; agonist; antagonist; partial agonist; signaling

Focused compound-target modules

Modules
LSD-5-HT2A; psilocin/psilocybin-5-HT2A; MDMA transporters; ketamine-NMDA; salvinorin A-kappa opioid receptor

Query emphasis
narrower compound and target names paired with binding, receptor pharmacology, transporter, channel-blocker, functional-assay, beta-arrestin, and G-protein terms

Molecular pathways and cellular readouts (5 grouped term combinations)

Broad molecular/pathway modules

Modules
molecular pathway plasticity; gene expression and transcriptomics; inflammatory and neuroendocrine molecular readouts

Entity block
BDNF; TrkB; NTRK2; mTOR; ERK; MAPK; CREB; Akt; synaptogenesis; dendritic spine; synaptic plasticity; c-Fos; Arc; immediate early genes; transcriptome; epigenetic terms; cytokines; inflammation; cortisol; HPA axis

Evidence block
signaling; phosphorylation; expression; protein expression; gene expression; transcriptomics; western blot; qPCR; RNA-seq; immunohistochemistry; ELISA; molecular readout; plasticity; synaptic; neuronal

Focused molecular/pathway modules

Modules
ketamine/psychedelic mTOR-synaptogenesis; psychedelic immediate early genes

Query emphasis
specific compound-pathway combinations involving ketamine, psilocybin, LSD, DMT, MDMA, DOI, mTOR, BDNF, TrkB, ERK, Akt, c-Fos, Fos, Arc, Egr1, and gene-expression assays

Brain systems, circuits, and neurophysiology (10 grouped term combinations)

Broad brain-system modules

Modules
systems neuroimaging and connectivity; brain regions and named circuits; PET, receptor occupancy, and metabolism; EEG, MEG, and neurophysiology

Entity block
default mode, salience, frontoparietal, central executive, limbic, visual, and sensorimotor networks; prefrontal cortex; anterior and posterior cingulate; hippocampus; amygdala; thalamus; claustrum; striatum; nucleus accumbens; insula; thalamo-cortical, cortico-striatal, fronto-limbic, hippocampal-prefrontal, amygdala-prefrontal, and mesolimbic reward circuits

Evidence block
fMRI; BOLD; resting-state; functional connectivity; effective connectivity; PET; receptor occupancy; FDG; cerebral blood flow; EEG; MEG; neural oscillations; ERP; entropy; electrophysiology; c-Fos; neuronal activity

Focused brain-system modules

Modules
psilocybin-default mode connectivity; LSD-thalamocortical connectivity; DMT EEG/fMRI dynamics; ayahuasca-default mode connectivity; psilocybin PET/5-HT2A occupancy; ketamine prefrontal-hippocampal circuitry

Query emphasis
compound-specific network, circuit, imaging, receptor-occupancy, neural-dynamics, and electrophysiology terms

Cognitive and behavioral function (4 grouped term combinations)

Broad task and translational behavior modules

Modules
cognitive and affective task domains; translational behavioral assays

Entity block
cognitive flexibility; reversal learning; set shifting; fear conditioning; fear extinction; reward learning; social reward; social cognition; empathy; emotion recognition; attention; impulsivity; prepulse inhibition; working memory; forced swim; tail suspension; sucrose preference; social defeat; elevated plus maze; conditioned place preference; self-administration; relapse; head-twitch response

Evidence block
task; behavior; behavioural; learning; conditioning; performance; paradigm; mouse; rat; rodent; animal model; behavioral assay; in vivo; c-Fos

Focused cognitive-behavioral modules

Modules
MDMA social reward and cognition; psychedelic fear extinction and flexibility

Query emphasis
compound-specific social reward, social cognition, empathy, emotion recognition, fear extinction, reversal learning, learning, conditioning, and performance terms

Subjective experience and pharmacokinetics (4 grouped term combinations)

Subjective experience and acute-effect modules

Modules
acute subjective effects and phenomenology; subjective-effect measures

Entity block
subjective effects; phenomenology; mystical experience; ego dissolution; altered state; oceanic boundlessness; challenging experience; anxiety; insight; emotional breakthrough; intensity; time perception; visual effects; questionnaire and rating-scale terms

Evidence block
acute effect; subjective rating; questionnaire; scale; psychometric; dose-response; controlled administration; human laboratory; experience report; outcome measure

Pharmacokinetics and exposure modules

Modules
pharmacokinetics and exposure; metabolite and exposure measurement

Entity block
pharmacokinetics; ADME; absorption; distribution; metabolism; elimination; clearance; half-life; bioavailability; plasma concentration; blood concentration; serum concentration; metabolite; psilocin; noribogaine; route of administration; dose; protein binding

Evidence block
LC-MS; mass spectrometry; plasma; serum; blood; urine; sampling; concentration-time curve; Cmax; Tmax; AUC; pharmacokinetic model; analytical measurement

Intervention context and real-world use (4 grouped term combinations)

Intervention delivery-context modules

Modules
psychotherapy, preparation, integration, set and setting, session structure, psychological support, aftercare, blinding, training, and manualized-delivery terms; focused set/setting and therapeutic-support details

Context block
psychotherapy; psychological support; preparation; integration; set and setting; therapeutic alliance; music; guide; facilitator; therapist; session structure; dosing session; aftercare; manual; training; expectancy; blinding

Evidence block
clinical; feasibility; acceptability; qualitative; protocol; trial design; outcome; safety; adverse experience; implementation detail; supportive therapy

Real-world use and public-health modules

Modules
epidemiology, surveys, naturalistic use, lifetime or past-year use, drug checking, poison-control and emergency records, toxicity, harm reduction; focused naturalistic, community, retreat, and microdosing use

Context block
naturalistic use; community use; retreat; ceremony; microdosing; epidemiology; prevalence; survey; lifetime use; past-year use; poison control; emergency department; toxicity; adverse experience; harm reduction; drug checking

Evidence block
observational; population; survey; cohort; registry; case series; risk; safety; mental health; wellbeing; adverse event; intoxication; exposure; public health

Clinical outcomes, symptoms, functioning, and safety (17 grouped term combinations)

Broad clinical outcome modules

Modules
clinical class core; depression spectrum; PTSD and trauma; substance use and addiction; anxiety, distress, and palliative care; pain, headache, and migraine; OCD, eating disorders, and autism

Clinical block
depression; major depressive disorder; treatment-resistant depression; PTSD; substance use disorder; alcohol, tobacco, opioid, cocaine, methamphetamine, stimulant, and cannabis use disorders; generalized and social anxiety; distress associated with life-threatening disease; OCD; eating disorders; autism spectrum disorder; headache disorders; migraine; chronic pain; fibromyalgia

Evidence block
clinical trial; randomized; randomised; placebo; open-label; treatment; therapy; efficacy; safety; tolerability; outcome; follow-up

Focused clinical outcome modules

Modules
psilocybin-depression; MDMA-PTSD; ketamine-depression-suicidality; ibogaine-opioid/substance use disorder; LSD-alcohol/anxiety

Query emphasis
narrow compound-condition combinations paired with clinical trial, treatment, psychotherapy, abstinence, detoxification, withdrawal, craving, relapse, safety, and outcome terms

Symptom and functioning modules

Modules
symptoms, functioning, and quality of life; suicidality, anhedonia, sleep, and function; craving, relapse, and functioning

Outcome block
suicidal ideation; suicidality; anhedonia; craving; withdrawal; relapse; sleep; insomnia; pain intensity; quality of life; wellbeing; social, occupational, emotional, and functional impairment terms

Safety modules

Modules
safety, tolerability, and adverse events; cardiovascular, mania, psychosis, and HPPD safety

Outcome block
safety; tolerability; adverse events; serious adverse events; side effects; cardiovascular; blood pressure; heart rate; hypertension; mania; psychosis; dissociation; hallucinogen persisting perception disorder; flashback

Clinical studies with biological and behavioral endpoints (6 grouped term combinations)

Broad clinical endpoint modules

Modules
clinical population modules with brain, molecular, cognitive, and behavioral endpoints; clinical outcome endpoint modules

Clinical block
depression; major depressive disorder; treatment-resistant depression; PTSD; anxiety; substance use disorder; addiction; alcohol use disorder; opioid use disorder; craving; suicidal ideation; anhedonia; quality of life; patient populations; clinical outcomes

Endpoint block
fMRI; BOLD; functional connectivity; default mode network; amygdala; prefrontal cortex; PET; receptor occupancy; EEG; neural oscillations; cognitive task; cognitive flexibility; emotional processing; BDNF; cortisol; cytokines; molecular readouts

Focused clinical endpoint modules

Modules
psilocybin depression brain and molecular endpoints; ketamine depression molecular endpoints; MDMA PTSD social-brain endpoints

Query emphasis
compound-condition combinations paired with imaging, connectivity, BDNF, cortisol, inflammation, cytokines, EEG, cognitive function, social cognition, emotion recognition, and brain-region terms

Supplementary targeted direct-pair searches (many pair combinations)

The grouped domain searches are the primary discovery instrument. Direct-pair searches are used as a supplementary layer for selected compound-entity and compound-outcome combinations: larger generated grids were run for bounded target and clinical vocabularies, while later domain additions used targeted pair checks rather than an exhaustive cross-product of every possible compound and concept.

Pair layer	Pair space	How it was used
Molecular target grid	Canonical compounds paired with the molecular target vocabulary	1,840 compound-target pairs were run as 5,520 binding, receptor-pharmacology, and functional-assay searches.
Clinical evidence grid	Canonical compounds paired with the clinical evidence vocabulary	1,240 compound-clinical evidence pairs were run as 3,717 clinical-trial, randomized/placebo, and treatment-outcome searches.
Brain, network, and task pair files	Canonical compounds paired with brain-region/network, circuit, and cognitive-behavioral task concepts	Generated as versioned search artifacts for 3,600 newly added brain/network/task pairs, but not run as a complete direct-pair discovery search in this build.
Additional direct-pair check	Selected sparse molecular, brain/network, cognitive-behavioral, symptom, functioning, and safety combinations	62 selected pair searches were run after the additional domain searches: 41 molecular/brain/cognitive/pathway searches and 21 clinical symptom/function/safety searches.
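
The pair counts in the table follow from the sizes of the vocabularies listed on this page. The short calculation below uses counts obtained by tallying those lists (40 compounds, 46 targets, 31 clinical terms, 46 brain region and network terms, 44 cognitive and behavioral terms) and three query forms per pair; the variable names are illustrative. Note that a full three-form expansion of the clinical grid would give 3,720 searches, slightly more than the 3,717 reported; the page does not explain the difference, so the sketch does not attempt to reproduce that figure.

```python
# Vocabulary sizes counted from the lists on this page.
N_COMPOUNDS = 40
N_TARGETS = 46
N_CLINICAL = 31
N_BRAIN = 46
N_COGNITIVE = 44
QUERY_FORMS_PER_PAIR = 3

target_pairs = N_COMPOUNDS * N_TARGETS                      # 1,840
target_searches = target_pairs * QUERY_FORMS_PER_PAIR       # 5,520
clinical_pairs = N_COMPOUNDS * N_CLINICAL                   # 1,240
clinical_full_grid = clinical_pairs * QUERY_FORMS_PER_PAIR  # 3,720 (page reports 3,717 run)
brain_task_pairs = N_COMPOUNDS * (N_BRAIN + N_COGNITIVE)    # 3,600
additional_checks = 41 + 21                                 # 62

print(target_pairs, target_searches, clinical_pairs, brain_task_pairs, additional_checks)
```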

Compound vocabulary: LSD; psilocybin; psilocin; mescaline; DMT; 5-MeO-DMT; bufotenin; ayahuasca; ibogaine; noribogaine; MDMA; MDA; ketamine; esketamine; arketamine; DOI; DOB; DOM; DOET; 2C-B; 2C-E; 2C-I; 2C-T-2; 2C-T-7; 5-MeO-DiPT; DiPT; DPT; LSA; AL-LAD; ETH-LAD; PRO-LAD; 1P-LSD; salvinorin A; lisuride; Bromo-DragonFLY; 25I-NBOMe; 25B-NBOMe; 25C-NBOMe; TMA; TMA-2
Target vocabulary: 5-HT2A; 5-HT2B; 5-HT2C; 5-HT1A; 5-HT1B; 5-HT1D; 5-HT1E; 5-HT1F; 5-HT5A; 5-HT6; 5-HT7; mGluR2 (GRM2); TAAR1; SERT (SLC6A4); NET (SLC6A2); DAT (SLC6A3); VMAT2 (SLC18A2); D1 receptor (DRD1); D2 receptor (DRD2); D3 receptor (DRD3); D4 receptor (DRD4); D5 receptor (DRD5); Alpha1A adrenergic receptor (ADRA1A); Alpha1B adrenergic receptor (ADRA1B); Alpha2A adrenergic receptor (ADRA2A); Alpha2B adrenergic receptor (ADRA2B); Alpha2C adrenergic receptor (ADRA2C); Beta1 adrenergic receptor (ADRB1); Beta2 adrenergic receptor (ADRB2); M1 muscarinic receptor (CHRM1); M2 muscarinic receptor (CHRM2); M3 muscarinic receptor (CHRM3); M4 muscarinic receptor (CHRM4); M5 muscarinic receptor (CHRM5); H1 receptor (HRH1); H2 receptor (HRH2); Sigma-1 receptor (SIGMAR1); Sigma-2 receptor (TMEM97); kappa opioid receptor (OPRK1); mu opioid receptor (OPRM1); delta opioid receptor (OPRD1); NMDA receptor; AMPA receptor; TrkB (NTRK2); CB1 receptor (CNR1); CB2 receptor (CNR2)
Brain region and network vocabulary: Prefrontal cortex; medial prefrontal cortex; orbitofrontal cortex; anterior cingulate cortex; posterior cingulate cortex; cingulate cortex; visual cortex; somatosensory cortex; insula; hippocampus; ventral hippocampus; dorsal hippocampus; amygdala; basolateral amygdala; central amygdala; striatum; ventral striatum; dorsal striatum; nucleus accumbens; caudate; putamen; thalamus; mediodorsal thalamus; reticular thalamus; claustrum; habenula; lateral habenula; dorsal raphe nucleus; raphe nucleus; ventral tegmental area; locus coeruleus; periaqueductal gray; default mode network; salience network; frontoparietal network; central executive network; limbic network; visual network; sensorimotor network; thalamo-cortical circuit; cortico-striatal circuit; cortical-subcortical circuit; fronto-limbic circuit; hippocampal-prefrontal circuit; amygdala-prefrontal circuit; mesolimbic reward circuit
Cognitive and behavioral vocabulary: Cognitive flexibility; reversal learning; probabilistic reversal learning; set shifting; attentional set shifting; Wisconsin Card Sorting Test; go/no-go task; stop-signal task; delay discounting; fear conditioning; fear extinction; extinction learning; threat processing; startle response; prepulse inhibition; conditioned freezing; reward learning; reinforcement learning; social reward learning; monetary incentive delay task; sucrose preference; conditioned place preference; self-administration; drug seeking; relapse behavior; emotional processing; emotion recognition; facial emotion recognition; social cognition; empathy; theory of mind; social behavior; social interaction; attention; continuous performance task; working memory; novel object recognition; spatial memory; forced swim test; tail suspension test; learned helplessness; chronic social defeat; open field test; elevated plus maze
Clinical vocabulary: Treatment-resistant depression; major depressive disorder; bipolar depression; persistent depressive disorder; post-traumatic stress disorder; complex post-traumatic stress disorder; alcohol use disorder; tobacco use disorder; nicotine dependence; opioid use disorder; cannabis use disorder; cocaine use disorder; methamphetamine use disorder; stimulant use disorder; substance use disorder; generalized anxiety disorder; social anxiety disorder; distress associated with life-threatening disease; obsessive-compulsive disorder; eating disorders; anorexia nervosa; bulimia nervosa; binge-eating disorder; autism spectrum disorder; demoralization; suicidal ideation; cluster headache; headache disorders; migraine; chronic pain; fibromyalgia

Molecular target query forms: {compound} {target} binding affinity Ki; {compound} {target} receptor pharmacology assay; {compound} {target} functional assay agonist antagonist
Brain/network query forms: {compound} {entity} functional connectivity; {compound} {entity} neuroimaging; {compound} {entity} circuit behavior
Cognitive-behavioral query forms: {compound} {entity} cognitive task; {compound} {entity} behavioral task; {compound} {entity} learning conditioning
Clinical query forms: {compound} {condition or outcome} randomized placebo; {compound} {condition or outcome} treatment outcome; {compound} {condition or outcome} clinical trial
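
The query forms above are templates, so a pair grid can be expanded mechanically. The sketch below does this for the molecular target forms with Python string formatting; the function name is hypothetical and only a few vocabulary entries are used. The clinical placeholder `{condition or outcome}` contains spaces and would need to be renamed (for example to `{condition}`) before it could be used with `str.format`.

```python
from itertools import product

# The three molecular target query forms listed above.
TARGET_QUERY_FORMS = (
    "{compound} {target} binding affinity Ki",
    "{compound} {target} receptor pharmacology assay",
    "{compound} {target} functional assay agonist antagonist",
)

def expand_pairs(compounds, targets, forms=TARGET_QUERY_FORMS):
    """Return one query string per (compound, target, form) combination."""
    return [
        form.format(compound=compound, target=target)
        for compound, target in product(compounds, targets)
        for form in forms
    ]

queries = expand_pairs(["LSD", "psilocin"], ["5-HT2A", "5-HT2C"])
for query in queries[:3]:
    print(query)
```

Two compounds and two targets give four pairs and twelve queries, the same three-to-one ratio as the 1,840 pairs and 5,520 searches in the molecular target grid.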

PRISMA Flow Diagram

Evidence synthesis needs a visible record of what entered the review and what happened next. This PRISMA-style flow follows candidate papers from discovery through screening, full-text access, conversion, and inclusion. Side boxes show the current reasons papers leave or pause the full-text path. Because duplicate records are identified before papers are added, there is no separate duplicate-removal step in this diagram.

Limits, Updates, and Reuse

Coverage depends on the search vocabulary, source indexing, DOI metadata, and access to full text. LLMs help scale screening and extraction, but their outputs stay tied to reviewable records. Evidence labels and risk-of-bias notes are descriptive unless a formal certainty or risk-of-bias workflow is added.

Validation

Evidence records are checked against the expected data format and evidence-policy rules before they are included in the public graph.

Known Limits

Search terms, provider indexing, missing abstracts, access restrictions, and PDF conversion quality all shape coverage.

Living Evidence

New searches can be scheduled, new records can be screened, and the public graph can be rebuilt when accepted evidence changes.

Data Boundary

The public graph publishes identifiers, metadata, structured evidence, provenance, and project-written summaries. Source texts remain governed by article licenses and copyright.

Corrections are part of the method. Missing papers, wrong edges, extraction errors, unclear labels, and scope suggestions all help improve the graph and can be reported by opening a GitHub issue.