Methods

How the Psychedelics Knowledge Graph is assembled: what literature is searched, how papers are screened, how evidence is extracted, and how everything is maintained.

Why

We are entering the era of agentic science, where AI agents can support many steps of knowledge generation. This creates an opportunity to address long-standing inefficiencies in academic workflows: findings are scattered across papers, evidence is split across disciplinary silos, and reviews require substantial expert labor, become static snapshots once published, and are difficult to query or reuse. As a result, human researchers struggle to see relationships, trends, gaps, and the overall shape of a field, while AI agents cannot easily use the evidence directly.

Agentic tools make it possible to build a new type of living evidence system, where literature discovery, screening, extraction, visualization, and updating are part of a continuous workflow. Findings can be converted into structured, provenance-rich records that remain linked to their source papers and evidence locators, so the evidence base can be searched, corrected, reused, and extended as the literature changes.

The Psychedelics Knowledge Graph applies this model to psychedelic research, where a fast-growing literature spans clinical, mechanistic, and translational work that is often read separately. Screened papers are converted into structured evidence records that power an interactive graph and dashboard. Human researchers can navigate visually from field-level patterns and gaps down to the source studies, while agents and analytic tools can query the same provenance-rich evidence directly.

Pipeline Overview

The workflow is designed to keep literature discovery broad while making graph inclusion conservative. Searches cast a wide net across clinical, biological, brain, behavioral, subjective, treatment-context, and real-world evidence. Each paper is then interpreted according to what it actually contains and how much source text is available, so the graph shows evidence that can be traced back to specific papers.

Define the Evidence Scope

The project starts with explicit vocabularies for psychedelic compounds and evidence domains: molecular targets, molecular pathways and cellular readouts, brain systems, cognitive and behavioral function, subjective experience, pharmacokinetics and exposure, intervention context, real-world use, clinical outcomes, functioning, and safety. These vocabularies define what the graph can represent and what the search needs to cover.

Discover Candidate Papers

PubMed and OpenAlex searches combine broad domain queries, focused compound-topic queries, and supplementary direct-pair checks. Results are matched by DOI and merged into a single paper library. Metadata enrichment uses different sources for different needs: PubMed, PMC, OpenAlex, Crossref, and Semantic Scholar for bibliographic records and abstracts; PubMed for publication-type labels; and Unpaywall, OpenAlex, and PMC for open-access full-text or PDF links.
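The DOI-based merge can be sketched as follows. This is a minimal illustration, not the project's actual code: the record fields (`source`, `doi`, `title`, `abstract`), the `normalize_doi` and `merge_by_doi` helpers, the placeholder DOI, and the choice to keep DOI-less records as separate entries are all assumptions.

```python
from typing import Optional

# Common prefixes that wrap a bare DOI in provider metadata.
DOI_PREFIXES = (
    "https://doi.org/",
    "http://doi.org/",
    "https://dx.doi.org/",
    "http://dx.doi.org/",
    "doi:",
)


def normalize_doi(raw: Optional[str]) -> Optional[str]:
    """Lowercase a DOI and strip a leading URL or 'doi:' prefix."""
    if not raw:
        return None
    doi = raw.strip().lower()
    for prefix in DOI_PREFIXES:
        if doi.startswith(prefix):
            doi = doi[len(prefix):]
            break
    return doi or None


def merge_by_doi(*result_sets: list[dict]) -> list[dict]:
    """Merge provider results into one library keyed by normalized DOI.

    The first record seen for a DOI is kept; later records only fill
    fields that are still empty. Records without a DOI cannot be matched
    this way, so they are appended unchanged (an assumption of this sketch).
    """
    library: dict[str, dict] = {}
    unmatched: list[dict] = []
    for results in result_sets:
        for record in results:
            doi = normalize_doi(record.get("doi"))
            if doi is None:
                unmatched.append(dict(record))
                continue
            merged = library.setdefault(doi, {"doi": doi, "sources": []})
            merged["sources"].append(record.get("source", "unknown"))
            for key, value in record.items():
                if key not in ("doi", "source") and value and not merged.get(key):
                    merged[key] = value
    return list(library.values()) + unmatched


# Placeholder records; the DOI is not a real article.
pubmed = [{"source": "pubmed", "doi": "10.1000/ABC123", "title": "Example trial"}]
openalex = [{"source": "openalex", "doi": "https://doi.org/10.1000/abc123",
             "abstract": "Example abstract."}]
library = merge_by_doi(pubmed, openalex)
```

Normalizing before matching matters because providers may report the same DOI in different cases or as a full URL; without it, one paper could enter the library twice.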

Screen and Route

Candidate papers are screened for clear psychedelic relevance using their titles and abstracts. Papers that remain in scope are routed by evidence domain, publication type, and available source text. This separates, for example, primary studies from reviews and meta-analyses, and lets the extraction step use different expectations for full-text and abstract-based evidence.
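A rough sketch of the routing idea, assuming each in-scope paper carries an evidence domain, a list of publication-type labels, and a flag for full-text availability. The `Paper` class, the `REVIEW_TYPES` set, and the route-label format are hypothetical names chosen for illustration.

```python
from dataclasses import dataclass

# Publication-type labels treated as secondary literature (illustrative set).
REVIEW_TYPES = {"review", "systematic review", "meta-analysis"}


@dataclass
class Paper:
    title: str
    publication_types: list[str]
    evidence_domain: str
    has_full_text: bool


def route(paper: Paper) -> str:
    """Combine domain, publication kind, and source-text depth into a route label."""
    labels = {t.lower() for t in paper.publication_types}
    kind = "review" if labels & REVIEW_TYPES else "primary"
    text = "fulltext" if paper.has_full_text else "abstract"
    return f"{paper.evidence_domain}/{kind}/{text}"


example = Paper("Example meta-analysis", ["Meta-Analysis"], "clinical_outcomes", False)
print(route(example))
```

Keeping the route as an explicit label lets the extraction step pick instructions per route rather than guessing the paper type again later.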

Extract Structured Evidence

Eligible papers are processed with LLM-based, route-specific extraction instructions. The model identifies candidate structured evidence: compounds, evidence domains, study type, assay or outcome details, result direction, and source locators. When PDFs are available, they are first converted with GROBID into structured TEI full-text artifacts so the extraction step can use article sections, tables, figures, and references as auditable evidence anchors. Abstract-only records are handled more conservatively because they expose less of the underlying evidence.
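To illustrate how TEI full text can supply auditable anchors, the sketch below walks a small TEI fragment and records a section name and positional locator for each paragraph. It uses only the Python standard library and the TEI namespace that GROBID output declares; the sample XML, the `section_anchors` helper, and the locator format are assumptions made for this example.

```python
import xml.etree.ElementTree as ET

# Namespace used by TEI documents, including GROBID output.
TEI_NS = {"tei": "http://www.tei-c.org/ns/1.0"}

# Placeholder fragment shaped like a TEI body; not taken from a real paper.
SAMPLE_TEI = """<TEI xmlns="http://www.tei-c.org/ns/1.0">
  <text><body>
    <div><head>Methods</head><p>Participants received a single dose.</p></div>
    <div><head>Results</head><p>Scores decreased at week 3.</p></div>
  </body></text>
</TEI>"""


def section_anchors(tei_xml: str) -> list[dict]:
    """Return one anchor per paragraph: section heading, locator, and text."""
    root = ET.fromstring(tei_xml)
    anchors = []
    for s_idx, div in enumerate(root.findall(".//tei:body/tei:div", TEI_NS)):
        head = div.find("tei:head", TEI_NS)
        section = head.text.strip() if head is not None and head.text else "(untitled)"
        for p_idx, para in enumerate(div.findall("tei:p", TEI_NS)):
            anchors.append({
                "section": section,
                "locator": f"div[{s_idx}]/p[{p_idx}]",  # hypothetical locator format
                "text": "".join(para.itertext()).strip(),
            })
    return anchors


anchors = section_anchors(SAMPLE_TEI)
for anchor in anchors:
    print(anchor["section"], anchor["locator"])
```

An extracted claim can then cite a locator such as the second section's first paragraph, which a reviewer can open and check against the source.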

Validate and Publish

Extracted evidence is checked for completeness, consistency, and source support before it appears in the public graph. Accepted records become graph relationships linking psychedelic compounds to targets, pathways, brain systems, tasks, clinical outcomes, safety outcomes, and study contexts. Records that are ambiguous or insufficiently supported are held back for review.
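The accept-or-hold decision can be sketched as a small set of checks. The field names, the allowed `result_direction` labels, and the sample records are assumptions for illustration; the project's actual rules are not specified here.

```python
# Fields a record must carry before it can be published (illustrative list).
REQUIRED_FIELDS = ("compound", "evidence_domain", "study_type",
                   "result_direction", "source_locators")
# Hypothetical label set for result direction.
ALLOWED_DIRECTIONS = {"increase", "decrease", "no_change", "mixed"}


def check_record(record: dict) -> list[str]:
    """Return a list of problems; an empty list means the record passes."""
    problems = [f"missing {name}" for name in REQUIRED_FIELDS if not record.get(name)]
    direction = record.get("result_direction")
    if direction and direction not in ALLOWED_DIRECTIONS:
        problems.append(f"unknown result_direction: {direction}")
    return problems


def triage(records: list[dict]) -> tuple[list[dict], list[tuple[dict, list[str]]]]:
    """Split records into accepted ones and ones held back with their problems."""
    accepted, held = [], []
    for record in records:
        problems = check_record(record)
        if problems:
            held.append((record, problems))
        else:
            accepted.append(record)
    return accepted, held


# Placeholder records for demonstration only.
good = {"compound": "psilocybin", "evidence_domain": "clinical_outcomes",
        "study_type": "randomized trial", "result_direction": "decrease",
        "source_locators": ["Results, paragraph 2"]}
bad = {"compound": "LSD", "evidence_domain": "molecular_targets",
       "study_type": "binding assay", "result_direction": "unclear",
       "source_locators": []}
accepted, held = triage([good, bad])
```

Returning the list of problems alongside each held record gives reviewers a starting point instead of a bare rejection.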

Maintain the Living Graph

The graph is designed to evolve as the literature grows. New papers, corrected metadata, improved extraction, and community feedback can all update the evidence base. Each public build is versioned, and release notes summarize what changed so readers can understand how the graph evolves over time.
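One way release notes could be derived is by comparing two builds keyed by record identifier. The sketch below assumes each build is a dictionary from a record ID to its structured content; the `summarize_changes` helper, the IDs, and the record fields are hypothetical.

```python
def summarize_changes(previous: dict[str, dict], current: dict[str, dict]) -> dict[str, list[str]]:
    """List record IDs that were added, removed, or changed between two builds."""
    added = sorted(current.keys() - previous.keys())
    removed = sorted(previous.keys() - current.keys())
    changed = sorted(
        key for key in current.keys() & previous.keys()
        if current[key] != previous[key]
    )
    return {"added": added, "removed": removed, "changed": changed}


# Placeholder builds for demonstration only.
build_1 = {"r1": {"direction": "increase"}, "r2": {"direction": "mixed"}}
build_2 = {"r1": {"direction": "decrease"}, "r3": {"direction": "no_change"}}
notes = summarize_changes(build_1, build_2)
print(notes)
```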

Literature Search Strategy

The search is organized around evidence domains: molecular targets, molecular pathways and cellular readouts, brain systems, cognitive and behavioral functions, subjective experience, pharmacokinetics and exposure, intervention delivery context, real-world use and public health, clinical outcomes and safety, and clinical studies that measure biological or behavioral endpoints.

PubMed is used for curated biomedical indexing, and OpenAlex for broader scholarly coverage across journals, books, and preprints. Every search follows the same three-block structure: compound terms, domain-specific entity or outcome terms, and evidence-context terms. Terms inside each block are joined with OR; the blocks are joined with AND. Broad modules cover domain families, while focused modules target well-studied compound-topic combinations so that important papers are not captured only through broad queries.
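The three-block structure can be sketched as a small query builder. The helper names and the quoting of multi-word phrases are assumptions, and provider-specific syntax such as field tags is left out because the exact query strings are not given here.

```python
def or_block(terms: list[str]) -> str:
    """Join the terms of one block with OR, quoting multi-word phrases."""
    quoted = [f'"{term}"' if " " in term else term for term in terms]
    return "(" + " OR ".join(quoted) + ")"


def build_query(compounds: list[str], entities: list[str], evidence: list[str]) -> str:
    """Join the compound, entity, and evidence blocks with AND."""
    return " AND ".join(or_block(block) for block in (compounds, entities, evidence))


query = build_query(
    compounds=["psilocybin", "LSD"],
    entities=["5-HT2A", "serotonin receptor"],
    evidence=["binding", "Ki"],
)
print(query)
# (psilocybin OR LSD) AND (5-HT2A OR "serotonin receptor") AND (binding OR Ki)
```

A paper matches only if it mentions at least one term from every block, which is what keeps a broad compound list from returning papers with no relevant entity or evidence context.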

Molecular targets (10 grouped term combinations)
  1. Broad target-family modules

    Modules
    serotonin receptors; monoamine transporters; glutamate/NMDA/AMPA/mGluR2 targets; opioid, sigma, and TAAR targets; plasticity, TrkB, and BDNF target evidence
    Compound block
    classic psychedelic, entactogen, dissociative, psychoplastogen, and compound-specific terms including psilocybin, psilocin, LSD, DMT, 5-MeO-DMT, mescaline, MDMA, ketamine, salvinorin A, ibogaine, and noribogaine
    Entity block
    5-HT receptor families; SERT, DAT, NET, and VMAT2; NMDA, AMPA, and mGluR2 receptors; kappa and mu opioid receptors; sigma-1 receptor; TAAR1; TrkB, BDNF, and neuroplasticity targets
    Evidence block
    binding; affinity; Ki; Kd; IC50; EC50; radioligand; functional assay; agonist; antagonist; partial agonist; signaling
  2. Focused compound-target modules

    Modules
    LSD-5-HT2A; psilocin/psilocybin-5-HT2A; MDMA transporters; ketamine-NMDA; salvinorin A-kappa opioid receptor
    Query emphasis
    narrower compound and target names paired with binding, receptor pharmacology, transporter, channel-blocker, functional-assay, beta-arrestin, and G-protein terms
Molecular pathways and cellular readouts (5 grouped term combinations)
  1. Broad molecular/pathway modules

    Modules
    molecular pathway plasticity; gene expression and transcriptomics; inflammatory and neuroendocrine molecular readouts
    Entity block
    BDNF; TrkB; NTRK2; mTOR; ERK; MAPK; CREB; Akt; synaptogenesis; dendritic spine; synaptic plasticity; c-Fos; Arc; immediate early genes; transcriptome; epigenetic terms; cytokines; inflammation; cortisol; HPA axis
    Evidence block
    signaling; phosphorylation; expression; protein expression; gene expression; transcriptomics; western blot; qPCR; RNA-seq; immunohistochemistry; ELISA; molecular readout; plasticity; synaptic; neuronal
  2. Focused molecular/pathway modules

    Modules
    ketamine/psychedelic mTOR-synaptogenesis; psychedelic immediate early genes
    Query emphasis
    specific compound-pathway combinations involving ketamine, psilocybin, LSD, DMT, MDMA, DOI, mTOR, BDNF, TrkB, ERK, Akt, c-Fos, Fos, Arc, Egr1, and gene-expression assays
Brain systems, circuits, and neurophysiology (10 grouped term combinations)
  1. Broad brain-system modules

    Modules
    systems neuroimaging and connectivity; brain regions and named circuits; PET, receptor occupancy, and metabolism; EEG, MEG, and neurophysiology
    Entity block
    default mode, salience, frontoparietal, central executive, limbic, visual, and sensorimotor networks; prefrontal cortex; anterior and posterior cingulate; hippocampus; amygdala; thalamus; claustrum; striatum; nucleus accumbens; insula; thalamo-cortical, cortico-striatal, fronto-limbic, hippocampal-prefrontal, amygdala-prefrontal, and mesolimbic reward circuits
    Evidence block
    fMRI; BOLD; resting-state; functional connectivity; effective connectivity; PET; receptor occupancy; FDG; cerebral blood flow; EEG; MEG; neural oscillations; ERP; entropy; electrophysiology; c-Fos; neuronal activity
  2. Focused brain-system modules

    Modules
    psilocybin-default mode connectivity; LSD-thalamocortical connectivity; DMT EEG/fMRI dynamics; ayahuasca-default mode connectivity; psilocybin PET/5-HT2A occupancy; ketamine prefrontal-hippocampal circuitry
    Query emphasis
    compound-specific network, circuit, imaging, receptor-occupancy, neural-dynamics, and electrophysiology terms
Cognitive and behavioral function (4 grouped term combinations)
  1. Broad task and translational behavior modules

    Modules
    cognitive and affective task domains; translational behavioral assays
    Entity block
    cognitive flexibility; reversal learning; set shifting; fear conditioning; fear extinction; reward learning; social reward; social cognition; empathy; emotion recognition; attention; impulsivity; prepulse inhibition; working memory; forced swim; tail suspension; sucrose preference; social defeat; elevated plus maze; conditioned place preference; self-administration; relapse; head-twitch response
    Evidence block
    task; behavior; behavioural; learning; conditioning; performance; paradigm; mouse; rat; rodent; animal model; behavioral assay; in vivo; c-Fos
  2. Focused cognitive-behavioral modules

    Modules
    MDMA social reward and cognition; psychedelic fear extinction and flexibility
    Query emphasis
    compound-specific social reward, social cognition, empathy, emotion recognition, fear extinction, reversal learning, learning, conditioning, and performance terms
Subjective experience and pharmacokinetics (4 grouped term combinations)
  1. Subjective experience and acute-effect modules

    Modules
    acute subjective effects and phenomenology; subjective-effect measures
    Entity block
    subjective effects; phenomenology; mystical experience; ego dissolution; altered state; oceanic boundlessness; challenging experience; anxiety; insight; emotional breakthrough; intensity; time perception; visual effects; questionnaire and rating-scale terms
    Evidence block
    acute effect; subjective rating; questionnaire; scale; psychometric; dose-response; controlled administration; human laboratory; experience report; outcome measure
  2. Pharmacokinetics and exposure modules

    Modules
    pharmacokinetics and exposure; metabolite and exposure measurement
    Entity block
    pharmacokinetics; ADME; absorption; distribution; metabolism; elimination; clearance; half-life; bioavailability; plasma concentration; blood concentration; serum concentration; metabolite; psilocin; noribogaine; route of administration; dose; protein binding
    Evidence block
    LC-MS; mass spectrometry; plasma; serum; blood; urine; sampling; concentration-time curve; Cmax; Tmax; AUC; pharmacokinetic model; analytical measurement
Intervention context and real-world use (4 grouped term combinations)
  1. Intervention delivery-context modules

    Modules
    psychotherapy, preparation, integration, set and setting, session structure, psychological support, aftercare, blinding, training, and manualized-delivery terms; focused set/setting and therapeutic-support details
    Context block
    psychotherapy; psychological support; preparation; integration; set and setting; therapeutic alliance; music; guide; facilitator; therapist; session structure; dosing session; aftercare; manual; training; expectancy; blinding
    Evidence block
    clinical; feasibility; acceptability; qualitative; protocol; trial design; outcome; safety; adverse experience; implementation detail; supportive therapy
  2. Real-world use and public-health modules

    Modules
    epidemiology, surveys, naturalistic use, lifetime or past-year use, drug checking, poison-control and emergency records, toxicity, harm reduction; focused naturalistic, community, retreat, and microdosing use
    Context block
    naturalistic use; community use; retreat; ceremony; microdosing; epidemiology; prevalence; survey; lifetime use; past-year use; poison control; emergency department; toxicity; adverse experience; harm reduction; drug checking
    Evidence block
    observational; population; survey; cohort; registry; case series; risk; safety; mental health; wellbeing; adverse event; intoxication; exposure; public health
Clinical outcomes, symptoms, functioning, and safety (17 grouped term combinations)
  1. Broad clinical outcome modules

    Modules
    clinical class core; depression spectrum; PTSD and trauma; substance use and addiction; anxiety, distress, and palliative care; pain, headache, and migraine; OCD, eating disorders, and autism
    Clinical block
    depression; major depressive disorder; treatment-resistant depression; PTSD; substance use disorder; alcohol, tobacco, opioid, cocaine, methamphetamine, stimulant, and cannabis use disorders; generalized and social anxiety; distress associated with life-threatening disease; OCD; eating disorders; autism spectrum disorder; headache disorders; migraine; chronic pain; fibromyalgia
    Evidence block
    clinical trial; randomized; randomised; placebo; open-label; treatment; therapy; efficacy; safety; tolerability; outcome; follow-up
  2. Focused clinical outcome modules

    Modules
    psilocybin-depression; MDMA-PTSD; ketamine-depression-suicidality; ibogaine-opioid/substance use disorder; LSD-alcohol/anxiety
    Query emphasis
    narrow compound-condition combinations paired with clinical trial, treatment, psychotherapy, abstinence, detoxification, withdrawal, craving, relapse, safety, and outcome terms
  3. Symptom and functioning modules

    Modules
    symptoms, functioning, and quality of life; suicidality, anhedonia, sleep, and function; craving, relapse, and functioning
    Outcome block
    suicidal ideation; suicidality; anhedonia; craving; withdrawal; relapse; sleep; insomnia; pain intensity; quality of life; wellbeing; social, occupational, emotional, and functional impairment terms
  4. Safety modules

    Modules
    safety, tolerability, and adverse events; cardiovascular, mania, psychosis, and HPPD safety
    Outcome block
    safety; tolerability; adverse events; serious adverse events; side effects; cardiovascular; blood pressure; heart rate; hypertension; mania; psychosis; dissociation; hallucinogen persisting perception disorder; flashback
Clinical studies with biological and behavioral endpoints (6 grouped term combinations)
  1. Broad clinical endpoint modules

    Modules
    clinical population modules with brain, molecular, cognitive, and behavioral endpoints; clinical outcome endpoint modules
    Clinical block
    depression; major depressive disorder; treatment-resistant depression; PTSD; anxiety; substance use disorder; addiction; alcohol use disorder; opioid use disorder; craving; suicidal ideation; anhedonia; quality of life; patient populations; clinical outcomes
    Endpoint block
    fMRI; BOLD; functional connectivity; default mode network; amygdala; prefrontal cortex; PET; receptor occupancy; EEG; neural oscillations; cognitive task; cognitive flexibility; emotional processing; BDNF; cortisol; cytokines; molecular readouts
  2. Focused clinical endpoint modules

    Modules
    psilocybin depression brain and molecular endpoints; ketamine depression molecular endpoints; psilocybin depression brain/molecular endpoints; MDMA PTSD social-brain endpoints
    Query emphasis
    compound-condition combinations paired with imaging, connectivity, BDNF, cortisol, inflammation, cytokines, EEG, cognitive function, social cognition, emotion recognition, and brain-region terms
Supplementary targeted direct-pair searches (very many pair combinations)

The grouped domain searches are the primary discovery instrument. Direct-pair searches are used as a supplementary layer for selected compound-entity and compound-outcome combinations: larger generated grids were run for bounded target and clinical vocabularies, while later domain additions used targeted pair checks rather than an exhaustive cross-product of every possible compound and concept.

| Pair layer | Pair space | How it was used |
| --- | --- | --- |
| Molecular target grid | Canonical compounds paired with the molecular target vocabulary | 1,840 compound-target pairs were run as 5,520 binding, receptor-pharmacology, and functional-assay searches. |
| Clinical evidence grid | Canonical compounds paired with the clinical evidence vocabulary | 1,240 compound-clinical evidence pairs were run as 3,717 clinical-trial, randomized/placebo, and treatment-outcome searches. |
| Brain, network, and task pair files | Canonical compounds paired with brain-region/network, circuit, and cognitive-behavioral task concepts | Generated as versioned search artifacts for 3,600 newly added brain/network/task pairs, but not run as a complete direct-pair discovery search in this build. |
| Additional direct-pair check | Selected sparse molecular, brain/network, cognitive-behavioral, symptom, functioning, and safety combinations | 62 selected pair searches were run after the additional domain searches: 41 molecular/brain/cognitive/pathway searches and 21 clinical symptom/function/safety searches. |
Compound vocabulary
LSD; psilocybin; psilocin; mescaline; DMT; 5-MeO-DMT; bufotenin; ayahuasca; ibogaine; noribogaine; MDMA; MDA; ketamine; esketamine; arketamine; DOI; DOB; DOM; DOET; 2C-B; 2C-E; 2C-I; 2C-T-2; 2C-T-7; 5-MeO-DiPT; DiPT; DPT; LSA; AL-LAD; ETH-LAD; PRO-LAD; 1P-LSD; salvinorin A; lisuride; Bromo-DragonFLY; 25I-NBOMe; 25B-NBOMe; 25C-NBOMe; TMA; TMA-2
Target vocabulary
5-HT2A; 5-HT2B; 5-HT2C; 5-HT1A; 5-HT1B; 5-HT1D; 5-HT1E; 5-HT1F; 5-HT5A; 5-HT6; 5-HT7; mGluR2 (GRM2); TAAR1; SERT (SLC6A4); NET (SLC6A2); DAT (SLC6A3); VMAT2 (SLC18A2); D1 receptor (DRD1); D2 receptor (DRD2); D3 receptor (DRD3); D4 receptor (DRD4); D5 receptor (DRD5); Alpha1A adrenergic receptor (ADRA1A); Alpha1B adrenergic receptor (ADRA1B); Alpha2A adrenergic receptor (ADRA2A); Alpha2B adrenergic receptor (ADRA2B); Alpha2C adrenergic receptor (ADRA2C); Beta1 adrenergic receptor (ADRB1); Beta2 adrenergic receptor (ADRB2); M1 muscarinic receptor (CHRM1); M2 muscarinic receptor (CHRM2); M3 muscarinic receptor (CHRM3); M4 muscarinic receptor (CHRM4); M5 muscarinic receptor (CHRM5); H1 receptor (HRH1); H2 receptor (HRH2); Sigma-1 receptor (SIGMAR1); Sigma-2 receptor (TMEM97); kappa opioid receptor (OPRK1); mu opioid receptor (OPRM1); delta opioid receptor (OPRD1); NMDA receptor; AMPA receptor; TrkB (NTRK2); CB1 receptor (CNR1); CB2 receptor (CNR2)
Brain region and network vocabulary
Prefrontal cortex; medial prefrontal cortex; orbitofrontal cortex; anterior cingulate cortex; posterior cingulate cortex; cingulate cortex; visual cortex; somatosensory cortex; insula; hippocampus; ventral hippocampus; dorsal hippocampus; amygdala; basolateral amygdala; central amygdala; striatum; ventral striatum; dorsal striatum; nucleus accumbens; caudate; putamen; thalamus; mediodorsal thalamus; reticular thalamus; claustrum; habenula; lateral habenula; dorsal raphe nucleus; raphe nucleus; ventral tegmental area; locus coeruleus; periaqueductal gray; default mode network; salience network; frontoparietal network; central executive network; limbic network; visual network; sensorimotor network; thalamo-cortical circuit; cortico-striatal circuit; cortical-subcortical circuit; fronto-limbic circuit; hippocampal-prefrontal circuit; amygdala-prefrontal circuit; mesolimbic reward circuit
Cognitive and behavioral vocabulary
Cognitive flexibility; reversal learning; probabilistic reversal learning; set shifting; attentional set shifting; Wisconsin Card Sorting Test; go/no-go task; stop-signal task; delay discounting; fear conditioning; fear extinction; extinction learning; threat processing; startle response; prepulse inhibition; conditioned freezing; reward learning; reinforcement learning; social reward learning; monetary incentive delay task; sucrose preference; conditioned place preference; self-administration; drug seeking; relapse behavior; emotional processing; emotion recognition; facial emotion recognition; social cognition; empathy; theory of mind; social behavior; social interaction; attention; continuous performance task; working memory; novel object recognition; spatial memory; forced swim test; tail suspension test; learned helplessness; chronic social defeat; open field test; elevated plus maze
Clinical vocabulary
Treatment-resistant depression; major depressive disorder; bipolar depression; persistent depressive disorder; post-traumatic stress disorder; complex post-traumatic stress disorder; alcohol use disorder; tobacco use disorder; nicotine dependence; opioid use disorder; cannabis use disorder; cocaine use disorder; methamphetamine use disorder; stimulant use disorder; substance use disorder; generalized anxiety disorder; social anxiety disorder; distress associated with life-threatening disease; obsessive-compulsive disorder; eating disorders; anorexia nervosa; bulimia nervosa; binge-eating disorder; autism spectrum disorder; demoralization; suicidal ideation; cluster headache; headache disorders; migraine; chronic pain; fibromyalgia
Molecular target query forms
{compound} {target} binding affinity Ki; {compound} {target} receptor pharmacology assay; {compound} {target} functional assay agonist antagonist
Brain/network query forms
{compound} {entity} functional connectivity; {compound} {entity} neuroimaging; {compound} {entity} circuit behavior
Cognitive-behavioral query forms
{compound} {entity} cognitive task; {compound} {entity} behavioral task; {compound} {entity} learning conditioning
Clinical query forms
{compound} {condition or outcome} randomized placebo; {compound} {condition or outcome} treatment outcome; {compound} {condition or outcome} clinical trial
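The query forms above expand into a search grid by filling each template for every compound-entity pair. Below is a minimal sketch using the three molecular target query forms; the `pair_queries` helper and the short example lists are illustrative, not the project's generator.

```python
from itertools import product

# The three molecular target query forms listed above.
TARGET_QUERY_FORMS = [
    "{compound} {target} binding affinity Ki",
    "{compound} {target} receptor pharmacology assay",
    "{compound} {target} functional assay agonist antagonist",
]


def pair_queries(compounds: list[str], targets: list[str],
                 forms: list[str] = TARGET_QUERY_FORMS) -> list[str]:
    """Fill every query form for every compound-target pair."""
    return [
        form.format(compound=compound, target=target)
        for compound, target in product(compounds, targets)
        for form in forms
    ]


# Short excerpts of the vocabularies, to keep the example small.
compounds = ["LSD", "psilocybin"]
targets = ["5-HT2A", "5-HT1A", "TAAR1"]
queries = pair_queries(compounds, targets)
print(len(queries))  # 2 compounds x 3 targets x 3 forms = 18
print(queries[0])    # LSD 5-HT2A binding affinity Ki
```

Applied to the full lists above, 40 compounds and 46 targets give 1,840 pairs, and three forms per pair give 5,520 searches, which matches the molecular target grid.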

PRISMA Flow Diagram

Evidence synthesis needs a visible record of what entered the review and what happened next. This PRISMA-style flow follows candidate papers from discovery through screening, full-text access, conversion, and inclusion. Side boxes show the current reasons papers leave or pause the full-text path. Because duplicate records are identified before papers are added, there is no separate duplicate-removal step in this diagram.
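The counts behind such a flow can be tallied from per-paper status records. In this sketch, each paper stores the furthest stage it reached and, if it left or paused there, a reason; the stage names, the `exit_reason` field, and the sample data are hypothetical.

```python
from collections import Counter

# Ordered stages of the flow (illustrative names).
STAGES = ["discovered", "screened_in", "full_text_available", "converted", "included"]


def flow_counts(papers: list[dict]) -> tuple[dict[str, int], dict[tuple[str, str], int]]:
    """Count papers reaching each stage and tally exit reasons per stage."""
    reached: Counter = Counter()
    exits: Counter = Counter()
    for paper in papers:
        last = STAGES.index(paper["stage"])
        # A paper that reached a stage also passed through every earlier stage.
        for stage in STAGES[: last + 1]:
            reached[stage] += 1
        if paper.get("exit_reason"):
            exits[(paper["stage"], paper["exit_reason"])] += 1
    return {stage: reached[stage] for stage in STAGES}, dict(exits)


# Placeholder papers for demonstration only.
papers = [
    {"stage": "included"},
    {"stage": "screened_in", "exit_reason": "no open-access full text"},
    {"stage": "discovered", "exit_reason": "out of scope"},
    {"stage": "full_text_available", "exit_reason": "conversion failed"},
]
main_flow, side_boxes = flow_counts(papers)
print(main_flow)
```

The first dictionary corresponds to the main column of the diagram and the second to the side boxes.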

Limits, Updates, and Reuse

Coverage depends on the search vocabulary, source indexing, DOI metadata, and access to full text. LLMs help scale screening and extraction, but their outputs stay tied to reviewable records. Evidence labels and risk-of-bias notes are descriptive unless a formal certainty or risk-of-bias workflow is added.

Validation

Evidence records are checked against the expected data format and evidence-policy rules before they are included in the public graph.

Known Limits

Search terms, provider indexing, missing abstracts, access restrictions, and PDF conversion quality all shape coverage.

Living Evidence

New searches can be scheduled, new records can be screened, and the public graph can be rebuilt when accepted evidence changes.

Data Boundary

The public graph publishes identifiers, metadata, structured evidence, provenance, and project-written summaries. Source texts remain governed by article licenses and copyright.

Corrections are part of the method. Reports of missing papers, wrong edges, extraction errors, and unclear labels, along with scope suggestions, all help improve the graph.

Open a GitHub issue