This repository contains the data files and analysis scripts underlying Chapters 2
and 4. The two top-level folders correspond to the two chapters and are also
provided as compressed archives (`chapter2.tar.gz`, `chapter4.tar.gz`).
- **Chapter 2** — Evolution of root-related gene families across Lamiales /
Orobanchaceae, combining OrthoFinder orthogroup counts, phylogenetic
comparative analyses (PGLS, CAFE5 gene-family evolution) and below-ground
(root/haustorium) transcriptomics.
- **Chapter 4** — Functional study of haustorium biology in the parasitic plant
*Striga hermonthica*, combining phenotypic bioassays and a time-course /
severing RNA-seq experiment.
File formats: `.txt`, `.tsv`, `.csv` and `.matrix` files are tab- or comma-delimited
plain text; `.nwk` is a Newick phylogenetic tree; `.xlsx` and `.xls` are Excel
workbooks; `.R` files are R scripts; `.py` files are Python 3 scripts; `.sh` is a
shell script. Hidden helper files (`.DS_Store`, `.Rhistory`) are macOS/R artefacts
and can be ignored.
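
Because the plain-text tables may be either tab- or comma-delimited, a reader can pick the delimiter from the header line before parsing. The following is a minimal sketch, not part of the repository's scripts: the function name `read_delimited` and the demo rows are hypothetical, and the heuristic (whichever of tab or comma is more frequent in the first line) is an assumption that would not suit `ROOT_OGs.txt`, whose rows contain comma-separated ID lists.

```python
import csv
import io


def read_delimited(handle):
    """Read a tab- or comma-delimited text stream into a list of rows.

    Assumption: the first line is representative, and the delimiter is
    whichever of tab or comma occurs more often in it (tab wins ties).
    """
    first_line = handle.readline()
    delimiter = "\t" if first_line.count("\t") >= first_line.count(",") else ","
    handle.seek(0)  # rewind so the first line is parsed with the rest
    return list(csv.reader(handle, delimiter=delimiter))


# Hypothetical in-memory examples standing in for files on disk.
tab_rows = read_delimited(io.StringIO("OG\tsp1\tsp2\nOG0000001\t3\t5\n"))
comma_rows = read_delimited(io.StringIO("sample,width,hair\ns1,10,2\n"))
print(tab_rows[0], comma_rows[0])
```

For real files, the same function can be called on `open(path, newline="")`.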
---
## chapter2/ — Root gene-family evolution
### chapter2/Orthofinder_results/ — Orthogroup inference and root gene set
Comparative genomics across ~64 Lamiales species (plus outgroups), based on an
OrthoFinder run.
- **`ALL_OGs_speciescount.txt`** (61 MB) — Full matrix of orthogroup (OG) gene
counts per species; rows = orthogroups, columns = species.
- **`ROOT_OGs.txt`** (7.3 MB) — Membership list of the curated "root" orthogroups:
each row is an OG followed by the comma-separated protein IDs assigned to it.
- **`ROOT_OGs_speciescount.txt`** — Per-species gene counts restricted to the 254
root-related orthogroups.
- **`species_tree.checked_ultrametric.nwk`** — Time-calibrated (ultrametric)
species tree in Newick format used for all phylogenetic comparative analyses.
- **`species_family_order.txt`** — Taxonomic lookup table (species, order, family,
trophism: autotroph/parasite) for the analysed taxa.
- **`AT_feature_table.txt`** (3.0 MB) — *Arabidopsis thaliana* protein feature
table (accession, name, symbol, AT locus) used to annotate orthogroups.
- **`TAIR_NCBI_root_IDs.xlsx`** — Curated *Arabidopsis* root-gene reference list
(TAIR ID, gene symbols, name, function).
- **`OF_rootOGs_heatmap.R`** — R script pulling *A. thaliana* genes from the root
OGs and drawing the orthogroup heatmap.
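
The membership layout described for `ROOT_OGs.txt` (an OG followed by comma-separated protein IDs) can be read into a dictionary along these lines. This is a sketch under assumptions: it supposes a tab separates the OG identifier from the ID list, treats any further tab-separated fields as more ID lists, and does not skip a header row if one exists. The function name and the protein IDs in the demo are hypothetical.

```python
def parse_og_membership(lines):
    """Map each orthogroup ID to its list of protein IDs.

    Assumed layout: '<OG id><TAB><id1>, <id2>, ...' with optional extra
    tab-separated fields that also hold comma-separated IDs.
    """
    membership = {}
    for line in lines:
        line = line.rstrip("\n")
        if not line.strip():
            continue  # ignore blank lines
        og_id, *fields = line.split("\t")
        proteins = [
            protein.strip()
            for field in fields
            for protein in field.split(",")
            if protein.strip()
        ]
        membership[og_id] = proteins
    return membership


# Hypothetical rows; real IDs will differ.
demo_lines = [
    "OG0000012\tprotA.1, protB.1, protC.2\n",
    "OG0000345\tprotD.1\tprotE.1, protF.3\n",
]
og_members = parse_og_membership(demo_lines)
print({og: len(ids) for og, ids in og_members.items()})
```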
### chapter2/PGLS/ — Phylogenetic comparative analysis & gene-family evolution
Pipeline relating root-OG gene-family sizes to lifestyle traits (heterotrophy,
mycorrhization) using phylogenetic generalised least squares (PGLS) and CAFE5.
`PGLS/scripts/`
- **`01_prepare_inputs.py`** — Reconciles the OrthoFinder count table, ultrametric
tree and trait files; emits CAFE5-ready and PGLS-ready inputs.
- **`02_pgls_ancestral.py`** — PGLS of root-related orthogroups against traits,
plus ancestral-state / branch-change reconstruction of total root content.
- **`02b_sl_pathway.py`** — PGLS of strigolactone (SL) receptor-pathway OGs vs
heterotrophy/mycorrhization (tested from the full count matrix).
- **`03_figures.py`** — Generates the summary figures (vector PDF/EPS).
- **`run_cafe.sh`** — Driver script running CAFE5 gene-family evolution with
retried lambda auto-estimation on the deep species tree.
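
Reconciling a count table with a tree, as `01_prepare_inputs.py` is described as doing, amounts to comparing the tree's tip labels with the table's species columns. The sketch below illustrates that idea only and is not taken from the script: the function names are hypothetical, and the regular expression assumes unquoted tip labels without spaces or brackets.

```python
import re


def newick_tip_labels(newick):
    """Return tip labels from a Newick string.

    Assumption: labels are unquoted. A tip label is any name that directly
    follows '(' or ','; internal node labels follow ')' and are skipped.
    """
    return re.findall(r"[(,]\s*([^(),:;\s]+)", newick)


def reconcile(tip_labels, table_species):
    """Split species into shared, tree-only and table-only groups."""
    tips, cols = set(tip_labels), set(table_species)
    return sorted(tips & cols), sorted(tips - cols), sorted(cols - tips)


# Hypothetical toy tree and column names.
toy_tree = "((spA:1.0,spB:1.0):2.0,spC:3.0);"
tips = newick_tip_labels(toy_tree)
shared, tree_only, table_only = reconcile(tips, ["spA", "spC", "spD"])
print(shared, tree_only, table_only)
```

Species that appear in only one of the two inputs would need to be pruned from the tree or dropped from the table before a comparative analysis.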
`PGLS/tables/`
- **`pgls_rootOG_results.tsv`** — PGLS results per root OG (slopes, t-values,
p-values, Pagel's lambda, FDR) for the heterotrophy and mycorrhization models.
- **`pgls_SLpathway_results.tsv`** — PGLS results for the strigolactone
receptor-pathway orthogroups, with per-group mean counts.
- **`branch_changes_total_root.tsv`** — Inferred change in total root gene content
along each tree branch (parent vs child node totals).
- **`total_root_content.tsv`** — Reconstructed total root gene content per taxon.
- **`AT_OG_info.xlsx`** — Mapping of *Arabidopsis* genes to orthogroups with TAIR
IDs, symbols, names and functions.
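
The per-branch quantity in `branch_changes_total_root.tsv` compares parent and child node totals. A minimal sketch of that arithmetic follows, assuming the change is defined as child total minus parent total; the sign convention, the function name and the node names are assumptions for illustration rather than details confirmed by the table.

```python
def branch_changes(node_totals, parent_of):
    """Compute the change in total content along each branch.

    node_totals: dict mapping node name -> reconstructed total.
    parent_of:   dict mapping child node -> parent node (root is absent).
    Assumption: change = child total - parent total.
    """
    changes = []
    for child, parent in parent_of.items():
        delta = node_totals[child] - node_totals[parent]
        changes.append((parent, child, delta))
    return changes


# Hypothetical three-node example: one internal node with two tips.
totals = {"N1": 300.0, "spA": 280.0, "spB": 315.5}
parents = {"spA": "N1", "spB": "N1"}
deltas = branch_changes(totals, parents)
for parent, child, delta in deltas:
    print(f"{parent} -> {child}: {delta:+.1f}")
```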
### chapter2/DEG_analysis/ — Differential expression (root vs haustorium)
- **`MeAr.gene.counts.expr.txt`** — Gene-level expression count table for
*Melampyrum* root vs root+haustorium samples.
- **`MeAr_StHe_thesis.R`** — R script for the *Melampyrum* (MeAr) / *Striga* (StHe)
differential-expression analysis.
- **`MeAr_trinotate_annotation_report.xlsx`** (89 MB) — Trinotate functional
annotation report for the *Melampyrum* assembly.
- **`StHeBC3_trinotate_annotation_report.xlsx`** (17 MB) — Trinotate functional
annotation report for the *Striga hermonthica* (BC3) assembly.
### chapter2/belowground_transcriptomics/ — Cross-species root/haustorium expression
Per-species gene-expression count matrices used for the 254-OG WGCNA / below-ground
expression comparison. For each species there is a full matrix
(`*.gene.counts.expr.txt`) and a root-restricted subset (`*.gene.counts.expr.root.txt`).
Species are identified by short codes, e.g. `Avog` = *Alectra vogelii*, `StHe` =
*Striga hermonthica*, `MeAr` = *Melampyrum*, `LiPhi` = *Lindenbergia* (autotroph
reference). The remaining codes (`PhJa`, `Pram`, `ReEl`/`ReEr`, `RhFi`, `RhinMin`)
are not expanded here; confirm the exact binomials against the manuscript taxon
table.
- **`<code>.gene.counts.expr.txt`** — Full expression count matrix (all tissues),
one per species: `Avog`, `LiPhi`, `MeAr`, `PhJa`, `Pram`, `ReEr`, `RhFi`,
`RhinMin`, `StHe`.
- **`<code>.gene.counts.expr.root.txt`** — Root-tissue-restricted count matrix,
one per species: `Avog`, `LiPhi`, `MeAr`, `PhJa`, `Pram`, `ReEl`, `RhFi`,
`RhinMin`, `StHe`.
- **`trait_table_sp.csv`** — Per-sample root traits (root width µm, root hair µm,
parasitism category) used as metadata.
- **`BG_expression_WGCNA.R`** — R script building the 254-OG expression matrices
and running the WGCNA co-expression analysis.
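
Given the two filename patterns above, each full matrix can be paired with its root-restricted subset by species code, which also surfaces codes that lack a counterpart. The sketch below uses the codes listed in this section; the function name `pair_matrices` is hypothetical and the code works on filenames only, not file contents.

```python
FULL_SUFFIX = ".gene.counts.expr.txt"
ROOT_SUFFIX = ".gene.counts.expr.root.txt"


def pair_matrices(filenames):
    """Group species codes by which matrix files exist for them.

    Returns (codes with both files, full-only codes, root-only codes).
    """
    full, root = set(), set()
    for name in filenames:
        if name.endswith(ROOT_SUFFIX):
            root.add(name[: -len(ROOT_SUFFIX)])
        elif name.endswith(FULL_SUFFIX):
            full.add(name[: -len(FULL_SUFFIX)])
    return sorted(full & root), sorted(full - root), sorted(root - full)


# Codes as listed in this README for the full and root-restricted matrices.
full_codes = ["Avog", "LiPhi", "MeAr", "PhJa", "Pram", "ReEr", "RhFi", "RhinMin", "StHe"]
root_codes = ["Avog", "LiPhi", "MeAr", "PhJa", "Pram", "ReEl", "RhFi", "RhinMin", "StHe"]
names = [c + FULL_SUFFIX for c in full_codes] + [c + ROOT_SUFFIX for c in root_codes]

paired, full_only, root_only = pair_matrices(names)
print("paired:", paired)
print("full only:", full_only, "| root only:", root_only)
```

With the listing above, `ReEr` and `ReEl` are reported as unpaired, so any script that joins the two matrices by code needs an explicit mapping between them.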
---
## chapter4/ — *Striga* haustorium biology: bioassays & RNA-seq
### chapter4/Bioassays/ — Phenotypic experiments
- **`all_phenotypic_plots_final.R`** — Master R script for all phenotypic analyses
and plots (survival, flowering kinetics, growth), using `survival`/`survminer`
and `ggplot2`.
`Bioassays/density/`
- **`Lhaust_dai61_noNA.xlsx`** — Lateral-haustorium counts at 61 days after
inoculation (rhizotron, *Striga*, attachments).
- **`lat_development_dai0-61.xlsx`** — Lateral-haustorium development over time
(DAI 0–61).
`Bioassays/pots/`
- **`DAI85.xlsx`** — Pot-experiment phenotypes at 85 days after inoculation
(length, density, bud, branching; includes a `lengths` sheet).
- **`lengths.xlsx`** — *Striga* shoot length / density / branching measurements.
`Bioassays/severing_experiments/` — Haustorium/connection severing experiments
- **`Striga_survival_das0-84_withoutLoutliers.xlsx`** — Survival data (day, event,
group) over 0–84 days after sowing, outliers removed.
- **`floweringkinetic_tableNOoutliers_all.xlsx`** — Flowering kinetics (DAS,
flowering, group), outliers removed.
- **`strigagrowth_noLoutliers_forplot.xlsx`** — *Striga* height over time (DAS,
height, group) formatted for plotting.
`Bioassays/severing_experiments/supplementary/`
- **`percent_flowering.xlsx`** — Percent flowering over time (DAS) per group.
- **`survivaldata.xlsx`** — Supplementary survival data (time, event, group).
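
The survival tables are organised as time, event and group columns, which is the shape a Kaplan-Meier estimate expects. The repository's own analysis is done in R with `survival`/`survminer`; the following pure-Python sketch only illustrates the estimator on made-up numbers. It assumes an event coding of 1 = death observed and 0 = censored, which should be checked against the actual workbooks.

```python
def kaplan_meier(times, events):
    """Kaplan-Meier survival estimate.

    times:  observation time for each individual.
    events: 1 if death was observed at that time, 0 if censored (assumed coding).
    Returns a list of (time, survival probability) at each time with a death.
    """
    pairs = sorted(zip(times, events))
    n_at_risk = len(pairs)
    survival = 1.0
    curve = []
    i = 0
    while i < len(pairs):
        t = pairs[i][0]
        deaths = 0
        leaving = 0
        # Collect everyone whose observation ends at time t.
        while i < len(pairs) and pairs[i][0] == t:
            deaths += pairs[i][1]
            leaving += 1
            i += 1
        if deaths:
            survival *= 1 - deaths / n_at_risk
            curve.append((t, survival))
        n_at_risk -= leaving
    return curve


# Made-up data for one group: five plants, two of them censored.
km_curve = kaplan_meier([1, 2, 2, 3, 4], [1, 1, 0, 1, 0])
for t, s in km_curve:
    print(f"day {t}: S = {s:.2f}")
```

To compare groups, the function would be applied separately to the rows of each group.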
### chapter4/RNAsequencing/ — Time-course / severing transcriptomics
- **`rsem.kinetic.gene.counts.matrix`** (12 MB) — RSEM gene-count matrix for the
infection time-course (T0/T12/T24/T48 conditions, with replicates and controls).
- **`rsem_tvlv48.gene.counts.matrix`** (15 MB) — RSEM gene-count matrix for the
terminal- vs lateral-haustorium / severing comparison (incl. 48 h samples).
- **`DESeq2.R`** — R script for differential-expression analysis (DESeq2/limma).
- **`maSigPro_T0-24_thesisplots.R`** — R script for time-series differential
expression (maSigPro, T0–T24) and thesis figures.
- **`trinotate_annotation_report.xlsx`** (17 MB) — Trinotate functional annotation
report (BLASTX/BLASTP, Pfam, GO, etc.) for the assembly.
- **`trinotate_report_gene_ontology_with_parents.xls`** (34 MB) — GO annotations
with parent terms expanded.
- **`GO_terms_description.txt`** (3.5 MB) — GO ID → term name → namespace lookup
table used for GO enrichment / annotation.
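
The GO lookup table maps a GO ID to a term name and a namespace, so it can be loaded into a dictionary for annotating enrichment results. This is a sketch under assumptions: it supposes three tab-separated columns in the order ID, name, namespace, and skips any line whose first field does not start with `GO:`. The function name is hypothetical.

```python
def load_go_lookup(lines):
    """Build {GO ID: (term name, namespace)} from tab-separated lines.

    Assumed column order: GO ID, term name, namespace.
    Lines that do not start with a GO ID (e.g. a header) are skipped.
    """
    lookup = {}
    for line in lines:
        parts = line.rstrip("\n").split("\t")
        if len(parts) < 3 or not parts[0].startswith("GO:"):
            continue
        go_id, name, namespace = parts[0], parts[1], parts[2]
        lookup[go_id] = (name, namespace)
    return lookup


# Two root terms of the Gene Ontology, plus an assumed header line.
demo_go = [
    "id\tname\tnamespace\n",
    "GO:0008150\tbiological_process\tbiological_process\n",
    "GO:0003674\tmolecular_function\tmolecular_function\n",
]
go_lookup = load_go_lookup(demo_go)
print(go_lookup["GO:0008150"])
```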
---
## Notes
- Species codes in the below-ground transcriptomics files should be matched to the
full binomials in the manuscript's taxon table. The codes are not fully consistent
between file types: one root-subset file is labelled `ReEl` while its full-matrix
counterpart is `ReEr`.
- Large annotation workbooks (Trinotate reports) are provided for completeness and
may be slow to open in Excel.
Publication Date: 2026-06-17