Supplemental Data for Ph.D. thesis: The haustorial habit - Eco-evolutionary adaptations to a parasitic lifestyle

Description

This repository contains the data files and analysis scripts underlying Chapters 2
and 4. The two top-level folders correspond to the two chapters and are also
provided as compressed archives (`chapter2.tar.gz`, `chapter4.tar.gz`).

 

- **Chapter 2** — Evolution of root-related gene families across Lamiales /

  Orobanchaceae, combining OrthoFinder orthogroup counts, phylogenetic

  comparative analyses (PGLS, CAFE5 gene-family evolution) and below-ground

  (root/haustorium) transcriptomics.

- **Chapter 4** — Functional study of haustorium biology in the parasitic plant

  *Striga hermonthica*, combining phenotypic bioassays and a time-course /

  severing RNA-seq experiment.

 

File formats: `.txt`/`.tsv`/`.csv`/`.matrix` are tab- or comma-delimited plain
text; `.nwk` is a Newick phylogenetic tree; `.xlsx`/`.xls` are Excel workbooks;
`.R` files are R scripts; `.py` files are Python 3 scripts; `.sh` is a shell
script. Hidden helper files (`.DS_Store`, `.Rhistory`) are macOS/R artefacts and
can be ignored.
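
Because the plain-text tables may be either tab- or comma-delimited, a loader has to pick the delimiter per file. The following is a minimal sketch using only the Python standard library. The helper name `read_delimited` and the example rows are hypothetical, and the rule "tab wins if the header line contains one" is an assumption rather than a documented property of the files.

```python
import csv
import io


def read_delimited(handle):
    """Read a tab- or comma-delimited text stream into a list of rows."""
    first_line = handle.readline()
    # Assumption: a tab in the header line means the file is tab-delimited.
    delimiter = "\t" if "\t" in first_line else ","
    handle.seek(0)  # rewind so the header row is parsed as well
    return list(csv.reader(handle, delimiter=delimiter))


# In-memory text stands in for real files; the values are made up.
tsv_rows = read_delimited(io.StringIO("OG\tsp1\tsp2\nOG0000001\t3\t5\n"))
csv_rows = read_delimited(io.StringIO("sample,root_width\ns1,120\n"))
```

For a file on disk, the same function can be called with `open(path, newline="")` as the handle.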

 

---

## chapter2/ — Root gene-family evolution

 

### chapter2/Orthofinder_results/ — Orthogroup inference and root gene set

Comparative genomics across ~64 Lamiales species (plus outgroups), based on an
OrthoFinder run.

 

- **`ALL_OGs_speciescount.txt`** (61 MB) — Full matrix of orthogroup (OG) gene

  counts per species; rows = orthogroups, columns = species.

- **`ROOT_OGs.txt`** (7.3 MB) — Membership list of the curated "root" orthogroups:

  each row is an OG followed by the comma-separated protein IDs assigned to it.

- **`ROOT_OGs_speciescount.txt`** — Per-species gene counts restricted to the 254

  root-related orthogroups.

- **`species_tree.checked_ultrametric.nwk`** — Time-calibrated (ultrametric)

  species tree in Newick format used for all phylogenetic comparative analyses.

- **`species_family_order.txt`** — Taxonomic lookup table (species, order, family,

  trophism: autotroph/parasite) for the analysed taxa.

- **`AT_feature_table.txt`** (3.0 MB) — *Arabidopsis thaliana* protein feature

  table (accession, name, symbol, AT locus) used to annotate orthogroups.

- **`TAIR_NCBI_root_IDs.xlsx`** — Curated *Arabidopsis* root-gene reference list

  (TAIR ID, gene symbols, name, function).

- **`OF_rootOGs_heatmap.R`** — R script pulling *A. thaliana* genes from the root

  OGs and drawing the orthogroup heatmap.
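
The membership list in `ROOT_OGs.txt` can be turned into per-orthogroup gene counts with a few lines of Python. This is a sketch under stated assumptions: the OG identifier is taken to be separated from the protein list by a tab, and the protein IDs shown are invented placeholders. Check both against the actual file before relying on it.

```python
def parse_og_members(lines):
    """Map each orthogroup ID to its list of protein IDs."""
    members = {}
    for line in lines:
        line = line.strip()
        if not line:
            continue
        # Assumed layout: OG ID, a tab, then comma-separated protein IDs.
        og_id, _, id_field = line.partition("\t")
        members[og_id] = [p.strip() for p in id_field.split(",") if p.strip()]
    return members


# Hypothetical rows; real protein IDs will differ.
example = [
    "OG0000001\tAT1G01010.1, StHe_g1, MeAr_g7\n",
    "OG0000002\tAT2G02020.1\n",
]
og_members = parse_og_members(example)
og_sizes = {og: len(ids) for og, ids in og_members.items()}
```

With a real file, pass the open file object in place of `example`.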

 

### chapter2/PGLS/ — Phylogenetic comparative analysis & gene-family evolution

Pipeline relating root-OG gene-family sizes to lifestyle traits (heterotrophy,
mycorrhization) using phylogenetic generalised least squares (PGLS) and CAFE5.

 

`PGLS/scripts/`

- **`01_prepare_inputs.py`** — Reconciles the OrthoFinder count table, ultrametric

  tree and trait files; emits CAFE5-ready and PGLS-ready inputs.

- **`02_pgls_ancestral.py`** — PGLS of root-related orthogroups against traits,

  plus ancestral-state / branch-change reconstruction of total root content.

- **`02b_sl_pathway.py`** — PGLS of strigolactone (SL) receptor-pathway OGs vs

  heterotrophy/mycorrhization (tested from the full count matrix).

- **`03_figures.py`** — Generates the summary figures (vector PDF/EPS).

- **`run_cafe.sh`** — Driver script running CAFE5 gene-family evolution with

  retried lambda auto-estimation on the deep species tree.

 

`PGLS/tables/`

- **`pgls_rootOG_results.tsv`** — PGLS results per root OG (slopes, t-values,

  p-values, Pagel's lambda, FDR) for the heterotrophy and mycorrhization models.

- **`pgls_SLpathway_results.tsv`** — PGLS results for the strigolactone

  receptor-pathway orthogroups, with per-group mean counts.

- **`branch_changes_total_root.tsv`** — Inferred change in total root gene content

  along each tree branch (parent vs child node totals).

- **`total_root_content.tsv`** — Reconstructed total root gene content per taxon.

- **`AT_OG_info.xlsx`** — Mapping of *Arabidopsis* genes to orthogroups with TAIR

  IDs, symbols, names and functions.
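
The PGLS result tables report an FDR column alongside the raw p-values. The README does not say which correction procedure was used, so the sketch below assumes the common Benjamini-Hochberg adjustment purely for illustration. The function name and the example p-values are hypothetical.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    n = len(p_values)
    order = sorted(range(n), key=lambda i: p_values[i])
    adjusted = [0.0] * n
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(n, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * n / rank)
        adjusted[idx] = running_min
    return adjusted


p = [0.01, 0.04, 0.03, 0.20]  # made-up p-values
q = benjamini_hochberg(p)
```

Recomputing the adjustment from the p-value column and comparing it with the reported FDR column is one way to check which procedure the scripts applied.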

 

### chapter2/DEG_analysis/ — Differential expression (root vs haustorium)

- **`MeAr.gene.counts.expr.txt`** — Gene-level expression count table for

  *Melampyrum* root vs root+haustorium samples.

- **`MeAr_StHe_thesis.R`** — R script for the *Melampyrum* (MeAr) / *Striga* (StHe)

  differential-expression analysis.

- **`MeAr_trinotate_annotation_report.xlsx`** (89 MB) — Trinotate functional

  annotation report for the *Melampyrum* assembly.

- **`StHeBC3_trinotate_annotation_report.xlsx`** (17 MB) — Trinotate functional

  annotation report for the *Striga hermonthica* (BC3) assembly.

 

### chapter2/belowground_transcriptomics/ — Cross-species root/haustorium expression

Per-species gene-expression count matrices used for the 254-OG WGCNA / below-ground
expression comparison. For each species there is a full matrix
(`*.gene.counts.expr.txt`) and a root-restricted subset (`*.gene.counts.expr.root.txt`).
Species are given by short codes, e.g. `Avog` = *Alectra vogelii*, `StHe` =
*Striga hermonthica*, `MeAr` = *Melampyrum*, `LiPhi` = *Lindenbergia* (autotroph
reference). The remaining codes (`PhJa`, `Pram`, `ReEl`/`ReEr`, `RhFi`, `RhinMin`)
are not expanded here; confirm the exact binomials against the manuscript taxon
table.

 

- **`<code>.gene.counts.expr.txt`** — Full expression count matrix (all tissues),

  one per species: `Avog`, `LiPhi`, `MeAr`, `PhJa`, `Pram`, `ReEr`, `RhFi`,

  `RhinMin`, `StHe`.

- **`<code>.gene.counts.expr.root.txt`** — Root-tissue-restricted count matrix,

  one per species: `Avog`, `LiPhi`, `MeAr`, `PhJa`, `Pram`, `ReEl`, `RhFi`,

  `RhinMin`, `StHe`.

- **`trait_table_sp.csv`** — Per-sample root traits (root width µm, root hair µm,

  parasitism category) used as metadata.

- **`BG_expression_WGCNA.R`** — R script building the 254-OG expression matrices

  and running the WGCNA co-expression analysis.
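
The naming scheme above makes it easy to pair each full matrix with its root subset and to spot codes that lack a counterpart. The following sketch uses the species codes listed in this section. The helper `pair_matrices` is hypothetical and is not part of the repository's scripts.

```python
FULL_SUFFIX = ".gene.counts.expr.txt"
ROOT_SUFFIX = ".gene.counts.expr.root.txt"


def pair_matrices(filenames):
    """Return (paired, full_only, root_only) lists of species codes."""
    full, root = set(), set()
    for name in filenames:
        if name.endswith(ROOT_SUFFIX):
            root.add(name[: -len(ROOT_SUFFIX)])
        elif name.endswith(FULL_SUFFIX):
            full.add(name[: -len(FULL_SUFFIX)])
    return sorted(full & root), sorted(full - root), sorted(root - full)


# File names built from the codes listed in this section.
full_codes = ["Avog", "LiPhi", "MeAr", "PhJa", "Pram", "ReEr", "RhFi", "RhinMin", "StHe"]
root_codes = ["Avog", "LiPhi", "MeAr", "PhJa", "Pram", "ReEl", "RhFi", "RhinMin", "StHe"]
names = [c + FULL_SUFFIX for c in full_codes] + [c + ROOT_SUFFIX for c in root_codes]

paired, full_only, root_only = pair_matrices(names)
```

On the listed codes this leaves `ReEr` without a root subset and `ReEl` without a full matrix, so any script that joins the two by code needs an explicit mapping for that species.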

 

---

## chapter4/ — *Striga* haustorium biology: bioassays & RNA-seq

 

### chapter4/Bioassays/ — Phenotypic experiments

- **`all_phenotypic_plots_final.R`** — Master R script for all phenotypic analyses

  and plots (survival, flowering kinetics, growth), using `survival`/`survminer`

  and `ggplot2`.

 

`Bioassays/density/`

- **`Lhaust_dai61_noNA.xlsx`** — Lateral-haustorium counts at 61 days after

  inoculation (rhizotron, *Striga*, attachments).

- **`lat_development_dai0-61.xlsx`** — Lateral-haustorium development over time

  (DAI 0–61).

 

`Bioassays/pots/`

- **`DAI85.xlsx`** — Pot-experiment phenotypes at 85 days after inoculation

  (length, density, bud, branching; includes a `lengths` sheet).

- **`lengths.xlsx`** — *Striga* shoot length / density / branching measurements.

 

`Bioassays/severing_experiments/` — Haustorium/connection severing experiments

- **`Striga_survival_das0-84_withoutLoutliers.xlsx`** — Survival data (day, event,

  group) over 0–84 days after sowing, outliers removed.

- **`floweringkinetic_tableNOoutliers_all.xlsx`** — Flowering kinetics (DAS,

  flowering, group), outliers removed.

- **`strigagrowth_noLoutliers_forplot.xlsx`** — *Striga* height over time (DAS,

  height, group) formatted for plotting.

 

`Bioassays/severing_experiments/supplementary/`

- **`percent_flowering.xlsx`** — Percent flowering over time (DAS) per group.

- **`survivaldata.xlsx`** — Supplementary survival data (time, event, group).
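
The survival workbooks store one row per plant with a time, an event flag and a group. The repository analyses them in R with `survival`/`survminer`. As a language-neutral illustration of what such data encode, here is a minimal Kaplan-Meier estimator in plain Python. It is a sketch, not the authors' analysis: the event coding (1 = death observed, 0 = censored) is an assumption, and the example numbers are invented.

```python
from collections import Counter


def kaplan_meier(times, events):
    """Return [(time, survival_probability)] at each time with an event.

    events: 1 = death observed, 0 = censored (assumed coding).
    """
    deaths = Counter(t for t, e in zip(times, events) if e == 1)
    exits = Counter(times)  # everyone leaving the risk set at each time
    at_risk = len(times)
    survival = 1.0
    curve = []
    for t in sorted(exits):
        d = deaths.get(t, 0)
        if d:
            survival *= 1 - d / at_risk
            curve.append((t, survival))
        at_risk -= exits[t]
    return curve


# Made-up data for a single group: days after sowing and event flags.
days = [7, 14, 14, 21, 28, 28]
event = [1, 1, 0, 1, 0, 0]
curve = kaplan_meier(days, event)
```

To compare groups, the function would be applied separately to the rows of each group.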

 

### chapter4/RNAsequencing/ — Time-course / severing transcriptomics

- **`rsem.kinetic.gene.counts.matrix`** (12 MB) — RSEM gene-count matrix for the

  infection time-course (T0/T12/T24/T48 conditions, with replicates and controls).

- **`rsem_tvlv48.gene.counts.matrix`** (15 MB) — RSEM gene-count matrix for the

  terminal- vs lateral-haustorium / severing comparison (incl. 48 h samples).

- **`DESeq2.R`** — R script for differential-expression analysis (DESeq2/limma).

- **`maSigPro_T0-24_thesisplots.R`** — R script for time-series differential

  expression (maSigPro, T0–T24) and thesis figures.

- **`trinotate_annotation_report.xlsx`** (17 MB) — Trinotate functional annotation

  report (BLASTX/BLASTP, Pfam, GO, etc.) for the assembly.

- **`trinotate_report_gene_ontology_with_parents.xls`** (34 MB) — GO annotations

  with parent terms expanded.

- **`GO_terms_description.txt`** (3.5 MB) — GO ID → term name → namespace lookup

  table used for GO enrichment / annotation.
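
The GO lookup table can be loaded into a dictionary for annotating enrichment results. This sketch assumes three tab-separated columns in the order GO ID, term name, namespace, as suggested by the description above. The column order and the absence of extra columns should be verified against `GO_terms_description.txt`, and `load_go_lookup` is a hypothetical helper.

```python
import csv
import io


def load_go_lookup(handle):
    """Map GO ID -> (term name, namespace) from a tab-delimited stream."""
    lookup = {}
    for row in csv.reader(handle, delimiter="\t"):
        # Skip header lines and rows that do not look like GO records.
        if len(row) < 3 or not row[0].startswith("GO:"):
            continue
        lookup[row[0]] = (row[1], row[2])
    return lookup


# Small in-memory stand-in for the real table.
sample = io.StringIO(
    "id\tname\tnamespace\n"
    "GO:0008150\tbiological_process\tbiological_process\n"
    "GO:0009733\tresponse to auxin\tbiological_process\n"
)
go_lookup = load_go_lookup(sample)
```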

 

---

## Notes

- Species codes in the below-ground transcriptomics files should be matched to the
  full binomials in the manuscript's taxon table; one root-subset file is labelled
  `ReEl` while its full-matrix counterpart is `ReEr`.
- Large annotation workbooks (Trinotate reports) are provided for completeness and
  may be slow to open in Excel.


DOI: 10.5281/zenodo.20730764

Publication Date: 2026-06-17
