RNA sequencing (RNA-seq) technologies facilitate the characterisation of genes and transcripts in different cell types, as well as the analysis of their expression across various conditions. Because it provides in-depth insight into transcriptional and post-transcriptional mechanisms, RNA-seq has been used extensively in functional genetics, transcriptomics, systems biology and developmental biology, in studies of animals, plants and disease. The aim of this project is to use mathematical and computational models to integrate large-scale genomic and transcriptomic data from high-throughput technologies in plant biology, to develop new methods for identifying genes and transcripts whose expression varies significantly across experimental conditions of interest, and to interpret the regulatory causes of these expression changes by distinguishing the effects of transcription from those of alternative splicing.
We performed a high-resolution, ultra-deep RNA-seq time-course experiment to study the response of Arabidopsis to cold, in which plants were grown at 20 °C and the temperature was then reduced to 4 °C. We developed a high-quality Arabidopsis thaliana Reference Transcript Dataset (AtRTD2) transcriptome for accurate transcript and gene quantification. This high-quality time-series dataset was used as the benchmark for novel method development and downstream expression analysis. The main outcomes of this project fall into three parts. i) A pipeline for differential expression (DE) and differential alternative splicing (DAS) analysis at both gene and transcript levels. First, we pre-processed the read counts to reduce noise from low expression, batch effects and technical biases. We then used the limma-voom pipeline to compare expression at each time-point at 4 °C with the corresponding time-point at 20 °C. We identified 8,949 genes with altered expression, of which 2,442 showed significant DAS and 1,647 were regulated only by alternative splicing (AS). Compared with previously published studies, 3,039 of these genes were novel cold-responsive genes. In addition, we identified 4,008 differential transcript usage (DTU) transcripts whose expression changes differed significantly from those of their cognate DAS genes. ii) An R package, TSIS, for time-series transcript isoform switch (IS) analysis. An IS is a time-point at which a pair of transcript isoforms from the same gene reverse their relative expression abundances. Using a five-metric scheme to evaluate the quality of each switch point robustly, we identified 892 significant ISs between the high-abundance transcripts in the DAS genes; about 57% of these switches occurred very rapidly, within 0-6 h of transfer to 4 °C. iii) An R package, RLowPC, for co-expression network construction.
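The isoform switch definition in part ii) can be illustrated with a minimal sketch. This is not the TSIS implementation: it only locates candidate switch points by linear interpolation between consecutive time-points and omits the five-metric quality scoring entirely. The function name `find_isoform_switches` and the expression values are hypothetical, chosen purely for illustration.

```python
def find_isoform_switches(times, iso1, iso2):
    """Estimate the times at which two isoforms swap relative abundance.

    A switch is reported when the sign of (iso1 - iso2) changes between two
    consecutive time-points; the crossing time is estimated by linear
    interpolation. Exact ties at a sampled time-point are ignored in this
    simplified sketch.
    """
    switches = []
    for k in range(len(times) - 1):
        d0 = iso1[k] - iso2[k]          # abundance difference at the earlier time-point
        d1 = iso1[k + 1] - iso2[k + 1]  # abundance difference at the later time-point
        if d0 * d1 < 0:                 # sign change: the isoforms crossed in between
            frac = d0 / (d0 - d1)       # fraction of the interval at which the difference is zero
            switches.append(times[k] + frac * (times[k + 1] - times[k]))
    return switches


# Made-up abundances (arbitrary units) for two isoforms of one gene,
# sampled at hypothetical times in hours after transfer to the cold.
times = [0, 3, 6, 9, 12]
iso1 = [10, 8, 4, 3, 2]
iso2 = [2, 4, 6, 7, 8]

switch_times = find_isoform_switches(times, iso1, iso2)
print(switch_times)  # a single switch, between the 3 h and 6 h samples
```

In a real analysis each candidate switch would still need to be assessed for statistical support, which is the role of the five-metric scheme described above.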
The RLowPC method uses a two-step approach to select high-confidence edges: it first reduces the search space by retaining only the top-ranked genes from an initial partial correlation analysis, and then computes partial correlations within this confined search space, removing only the linear dependencies on shared neighbours and largely ignoring genes that show weaker association.
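The two-step idea can be sketched as follows, under several assumptions. This is not the RLowPC package or its API: the function names, the `top_k` parameter, the use of the recursive partial correlation formula, and the expression values are all hypothetical, and the sketch is only practical for a handful of genes.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length expression profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def partial_corr(r, i, j, given):
    """Partial correlation of genes i and j given the genes in `given`,
    using the standard recursive formula (exponential cost: toy sizes only)."""
    if not given:
        return r[i][j]
    k, rest = given[0], given[1:]
    r_ij = partial_corr(r, i, j, rest)
    r_ik = partial_corr(r, i, k, rest)
    r_jk = partial_corr(r, j, k, rest)
    return (r_ij - r_ik * r_jk) / sqrt((1 - r_ik ** 2) * (1 - r_jk ** 2))


def rlowpc_sketch(expr, top_k):
    """Two-step edge scoring: rank by full partial correlation, then re-score
    the top_k edges conditioning only on their shared neighbours."""
    genes = list(expr)
    r = {a: {b: pearson(expr[a], expr[b]) for b in genes} for a in genes}

    # Step 1: initial partial correlation given all other genes; keep top_k edges.
    initial = {}
    for a, b in combinations(genes, 2):
        others = tuple(g for g in genes if g not in (a, b))
        initial[(a, b)] = abs(partial_corr(r, a, b, others))
    kept = sorted(initial, key=initial.get, reverse=True)[:top_k]

    # Neighbourhoods in the reduced network.
    neighbours = {g: set() for g in genes}
    for a, b in kept:
        neighbours[a].add(b)
        neighbours[b].add(a)

    # Step 2: condition each kept edge only on genes adjacent to both endpoints.
    scores = {}
    for a, b in kept:
        shared = tuple(sorted(neighbours[a] & neighbours[b]))
        scores[(a, b)] = partial_corr(r, a, b, shared)
    return scores


# Made-up expression profiles for four hypothetical genes across eight samples.
expr = {
    "A": [2.0, 4.0, 1.0, 5.0, 3.0, 6.0, 2.5, 4.5],
    "B": [2.3, 3.6, 1.4, 5.5, 2.7, 6.2, 2.1, 4.9],
    "C": [2.0, 3.9, 1.9, 5.1, 3.2, 5.8, 2.6, 4.4],
    "D": [5.0, 1.0, 3.0, 2.0, 6.0, 4.0, 1.5, 3.5],
}
scores = rlowpc_sketch(expr, top_k=3)
for edge, value in scores.items():
    print(edge, round(value, 3))
```

Restricting the conditioning set to shared neighbours keeps each second-step partial correlation low-order, which is the property the description above relies on; the choice of `top_k` is left open here.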
In future work, we will construct dynamic transcriptional and AS regulatory networks to interpret the causes of DE and DAS. We will study the coupling and de-coupling of expression rhythmicity to the Arabidopsis circadian clock in response to cold. We will also develop new methods to improve the statistical power of comparative expression analysis, for example by accounting for missing expression values and by distinguishing technical from biological variability.
Runxuan Zhang (Supervisor), John Brown (Supervisor), Robbie Waugh (Supervisor) & Ping Lin (Supervisor)
- High throughput transcriptomics
- Differential expression
- Differential alternative splicing
- Gene regulatory networks