Plant Data Visualization/Orthology Bundle - Cork Oak Use Case
Extensive phenotypic measurements and high-throughput RNA-seq experiments often make plant data fall in the scope of big data. The current plant data visualization/orthology bundle provides a set of interopable tools ranging from data annotation, analysis and visualisation to ease treatment and interpretation of plant data.
The present use-case aims to provide the user with the necessary knowledge to properly use and navigate this tool bundle. The use of data obtained from studies in Cork Oak, a non-model woody plant, highlights possible shortcomings which the user may also face regarding the lack of a full reference genome and absent/incomplete data annotations.
Download Cork Oak Data
The first step to follow the present use-case is to download data obtained from cork oak studies. It can be manually downloaded from github (https://github.com/hmrodrigues99), or by running the following in the command line:
Linux users:
# 1. Open the bash shell (command line)
# 2. Create and move to a new directory named corkoak_usecase
# This folder will be used to store our data. It's full path can be seen by running $PWD.
mkdir corkoak_usecase
cd corkoak_usecase
# 3. Download all necessary data, including the cork oak data files from github
wget https://github.com/hmrodrigues99/CorkOak_UseCase_Data/archive/main.zip
# 4. Unzip the downloaded folder and move into it's directory
unzip ./CorkOak_UseCase_Data-main.zip
cd CorkOak_UseCase_Data-main
# 4.1 If unzip fails, install it using ``sudo apt install unzip`` and try again
unzip ./CorkOak_UseCase_Data-main.zip
cd CorkOak_UseCase_Data-main
# 4.2 If for some reason the error persists, search the folder on your system (outside the bash shell) and unzip the file manually.
# Then, go back into the bash shell and move into the folder directory
cd CorkOak_UseCase_Data-main
Windows R Users:
#Create and move to a new directory named corkoak_usecase
dir.create("corkoak_usecase")
#Installing R.utils package, if not already installed
if (!require("utils")) install.packages("utils")
library("utils")
#Installing RCurl package, if not already installed
if (!require("RCurl")) install.packages("RCurl")
library("RCurl")
#Download the cork oak data folder from github
download.file("https://github.com/hmrodrigues99/CorkOak_UseCase_Data/archive/main.zip",destfile="corkoak_usecase.zip", method="libcurl")
#Unzip the downloaded file and move into it's directory
unzip("corkoak_usecase.zip")
setwd("CorkOak_UseCase_Data-main")
Windows Python (3.7+) Users:
#Within the Command line, install the requests package
pip install requests
#Now in a Python IDE, download the cork oak data folder [replace "PathToFile" with a target directory (e.g. "C://Users//hrodrigues//Data//")]
import requests
file = requests.get("https://github.com/hmrodrigues99/CorkOak_UseCase_Data/archive/main.zip")
open('PathToFile//CorkOak_UseCase_Data-main.zip', 'wb').write(file.content)
Cork Oak Data Contents:
corkoak_proteins.faaA FASTA file containing aminoacid sequences of cork oak proteins.
corkoak_edge- Table (.csv) with cork oak co-expressed gene predictions.The first two columns hold the interacting genes, and the third column the type of interaction (e.g. co_expressed).Predictions were obtained from cork oak RNA-seq datasets using the Seidr toolkit and filtered for genes putatively involved in lignin biosynthesis dependent on seasonal cues (identified in DOI: https://doi.org/10.1038/s41598-021-90938-5).
corkoak_nodeList (.csv) of cork oak protein IDs present in the interactions described above.
corkoak_LogFC_April,corkoak_LogFC_June,corkoak_LogFC_Julyandcorkoak_LogFC_July_AprilFiles containing cork oak experimental data (log2FC values obtained from gene differential expression analysis in cork oak tissue samples collected in the months of April, June and July).
Cytoscape_to_DiNAR.RR script used to process a Cytoscape network to be imported into DiNAR.
Ensembl_Plants_Query.RR script used to perform predefined queries (retrieval of gene descriptions, annotations and orthologs) from the Ensembl Plants database.
Task Workflow:
After data download, we are ready to start performing the proposed tasks, starting with Task 1 - Plant Data Annotation.
Tasks
References