annotateModule()
takes a data.frame
of regions with module
assignments, annotates them with genes using GREAT, adds additional gene
information with Ensembl BioMart, provides gene regulatory context with
annotatr, and then saves this as a .txt file. Support is provided for several
genomes, including hg38
, hg19
, hg18
, mm10
,
mm9
, and danRer7
.
Usage
annotateModule(
regions,
module = NULL,
grey = FALSE,
genome = c("hg38", "hg19", "hg18", "mm10", "mm9", "danRer7"),
includeCuratedRegDoms = FALSE,
rule = c("basalPlusExt", "twoClosest", "oneClosest"),
adv_upstream = 5,
adv_downstream = 1,
adv_span = 1000,
adv_twoDistance = 1000,
adv_oneDistance = 1000,
version = c("4.0.4", "3.0.0", "2.0.2"),
save = TRUE,
file = "Annotated_Module_Regions.txt",
verbose = TRUE
)
Arguments
- regions
A
data.frame
of regions with module assignments, typically obtained fromgetModules()
.- module
A
character
giving the name of one or more modules to annotate. If null, all modules will be annotated.- grey
A
logical(1)
specifying whether or not to include the grey (unassigned) module.- genome
A
character(1)
giving the genome build for the regions. Supported genomes includehg38
,hg19
,hg18
,mm10
,mm9
, anddanRer7
. SeeDetails
.- includeCuratedRegDoms
A
logical(1)
indicating whether to include curated regulatory domains for GREAT gene annotation.- rule
A
character(1)
specifying the rule used by GREAT for gene annotation. Possible values includebasalPlusExt
,twoClosest
, andoneClosest
. SeerGREAT::submitGreatJob()
for more details for this and the next six arguments.- adv_upstream
A
numeric(1)
giving the distance upstream of the TSS (in kb) to define a basal regulatory domain in thebasalPlusExt
rule.- adv_downstream
A
numeric(1)
giving the distance downstream of the TSS (in kb) to define a basal regulatory domain in thebasalPlusExt
rule.- adv_span
A
numeric(1)
specifying the distance upstream and downstream of the TSS (in kb) to define the maximum extension of the regulatory domain in thebasalPlusExt
rule.- adv_twoDistance
A
numeric(1)
giving the distance upstream and downstream of the TSS (in kb) to define the maximum extension of the regulatory domain in thetwoClosest
rule.- adv_oneDistance
A
numeric(1)
giving the distance upstream and downstream of the TSS (in kb) to define the maximum extension of the regulatory domain in theoneClosest
rule.- version
A
character(1)
specifying the version of GREAT to use for gene annotation. Possible values include4.0.4
,3.0.0
, and2.0.2
. Different versions of GREAT support different genomes. SeeDetails
.- save
A
logical(1)
indicating whether to save thedata.frame
.- file
A
character(1)
giving the file name (.txt) for the saveddata.frame
.- verbose
A
logical(1)
indicating whether messages should be printed.
Details
regions
can be filtered for one or more modules of interest, or the
grey (unassigned) module can be excluded. Gene annotation is performed by the
rGREAT package, which allows for different annotation rules and versions
of GREAT. The default basalPlusExt
annotation rule associates a gene
with a region if the region is within the basal regulatory domain of the gene
(5 kb upstream and 1 kb downstream of the TSS) or if it is within 1 Mb
upstream or downstream of the TSS and not in the basal regulatory domain of
another gene. Other rules include twoClosest
and oneClosest
,
which effectively assign the two nearest genes or one nearest genes,
respectively. See rGREAT::submitGreatJob()
for more details.
GREAT supports different genomes depending on the version:
- GREAT v4.0.4
hg38
,hg19
,mm10
,mm9
- GREAT v3.0.0
hg19
,mm10
,mm9
,danRer7
- GREAT v2.0.2
hg19
,hg18
,mm9
,danRer7
Gene information is provided by the biomaRt package, which adds the gene
description along with Ensembl and NCBI Entrez gene IDs. Regulatory context
is added by the annotatr package. This provides positional context of
the region relative to nearby genes, enhancers, and CpG islands. Note that
annotatr does not support the hg18
or danRer7
genomes.
See also
getModules()
to build a comethylation network and identify modules of comethylated regions.getGeneList()
to extract a list of genes or IDs from the annotated regions.listOntologies()
,enrichModule()
, andplotEnrichment()
to investigate functional enrichment of module regions with GREAT.
Examples
if (FALSE) {
# Get Comethylation Modules
modules <- getModules(methAdj, power = sft$powerEstimate, regions = regions,
corType = "pearson", file = "Modules.rds")
# Annotate Modules
regionsAnno <- annotateModule(regions, module = "bisque4",
genome = "hg38",
file = "Annotated_bisque4_Module_Regions.txt")
geneList_bisque4 <- getGeneList(regionsAnno, module = "bisque4")
# Analyze Functional Enrichment
ontologies <- listOntologies("hg38", version = "4.0.4")
enrich_bisque4 <- enrichModule(regions, module = "bisque4",
genome = "hg38",
file = "bisque4_Module_Enrichment.txt")
plotEnrichment(enrich_bisque4,
file = "bisque4_Module_Enrichment_Plot.pdf")
}