# Disable greedycut (filtering)
rnb.options("filtering.greedycut"=FALSE)
# Disable intersample variation plots (exploratory analysis)
rnb.options("exploratory.intersample"=FALSE)
# Reduce the subsampling number for estimating density plots
rnb.options("distribution.subsample"=100000)
# Disable regional methylation profiling (exploratory analysis)
rnb.options("exploratory.region.profiles"=NULL)
# Disable chromosome coverage plots (QC, sequencing data only)
rnb.options("qc.coverage.plots"=FALSE)
tools::dependsOnPkgs("RnBeads")
The web service submission form uses the cross-browser tooltips library by Walter Zorn. The library is distributed under the GNU Lesser General Public License.
In order to keep up to date with the most recent developments in computational epigenomics, we continually update the RnBeads default option settings. In software version 2.9.3, we changed the following defaults:
Option name | Old default | New default |
---|---|---|
import.bed.style | "BisSNP" | "bismarkCov" |
normalization.background.method | "methylumi.noob" | "none" |
filtering.snp | "3" | "any" |
filtering.cross.reactive | FALSE | TRUE |
filtering.sex.chromosomes.removal | FALSE | TRUE |
filtering.missing.value.quantile | 1 | 0.5 |
exploratory.intersample | NULL | FALSE |
exploratory.deviation.plots | NULL | FALSE |
exploratory.region.profiles | NULL | "" |
differential.adjustment.sva | TRUE | FALSE |
differential.adjustment.celltype | TRUE | FALSE |
export.to.bed | TRUE | FALSE |
export.to.trackhub | c("bigBed","bigWig") | NULL |
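To reproduce an analysis that was run with the earlier defaults, the old values from the table can be set explicitly before starting the pipeline. The following is a minimal sketch covering only a subset of the rows; the list name old_defaults is hypothetical, and whether a given RnBeads version still accepts each of these values is an assumption that should be checked against the documentation of rnb.options().

```r
# Old default values, copied from a subset of the rows in the table
old_defaults <- list(
  import.bed.style = "BisSNP",
  normalization.background.method = "methylumi.noob",
  filtering.snp = "3",
  filtering.cross.reactive = FALSE,
  filtering.sex.chromosomes.removal = FALSE,
  filtering.missing.value.quantile = 1,
  differential.adjustment.sva = TRUE,
  differential.adjustment.celltype = TRUE,
  export.to.bed = TRUE
)

# Apply the values only if RnBeads is installed
if (requireNamespace("RnBeads", quietly = TRUE)) {
  do.call(RnBeads::rnb.options, old_defaults)
} else {
  message("RnBeads is not installed; options were not applied.")
}
```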
Do you want to install from sources the package which needs compilation? y/n: n
Update all/some/none? [a/s/n]: n
1. Open the Advanced system settings. In Windows 7, for example, they can be reached through: Control Panel > System and Security > System > Advanced System Settings.
2. In the "System Properties" dialog, open the "Advanced" tab. Click the button "Environment Variables..." to update the search path.
3. Locate the environment variable "Path" (it doesn't matter if it is the user or the system variables, as long as you are the user who starts R). Select it, click on Edit, and prepend the location of the Ghostscript executable, followed by a semicolon. The text you need to add is usually similar to C:\Program Files\gs\gs9.15\bin;
4. After starting a new R session, Ghostscript should be accessible from R. If it still cannot be located, you need to check the corresponding environment variable. In an R session, the command Sys.getenv()["R_GSCMD"] shows the contents of the dedicated Ghostscript variable. If the variable does not exist or points to the wrong executable file, you can set it to the full path of the Ghostscript executable. This is achieved by editing or creating the file Renviron.site in the etc subdirectory of your R installation. Make sure the file contents include the line R_GSCMD=C:\Program Files\gs\gs9.15\bin\gswin64c.exe (assuming Ghostscript is located in C:\Program Files\gs\gs9.15 and you are using the 64-bit version of R). For more information, please check the R documentation on getting and setting environment variables.
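The check described in step 4 can be scripted. The sketch below uses only base R; the executable names passed to Sys.which() are common Ghostscript names and are an assumption about your installation.

```r
# Value of the dedicated Ghostscript variable ("" if it is not set)
gs_cmd <- Sys.getenv("R_GSCMD")
if (nzchar(gs_cmd)) {
  message("R_GSCMD points to: ", gs_cmd,
          " (file exists: ", file.exists(gs_cmd), ")")
} else {
  message("R_GSCMD is not set.")
}

# Look for common Ghostscript executable names on the search path;
# an empty string means the executable was not found
gs_on_path <- Sys.which(c("gswin64c", "gswin32c", "gs"))
print(gs_on_path)
```

If both checks come back empty, the Path entry from step 3 or the R_GSCMD line in Renviron.site is the place to look.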
Edit or create the file Renviron.site in the etc subdirectory of your R installation. Make sure the file contents include the lines:
R_ZIPCMD=zip
R_UNZIPCMD=unzip
libmysql and libmysql-devel. NB: These packages are aliased as mariadb on RedHat-derived Linux distributions (RHEL, CentOS, Fedora).
libxml2 and libxml2-devel
gsl package.
yum install mariadb mariadb-devel libxml2 libxml2-devel gsl
If needed, add sudo to the beginning of the above command.
Old Option | New Option |
---|---|
loading | import |
loading.default.data.type | import.default.data.type |
loading.table.separator | import.table.separator |
loading.bed.style | import.bed.style |
loading.bed.columns | import.bed.columns |
loading.bed.frame.shift | import.bed.frame.shift |
loading.bed.test | import.bed.test |
loading.bed.test.only | import.bed.test.only |
batch | exploratory |
batch.dreduction.columns | exploratory.columns |
batch.top.dimensions | exploratory.top.dimensions |
batch.principal.components | exploratory.principal.components |
batch.correlation.columns | exploratory.columns |
batch.correlation.pvalue.threshold | exploratory.correlation.pvalue.threshold |
batch.correlation.permutations | exploratory.correlation.permutations |
batch.correlation.qc | exploratory.correlation.qc |
profiles | exploratory |
profiles.beta.distribution | exploratory.beta.distribution |
profiles.intersample | exploratory.intersample |
profiles.deviation.plots | exploratory.deviation.plots |
profiles.columns | exploratory.columns |
profiles.clustering | exploratory.clustering |
profiles.clustering.top.sites | exploratory.clustering.top.sites |
region.profiles.types | exploratory.region.profiles |
export.to.ucsc | export.to.trackhub |
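
Scripts written for older RnBeads versions can be migrated by renaming options according to the table. The sketch below is one way to do this in base R; rename_options() and the lookup vector option_renames are hypothetical helpers, not part of RnBeads, and only a few rows of the table are included.

```r
# Old-to-new option names, copied from a few rows of the table
option_renames <- c(
  "loading.bed.style"         = "import.bed.style",
  "loading.default.data.type" = "import.default.data.type",
  "batch.top.dimensions"      = "exploratory.top.dimensions",
  "profiles.intersample"      = "exploratory.intersample",
  "region.profiles.types"     = "exploratory.region.profiles",
  "export.to.ucsc"            = "export.to.trackhub"
)

# Hypothetical helper: rename the elements of an option list that still
# use an old name; all other elements are left untouched
rename_options <- function(opts, renames = option_renames) {
  old <- names(opts) %in% names(renames)
  names(opts)[old] <- unname(renames[names(opts)[old]])
  opts
}

legacy <- list(loading.bed.style = "EPP",
               profiles.intersample = FALSE,
               min.group.size = 2)
str(rename_options(legacy))
```

The renamed list can then be passed to rnb.options() with do.call().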
Old Function | New Function |
---|---|
rnb.execute.loading | rnb.execute.import |
rnb.execute.export | rnb.execute.tnt |
rnb.export.to.ucsc | rnb.export.to.trackhub |
rnb.run.loading | rnb.run.import |
rnb.run.batch | rnb.run.exploratory |
rnb.run.profiles | rnb.run.exploratory |
rnb.run.export | rnb.run.tnt |
loading.bed.style option. Here is an overview of the currently implemented presets:
EPP
bed files in the format of the output files from the Epigenome Processing Pipeline developed by Fabian Müller and Christoph Bock. A tab-separated file contains: the chromosome name, start coordinate, end coordinate, methylation value and coverage as a string ('#methylated_reads/#total_reads'), some score, the strand, and additional information not taken into account by RnBeads. The file should not contain a header line. Coordinates are 0-based, spanning the first coordinate in a site and the first coordinate outside the site (i.e. end-start = 2 for a CpG). Here are some example lines (genome assembly mm9):
chr1 3010957 3010959 '27/27' 1000 +
chr1 3010971 3010973 '10/20' 500 +
chr1 3011025 3011027 '57/70' 814 -
...
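To make the column layout concrete, the example lines can be parsed in base R as follows. This is only an illustration of the format described above, not the code RnBeads uses for import; the column names are chosen here for readability.

```r
# The three example lines, with fields separated by tabs
epp_lines <- c(
  "chr1\t3010957\t3010959\t'27/27'\t1000\t+",
  "chr1\t3010971\t3010973\t'10/20'\t500\t+",
  "chr1\t3011025\t3011027\t'57/70'\t814\t-"
)

# quote = "" keeps the single quotes around the read counts intact
epp <- read.table(
  text = epp_lines, sep = "\t", quote = "", header = FALSE,
  col.names = c("chrom", "start", "end", "meth_cov", "score", "strand"),
  colClasses = c("character", "integer", "integer",
                 "character", "integer", "character")
)

# Split '#methylated_reads/#total_reads' into two integer columns
counts <- strsplit(gsub("'", "", epp$meth_cov), "/", fixed = TRUE)
epp$methylated <- as.integer(vapply(counts, `[`, character(1), 1))
epp$total <- as.integer(vapply(counts, `[`, character(1), 2))
epp$beta <- epp$methylated / epp$total

# Each CpG spans two bases in this coordinate convention
stopifnot(all(epp$end - epp$start == 2))
print(epp[, c("chrom", "start", "strand", "methylated", "total", "beta")])
```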
BisSNP
bed files are assumed to have been generated by the methylation calling tool BisSNP. A tab-separated file contains the chromosome name, start coordinate, end coordinate, methylation value in percent, the coverage, the strand, and additional information not taken into account by RnBeads. The file should contain a header line. Coordinates are 0-based, spanning the first and the last coordinate in a site (i.e. end-start = 1 for a CpG). Sites on the - strand are shifted by +1. Here are some example lines (genome assembly hg19):
track name=file_sorted.realign.recal.cpg.filtered.sort.CG.bed type=bedDetail description="CG methylation
chr1 10496 10497 79.69 64 + 10496 10497 180,60,0 0 0
chr1 10524 10525 90.62 64 + 10524 10525 210,0,0 0 0
chr1 864802 864803 58.70 46 + 864802 864803 120,120,0 0 5
chr1 864803 864804 50.00 4 - 864803 864804 90,150,0 1 45
...
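The strand shift can be seen in the last two example lines, which appear to describe the two strands of one CpG. The sketch below shows one reading of that convention in base R: it undoes the +1 shift for sites on the - strand and pools the two strands. This is an illustration only; whether and how RnBeads merges strands internally is not stated here, and the methylated read counts are approximated from the percentages.

```r
# The last two example lines (first six columns only)
bissnp <- data.frame(
  chrom        = c("chr1", "chr1"),
  start        = c(864802L, 864803L),
  end          = c(864803L, 864804L),
  meth_percent = c(58.70, 50.00),
  coverage     = c(46L, 4L),
  strand       = c("+", "-"),
  stringsAsFactors = FALSE
)

# Undo the +1 shift of sites on the - strand
bissnp$cpg_start <- bissnp$start - as.integer(bissnp$strand == "-")

# Approximate methylated read counts from percentage and coverage
bissnp$methylated <- round(bissnp$meth_percent / 100 * bissnp$coverage)

# Pool both strands per CpG
pooled_beta <- tapply(bissnp$methylated, bissnp$cpg_start, sum) /
  tapply(bissnp$coverage, bissnp$cpg_start, sum)
print(pooled_beta)
```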
bismarkCov
cov files are assumed to have the format defined by Bismark's coverage file output converted from its bedGraph output (Bismark's bismark2bedGraph module; see the section "Optional bedGraph output" in the Bismark User Guide). A tab-separated file contains: the chromosome name, cytosine coordinate, cytosine coordinate (again), methylation value in percent, the number of methylated reads and the number of unmethylated reads. The file should not contain a header line. Coordinates are 1-based. Strand information does not need to be provided, but is inferred from the coordinates: coordinates on the - strand specify the C on the - strand (G on the + strand). Coordinates referring to cytosines not in CpG context are automatically discarded. Here are some example lines (genome assembly hg19):
...
chr9 73252 73252 100 1 0
chr9 73253 73253 0 0 1
chr9 73256 73256 100 1 0
chr9 73260 73260 0 0 1
chr9 73262 73262 100 1 0
chr9 73269 73269 100 1 0
...
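As a quick sanity check before import, a coverage file can be read in base R and tested against the properties listed above. This is a sketch using the first three example lines; the column names are chosen here and are not defined by Bismark or RnBeads.

```r
cov_lines <- c(
  "chr9\t73252\t73252\t100\t1\t0",
  "chr9\t73253\t73253\t0\t0\t1",
  "chr9\t73256\t73256\t100\t1\t0"
)

cov <- read.table(
  text = cov_lines, sep = "\t", header = FALSE,
  col.names = c("chrom", "start", "end", "meth_percent",
                "methylated", "unmethylated"),
  colClasses = c("character", "integer", "integer",
                 "numeric", "integer", "integer")
)

# Total coverage is the sum of methylated and unmethylated reads
cov$coverage <- cov$methylated + cov$unmethylated

# Both coordinate columns refer to the same cytosine
stopifnot(all(cov$start == cov$end))
# The percentage should agree with the read counts
stopifnot(isTRUE(all.equal(cov$meth_percent,
                           100 * cov$methylated / cov$coverage)))

# Rows at adjacent positions are candidates for the two strands of one CpG
adjacent <- which(diff(cov$start) == 1)
print(cov[sort(c(adjacent, adjacent + 1)), ])
```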
bismarkCytosine
bed files are assumed to have the format defined by Bismark's cytosine report output (Bismark's coverage2cytosine module; see the section "Optional genome-wide cytosine report output" in the Bismark User Guide). A tab-separated file contains: the chromosome name, cytosine coordinate, the strand, number of methylated reads, number of unmethylated reads, and additional information not taken into account by RnBeads. The file should not contain a header line. Coordinates are 1-based. Coordinates on the - strand specify the C on the - strand (G on the + strand). CpGs without coverage are allowed, but not required. Here are some example lines (genome assembly hg19):
...
chr22 16050097 + 0 0 CG CGG
chr22 16050098 - 0 0 CG CGA
chr22 16050114 + 0 0 CG CGG
chr22 16050115 - 0 0 CG CGT
...
chr22 16115591 + 1 1 CG CGC
chr22 16117938 - 0 2 CG CGT
chr22 16122790 + 0 1 CG CGC
...
Encode
bed files are assumed to have the same format as the ones that can be downloaded from UCSC's ENCODE data portal. A tab-separated file contains: the chromosome name, start coordinate, end coordinate, some identifier, read coverage, the strand, start and end coordinates again (not sure why; we discard this information), some color value, read coverage and the methylation percentage. The file should contain a header line. Coordinates are 0-based. Note that this file format is very similar but not identical to the 'BisSNP' one. Here are some example lines (genome assembly hg19):
track name="SL1815 MspIRRBS" description="HepG2_B1__GC_" visibility=2 itemRgb="On"
chr1 1000170 1000171 HepG2_B1__GC_ 62 + 1000170 1000171 55,255,0 62 6
chr1 1000190 1000191 HepG2_B1__GC_ 62 + 1000190 1000191 0,255,0 62 3
chr1 1000191 1000192 HepG2_B1__GC_ 31 - 1000191 1000192 0,255,0 31 0
chr1 1000198 1000199 HepG2_B1__GC_ 62 + 1000198 1000199 55,255,0 62 10
chr1 1000199 1000200 HepG2_B1__GC_ 31 - 1000199 1000200 0,255,0 31 0
chr1 1000206 1000207 HepG2_B1__GC_ 31 - 1000206 1000207 55,255,0 31 10
...
1. Create a directory named project.
2. Download data.zip, samples.csv and analysis.xml from a dataset we provide on the methylome resources page.
3. Extract data.zip to project/data. Copy samples.csv to the data directory as well. Keep the file analysis.xml in the parent directory project.
4. Copy your own data files to project/data.
5. Extend samples.csv by adding the information for your samples to the annotation table. This file is in comma-separated format and can be edited with any spreadsheet software, such as Microsoft Excel or LibreOffice. If you still have little experience with RnBeads, avoid renaming columns because this might affect the subsequent analysis steps.

Once you have added your dataset to the downloaded one, you can start the analysis pipeline using commands similar to the ones provided below:
# Set the working directory
setwd("project")
# Start the analysis pipeline
library(RnBeads)
rnb.run.xml("analysis.xml")
Feel free to experiment with different analysis options by editing the file analysis.xml or setting them in the R session using the function rnb.options().
When you save an R session using save.image(), the analysis options are not stored. You can copy them to a list and reset them upon loading, as shown in the example below:
# Saving the current session
RnBeadsOptions <- rnb.options()
save.image(file = "my.analysis.RData")
# Loading a session
library(RnBeads)
load("my.analysis.RData")
do.call(rnb.options, RnBeadsOptions)
sample | individual | diseaseState |
---|---|---|
sample_1 | John | normal |
sample_2 | John | tumor |
sample_3 | Jane | normal |
sample_4 | Jane | tumor |
sample_5 | George | normal |
sample_6 | George | tumor |
Further suppose that you want to compare tumor vs. normal samples while taking into account the pairing by patient/individual. In that case, you would apply the following option setting:
rnb.options("differential.comparison.columns"=c("diseaseState"), "columns.pairing"=c("diseaseState"="individual"))
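Before running a paired analysis, it can help to confirm that the annotation table really is paired, i.e. that every individual contributes exactly one sample to each group. The following base R sketch rebuilds the example table and performs that check; it does not call RnBeads.

```r
# The sample annotation table from the example
samples <- data.frame(
  sample       = paste0("sample_", 1:6),
  individual   = rep(c("John", "Jane", "George"), each = 2),
  diseaseState = rep(c("normal", "tumor"), times = 3),
  stringsAsFactors = FALSE
)

# Cross-tabulate the pairing column against the comparison column
pairing <- table(samples$individual, samples$diseaseState)
print(pairing)

# A complete pairing has exactly one sample in every cell
stopifnot(all(pairing == 1))
```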
RnBSet object. Use the function addPheno() for this purpose. You can introduce a text string for each sample, using the same designation for all samples in a group that you want to specify. The newly added column in the annotation table can then be used for grouping. You can either let RnBeads figure out the categories by itself, or explicitly set the corresponding group options (see rnb.options() for details). You can set values to NA for samples that you don't want to include in either of the groups.
If you want to specify explicit pairwise comparisons, just use the differential.comparison.columns.all.pairwise option.
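A minimal sketch of this approach is given below. It assumes that rnb.set is an existing RnBSet object with six samples; the column name comparisonGroup and the group labels are hypothetical. The RnBeads calls run only if the package is installed and such an object exists in the session.

```r
# One designation per sample; NA excludes a sample from both groups
group <- c("groupA", "groupA", "groupB", "groupB", NA, NA)
print(table(group, useNA = "ifany"))

# rnb.set is assumed to be an RnBSet object with six samples
if (requireNamespace("RnBeads", quietly = TRUE) && exists("rnb.set")) {
  # Add the grouping as a new column of the sample annotation table
  rnb.set <- RnBeads::addPheno(rnb.set, group, "comparisonGroup")
  # Use the new column for the differential analysis
  RnBeads::rnb.options(differential.comparison.columns = "comparisonGroup")
}
```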
Check that the option differential.comparison.columns matches the correct column in your sample annotation sheet. Additionally, please check the values of the options min.group.size and max.group.count, which specify which groups are accepted for differential analysis.
colors.category and colors.gradient. See the section Analysis Parameter Overview of the RnBeads vignette for more information.
RnBeads utilizes the package ggplot2 for generating most of the figures. Therefore, many aspects of the plots can be modified by adjusting the corresponding parameters in the default visual theme. As a simple example, executing the following command before starting the analysis pipeline sets the black-and-white theme:
library(ggplot2)
theme_set(theme_bw())
Please check the documentation of ggplot2 for a detailed description of themes. We can also recommend an online quick reference on the subject, put together by members of the Sape research group at the University of Lugano.
if (!requireNamespace("BiocManager", quietly = TRUE)) install.packages("BiocManager")
BiocManager::install("qvalue")
foreach package that we use for parallelization across multiple cores have been known to create unnecessary copies of in-memory objects for each parallel task. We therefore recommend reducing the number of cores using the parallel.setup() function in RnBeads. For large datasets, we recommend using no more than 2 to 4 cores and, if possible, parallelizing on a high performance compute cluster (HPC; see the "Deploying RnBeads on a Scientific Compute Cluster" section in the RnBeads vignette) rather than running the entire analysis on too many cores of a single machine.
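The recommendation above could be applied as in the following sketch, which caps the number of cores at 2. The cap is an example value within the recommended range, and the variable names are chosen here; parallel.setup() is called only if RnBeads is installed.

```r
# Number of cores reported by the machine (may be NA on some platforms)
available <- parallel::detectCores()
if (is.na(available)) available <- 1L

# Use at most 2 cores, or fewer if the machine has fewer
num_cores <- min(2L, available)
message("Using ", num_cores, " core(s)")

if (requireNamespace("RnBeads", quietly = TRUE)) {
  RnBeads::parallel.setup(num_cores)
}
```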
/tmp/ path of a Linux/Unix machine. You can check where R stores your temporary data using the tempdir() command in an R session. By default, RnBeads also stores big datasets on the hard drive during the analysis in order to reduce memory consumption. For this task it makes implicit use of the ff package for storing temporary files. Within an R session, you can see the ff temporary directory by executing the following commands:
tempdir()
library(ff)
getOption("fftempdir")
options(fftempdir="MY_DIRECTORY")
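A slightly fuller sketch of redirecting the ff temporary files is shown below. The directory used here is a placeholder inside the R session's temporary directory; in practice you would point it to a location with enough free disk space. Creating the directory before setting the option is a precaution taken in this sketch, not a documented requirement.

```r
# Placeholder location; replace with a directory on a large disk
ff_dir <- file.path(tempdir(), "ff_tmp")

# Make sure the directory exists before pointing ff at it
dir.create(ff_dir, recursive = TRUE, showWarnings = FALSE)

# Redirect the ff temporary files and confirm the setting
options(fftempdir = ff_dir)
print(getOption("fftempdir"))
```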