# Disable greedycut (filtering)
# Disable intersample variation plots (exploratory analysis)
# Reduce the subsampling number for estimating density plots
# Disable regional methylation profiling (exploratory analysis)
# Disable chromosome coverage plots (QC, sequencing data only)
To keep up to date with the most recent developments in computational epigenomics, we continually update the default RnBeads option settings. In software version 2.9.3, we changed the following defaults:
|Option name||Old default||New default|
Do you want to install from sources the package which needs compilation? y/n: n
Update all/some/none? [a/s/n]: n
1. Open the Advanced system settings. In Windows 7, for example, it can be reached through: Control Panel > System and Security > System > Advanced System Settings.
2. In the "System Properties" dialog, open the "Advanced" tab. Click the button "Environment Variables..." to update the search path.
3. Locate the environment variable "Path" (it doesn't matter if it is the user or the system variables, as long as you are the user who starts R). Select it, click on Edit, and prepend the location of the Ghostscript executable, followed by a semicolon. The text you need to add is usually similar to
4. After starting a new R session, Ghostscript should be accessible from R. If it still cannot be located, you need to check the corresponding environment variable. In an R session, the command
Sys.getenv()["R_GSCMD"] shows the contents of the dedicated Ghostscript variable. If the variable does not exist or points to the wrong executable file, you can set it to the full path of the Ghostscript executable. This is achieved by editing or creating the file
Renviron.site in the
etc subdirectory of your R installation. Make sure the file contains the line
R_GSCMD=C:\Program Files\gs\gs9.15\bin\gswin64c.exe (assuming Ghostscript is located in
C:\Program Files\gs\gs9.15 and you are using the 64-bit version of R). For more information, please check the R documentation on getting and setting environment variables.
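The check described in step 4 can be scripted. The following is a minimal sketch in base R; the helper name gs_configured is our own and is not part of R or RnBeads.

```r
# Hypothetical helper (not part of R or RnBeads): reports whether a
# Ghostscript path is set and points to an existing file.
gs_configured <- function(path = Sys.getenv("R_GSCMD")) {
  nzchar(path) && file.exists(path)
}

gs_path <- Sys.getenv("R_GSCMD")
if (gs_configured(gs_path)) {
  message("R_GSCMD points to: ", gs_path)
} else {
  message("R_GSCMD is unset or points to a missing file; check Renviron.site.")
}
```

Remember that changes to Renviron.site only take effect in a new R session.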
etc subdirectory of your R installation. Make sure the file contains the lines:
libmysql-devel. Note that these packages are aliased as
mariadb on RedHat-derived Linux distributions (RHEL, CentOS, Fedora).
yum install mariadb mariadb-devel libxml2 libxml2-devel gsl
sudo to the beginning of the above command.
|Old Option||New Option|
|Old Function||New Function|
loading.bed.style option. Here is an overview of the currently implemented presets:
bed files are assumed to have the format of the output files of the Epigenome Processing Pipeline developed by Fabian Müller and Christoph Bock. A tab-separated file contains: the chromosome name, start coordinate, end coordinate, methylation value and coverage as a string ('#methylated_reads/#total_reads'), some score, the strand, and additional information not taken into account by RnBeads. The file should not contain a header line. Coordinates are 0-based, spanning the first coordinate in a site and the first coordinate outside the site (i.e. end-start = 2 for a CpG). Here are some example lines (genome assembly mm9):
chr1 3010957 3010959 '27/27' 1000 +
chr1 3010971 3010973 '10/20' 500 +
chr1 3011025 3011027 '57/70' 814 -
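To make the column layout concrete, here is a small base R sketch that parses the three example lines above and splits the '#methylated/#total' string into counts. It only illustrates the format and is not the RnBeads loader; the column names are our own.

```r
# Example lines from the text, written with explicit tab separators.
epp_lines <- c(
  "chr1\t3010957\t3010959\t'27/27'\t1000\t+",
  "chr1\t3010971\t3010973\t'10/20'\t500\t+",
  "chr1\t3011025\t3011027\t'57/70'\t814\t-"
)

# quote = "" keeps the single quotes as data so they can be removed explicitly.
epp <- read.table(
  text = epp_lines, sep = "\t", quote = "", header = FALSE,
  stringsAsFactors = FALSE,
  col.names = c("chrom", "start", "end", "meth_cov", "score", "strand")
)

# Split '#methylated/#total' into two integer columns.
counts <- strsplit(gsub("'", "", epp$meth_cov, fixed = TRUE), "/", fixed = TRUE)
epp$n_meth  <- as.integer(vapply(counts, `[`, character(1), 1))
epp$n_total <- as.integer(vapply(counts, `[`, character(1), 2))
epp$meth    <- epp$n_meth / epp$n_total

# In this 0-based, half-open convention a CpG spans two bases.
stopifnot(all(epp$end - epp$start == 2))
print(epp[, c("chrom", "start", "end", "n_meth", "n_total", "meth")])
```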
bed files are assumed to have been generated by the methylation calling tool
BisSNP. A tab-separated file contains the chromosome name, start coordinate, end coordinate, methylation value in percent, the coverage, the strand, and additional information not taken into account by RnBeads. The file should contain a header line. Coordinates are 0-based, spanning the first and the last coordinate in a site (i.e. end-start = 1 for a CpG). Sites on the - strand are shifted by +1. Here are some example lines (genome assembly hg19):
track name=file_sorted.realign.recal.cpg.filtered.sort.CG.bed type=bedDetail description="CG methylation
chr1 10496 10497 79.69 64 + 10496 10497 180,60,0 0 0
chr1 10524 10525 90.62 64 + 10524 10525 210,0,0 0 0
chr1 864802 864803 58.70 46 + 864802 864803 120,120,0 0 5
chr1 864803 864804 50.00 4 - 864803 864804 90,150,0 1 45
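The strand shift is easiest to see on the last two example lines, which are adjacent positions on opposite strands. The base R sketch below skips the header line, converts the percentage to a fraction, and subtracts 1 from minus-strand starts. This reading of "shifted by +1" is our interpretation of the description above, and the column names and the shortened header line are placeholders.

```r
# Two example lines from the text plus a placeholder header line.
bissnp_lines <- c(
  "track name=example type=bedDetail",
  "chr1\t864802\t864803\t58.70\t46\t+\t864802\t864803\t120,120,0\t0\t5",
  "chr1\t864803\t864804\t50.00\t4\t-\t864803\t864804\t90,150,0\t1\t45"
)

# Skip the header line and keep the six columns described in the text.
bs <- read.table(text = bissnp_lines, sep = "\t", skip = 1, header = FALSE,
                 quote = "", stringsAsFactors = FALSE)[, 1:6]
names(bs) <- c("chrom", "start", "end", "meth_pct", "coverage", "strand")

# Methylation is given in percent; convert to a fraction.
bs$meth <- bs$meth_pct / 100

# Assumption: minus-strand sites are shifted by +1, so subtracting 1 yields
# the 0-based start of the CpG on the + strand.
bs$cpg_start <- ifelse(bs$strand == "-", bs$start - 1L, bs$start)
print(bs[, c("chrom", "cpg_start", "strand", "meth", "coverage")])
```

Under this assumption both lines map to the same CpG.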
cov files are assumed to have the format as defined by Bismark's coverage file output converted from its bedGraph output (Bismark's
bismark2bedGraph module; see the section "Optional bedGraph output" in the Bismark User Guide). A tab-separated file contains: the chromosome name, cytosine coordinate, cytosine coordinate (again), methylation value in percent, number of methylated reads and the number of unmethylated reads. The file should not contain a header line. Coordinates are 1-based. Strand information does not need to be provided, but is inferred from the coordinates: Coordinates on the - strand specify the C on the - strand (G on the + strand). Coordinates referring to cytosines not in CpG context are automatically discarded. Here are some example lines (genome assembly hg19):
chr9 73252 73252 100 1 0
chr9 73253 73253 0 0 1
chr9 73256 73256 100 1 0
chr9 73260 73260 0 0 1
chr9 73262 73262 100 1 0
chr9 73269 73269 100 1 0
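As an illustration of how the columns relate, the following base R sketch reads a few of the example lines, derives the coverage from the two read counts, and checks that the percentage column agrees with them. The column names are our own, and the conversion to 0-based coordinates is shown only for comparison with the bed-style presets.

```r
# A subset of the example lines, written with explicit tab separators.
cov_lines <- c(
  "chr9\t73252\t73252\t100\t1\t0",
  "chr9\t73253\t73253\t0\t0\t1",
  "chr9\t73256\t73256\t100\t1\t0"
)

cov <- read.table(
  text = cov_lines, sep = "\t", header = FALSE, stringsAsFactors = FALSE,
  col.names = c("chrom", "pos", "pos_end", "meth_pct", "n_meth", "n_unmeth")
)

# Coverage is the sum of methylated and unmethylated reads.
cov$coverage <- cov$n_meth + cov$n_unmeth

# The percentage column should agree with the read counts.
stopifnot(isTRUE(all.equal(100 * cov$n_meth / cov$coverage,
                           as.numeric(cov$meth_pct))))

# 1-based position -> 0-based, half-open interval of the cytosine.
cov$start0 <- cov$pos - 1L
cov$end0   <- cov$pos
print(cov[, c("chrom", "start0", "end0", "coverage")])
```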
bed files are assumed to have the format as defined by Bismark's cytosine report output (Bismark's
coverage2cytosine module; see the section "Optional genome-wide cytosine report output" in the Bismark User Guide). A tab-separated file contains: the chromosome name, cytosine coordinate, the strand, number of methylated reads, number of unmethylated reads, and additional information not taken into account by RnBeads. The file should not contain a header line. Coordinates are 1-based. Coordinates on the - strand specify the C on the - strand (G on the + strand). CpGs without coverage are allowed, but not required. Here are some example lines (genome assembly hg19):
chr22 16050097 + 0 0 CG CGG
chr22 16050098 - 0 0 CG CGA
chr22 16050114 + 0 0 CG CGG
chr22 16050115 - 0 0 CG CGT
chr22 16115591 + 1 1 CG CGC
chr22 16117938 - 0 2 CG CGT
chr22 16122790 + 0 1 CG CGC
bed files are assumed to have the same format as the ones that can be downloaded from UCSC's ENCODE data portal. A tab-separated file contains: the chromosome name, start coordinate, end coordinate, some identifier, read coverage, the strand, start and end coordinates again (we discard this information), some color value, read coverage and the methylation percentage. The file should contain a header line. Coordinates are 0-based. Note that this file format is very similar but not identical to the
'BisSNP' one. Here are some example lines (genome assembly hg19):
track name="SL1815 MspIRRBS" description="HepG2_B1__GC_" visibility=2 itemRgb="On"
chr1 1000170 1000171 HepG2_B1__GC_ 62 + 1000170 1000171 55,255,0 62 6
chr1 1000190 1000191 HepG2_B1__GC_ 62 + 1000190 1000191 0,255,0 62 3
chr1 1000191 1000192 HepG2_B1__GC_ 31 - 1000191 1000192 0,255,0 31 0
chr1 1000198 1000199 HepG2_B1__GC_ 62 + 1000198 1000199 55,255,0 62 10
chr1 1000199 1000200 HepG2_B1__GC_ 31 - 1000199 1000200 0,255,0 31 0
chr1 1000206 1000207 HepG2_B1__GC_ 31 - 1000206 1000207 55,255,0 31 10
analysis.xml from a dataset we provide on the methylome resources page.
samples.csv to the data directory as well. Keep the file
analysis.xml in the parent directory
samples.csv by adding the information for your samples to the annotation table. This file is in comma-separated format and can be edited by any spreadsheet software, such as Microsoft Excel or LibreOffice. If you still have little experience with RnBeads, avoid renaming columns because this might affect the subsequent analysis steps.
Once you have added your dataset to the downloaded one, you can start the analysis pipeline using commands similar to the ones provided below:
# Set the working directory
# Start the analysis pipeline
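The commands themselves are not reproduced above, so the following is only a sketch of what such a session could look like. It assumes the RnBeads function rnb.run.xml(), which starts the pipeline from an XML configuration file. The wrapper name run_from_xml and the directory path are placeholders of ours.

```r
# Hypothetical wrapper: run the pipeline described by analysis.xml in work_dir.
run_from_xml <- function(work_dir, xml_file = "analysis.xml") {
  stopifnot(dir.exists(work_dir))
  if (!requireNamespace("RnBeads", quietly = TRUE)) {
    stop("The RnBeads package is not installed.")
  }
  # Set the working directory (restored when the function exits)
  old_dir <- setwd(work_dir)
  on.exit(setwd(old_dir), add = TRUE)
  stopifnot(file.exists(xml_file))
  # Start the analysis pipeline
  RnBeads::rnb.run.xml(xml_file)
}

# Usage (placeholder path, not run here):
# run_from_xml("~/RnBeads/analysis")
```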
Feel free to experiment with different analysis options by editing the file
analysis.xml or setting them in the R session using the function
save.image(), the analysis options are not stored. You can copy them to a list, and reset them upon loading, as shown in the example below:
# Saving the current session
RnBeadsOptions <- rnb.options()
save.image(file = "my.analysis.RData")
# Loading a session
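A possible way to restore such a session is sketched below. It assumes that rnb.options() accepts the saved options as name = value arguments, so that the list can be passed back with do.call(). The helper name restore_session is hypothetical.

```r
# Hypothetical helper: load a saved workspace and re-apply the RnBeads options.
restore_session <- function(image_file = "my.analysis.RData") {
  # Load the workspace, including the saved RnBeadsOptions list
  load(image_file, envir = globalenv())
  if (requireNamespace("RnBeads", quietly = TRUE) &&
      exists("RnBeadsOptions", envir = globalenv())) {
    # Assumption: the saved list can be passed back as name = value pairs
    do.call(RnBeads::rnb.options, get("RnBeadsOptions", envir = globalenv()))
  }
  invisible(NULL)
}

# Usage (not run here):
# restore_session("my.analysis.RData")
```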
Suppose further that you want to compare tumor vs. normal while taking the pairing by patient/individual into account. In that case, you would apply the following option settings:
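A sketch of such a setting is given here, assuming the RnBeads options differential.comparison.columns and columns.pairing. The column names tumor_normal and individual are placeholders for columns in your own sample annotation table.

```r
# Placeholder column names; replace them with columns from your annotation table.
paired_opts <- list(
  # Column that defines the two groups to compare
  differential.comparison.columns = c("tumor_normal"),
  # Named vector: comparison column -> column whose equal values mark a pair
  columns.pairing = c("tumor_normal" = "individual")
)

# Apply the options only if RnBeads is available in this session.
if (requireNamespace("RnBeads", quietly = TRUE)) {
  do.call(RnBeads::rnb.options, paired_opts)
}
```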
RnBSet object. Use the function
addPheno() for this purpose. You can introduce a text string for each sample with the same designation for each group that you want to specify. The newly added column in the annotation table can then be used for grouping. You can either let RnBeads figure out the categories by itself, or explicitly set the corresponding group options (see
rnb.options() for details). You can set values to
NA for samples that you don't want to include in either of the groups. If you want to specify explicit pairwise comparisons, just use the
differential.comparison.columns matches the correct column in your sample annotation sheet. Additionally, please check the values for the options
max.group.count, which specify which groups are accepted for differential analysis.
colors.gradient. See the section Analysis Parameter Overview of the RnBeads vignette for more information.
RnBeads utilizes the package ggplot2 for generating most of the figures. Therefore, many aspects of the plots can be modified by adjusting the corresponding parameters in the default visual theme. As a simple example, executing the following command before starting the analysis pipeline sets the black-and-white theme:
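One way to do this with ggplot2 is sketched below. theme_set() and theme_bw() are standard ggplot2 functions; whether this matches the command originally shown here is an assumption.

```r
if (requireNamespace("ggplot2", quietly = TRUE)) {
  # theme_set() returns the previous theme, so it can be restored later with
  # ggplot2::theme_set(old_theme)
  old_theme <- ggplot2::theme_set(ggplot2::theme_bw())
}
```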
Please check the documentation of ggplot2 for a detailed description of themes. We can also recommend an online quick reference on the subject, put together by members of the Sape research group at the University of Lugano.
foreach package that we use for parallelization across multiple cores have been known to create unnecessary copies of in-memory objects for each parallel task. We therefore recommend reducing the number of cores using the
parallel.setup() function in RnBeads. For large datasets, we recommend using no more than 2 to 4 cores and, if possible, parallelizing on a high-performance compute cluster (HPC; see the "Deploying RnBeads on a Scientific Compute Cluster" section in the RnBeads vignette) rather than running the entire analysis on too many cores of a single machine.
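A conservative setup could look like the following sketch. The cap of 2 cores is an illustrative choice at the lower end of the recommendation above, and the calls to parallel.setup() and parallel.teardown() assume that RnBeads is installed.

```r
# Cap the number of workers; detectCores() may return NA on some platforms.
available <- parallel::detectCores()
n_cores <- if (is.na(available)) 1L else max(1L, min(2L, available))

if (requireNamespace("RnBeads", quietly = TRUE)) {
  RnBeads::parallel.setup(n_cores)
  # ... run the analysis here ...
  RnBeads::parallel.teardown()
}
```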
/tmp/ path of a Linux/Unix machine. You can check where R stores your temporary data using the
tempdir() command in an R session. By default RnBeads also stores big datasets on the hard drive during the analysis in order to reduce memory consumption. For this task it makes implicit use of the
ff package for storing temporary files. Within an R session, you can see the ff temporary directory by executing the following commands:
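A minimal way to inspect both locations is sketched below. It assumes that ff keeps its temporary directory in the R option fftempdir, which is only set once ff (or a package that loads it, such as RnBeads) has been loaded.

```r
# R's session-specific temporary directory
print(tempdir())

# Directory used by ff for its temporary files; NULL until ff has been loaded.
ff_dir <- getOption("fftempdir")
if (is.null(ff_dir)) {
  message("fftempdir is not set; load ff (or RnBeads) first.")
} else {
  print(ff_dir)
}
```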
You can change where big RnBeads methylation datasets are stored on disk using
before running your analysis. However, if an R session is abnormally terminated, some temporary files might remain, because ff and RnBeads cannot regain control of the R session to delete these files. If you suspect that your computer contains old temporary files from RnBeads analyses, check the contents of the above directories and delete them manually.
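The sketch below shows one way to redirect the ff temporary directory and to look for leftover files. It assumes that ff honors the R option fftempdir when it is set before the package is loaded. The directory used here is only a placeholder inside R's own temporary directory.

```r
# Placeholder location; choose a directory on a disk with enough free space.
big_dir <- file.path(tempdir(), "rnbeads_ff")
dir.create(big_dir, showWarnings = FALSE, recursive = TRUE)

# Set the option before loading RnBeads and starting the analysis.
options(fftempdir = big_dir)

# List files left in the directory, for example after a crashed session.
leftovers <- list.files(big_dir, full.names = TRUE)
print(leftovers)

# After checking that no running R session still uses them:
# unlink(leftovers)
```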