Introduction

This vignette demonstrates a cross-platform analysis of DNA methylation data that involves methylation calls from Illumina EPIC v1 and EPIC v2 microarrays.

Example data set

This tutorial utilizes data released from an EPIC v2 validation study (Kaur et al.). The IDAT files are publicly available at the Gene Expression Omnibus (GEO) database. Technical replicate samples from cell lines listed below were provided with EPIC v1 and EPIC v2 data.

Load RnBSet objects

Provided RnBSet objects of EPIC v1 and EPIC v2 data were created by using the updated RnBeads.hg38 annotation package.

EPIC v1

HCT116 GM12878
Replicate 1 HCT116_Rep1_EPICv1 GM12878_Rep1_EPICv1
Replicate 2 HCT116_Rep2_EPICv1 GM12878_Rep2_EPICv1

EPIC v2

HCT116 GM12878
Replicate 1 HCT116_Rep1_EPICv2 GM12878_Rep1_EPICv2
Replicate 2 HCT116_Rep2_EPICv2 GM12878_Rep2_EPICv2

Load the RnBSet objects by using the following code.

library(RnBeads)
library(RnBeads.hg38)

rnbset_path <- "path/to/rnbset"

## EPIC v1
HCT116_Rep1_EPICv1 <- load.rnb.set(file.path(analysis_path, "HCT116_Rep1_EPICv1.RDS"))
HCT116_Rep2_EPICv1 <- load.rnb.set(file.path(analysis_path, "HCT116_Rep2_EPICv1.RDS"))

GM12878_Rep1_EPICv1 <- load.rnb.set(file.path(analysis_path, "GM12878_Rep1_EPICv1.RDS"))
GM12878_Rep2_EPICv1 <- load.rnb.set(file.path(analysis_path, "GM12878_Rep2_EPICv1.RDS"))

## EPIC v2
HCT116_Rep1_EPICv2 <- load.rnb.set(file.path(analysis_path, "HCT116_Rep1_EPICv2.RDS"))
HCT116_Rep2_EPICv2 <- load.rnb.set(file.path(analysis_path, "HCT116_Rep2_EPICv2.RDS"))

GM12878_Rep1_EPICv2 <- load.rnb.set(file.path(analysis_path, "GM12878_Rep1_EPICv2.RDS"))
GM12878_Rep2_EPICv2 <- load.rnb.set(file.path(analysis_path, "GM12878_Rep2_EPICv2.RDS"))

Combining the RnBSet objects

Method combine() has now been adapted to allow the merger of any two RnBeads data container objects. It has an important parameter type that specifies the set operation for merging the methylation sites:

Let us combine all data sets and perform a sequence of joint cross-platform analyses of all data. First combine the two array-base data sets:

## Combine HCT116 Replicate 1
HCT116_Rep1_combined <- rnb.combine.arrays(HCT116_Rep1_EPICv1, HCT116_Rep1_EPICv2, type = "common")

This creates an RnBeadRawSet object with 2 samples that contain 721435 common probes. Probe type counts are: 718599 CpG probes, 2779 CpH probes, 57 rs probes and 0 nv probes.

The other replicates can be combined the same way as the code above shows.

Joint analysis of the combined data

Make sure the RnBeads assembly option is set to hg38. This ensures the usage of the RnBeads.hg38 annotation package. Additionally, for differential analysis the following settings shall be configured as below. Please note that min.group.size option is set to 1 for this tutorial since the analysis will be conducted for each replicate sample.

rnb.options(assembly = "hg38")

## Differential comparison options
rnb.options(differential.comparison.columns = c("EPIC_version"))
rnb.options(columns.pairing = NULL)
rnb.options(min.group.size = 1)
rnb.options(differential.report.sites = TRUE)
rnb.options(identifiers.column="SampleID")

The RnBeads analysis pipeline can be run simply by using this new combined data set.

rnb.options(analysis.name = "EPICv1_vs_EPICv2_HCT116_Replicate_1")
report.dir <- file.path("path/to/HCT116_Rep1_Report")

res = rnb.run.analysis(data.source = HCT116_Rep1_combined, dir.report=report.dir)

The differential analysis scatterplot shows a high correlation number between the samples of two platforms (Figure 1.).