237553 sites were removed because they overlap with SNPs. The list of removed sites is available in a dedicated table accompanying this report.
1924 sites were detected as high-coverage outliers in at least one sample and removed at this step. An outlier site is defined as one whose coverage exceeds 50 times the 0.95-quantile of coverage values in its sample. The list of removed sites is available in a dedicated table accompanying this report.
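The outlier rule stated above can be sketched in a few lines of Python with NumPy. This is only an illustration of the definition, not the code that produced this report: the function name `high_coverage_outliers`, the sites-by-samples matrix layout, and the choice to ignore missing coverage values when computing the quantile are all assumptions.

```python
import numpy as np

def high_coverage_outliers(coverage, factor=50.0, q=0.95):
    """Flag sites that are high-coverage outliers in at least one sample.

    coverage: 2-D array, rows = sites, columns = samples (NaN = no coverage).
    Hypothetical helper; the layout and NaN handling are assumptions.
    """
    coverage = np.asarray(coverage, dtype=float)
    # Per-sample threshold: factor times the q-quantile of that sample's coverage.
    thresholds = factor * np.nanquantile(coverage, q, axis=0)
    # Comparisons with NaN evaluate to False, so missing values are never flagged.
    is_outlier = coverage > thresholds
    # A site is removed if it is an outlier in any sample.
    return is_outlier.any(axis=1)
```

With `factor=50` and `q=0.95` this mirrors the definition in the text; the sites where the returned mask is true are the ones that would be removed.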
A total of 131972922 sites with coverage less than 5 were masked with NA in the methylation table. The numbers of masked sites per sample are available in a dedicated table accompanying this report.
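As a rough sketch of this masking step, assuming the methylation and coverage tables are stored as sites-by-samples arrays of the same shape (the function name `mask_low_coverage` and this layout are hypothetical, and NaN stands in for NA):

```python
import numpy as np

def mask_low_coverage(meth, coverage, min_coverage=5):
    """Replace methylation values with NaN where coverage is below min_coverage.

    Returns the masked copy and the number of masked values per sample.
    Hypothetical helper; both inputs are assumed to be sites x samples.
    """
    meth = np.array(meth, dtype=float)   # work on a copy, leave the input intact
    coverage = np.asarray(coverage, dtype=float)
    low = coverage < min_coverage        # True where coverage is too low
    meth[low] = np.nan                   # NaN plays the role of NA here
    masked_per_sample = low.sum(axis=0)  # one count per column (sample)
    return meth, masked_per_sample
```

The per-sample counts correspond to the kind of table mentioned above; their sum gives the total number of masked values.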
1375478 sites on sex chromosomes were removed at this step. The list of removed sites is available in a dedicated table accompanying this report.
3348867 sites were removed because they contain more than 40 missing values in the methylation table. This threshold corresponds to 50% of all samples. The total number of missing values in the methylation table before this filtering step was 361033991. A dedicated table of all removed sites is attached to this report.
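The missing-value filter can be illustrated as follows. This is a minimal sketch, assuming a sites-by-samples methylation array with NaN for missing values; the function name `filter_missing` is hypothetical, and the strict "more than" comparison is taken from the wording above.

```python
import numpy as np

def filter_missing(meth, max_missing=40):
    """Drop sites with more than max_missing missing values.

    meth: 2-D array, rows = sites, columns = samples, NaN = missing.
    Hypothetical helper illustrating the rule described in the report.
    """
    meth = np.asarray(meth, dtype=float)
    # Number of missing values per site (row).
    n_missing = np.isnan(meth).sum(axis=1)
    # Keep sites with at most max_missing missing values.
    keep = n_missing <= max_missing
    return meth[keep], n_missing, keep
```

The `n_missing` vector is also what a histogram of missing values per site would be drawn from. To express the threshold as a fraction of samples instead, one could pass `max_missing=0.5 * meth.shape[1]`.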
The figure below shows the distribution of missing values per site.
Histogram of the number of missing values per site. The vertical line, if visible, denotes the applied threshold.
As a final outcome of the filtering procedures, 4724345 sites and 0 samples were removed. These statistics are presented in a dedicated table that accompanies this report and visualized in the figure below.
Fractions of removed values in the dataset after applying filtering procedures.
The figure below compares the distributions of the removed methylation β values and of the retained ones.
Comparison of removed and retained β values. Both distributions are estimated by randomly sampling 1000000 values in each group.
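The subsampling mentioned in the caption can be sketched as below. This is an assumption-laden illustration: the function name `sample_betas`, sampling without replacement, dropping missing values first, and the fixed seed are all choices made here, not details confirmed by the report.

```python
import numpy as np

def sample_betas(values, n=1_000_000, seed=0):
    """Draw up to n non-missing beta values at random for density estimation.

    Hypothetical helper; samples without replacement, which is an assumption.
    """
    values = np.asarray(values, dtype=float).ravel()
    values = values[~np.isnan(values)]  # missing values carry no beta
    if values.size <= n:
        return values                   # group is small enough to use in full
    rng = np.random.default_rng(seed)   # seeded for reproducibility
    return rng.choice(values, size=n, replace=False)
```

Applying the function separately to the removed and the retained values gives two samples of comparable size whose distributions can then be plotted side by side.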