237553 sites were removed because they overlap with SNPs. The list of removed sites is available in a dedicated table accompanying this report.
1924 sites were detected as high-coverage outliers in at least one sample and removed at this step. An outlier site is defined as one whose coverage exceeds 50 times the 0.95-quantile of coverage values in its sample. The list of removed sites is available in a dedicated table accompanying this report.
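The outlier rule above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the code that produced this report: the function names, the use of `None` for missing coverage, and the linear-interpolation quantile are all hypothetical choices, and the actual pipeline may estimate the quantile differently.

```python
import math


def quantile(values, q):
    """Quantile with linear interpolation between order statistics.

    Assumption: the report does not state which quantile estimator is used.
    """
    ordered = sorted(values)
    pos = q * (len(ordered) - 1)
    lo, hi = math.floor(pos), math.ceil(pos)
    return ordered[lo] + (ordered[hi] - ordered[lo]) * (pos - lo)


def high_coverage_outliers(coverage, q=0.95, factor=50):
    """Return sorted indices of sites flagged in at least one sample.

    coverage[s][i] is the coverage of site i in sample s (None = missing).
    A site is flagged when its coverage exceeds factor * q-quantile of the
    observed coverage values in the same sample.
    """
    flagged = set()
    for sample in coverage:
        observed = [c for c in sample if c is not None]
        if not observed:
            continue  # nothing to compare against in this sample
        cutoff = factor * quantile(observed, q)
        flagged.update(
            i for i, c in enumerate(sample) if c is not None and c > cutoff
        )
    return sorted(flagged)
```

A site is removed from all samples as soon as it is flagged in one of them, which is why the function collects indices in a set across samples rather than reporting per-sample results.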
A total of 131972922 sites with coverage less than 5 were masked with NA in the methylation table. The numbers of masked sites per sample are available in a dedicated table accompanying this report.
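As a rough illustration of the masking step, the following Python sketch replaces methylation values at low-coverage positions with `None` (standing in for NA) and counts the masked values per sample. The function name and the row-per-sample layout are assumptions made for this example only.

```python
def mask_low_coverage(meth, coverage, min_coverage=5):
    """Mask methylation values whose coverage is below min_coverage.

    meth[s][i] and coverage[s][i] refer to site i in sample s.
    Returns (masked_table, masked_counts_per_sample); None stands in for NA.
    """
    masked, counts = [], []
    for meth_row, cov_row in zip(meth, coverage):
        new_row, n_masked = [], 0
        for value, depth in zip(meth_row, cov_row):
            if depth is None or depth < min_coverage:
                if value is not None:
                    n_masked += 1  # count only values that were present before
                new_row.append(None)
            else:
                new_row.append(value)
        masked.append(new_row)
        counts.append(n_masked)
    return masked, counts
```

Masking leaves the table dimensions unchanged, so this step removes no sites by itself; it only increases the number of missing values seen by later filters.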
1375478 sites on sex chromosomes were removed at this step. The list of removed sites is available in a dedicated table accompanying this report.
3348867 sites were removed because they contain more than 40 missing values in the methylation table. This threshold corresponds to 50% of all samples. The total number of missing values in the methylation table before this filtering step was 361033991. A dedicated table of all removed sites is attached to this report.
The figure below shows the distribution of missing values per site.
Histogram of the number of missing values per site. The vertical line, if visible, denotes the applied threshold.
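The missing-value filter described above can be sketched as follows. This is a minimal Python illustration, assuming a row-per-sample table with `None` for NA; the function name is hypothetical. With a fraction of 0.5, a threshold of 40 corresponds to 80 samples.

```python
def filter_by_missing(meth, max_missing_fraction=0.5):
    """Find sites with more missing values than the allowed fraction of samples.

    meth[s][i] is the methylation value of site i in sample s (None = NA).
    Returns (removed_site_indices, missing_count_per_site).
    """
    n_samples = len(meth)
    n_sites = len(meth[0]) if meth else 0
    threshold = max_missing_fraction * n_samples
    missing_per_site = [
        sum(1 for row in meth if row[i] is None) for i in range(n_sites)
    ]
    # Strictly "more than" the threshold, matching the wording of the report.
    removed = [i for i, n in enumerate(missing_per_site) if n > threshold]
    return removed, missing_per_site
```

The second return value holds the per-site counts that a histogram like the one above would summarize.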
As a final outcome of the filtering procedures, 4724345 sites and 0 samples were removed. These statistics are presented in a dedicated table that accompanies this report and visualized in the figure below.
The figure below compares the distributions of the removed methylation β values and of the retained ones.
Comparison of removed and retained β values. Both distributions are estimated by randomly sampling 1000000 values in each group.
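The subsampling mentioned in the caption can be illustrated with the Python sketch below. It assumes sampling without replacement, a fixed seed for reproducibility, and that missing values are dropped first; the report does not state any of these details, and the function name is hypothetical.

```python
import random


def subsample(values, max_n=1_000_000, seed=0):
    """Draw up to max_n observed values at random, without replacement.

    Missing values (None) are discarded before sampling. If fewer than
    max_n values are available, all of them are returned unchanged.
    """
    observed = [v for v in values if v is not None]
    if len(observed) <= max_n:
        return observed
    return random.Random(seed).sample(observed, max_n)
```

Applying the same cap to the removed and the retained group keeps the two density estimates comparable and bounds plotting cost, at the price of some sampling noise in the tails.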