SNP-based filtering was not performed because the site annotation does not include SNP information.
434 sites were detected as high coverage outliers in at least one sample and removed at this step. An outlier site is defined as one whose coverage exceeds 50 times the 0.95-quantile of coverage values in its sample. The list of removed sites is available in a dedicated table accompanying this report.
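The outlier rule above can be illustrated with a minimal sketch. It is not the code that produced this report. It assumes a sites-by-samples coverage matrix held in a NumPy array, and the function name `high_coverage_outliers` and the toy data are hypothetical.

```python
import numpy as np

def high_coverage_outliers(coverage, factor=50.0, q=0.95):
    """Flag sites that are high coverage outliers in at least one sample.

    coverage: 2-D array, rows = sites, columns = samples (NaN = no data).
    """
    coverage = np.asarray(coverage, dtype=float)
    # 0.95-quantile of coverage within each sample, ignoring missing values.
    per_sample_q = np.nanquantile(coverage, q, axis=0)
    # Outlier in a sample: coverage exceeds factor times that sample's quantile.
    is_outlier = coverage > factor * per_sample_q
    # The site is removed if it is an outlier in any sample.
    return is_outlier.any(axis=1)

# Toy data (hypothetical): 100 sites, 2 samples, one extreme coverage value.
coverage = np.full((100, 2), 10.0)
coverage[3, 1] = 1000.0
mask = high_coverage_outliers(coverage)
print(np.flatnonzero(mask))  # [3]
```

In this toy example the 0.95-quantile of each sample is 10, so the cutoff is 500 and only the site with coverage 1000 is flagged.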
A total of 3143364 sites with coverage less than 5 were masked as NA in the methylation table. The number of masked sites per sample is available in a dedicated table accompanying this report.
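A minimal sketch of this masking step follows, assuming the methylation and coverage tables are NumPy arrays of the same shape (sites by samples). The function name `mask_low_coverage` and the toy values are hypothetical, and NaN stands in for NA.

```python
import numpy as np

def mask_low_coverage(meth, coverage, min_coverage=5):
    """Set methylation values to NaN where coverage is below min_coverage."""
    meth = np.array(meth, dtype=float)  # work on a copy of the table
    low = np.asarray(coverage, dtype=float) < min_coverage
    meth[low] = np.nan
    # Also report how many values were masked in each sample (column).
    return meth, low.sum(axis=0)

# Toy data (hypothetical): 3 sites, 2 samples.
meth = [[0.1, 0.9], [0.5, 0.4], [0.8, 0.2]]
coverage = [[10, 3], [4, 4], [5, 20]]
masked, masked_per_sample = mask_low_coverage(meth, coverage)
print(masked_per_sample)  # [1 2]
```

A coverage of exactly 5 is kept, because only values strictly less than the threshold are masked.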
579135 sites were removed because they contain more than 7 missing values in the methylation table. This threshold corresponds to 50% of all samples. The total number of missing values in the methylation table before this filtering step was 6338692. A dedicated table of all removed sites is attached to this report.
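The missing-value filter can be sketched in the same way. This is an illustration under assumptions, not the report's own implementation: the methylation table is taken to be a sites-by-samples NumPy array with NaN for missing values, and `drop_sites_with_many_missing` is a hypothetical name. The report uses a threshold of 7; the toy example uses 2 of 4 samples to keep the data small.

```python
import numpy as np

def drop_sites_with_many_missing(meth, max_missing):
    """Remove sites (rows) with more than max_missing NaN values."""
    meth = np.asarray(meth, dtype=float)
    # Number of missing values per site, counted across samples.
    n_missing = np.isnan(meth).sum(axis=1)
    keep = n_missing <= max_missing
    return meth[keep], keep, n_missing

# Toy data (hypothetical): 3 sites, 4 samples; threshold = 50% of samples.
nan = np.nan
meth = [[0.1, 0.2, 0.3, 0.4],
        [0.5, nan, nan, 0.6],
        [nan, nan, nan, 0.7]]
filtered, keep, n_missing = drop_sites_with_many_missing(meth, max_missing=2)
print(n_missing)  # [0 2 3]
print(keep)       # [ True  True False]
```

A site with exactly the threshold number of missing values is retained, since only sites with more than the threshold are removed.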
The figure below shows the distribution of missing values per site.
Histogram of the number of missing values per site. The vertical line, if visible, denotes the applied threshold.
As a final outcome of the filtering procedures, 579569 sites (434 high coverage outliers plus 579135 sites with too many missing values) and 0 samples were removed. These statistics are presented in a dedicated table that accompanies this report and visualized in the figure below.
The figure below compares the distributions of the removed methylation β values and of the retained ones.
Comparison of removed and retained β values. Both distributions are estimated by randomly sampling 1000000 values in each group.
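The subsampling behind such a comparison might look like the sketch below. The report does not state how the sampling is done, so dropping missing values first, sampling without replacement, and the function name `sample_values` are all assumptions made for illustration.

```python
import numpy as np

def sample_values(values, n=1_000_000, seed=0):
    """Draw up to n non-missing values at random (without replacement)."""
    values = np.asarray(values, dtype=float).ravel()
    values = values[~np.isnan(values)]  # assumption: NaN values are ignored
    if values.size <= n:
        return values  # fewer than n values available: use all of them
    rng = np.random.default_rng(seed)
    return rng.choice(values, size=n, replace=False)

# Toy data (hypothetical): simulated beta values standing in for one group.
rng = np.random.default_rng(1)
retained = rng.beta(0.5, 0.5, size=5000)
sample = sample_values(retained, n=1000)
# Estimate the distribution of the sample on the interval [0, 1].
density, edges = np.histogram(sample, bins=50, range=(0.0, 1.0), density=True)
```

Sampling a fixed number of values from each group keeps the cost of the density estimate bounded regardless of how many sites were removed or retained.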