Mann-Whitney U-test
QUMA (QUantification tool for Methylation Analysis)
The statistical significance of the difference between two groups over the entire set of CpG sites is evaluated with the Mann-Whitney U-test (also called the Wilcoxon rank-sum test), a non-parametric significance test for two independent samples. Although Student's t-test is used in the same situations, we adopt the non-parametric Mann-Whitney U-test rather than the parametric t-test, because methylation status is not normally distributed, especially in cases of hyper- or hypo-methylation. The two-tailed p-value of the Mann-Whitney U-test is determined from the ranks of the ratio of methylated CpGs to all CpGs in each bisulfite sequence (see the example below). This p-value indicates whether the distribution of this ratio differs between the two groups.
Importantly, this test does not detect differences in some situations, notably CpG methylation of imprinted regions, because it only checks for a difference in the average level of the two groups. In addition, the patterns of CpG methylation are not considered.
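As a hedged illustration of this limitation, the sketch below uses made-up data (not taken from QUMA): one group mimics an imprinted region, with half of the sequences fully methylated and half unmethylated, while the other group is uniformly half-methylated. It computes U by pair counting, which is an equivalent formulation of the rank-sum definition. The function name `pairwise_u` and both data sets are assumptions made for illustration.

```python
def pairwise_u(group_a, group_b):
    """U statistic by pair counting: a pair (a, b) scores 1 if a < b, 0.5 if tied."""
    u_a = sum(1.0 if a < b else 0.5 if a == b else 0.0
              for a in group_a for b in group_b)
    u_b = len(group_a) * len(group_b) - u_a
    return min(u_a, u_b)

# Hypothetical methylation ratios per sequence (illustrative only).
imprinted_like = [0.0, 0.0, 0.0, 1.0, 1.0, 1.0]
uniform_half = [0.5] * 6

u = pairwise_u(imprinted_like, uniform_half)
expected_u = len(imprinted_like) * len(uniform_half) / 2  # mean of U under no difference
print(u, expected_u)
```

In this constructed case U equals its expected value, so the test reports no difference even though the two methylation patterns are clearly distinct.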
Example
The sample data sets are:

| Group | Me-CpGs/CpGs of each sequence (number of methylated CpGs / number of CpGs) | Average ratio of methylation | Number of sequences |
|---|---|---|---|
| group1 | 6/19, 6/19, 8/19, 9/19, 12/19, 15/19, 16/19, 18/19, 18/19, 18/19, 18/18, 19/19, 19/19 | 0.7409 | 13 (= n1) |
| group2 | 2/19, 2/19, 3/19, 3/19, 5/19, 5/19, 7/19, 7/19, 7/19, 8/19 | 0.2579 | 10 (= n2) |
(These are the analysis results for the QUMA sample sequence files.)
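The averages in the table can be reproduced with a few lines of Python. This is only a sketch: it assumes the average is the mean of the per-sequence ratios (which matches the tabulated values), and the helper name `mean_ratio` is made up for illustration.

```python
# (methylated CpGs, total CpGs) for each sequence, copied from the sample data.
group1 = [(6, 19), (6, 19), (8, 19), (9, 19), (12, 19), (15, 19), (16, 19),
          (18, 19), (18, 19), (18, 19), (18, 18), (19, 19), (19, 19)]
group2 = [(2, 19), (2, 19), (3, 19), (3, 19), (5, 19),
          (5, 19), (7, 19), (7, 19), (7, 19), (8, 19)]

def mean_ratio(group):
    """Mean of the per-sequence methylation ratios."""
    return sum(m / t for m, t in group) / len(group)

mean1 = mean_ratio(group1)
mean2 = mean_ratio(group2)
print(round(mean1, 4), round(mean2, 4))
```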
Is the difference between the average methylation ratios (0.7409 vs. 0.2579) significant?
First, rank the pooled values (methylation ratios) of both groups. When two or more values share the same rank, use the average of their ranks. In the sample data, two sequences have Me-CpGs/CpGs = 3/19 and would occupy ranks 3 and 4, so both are given rank 3.5 (the average of 3 and 4).
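The ranking step with averaged ties can be sketched as follows. The function name `average_ranks` is an assumption for illustration, and exact fractions are used so that equal ratios such as 18/18 and 19/19 compare as ties.

```python
from fractions import Fraction

def average_ranks(values):
    """Return the 1-based rank of each value, averaging the ranks of tied values."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the run while the next sorted value equals the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        mean_rank = (start + end + 2) / 2  # average of ranks start+1 .. end+1
        for k in range(start, end + 1):
            ranks[order[k]] = mean_rank
        start = end + 1
    return ranks

group1 = [Fraction(m, t) for m, t in
          [(6, 19), (6, 19), (8, 19), (9, 19), (12, 19), (15, 19), (16, 19),
           (18, 19), (18, 19), (18, 19), (18, 18), (19, 19), (19, 19)]]
group2 = [Fraction(m, 19) for m in (2, 2, 3, 3, 5, 5, 7, 7, 7, 8)]

pooled = group1 + group2
rank_of = dict(zip(pooled, average_ranks(pooled)))
print(rank_of[Fraction(3, 19)])
```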
Second, calculate the sum of the ranks (rank sum) for each group: R1 and R2.
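| Position i | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 | Rank sum |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Me-CpGs/CpGs | 2/19 | 3/19 | 5/19 | 6/19 | 7/19 | 8/19 | 9/19 | 12/19 | 15/19 | 16/19 | 18/19 | 1 | |
| rank | 1,2 | 3,4 | 5,6 | 7,8 | 9-11 | 12,13 | 14 | 15 | 16 | 17 | 18-20 | 21-23 | |
| rank (average) | 1.5 | 3.5 | 5.5 | 7.5 | 10 | 12.5 | 14 | 15 | 16 | 17 | 19 | 22 | |
| number of sequences, group1 | 0 | 0 | 0 | 2 | 0 | 1 | 1 | 1 | 1 | 1 | 3 | 3 | 212.5 (= R1) |
| number of sequences, group2 | 2 | 2 | 2 | 0 | 3 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 63.5 (= R2) |
| number of sequences, total | 2 | 2 | 2 | 2 | 3 | 2 | 1 | 1 | 1 | 1 | 3 | 3 | |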
Third, determine the temporary U-values, U1 and U2, as below.
U1 = n1 * n2 + n1 * (n1 + 1) / 2 - R1 = 8.5
U2 = n1 * n2 + n2 * (n2 + 1) / 2 - R2 = 121.5
Take the smaller of U1 and U2 as the U-value. In this case, U = 8.5.
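The rank sums and U-values for the sample data can be checked with the sketch below. The function name `rank_sum_u` is hypothetical. The average rank of a value is computed by counting: a value with `less` smaller values and `equal` equal values (itself included) occupies ranks less+1 to less+equal, whose mean is less + (equal + 1) / 2.

```python
from fractions import Fraction

def rank_sum_u(group1, group2):
    """Return (R1, R2, U1, U2, U) using average ranks for ties."""
    pooled = group1 + group2

    def average_rank(v):
        less = sum(1 for x in pooled if x < v)
        equal = sum(1 for x in pooled if x == v)
        return less + (equal + 1) / 2

    n1, n2 = len(group1), len(group2)
    r1 = sum(average_rank(v) for v in group1)
    r2 = sum(average_rank(v) for v in group2)
    u1 = n1 * n2 + n1 * (n1 + 1) / 2 - r1
    u2 = n1 * n2 + n2 * (n2 + 1) / 2 - r2
    return r1, r2, u1, u2, min(u1, u2)

group1 = [Fraction(m, t) for m, t in
          [(6, 19), (6, 19), (8, 19), (9, 19), (12, 19), (15, 19), (16, 19),
           (18, 19), (18, 19), (18, 19), (18, 18), (19, 19), (19, 19)]]
group2 = [Fraction(m, 19) for m in (2, 2, 3, 3, 5, 5, 7, 7, 7, 8)]

r1, r2, u1, u2, u = rank_sum_u(group1, group2)
print(r1, r2, u1, u2, u)
```

Note that U1 + U2 always equals n1 * n2, which gives a quick consistency check on the two rank sums.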
Then determine a two-tailed p-value from the U-value. When the number of sequences is above 20, the p-value is approximated using the normal distribution. When the number of sequences is small (20 or fewer), the p-value is determined from exact probabilities (Mann-Whitney U exact test).
The normal approximation is performed as:
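$$ z = \frac{|U - E(U)|}{\sqrt{V(U)}} $$

where z is a standard normal deviate, E(U) is the mean of U and V(U) is the variance of U:

$$ E(U) = \frac{n_1 n_2}{2} $$

$$ V(U) = \frac{n_1 n_2}{N(N-1)} \left( \frac{N^3 - N}{12} - \sum_i \frac{t_i^3 - t_i}{12} \right) $$

where N = n1 + n2 is the total number of sequences and ti is the number of tied ranks at position i.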
For the sample data, E(U) = 65, V(U) = 257.816 and z = 3.51879. The two-tailed p-value = 0.0004 is then determined from the standard normal distribution (the one-tailed probability is doubled for the two-tailed test).
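A minimal sketch of the normal approximation follows, assuming the usual tie-corrected variance of U and no continuity correction (this combination reproduces the z value quoted for the sample). The function name `normal_approximation` is hypothetical, the tie counts are the "total" row of the ranking table, and the two-tailed probability is obtained from the complementary error function as erfc(z / sqrt(2)).

```python
import math

def normal_approximation(u, n1, n2, tie_counts):
    """Return (E(U), V(U), z, two-tailed p) for the tie-corrected normal approximation."""
    n = n1 + n2
    mean_u = n1 * n2 / 2
    tie_sum = sum(t ** 3 - t for t in tie_counts)
    var_u = n1 * n2 / (n * (n - 1)) * ((n ** 3 - n) - tie_sum) / 12
    z = abs(u - mean_u) / math.sqrt(var_u)
    p_two_tailed = math.erfc(z / math.sqrt(2))  # equals 2 * (1 - Phi(z))
    return mean_u, var_u, z, p_two_tailed

# Sample data: U = 8.5, n1 = 13, n2 = 10, ties per position from the ranking table.
tie_counts = [2, 2, 2, 2, 3, 2, 1, 1, 1, 1, 3, 3]
mean_u, var_u, z, p = normal_approximation(8.5, 13, 10, tie_counts)
print(mean_u, round(var_u, 3), round(z, 5), round(p, 4))
```

With these inputs the sketch gives a variance of about 257.82, a z of about 3.5188, and a two-tailed p-value that rounds to 0.0004.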
Another sample data set, for the Mann-Whitney U exact test, is:

Table 1

| Group | Me-CpGs/CpGs of each sequence (number of methylated CpGs / number of CpGs) | Average ratio of methylation | Number of sequences |
|---|---|---|---|
| group1 | 6/19, 6/19, 9/19, 12/19, 15/19, 18/19 | 0.5789 | 6 (= n1) |
| group2 | 3/19, 5/19, 5/19, 7/19, 7/19 | 0.2842 | 5 (= n2) |
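Table 2

| Position i | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | Number of sequences | Rank sum |
|---|---|---|---|---|---|---|---|---|---|---|
| Me-CpGs/CpGs | 3/19 | 5/19 | 6/19 | 7/19 | 9/19 | 12/19 | 15/19 | 18/19 | | |
| rank | 1 | 2,3 | 4,5 | 6,7 | 8 | 9 | 10 | 11 | | |
| rank (average) | 1 | 2.5 | 4.5 | 6.5 | 8 | 9 | 10 | 11 | | |
| number of sequences, group1 | 0 | 0 | 2 | 0 | 1 | 1 | 1 | 1 | 6 | 47 (= R1) |
| number of sequences, group2 | 1 | 2 | 0 | 2 | 0 | 0 | 0 | 0 | 5 | 19 (= R2) |
| number of sequences, total | 1 | 2 | 2 | 2 | 1 | 1 | 1 | 1 | 11 | |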
U1 = n1 * n2 + n1 * (n1 + 1) / 2 - R1 = 4
U2 = n1 * n2 + n2 * (n2 + 1) / 2 - R2 = 26
U = min (U1, U2) = 4
With the marginal totals fixed, there are 179 possible cases, and the 11 cases listed below have a U-value no greater than the U-value of the sample.
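| Position i | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | Rank sum | U-value | Probability |
|---|---|---|---|---|---|---|---|---|---|---|---|
| Me-CpGs/CpGs | 3/19 | 5/19 | 6/19 | 7/19 | 9/19 | 12/19 | 15/19 | 18/19 | | | |
| rank | 1 | 2,3 | 4,5 | 6,7 | 8 | 9 | 10 | 11 | | | |
| rank (average) | 1 | 2.5 | 4.5 | 6.5 | 8 | 9 | 10 | 11 | | | |
| group1/group2 | 1/0 | 2/0 | 2/0 | 1/1 | 0/1 | 0/1 | 0/1 | 0/1 | 21.5/44.5 | 0.5 | 0.00433 |
| group1/group2 | 1/0 | 2/0 | 2/0 | 0/2 | 1/0 | 0/1 | 0/1 | 0/1 | 23/43 | 2 | 0.00216 |
| group1/group2 | 1/0 | 2/0 | 2/0 | 0/2 | 0/1 | 1/0 | 0/1 | 0/1 | 24/42 | 3 | 0.00216 |
| group1/group2 | 1/0 | 2/0 | 2/0 | 0/2 | 0/1 | 0/1 | 1/0 | 0/1 | 25/41 | 4 | 0.00216 |
| group1/group2 | 1/0 | 2/0 | 1/1 | 2/0 | 0/1 | 0/1 | 0/1 | 0/1 | 23.5/42.5 | 2.5 | 0.00433 |
| group1/group2 | 1/0 | 2/0 | 1/1 | 1/1 | 1/0 | 0/1 | 0/1 | 0/1 | 25/41 | 4 | 0.00866 |
| group1/group2 | 0/1 | 1/1 | 0/2 | 1/1 | 1/0 | 1/0 | 1/0 | 1/0 | 47/19 | 4 | 0.00866 |
| group1/group2 | 0/1 | 0/2 | 2/0 | 0/2 | 1/0 | 1/0 | 1/0 | 1/0 | 47/19 | 4 | 0.00216 |
| group1/group2 | 0/1 | 0/2 | 1/1 | 2/0 | 0/1 | 1/0 | 1/0 | 1/0 | 47.5/18.5 | 3.5 | 0.00433 |
| group1/group2 | 0/1 | 0/2 | 1/1 | 1/1 | 1/0 | 1/0 | 1/0 | 1/0 | 49/17 | 2 | 0.00866 |
| group1/group2 | 0/1 | 0/2 | 0/2 | 2/0 | 1/0 | 1/0 | 1/0 | 1/0 | 51/15 | 0 | 0.00216 |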
To determine the two-tailed p-value, sum the probabilities of these 11 cases. The two-tailed p-value is then 0.0498.
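The exact p-value can be cross-checked by brute force. The sketch below is not necessarily how QUMA computes it: instead of grouping the assignments into the 179 distinct cases, it enumerates all C(11, 6) = 462 equally likely ways of labelling six of the eleven pooled sequences as group1 and counts those whose U-value does not exceed the observed one. The function name `exact_two_tailed_p` is hypothetical.

```python
from fractions import Fraction
from itertools import combinations

def exact_two_tailed_p(group1, group2):
    """Return (observed U, exact two-tailed p) by enumerating all group labellings."""
    pooled = group1 + group2
    n1, n2 = len(group1), len(group2)

    def average_rank(v):
        less = sum(1 for x in pooled if x < v)
        equal = sum(1 for x in pooled if x == v)
        return less + (equal + 1) / 2

    ranks = [average_rank(v) for v in pooled]

    def u_value(r1):
        u1 = n1 * n2 + n1 * (n1 + 1) / 2 - r1
        return min(u1, n1 * n2 - u1)  # U2 = n1 * n2 - U1

    observed = u_value(sum(ranks[:n1]))
    labellings = list(combinations(range(n1 + n2), n1))
    extreme = sum(1 for idx in labellings
                  if u_value(sum(ranks[i] for i in idx)) <= observed)
    return observed, Fraction(extreme, len(labellings))

group1 = [Fraction(m, 19) for m in (6, 6, 9, 12, 15, 18)]
group2 = [Fraction(m, 19) for m in (3, 5, 5, 7, 7)]

u, p = exact_two_tailed_p(group1, group2)
print(u, p, round(float(p), 4))
```

For this data set, 23 of the 462 labellings are at least as extreme as the sample, which matches the sum of the 11 tabulated probabilities (each tabulated case covers 1, 2 or 4 labellings). Full enumeration is only practical for small samples, which is consistent with reserving the exact test for 20 or fewer sequences.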