F.5 Maths Dispersion (2)

2014-11-02 8:49 am
Suggest suitable measure(s) of dispersion for comparing the two sets of data.
Set A: 3,6,7,13,16,16,27
Set B: 2,4,10,12,18,20,28
My calculation:
Set A:
Range=27-3=24
Inter-quartile range=16-6=10
Standard deviation=7.54
Set B:
Range=28-2=26
Inter-quartile range=20-4=16
Standard deviation=8.53
So range and standard deviation can be used.

But why is the answer inter-quartile range and standard deviation?
Please help, thank you!!!

Answers (1)

2014-11-02 9:31 am
✔ Best answer
If you look closely at the numbers in the two sets of data, you can see that in Set A the datum 27 lies well away from the main group. This is what we call an outlier (an extreme observation) in statistics.

Similarly, you may suspect that the datum 28 in Set B is also an outlier.

They are not only the maximum of their own data set; they also appear to be MUCH larger than the other numbers in it.

If we use "Range" as a measure of dispersion, our analysis will be greatly affected by the extreme outlier, which may bias our impression of the overall dispersion of the data set.

Therefore, when such outliers exist in the data, range may not be a good choice as a measure of dispersion.
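As a rough check of this point, the following Python sketch recomputes the three measures for Set A and Set B, then again with the largest value of each set removed. The helper names `quartiles` and `summarize` are made up for this illustration. It assumes the median-of-halves convention for quartiles (the middle value is left out when the count is odd) and the population standard deviation, which is what reproduces the figures in the question; other conventions can give different numbers.

```python
from statistics import median, pstdev

def quartiles(values):
    """Return (Q1, Q3) using the median-of-halves convention.

    Assumption: when the count is odd, the middle value is left out
    of both halves. Other textbooks use other conventions.
    """
    data = sorted(values)
    half = len(data) // 2
    return median(data[:half]), median(data[-half:])

def summarize(values):
    """Range, inter-quartile range and population s.d. (2 d.p.)."""
    q1, q3 = quartiles(values)
    return {
        "range": max(values) - min(values),
        "iqr": q3 - q1,
        "sd": round(pstdev(values), 2),
    }

set_a = [3, 6, 7, 13, 16, 16, 27]
set_b = [2, 4, 10, 12, 18, 20, 28]

for name, data in (("A", set_a), ("B", set_b)):
    trimmed = sorted(data)[:-1]  # same set without its largest value
    print(name, "all values: ", summarize(data))
    print(name, "max removed:", summarize(trimmed))
```

Under these assumptions, removing 27 from Set A cuts the range from 24 to 13 while the inter-quartile range stays at 10. Removing 28 from Set B cuts the range from 26 to 18 while the inter-quartile range only moves from 16 to 14.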

Here is an example:

Data Set X:
1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 999

Data Set Y:
2, 102, 202, 302, 402, 502, 602, 702, 802, 902

Clearly Data Set Y is more dispersed than Data Set X.
(You can see that Data Set X is actually heavily concentrated at 1.)

If you look at the range:
Range of Data Set X = 999 - 1 = 998
Range of Data Set Y = 902 - 2 = 900

You would conclude that Data Set X is more dispersed, which is NOT correct.
The problem arises because of the outlier "999".
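The comparison can be sketched in Python to see how the other two measures behave on the same data. The helper `iqr` is a name invented for this illustration and again assumes the median-of-halves convention for quartiles; the standard deviation is the population one.

```python
from statistics import median, pstdev

def iqr(values):
    """Inter-quartile range, median-of-halves convention (an assumption)."""
    data = sorted(values)
    half = len(data) // 2
    return median(data[-half:]) - median(data[:half])

set_x = [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 999]
set_y = [2, 102, 202, 302, 402, 502, 602, 702, 802, 902]

for name, data in (("X", set_x), ("Y", set_y)):
    print(
        name,
        "range =", max(data) - min(data),
        "IQR =", iqr(data),
        "s.d. =", round(pstdev(data), 2),
    )
```

Under these assumptions the inter-quartile range is 0 for Data Set X and 500 for Data Set Y, and the standard deviation of Y (about 287) is also larger than that of X (roughly 200). Both measures rank Y as the more dispersed set; only the range is misled by the single value 999.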


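2014-11-02 02:14:22 Addendum:
Another question: when can the inter-quartile range not be used?

In fact, this question is meant to test whether students understand the limitations of the different measures.
Questions of this kind do not always have a 100% definite answer.
It depends on the justification you give for the situation at hand.

Strictly speaking, no measure "cannot be used"; what is meant is probably "not suitable to use".

The drawback of standard deviation is that it is tedious: computing it requires every data value.
The drawback of range is that it relies too heavily on the maximum and minimum values and ignores the data in between.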
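A small Python sketch of that last point, using two made-up data sets (`clustered` and `spread_out` are hypothetical values chosen for this illustration, not taken from the question). They share the same minimum and maximum, so the range cannot tell them apart, while the population standard deviation uses every value and does.

```python
from statistics import pstdev

# Hypothetical data: same minimum and maximum, different middle values.
clustered = [0, 50, 50, 50, 100]
spread_out = [0, 0, 50, 100, 100]

def value_range(values):
    """Range = largest value minus smallest value."""
    return max(values) - min(values)

for name, data in (("clustered", clustered), ("spread_out", spread_out)):
    print(name, "range =", value_range(data), "s.d. =", round(pstdev(data), 2))
```

Both sets have a range of 100, but the standard deviation is about 31.62 for the first and about 44.72 for the second.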

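2014-11-02 02:16:55 Addendum:
In this respect, IQR sits roughly between s.d. and range.

So it seems hard to say precisely when IQR is unsuitable.

To give a somewhat loose answer: when you cannot determine the upper quartile or the lower quartile of the data, you cannot calculate (use) the IQR.

For example:
You are given only the three largest values in the data: 98, 97, 95
and the three smallest values: 10, 11, 18
Nothing else is known. In that case you can only work out the range; you cannot use the IQR.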

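2014-11-02 15:20:05 Addendum:
You're welcome~
Questions like this really come down to subjective judgment.

For example, going back to measures of central tendency: sometimes the mean is better and sometimes the median is better. It is hard to generalize; it depends on the actual situation...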


Archived on: 2021-04-15 17:06:18
Original link [permanently dead]:
https://hk.answers.yahoo.com/question/index?qid=20141102000051KK00008
