Biostatistics Lecture - Lê Hoàng Ninh

  1. Biostatistics, Prof. Dr. Lê Hoàng Ninh © 2006
  2. Definitions of some terms in biostatistics • Data: – Measurements or observations of a variable • Variable: – A characteristic that is observed or measured – Can take different values from one subject to another Evidence-based Chiropractic 2 © 2006
  3. Definitions of terms used in statistics • Independent variable – Precedes the dependent variable; the cause of some outcome – Smoking -> lung cancer – Drug A -> recovery • Dependent variable: – The measure of the effect/outcome – Its value depends on the independent variable
  4. Terms (cont.) • Parameters – Summary data/measures from a population • Statistics – Summary data/measures from a sample
  5. Population • A population is the set of individuals from which a sample is drawn – e.g., headache patients in a chiropractic office; automobile crash victims in an emergency room • In research, it is not possible to measure the entire population • A sample (a subset of the population) must therefore be drawn
  6. Random samples • Subjects are drawn from the population so that every individual has an equal chance of being selected • A random sample is representative of the population • A nonrandom sample is not representative – May be biased regarding age, severity of the condition, socioeconomic status etc.
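A simple random sample can be sketched with Python's standard `random` module. The population of patient IDs below is hypothetical. `random.Random.sample` draws without replacement, so every ID has the same chance of being selected. The `seed` parameter exists only to make the illustration reproducible.

```python
import random

def simple_random_sample(population, n, seed=None):
    """Draw n distinct members; every member has the same chance of selection."""
    rng = random.Random(seed)  # seeded generator so the example is reproducible
    return rng.sample(population, n)

# Hypothetical population: patient IDs 1..100
population = list(range(1, 101))
sample = simple_random_sample(population, 10, seed=42)
print(sample)
```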
  7. Random samples (cont.) • Random samples are rare in patient-care research • Instead, subjects are randomly allocated to 2 groups, a treatment group and a control group – Each person has an equal chance of being assigned to either of the groups • Random allocation to groups = randomization
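The following sketch shows simple randomization into two equal-sized groups: shuffle the enrolled subjects, then split the list in half. The subject codes are hypothetical. This is only one allocation scheme; trials often use block or stratified randomization instead.

```python
import random

def randomize(subjects, seed=None):
    """Randomly allocate subjects to a treatment and a control group of (nearly) equal size."""
    rng = random.Random(seed)
    shuffled = list(subjects)   # copy so the caller's list is not modified
    rng.shuffle(shuffled)       # every ordering of subjects is equally likely
    half = len(shuffled) // 2
    # With an odd number of subjects, the control group receives the extra person
    return {"treatment": shuffled[:half], "control": shuffled[half:]}

groups = randomize(["P01", "P02", "P03", "P04", "P05", "P06"], seed=1)
print(groups)
```

With an even number of subjects, each person ends up in the treatment group with probability 1/2, which matches the "equal chance" requirement above.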
  8. Descriptive statistics (DSs) • A way of summarizing data • Describe a data set in terms of its shape, central tendency, and variability – The shape of data has to do with the frequencies of the values of observations
  9. Descriptive statistics (cont.) – Central tendency: the central location of the data set – Variability (dispersion): how the values are spread below and above the central value • Descriptive statistics differ from inferential statistics – Descriptive statistics cannot test hypotheses
  10. A DATA SET (number of visits per case)
      Case #   1  2  3  4  5  6  7  8  9  10  11  12  13  14
      Visits   7  2  2  3  4  3  5  3  4   6   2   3   7   4
      • The distribution provides a summary of: – Frequencies of each of the values (value – frequency): • 2 – 3 • 3 – 4 • 4 – 3 • 5 – 1 • 6 – 1 • 7 – 2 – Range of values • Lowest = 2 • Highest = 7
  11. Frequency distribution table
      Visits   Frequency   Percent   Cumulative %
      2        3           21.4      21.4
      3        4           28.6      50.0
      4        3           21.4      71.4
      5        1           7.1       78.6
      6        1           7.1       85.7
      7        2           14.3      100.0
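The sketch below builds the frequency table from the 14 visit counts in the data set. It uses only the standard library; the function name `frequency_table` is illustrative.

```python
from collections import Counter

# Number of visits for cases 1-14 (the data set on slide 10)
visits = [7, 2, 2, 3, 4, 3, 5, 3, 4, 6, 2, 3, 7, 4]

def frequency_table(values):
    """Return rows of (value, frequency, percent, cumulative percent)."""
    counts = Counter(values)
    n = len(values)
    rows, cumulative = [], 0
    for value in sorted(counts):
        cumulative += counts[value]  # accumulate raw counts, not rounded percents
        rows.append((value, counts[value],
                     round(100 * counts[value] / n, 1),
                     round(100 * cumulative / n, 1)))
    return rows

for row in frequency_table(visits):
    print(row)
print("Lowest =", min(visits), "Highest =", max(visits))
```

The cumulative percentage comes from the running count rather than from the rounded percentages. Summing the rounded percentages drifts: 71.4 + 7.1 gives 78.5, and the next step gives 85.6.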
  12. THE FREQUENCY DISTRIBUTION SHOWN AS A HISTOGRAM
  13. Histograms (cont.) • A histogram is a type of bar chart, but there are no spaces between the bars • Histograms are used to visually depict frequency distributions of continuous data • Bar charts are used to depict categorical information – e.g., Male–Female, Mild–Moderate–Severe, etc.
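The slide's histogram image is missing, so here is a rough text-mode stand-in built from the visit counts. Each bar's length equals the frequency of that value. Every integer from the lowest to the highest value gets a row, so there are no gaps between bars. A plotting library such as matplotlib would be used for a real chart.

```python
from collections import Counter

visits = [7, 2, 2, 3, 4, 3, 5, 3, 4, 6, 2, 3, 7, 4]

def text_histogram(values, symbol="#"):
    """Return one line per value from min to max; bar length equals the value's frequency."""
    counts = Counter(values)  # missing values count as 0, giving an empty bar
    return [f"{v}: {symbol * counts[v]}" for v in range(min(values), max(values) + 1)]

for line in text_histogram(visits):
    print(line)
```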
  14. MEASURES OF CENTRAL TENDENCY • The mean – The most commonly used DS • Calculating the mean – Add all values of a series of numbers and then divide by the total number of elements
  15. Công thức tính số trung bình X • Trung bình mẫu X n X • Trung bình quần thể  N X (X bar) refers to the mean of a sample and μ refers to the mean of a population EX is a command that adds all of the X values n is the total number of values in the series of a sample and N is the same for a population Evidence-based Chiropractic 15 © 2006
  16. Measures of central tendency • Mode – The most frequently occurring value in a series – The modal value is the highest bar in a histogram
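Python's standard `statistics.multimode` (Python 3.8+) finds the mode. It returns every value tied for the highest frequency, which also covers series with more than one mode. The data are the visit counts from slide 10.

```python
from statistics import multimode

visits = [7, 2, 2, 3, 4, 3, 5, 3, 4, 6, 2, 3, 7, 4]
# 3 occurs 4 times, more often than any other value, so it is the mode
print(multimode(visits))
```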
  17. Measures of central tendency • Median – The value that divides a series of values in half when they are all listed in order – When there is an odd number of values • The median is the middle value – When there is an even number of values • Count from each end of the series toward the middle and then average the 2 middle values
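The odd/even rule above maps onto a short function. It is written out by hand so that both branches are visible; the standard `statistics.median` applies the same rule.

```python
def median(values):
    """Middle value of the sorted series; mean of the two middle values if n is even."""
    if not values:
        raise ValueError("the median of an empty series is undefined")
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]                       # odd n: the single middle value
    return (ordered[mid - 1] + ordered[mid]) / 2  # even n: average the two middle values

visits = [7, 2, 2, 3, 4, 3, 5, 3, 4, 6, 2, 3, 7, 4]
print(median(visits))  # 14 values: average of the 7th and 8th sorted values
```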
  18. Measures of central tendency • Each of the three methods of measuring central tendency has certain advantages and disadvantages • Which method should be used? – It depends on the type of data being analyzed – e.g., categorical or continuous, and the level of measurement involved
  19. Levels of measurement • There are 4 levels of measurement – Nominal, ordinal, interval, and ratio 1. Nominal – Data are coded by a number, name, or letter that is assigned to a category or group – Examples • Gender (e.g., male, female) • Treatment preference (e.g., manipulation, mobilization, massage)
  20. Levels of measurement 2. Ordinal – Is similar to nominal because the measurements involve categories – However, the categories are ordered by rank – Examples • Pain level (e.g., mild, moderate, severe) • Military rank (e.g., lieutenant, captain, major, colonel, general)
  21. Levels of measurement • Ordinal values only describe order, not quantity – Thus, severe pain is not the same as 2 times mild pain • The only mathematical operation allowed for nominal data is counting the members of each category – e.g., 25 males and 30 females • Ordinal data additionally allow greater-than/less-than comparisons
  22. Levels of measurement 3. Interval – Measurements are ordered (like ordinal data) – Have equal intervals – Do not have a true zero – Examples • The Fahrenheit scale, where 0° does not correspond to an absence of heat (no true zero) • In contrast to Kelvin, which does have a true zero
  23. Levels of measurement 4. Ratio – Measurements have equal intervals – There is a true zero – Ratio is the most advanced level of measurement, which can handle most types of mathematical operations
  24. Levels of measurement (cont.) • Ratio examples – Range of motion • No movement corresponds to zero degrees • The interval between 10 and 20 degrees is the same as between 40 and 50 degrees – Lifting capacity • A person who is unable to lift scores zero • A person who lifts 30 kg can lift twice as much as one who lifts 15 kg
  25. Levels of measurement (cont.) • NOIR is a mnemonic to help remember the names and order of the levels of measurement – Nominal Ordinal Interval Ratio
  26. Levels of measurement
      Scale      Permissible mathematical operations                  Best measure of central tendency
      Nominal    Counting                                             Mode
      Ordinal    Greater-than or less-than operations                 Median
      Interval   Addition and subtraction                             Symmetrical – Mean; Skewed – Median
      Ratio      Addition, subtraction, multiplication and division   Symmetrical – Mean; Skewed – Median
  27. Shape of a data set • Histograms of frequency distributions have shape • Distributions are often symmetrical, with most scores falling in the middle and fewer toward the extremes • Many biological measurements are approximately symmetrically distributed and form a normal curve (bell-shaped curve)
  28. Shape of a data set [Figure: a line depicting the shape of the data]
  29. The normal distribution • A normal curve describes a normal distribution (Gaussian distribution) • Properties of a normal distribution – It is symmetric about its mean – The highest point is at its mean
  30. The normal distribution (cont.) [Figure: normal curve with the mean marked] • The highest point of the overlying normal curve is at the mean • As one moves away from the mean in either direction, the height of the curve decreases, approaching, but never reaching, zero • A normal distribution is symmetric about its mean
  31. The normal distribution (cont.) Mean = Median = Mode
  32. Skewed distributions • In skewed distributions the data are not distributed symmetrically – Consequently, the mean, median, and mode are not equal and are in different positions – Scores are clustered at one end of the distribution – A small number of extreme values lie in the tail at the opposite end
  33. Skewed distributions • Skew is always toward the direction of the longer tail – Positive if skewed to the right – Negative if skewed to the left • The mean is shifted the most toward the tail
  34. Skewed distributions • Because the mean is shifted so much, it is not the best estimate of the average score for skewed distributions • The median is a better estimate of the center of skewed distributions – It is the central point of any distribution – 50% of the values are above and 50% below the median
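The visit counts from slide 10 have a longer tail to the right: most cases have 2 to 4 visits and a few have 6 or 7. The sketch below compares the three measures of center on these data. In this series the mean is pulled furthest toward the tail, giving mean > median > mode. That ordering is a common pattern for right-skewed data, not a rule that holds for every skewed data set.

```python
from statistics import mean, median, multimode

# Visit counts from slide 10 (longer tail toward the high values)
visits = [7, 2, 2, 3, 4, 3, 5, 3, 4, 6, 2, 3, 7, 4]

m, md, mo = mean(visits), median(visits), multimode(visits)
print(f"mean={m:.2f} median={md} mode={mo}")
# The few large values pull the mean toward the right tail
print("mean > median > mode:", m > md > mo[0])
```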
  35. Properties of the normal curve • About 68.3% of the area under a normal curve is within one standard deviation (SD) of the mean • About 95.4% is within two SDs • About 99.7% is within three SDs
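These percentages can be checked with the standard library. For a normal distribution, the share of the area within k SDs of the mean equals erf(k/√2), and `math.erf` is available in Python's `math` module.

```python
import math

def proportion_within(k):
    """Share of a normal distribution lying within k SDs of the mean: erf(k / sqrt(2))."""
    return math.erf(k / math.sqrt(2))

for k in (1, 2, 3):
    print(f"within {k} SD: {100 * proportion_within(k):.2f}%")
```

To two decimals this prints 68.27%, 95.45% and 99.73%. The exact two-SD value is 95.44997...%, which rounds to 95.4% at one decimal.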
  36. More properties of normal curves (cont.)
  37. Standard deviation (SD) • SD is a measure of the variability of a set of data • The mean represents the average of a group of scores, with some of the scores above the mean and some below – This range of scores is referred to as variability or spread • Variance (S²) is another measure of spread
  38. SD (cont.) • In effect, SD describes the typical amount of spread in a distribution of scores • The next slide shows a group of 10 patients whose mean age is 40 years – Some are older than 40 and some younger
  39. SD (cont.) [Figure: ages spread out along an X axis] • The amount by which the ages are spread out is known as dispersion or spread
  40. How far each age deviates above or below the mean [Figure: deviations of each age from the mean] • Adding the deviations always gives zero
  41. Calculating S² • To find the average deviation, one would normally add up the deviations above and below the mean and then divide by the number of values • However, this total always equals zero – The deviations must first be squared, which cancels the negative signs
  42. Calculating S² (cont.) • S² is not in the same units as the data (for ages, it is in years squared), but SD is, because SD is the square root of S² • Symbol for SD: S for a sample, σ for a population
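The following sketch follows the steps on the last few slides. It computes each deviation from the mean, checks that the deviations sum to zero, squares them, sums the squares, divides, and takes the square root to get back to the data's units. The slide's own formula did not survive extraction, so the denominators are an assumption. The sketch uses n − 1 for a sample (the usual convention for S²) and N for a population (σ²). It uses the visit counts from slide 10 rather than the ages, which are not listed.

```python
import math

def deviations(values):
    """Distance of each value from the mean (negative below it, positive above it)."""
    m = sum(values) / len(values)
    return [x - m for x in values]

def variance(values, sample=True):
    """Sum of squared deviations divided by n - 1 (sample, S²) or N (population, σ²)."""
    squared = [d ** 2 for d in deviations(values)]  # squaring removes the negative signs
    denominator = len(values) - 1 if sample else len(values)
    return sum(squared) / denominator

def sd(values, sample=True):
    """Standard deviation: square root of the variance, back in the data's units."""
    return math.sqrt(variance(values, sample))

visits = [7, 2, 2, 3, 4, 3, 5, 3, 4, 6, 2, 3, 7, 4]
print(round(sum(deviations(visits)), 10))  # zero, up to floating-point rounding
print(variance(visits), sd(visits))
```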
  43. A wide spread results in higher SDs; a narrow spread results in lower SDs
  44. Spread is important when comparing 2 or more group means • It is more difficult to see a clear distinction between the groups in the upper example because the spread is wider, even though the means are the same
  45. z-scores • The number of SDs that a specific score is above or below the mean in a distribution • Raw scores can be converted to z-scores by subtracting the mean from the raw score and then dividing the difference by the SD: $z = \frac{X - \mu}{\sigma}$
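The z-score formula is a one-line function. In the example, the mean age of 40 comes from slide 38. The patient's age of 52 and the SD of 8 years are hypothetical values chosen for illustration.

```python
def z_score(x, mean, sd):
    """Number of SDs that x lies above (positive) or below (negative) the mean."""
    if sd <= 0:
        raise ValueError("SD must be positive")
    return (x - mean) / sd

# Hypothetical: a 52-year-old patient in a group with mean age 40 and SD 8
print(z_score(52, 40, 8))  # 12 years above the mean = 1.5 SDs
```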
  46. z-scores (cont.) • Standardization – The process of converting raw scores to z-scores – The resulting distribution of z-scores will always have a mean of zero, an SD of one, and an area under the curve equal to one • The proportion of scores that are higher or lower than a specific z-score can be determined by referring to a z-table
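The sketch below standardizes a whole series and checks that the resulting z-scores have a mean of zero and an SD of one. It divides by the population SD (`statistics.pstdev`), so the SD of the z-scores, measured the same way, comes out as exactly one up to rounding. If you standardize with the sample SD instead, measure the spread of the z-scores with the sample formula as well.

```python
from statistics import fmean, pstdev

def standardize(values):
    """Convert raw scores to z-scores using the mean and population SD of the values."""
    m, s = fmean(values), pstdev(values)
    return [(x - m) / s for x in values]

visits = [7, 2, 2, 3, 4, 3, 5, 3, 4, 6, 2, 3, 7, 4]
z = standardize(visits)
print(round(fmean(z), 10), round(pstdev(z), 10))
```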
  47. z-scores (cont.) Refer to a z-table to find proportion under the curve
  48. z-scores (cont.) Partial z-table (to z = 1.5) showing proportions of the area under a normal curve for different values of z (each entry corresponds to the area under the curve shown in black on the slide, i.e., the proportion to the left of z)
      z     0.00   0.01   0.02   0.03   0.04   0.05   0.06   0.07   0.08   0.09
      0.0   0.5000 0.5040 0.5080 0.5120 0.5160 0.5199 0.5239 0.5279 0.5319 0.5359
      0.1   0.5398 0.5438 0.5478 0.5517 0.5557 0.5596 0.5636 0.5675 0.5714 0.5753
      0.2   0.5793 0.5832 0.5871 0.5910 0.5948 0.5987 0.6026 0.6064 0.6103 0.6141
      0.3   0.6179 0.6217 0.6255 0.6293 0.6331 0.6368 0.6406 0.6443 0.6480 0.6517
      0.4   0.6554 0.6591 0.6628 0.6664 0.6700 0.6736 0.6772 0.6808 0.6844 0.6879
      0.5   0.6915 0.6950 0.6985 0.7019 0.7054 0.7088 0.7123 0.7157 0.7190 0.7224
      0.6   0.7257 0.7291 0.7324 0.7357 0.7389 0.7422 0.7454 0.7486 0.7517 0.7549
      0.7   0.7580 0.7611 0.7642 0.7673 0.7704 0.7734 0.7764 0.7794 0.7823 0.7852
      0.8   0.7881 0.7910 0.7939 0.7967 0.7995 0.8023 0.8051 0.8078 0.8106 0.8133
      0.9   0.8159 0.8186 0.8212 0.8238 0.8264 0.8289 0.8315 0.8340 0.8365 0.8389
      1.0   0.8413 0.8438 0.8461 0.8485 0.8508 0.8531 0.8554 0.8577 0.8599 0.8621
      1.1   0.8643 0.8665 0.8686 0.8708 0.8729 0.8749 0.8770 0.8790 0.8810 0.8830
      1.2   0.8849 0.8869 0.8888 0.8907 0.8925 0.8944 0.8962 0.8980 0.8997 0.9015
      1.3   0.9032 0.9049 0.9066 0.9082 0.9099 0.9115 0.9131 0.9147 0.9162 0.9177
      1.4   0.9192 0.9207 0.9222 0.9236 0.9251 0.9265 0.9279 0.9292 0.9306 0.9319
      1.5   0.9332 0.9345 0.9357 0.9370 0.9382 0.9394 0.9406 0.9418 0.9429 0.9441
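Each entry of a z-table is the standard normal cumulative proportion Φ(z), which can be computed as (1 + erf(z/√2)) / 2. The sketch below regenerates a row of the table rounded to four places. The function names are illustrative; `math.erf` is from Python's standard library.

```python
import math

def normal_cdf(z):
    """Proportion of the area under the standard normal curve to the left of z."""
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))

def z_table_row(z0):
    """One z-table row: areas for z0 + 0.00, 0.01, ..., 0.09, rounded to 4 places."""
    return [round(normal_cdf(z0 + j / 100), 4) for j in range(10)]

print(z_table_row(1.0))
```

The proportion of scores above a given z is the complement, `1 - normal_cdf(z)`.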