Module Exercise 2: Public health (large area) epidemiology
The exercise:
The Australian Government Department of Health (federal) publishes annual reports containing data on notifiable diseases. These reports are of great use to those studying changes in disease distribution over space or time with the aim of planning countrywide control initiatives. To facilitate similar regional operations, states and territories produce annual Public Health Bulletins, which zoom in on the data at a higher level of resolution.
Part 1: Access a table for NSW showing disease incidence for the years 2003 to 2012, and produce labelled, computer-generated time trend graphs for giardiasis and HIV infections using an application such as Excel®.
Part 2: Briefly discuss two possible reasons why each of these diseases might have increased or decreased over this period. Reference this discussion.
Aims of the exercise:
i. To acquire skills in the extraction, presentation, analysis and use of quantitative information from a large-area epidemiological report.
ii. To develop early perspectives on risk factors for specific diseases, and insight into how and why these might change with time.
Hints:
i. Public Health Bulletins usually include data up to the year before they were published (e.g. a 2012 bulletin usually contains data up to 2011).
ii. Departments are sometimes a few years behind with their bulletins, so a bulletin for the year 2013 might not be available until 2015.
iii. For comparison of disease incidence by place or by year, rates (not absolute numbers) are always used in epidemiology. Disease notification rates are usually given per 100,000 population.
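Hint iii can be illustrated with a minimal Python sketch. The function name `rate_per_100k` and all figures below are hypothetical and chosen only for illustration; they are not taken from any bulletin. The sketch shows why rates are preferred: the number of notifications rises between the two years, but so does the population, and the rate is unchanged.

```python
def rate_per_100k(cases: int, population: int) -> float:
    """Return the notification rate per 100,000 population."""
    if population <= 0:
        raise ValueError("population must be positive")
    return cases * 100_000 / population


# Hypothetical figures for illustration only (not real bulletin data).
notifications = {2003: 150, 2004: 180}
populations = {2003: 1_000_000, 2004: 1_200_000}

# Rate for each year, rounded to one decimal place for presentation.
rates = {
    year: round(rate_per_100k(notifications[year], populations[year]), 1)
    for year in sorted(notifications)
}

for year, rate in rates.items():
    print(year, notifications[year], rate)
```

Plotting the rates rather than the raw counts against year gives the time trend graph asked for in Part 1.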
Module Exercise 3: Bivariate linear regression analysis (correlation)
Background to the exercise:
As a preliminary step in a large-scale study of asthma in Armidale, New South Wales, you are asked to carry out a study to identify the impact of ambient atmospheric general particulate pollution (PM_{10}) on the incidence of asthmatic wheeze in primary school children. Thermal inversions can occur periodically in the Armidale basin, trapping pollutants from point and diffuse sources in the lower atmosphere.
To ensure an accurate medical diagnosis, you select all primary school children attending a day clinic over a 30-day period in April. In this month, other “confounding” risk factors (such as rainfall) are at relatively low levels, and are therefore to some extent controlled.
From trained clinical staff you obtain a daily record of asthmatic wheeze incidence in children presenting for all medical conditions at the clinic during the study period. The daily air quality record is obtained from the Department of the Environment. A short latency period (minutes to hours) between exposure to ambient air particulates and production of symptoms is assumed. You produce the tabulated data shown on the next page.
The exercise:
Part 1: Plot a graph showing the relationship between asthmatic wheeze and ambient atmospheric particulate matter (PM_{10}) using a recognised computer application such as Excel®. Add a computer-generated line of best fit, assuming a linear relationship. Present the graph for assessment with a comment on the type of correlation (direct or inverse), its electronically computed strength in terms of Pearson’s Product Moment Correlation Coefficient r (some versions of the graph in Excel® also give this), and a qualitative interpretation of this result (e.g. “low correlation”, “moderate correlation”, etc.).
Part 2: Using the formula and table given in the module notes, hand-calculate Pearson’s Product Moment Correlation Coefficient, r. Submit the tabulation used to generate values for the algebraic formula, along with your calculated value for r. Comment on the possible reason for any differences noted between the results obtained in Parts 1 and 2.
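The tabulation approach in Part 2 can be sketched in Python as follows. This is a minimal sketch that assumes the common computational form of Pearson's r, built from the column totals Σx, Σy, Σxy, Σx² and Σy²; the formula in the module notes may be arranged differently. The function name `pearson_r` and the five data pairs are hypothetical and are not the exercise data.

```python
import math


def pearson_r(xs, ys):
    """Pearson's r from column totals (assumed computational form)."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length lists with at least 2 values")

    # Column totals, as they would appear at the foot of a hand tabulation.
    sum_x = sum(xs)
    sum_y = sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    sum_y2 = sum(y * y for y in ys)

    numerator = n * sum_xy - sum_x * sum_y
    denominator = math.sqrt(
        (n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2)
    )
    if denominator == 0:
        raise ValueError("r is undefined when either variable is constant")
    return numerator / denominator


# Hypothetical data for illustration only (not the clinic record).
x_values = [1, 2, 3, 4, 5]
y_values = [2, 4, 5, 4, 5]

r = pearson_r(x_values, y_values)
print(round(r, 4))
```

A positive r indicates a direct correlation and a negative r an inverse one; the extra columns for xy, x² and y² correspond to the blank column provided in the data table.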
Aims of the exercise:
i. To gain an understanding of the use of bivariate linear regression analysis as a fundamental but powerful epidemiological analytical tool.
ii. To gain a conceptual idea of an industrially generated environmental risk factor for an important health condition.
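| Day | Total number of children with asthmatic wheeze | Total number of children attending the clinic that day | Ambient atmospheric particulates (PM_{10} in µg/m^{3}) | Blank column for calculated values |
|-----|-----|-----|-----|-----|
| 1 | 11 | 420 | 40 | |
| 2 | 8 | 230 | 45 | |
| 3 | 11 | 190 | 90 | |
| 4 | 24 | 550 | 60 | |
| 5 | 31 | 643 | 50 | |
| 6 | 39 | 710 | 60 | |
| 7 | 39 | 560 | 360 | |
| 8 | 26 | 302 | 320 | |
| 9 | 19 | 200 | 110 | |
| 10 | 31 | 587 | 70 | |
| 11 | 22 | 589 | 80 | |
| 12 | 21 | 632 | 64 | |
| 13 | 14 | 585 | 50 | |
| 14 | 27 | 602 | 50 | |
| 15 | 22 | 320 | 130 | |
| 16 | 16 | 245 | 220 | |
| 17 | 24 | 558 | 100 | |
| 18 | 26 | 570 | 60 | |
| 19 | 42 | 603 | 40 | |
| 20 | 36 | 555 | 40 | |
| 21 | 46 | 599 | 100 | |
| 22 | 17 | 197 | 160 | |
| 23 | 16 | 197 | 190 | |
| 24 | 26 | 520 | 80 | |
| 25 | 22 | 476 | 50 | |
| 26 | 19 | 600 | 40 | |
| 27 | 14 | 557 | 30 | |
| 28 | 17 | 481 | 40 | |
| 29 | 10 | 225 | 50 | |
| 30 | 10 | 190 | 40 | |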
Hints:
i. If the question looks confusing, you probably need to go back to the module notes, where the approach is clearly explained, and work through an example.
ii. When finished, check your calculations thoroughly, as marks are awarded for both method and the correct answer. With care it is relatively easy to score 100%.
iii. The first step when working with raw data is always to classify (i.e. to construct a table). When in doubt, tabulate: masses of numbers always become clearer in a table.
iv. Ensure accuracy by using one more decimal place in your calculations than you intend to give in your answer.
v. Use the formula in the module notes rather than the one given in textbooks, which is primarily for statisticians.
vi. When comparing health states (diseases and fitness) always use rates.
vii. Excel® does not do as much as SPSS and Minitab, but is probably the most user-friendly program to use, and links well with Word®. For example, values in the Word® table can be cut and pasted into Excel®. Adding the line of best fit in Excel® involves first selecting the graph by clicking on it, after which the menu tab for this function will appear.
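Hints vi and vii can be illustrated together. The following minimal Python sketch first converts daily counts to rates (cases per 100 children attending) and then fits a straight line by ordinary least squares, which is the method assumed here for a linear line of best fit. The function names `rate_per_100` and `least_squares_line` and the four days of data are hypothetical and are not the exercise data.

```python
def rate_per_100(cases, attending):
    """Cases per 100 children attending the clinic that day."""
    if attending <= 0:
        raise ValueError("attendance must be positive")
    return cases * 100 / attending


def least_squares_line(xs, ys):
    """Return (slope, intercept) of the least-squares line y = slope*x + intercept."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length lists with at least 2 values")
    sum_x = sum(xs)
    sum_y = sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    spread_x = n * sum_x2 - sum_x ** 2
    if spread_x == 0:
        raise ValueError("all x values are identical; slope is undefined")
    slope = (n * sum_xy - sum_x * sum_y) / spread_x
    intercept = sum_y / n - slope * sum_x / n
    return slope, intercept


# Hypothetical data for illustration only (not the clinic record).
pm10 = [10, 20, 30, 40]
wheeze_cases = [2, 6, 6, 10]
attendance = [100, 200, 150, 200]

# Hint vi: compare rates, not raw counts, because attendance varies by day.
wheeze_rates = [rate_per_100(c, a) for c, a in zip(wheeze_cases, attendance)]

slope, intercept = least_squares_line(pm10, wheeze_rates)
print(wheeze_rates, round(slope, 3), round(intercept, 3))
```

In this illustration the raw counts for the second and third days are equal, yet the rates differ because attendance differs, which is why the rate is the quantity to plot against PM_{10}.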