# Any clever statisticians who can lend a word of advice?

Discussion in 'General Education' started by novaguy1968, Sep 13, 2019.

Tags:
1. ### novaguy1968MS English Educator

Joined:
Jun 26, 2012
Messages:
106
14

Sep 13, 2019

Hi: I'm trying to collect some data, and am unsure of the best was to display and interpret; I'm not skilled at this type of thing-words are more my thing, not numbers I thought that perhaps organizing the data and displaying as a bell curve would be helpful, but I'm struggling with getting the information to calculate in Excel (despite having reviewed nearly a dozen videos on YouTube and websites) and then displaying them in a helpful format that will allow me to interpret the data effectively. A bell curve may not even be what is best to do this, but statistical data analysis is all Greek to me. I have a list of student test results from a diagnostic. The score range is from 1 (lowest of 50) to 34. It is likely that I will be breaking these students into bands (hopefully no more than 3); (If I were just doing this myself I would just visually do it based upon a random chosen set of values, but I want this to be compiled for statistical tracking purposes in a more authoritative way). Anyone have any suggestions? Thanks in advance!

2. 3. ### futuremathsprofPhenom

Joined:
Jun 27, 2014
Messages:
4,084
1,188

Sep 14, 2019

When you say that you are breaking the students into “bands” do you mean “bucket widths” of 3, to use the correct term? A bucket width refers to the range of values (along the x-axis) per bar (column) in a histogram, which lists the counts or frequencies per bucket (how often the data takes on each range of values).

If you want to display the data graphically, then you would want to do so using a histogram (this looks like a bar graph but without the spacing in between the bars). And to clarify, histograms are used to display quantitative variables and the range of values are continuous and not discrete like in the case of a bar graph. The latter is used to display categorical or qualitative variables.

Now, to normalize your data (so to score in a standard or consistent manner such that the mean is 0 and the standard deviation is 1), then you would fit said data to a standard normal bell curve. This is so that you can compare each students’ performance relative to their peers from the pre- to the post-test.

Normalizing your data converts the units to standard deviations and centers the data about the mean (in this case, the average class score on the diagnostic). Here, the class average would have a standard deviation of zero since the mean of a standard normal bell curve is a z-score of 0. A z-score is a measure of the spread about the mean; formally, it is defined as the number of the standard deviations from the mean. For example, if z = 1 (read: positive one), then the observation (data point) is one standard deviation above the mean. Positive standard deviations in the context of test-scores are “good.” Negative standard deviations are “bad” as said test scores fall below the average.

To do this, you would need to compute the z-score for each column of data (each bucket) and then graph the resulting data set.

To compute a z-score, you would first need to compute the population mean and population standard deviation of your data set. Here, “population” is referring to the distribution of diagnostic scores and “distribution” is referring to the range of possible numeric values and how often they occur in the data set.

Anyway, you can have Excel do this using the functions “=mean(first cell:last cell)” and “=stdev(first cell:last cell)”. To do this, you would first input all of the student scores in a different column, obviously. Next, subtract the population mean from each student score and then divide each difference by the population standard deviation. For example, in cell B1, let B1 equal “=(A1- (cell where mean is stored))/(cell wherever standard deviation is scored)”. Then, click the bottom right of the first cell in this column and drag down to however many scores you inputted into column A. Here, I “inputted” the class data into column A.

Finally, highlight the column where the normalized data is stored (in my example, this would be column B), click “Insert”, the select the histogram icon and voila. You should be able to adjust the bucket widths to your liking thereafter.

The benefit of normalizing the data is that you can now not only compare individual student scores across multiple data sets but the class data, as well, for any common assessment.

I hope this helps.

Last edited: Sep 14, 2019
Joyful! likes this.

Joined:
Jul 27, 2017
Messages:
292
144

Sep 14, 2019

It sounds like you want a histogram or dot plot, which I have never had much luck using Excel for. You might search Google for a histogram generator, in which you can just enter your data and it will instantly create one for you. You might have to try a few different websites in order to find one that allows you to specify a band width of 3 points.

For somebody without much statistical knowledge, I would not recommend trying to manipulate the data beyond just displaying it and calculating some key stats such as the mean and median. Note that if your histogram is not approximately single peaked and symmetric, the median may be a better indicator of center than the mean.

5. ### futuremathsprofPhenom

Joined:
Jun 27, 2014
Messages:
4,084
1,188

Sep 14, 2019

If the distribution is skewed left or right, then the median and IQR are better measures of center and spread, respectively. If the distribution is not skewed, i.e. symmetric, then the mean and standard deviation are generally better measures of center and spread, respectively. However, if the distribution is approximately normal (single-peaked with about the same area in both tails), then it okay to use either the mean or median as they are approximately equal to each other.

Also, good advice with finding a histogram generator online; though, Excel is pretty straightforward and less clunky then online generators I’ve seen.