## Calculating the Mean and Standard Deviation

Since we are going to build a reporting system that produces statistical reports about the behavior of our system, let's look at some of the statistical functions that we will be using.

Quite possibly the most commonly used statistical function is the one that calculates the average value of a series of elements. The NumPy library provides two functions for averaging all the numbers in an array: mean() and average().

The mean() function calculates the simple arithmetic mean of a given set of numbers.

    >>> import numpy as np
    >>> a = np.array([5., 5., 5., 6., 6.])
    >>> a
    array([ 5.,  5.,  5.,  6.,  6.])
    >>> np.mean(a)
    5.4000000000000004

The average() function accepts an extra parameter, weights, which lets you provide the weights used to calculate the average value of an array. Keep in mind that the array of weights must be the same length as the primary array.

    >>> np.average(a, weights=np.array([1, 1, 1, 5, 10]))
    5.833333333333333

You may wonder why you would use a weighted average. One of the most popular use cases is making some elements more significant than others, especially when the elements are listed in a time sequence. Using the preceding example, let's assume that the numbers we used initially (5, 5, 5, 6, 6) represent system load readings obtained every minute. We can calculate the average (the arithmetic mean) by adding all the numbers together and dividing the sum by the total number of elements in the array; this is what the mean() function does. In our example, that result is 5.4. However, the last readings, the most recent ones, are usually of greater interest and importance. Therefore, we use weights in the calculation that effectively tell the average() function which numbers are more important to us. As the result of 5.83 shows, the last two values of 6 influenced the end result more heavily once we indicated their importance.
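To make the arithmetic concrete, the following sketch reproduces the weighted average by hand and compares it with np.average(). The variable names (readings, weights, and so on) are illustrative and not part of the original example; the values are the ones used in the text.

```python
import numpy as np

# System load readings taken once a minute (values from the text).
readings = np.array([5.0, 5.0, 5.0, 6.0, 6.0])
# Weights from the text: the two most recent readings count the most.
weights = np.array([1, 1, 1, 5, 10])

# Plain arithmetic mean: sum of the values divided by their count.
plain_mean = readings.sum() / readings.size

# Weighted average: sum of value * weight divided by the sum of the weights.
weighted_by_hand = (readings * weights).sum() / weights.sum()

# The same calculation done by NumPy.
weighted_numpy = np.average(readings, weights=weights)

print(plain_mean)
print(weighted_by_hand)
print(weighted_numpy)
```

The hand calculation is (5 + 5 + 5 + 30 + 60) / 18 = 105 / 18, which is about 5.83 and agrees with what np.average() returns.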

Less well known, and less frequently used, are the variance and standard deviation functions. These two indicators are closely related and both measure how spread out a distribution is. Simply stated, they measure the variability of a dataset. The variance is calculated as the average of the squared distance of each data point from the mean. In mathematical terms, the variance shows the statistical dispersion of the data.

As an example, let's assume we have a set of random data in an array: [1, 4, 3, 5, 6, 2]. The mean value of this array is 3.5. Now we need to calculate the squared distance from the mean for each element in the array. The squared distance is calculated as (value - mean)². So, for example, the squared distance for the first value is (1 - 3.5)² = (-2.5)² = 6.25. The full set of squared distances is [6.25, 0.25, 0.25, 2.25, 6.25, 2.25]. All we need to do now to get the variance of the original array is calculate the mean of these numbers, which is 2.9 (rounded) in our case. Here's how to perform all those calculations with a single NumPy function call:

    >>> a = np.array([1., 4., 3., 5., 6., 2.])
    >>> a
    array([ 1.,  4.,  3.,  5.,  6.,  2.])
    >>> np.var(a)
    2.9166666666666665
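The manual steps described above can also be spelled out one at a time. This is a minimal sketch with illustrative variable names; it relies on the fact that np.var() divides by the number of elements by default (ddof=0), which is the same "mean of the squared distances" definition used in the text.

```python
import numpy as np

a = np.array([1.0, 4.0, 3.0, 5.0, 6.0, 2.0])

# Step 1: the mean of the data.
mean = a.mean()

# Step 2: the squared distance of every element from the mean.
squared_distances = (a - mean) ** 2

# Step 3: the variance is the mean of those squared distances.
variance_by_hand = squared_distances.mean()

print(mean)
print(squared_distances)
print(variance_by_hand)
print(np.var(a))  # same value, computed by NumPy in one call
```

If you need the sample variance instead, which divides by N - 1 rather than N, pass ddof=1 to np.var().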

We established that this figure indicates the average squared distance from the mean, but because the value is squared, it is a bit misleading: it is not the actual distance, but rather an emphasized version of it. To bring it back in line with the rest of the values, we take its square root. The resulting value is the standard deviation of the dataset. The square root of 2.9 is roughly 1.7. Roughly speaking, this means that most elements in the array are no farther than 1.7 from the mean, which is 3.5 in our case. Any element outside this range is an exception to the normal expected value. Figure 11-1 illustrates this concept. In the diagram, four of the six elements are within the standard deviation, and two readings are outside the range. Keep in mind that, because of the way the standard deviation is calculated, a dataset whose values are not all equidistant from the mean will always contain some values that lie farther from the mean than the standard deviation.
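The counts mentioned above can be checked with a short sketch. It uses np.std(), which returns the square root of np.var(), and a Boolean mask to separate the elements that lie within one standard deviation of the mean from those that do not. The names within, inside, and outside are illustrative only.

```python
import numpy as np

a = np.array([1.0, 4.0, 3.0, 5.0, 6.0, 2.0])

mean = a.mean()
std = np.std(a)  # square root of np.var(a), roughly 1.7

# True for elements whose distance from the mean is at most one std.
within = np.abs(a - mean) <= std

inside = a[within]    # elements inside the range mean +/- std
outside = a[~within]  # elements outside that range

print(std)
print(inside)
print(outside)
```

For this dataset the range is approximately 1.8 to 5.2, so the values 2, 3, 4, and 5 fall inside it and the values 1 and 6 fall outside.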

