Unit 5
Statistical Methods

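Method of least squares
Suppose the straight line

$$y = a + bx \qquad (1)$$

is to be fitted to the given data points $(x_1, y_1), (x_2, y_2), \dots, (x_n, y_n)$. Let $y_{e_i} = a + b x_i$ be the theoretical value of $y$ for $x_i$, so that the error at that point is $e_i = y_i - y_{e_i}$. Now

$$S = \sum e_i^2 = \sum (y_i - a - b x_i)^2.$$

For the minimum value of $S$,

$$\frac{\partial S}{\partial a} = -2\sum (y_i - a - b x_i) = 0 \quad\text{or}\quad \sum y = na + b\sum x \qquad (2)$$

$$\frac{\partial S}{\partial b} = -2\sum x_i (y_i - a - b x_i) = 0 \quad\text{or}\quad \sum xy = a\sum x + b\sum x^2 \qquad (3)$$

Equations (2) and (3) are known as the normal equations. Solving these two equations gives the values of a and b.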
Example: Find the straight line that best fits the following data, using the method of least squares.
x  1  2  3  4  5 
y  14  27  40  55  68 
Sol.
Suppose the straight line
y = a + bx …….. (1)
fits the data best. Then
x  y  xy  $x^2$ 
1  14  14  1 
2  27  54  4 
3  40  120  9 
4  55  220  16 
5  68  340  25 
Sum = 15  204  748  55 
The normal equations are

$$\sum y = na + b\sum x, \qquad \sum xy = a\sum x + b\sum x^2.$$

Putting in the values from the table, we get the two normal equations

$$204 = 5a + 15b, \qquad 748 = 15a + 55b.$$

On solving these equations, we get $a = 0$ and $b = 13.6$. Putting these values of a and b in equation (1), the line of best fit is

$$y = 13.6x.$$
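The arithmetic of this example can be checked with a short script. The following is a minimal sketch, not part of the original notes; the function name `fit_line` is an illustrative choice. It solves the two normal equations in closed form and, for the data above, should give an intercept of 0 and a slope of 13.6.

```python
def fit_line(xs, ys):
    """Return (a, b) for the least-squares line y = a + b*x."""
    n = len(xs)
    sum_x = sum(xs)
    sum_y = sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_xx = sum(x * x for x in xs)
    # Normal equations:
    #   sum_y  = n*a     + b*sum_x
    #   sum_xy = a*sum_x + b*sum_xx
    # Eliminating a gives b; back-substitution gives a.
    b = (n * sum_xy - sum_x * sum_y) / (n * sum_xx - sum_x ** 2)
    a = (sum_y - b * sum_x) / n
    return a, b


# Data of the worked example above.
a, b = fit_line([1, 2, 3, 4, 5], [14, 27, 40, 55, 68])
print(f"y = {a:.3f} + {b:.3f} x")
```

The same function applies to any data set with at least two distinct x values; if all x values are equal the denominator is zero and no unique line exists.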
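Example: Find the best values of a and b so that y = a + bx fits the data given in the table
x  0  1  2  3  4 
y  1.0  2.9  4.8  6.7  8.6 
Solution.
y = a + bx …….. (1)
x  y  xy  $x^2$ 
0  1.0  0  0 
1  2.9  2.9  1 
2  4.8  9.6  4 
3  6.7  20.1  9 
4  8.6  34.4  16 
$\sum x = 10$  $\sum y = 24.0$  $\sum xy = 67.0$  $\sum x^2 = 30$ 
The normal equations are

$$\sum y = na + b\sum x \qquad (2)$$

$$\sum xy = a\sum x + b\sum x^2 \qquad (3)$$

On putting in the values of $\sum x$, $\sum y$, $\sum xy$ and $\sum x^2$, we get

$$24 = 5a + 10b \qquad (4)$$

$$67 = 10a + 30b \qquad (5)$$

On solving (4) and (5) we get $a = 1$ and $b = 1.9$. On substituting the values of a and b in (1) we get

$$y = 1 + 1.9x.$$

To fit the parabola $y = a + bx + cx^2$
The normal equations are

$$\sum y = na + b\sum x + c\sum x^2$$

$$\sum xy = a\sum x + b\sum x^2 + c\sum x^3$$

$$\sum x^2 y = a\sum x^2 + b\sum x^3 + c\sum x^4$$

On solving these three normal equations we get the values of a, b and c.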

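Note: Change of scale
When the values of x are large and given at equal intervals, we change the origin and the scale to simplify the arithmetic, for example by putting

$$u = \frac{x - x_0}{h},$$

where $x_0$ is a value near the middle of the data and $h$ is the interval.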
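Example: Fit a second-degree parabola to the following data by the least squares method.
x  1929  1930  1931  1932  1933  1934  1935  1936  1937 
y  352  356  357  358  360  361  361  360  359 
Solution: Taking $u = x - 1933$ and $v = y - 357$, the equation $y = a + bx + cx^2$ is transformed to $v = A + Bu + Cu^2$.
x  u  y  v  uv  $u^2$  $u^2 v$  $u^3$  $u^4$ 
1929  -4  352  -5  20  16  -80  -64  256 
1930  -3  356  -1  3  9  -9  -27  81 
1931  -2  357  0  0  4  0  -8  16 
1932  -1  358  1  -1  1  1  -1  1 
1933  0  360  3  0  0  0  0  0 
1934  1  361  4  4  1  4  1  1 
1935  2  361  4  8  4  16  8  16 
1936  3  360  3  9  9  27  27  81 
1937  4  359  2  8  16  32  64  256 
Total  $\sum u = 0$  $\sum v = 11$  $\sum uv = 51$  $\sum u^2 = 60$  $\sum u^2 v = -9$  $\sum u^3 = 0$  $\sum u^4 = 708$ 

Normal equations are

$$\sum v = 9A + B\sum u + C\sum u^2 \;\Rightarrow\; 11 = 9A + 60C$$

$$\sum uv = A\sum u + B\sum u^2 + C\sum u^3 \;\Rightarrow\; 51 = 60B$$

$$\sum u^2 v = A\sum u^2 + B\sum u^3 + C\sum u^4 \;\Rightarrow\; -9 = 60A + 708C$$

On solving these equations, we get $A \approx 3.00$, $B = 0.85$ and $C \approx -0.267$, so that $v = 3.00 + 0.85u - 0.267u^2$, that is,

$$y = 360 + 0.85(x - 1933) - 0.267(x - 1933)^2.$$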
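Example: Find the least squares approximation of second degree for the discrete data
x  -2  -1  0  1  2 
y  15  1  1  3  19 
Solution. Let the equation of the second-degree polynomial be

$$y = a + bx + cx^2 \qquad (1)$$

x  y  xy  $x^2$  $x^2 y$  $x^3$  $x^4$ 
-2  15  -30  4  60  -8  16 
-1  1  -1  1  1  -1  1 
0  1  0  0  0  0  0 
1  3  3  1  3  1  1 
2  19  38  4  76  8  16 
$\sum x = 0$  $\sum y = 39$  $\sum xy = 10$  $\sum x^2 = 10$  $\sum x^2 y = 140$  $\sum x^3 = 0$  $\sum x^4 = 34$ 
Normal equations are

$$\sum y = na + b\sum x + c\sum x^2 \qquad (2)$$

$$\sum xy = a\sum x + b\sum x^2 + c\sum x^3 \qquad (3)$$

$$\sum x^2 y = a\sum x^2 + b\sum x^3 + c\sum x^4 \qquad (4)$$

On putting in the values of $\sum x$, $\sum y$, $\sum xy$, $\sum x^2$, $\sum x^2 y$, $\sum x^3$ and $\sum x^4$, we have

$$39 = 5a + 10c \qquad (5)$$

$$10 = 10b \qquad (6)$$

$$140 = 10a + 34c \qquad (7)$$

On solving (5), (6), (7), we get $a = -\dfrac{37}{35}$, $b = 1$, $c = \dfrac{31}{7}$. The required polynomial of second degree is

$$y = -\frac{37}{35} + x + \frac{31}{7}x^2.$$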
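The three normal equations of a parabola fit form a 3×3 linear system, which can be solved mechanically. The sketch below is an illustration rather than the method of the notes: it builds the sums of powers, then applies Gauss-Jordan elimination with exact fractions. The name `fit_parabola` is hypothetical, and the routine assumes the data contain at least three distinct x values so that the system has a unique solution.

```python
from fractions import Fraction


def fit_parabola(xs, ys):
    """Return (a, b, c) for the least-squares parabola y = a + b*x + c*x^2."""
    xs = [Fraction(x) for x in xs]
    ys = [Fraction(y) for y in ys]
    # s[k] = sum of x^k (k = 0..4), t[k] = sum of x^k * y (k = 0..2)
    s = [sum(x ** k for x in xs) for k in range(5)]
    t = [sum(y * x ** k for x, y in zip(xs, ys)) for k in range(3)]
    # Augmented matrix of the three normal equations.
    m = [
        [s[0], s[1], s[2], t[0]],
        [s[1], s[2], s[3], t[1]],
        [s[2], s[3], s[4], t[2]],
    ]
    # Gauss-Jordan elimination with exact rational arithmetic.
    for i in range(3):
        pivot_row = next(r for r in range(i, 3) if m[r][i] != 0)
        m[i], m[pivot_row] = m[pivot_row], m[i]
        m[i] = [v / m[i][i] for v in m[i]]
        for r in range(3):
            if r != i:
                factor = m[r][i]
                m[r] = [vr - factor * vi for vr, vi in zip(m[r], m[i])]
    return m[0][3], m[1][3], m[2][3]


# Data of the example above.
a, b, c = fit_parabola([-2, -1, 0, 1, 2], [15, 1, 1, 3, 19])
print(a, b, c)
```

Exact fractions are used here only because the example data are integers and the answer is a ratio of small integers; for measured decimal data, floating-point arithmetic would be the more usual choice.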
Example: Fit a second-degree parabola to the following data.
x = 1.0  1.5  2.0  2.5  3.0  3.5  4.0 
y = 1.1  1.3  1.6  2.0  2.7  3.4  4.1 
Solution
We shift the origin to (2.5, 0) and take 0.5 as the new unit. This amounts to changing the variable x to X by the relation X = 2x - 5.
Let the parabola of fit be $y = a + bX + cX^2$. The values of $\sum X$ etc. are calculated as below:
x  X  y  Xy  $X^2$  $X^2 y$  $X^3$  $X^4$ 
1.0  -3  1.1  -3.3  9  9.9  -27  81 
1.5  -2  1.3  -2.6  4  5.2  -8  16 
2.0  -1  1.6  -1.6  1  1.6  -1  1 
2.5  0  2.0  0.0  0  0.0  0  0 
3.0  1  2.7  2.7  1  2.7  1  1 
3.5  2  3.4  6.8  4  13.6  8  16 
4.0  3  4.1  12.3  9  36.9  27  81 
Total  0  16.2  14.3  28  69.9  0  196 
The normal equations are
7a + 28c = 16.2;  28b = 14.3;  28a + 196c = 69.9
Solving these as simultaneous equations we get $a = 2.07$, $b = 0.511$, $c = 0.061$, so that

$$y = 2.07 + 0.511X + 0.061X^2.$$

Replacing X by 2x - 5 in the above equation we get

$$y = 2.07 + 0.511(2x - 5) + 0.061(2x - 5)^2,$$

which simplifies to

$$y = 1.04 - 0.198x + 0.244x^2.$$

This is the required parabola of best fit.
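Example: Fit the curve $y = ae^{bx}$ to the following data by the method of least squares.
x  1  2  3  4  5  6 
y  7.209  5.265  3.846  2.809  2.052  1.499 
Sol.
Here

$$y = ae^{bx} \qquad (1)$$

Taking natural logarithms, $\ln y = \ln a + bx$. Now put $Y = \ln y$ and $c = \ln a$. Then we get $Y = c + bx$.
x  y  $Y = \ln y$  xY  $x^2$ 
1  7.209  1.97533  1.97533  1 
2  5.265  1.66108  3.32216  4 
3  3.846  1.34703  4.04109  9 
4  2.809  1.03283  4.13132  16 
5  2.052  0.71881  3.59405  25 
6  1.499  0.40480  2.42880  36 
$\sum x = 21$  $\sum Y = 7.13988$  $\sum xY = 19.49275$  $\sum x^2 = 91$ 
Normal equations are

$$\sum Y = nc + b\sum x, \qquad \sum xY = c\sum x + b\sum x^2.$$

Putting in the values from the table, we get
7.13988 = 6c + 21b
19.49275 = 21c + 91b
On solving, we get b = -0.3141 and c = 2.28933, so that $a = e^{c} = e^{2.28933} \approx 9.868$. Putting these values in equation (1), we get

$$y = 9.868\,e^{-0.3141x}.$$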
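The log transformation used in this example turns the exponential fit into an ordinary straight-line fit, which is easy to reproduce numerically. The following sketch (the name `fit_exponential` is illustrative) fits a line to the points (x, ln y) and converts the intercept back with the exponential function. Note that this minimises the squared error in ln y rather than in y itself, which is the usual textbook simplification.

```python
import math


def fit_exponential(xs, ys):
    """Fit y = a*exp(b*x) by a least-squares line through (x, ln y)."""
    n = len(xs)
    logs = [math.log(y) for y in ys]          # Y = ln y
    sum_x = sum(xs)
    sum_Y = sum(logs)
    sum_xY = sum(x * Y for x, Y in zip(xs, logs))
    sum_xx = sum(x * x for x in xs)
    # Line Y = c + b*x from the two normal equations.
    b = (n * sum_xY - sum_x * sum_Y) / (n * sum_xx - sum_x ** 2)
    c = (sum_Y - b * sum_x) / n
    return math.exp(c), b                     # a = e^c


a, b = fit_exponential(
    [1, 2, 3, 4, 5, 6],
    [7.209, 5.265, 3.846, 2.809, 2.052, 1.499],
)
print(f"y = {a:.3f} * exp({b:.4f} x)")
```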
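Example: Estimate the chlorine residual in a swimming pool 5 hours after it has been treated with chemicals by fitting an exponential curve of the form $Y = ab^{X}$ to the data given below.
Hours (X)  2  4  6  8  10  12 
Chlorine residuals (Y)  1.8  1.5  1.4  1.1  1.1  0.9 
Sol.
Taking logarithms (base 10) of the curve $Y = ab^{X}$, which is non-linear, we get

$$\log Y = \log a + X \log b.$$

Put $y = \log Y$, $A = \log a$ and $B = \log b$. Then $y = A + BX$, which is a linear equation in X.
Its normal equations are

$$\sum y = NA + B\sum X, \qquad \sum Xy = A\sum X + B\sum X^2.$$

Here N = 6, $\sum X = 42$, $\sum X^2 = 364$, $\sum y = 0.6145$ and $\sum Xy = 2.2876$.
Thus the normal equations are

$$6A + 42B = 0.6145, \qquad 42A + 364B = 2.2876.$$

On solving, we get $A = 0.3038$ and $B = -0.0288$,
or a = antilog A = 2.013 and b = antilog B = 0.936. Hence the required least squares exponential curve is

$$Y = 2.013\,(0.936)^{X}.$$

Prediction: the chlorine content after 5 hours is $Y = 2.013\,(0.936)^{5} \approx 1.446$.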
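As a numerical check of this example, the sketch below repeats the calculation with base-10 logarithms. It is only an illustration under the assumption that the curve has the form Y = a·b^X, as the stated values a = 2.013 and b = 0.936 suggest; the function name `fit_ab_curve` is hypothetical.

```python
import math


def fit_ab_curve(xs, ys):
    """Fit Y = a * b**X via a least-squares line through (X, log10 Y)."""
    n = len(xs)
    logs = [math.log10(y) for y in ys]
    sum_x = sum(xs)
    sum_l = sum(logs)
    sum_xl = sum(x * v for x, v in zip(xs, logs))
    sum_xx = sum(x * x for x in xs)
    # Line log10(Y) = A + B*X from the two normal equations.
    B = (n * sum_xl - sum_x * sum_l) / (n * sum_xx - sum_x ** 2)
    A = (sum_l - B * sum_x) / n
    return 10 ** A, 10 ** B                   # a = antilog A, b = antilog B


hours = [2, 4, 6, 8, 10, 12]
chlorine = [1.8, 1.5, 1.4, 1.1, 1.1, 0.9]
a, b = fit_ab_curve(hours, chlorine)
estimate_at_5 = a * b ** 5
print(f"a = {a:.3f}, b = {b:.3f}, residual after 5 h = {estimate_at_5:.3f}")
```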
Correlation
When two variables are related in such a way that a change in the value of one variable affects the value of the other, the two variables are said to be correlated, and there is correlation between them.
Example: the heights and weights of the persons in a group.
The correlation is said to be perfect if the two variables vary in such a way that their ratio is always constant.
Scatter diagram
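Karl Pearson’s coefficient of correlation

$$r = \frac{\sum (x - \bar x)(y - \bar y)}{\sqrt{\sum (x - \bar x)^2 \,\sum (y - \bar y)^2}} = \frac{\sum (x - \bar x)(y - \bar y)}{n\,\sigma_x \sigma_y}$$

Here $\bar x$ and $\bar y$ are the means of the x and y values, and $\sigma_x$ and $\sigma_y$ are their standard deviations.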
Note
1. The correlation coefficient always lies between -1 and +1.
2. Correlation coefficient is independent of change of origin and scale.
3. If the two variables are independent then correlation coefficient between them is zero.
Perfect Correlation: If two variables vary in such a way that their ratio is always constant, then the correlation is said to be perfect.
Correlation coefficient  Type of correlation 
+1  Perfect positive correlation 
-1  Perfect negative correlation 
+0.25  Weak positive correlation 
+0.75  Strong positive correlation 
-0.25  Weak negative correlation 
-0.75  Strong negative correlation 
0  No correlation 
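Example: Find the correlation coefficient between age and weight for the following data
Age  30  44  45  43  34  44 
Weight  56  55  60  64  62  63 
Sol.
Here $\bar x = 240/6 = 40$ and $\bar y = 360/6 = 60$.
x  y  $x - \bar x$  $(x - \bar x)^2$  $y - \bar y$  $(y - \bar y)^2$  $(x - \bar x)(y - \bar y)$ 
30  56  -10  100  -4  16  40 
44  55  4  16  -5  25  -20 
45  60  5  25  0  0  0 
43  64  3  9  4  16  12 
34  62  -6  36  2  4  -12 
44  63  4  16  3  9  12 
Sum = 240  360  0  202  0  70  32 
Karl Pearson’s coefficient of correlation

$$r = \frac{\sum (x - \bar x)(y - \bar y)}{\sqrt{\sum (x - \bar x)^2 \,\sum (y - \bar y)^2}} = \frac{32}{\sqrt{202 \times 70}} = \frac{32}{118.9} \approx 0.27$$

Here the correlation coefficient is 0.27, a weak positive correlation. This indicates that as age increases, weight also tends to increase.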
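The deviation table of this example translates directly into code. The sketch below is illustrative (the name `pearson_r` is an assumption, not something from the notes): it computes the deviations from the means and applies Karl Pearson's formula to the age and weight data.

```python
import math


def pearson_r(xs, ys):
    """Karl Pearson's coefficient of correlation from deviations about the means."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    dx = [x - mean_x for x in xs]
    dy = [y - mean_y for y in ys]
    sum_dxdy = sum(u * v for u, v in zip(dx, dy))   # sum of (x - mean_x)(y - mean_y)
    sum_dx2 = sum(u * u for u in dx)
    sum_dy2 = sum(v * v for v in dy)
    return sum_dxdy / math.sqrt(sum_dx2 * sum_dy2)


age = [30, 44, 45, 43, 34, 44]
weight = [56, 55, 60, 64, 62, 63]
print(round(pearson_r(age, weight), 2))
```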
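Example:
Ten students got the following percentages of marks in Economics and Statistics.
Calculate the coefficient of correlation.
Roll No.  
Marks in Economics  
Marks in Statistics  
Solution:
Let the marks in the two subjects be denoted by x and y respectively.
Let $\bar x$ be the mean of the x marks and $\bar y$ the mean of the y marks.
If $X = x - \bar x$ and $Y = y - \bar y$ are the deviations of the x’s and y’s from their respective means, then the data may be arranged in a table of x, y, X, Y, $X^2$, $Y^2$ and XY, from which $r = \dfrac{\sum XY}{\sqrt{\sum X^2 \sum Y^2}}$.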
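Shortcut method to calculate correlation coefficient

$$r = \frac{n\sum d_x d_y - \sum d_x \sum d_y}{\sqrt{\left[n\sum d_x^2 - \left(\sum d_x\right)^2\right]\left[n\sum d_y^2 - \left(\sum d_y\right)^2\right]}}$$

Here, $d_x = x - A$ and $d_y = y - B$, where A and B are assumed means of x and y.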
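Example: Find the correlation coefficient between the values X and Y of the dataset given below by using the shortcut method
X  10  20  30  40  50 
Y  90  85  80  60  45 
Sol.
Take the assumed means A = 30 and B = 70, so that $d_x = X - 30$ and $d_y = Y - 70$.
X  Y  $d_x$  $d_x^2$  $d_y$  $d_y^2$  $d_x d_y$ 
10  90  -20  400  20  400  -400 
20  85  -10  100  15  225  -150 
30  80  0  0  10  100  0 
40  60  10  100  -10  100  -100 
50  45  20  400  -25  625  -500 
Sum = 150  360  0  1000  10  1450  -1150 
By the shortcut method,

$$r = \frac{n\sum d_x d_y - \sum d_x \sum d_y}{\sqrt{\left[n\sum d_x^2 - \left(\sum d_x\right)^2\right]\left[n\sum d_y^2 - \left(\sum d_y\right)^2\right]}} = \frac{5(-1150) - (0)(10)}{\sqrt{\left[5(1000) - 0\right]\left[5(1450) - 100\right]}} = \frac{-5750}{\sqrt{5000 \times 7150}} \approx -0.96$$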
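The shortcut method relies on the fact that the correlation coefficient does not depend on the choice of origin. The following sketch illustrates this; the assumed means A = 30 and B = 70 are read off the deviation columns of the table above, and the function name `pearson_r_shortcut` is hypothetical. Any other assumed means should give the same value of r.

```python
import math


def pearson_r_shortcut(xs, ys, assumed_x, assumed_y):
    """Correlation coefficient from deviations about assumed means A and B."""
    n = len(xs)
    dx = [x - assumed_x for x in xs]
    dy = [y - assumed_y for y in ys]
    numerator = n * sum(u * v for u, v in zip(dx, dy)) - sum(dx) * sum(dy)
    denominator = math.sqrt(
        (n * sum(u * u for u in dx) - sum(dx) ** 2)
        * (n * sum(v * v for v in dy) - sum(dy) ** 2)
    )
    return numerator / denominator


X = [10, 20, 30, 40, 50]
Y = [90, 85, 80, 60, 45]
r_table = pearson_r_shortcut(X, Y, 30, 70)   # assumed means used in the table
r_other = pearson_r_shortcut(X, Y, 0, 0)     # a different origin gives the same r
print(round(r_table, 4), round(r_other, 4))
```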
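Spearman’s rank correlation
Let $x_i, y_i$ $(i = 1, 2, \dots, n)$ be the ranks of n individuals corresponding to two characteristics. Assuming that no two individuals are equal in either classification, each variable takes the values $1, 2, 3, \dots, n$, and hence their arithmetic means are each

$$\bar x = \bar y = \frac{1 + 2 + \dots + n}{n} = \frac{n + 1}{2}.$$

Let $x_1, x_2, \dots, x_n$ be the values of the variable x and $y_1, y_2, \dots, y_n$ those of y. Then

$$d = x - y = (x - \bar x) - (y - \bar y) = X - Y,$$

where X and Y are deviations from the mean. Clearly,

$$\sum X^2 = \sum Y^2 = \frac{n(n+1)(2n+1)}{6} - \frac{n(n+1)^2}{4} = \frac{n(n^2 - 1)}{12}$$

and

$$\sum d^2 = \sum X^2 + \sum Y^2 - 2\sum XY, \quad\text{so that}\quad \sum XY = \frac{n(n^2 - 1)}{12} - \frac{1}{2}\sum d^2.$$

Hence

$$r = \frac{\sum XY}{\sqrt{\sum X^2 \sum Y^2}} = 1 - \frac{6\sum d^2}{n(n^2 - 1)}.$$

SPEARMAN’S RANK CORRELATION COEFFICIENT:

$$\rho = 1 - \frac{6\sum d^2}{n(n^2 - 1)}$$

where $\rho$ denotes the rank coefficient of correlation and d refers to the difference of ranks between paired items in the two series.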
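Example: Compute Spearman’s rank correlation coefficient for the dataset given below
Person  A  B  C  D  E  F  G  H  I  J 
Rank in test1  9  10  6  5  7  2  4  8  1  3 
Rank in test2  1  2  3  4  5  6  7  8  9  10 
Sol.
Person  Rank in test1  Rank in test2  d = (rank in test1) - (rank in test2)  $d^2$ 
A  9  1  8  64 
B  10  2  8  64 
C  6  3  3  9 
D  5  4  1  1 
E  7  5  2  4 
F  2  6  -4  16 
G  4  7  -3  9 
H  8  8  0  0 
I  1  9  -8  64 
J  3  10  -7  49 
Sum        280 
Here n = 10 and $\sum d^2 = 280$, so

$$\rho = 1 - \frac{6\sum d^2}{n(n^2 - 1)} = 1 - \frac{6 \times 280}{10 \times 99} = 1 - 1.697 = -0.697.$$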
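The rank-difference calculation in this example is short enough to verify directly. The sketch below (function name `spearman_rho` chosen for illustration) assumes, as the formula does, that there are no tied ranks.

```python
def spearman_rho(ranks_1, ranks_2):
    """Spearman's rank correlation, assuming no tied ranks."""
    n = len(ranks_1)
    sum_d2 = sum((p - q) ** 2 for p, q in zip(ranks_1, ranks_2))
    return 1 - 6 * sum_d2 / (n * (n * n - 1))


test1 = [9, 10, 6, 5, 7, 2, 4, 8, 1, 3]
test2 = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
print(round(spearman_rho(test1, test2), 3))
```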
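Example: If X and Y are uncorrelated random variables, find the coefficient of correlation between X + Y and X - Y.
Solution.
Let $U = X + Y$ and $V = X - Y$. Then $\bar U = \bar X + \bar Y$ and $\bar V = \bar X - \bar Y$.
Now $U - \bar U = (X - \bar X) + (Y - \bar Y)$. Similarly $V - \bar V = (X - \bar X) - (Y - \bar Y)$.
Now

$$\operatorname{cov}(U, V) = E\left[(U - \bar U)(V - \bar V)\right] = E\left[(X - \bar X)^2\right] - E\left[(Y - \bar Y)^2\right] = \sigma_X^2 - \sigma_Y^2.$$

Also

$$\sigma_U^2 = E\left[(U - \bar U)^2\right] = \sigma_X^2 + \sigma_Y^2 + 2\operatorname{cov}(X, Y) = \sigma_X^2 + \sigma_Y^2$$

(as X and Y are not correlated, we have $\operatorname{cov}(X, Y) = 0$). Similarly $\sigma_V^2 = \sigma_X^2 + \sigma_Y^2$. Hence

$$r(U, V) = \frac{\operatorname{cov}(U, V)}{\sigma_U \sigma_V} = \frac{\sigma_X^2 - \sigma_Y^2}{\sigma_X^2 + \sigma_Y^2}.$$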
Regression
If the scatter diagram indicates some relationship between two variables x and y, then the dots of the scatter diagram will be concentrated round a curve. This curve is called the curve of regression. Regression analysis is the method used for estimating the unknown values of one variable corresponding to the known values of another variable.
In other words, regression is the measure of the average relationship between the independent and dependent variables.
Regression can be used for two or more than two variables.
There are two types of variables in regression analysis.
1. Independent variable
2. Dependent variable
The variable which is used for prediction is called independent variable.
It is known as predictor or regressor.
The variable whose value is predicted from the independent variable is called the dependent variable, also known as the regressed or explained variable.
If the scatter diagram shows a relationship between the independent and dependent variables, its points will be more or less concentrated round a curve, which is called the curve of regression.
When this curve is a straight line, it is known as the line of regression and the regression is called linear regression.
Note: the regression line is the best-fit line, which expresses the average relation between the variables.
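LINE OF REGRESSION
When the curve is a straight line, it is called a line of regression. A line of regression is the straight line which gives the best fit in the least squares sense to the given frequency.
Equation of the line of regression
Let

$$y = a + bx \qquad (1)$$

be the equation of the line of regression of y on x. Let $y_e = a + bx$ be the estimated value of y for the given value of x.
According to the principle of least squares, we have to determine a and b so that the sum of squares of the deviations of the observed values of y from the expected values of y,

$$U = \sum (y - y_e)^2,$$

that is,

$$U = \sum (y - a - bx)^2 \qquad (2)$$

is a minimum. From the concept of maxima and minima, we partially differentiate U with respect to a and b and equate to zero, which means

$$\frac{\partial U}{\partial a} = -2\sum (y - a - bx) = 0 \;\Rightarrow\; \sum y = na + b\sum x \qquad (3)$$

and

$$\frac{\partial U}{\partial b} = -2\sum x(y - a - bx) = 0 \;\Rightarrow\; \sum xy = a\sum x + b\sum x^2 \qquad (4)$$

Equations (3) and (4) are known as the normal equations for a straight line. Dividing equation (3) by n, we get

$$\bar y = a + b\bar x \qquad (5)$$

This indicates that the regression line of y on x passes through the point $(\bar x, \bar y)$. We know that

$$\operatorname{cov}(x, y) = \frac{1}{n}\sum xy - \bar x \bar y \;\Rightarrow\; \frac{1}{n}\sum xy = \operatorname{cov}(x, y) + \bar x \bar y \qquad (6)$$

The variance of the variable x can be expressed as

$$\sigma_x^2 = \frac{1}{n}\sum x^2 - \bar x^2 \;\Rightarrow\; \frac{1}{n}\sum x^2 = \sigma_x^2 + \bar x^2 \qquad (7)$$

Dividing equation (4) by n, we get

$$\frac{1}{n}\sum xy = a\bar x + b\,\frac{1}{n}\sum x^2 \qquad (8)$$

From equations (6), (7) and (8),

$$\operatorname{cov}(x, y) + \bar x \bar y = a\bar x + b\left(\sigma_x^2 + \bar x^2\right) \qquad (9)$$

Multiplying (5) by $\bar x$, we get

$$\bar x \bar y = a\bar x + b\bar x^2 \qquad (10)$$

Subtracting equation (10) from equation (9), we get

$$\operatorname{cov}(x, y) = b\,\sigma_x^2 \;\Rightarrow\; b = \frac{\operatorname{cov}(x, y)}{\sigma_x^2} = r\,\frac{\sigma_y}{\sigma_x}.$$

Since b is the slope of the line of regression of y on x and the line passes through the point $(\bar x, \bar y)$, the equation of the line of regression of y on x is

$$y - \bar y = r\,\frac{\sigma_y}{\sigma_x}\,(x - \bar x).$$

This is known as the regression line of y on x. Similarly, the regression line of x on y is $x - \bar x = r\,\dfrac{\sigma_x}{\sigma_y}\,(y - \bar y)$.
Note
1. $b_{yx} = r\,\dfrac{\sigma_y}{\sigma_x}$ and $b_{xy} = r\,\dfrac{\sigma_x}{\sigma_y}$ are the coefficients of regression.
2. $r^2 = b_{yx}\, b_{xy}$.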
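Example: Two variables x and y are given in the dataset below. Find the two lines of regression.
x  65  66  67  67  68  69  70  71 
y  66  68  65  69  74  73  72  70 
Sol.
The two lines of regression can be expressed as

$$y - \bar y = r\,\frac{\sigma_y}{\sigma_x}\,(x - \bar x)$$

and

$$x - \bar x = r\,\frac{\sigma_x}{\sigma_y}\,(y - \bar y).$$

x  y  $x^2$  $y^2$  xy 
65  66  4225  4356  4290 
66  68  4356  4624  4488 
67  65  4489  4225  4355 
67  69  4489  4761  4623 
68  74  4624  5476  5032 
69  73  4761  5329  5037 
70  72  4900  5184  5040 
71  70  5041  4900  4970 
Sum = 543  557  36885  38855  37835 
Now

$$\bar x = \frac{543}{8} = 67.875 \quad\text{and}\quad \bar y = \frac{557}{8} = 69.625.$$

Standard deviation of x:

$$\sigma_x = \sqrt{\frac{\sum x^2}{n} - \bar x^2} = \sqrt{\frac{36885}{8} - (67.875)^2} = \sqrt{3.6094} \approx 1.90$$

Similarly

$$\sigma_y = \sqrt{\frac{38855}{8} - (69.625)^2} = \sqrt{9.2344} \approx 3.04$$

Correlation coefficient

$$r = \frac{\dfrac{\sum xy}{n} - \bar x \bar y}{\sigma_x \sigma_y} = \frac{\dfrac{37835}{8} - (67.875)(69.625)}{1.90 \times 3.04} = \frac{3.5781}{5.773} \approx 0.62$$

Putting these values in the regression line equations, we get
Regression line of y on x: $y - 69.625 = 0.9913\,(x - 67.875)$, or $y = 0.9913x + 2.34$
Regression line of x on y: $x - 67.875 = 0.3875\,(y - 69.625)$, or $x = 0.3875y + 40.90$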
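The two regression coefficients can be computed directly from the sums in the table. The sketch below is a hedged illustration (the name `regression_lines` is hypothetical): it uses the identities b_yx = cov(x, y)/σx² and b_xy = cov(x, y)/σy², which are equivalent to r·σy/σx and r·σx/σy.

```python
def regression_lines(xs, ys):
    """Return (mean_x, mean_y, b_yx, b_xy) for the two lines of regression."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    var_x = sum(x * x for x in xs) / n - mean_x ** 2
    var_y = sum(y * y for y in ys) / n - mean_y ** 2
    cov_xy = sum(x * y for x, y in zip(xs, ys)) / n - mean_x * mean_y
    b_yx = cov_xy / var_x      # slope of the line of y on x
    b_xy = cov_xy / var_y      # slope of the line of x on y
    return mean_x, mean_y, b_yx, b_xy


x = [65, 66, 67, 67, 68, 69, 70, 71]
y = [66, 68, 65, 69, 74, 73, 72, 70]
mean_x, mean_y, b_yx, b_xy = regression_lines(x, y)
print(f"y - {mean_y} = {b_yx:.4f} (x - {mean_x})")
print(f"x - {mean_x} = {b_xy:.4f} (y - {mean_y})")
print(f"r^2 = b_yx * b_xy = {b_yx * b_xy:.4f}")
```

The product of the two slopes gives r², which offers a quick consistency check against a correlation coefficient computed separately.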
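The regression line can also be found by the following method.
Example: Find the regression line of y on x for the given dataset.
x  4.3  4.5  5.9  5.6  6.1  5.2  3.8  2.1 
y  12.6  12.1  11.6  11.8  11.4  11.8  13.2  14.1 
Sol.
Let y = a + bx be the line of regression of y on x, where a and b are given by the normal equations

$$\sum y = na + b\sum x, \qquad \sum xy = a\sum x + b\sum x^2.$$

From the data, n = 8, $\sum x = 37.5$, $\sum y = 98.6$, $\sum xy = 453.82$ and $\sum x^2 = 188.01$.
Using the above equations we get

$$98.6 = 8a + 37.5b, \qquad 453.82 = 37.5a + 188.01b.$$

On solving these two equations, we get a ≈ 15.53 and b ≈ -0.684, so that the regression line is y = 15.53 - 0.684x.
Note: the standard error of prediction is commonly computed as $S_{yx} = \sqrt{\dfrac{\sum (y - \hat y)^2}{n - 2}}$, where $\hat y$ is the value predicted by the regression line.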

Difference between regression and correlation
1. Correlation is the linear relationship between two variables, while regression is the average relationship between two or more variables.
2. Correlation has limited applications, as it only gives the strength of the linear relationship, while regression is used to predict the value of the dependent variable for given values of the independent variables.
3. Correlation does not distinguish between dependent and independent variables, while regression considers one dependent variable and one or more independent variables.
Key takeaways
1. Perfect correlation: if two variables vary in such a way that their ratio is always constant, then the correlation is said to be perfect.
2. Shortcut method to calculate the correlation coefficient: $r = \dfrac{n\sum d_x d_y - \sum d_x \sum d_y}{\sqrt{\left[n\sum d_x^2 - (\sum d_x)^2\right]\left[n\sum d_y^2 - (\sum d_y)^2\right]}}$
3. Spearman’s rank correlation: $\rho = 1 - \dfrac{6\sum d^2}{n(n^2 - 1)}$
4. The variable which is used for prediction is called the independent variable. It is also known as the predictor or regressor.
5. The regression line is the best-fit line, which expresses the average relation between the variables.
6. Regression line of y on x: $y - \bar y = r\,\dfrac{\sigma_y}{\sigma_x}\,(x - \bar x)$
Probability is the study of chances. Probability is the measurement of the degree of uncertainty and therefore, of certainty of the occurrence of events.
A probability space is a three-tuple (S, F, P) in which the three components are
 Sample space: A non-empty set S called the sample space, which represents all possible outcomes.
 Event space: A collection F of subsets of S, called the event space.
 Probability function: A function P : F → R that assigns probabilities to the events in F.
Basic definitions
1. Exhaustive events: The set of all possible outcomes of an experiment is called the set of exhaustive events, or the sample space.
Example
1. If we toss a coin, then the sample space is S = {H, T}, where H and T denote head and tail respectively, and n(S) = 2.
2. If a coin is tossed thrice, or three coins are tossed simultaneously, then the sample space is S = {HHH, HHT, HTH, THH, HTT, THT, TTH, TTT} and n(S) = 8.
3. If a coin is tossed 4 times, or four coins are tossed simultaneously, then the sample space is S = {HHHH, HHHT, HHTH, HTHH, THHH, HHTT, HTHT, HTTH, THHT, THTH, TTHH, HTTT, THTT, TTHT, TTTH, TTTT} and n(S) = 16.
Each outcome is called a sample point.
Example: If a die is thrown twice, then each of (1, 1), (1, 2), (1, 3), …, (6, 6) is a sample point.
2. Mutually exclusive events: When the occurrence of one event excludes the occurrence of the other, the two events are said to be mutually exclusive.
3. Equally likely events: Events are said to be equally likely if none of them can be expected to occur in preference to the others.
Random experiment
An experiment in which all the possible outcomes are known in advance, but we cannot predict which of them will occur when the experiment is performed.
Example: ‘Throwing a die’ and ‘drawing a card from a well-shuffled pack of 52 playing cards’ are examples of random experiments.
Event
A set of one or more possible outcomes of an experiment constitutes what is known as an event. Thus, an event can be defined as a subset of the sample space.
Favourable cases
The cases which favour the happening of an event are called favourable cases.
Example: For the event of getting an even number in throwing a die, the number of favourable cases is 3 and the event in this case is {2, 4, 6}.
Odds in favour of an event and odds against an event
If the number of favourable cases is m and the number of unfavourable cases is n,
then
1. Odds in favour of the event = m/n
2. Odds against the event = n/m
Classical definition of probability
Suppose there are n exhaustive cases in a random experiment, which are equally likely and mutually exclusive.
Let m of these cases be favourable to the happening of an event A. Then the probability of the happening of event A is defined as

$$P(A) = \frac{m}{n} = \frac{\text{number of favourable cases}}{\text{total number of exhaustive cases}}$$

The probability of the non-happening of the event A is defined as

$$P(\bar A) = \frac{n - m}{n} = 1 - P(A)$$

Note: Always remember that the probability of any event lies between 0 and 1.
Expected value
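Let $p_1, p_2, \dots, p_n$ be the probabilities of the events $x_1, x_2, \dots, x_n$ respectively. Then the expected value is defined as

$$E(x) = p_1 x_1 + p_2 x_2 + \dots + p_n x_n = \sum_{i=1}^{n} p_i x_i$$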

Example: A bag contains 7 red and 8 black balls. Find the probability of drawing a red ball.
Sol.
Here total cases = 7 + 8 = 15
According to the definition of probability,

$$P = \frac{\text{number of favourable cases}}{\text{total number of cases}}$$

Here the favourable cases (red balls) = 7
Then,

$$P(\text{red}) = \frac{7}{15}$$
NOTES:
• In general, an event has associated to it a probability, which is a real number between 0 and 1.
• Events which are unlikely have low (close to 0) probability, and events which are likely have high (close to 1) probability.
• The probability of an event which is certain to occur is 1; the probability of an impossible event is 0.
Addition law
If $p_1, p_2, \dots, p_n$ are the probabilities of n mutually exclusive events, then the probability P that any one of these events will happen is given by

$$P = p_1 + p_2 + \dots + p_n$$

Note
If two events A and B are not mutually exclusive, then the probability that either A or B or both will happen is given by

$$P(A \cup B) = P(A) + P(B) - P(A \cap B)$$
Example: A box contains 4 white and 2 black balls and a second box contains three balls of each colour. A box is selected at random and a ball is drawn randomly from the chosen box. What is the probability that the ball is white?
Sol.
Here we have two mutually exclusive cases
1. The first box is chosen
2. The second box is chosen
The chance of choosing the first box is 1/2, and if this box is chosen then the probability of drawing a white ball is 4/6.
So the probability of drawing a white ball from the first box is

$$\frac{1}{2} \times \frac{4}{6} = \frac{1}{3}$$

and the probability of drawing a white ball from the second box is

$$\frac{1}{2} \times \frac{3}{6} = \frac{1}{4}$$

Since the events are mutually exclusive, the required probability is

$$\frac{1}{3} + \frac{1}{4} = \frac{7}{12}$$
Example: 25 lottery tickets are marked with the first 25 numerals. A ticket is drawn at random.
Find the probability that its number is a multiple of 5 or 7.
Sol:
Let A be the event that the drawn ticket bears a number that is a multiple of 5 and B be the event that it bears a number that is a multiple of 7.
So that
A = {5, 10, 15, 20, 25}
B = {7, 14, 21}
Here, as $A \cap B = \varnothing$,
A and B are mutually exclusive.
Then,

$$P(A \cup B) = P(A) + P(B) = \frac{5}{25} + \frac{3}{25} = \frac{8}{25}$$

Conditional Probability
Suppose A and B are two events of a sample space S and P(B) is non-zero. The conditional probability of the event A given B
is denoted by P(A/B) and read as "probability of A given B".
It is defined by

$$P(A/B) = \frac{P(A \cap B)}{P(B)}$$

Note: If two events A and B are independent, then P(A/B) = P(A) and P(B/A) = P(B).

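Multiplication theorem for conditional probability

$$P(A \cap B) = P(A)\,P(B/A), \qquad P(A \cap B \cap C) = P(A)\,P(B/A)\,P(C/A \cap B)$$

Example: A bag contains 12 pens of which 4 are defective. Three pens are picked at random from the bag one after the other.
Find the probability that all three are non-defective.
Sol. Here the probability that the first pen is non-defective = 8/12. Given that, the probability that the second is non-defective = 7/11, and then that the third is non-defective = 6/10.
By the multiplication theorem of probability,
if we draw the pens one after the other, the required probability is

$$\frac{8}{12} \times \frac{7}{11} \times \frac{6}{10} = \frac{14}{55}$$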
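Example: The probability that A hits the target is 1/4 and the probability that B hits the target is 2/5. If both shoot at the target, find the probability that at least one of them hits it.
Sol.
Here it is given that P(A) = 1/4 and P(B) = 2/5. Now we have to find $P(A \cup B)$. The two events are independent, so that

$$P(A \cap B) = P(A)\,P(B) = \frac{1}{4} \times \frac{2}{5} = \frac{1}{10}$$

and

$$P(A \cup B) = P(A) + P(B) - P(A \cap B) = \frac{1}{4} + \frac{2}{5} - \frac{1}{10} = \frac{11}{20}$$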
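Example: A factory has two machines A and B making 60% and 40% respectively of the total production. Machine A produces 3% defective items, and B produces 5% defective items. Find the probability that a given defective part came from A.
SOLUTION
We consider the following events:
A: Selected item comes from A.
B: Selected item comes from B.
D: Selected item is defective.
We are looking for P(A/D). We know: P(A) = 0.6, P(B) = 0.4, P(D/A) = 0.03, P(D/B) = 0.05. Now,

$$P(A/D) = \frac{P(A \cap D)}{P(D)}$$

So we need P(D).
Since D is the union of the mutually exclusive events $A \cap D$ and $B \cap D$ (the entire sample space is the union of the mutually exclusive events A and B),

$$P(D) = P(A)\,P(D/A) + P(B)\,P(D/B) = (0.6)(0.03) + (0.4)(0.05) = 0.038$$

and therefore

$$P(A/D) = \frac{(0.6)(0.03)}{0.038} = \frac{0.018}{0.038} = \frac{9}{19} \approx 0.47$$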

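Example: Three urns contain 6 red, 4 black; 4 red, 6 black; 5 red, 5 black balls respectively. One of the urns is selected at random and a ball is drawn from it. If the ball drawn is red, find the probability that it is drawn from the first urn.
Solution:
$E_1$: The ball is drawn from urn I.
$E_2$: The ball is drawn from urn II.
$E_3$: The ball is drawn from urn III.
R: The ball is red.
We have to find

$$P(E_1/R) = \frac{P(E_1)\,P(R/E_1)}{P(E_1)\,P(R/E_1) + P(E_2)\,P(R/E_2) + P(E_3)\,P(R/E_3)} \qquad \text{(i)}$$

Since the three urns are equally likely to be selected, $P(E_1) = P(E_2) = P(E_3) = \dfrac{1}{3}$.
Also, $P(R/E_1) = \dfrac{6}{10}$, $P(R/E_2) = \dfrac{4}{10}$ and $P(R/E_3) = \dfrac{5}{10}$.
From (i), we have

$$P(E_1/R) = \frac{\frac{1}{3} \times \frac{6}{10}}{\frac{1}{3} \times \frac{6}{10} + \frac{1}{3} \times \frac{4}{10} + \frac{1}{3} \times \frac{5}{10}} = \frac{6}{15} = \frac{2}{5}$$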


Multiplication law
For two events A and B,

$$P(A \cap B) = P(A)\,P(B/A)$$

Here P(B/A) is called the conditional probability of B given that A has already happened.
If A and B are two independent events, then

$$P(A \cap B) = P(A)\,P(B)$$

because in the case of independent events P(B/A) = P(B).

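Example: A bag contains 9 balls, of which two are red, three blue and four black.
Three balls are drawn at random. What is the probability that
1. The three balls are of different colours
2. The three balls are of the same colour.
Sol.
Three balls can be drawn from nine in $\binom{9}{3} = 84$ ways.
1. The three balls will be of different colours if one red, one blue and one black ball are drawn. The probability is

$$\frac{\binom{2}{1}\binom{3}{1}\binom{4}{1}}{\binom{9}{3}} = \frac{24}{84} = \frac{2}{7}$$

2. The three balls will be of the same colour if all three are blue or all three are black (there are only two red balls). The probability is

$$\frac{\binom{3}{3} + \binom{4}{3}}{\binom{9}{3}} = \frac{1 + 4}{84} = \frac{5}{84}$$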
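Counting arguments like the one in this example are easy to check with binomial coefficients. The following sketch is illustrative only; it uses `math.comb`, which returns 0 when more items are requested than are available, so the impossible case of three red balls contributes nothing.

```python
from fractions import Fraction
from math import comb

# Bag contents from the example: 2 red, 3 blue, 4 black.
red, blue, black = 2, 3, 4
total_ways = comb(red + blue + black, 3)          # ways to draw 3 balls from 9

# One ball of each colour.
p_different = Fraction(comb(red, 1) * comb(blue, 1) * comb(black, 1), total_ways)

# All three of one colour; comb(2, 3) is 0 because only two red balls exist.
p_same = Fraction(comb(red, 3) + comb(blue, 3) + comb(black, 3), total_ways)

print(p_different, p_same)
```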

Example: A die is rolled. If the outcome is a number greater than three, what is the probability that it is a prime number?
Sol.
The sample space is S = {1, 2, 3, 4, 5, 6}. Let A be the event that the outcome is a number greater than three and B be the event that it is a prime. So that A = {4, 5, 6} and B = {2, 3, 5}, and hence $A \cap B = \{5\}$,
P(A) = 3/6, P(B) = 3/6 and $P(A \cap B) = 1/6$.
Now the required probability is

$$P(B/A) = \frac{P(A \cap B)}{P(A)} = \frac{1/6}{3/6} = \frac{1}{3}$$

Example: Two cards are drawn from a pack of playing cards in succession, with replacement of the first card. Find the probability that both are hearts.
Sol.
Let A be the event that the first card drawn is a heart and B be the event that the second card is a heart. As the cards are drawn with replacement, A and B are independent and the required probability is

$$P(A \cap B) = P(A)\,P(B) = \frac{13}{52} \times \frac{13}{52} = \frac{1}{16}$$

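Example: A male and a female candidate appear in an interview for two positions in the same post. The probability that the male candidate is selected is 1/7 and that the female candidate is selected is 1/5.
What is the probability that
1. Both of them will be selected
2. Only one of them will be selected
3. None of them will be selected.
Sol.
Here, P(male’s selection) = 1/7 and P(female’s selection) = 1/5, and the two selections are taken to be independent. Then
1. P(both selected) = $\dfrac{1}{7} \times \dfrac{1}{5} = \dfrac{1}{35}$
2. P(only one selected) = $\dfrac{1}{7} \times \dfrac{4}{5} + \dfrac{6}{7} \times \dfrac{1}{5} = \dfrac{10}{35} = \dfrac{2}{7}$
3. P(none selected) = $\dfrac{6}{7} \times \dfrac{4}{5} = \dfrac{24}{35}$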

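Example: A can hit a target 3 times in 5 shots, B 2 times in 5 shots and C 3 times in 4 shots. All of them fire one shot each simultaneously at the target.
What is the probability that
1. Two shots hit
2. At least two shots hit
Sol.
Here P(A) = 3/5, P(B) = 2/5 and P(C) = 3/4, and the three shots are independent.
1. The probability that exactly 2 shots hit the target is

$$P(A)P(B)P(\bar C) + P(A)P(\bar B)P(C) + P(\bar A)P(B)P(C) = \frac{3}{5}\cdot\frac{2}{5}\cdot\frac{1}{4} + \frac{3}{5}\cdot\frac{3}{5}\cdot\frac{3}{4} + \frac{2}{5}\cdot\frac{2}{5}\cdot\frac{3}{4} = \frac{6 + 27 + 12}{100} = \frac{9}{20}$$

2. The probability of at least two shots hitting the target is

$$\frac{9}{20} + P(A)P(B)P(C) = \frac{45}{100} + \frac{3}{5}\cdot\frac{2}{5}\cdot\frac{3}{4} = \frac{45}{100} + \frac{18}{100} = \frac{63}{100}$$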
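For several independent events, the probability of each possible number of successes can be found by enumerating every hit/miss combination. The sketch below does this for the three shooters of the example; it assumes, as the example does, that the shots are independent, and the function name `hit_distribution` is hypothetical.

```python
from fractions import Fraction
from itertools import product


def hit_distribution(hit_probs):
    """Map number of hits -> probability, for independent shooters."""
    dist = {}
    for outcome in product([True, False], repeat=len(hit_probs)):
        prob = Fraction(1)
        for hit, p in zip(outcome, hit_probs):
            prob *= p if hit else 1 - p      # multiply by P(hit) or P(miss)
        hits = sum(outcome)
        dist[hits] = dist.get(hits, Fraction(0)) + prob
    return dist


# A, B and C from the example.
dist = hit_distribution([Fraction(3, 5), Fraction(2, 5), Fraction(3, 4)])
p_exactly_two = dist[2]
p_at_least_two = dist[2] + dist[3]
print(p_exactly_two, p_at_least_two)
```

Enumeration grows as 2^n with the number of events, so it is only practical for a handful of shooters; it is used here because it mirrors the case-by-case reasoning of the worked solution.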

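Bayes’ theorem
If $E_1, E_2, \dots, E_n$ are mutually exclusive events with $P(E_i) \neq 0$ $(i = 1, 2, \dots, n)$ of a random experiment, then for any arbitrary event A of the sample space of the above experiment with $P(A) > 0$, we have (for $i = 1, 2, \dots, n$)

$$P(E_i/A) = \frac{P(E_i)\,P(A/E_i)}{\sum_{j=1}^{n} P(E_j)\,P(A/E_j)}$$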

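Example 1: Urn I contains 3 white and 4 red balls and urn II contains 5 white and 6 red balls. One ball is drawn at random from one of the urns and is found to be white. Find the probability that it was drawn from urn I.
Solution:
Let
$E_1$: the ball is drawn from urn I
$E_2$: the ball is drawn from urn II
W: the ball is white.
We have to find $P(E_1/W)$. By Bayes’ theorem,

$$P(E_1/W) = \frac{P(E_1)\,P(W/E_1)}{P(E_1)\,P(W/E_1) + P(E_2)\,P(W/E_2)} \qquad (1)$$

Since the two urns are equally likely to be selected, $P(E_1) = P(E_2) = \dfrac{1}{2}$.
$P(W/E_1)$ = P(a white ball is drawn from urn I) = $\dfrac{3}{7}$
$P(W/E_2)$ = P(a white ball is drawn from urn II) = $\dfrac{5}{11}$
From (1),

$$P(E_1/W) = \frac{\frac{1}{2} \times \frac{3}{7}}{\frac{1}{2} \times \frac{3}{7} + \frac{1}{2} \times \frac{5}{11}} = \frac{33}{68}$$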
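Example 2:
Three urns contain 6 red, 4 black; 4 red, 6 black; 5 red, 5 black balls respectively. One of the urns is selected at random and a ball is drawn from it. If the ball drawn is red, find the probability that it is drawn from the first urn.
Solution:
Let
$E_1$: the ball is drawn from urn I.
$E_2$: the ball is drawn from urn II.
$E_3$: the ball is drawn from urn III.
R: the ball is red.
We have to find $P(E_1/R)$. By Bayes’ theorem,

$$P(E_1/R) = \frac{P(E_1)\,P(R/E_1)}{P(E_1)\,P(R/E_1) + P(E_2)\,P(R/E_2) + P(E_3)\,P(R/E_3)} \qquad (1)$$

Since the three urns are equally likely to be selected, $P(E_1) = P(E_2) = P(E_3) = \dfrac{1}{3}$.
Also
$P(R/E_1)$ = P(a red ball is drawn from urn I) = $\dfrac{6}{10}$
$P(R/E_2)$ = P(a red ball is drawn from urn II) = $\dfrac{4}{10}$
$P(R/E_3)$ = P(a red ball is drawn from urn III) = $\dfrac{5}{10}$
From (1), we have

$$P(E_1/R) = \frac{\frac{1}{3} \times \frac{6}{10}}{\frac{1}{3} \times \frac{6}{10} + \frac{1}{3} \times \frac{4}{10} + \frac{1}{3} \times \frac{5}{10}} = \frac{2}{5}$$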
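Example 3: In a bolt factory, machines A, B and C manufacture respectively 25%, 35% and 40% of the total output. Of their output, 5, 4 and 2 per cent respectively are defective bolts. A bolt is drawn at random from the product and is found to be defective. What is the probability that it was manufactured by machine B?
Solution:
Let
A: the bolt is manufactured by machine A
B: the bolt is manufactured by machine B
C: the bolt is manufactured by machine C
D: the bolt is defective.
Then P(A) = 0.25, P(B) = 0.35 and P(C) = 0.40. The probability of drawing a defective bolt manufactured by machine A is P(D/A) = 0.05. Similarly, P(D/B) = 0.04 and P(D/C) = 0.02. By Bayes’ theorem,

$$P(B/D) = \frac{P(B)\,P(D/B)}{P(A)\,P(D/A) + P(B)\,P(D/B) + P(C)\,P(D/C)} = \frac{0.35 \times 0.04}{0.25 \times 0.05 + 0.35 \times 0.04 + 0.40 \times 0.02} = \frac{0.0140}{0.0345} = \frac{28}{69} \approx 0.41$$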
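The Bayes' theorem examples above all follow the same pattern: multiply each prior by its likelihood and divide by the total. A minimal sketch of that pattern is given below; the function name `bayes_posterior` is illustrative, and exact fractions are used so the results can be compared with the hand calculations.

```python
from fractions import Fraction


def bayes_posterior(priors, likelihoods):
    """Posterior P(E_i / A) for mutually exclusive, exhaustive events E_i."""
    joint = [p * l for p, l in zip(priors, likelihoods)]   # P(E_i) * P(A / E_i)
    total = sum(joint)                                     # P(A), by total probability
    return [j / total for j in joint]


# Bolt factory: machines A, B, C with their shares and defect rates.
shares = [Fraction(25, 100), Fraction(35, 100), Fraction(40, 100)]
defect_rates = [Fraction(5, 100), Fraction(4, 100), Fraction(2, 100)]
posterior = bayes_posterior(shares, defect_rates)
print("P(B / defective) =", posterior[1])

# Three urns, each chosen with probability 1/3, with 6, 4 and 5 red balls out of 10.
urn_posterior = bayes_posterior([Fraction(1, 3)] * 3,
                                [Fraction(6, 10), Fraction(4, 10), Fraction(5, 10)])
print("P(urn I / red) =", urn_posterior[0])
```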
Key takeaways
Bayes’ theorem: $P(E_i/A) = \dfrac{P(E_i)\,P(A/E_i)}{\sum_{j=1}^{n} P(E_j)\,P(A/E_j)}$