MCom I Semester Statistical Analysis Chi Square Test Coefficient Contingency Study Material Notes


Topics covered: Meaning of Chi-Square Test; Procedure of Chi-Square Test; Conditions for the Validity of Chi-Square Test; Area of Application of Chi-Square Test; Yates’ Correction; Calculation of χ² in a Contingency Table; Pearson’s Coefficient of Mean Square Contingency; Long Answer Questions; Short Answer Questions; Objective Questions.


Chi-square Test and Coefficient of Contingency

In the chapter on ‘Association of Attributes’, we studied that if two attributes A and B are independent, the actual and expected frequencies of (AB) will be equal. If, on the other hand, the actual frequencies are more or less than the expected frequencies, the attributes are assumed to be associated. In other words, association between two attributes is found only when the actual frequencies differ from the expected frequencies. But a difference of frequencies cannot be considered a perfect indicator of association. In some cases the difference between observed and expected frequencies may be due to fluctuations of sampling. If so, we disregard the difference and treat the attributes as independent. It therefore becomes necessary to form an idea of the extent to which the difference between the observed and expected frequencies can be due to chance fluctuations.

In fact, the Chi-square test is used to test whether the difference between the observed and expected frequencies can be attributed to chance or not. The χ² (Chi-square) test was first used by Karl Pearson in the year 1900.

Meaning of Chi-Square Test

The χ² test (pronounced as Chi-Square test) is one of the simplest and most widely used non-parametric tests in statistical work. It is a measure to study the divergence of actual and expected frequencies. If there is no difference between the actual and expected frequencies, the value of Chi-square will be zero. The value of χ² increases as the difference between actual and expected frequencies increases.

Important Characteristics of χ² Test:

(i) This test is based on frequencies and not on the parameters like mean and standard deviation.

(ii) This test is used for testing the hypothesis and is not useful for estimation.

(iii) This test possesses the additive property as has already been explained.

(iv) This test can also be applied to a complex contingency table with several classes and as such is a very useful test in research work.

(v) This test is an important non-parametric (or distribution-free) test, as no rigid assumptions are necessary regarding the type of population, the parameter values are not needed, and less mathematical detail is involved.


Procedure of Chi-Square Test

The following procedure is adopted with regard to the Chi-Square test:

(1) Assumption of Null Hypothesis

(2) Calculation of χ²

(3) Testing the Hypothesis

(1) Assumption of Null Hypothesis: The null hypothesis states that there is no difference between observed and expected frequencies. In other words, the χ² test is based on the assumption of the null hypothesis.

(2) Calculation of χ²: The following procedure is adopted for calculating the value of χ²:

(i) Find out the expected frequencies.

(ii) Compute the difference between observed and expected frequencies, i.e., (fo – fe) or (O – E).

(iii) Square the above calculated difference, i.e., (fo – fe)² or (O – E)².

(iv) Divide each (fo – fe)² by its expected frequency, i.e., (fo – fe)² / fe or (O – E)² / E.

(v) Add all the values obtained in step (iv), i.e., find Σ [(O – E)² / E]. This sum is the required χ² value:

χ² = Σ [(O – E)² / E]

where fo or O = Observed frequencies,

fe or E = Expected frequencies.

It should be noted that the value of χ² is never negative (it is zero only when the observed and expected frequencies agree exactly) and its upper limit is infinite. Also, since χ² is derived from observations, it is a statistic and not a parameter (there is no parameter corresponding to it). The χ² test is, therefore, termed non-parametric. It is one of the great advantages of this test that it involves no assumption about the form of the original distributions from which the observations come.
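The calculation steps above can be sketched in a few lines of Python. This is only an illustration: the function name `chi_square` and the three-class data are assumptions made for the example and do not come from the notes.

```python
def chi_square(observed, expected):
    """Return the chi-square statistic: the sum of (O - E)^2 / E over all classes."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same number of classes")
    total = 0.0
    for o, e in zip(observed, expected):
        difference = o - e           # step (ii): O - E
        squared = difference ** 2    # step (iii): (O - E)^2
        total += squared / e         # steps (iv) and (v): divide by E, then add up
    return total


# Hypothetical data: 60 observations in three classes, 20 expected in each.
observed = [18, 22, 20]
expected = [20, 20, 20]
statistic = chi_square(observed, expected)
print(round(statistic, 4))  # 4/20 + 4/20 + 0/20 = 0.4
```

Because every term is a square divided by a positive expected frequency, the result cannot be negative, and it is zero only when each observed frequency equals its expected frequency.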

(3) Testing the Hypothesis: After finding the value of χ², we test the hypothesis. The calculated value of χ² is compared with the relevant table value of χ² for the given degrees of freedom at a certain level of significance (generally the 5% level) to conclude whether the difference between observed and expected frequencies is due to sampling fluctuations, and as such insignificant, or due to some other reason, and as such significant. If the calculated value of χ² exceeds the table value, the difference between the observed and expected frequencies is taken as significant, and the null hypothesis is rejected. If, on the contrary, the calculated value of χ² is less than the table value, the difference is considered insignificant, i.e., it is considered to have arisen by chance and can be ignored. In such a case the null hypothesis is accepted. In short:

If calculated χ² > table value of χ², the difference is significant and the Null Hypothesis (H0) is rejected.

If calculated χ² ≤ table value of χ², the difference is not significant and the Null Hypothesis is accepted.

To read the χ² table we should understand the concepts of degrees of freedom and level of significance.
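The decision rule can be written as a small lookup, shown here as a sketch. The dictionary holds the 5% table values of χ² for 1 to 5 degrees of freedom as printed in standard χ² tables; the names `CRITICAL_5_PERCENT` and `decide` and the calculated value 4.20 are assumptions made for illustration.

```python
# 5% table values of chi-square for 1 to 5 degrees of freedom
# (figures as printed in standard chi-square tables).
CRITICAL_5_PERCENT = {1: 3.841, 2: 5.991, 3: 7.815, 4: 9.488, 5: 11.070}


def decide(calculated, degrees_of_freedom, table=CRITICAL_5_PERCENT):
    """Compare a calculated chi-square value with the table value."""
    table_value = table[degrees_of_freedom]
    if calculated > table_value:
        return "significant: reject H0"
    return "not significant: accept H0"


# The same hypothetical calculated value leads to different conclusions
# depending on the degrees of freedom.
print(decide(4.20, 1))  # 4.20 > 3.841
print(decide(4.20, 2))  # 4.20 <= 5.991
```

The example shows why the degrees of freedom must be known before the table is consulted: the table value against which χ² is judged changes with them.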


Level of Significance: The maximum probability of making a type I error specified in a test of hypothesis is called the level of significance. The commonly used levels of significance are 5% (0.05) and 1% (0.01). If we adopt the 5% level of significance, we accept a 5% risk of rejecting the null hypothesis when it is actually true; in other words, we can be 95% confident that a true hypothesis will not be rejected. The desired level of significance is always fixed in advance, before applying the test.

Degrees of Freedom: The degrees of freedom are also required to read the χ² table. The degrees of freedom, abbreviated as d.f., denote the extent of independence (freedom) enjoyed by a given set of observed frequencies. In other words, the term refers to the number of independent constraints in a set of data, i.e., the number of classes to which values can be assigned arbitrarily, or at will, without violating the restrictions. For example, we have six numbers as listed below having a given total:

27, 13, 30, 9, 7, 4    Total = 90

It will be apparent that any five of the numbers could be changed at will, but to achieve the total of 90 the remaining number is then fixed. Thus our freedom of choice is reduced by one, on the condition that the total be 90. Therefore, the restriction placed on the freedom is one and the degrees of freedom are five. In such cases, the degrees of freedom are equal to n – 1, where n is the number of frequencies (or values, in the case of a series of independent observations).

In a contingency table the degrees of freedom are calculated in a slightly different manner. The marginal totals place a limit on our choice of cell frequencies. The cell frequencies of all columns less one (c – 1) and of all rows less one (r – 1) can be assigned arbitrarily, and so the number of degrees of freedom for all cell frequencies is (c – 1)(r – 1), where c refers to columns and r refers to rows. Thus in a 2 × 2 table the degrees of freedom would be (2 – 1)(2 – 1) = 1, and in a 3 × 3 table the degrees of freedom would be (3 – 1)(3 – 1) = 4.

Suppose there is a 2 × 2 association table in which the totals are (A) = 30, (α) = 50, (B) = 40, (β) = 40 and N = 80.

If we presume that the two attributes A and B are independent, then the expected frequency of the class (AB) would be

fe = (30 × 40) / 80 = 15.

Now once we decide the expected frequency of the class (AB), the expected frequencies of the remaining three classes are automatically fixed. Thus for the class (αB) the expected frequency must be 40 – 15 = 25, similarly for the class (Aβ) it must be 30 – 15 = 15, and for (αβ) it must be 50 – 25 = 25. It means that so far as this table is concerned we have only one choice of our own, and in the remaining three classes we have no freedom to fill in the frequencies as we like. In other words, we have only one degree of freedom so far as this table is concerned. There is one independent constraint here and three constraints are dependent. In short:

(1) For a given set of observed frequencies (say n), the degrees of freedom are:

d.f. = n – 1

(2) For a contingency table with r number of rows and c number of columns, the degrees of freedom are:

d.f. = (r – 1)(c – 1)
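The 2 × 2 illustration above can be checked with a short sketch. It uses the figures quoted in the example (totals of 30 and 40 with N = 80); the function names and dictionary keys are assumptions made for illustration.

```python
def expected_2x2(total_a, total_b, n):
    """Expected cell frequencies of a 2 x 2 table when A and B are independent."""
    total_alpha = n - total_a       # (alpha) = N - (A)
    ab = total_a * total_b / n      # the only cell that has to be worked out
    alpha_b = total_b - ab          # fixed by the (B) total
    a_beta = total_a - ab           # fixed by the (A) total
    alpha_beta = total_alpha - alpha_b  # fixed by the (alpha) total
    return {"AB": ab, "alpha_B": alpha_b, "A_beta": a_beta, "alpha_beta": alpha_beta}


def degrees_of_freedom(rows, columns):
    """d.f. of a contingency table: (r - 1)(c - 1)."""
    return (rows - 1) * (columns - 1)


cells = expected_2x2(total_a=30, total_b=40, n=80)
print(cells)  # AB = 15, alpha_B = 25, A_beta = 15, alpha_beta = 25
print(degrees_of_freedom(2, 2), degrees_of_freedom(3, 3))  # 1 4
```

Only the (AB) cell is computed from the independence assumption; the other three follow by subtraction from the marginal totals, which is what one degree of freedom means here.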

Alternative Formula for Finding the Value of Chi-Square in a (2 × 2) Table: There is an alternative method of calculating the value of χ² in the case of a (2 × 2) table. If we write the cell frequencies and marginal totals of a (2 × 2) table as follows:

a          b          (a + b)
c          d          (c + d)
(a + c)    (b + d)    N

then

χ² = N (ad – bc)² / [(a + c)(b + d)(a + b)(c + d)]

where N means the total frequency, ad means the larger cross product, bc means the smaller cross product and (a + c), (b + d), (a + b) and (c + d) are the marginal totals. The alternative formula is rarely used in finding the value of Chi-Square, as it is not applicable uniformly in all cases but can be used only in a (2 × 2) contingency table.
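As a check on the shortcut, the sketch below computes χ² for a 2 × 2 table in two ways: with the usual form of the alternative formula, N(ad – bc)² divided by the product of the four marginal totals, and with the general Σ (O – E)² / E procedure. The cell frequencies 20, 10, 20, 30 are hypothetical and the function names are assumptions made for illustration.

```python
def chi_square_2x2(a, b, c, d):
    """Shortcut for a 2 x 2 table: N(ad - bc)^2 over the product of the marginal totals."""
    n = a + b + c + d
    numerator = n * (a * d - b * c) ** 2
    denominator = (a + b) * (c + d) * (a + c) * (b + d)
    return numerator / denominator


def chi_square_general(a, b, c, d):
    """General procedure: expected frequencies from the margins, then sum of (O - E)^2 / E."""
    n = a + b + c + d
    observed = [[a, b], [c, d]]
    row_totals = [a + b, c + d]
    column_totals = [a + c, b + d]
    total = 0.0
    for i in range(2):
        for j in range(2):
            e = row_totals[i] * column_totals[j] / n
            total += (observed[i][j] - e) ** 2 / e
    return total


# Hypothetical cell frequencies (row totals 30 and 50, column totals 40 and 40, N = 80).
shortcut = chi_square_2x2(20, 10, 20, 30)
general = chi_square_general(20, 10, 20, 30)
print(round(shortcut, 4), round(general, 4))
```

Both routes give the same value, so the shortcut saves the step of working out expected frequencies, but only for a table with exactly two rows and two columns.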


Conditions for the Validity of Chi-Square Test

The Chi-Square test can be used only if the following conditions are satisfied:

(1) The observations recorded and used are collected on a random basis.

(2) The sample observations should be independent, i.e., no individual item should be included twice or more in the sample.

(3) The total number of observations should be reasonably large, say, more than 50.

(4) The given distribution should not be replaced by relative frequencies or proportions but the data should be given in original units.

(5) The constraints must be linear. Constraints which involve linear equations in the cell frequencies of a contingency table (i.e., equations containing no squares or higher powers of the frequencies) are known as linear constraints, such as ΣO = ΣE = N.

(6) No theoretical frequency should be small. Small is a relative term. Preferably each theoretical frequency should be larger than 10, but in any case not less than 5. If any theoretical frequency is less than 5, we cannot apply the χ² test as such. In that case we use the technique of pooling, which consists in adding the frequencies which are less than 5 to the preceding or succeeding frequency (or frequencies) so that the resulting sum is greater than 5, and adjusting the degrees of freedom accordingly.
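The pooling technique in condition (6) can be sketched as follows. This is one possible way to pool, not a prescribed method: classes are merged from left to right until the pooled expected frequency reaches the minimum, and a short tail is merged backwards into the preceding group. A pooled total of exactly 5 is treated as acceptable, in line with "not less than 5". The function name and the data are hypothetical.

```python
def pool_small_classes(observed, expected, minimum=5.0):
    """Merge adjacent classes until every pooled expected frequency is at least `minimum`."""
    pooled_observed, pooled_expected = [], []
    acc_o, acc_e = 0.0, 0.0
    for o, e in zip(observed, expected):
        acc_o += o
        acc_e += e
        if acc_e >= minimum:            # group is large enough: close it
            pooled_observed.append(acc_o)
            pooled_expected.append(acc_e)
            acc_o, acc_e = 0.0, 0.0
    if acc_e > 0:                       # leftover tail that is still too small
        if pooled_expected:             # add it to the preceding group
            pooled_observed[-1] += acc_o
            pooled_expected[-1] += acc_e
        else:                           # nothing to merge into: keep as a single class
            pooled_observed.append(acc_o)
            pooled_expected.append(acc_e)
    return pooled_observed, pooled_expected


# Hypothetical frequencies: the first two and the last two classes are too small.
observed = [1, 4, 22, 28, 3, 2]
expected = [2, 3, 20, 30, 4, 1]
pooled_o, pooled_e = pool_small_classes(observed, expected)
print(pooled_o, pooled_e)
print("classes after pooling:", len(pooled_e))
```

Six classes become four after pooling, so the degrees of freedom must be counted from the four pooled classes rather than from the original six.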

Area of Application of Chi-Square Test

or

Uses of χ² Test

The Chi-Square test has a number of applications, some of which are: (1) as a test of independence of attributes, (2) as a test of goodness of fit, (3) as a test of homogeneity.

(1) Chi-Square Test for Independence of Attributes: The χ² test enables us to determine whether or not two attributes are associated. For instance, we may be interested in knowing whether a new medicine is effective in controlling fever or not, and the χ² test will help us in deciding this issue. In such a situation we proceed on the null hypothesis that the two attributes (viz., new medicine and control of fever) are independent, which means that the new medicine is not effective in controlling fever. On this basis we first calculate the expected frequencies and then work out the value of χ². If the calculated value of χ² is less than its table value at a certain level of significance for the given degrees of freedom, we conclude that our hypothesis stands, which means the two attributes are independent or not associated (i.e., the new medicine is not effective in controlling the fever). But if the calculated value of χ² is greater than its table value, our inference would be that the hypothesis does not hold good, which means the two attributes are associated and the association is not due to some chance factor but exists in reality (i.e., the new medicine is effective in controlling the fever and as such may be prescribed). It may, however, be stated here that χ² is not a measure of the degree of relationship or the form of relationship between two attributes; it is simply a technique for judging the significance of such association or relationship between two attributes.
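A minimal sketch of the medicine example follows. The counts are invented purely for illustration (100 patients given the medicine, 100 not given it), and the function name `independence_test` is an assumption. The 5% table value for 1 degree of freedom, 3.841, is the figure printed in standard χ² tables.

```python
def independence_test(table):
    """Return (chi-square, degrees of freedom) for an r x c table of observed frequencies."""
    row_totals = [sum(row) for row in table]
    column_totals = [sum(column) for column in zip(*table)]
    n = sum(row_totals)
    statistic = 0.0
    for i, row in enumerate(table):
        for j, o in enumerate(row):
            e = row_totals[i] * column_totals[j] / n   # expected under independence
            statistic += (o - e) ** 2 / e
    df = (len(row_totals) - 1) * (len(column_totals) - 1)
    return statistic, df


# Hypothetical data. Rows: medicine given / not given.
# Columns: fever controlled / fever not controlled.
fever_table = [[60, 40],
               [30, 70]]
statistic, df = independence_test(fever_table)
table_value = 3.841  # 5% level, 1 degree of freedom
print(round(statistic, 3), df)
print("reject H0" if statistic > table_value else "accept H0")
```

With these made-up counts the calculated χ² is well above the table value, so the hypothesis of independence would be rejected. As noted above, this says the association is significant, not how strong it is.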

(2) χ² as a Test of Goodness of Fit: Karl Pearson developed the χ² test in 1900 to test goodness of fit. It is used to test whether the deviation between observed and theoretical values can be attributed to chance (fluctuations of sampling) or is due to some inadequacy of the theory to fit the observed data.

If the calculated value of χ² is less than the table value of χ² at the appropriate degrees of freedom and at a certain level of significance, the fit is considered to be good and the null hypothesis is accepted. On the other hand, if the calculated value of χ² is more than the table value, the null hypothesis is rejected and the conclusion is drawn that the two distributions do not exhibit a good fit.

The term ‘goodness of fit’ is also used for the comparison of an observed sample distribution with an expected probability distribution such as the Binomial, Poisson, Normal, etc.
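As an illustration of fitting a probability distribution, the sketch below compares hypothetical counts of heads from 160 tosses of 4 coins with the frequencies expected under a Binomial distribution with p = 0.5. The data, the function name and the choice of p are assumptions made for the example. Since p is specified in advance rather than estimated from the data, the degrees of freedom are taken as the number of classes less one.

```python
from math import comb


def binomial_expected(n_trials, p, n_samples):
    """Expected frequencies of 0, 1, ..., n_trials successes in n_samples repetitions."""
    return [n_samples * comb(n_trials, k) * p ** k * (1 - p) ** (n_trials - k)
            for k in range(n_trials + 1)]


# Hypothetical observed counts of 0, 1, 2, 3 and 4 heads in 160 tosses of 4 coins.
observed = [8, 42, 62, 36, 12]
expected = binomial_expected(n_trials=4, p=0.5, n_samples=160)  # 10, 40, 60, 40, 10

statistic = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
df = len(observed) - 1   # one linear constraint: the totals must agree
table_value = 9.488      # 5% level, 4 degrees of freedom (standard table figure)

print(expected)
print(round(statistic, 4), df)
print("good fit" if statistic <= table_value else "poor fit")
```

Here the calculated value is far below the table value, so the differences between the observed and theoretical frequencies would be attributed to sampling fluctuations.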

(3) χ² as a Test of Homogeneity: The test of homogeneity is framed to see whether two or more samples are drawn from the same population or from different populations, i.e., from populations having identical or different features. For example, we may want to analyse the study pattern at all levels in a university. It is a cross-checking of the data to draw the inference. In fact, the χ² test of homogeneity is an extension of the Chi-Square test of independence.

It should be noted that in both types of tests, i.e., tests of independence and of homogeneity, we are concerned with cross-classified data. The same test statistic used for tests of independence is used for tests of homogeneity. The two types of tests are, however, different in a number of ways. First, they are associated with different kinds of problems: tests of independence are concerned with whether one attribute is independent of another, while tests of homogeneity are concerned with whether different samples come from the same population. Secondly, the former involves a single sample taken from one population, but the latter involves two or more independent samples, one from each of the possible populations in question.

EXAMINATION QUESTIONS

Long Answer Questions

1. What is the χ² test? Under what conditions is it applicable?

2. Write a short note on the χ² test and Degrees of Freedom.

3. What is the Chi-square test? How does it help in finding out the significance of the difference between theory and observations?

4. What is the Chi-square test of goodness of fit? Discuss the uses and limitations of the Chi-square test.

Short Answer Type Questions

1. What is the Chi-square test?

2. Write a short note on the null hypothesis.

3. Discuss the uses of the Chi-square test.

4. Under what conditions is the χ² test applicable?

5. Explain Yates’ Correction.

6. Explain the additive property of χ².


Objective Type Questions

State whether the following statements are ‘True’ or ‘False’:

1. The χ² test was first used by Karl Pearson in the year 1900.

(True)

2. The value of χ² can be positive as well as negative.

(False)

3. If there is no difference between observed and expected frequencies, then the value of χ² will be zero.

(True)

4. If the calculated value of χ² is more than the table value, then the null hypothesis will be accepted.

(False)

5. Yates’ correction is done only in a 2 × 2 table.

(True)

6. Yates’ correction tends to under-compensate.

(False)

7. The Chi-square test is a non-parametric test.

(True)

8. If the calculated value of χ² is greater than the table value, the fit is considered to be poor.

(True)

Fill in the blanks :

1. For the analytical study of the difference between observed and expected frequencies, ………….. is used.

2. If there is no difference between observed and expected frequencies, then the value of χ² will be …………..

3. If the calculated value of χ² is less than or equal to the table value, then the null hypothesis (H0) is …………..

4. The main condition of the χ² test is that no cell frequency should be less than …………..

5. When χ² values are to be added, …………… should not be applied.

6. The Chi-square test should not be used if N is less than ………

7. The Chi-square test is popularly known as a test of …………..

Ans. (1) Chi-square, (2) Zero, (3) Accepted, (4) 5, (5) Yates’ Correction, (6) 50, (7) Goodness of fit.


Select the correct option:

1. The calculated value of χ² is:

(a) Always Positive

(b) Always Negative

(c) Can be either positive or negative

(d) None of these

2. Yates’ correction is generally applied when the number of degrees of freedom is:

(a) 5

(b) Greater than 5

(c) 4

(d) 1

3. In a contingency table, degrees of freedom are determined by :

(a) (r – 1)(c – 1)

(b) (r – 1) (c + 1)

(c) (c – 1) ()

(d) (r + 1) (c + 1)

4. The number of degrees of freedom in a 3 x 3 contingency table is :

(a) 8

(b) 4

(c) 3

(d) 1

 

 


 

chetansati

Admin

https://gurujionlinestudy.com
