Assumptions of Chi Square Test

The chi-square test is a statistical test used to determine whether the observed frequencies (O) differ significantly from the expected frequencies (E). It is applied when two or more populations or variables need to be compared on some characteristic of interest. Many other statistical tests compare only two groups at a time, whereas the chi-square test can handle more than two categories or populations at once.

 

Formula

      The chi-square test statistic is calculated as

             `chi^2 = sum (O-E)^2/E`

Here,

O – Observed frequency

E – Expected frequency

`chi^2` – chi-square statistic
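As a minimal sketch, the formula can be written as a small Python function. The name `chi_square_statistic` and the input checks are illustrative assumptions, not part of the original text.

```python
def chi_square_statistic(observed, expected):
    """Return the sum of (O - E)^2 / E over paired frequencies."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same length")
    if any(e <= 0 for e in expected):
        # Division by E requires every expected frequency to be positive.
        raise ValueError("expected frequencies must be positive")
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))
```

When every observed frequency equals its expected frequency, each term is zero and the function returns 0. Larger values indicate a larger discrepancy between O and E.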

 

In this article, we discuss the chi-square test with suitable example problems.

Assumptions of Chi Square Test - Example 1:

 

Find the value of chi-square using the following data.

| Observed Frequency | Expected Frequency |
|---|---|
| 20 | 10 |
| 30 | 20 |
| 40 | 30 |
| 50 | 40 |

 

Solution:

      The chi-square test statistic is calculated as

                         `chi^2 = sum (O-E)^2/E`

where O is the observed frequency and E is the expected frequency.

 

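| Observed Frequency (O) | Expected Frequency (E) | O-E | (O-E)^2 | (O-E)^2/E |
|---|---|---|---|---|
| 20 | 10 | 10 | 100 | 10 |
| 30 | 20 | 10 | 100 | 5 |
| 40 | 30 | 10 | 100 | 3.33 |
| 50 | 40 | 10 | 100 | 2.5 |
| | | | Total | 20.83 |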

 

                    `chi^2` = 10 + 5 + 3.33 + 2.5

                       = 20.83

The chi-square value is 20.83.
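The row-by-row arithmetic of this example can be reproduced with a short Python sketch. The variable names are illustrative; the data are the four observed and expected frequencies given in the problem.

```python
observed = [20, 30, 40, 50]
expected = [10, 20, 30, 40]

# One row per category: O, E, O - E, (O - E)^2, (O - E)^2 / E
rows = [(o, e, o - e, (o - e) ** 2, (o - e) ** 2 / e)
        for o, e in zip(observed, expected)]

for o, e, diff, squared, term in rows:
    print(f"{o:>4} {e:>4} {diff:>4} {squared:>5} {term:>6.2f}")

# The statistic is the sum of the last column.
chi_square = sum(row[4] for row in rows)
print(f"chi-square = {chi_square:.2f}")  # chi-square = 20.83
```

Printing each intermediate column makes it easy to check a hand calculation one row at a time.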

 

Assumptions of Chi Square Test - Example 2:

 

A school has 1390 students, classified by gender (girls and boys) and by group (G-1, G-2, and G-3). The results are shown in the following table.

| | G-1 | G-2 | G-3 | Row total |
|---|---|---|---|---|
| Girls | 200 | 210 | 250 | 660 |
| Boys | 250 | 260 | 220 | 730 |
| Column total | 450 | 470 | 470 | 1390 |

 

Is there a gender gap? Does the distribution of girls across the groups differ significantly from that of boys?

Solution:

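Degrees of freedom = (r-1) * (c-1) = (2-1) * (3-1) = 2, where r is the number of rows and c is the number of columns.

The expected frequency for the cell in row r and column c is

`E_(r,c) = (n_r xx n_c)/n`

Here,

r, c – row and column of the cell

`n_r` – total of row r

`n_c` – total of column c

n – grand total (the sum of the row totals, which equals the sum of the column totals)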

`E_(1,1) = (660 xx 450)/1390 = 213.7`

`E_(1,2) = (660 xx 470)/1390 = 223.2`

`E_(1,3) = (660 xx 470)/1390 = 223.2`

`E_(2,1) = (730 xx 450)/1390 = 236.3`

`E_(2,2) = (730 xx 470)/1390 = 246.8`

`E_(2,3) = (730 xx 470)/1390 = 246.8`
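The expected frequencies and the degrees of freedom can be derived directly from the observed table. The following is a minimal Python sketch; the variable names are illustrative assumptions.

```python
table = [
    [200, 210, 250],  # Girls: G-1, G-2, G-3
    [250, 260, 220],  # Boys:  G-1, G-2, G-3
]

row_totals = [sum(row) for row in table]        # [660, 730]
col_totals = [sum(col) for col in zip(*table)]  # [450, 470, 470]
grand_total = sum(row_totals)                   # 1390

# Expected count for each cell: (row total * column total) / grand total
expected = [[r * c / grand_total for c in col_totals] for r in row_totals]

# Degrees of freedom: (rows - 1) * (columns - 1)
degrees_of_freedom = (len(table) - 1) * (len(table[0]) - 1)

for row in expected:
    print([round(value, 1) for value in row])
```

Rounded to one decimal place, the two printed rows are 213.7, 223.2, 223.2 and 236.3, 246.8, 246.8.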

           `chi^2 = sum (O_(r,c)-E_(r,c))^2/E_(r,c)`

               = `(200-213.7)^2/213.7` + `(210-223.2)^2/223.2` + `(250-223.2)^2/223.2`

                  + `(250-236.3)^2/236.3` + `(260-246.8)^2/246.8` + `(220-246.8)^2/246.8`

               = 0.8783 + 0.7806 + 3.2179 + 0.7943 + 0.7060 + 2.9102

               = 9.2873
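The whole calculation for a contingency table can be sketched as one Python function. The name `chi_square_from_table` is a hypothetical helper introduced here for illustration. Because it keeps the expected frequencies at full precision instead of rounding them to one decimal place, its result (about 9.288) differs slightly from the hand-computed 9.2873.

```python
def chi_square_from_table(table):
    """Chi-square statistic for a table of observed counts (list of rows)."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)

    statistic = 0.0
    for r, row in enumerate(table):
        for c, observed in enumerate(row):
            # Expected count under independence of row and column.
            expected = row_totals[r] * col_totals[c] / grand_total
            statistic += (observed - expected) ** 2 / expected
    return statistic


students = [
    [200, 210, 250],  # Girls
    [250, 260, 220],  # Boys
]
statistic = chi_square_from_table(students)
print(f"chi-square = {statistic:.3f}")
```

For routine work, `scipy.stats.chi2_contingency` performs the same computation and also returns the P-value, the degrees of freedom, and the expected frequencies.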

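     The P-value is the probability that a chi-square statistic with 2 degrees of freedom is more extreme than 9.2873.

We use a chi-square distribution calculator with CV = 9.2873 to find P(`chi^2` < CV) = 0.99, so

P(`chi^2` > CV) = 1 - 0.99

= 0.01

Since this P-value is below the commonly used 0.05 significance level, the difference between girls and boys across the groups is statistically significant.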
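In place of an online calculator, the upper-tail probability can be computed directly. For an even number of degrees of freedom the chi-square distribution has a closed-form tail, which the sketch below uses. The function name `chi_square_upper_tail` is an assumption made for illustration, and the function deliberately rejects odd degrees of freedom, which this closed form does not cover.

```python
import math


def chi_square_upper_tail(x, df):
    """P(chi-square with df degrees of freedom > x), for even df and x >= 0."""
    if df <= 0 or df % 2 != 0:
        raise ValueError("this closed form only covers positive even df")
    if x < 0:
        raise ValueError("x must be non-negative")
    half = x / 2
    # exp(-x/2) * sum_{k=0}^{df/2 - 1} (x/2)^k / k!
    return math.exp(-half) * sum(half ** k / math.factorial(k)
                                 for k in range(df // 2))


p_value = chi_square_upper_tail(9.2873, 2)
print(f"P-value = {p_value:.4f}")  # P-value = 0.0096
```

With 2 degrees of freedom the sum has a single term, so the tail probability reduces to exp(-x/2). The result rounds to 0.01, matching the calculator value. For arbitrary degrees of freedom, `scipy.stats.chi2.sf(x, df)` gives the same upper-tail probability.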