[R] Poisson Distribution - problem with Chi Square Goodness of Fit test

saggak Fri, 29 Aug 2008 02:14:31 -0700



Chi Square Test for Goodness of Fit

I have a discrete data set, given below as an R vector:

No_of_Frauds<-c(1,1,1,1,1,1,1,1,1,2,1,1,1,1,1,1,2,1,2,2,2,1,1,2,1,1,1,1,4,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,5,1,2,1,1,1,1,1,1,1,3,2,1,1,1,2,1,1,2,1,1,1,1,1,2,1,3,1,2,1,2,14,2,1,1,38,3,3,2,44,1,4,1,4,1,2,2,1,3)

I am trying to fit a Poisson distribution to this data using R.

My R script is as follows:

________________________________________________________

# R script for fitting a Poisson distribution

No_of_Frauds<-c(1,1,1,1,1,1,1,1,1,2,1,1,1,1,1,1,2,1,2,2,2,1,1,2,1,1,1,1,4,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,5,1,2,1,1,1,1,1,1,1,3,2,1,1,1,2,1,1,2,1,1,1,1,1,2,1,3,1,2,1,2,14,2,1,1,38,3,3,2,44,1,4,1,4,1,2,2,1,3)

N       <- length(No_of_Frauds)
Average <- mean(No_of_Frauds)
Lambda  <- Average

i       <- 0:(N - 1)
pmf     <- dpois(i, Lambda, log = FALSE)

# ----------------------------------------------------------------------------
# H0: the data follow a Poisson distribution  vs.  H1: not H0

# Observed frequencies (Oi)
variable.cnts     <- table(No_of_Frauds)

# Poisson probabilities of the observed values, using the estimated rate Lambda
variable.cnts.prs <- dpois(as.numeric(names(variable.cnts)), Lambda)

# Extra class with zero observations that carries the remaining probability
variable.cnts     <- c(variable.cnts, 0)
variable.cnts.prs <- c(variable.cnts.prs, 1 - sum(variable.cnts.prs))

tst <- chisq.test(variable.cnts, p = variable.cnts.prs)

chi_squared <- as.numeric(tst$statistic)
p_value     <- as.numeric(tst$p.value)
df          <- tst$parameter

# Critical values at the 1%, 5% and 10% levels of significance
cv1 <- qchisq(p = 0.01, df = df, lower.tail = FALSE)
cv2 <- qchisq(p = 0.05, df = df, lower.tail = FALSE)
cv3 <- qchisq(p = 0.10, df = df, lower.tail = FALSE)

# -----------------------------------------------------------------------------
# Expected values: variable.cnts.prs * sum(variable.cnts)

# If chi_squared > critical value, reject H0 at that level of significance (los)
# -----------------------------------------------------------------------------

if (chi_squared > cv1) {
  Conclusion1 <- 'Sample does not come from the postulated probability distribution at 1% los'
} else {
  Conclusion1 <- 'Sample comes from the postulated probability distribution at 1% los'
}

if (chi_squared > cv2) {
  Conclusion2 <- 'Sample does not come from the postulated probability distribution at 5% los'
} else {
  Conclusion2 <- 'Sample comes from the postulated probability distribution at 5% los'
}

if (chi_squared > cv3) {
  Conclusion3 <- 'Sample does not come from the postulated probability distribution at 10% los'
} else {
  Conclusion3 <- 'Sample comes from the postulated probability distribution at 10% los'
}

# -----------------------------------------------------------------------------
# Printing results

print(chi_squared)
print(p_value)
print(df)
print(cv1)
print(cv2)
print(cv3)
print(Conclusion1)
print(Conclusion2)
print(Conclusion3)

##### End of R script ########

________________________________________________________

Problem faced:

When I run this script in the R console, I get a chi-square statistic
as high as 6.95753e+37.

When I did the same calculation in Excel, I got a chi-square statistic
of 138.34.

It is clear that the sample data do not follow a Poisson distribution,
and I will have to look for another discrete distribution. My problem,
however, is the very high value of the chi-square test statistic. On
analyzing further, I understood the cause.

(A) By convention, classes whose expected frequency is less than 5 are
combined into a new class, so that the expected frequency of the new
class is at least 5, and the observed frequencies are combined
accordingly.

X       Oi      Ei      ((Oi - Ei)^2)/Ei
0        0      10        9.96
1       72      23      103.79
2       17      27        3.54
3        5      21       11.85
4        3      12        6.71
5        4       9        2.51
Total  101     101      138.34

When I apply this logic in Excel, I get a reasonable result (i.e.
138.34). However, if I do not apply it, the chi-square test statistic
in Excel is also as high as 4.70043E+37.
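The size of the unpooled statistic can be traced to individual classes
with tiny expected counts. The following is only a rough sketch: the
sample size of 101 is taken from the table total, while the rate 2.3 is
an illustrative assumption, not a value stated in this post. It shows
that the expected count for the observed value 44 is vanishingly small,
so the term (O - E)^2 / E for that single class is enormous.

```r
n      <- 101   # sample size, taken from the table total
lambda <- 2.3   # assumed, illustrative Poisson rate

# Expected number of observations equal to 44 under the Poisson model
expected_44 <- n * dpois(44, lambda)

# Contribution of that class when one such value is actually observed
term_44 <- (1 - expected_44)^2 / expected_44

print(expected_44)
print(term_44)
```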

My question is: how do I modify my R script so that the logic described
in (A) is applied, i.e. classes are combined (and the observed
frequencies adjusted accordingly) until the expected frequency of every
class is at least 5, giving a reasonable value of the chi-square test
statistic?
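One possible way to apply the pooling described in (A) in R is sketched
below. This is a minimal sketch, not a definitive implementation: the
function name pooled_pois_gof and its argument min_expected are
hypothetical, and the statistic is computed by hand so that the degrees
of freedom can be reduced by one for the estimated rate, which is an
assumption about how the test should be set up. Classes run from 0 up
to an open-ended top class, and any class with an expected count below
the threshold is merged into its neighbour.

```r
# Hypothetical helper (not a base R function): chi-square goodness-of-fit
# test for a Poisson model, pooling sparse classes before testing.
pooled_pois_gof <- function(x, min_expected = 5) {
  stopifnot(all(x >= 0), all(x == round(x)), max(x) >= 1)
  n      <- length(x)
  lambda <- mean(x)                 # rate estimated from the sample
  k      <- max(x)

  # Classes 0, 1, ..., k-1 plus an open-ended class "k or more",
  # so that the class probabilities sum to 1
  observed <- tabulate(x + 1, nbins = k + 1)
  probs    <- c(dpois(0:(k - 1), lambda),
                ppois(k - 1, lambda, lower.tail = FALSE))
  expected <- n * probs

  # Merge the right-most sparse class into a neighbour until every
  # expected count reaches min_expected (or only two classes remain)
  while (length(expected) > 2 && any(expected < min_expected)) {
    i <- max(which(expected < min_expected))
    j <- if (i == length(expected)) i - 1 else i + 1
    expected[j] <- expected[j] + expected[i]
    observed[j] <- observed[j] + observed[i]
    expected <- expected[-i]
    observed <- observed[-i]
  }

  statistic <- sum((observed - expected)^2 / expected)
  df        <- length(expected) - 2   # classes - 1 - one estimated parameter
  p.value   <- if (df >= 1) pchisq(statistic, df, lower.tail = FALSE) else NA_real_

  list(statistic = statistic, df = df, p.value = p.value,
       observed = observed, expected = expected)
}

# Usage on simulated data (the seed and rate are arbitrary choices)
set.seed(1)
res <- pooled_pois_gof(rpois(200, 2.3))
print(res$statistic)
print(res$df)
print(res$p.value)
```

With the data from this post, the call would be
pooled_pois_gof(No_of_Frauds); the observed and expected elements of
the result can then be compared against the Excel table.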

I am also attaching the xls file for ready reference.

I sincerely apologize for taking the liberty of writing such a long
mail. Since I am very new to the R language, can someone help me out?

Thanking you in advance for your kind co-operation.

Ashok (Mumbai, India)


______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
