Chi Square Test for Goodness of Fit

I have discrete data (the vector No_of_Frauds in the script below), and I am trying to fit a Poisson distribution to it using R. My R script is as follows:

________________________________________________________

# R SCRIPT for Fitting Poisson Distribution

No_of_Frauds <- c(1,1,1,1,1,1,1,1,1,2,
                  1,1,1,1,1,1,2,1,2,2,
                  2,1,1,2,1,1,1,1,4,1,
                  1,1,1,1,1,1,1,1,1,1,
                  1,1,1,1,1,1,1,1,1,1,
                  1,1,5,1,2,1,1,1,1,1,
                  1,1,3,2,1,1,1,2,1,1,
                  2,1,1,1,1,1,2,1,3,1,
                  2,1,2,14,2,1,1,38,3,3,
                  2,44,1,4,1,4,1,2,2,1,
                  3)

N      <- length(No_of_Frauds)
Lambda <- mean(No_of_Frauds)            # estimate of the Poisson mean
i      <- c(0:(N-1))
pmf    <- dpois(i, Lambda, log = FALSE)  # Poisson pmf at 0, ..., N-1 (not used below)

# ----------------------------------------------------------------------------
# Ho: The data follow a Poisson distribution  Vs  H1: Not Ho

# observed frequencies (Oi): one class per value that occurs in the data
variable.cnts     <- table(No_of_Frauds)
# Poisson probabilities of those values (Lambda, not lambda: R is case-sensitive)
variable.cnts.prs <- dpois(as.numeric(names(variable.cnts)), Lambda)
# one extra class for every value that does not occur (observed count 0)
variable.cnts     <- c(variable.cnts, 0)
variable.cnts.prs <- c(variable.cnts.prs, 1 - sum(variable.cnts.prs))
# chisq.test warns "Chi-squared approximation may be incorrect" here,
# because some expected counts are below 5
tst               <- chisq.test(variable.cnts, p = variable.cnts.prs)

chi_squared <- unname(tst$statistic)
p_value     <- tst$p.value
# chisq.test uses df = (number of classes - 1); it does not subtract
# a further 1 for Lambda having been estimated from the data
df          <- unname(tst$parameter)
cv1         <- qchisq(p = .01, df = df, lower.tail = FALSE)
cv2         <- qchisq(p = .05, df = df, lower.tail = FALSE)
cv3         <- qchisq(p = .10, df = df, lower.tail = FALSE)

#-----------------------------------------------------------------------------
# Expected values: variable.cnts.prs * sum(variable.cnts)
# If chi_squared > cv, reject Ho at significance level alpha
#-----------------------------------------------------------------------------
if (chi_squared > cv1) Conclusion1 <- 'Sample does not come from the postulated probability distribution at 1% los' else Conclusion1 <- 'Sample is consistent with the postulated prob. distribution at 1% los'
if (chi_squared > cv2) Conclusion2 <- 'Sample does not come from the postulated probability distribution at 5% los' else Conclusion2 <- 'Sample is consistent with the postulated prob. distribution at 5% los'
if (chi_squared > cv3) Conclusion3 <- 'Sample does not come from the postulated probability distribution at 10% los' else Conclusion3 <- 'Sample is consistent with the postulated prob. distribution at 10% los'

#-----------------------------------------------------------------------------
# Printing RESULTS
print(chi_squared)
print(p_value)
print(df)
print(cv1)
print(cv2)
print(cv3)
print(Conclusion1)
print(Conclusion2)
print(Conclusion3)

##### End of R Script ########

________________________________________________________

Problem faced:

When I run this script in the R console, I get a chi-square statistic as high as 6.95753e+37. When I did the calculations in Excel, I got a chi-square statistic of 138.34.

Although it is clear that the sample data do not follow a Poisson distribution, and I will have to look for another discrete distribution, my problem is the HIGH value of the chi-square test statistic. When I analyzed further, I understood the problem: under the fitted Poisson distribution, isolated large values such as 14, 38 and 44 have expected frequencies close to zero, so their terms (Oi - Ei)^2/Ei are enormous.

(A) By convention, if the expected frequency of a class is less than 5, we combine it with neighbouring classes into a new class whose expected frequency is at least 5, and adjust the observed frequencies accordingly.
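The size of the unpooled statistic can be checked directly. The sketch below is an illustration added here, not part of the original script: it rebuilds the unpooled classes the script uses (one class per observed value plus one catch-all class, given the hypothetical label "other") and prints each term (Oi - Ei)^2/Ei. No exact output is claimed; the point is that the term for the value 44, whose Ei is essentially zero, dominates the total.

```r
# Sketch: per-class terms of the UNPOOLED chi-square statistic used in the
# original script (one class per observed value + one catch-all class).
No_of_Frauds <- c(1,1,1,1,1,1,1,1,1,2,
                  1,1,1,1,1,1,2,1,2,2,
                  2,1,1,2,1,1,1,1,4,1,
                  1,1,1,1,1,1,1,1,1,1,
                  1,1,1,1,1,1,1,1,1,1,
                  1,1,5,1,2,1,1,1,1,1,
                  1,1,3,2,1,1,1,2,1,1,
                  2,1,1,1,1,1,2,1,3,1,
                  2,1,2,14,2,1,1,38,3,3,
                  2,44,1,4,1,4,1,2,2,1,
                  3)

N      <- length(No_of_Frauds)
Lambda <- mean(No_of_Frauds)

tab   <- table(No_of_Frauds)          # observed frequencies Oi
vals  <- as.numeric(names(tab))       # the values that occur: 1-5, 14, 38, 44
p_obs <- dpois(vals, Lambda)          # Poisson probability of each value

obs <- c(as.numeric(tab), 0)          # catch-all class: all other values, Oi = 0
prb <- c(p_obs, 1 - sum(p_obs))
names(obs) <- c(names(tab), "other")  # "other" is a hypothetical label
names(prb) <- names(obs)

expct   <- N * prb                     # expected frequencies Ei
contrib <- (obs - expct)^2 / expct     # terms (Oi - Ei)^2 / Ei

# For a value like 44, Ei is essentially zero, so its term is about 1/Ei
print(signif(cbind(Oi = obs, Ei = expct, term = contrib), 3))
cat("Unpooled chi-squared statistic:", format(sum(contrib)), "\n")
```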
In Excel, after pooling as in (A), I get the following table (Ei rounded to whole numbers; the last class is 5 or more, so it also holds the values 14, 38 and 44):

X       Oi    Ei    (Oi - Ei)^2/Ei
0        0    10      9.96
1       72    23    103.79
2       17    27      3.54
3        5    21     11.85
4        3    12      6.71
5+       4     9      2.51
Total  101   101    138.34

With this pooling, Excel gives a reasonable result (138.34). Without it, the chi-square test statistic in Excel is also as high as 4.70043E+37.

My question is: how do I modify my R script so that the logic in (A) is applied, i.e. classes are pooled (adjusting both the expected and the observed frequencies) until every class has an expected frequency of at least 5, giving a reasonable value of the chi-square test statistic?

I am also attaching the xls file for ready reference.

I apologize for writing such a long mail. Since I am very new to the R language, I would be grateful if someone could help me out.

Thanking you in advance for your kind co-operation.

Ashok (Mumbai, India)
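One way to apply rule (A) automatically in R is sketched below, as an illustration rather than a definitive recipe. It starts from classes 0, 1, ..., M-1 plus a class "M or more" (M being the largest observed value), then repeatedly merges the class with the smallest expected frequency into whichever neighbour has the smaller expected frequency, until every expected frequency is at least 5. That merging rule and the names `min_exp`, `lo` and `hi` are assumptions of this sketch. On this data it ends with the same classes as the Excel table (0, 1, 2, 3, 4 and 5 or more). It also uses df = (number of classes) - 1 - 1, because Lambda is estimated from the data, whereas chisq.test itself reports (number of classes) - 1.

```r
# Sketch: chi-square goodness-of-fit test of a Poisson model, pooling
# classes until every expected frequency is at least 5 (rule (A)).
No_of_Frauds <- c(1,1,1,1,1,1,1,1,1,2,
                  1,1,1,1,1,1,2,1,2,2,
                  2,1,1,2,1,1,1,1,4,1,
                  1,1,1,1,1,1,1,1,1,1,
                  1,1,1,1,1,1,1,1,1,1,
                  1,1,5,1,2,1,1,1,1,1,
                  1,1,3,2,1,1,1,2,1,1,
                  2,1,1,1,1,1,2,1,3,1,
                  2,1,2,14,2,1,1,38,3,3,
                  2,44,1,4,1,4,1,2,2,1,
                  3)

N       <- length(No_of_Frauds)
Lambda  <- mean(No_of_Frauds)   # Poisson mean estimated from the data
min_exp <- 5                    # pooling threshold from rule (A)

# Start with classes 0, 1, ..., M-1 and an open-ended top class "M or more"
M   <- max(No_of_Frauds)
lo  <- 0:M                      # lower bound of each class
obs <- as.numeric(table(factor(No_of_Frauds, levels = lo)))
prb <- c(dpois(0:(M - 1), Lambda), ppois(M - 1, Lambda, lower.tail = FALSE))

# Repeatedly merge the class with the smallest expected frequency into
# its neighbour with the smaller expected frequency
repeat {
  e <- N * prb
  i <- which.min(e)
  if (e[i] >= min_exp || length(prb) == 1) break
  j <- if (i == 1) 1 else if (i == length(prb)) i - 1 else if (e[i - 1] <= e[i + 1]) i - 1 else i
  # merge classes j and j + 1 into class j
  obs[j] <- obs[j] + obs[j + 1]
  prb[j] <- prb[j] + prb[j + 1]
  obs <- obs[-(j + 1)]; prb <- prb[-(j + 1)]; lo <- lo[-(j + 1)]
}

# Class labels of the form "3", "6-8" or "5+"
hi <- c(lo[-1] - 1, Inf)
names(obs) <- ifelse(is.infinite(hi), paste0(lo, "+"),
                     ifelse(hi == lo, as.character(lo), paste0(lo, "-", hi)))

tst         <- chisq.test(obs, p = prb)
chi_squared <- unname(tst$statistic)
# One more df is lost because Lambda was estimated: df = classes - 2
df      <- length(obs) - 2
p_value <- pchisq(chi_squared, df = df, lower.tail = FALSE)
alpha   <- c(0.01, 0.05, 0.10)
cv      <- qchisq(alpha, df = df, lower.tail = FALSE)

print(cbind(Oi = obs, Ei = round(N * prb, 2),
            term = round((obs - N * prb)^2 / (N * prb), 2)))
cat("Chi-squared =", round(chi_squared, 2), " df =", df,
    " p-value =", signif(p_value, 3), "\n")
print(data.frame(alpha, critical_value = cv, reject_Ho = chi_squared > cv))
```

Merging the smallest class into its smaller neighbour is only one reasonable pooling rule; pooling only the upper tail would give the same classes here.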
______________________________________________ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.