[R] Rserve/RandomForest does not work with a CSV?

2009-01-10 Thread anthony
Hi all,

We're using Rserve and RandomForest to do classification from within a
Java program.  The total is about 4 lines of R code:

library('randomForest')
# x, y, and future are loaded here (two approaches, described below)
fit<-randomForest(x,y,no.action=na.roughfix,importance=T,proximity=T)
p<-predict(fit, future)

What is very frustrating is that we have tried this two different ways
(both work in R):

1.  Load x, y, and future from a CSV.  If I do this, Rserve throws an
error when randomForest() is called.

2.  Load x, y, and future by using arrays, and manually building them.  If
I do this, randomForest() works fine.

Either way can be done inside of R, and they work great.

Rserve is running as root, and our Java application is running inside of
Tomcat, and is also running as root.

The actual code looks something like:

RConnection conn = new RConnection("127.0.0.1");
conn.voidEval("library('randomForest')");
conn.voidEval("train<-read.csv(\"" + (outfile.getAbsolutePath()) +
"\",header=FALSE)");
conn.voidEval("x<-train[1:" + totalTrainData + ",1:11]");
conn.voidEval("y<-as.factor(train[1:" + totalTrainData + ",12])");
conn.voidEval("future<-train[" + (totalTrainData + 1) + ":" +
(totalTrainData + totalPredictions) + ",1:11]");
conn.voidEval("fit<-randomForest(x,y,no.action=na.roughfix,importance=T,proximity=T)");
conn.voidEval("p<-predict(fit, future)");
conn.voidEval("write.csv(p, file=\"" + (filename.getAbsolutePath()) + "\")");

Every time we use this, it errors on the randomForest() call.

(If I run this in R, it works perfectly fine).

Any ideas why I cannot call randomForest() this way, but if instead, the x
/ y / future values are built using the array command, it works fine?
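One thing worth checking -- a sketch only, not something established in this
thread, and the file name below is a placeholder: read.csv() turns any column
containing stray non-numeric text into a factor or character column, while the
manually built arrays are presumably all numeric, and that difference alone can
change how randomForest() behaves.  Inspecting and coercing the CSV-loaded data
before fitting:

train <- read.csv("train.csv", header = FALSE)
sapply(train, class)                       # every predictor should be numeric
x <- as.data.frame(lapply(train[ , 1:11],
                          function(v) as.numeric(as.character(v))))
y <- as.factor(train[ , 12])
any(is.na(x))                              # TRUE means some cells were not numeric

(As an aside, randomForest's missing-value argument is spelled na.action rather
than no.action, and it belongs to the formula interface; with the x/y interface
an unrecognized argument is most likely just ignored.)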

As a secondary question, is it faster or slower to do it this way?  It is
certainly convenient to use the CSVs.

This one is driving us bonkers!


--
Anthony

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] svyglm

2017-12-09 Thread Anthony Damico
hi, could you create a reproducible example starting from
http://asdfree.com/pesquisa-nacional-de-saude-pns.html  ?  thanks
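In the quoted message below, the error appears to arise because
method="logistic" is not an argument that svyglm() accepts; it gets passed
through to the underlying glm machinery, which then tries to use a function
named logistic() as its fitting method.  A minimal sketch of the usual
survey-weighted logistic regression, reusing the object names from the post
(untested against the actual PNS data):

library(survey)
pns2013design <- svydesign(id = ~upa, strata = ~estrato, weights = ~peso,
                           nest = TRUE, data = reg)
fit <- svyglm(dent2 ~ sexo, design = pns2013design, family = quasibinomial())
summary(fit)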

On Mon, Dec 4, 2017 at 9:56 AM, Luciane Maria Pilotto via R-help <
r-help@r-project.org> wrote:

> Hi,
> I am trying to run analyses incorporating the sample weights, strata, and
> clusters (a three-stage sample) with PNS data (the national health survey),
> and it is giving an error. I describe the commands I used below. I could not
> make the code properly reproducible.
> Thanks,
> #
> library(survey)
> # change to 0 and 1 variable outcome
> dent2 <- ifelse(consdentcat2 == 2, 0, 1)
> table(dent2)
> dent2 <- as.factor(dent2)
> str(dent2)
> reg <- cbind(reg, dent2)
> # change to factor
> str(sexo)
> reg$sexo <- as.factor(reg$sexo)
> #
> pns2013design <- svydesign(id = ~upa, nest = TRUE, strata = estrato,
> weight = peso, data = reg)
>
> PNS<-svyglm(dent2~sexo,design=pns2013design, method="logistic", data =
> reg)
>
> Error in logistic(x = c(1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
> :   unused arguments (x = c(1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
> 1, 1, 1, 1, 1, 1,
> 
> 
>
> ### NOT running the svyglm command
> # data for R-help
> bteste <- reg[1:10, c(1, 4, 26, 27, 28)]
> # dput(bteste)
> # (dput output truncated by the poster -- most of the factor levels of the
> #  weight variable are omitted; the structure ends with:)
> # ... ), class = "factor")), .Names = c("consdentcat2", "sexo",
> # "estrato", "upa", "pesomorcc"), row.names = c(NA, 10L), class = "data.frame")
> bteste1 <- bteste[1:10, ]
> bteste1
>    consdentcat2 sexo estrato upa   pesomorcc
> 1             2    1 1110011 112 418.7681902
> 2             2    1 1110011 112 317.1317579
> 3             2    1 1110011 112 467.0945288
> 4             1    1 1110011 112 209.3840951
> 5             2    1 1110011 112 209.3840951
> 6             2    1 1110011 112 418.7681902
> 7             2    1 1110011 112 233.5472644
> 8             2    1 1110011 112 628.1522853
> 9             2    1 1110011 112 317.1317579
> 10            2    2 1110011 112 321.5014524
>  _
> Luciane Maria Pilotto
>
>
>
>
>
> [[alternative HTML version deleted]]
>
> __
> R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/
> posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.


__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] exporting data to stata

2018-03-22 Thread Anthony Damico
hi, you can export the dataset from an R  survey design to stata with

install.packages('survey')
library(survey)
library(foreign)
write.dta( data1$variables , "c:/path/to/file.dta" )

then type "data1" into the R console to look at the weight, id/cluster,
strata, and fpc variables to use for the "svyset" command
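a quick way to confirm the object really is a survey design, and to see the
data.frame that write.dta() will receive (a sketch, reusing the "data1" name
from above):

class( data1 )              # should include "survey.design" or "survey.design2"
class( data1$variables )    # the plain data.frame handed to write.dta()
str( data1$variables )      # the columns that will end up in the .dta file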




On Thu, Mar 22, 2018, 12:33 PM Ista Zahn  wrote:

> On Thu, Mar 22, 2018 at 4:52 AM, Raja, Dr. Edwin Amalraj
>  wrote:
> > Hi ,
> >
> > library(foreign)
> > write.dta(data1,  "data1.dta")
> >
> > should work.
>
>
> I don't think so:
>
> > library(foreign)
> > example(svydesign)
> > write.dta(dstrat, "~/Downloads/foo.dta")
> Error in write.dta(dstrat, "~/Downloads/foo.dta") :
>   The object "dataframe" must have class data.frame
>
>
> > The file will be saved in the working directory.
> > Use
> > getwd()
> > to know the working directory.
> >
> > Best wishes
> > Amalraj Raja
> >
> > -Original Message-
> > From: R-help [mailto:r-help-boun...@r-project.org] On Behalf Of rosario
> scandurra
> > Sent: 22 March 2018 07:47
> > To: r-help@r-project.org
> > Subject: [R] exporting data to stata
> >
> > Hi,
> >
> > I am new to R and I want to export data into Stata. Could somebody help
> with that? Thanks a lot.
> >
> > This is the code I am using:
> >
> >
> >> setwd("D:/datasets/Seg-bcn/ESBD")
> >> data1 <- readRDS("r17045_ESDB_Habitatges_BDD_V_1_0.rds")
> >> library(foreign)
> >> write.dta(data="data1", file = "D:/datasets/data1.dta")
> > Error in write.dta(data = "data1", file = "D:/datasets/data1.dta") :
> >   The object "dataframe" must have class data.frame
> >> class (data1)
> > [1] "survey.design2" "survey.design"
> >
> >
> > [[alternative HTML version deleted]]
> >
> > __
> > R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
> https://stat.ethz.ch/mailman/listinfo/r-help
> > PLEASE do read the posting guide http://www.R-project.org/
> posting-guide.html
> > and provide commented, minimal, self-contained, reproducible code.
> >
> >
> > The University of Aberdeen is a charity registered in Scotland, No
> SC013683.
> > Tha Oilthigh Obar Dheathain na charthannas clàraichte ann an Alba, Àir.
> SC013683.
> > __
> > R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
> > https://stat.ethz.ch/mailman/listinfo/r-help
> > PLEASE do read the posting guide http://www.R-project.org/
> posting-guide.html
> > and provide commented, minimal, self-contained, reproducible code.
>
> __
> R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/
> posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>


__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] more floating point sensitivity in 3.5.0?

2018-05-18 Thread Anthony Damico
hi all, in the past two days, i've found two places in unrelated code where
i needed to substitute something like

`x == y`

with

`isTRUE(all.equal(x,y))`

to fix problems that started occurring in 3.5.0 on windows.  the release
news[1] makes one mention of floating points, but i'm not sure it's
related.  the fixes aren't hard, so it'll be the first place i look if
other code seems to not work.
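a tiny illustration of the substitution (made-up values, not from the affected
code):

x <- 0.1 + 0.2
y <- 0.3
x == y                      # FALSE -- the two doubles differ in their last bits
isTRUE(all.equal(x, y))     # TRUE -- all.equal() compares within a tolerance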

just curious if anyone's experiencing similar issues?  thanks!



[1]https://cran.r-project.org/bin/windows/base/NEWS.R-3.5.0.html


__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Multinomial Logistic Regression with Complex Survey using 'Survey' Package in R

2018-06-19 Thread Anthony Damico
hi, check out the news page..
https://cran.r-project.org/web/packages/survey/NEWS
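the approach Thomas Lumley describes in the message quoted below might look
roughly like this -- a sketch built on the survey package's bundled api example
data rather than the poster's survey, with the multinomial fit itself left as a
placeholder:

library(survey)
data(api)
dclus1 <- svydesign(id = ~dnum, weights = ~pw, data = apiclus1, fpc = ~fpc)
rclus1 <- as.svrepdesign(dclus1, type = "JK1")    # create replicate weights
# withReplicates(rclus1, function(weights, data) {
#   # fit a multinomial model here using 'weights' as frequency weights,
#   # e.g. with mlogit(), and return the coefficient vector
# })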

On Tue, Jun 19, 2018 at 5:54 PM, Mackenzie Jones 
wrote:

> Dear R Users,
>
> I want to use a multinomial logistic regression model with survey data in
> the “survey” package. The original package did not have a function for
> multinomial logistic regression, so Thomas Lumley suggested creating
> replicate weights for the survey and doing a multinomial regression with
> frequency weights in the mlogit package. See the below message for
> reference:
>
> There isn't an implementation of multinomial regression in the survey
> package.  The easiest way to do this would be to create replicate weights
> for your survey if it doesn't already have them (with
> as.svrepdesign()) and then use withReplicates() to do the regression using
> a function that does multinomial regression with frequency weights, such as
> mlogit() in the mlogit package.  The example on the withReplicates() help
> page shows how to do this for quantile regression, and it should be similar.
>
> However, there has been a more recent release of the “survey” package in
> May 2018, so I am wondering if there is now a function that does
> multinomial logistic regression with survey designs. Please let me know if
> anyone knows of such an update, or has any additional advice on how to
> perform this analysis.
>
> Thank you,
> Mackenzie
>
>
> [[alternative HTML version deleted]]
>
> __
> R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/
> posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>


__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] ESS issue: lines moved right 40 spaces

2018-07-26 Thread Anthony Hirst
I don't know the answer but here is the info for the ess list.
ess-h...@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/ess-help



On Thu, Jul 26, 2018 at 11:26 AM Rich Shepard 
wrote:

>I used to be subscribed to the ess SIG, but cannot find any saved
> messages
> from that list and I cannot find it in the list of mail lists on the
> r-project
> web site. So I'll ask here.
>
>Running ess-5.14 on emacs-25.3 I'm seeing a different behavior when I
> write scripts than I had seen in the past. I would like to learn how to fix
> this issue. I invoke ess using M-x R when I start emacs.
>
>When typing comments and pressing [Enter] at the end of the line to
> start
> a new line, the row I just left is indented to column 40 instead of column 0.
> Annoying behavior, to be sure. This does not happen when I write a bash shell
> or python script in emacs, so it seems to be specific to R.
>
>All thoughts, ideas, and suggestions are welcome.
>
> Rich
>
> __
> R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>


__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] cannot load .sav-files in R 3.4.0

2017-05-01 Thread Anthony Damico
hi, i don't think foreign::read.spss or haven::read_spss have ever worked
with a handful of the ess files, but library(memisc) does.  you are better
off loading ess with library(lodown) because the drudge work has already
been done--


library(devtools)
devtools::install_github("ajdamico/lodown")
library(lodown)
ess_cat <- get_catalog( "ess" , output_dir = "C:/My Directory/ESS" )

# which entries do you want?
head(ess_cat)

# how about wave 7 only
sub_ess_cat <- subset( ess_cat , wave == 7 )

# replace the email address with whatever you registered with
lodown( "ess" , sub_ess_cat , your_email = "em...@address.com" )


x <- readRDS( "C:/My Directory/ESS/2014/ESS7csCH.rds" )

# looks good
head( x )
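if installing from github is not an option, the memisc route mentioned above
might look like this (a sketch; the .sav path is a placeholder):

library(memisc)
ess_importer <- spss.system.file( "C:/My Directory/ESS/ESS7CH.sav" )
ess <- as.data.frame( as.data.set( ess_importer ) )
head( ess )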



On Mon, May 1, 2017 at 6:22 AM,  wrote:

> after updating R from 3.3.3 to 3.4.0 i cannot import spss data files
> anymore. for the european social survey (europeansocialsurvey.org) i get
> this warning:
> re-encoding from CP1252
> Error in `levels<-`(`*tmp*`, value = if (nl == nL) as.character(labels) else
> paste0(labels, :
> factor level [3] is duplicated
> In addition: Warning message:
> In read.spss(file, use.value.labels = use.value.labels, to.data.frame =
> to.data.frame, :
> //filepath/ESS7CH.sav: Unrecognized record type 7, subtype 18 encountered
> in system file
>
> using the package foreign does the same.
>
> __
> R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posti
> ng-guide.html
> and provide commented, minimal, self-contained, reproducible code.


__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] cannot load .sav-files in R 3.4.0

2017-05-01 Thread Anthony Damico
did my code work?  thanks

On Mon, May 1, 2017 at 11:35 AM,  wrote:

> hi, thanks for the reply!
> it always worked until 3.4.0. i got warnings but they did not stop R from
> loading the file ...
>
> Am 01.05.2017 16:10 schrieb Anthony Damico:
>
>> hi, i don't think foreign::read.spss or haven::read_spss have ever
>> worked with a handful of the ess files, but library(memisc) does.  you
>> are better off loading ess with library(lodown) because the drudge
>> work has already been done--
>>
>> library(devtools)
>> devtools::install_github("ajdamico/lodown")
>> library(lodown)
>> ess_cat <- get_catalog( "ess" , output_dir = "C:/My Directory/ESS"
>> )
>>
>> # which entries do you want?
>> head(ess_cat)
>>
>> # how about wave 7 only
>> sub_ess_cat <- subset( ess_cat , wave == 7 )
>>
>> # replace the email address with whatever you registered with
>> lodown( "ess" , sub_ess_cat , your_email = "em...@address.com" )
>>
>> x <- readRDS( "C:/My Directory/ESS/2014/ESS7csCH.rds" )
>>
>> # looks good
>> head( x )
>>
>> On Mon, May 1, 2017 at 6:22 AM, 
>> wrote:
>>
>> after updating R from 3.3.3. to 3.4.0 i cannot import spss-data
>>> files anymore. for the european social survey
>>> (europeansocialsurvey.org [1]) i get this warning:
>>> re-encoding from CP1252
>>> Error in `levels<-`(`*tmp*`, value = if (nl == nL)
>>> as.character(labels) else paste0(labels, :
>>> factor level [3] is duplicated
>>> In addition: Warning message:
>>> In read.spss(file, use.value.labels = use.value.labels,
>>> to.data.frame = to.data.frame, :
>>> //filepath/ESS7CH.sav: Unrecognized record type 7, subtype 18
>>> encountered in system file
>>>
>>> using the package foreign does the same.
>>>
>>> __
>>> R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
>>> https://stat.ethz.ch/mailman/listinfo/r-help [2]
>>> PLEASE do read the posting guide
>>> http://www.R-project.org/posting-guide.html [3]
>>> and provide commented, minimal, self-contained, reproducible code.
>>>
>>
>>
>>
>> Links:
>> --
>> [1] http://europeansocialsurvey.org
>> [2] https://stat.ethz.ch/mailman/listinfo/r-help
>> [3] http://www.R-project.org/posting-guide.html
>>
>


__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] cannot load .sav-files in R 3.4.0

2017-05-02 Thread Anthony Damico
are you able to install anything from github?  like

devtools::install_github( "hadley/dplyr" )
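the log quoted below shows the package being built and then failing its load
test from a library on a network share (\\unetna01\...).  one workaround that
sometimes helps -- a sketch, untested here, and C:/Rlibs is just an example
path -- is to install into a local library first:

dir.create( "C:/Rlibs" , showWarnings = FALSE )
.libPaths( c( "C:/Rlibs" , .libPaths() ) )   # local library becomes the default
devtools::install_github( "ajdamico/lodown" )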



On Tue, May 2, 2017 at 9:36 AM,  wrote:

> unfortunately it failed with the installation of lodown:
>
>> devtools::install_github("ajdamico/lodown")
>>
> Downloading GitHub repo ajdamico/lodown@master
> from URL https://api.github.com/repos/ajdamico/lodown/zipball/master
> Installing lodown
> "C:/PROGRA~1/R/R-33~1.3/bin/x64/R" --no-site-file --no-environ --no-save
> \
>   --no-restore --quiet CMD INSTALL  \
>   "C:/Users/mandersk/AppData/Local/Temp/Rtmpch0aTn/devtools18f
> 4305b4ab5/ajdamico-lodown-d235a3e"  \
>   --library="\\unetna01/mandersk$/Daten/R/win-library/3.3" --install-tests
>
> * installing *source* package 'lodown' ...
> ** R
> ** inst
> ** preparing package for lazy loading
> ** help
> *** installing help indices
> ** building package indices
> ** testing if installed package can be loaded
> *** arch - i386
> Warning in library(pkg_name, lib.loc = lib, character.only = TRUE,
> logical.return = TRUE)
>   there is no package called 'lodown'
> Error: loading failed
> Execution halted
> *** arch - x64
> Warning in library(pkg_name, lib.loc = lib, character.only = TRUE,
> logical.return = TRUE)
>   there is no package called 'lodown'
> Error: loading failed
> Execution halted
> ERROR: loading failed for 'i386', 'x64'
> * removing '\\unetna01/mandersk$/Daten/R/win-library/3.3/lodown'
> Error: Command failed (1)
>
> Am 01.05.2017 18:15 schrieb Anthony Damico:
>
>> did my code work?  thanks
>>
>> On Mon, May 1, 2017 at 11:35 AM, 
>> wrote:
>>
>> hi, thanks for the reply!
>>> it always worked until 3.4.0. i got warning but they did not stop R
>>> loading the file ...
>>>
>>> Am 01.05.2017 16:10 schrieb Anthony Damico:
>>> hi, i don't think foreign::read.spss or haven::read_spss have ever
>>> worked with a handful of the ess files, but library(memisc) does.
>>> you
>>> are better off loading ess with library(lodown) because the drudge
>>> work has already been done--
>>>
>>> library(devtools)
>>> devtools::install_github("ajdamico/lodown")
>>> library(lodown)
>>> ess_cat <- get_catalog( "ess" , output_dir = "C:/My
>>> Directory/ESS"
>>> )
>>>
>>> # which entries do you want?
>>> head(ess_cat)
>>>
>>> # how about wave 7 only
>>> sub_ess_cat <- subset( ess_cat , wave == 7 )
>>>
>>> # replace the email address with whatever you registered with
>>> lodown( "ess" , sub_ess_cat , your_email = "em...@address.com"
>>> )
>>>
>>> x <- readRDS( "C:/My Directory/ESS/2014/ESS7csCH.rds" )
>>>
>>> # looks good
>>> head( x )
>>>
>>> On Mon, May 1, 2017 at 6:22 AM, 
>>> wrote:
>>>
>>> after updating R from 3.3.3. to 3.4.0 i cannot import spss-data
>>> files anymore. for the european social survey
>>> (europeansocialsurvey.org [1] [1]) i get this warning:
>>> re-encoding from CP1252
>>> Error in `levels<-`(`*tmp*`, value = if (nl == nL)
>>> as.character(labels) else paste0(labels, :
>>> factor level [3] is duplicated
>>> In addition: Warning message:
>>> In read.spss(file, use.value.labels = use.value.labels,
>>> to.data.frame = to.data.frame, :
>>> //filepath/ESS7CH.sav: Unrecognized record type 7, subtype 18
>>> encountered in system file
>>>
>>> using the package foreign does the same.
>>>
>>> __
>>> R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
>>> https://stat.ethz.ch/mailman/listinfo/r-help [2] [2]
>>> PLEASE do read the posting guide
>>> http://www.R-project.org/posting-guide.html [3] [3]
>>> and provide commented, minimal, self-contained, reproducible code.
>>>
>>> Links:
>>> --
>>> [1] http://europeansocialsurvey.org [1]
>>> [2] https://stat.ethz.ch/mailman/listinfo/r-help [2]
>>> [3] http://www.R-project.org/posting-guide.html [3]
>>>
>>
>>
>>
>> Links:
>> --
>> [1] http://europeansocialsurvey.org
>> [2] https://stat.ethz.ch/mailman/listinfo/r-help
>> [3] http://www.R-project.org/posting-guide.html
>>
>


__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

[R] RJDBC

2017-05-02 Thread Nelson Anthony
Hi all,



I am trying to connect to a database using RJDBC, but due to a DB/server
timezone mismatch I am getting the error message below.



Error in .jcall(drv@jdrv, "Ljava/sql/Connection;", "connect",
as.character(url)[1],  :

  java.sql.SQLException: ORA-00604: error occurred at recursive SQL level 1

ORA-01882: timezone region not found



I was able to fix this issue in plain Java by passing the JVM parameter
“-Doracle.jdbc.timezoneAsRegion=false”,
and it worked there.



But when I try to apply the same setting in R via options(), it does not
work. I also tried updating my Rprofile.



*My code snippet:*



Sys.setenv(JAVA_HOME='/usr/lib/jvm/java-8-openjdk-amd64')

options(java.parameters = "-Xmx8g")

options(java.oracle.jdbc.timezoneAsRegion="false")

jdbcDriver <- JDBC(driverClass="oracle.jdbc.OracleDriver",
classPath="/usr/lib/oracle/12.2/client64/lib/ojdbc8.jar")

pcm_stg_conn <- dbConnect(jdbcDriver, "jdbc:oracle:thin:@//hostname:1521/SID",
"username", "password")



Can you please help in resolving this issue
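One thing that might be worth trying -- a sketch, not verified against this
setup: rJava only reads java.parameters when the JVM is first initialized, so
the -D flag has to be part of options(java.parameters = ...) before rJava and
RJDBC are loaded, and an option named java.oracle.jdbc.timezoneAsRegion is
never seen by the JVM at all.

options(java.parameters = c("-Xmx8g", "-Doracle.jdbc.timezoneAsRegion=false"))
library(rJava)    # parameters above are picked up when the JVM is initialized
library(RJDBC)
jdbcDriver <- JDBC(driverClass = "oracle.jdbc.OracleDriver",
                   classPath = "/usr/lib/oracle/12.2/client64/lib/ojdbc8.jar")
pcm_stg_conn <- dbConnect(jdbcDriver,
                          "jdbc:oracle:thin:@//hostname:1521/SID",
                          "username", "password")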





Thanks & Regards.

Anthony Nelson


__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] Svyglm Error

2017-07-05 Thread Anthony Damico
hi, i am not hitting an error when i copy and paste your code into a fresh
console.  maybe compare your sessionInfo() to mine?


> sessionInfo()
R version 3.4.1 (2017-06-30)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows Server 2008 R2 x64 (build 7601) Service Pack 1

Matrix products: default

locale:
[1] LC_COLLATE=English_United States.1252
[2] LC_CTYPE=English_United States.1252
[3] LC_MONETARY=English_United States.1252
[4] LC_NUMERIC=C
[5] LC_TIME=English_United States.1252

attached base packages:
[1] grid  stats graphics  grDevices utils datasets
methods   base

other attached packages:
[1] survey_3.32-1   survival_2.41-3 Matrix_1.2-10   RCurl_1.95-4.8
bitops_1.0-6

loaded via a namespace (and not attached):
[1] compiler_3.4.1  splines_3.4.1   lattice_0.20-35





On Wed, Jul 5, 2017 at 2:24 PM, Courtney Benjamin 
wrote:

> Greetings,
>
> I am revisiting code from several different files I have saved from the
> past and all used to run flawlessly; now when I run any of the svyglm
> related functions, I am coming up with an error:
>
> Error in model.frame.default(formula = F3ATTAINB ~ F1PARED, data = data,  :
>   the ... list does not contain 4 elements
> The following is a minimal reproducible example:
> library(RCurl)
> library(survey)
>
> data <- getURL("https://raw.githubusercontent.com/cbenjamin1821/careertech-ed/master/elsq1adj.csv")
> elsq1ch <- read.csv(text = data)
>
> #Specifying the svyrepdesign object which applies the BRR weights
> elsq1ch_brr<-svrepdesign(variables = elsq1ch[,1:16], repweights =
> elsq1ch[,18:217], weights = elsq1ch[,17], combined.weights = TRUE, type =
> "BRR")
> elsq1ch_brr
>
> ##Resetting baseline levels for predictors
> elsq1ch_brr <- update( elsq1ch_brr , F1HIMATH = relevel(F1HIMATH,"PreAlg
> or Less") )
> elsq1ch_brr <- update( elsq1ch_brr , BYINCOME = relevel(BYINCOME,"0-25K") )
> elsq1ch_brr <- update( elsq1ch_brr , F1RACE = relevel(F1RACE,"White") )
> elsq1ch_brr <- update( elsq1ch_brr , F1SEX = relevel(F1SEX,"Male") )
> elsq1ch_brr <- update( elsq1ch_brr , F1RTRCC = relevel(F1RTRCC,"Other") )
>
> ##Univariate testing for Other subset
> Othpared <- svyglm(formula=F3ATTAINB~F1PARED,family="quasibinomial"
> ,design=subset(elsq1ch_brr,BYSCTRL==1&G10COHRT==1&
> F1RTRCC=="Other"),na.action=na.omit)
> summary(Othpared)
>
>
> Any help in resolving this concern would be greatly appreciated.
>
> Sincerely,
>
> Courtney
>
>
> Courtney Benjamin
>
> Broome-Tioga BOCES
>
> Automotive Technology II Teacher
>
> Located at Gault Toyota
>
> Doctoral Candidate-Educational Theory & Practice
>
> State University of New York at Binghamton
>
> cbenj...@btboces.org
>
> 607-763-8633
>
> [[alternative HTML version deleted]]
>
> __
> R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/
> posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>


__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Help with ftable.svyby

2017-07-09 Thread Anthony Damico
try resetting your factor levels and re-run?

q50 <- update( q50 , INCOME = factor( INCOME ) , AGECL = factor( AGECL ) ,
RACECL = factor( RACECL ) )
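why that can matter -- a small illustration, not taken from this thread:
ftable() lays out its grid from the factor levels, so unused levels left over
from subsetting can make the label matrix and the computed results differ in
shape; re-running factor() (or droplevels()) removes them:

f <- factor( c( "a" , "b" ) , levels = c( "a" , "b" , "c" ) )
table( f )                # keeps an empty cell for the unused level "c"
table( factor( f ) )      # factor() drops the unused level
table( droplevels( f ) )  # same effect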




On Sun, Jul 9, 2017 at 2:59 PM, Orsola Costantini via R-help <
r-help@r-project.org> wrote:

> Hi all,
>
> When I try the following with pkg Survey it returns the error below:
>
> ftable(svyby(~INCOME, ~AGECL+RACECL, svymean, design=q50),
> rownames=list(AGECL=c("<35", "35-44", "45-54", "55-64",
> "65-74", ">=75"),
>RACECL=c("white non hispanic", "non white or hispanic"))
>
> Error in rbind(matrix("", nrow = length(xcv), ncol = length(xrv)),
> charQuote(makeNames(xrv)),  :
>   number of columns of matrices must match (see arg 3)
>
> When I do the following instead, all is good. But it only works for small
> subsets!
>
> h<-svymean(~interaction(INCOME, AGECL, RACECL), q3)
>
> fh<-ftable(h, rownames=list(AGECL=c("<35", "35-44", "45-54", "55-64",
> "65-74", ">=75"),
>RACECL=c("white non his", "non white or hispanic")))
>
>
> any idea why?
>
> Thanks!!!
>
> U.
>
> [[alternative HTML version deleted]]
>
> __
> R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/
> posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>


__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] readLines without skipNul=TRUE causes crash

2017-07-15 Thread Anthony Damico
hello, the last line of the code below causes a segfault for me on 3.4.1.
i think i should submit to https://bugs.r-project.org/  unless others have
advice?  thanks





install.packages( "devtools" )
devtools::install_github("ajdamico/lodown")
devtools::install_github("jimhester/archive")


file_folder <- file.path( tempdir() , "file_folder" )

tf <- tempfile()

# large download!  cachaca saves on your local disk if already downloaded
lodown::cachaca( 'http://download.inep.gov.br/microdados/microdados_enem2009.rar' ,
tf , mode = 'wb' )

archive::archive_extract( tf , dir = normalizePath( file_folder ) )

unzipped_files <- list.files( file_folder , recursive = TRUE , full.names =
TRUE  )

infile <- grep( "DADOS(.*)\\.txt$" , unzipped_files , value = TRUE )

# works
R.utils::countLines( infile )

# works with warning
my_file <- readLines( infile , skipNul = TRUE )

# crash
my_file <- readLines( infile )


# run just before crash
sessionInfo()
# R version 3.4.1 (2017-06-30)
# Platform: x86_64-w64-mingw32/x64 (64-bit)
# Running under: Windows 10 x64 (build 15063)

# Matrix products: default

# locale:
# [1] LC_COLLATE=English_United States.1252
# [2] LC_CTYPE=English_United States.1252
# [3] LC_MONETARY=English_United States.1252
# [4] LC_NUMERIC=C
# [5] LC_TIME=English_United States.1252

# attached base packages:
# [1] stats graphics  grDevices utils datasets  methods   base

# loaded via a namespace (and not attached):
 # [1] httr_1.2.1 compiler_3.4.1 R6_2.2.1   withr_1.0.2
 # [5] tibble_1.3.3   curl_2.6   Rcpp_0.12.11
memoise_1.1.0
 # [9] R.methodsS3_1.7.1  git2r_0.18.0   digest_0.6.12  lodown_0.1.0
# [13] R.utils_2.5.0  rlang_0.1.1devtools_1.13.2R.oo_1.21.0
# [17] archive_0.0.0.9000
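for readers unfamiliar with skipNul, a tiny self-contained illustration of
what the argument changes (this does not reproduce the segfault; the file is
made up):

tf2 <- tempfile()
writeBin( c( charToRaw( "abc" ) , as.raw( 0 ) , charToRaw( "def\n" ) ) , tf2 )
readLines( tf2 , skipNul = TRUE )   # the nul byte is dropped
readLines( tf2 )                    # warns about the embedded nul instead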


__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] readLines without skipNul=TRUE causes crash

2017-07-15 Thread Anthony Damico
hi, thanks Dr. Murdoch


i'd appreciate if anyone on r-help could help me narrow this down?  i
believe the segfault occurs because there's a single line with 4GB and also
embedded nuls, but i am not sure how to artificially construct that?


the lodown package can be removed from my example..  it is just for file
download caching, so `lodown::cachaca` can be replaced with
`download.file`.  my current example requires a huge download, so it's sort of
painful to repeat, but i'm pretty confident that's not the issue.


the archive::archive_extract() function unzips a (probably corrupt) .RAR
file and creates a text file with 80,937 lines.  this file is 4GB:

> file.size(infile)
[1] 4078192743


i am pretty sure that nearly all of that 4GB is contained on a single line
in the file.  here's what happens when i create a file connection and scan
through..

> w <- file( infile , 'r' )
>
> first_80936_lines <- readLines( w , n = 80936 )
> scan( w , n = 1 , what = character() )
Read 1 item
[1] "123930632009"
> scan( w , n = 1 , what = character() )
Read 1 item
[1] "36F2924009PAULO"
> scan( w , n = 1 , what = character() )
Read 1 item
[1] "AFONSO"
> scan( w , n = 1 , what = character() )
Read 1 item
[1] "BA11"
> scan( w , n = 1 , what = character() )
Read 1 item
[1] "0"
> scan( w , n = 1 , what = character() )
Read 1 item
[1] "00"
> scan( w , n = 1 , what = character() )
Read 1 item
[1] "2924009PAULO"
> scan( w , n = 1 , what = character() )
Read 1 item
[1] "AFONSO"
> scan( w , n = 1 , what = character() )
Read 1 item
[1] "BA"
> scan( w , n = 1 , what = character() )
Read 1 item
[1] "467.20"
> scan( w , n = 1 , what = character() )
Read 1 item
[1] "346.10"
> scan( w , n = 1 , what = character() )
Read 1 item
[1] "414.40"
> scan( w , n = 1 , what = character() )
Error in scan(w, n = 1, what = character()) :
  could not allocate memory (2048 Mb) in C function
'R_AllocStringBuffer'



making a huge single-line file does not reproduce the problem, i think the
embedded nuls have something to do with it--


# WARNING do not run with less than 64GB RAM
tf <- tempfile()
    a <- rep( "a" , 10 )
b <- paste( a , collapse = '' )
writeLines( b , tf ) ; rm( b ) ; gc()
d <- readLines( tf )



On Sat, Jul 15, 2017 at 9:17 AM, Duncan Murdoch 
wrote:

> On 15/07/2017 7:35 AM, Anthony Damico wrote:
>
>> hello, the last line of the code below causes a segfault for me on 3.4.1.
>> i think i should submit to https://bugs.r-project.org/  unless others
>> have
>> advice?  thanks
>>
>
> Segfaults are usually worth reporting as bugs.  Try to come up with a
> self-contained example, not using the lodown and archive packages.  I
> imagine you can do this by uploading the file you downloaded, or enough of
> a subset of it to trigger the segfault.  If you can't do that, then likely
> the bug is with one of those packages, not with R.
>
> Duncan Murdoch
>
>
>>
>>
>>
>>
>> install.packages( "devtools" )
>> devtools::install_github("ajdamico/lodown")
>> devtools::install_github("jimhester/archive")
>>
>>
>> file_folder <- file.path( tempdir() , "file_folder" )
>>
>> tf <- tempfile()
>>
>> # large download!  cachaca saves on your local disk if already downloaded
>> lodown::cachaca( '
>> http://download.inep.gov.br/microdados/microdados_enem2009.rar' , tf ,
>> mode
>> = 'wb' )
>>
>> archive::archive_extract( tf , dir = normalizePath( file_folder ) )
>>
>> unzipped_files <- list.files( file_folder , recursive = TRUE , full.names
>> =
>> TRUE  )
>>
>> infile <- grep( "DADOS(.*)\\.txt$" , unzipped_files , value = TRUE )
>>
>> # works
>> R.utils::countLines( infile )
>>
>> # works with warning
>> my_file <- readLines( infile , skipNul = TRUE )
>>
>> # crash
>> my_file <- readLines( infile )
>>
>>
>> # run just before crash
>> sessionInfo()
>> # R version 3.4.1 (2017-06-30)
>> # Platform: x86_64-w64-mingw32/x64 (64-bit)
>> # Running under: Windows 10 x64 (build 15063)
>>
>> # Matrix products: default
>>
>> # locale:
>> # [1] LC_COLLATE=English_United States.1252
>> # [2] LC_CTYPE=English_United States.1252
>> # [3] LC_MONETARY=English_United 

Re: [R] readLines without skipNul=TRUE causes crash

2017-07-15 Thread Anthony Damico
hi, i realized that the segfault happens on the text file in a new R
session.  so, creating the segfault-generating text file requires a
contributed package, but prompting the actual segfault does not -- pretty
sure that means this is a base R bug?  submitted here:
https://bugs.r-project.org/bugzilla3/show_bug.cgi?id=17311  hopefully i am
not doing something remarkably stupid.  the text file itself is 4GB so
cannot upload it to bugzilla, and from the R_AllocStringBuffer error in the
previous message, i think most or all of it needs to be there to trigger
the segfault.  thanks!


On Sat, Jul 15, 2017 at 10:32 AM, Anthony Damico  wrote:

> hi, thanks Dr. Murdoch
>
>
> i'd appreciate if anyone on r-help could help me narrow this down?  i
> believe the segfault occurs because there's a single line with 4GB and also
> embedded nuls, but i am not sure how to artificially construct that?
>
>
> the lodown package can be removed from my example..  it is just for file
> download cacheing, so `lodown::cachaca` can be replaced with
> `download.file`  my current example requires a huge download, so sort of
> painful to repeat but i'm pretty confident that's not the issue.
>
>
> the archive::archive_extract() function unzips a (probably corrupt) .RAR
> file and creates a text file with 80,937 lines.  this file is 4GB:
>
> > file.size(infile)
> [1] 4078192743
>
>
> i am pretty sure that nearly all of that 4GB is contained on a single line
> in the file.  here's what happens when i create a file connection and scan
> through..
>
> > w <- file( infile , 'r' )
> >
> > first_80936_lines <- readLines( w , n = 80936 )
> > scan( w , n = 1 , what = character() )
> Read 1 item
> [1] "123930632009"
> > scan( w , n = 1 , what = character() )
> Read 1 item
> [1] "36F2924009PAULO"
> > scan( w , n = 1 , what = character() )
> Read 1 item
> [1] "AFONSO"
> > scan( w , n = 1 , what = character() )
> Read 1 item
> [1] "BA11"
> > scan( w , n = 1 , what = character() )
> Read 1 item
> [1] "0"
> > scan( w , n = 1 , what = character() )
> Read 1 item
> [1] "00"
> > scan( w , n = 1 , what = character() )
> Read 1 item
> [1] "2924009PAULO"
> > scan( w , n = 1 , what = character() )
> Read 1 item
> [1] "AFONSO"
> > scan( w , n = 1 , what = character() )
> Read 1 item
> [1] "BA"
> > scan( w , n = 1 , what = character() )
> Read 1 item
> [1] "467.20"
> > scan( w , n = 1 , what = character() )
> Read 1 item
> [1] "346.10"
> > scan( w , n = 1 , what = character() )
> Read 1 item
> [1] "414.40"
> > scan( w , n = 1 , what = character() )
> Error in scan(w, n = 1, what = character()) :
>   could not allocate memory (2048 Mb) in C function
> 'R_AllocStringBuffer'
>
>
>
> making a huge single-line file does not reproduce the problem, i think the
> embedded nuls have something to do with it--
>
>
> # WARNING do not run with less than 64GB RAM
> tf <- tempfile()
> a <- rep( "a" , 10 )
> b <- paste( a , collapse = '' )
> writeLines( b , tf ) ; rm( b ) ; gc()
> d <- readLines( tf )
>
>
>
> On Sat, Jul 15, 2017 at 9:17 AM, Duncan Murdoch 
> wrote:
>
>> On 15/07/2017 7:35 AM, Anthony Damico wrote:
>>
>>> hello, the last line of the code below causes a segfault for me on 3.4.1.
>>> i think i should submit to https://bugs.r-project.org/  unless others
>>> have
>>> advice?  thanks
>>>
>>
>> Segfaults are usually worth reporting as bugs.  Try to come up with a
>> self-contained example, not using the lodown and archive packages.  I
>> imagine you can do this by uploading the file you downloaded, or enough of
>> a subset of it to trigger the segfault.  If you can't do that, then likely
>> the bug is with one of those packages, not with R.
>>
>> Duncan Murdoch
>>
>>
>>>
>>>
>>>
>>>
>>> install.packages( "devtools" )
>>> devtools::install_github("ajdamico/lodown")
>>> devtools::install_github("jimhester/archive")
>>>
>>>
>>> file_folder <- file.path( tempdir() , "file_folder" )
>>>
>>> tf <- tempfile()
>>>
>>>

Re: [R] readLines without skipNul=TRUE causes crash

2017-07-16 Thread Anthony Damico
thank you for taking the time to write this.  i set it running last night
and it's still going -- if it doesn't finish by tomorrow, i will try to
find a site to host the problem file and add that link to the bug report so
the archive package can be avoided at least.  i'm sorry for the bother

On Sat, Jul 15, 2017 at 4:14 PM, Duncan Murdoch 
wrote:

> On 15/07/2017 11:33 AM, Anthony Damico wrote:
>
>> hi, i realized that the segfault happens on the text file in a new R
>> session.  so, creating the segfault-generating text file requires a
>> contributed package, but prompting the actual segfault does not --
>> pretty sure that means this is a base R bug?  submitted here:
>> https://bugs.r-project.org/bugzilla3/show_bug.cgi?id=17311  hopefully i
>> am not doing something remarkably stupid.  the text file itself is 4GB
>> so cannot upload it to bugzilla, and from the R_AllocStringBugger error
>> in the previous message, i think most or all of it needs to be there to
>> trigger the segfault.  thanks!
>>
>
> I don't want to download the big file or install the archive package.
> Could you run the code below on the bad file?  If you're right and it's
> only nulls that matter, this might allow me to create a file that triggers
> the bug.
>
> f <-  # put the filename of the bad file here
>
> con <- file(f, open="rb")
> zeros <- numeric()
> count <- 0    # initialize the byte counter used below
> repeat {
>   bytes <- readBin(con, "int", 100, size=1)
>   zeros <- c(zeros, count + which(bytes == 0))
>   count <- count + length(bytes)
>   if (length(bytes) < 100) break
> }
> close(con)
> cat("File length=", count, "\n")
> cat("Nulls:\n")
> zeros
>
> Here's some code to recreate a file of the same length with nulls in the
> same places, and spaces everywhere else:
>
> size <- count
> f2 <- tempfile()
> con <- file(f2, open="wb")
> count <- 0
> while (count < size) {
>   nonzeros <- min(c(size - count, 100, zeros - 1))
>   if (nonzeros) {
> writeBin(rep(32L, nonzeros), con, size = 1)
> count <- count + nonzeros
>   }
>   zeros <- zeros - nonzeros
>   if (length(zeros) && min(zeros) == 1) {
> writeBin(0L, con, size = 1)
> count <- count + 1
> zeros <- zeros[-1] - 1
>   }
> }
> close(con)
>
> Duncan Murdoch
>
>
>
>


__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] readLines without skipNul=TRUE causes crash

2017-07-16 Thread Anthony Damico
hi, thank you for attempting this. it looks like your unix machine unzipped
the txt file without corruption -- if you copied over the same txt file to
windows 7, i don't think that would reproduce the problem?  i think it
needs to be the corrupted text file where   R.utils::countLines( txtfile
)   gives 809367.  i am able to reproduce on two distinct windows machines
but no guarantee i'm not doing something dumb

On Sat, Jul 15, 2017 at 6:29 PM, Jeff Newmiller 
wrote:

> I am not able to reproduce your segfault on a Windows 7 platform either:
>
> ##
> fn1 <- "d:/DADOS_ENEM_2009.txt"
> sessionInfo()
> ## R version 3.4.1 (2017-06-30)
> ## Platform: x86_64-w64-mingw32/x64 (64-bit)
> ## Running under: Windows 7 x64 (build 7601) Service Pack 1
> ##
> ## Matrix products: default
> ##
> ## locale:
> ## [1] LC_COLLATE=English_United States.1252
> ## [2] LC_CTYPE=English_United States.1252
> ## [3] LC_MONETARY=English_United States.1252
> ## [4] LC_NUMERIC=C
> ## [5] LC_TIME=English_United States.1252
> ##
> ## attached base packages:
> ## [1] stats graphics  grDevices utils datasets  methods   base
> ##
> ## loaded via a namespace (and not attached):
> ## [1] compiler_3.4.1
> tools::md5sum( fn1 )
> ## d:/DADOS_ENEM_2009.txt
> ## "83e61c96092285b60d7bf6b0dbc7072e"
> dat <- readLines( fn1 )
> length( dat )
> ## [1] 4148721
>
>
> On Sat, 15 Jul 2017, Jeff Newmiller wrote:
>
> I am not able to reproduce this on a Linux platform:
>>
>> ###3
>> fn1 <- "/home/jdnewmil/Downloads/Microdados ENEM 2009/Dados Enem
>> 2009/DADOS_ENEM_2009.txt"
>> sessionInfo()
>> ## R version 3.4.1 (2017-06-30)
>> ## Platform: x86_64-pc-linux-gnu (64-bit)
>> ## Running under: Ubuntu 14.04.5 LTS
>> ##
>> ## Matrix products: default
>> ## BLAS: /usr/lib/libblas/libblas.so.3.0
>> ## LAPACK: /usr/lib/lapack/liblapack.so.3.0
>> ##
>> ## locale:
>> ##  [1] LC_CTYPE=en_US.UTF-8   LC_NUMERIC=C
>> ##  [3] LC_TIME=en_US.UTF-8LC_COLLATE=en_US.UTF-8
>> ##  [5] LC_MONETARY=en_US.UTF-8LC_MESSAGES=en_US.UTF-8
>> ##  [7] LC_PAPER=en_US.UTF-8   LC_NAME=C
>> ##  [9] LC_ADDRESS=C   LC_TELEPHONE=C
>> ## [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
>> ##
>> ## attached base packages:
>> ## [1] stats graphics  grDevices utils datasets  methods   base
>> ##
>> ## loaded via a namespace (and not attached):
>> ## [1] compiler_3.4.1
>> tools::md5sum( fn1 )
>> ## /home/jdnewmil/Downloads/Microdados ENEM 2009/Dados Enem
>> 2009/DADOS_ENEM_2009.txt
>> ##
>> "83e61c96092285b60d7bf6b0dbc7072e"
>> dat <- readLines( fn1 )
>> length( dat )
>> ## [1] 4148721
>>
>> No segfault occurs.
>>
>> On Sat, 15 Jul 2017, Anthony Damico wrote:
>>
>> hi, i realized that the segfault happens on the text file in a new R
>>> session.  so, creating the segfault-generating text file requires a
>>> contributed package, but prompting the actual segfault does not -- pretty
>>> sure that means this is a base R bug?  submitted here:
>>> https://bugs.r-project.org/bugzilla3/show_bug.cgi?id=17311  hopefully i
>>> am
>>> not doing something remarkably stupid.  the text file itself is 4GB so
>>> cannot upload it to bugzilla, and from the R_AllocStringBugger error in
>>> the
>>> previous message, i think most or all of it needs to be there to trigger
>>> the segfault.  thanks!
>>>
>>>
>>> On Sat, Jul 15, 2017 at 10:32 AM, Anthony Damico 
>>> wrote:
>>>
>>> hi, thanks Dr. Murdoch
>>>>
>>>>
>>>> i'd appreciate if anyone on r-help could help me narrow this down?  i
>>>> believe the segfault occurs because there's a single line with 4GB and
>>>> also
>>>> embedded nuls, but i am not sure how to artificially construct that?
>>>>
>>>>
>>>> the lodown package can be removed from my example..  it is just for file
>>>> download cacheing, so `lodown::cachaca` can be replaced with
>>>> `download.file`  my current example requires a huge download, so sort of
>>>> painful to repeat but i'm pretty confident that's not the issue.
>>>>
>>>>
>>>> the archive::archive_extract() function unzips a (probably corrupt) .RAR
>>>> file and creates a text file with 80,937 lines.  this file is 4GB:
>>>>
>>>>> file.size(infile)
>

Re: [R] readLines without skipNul=TRUE causes crash

2017-07-16 Thread Anthony Damico
sorry, typo, 80937 not 809367

On Sun, Jul 16, 2017 at 6:21 AM, Anthony Damico  wrote:

> hi, thank you for attempting this. it looks like your unix machine
> unzipped the txt file without corruption -- if you copied over the same txt
> file to windows 7, i don't think that would reproduce the problem?  i think
> it needs to be the corrupted text file where   R.utils::countLines( txtfile
> )   gives 809367.  i am able to reproduce on two distinct windows machines
> but no guarantee i'm not doing something dumb
>
> On Sat, Jul 15, 2017 at 6:29 PM, Jeff Newmiller 
> wrote:
>
>> I am not able to reproduce your segfault on a Windows 7 platform either:
>>
>> ##
>> fn1 <- "d:/DADOS_ENEM_2009.txt"
>> sessionInfo()
>> ## R version 3.4.1 (2017-06-30)
>> ## Platform: x86_64-w64-mingw32/x64 (64-bit)
>> ## Running under: Windows 7 x64 (build 7601) Service Pack 1
>> ##
>> ## Matrix products: default
>> ##
>> ## locale:
>> ## [1] LC_COLLATE=English_United States.1252
>> ## [2] LC_CTYPE=English_United States.1252
>> ## [3] LC_MONETARY=English_United States.1252
>> ## [4] LC_NUMERIC=C
>> ## [5] LC_TIME=English_United States.1252
>> ##
>> ## attached base packages:
>> ## [1] stats graphics  grDevices utils datasets  methods   base
>> ##
>> ## loaded via a namespace (and not attached):
>> ## [1] compiler_3.4.1
>> tools::md5sum( fn1 )
>> ## d:/DADOS_ENEM_2009.txt
>> ## "83e61c96092285b60d7bf6b0dbc7072e"
>> dat <- readLines( fn1 )
>> length( dat )
>> ## [1] 4148721
>>
>>
>> On Sat, 15 Jul 2017, Jeff Newmiller wrote:
>>
>> I am not able to reproduce this on a Linux platform:
>>>
>>> ###3
>>> fn1 <- "/home/jdnewmil/Downloads/Microdados ENEM 2009/Dados Enem
>>> 2009/DADOS_ENEM_2009.txt"
>>> sessionInfo()
>>> ## R version 3.4.1 (2017-06-30)
>>> ## Platform: x86_64-pc-linux-gnu (64-bit)
>>> ## Running under: Ubuntu 14.04.5 LTS
>>> ##
>>> ## Matrix products: default
>>> ## BLAS: /usr/lib/libblas/libblas.so.3.0
>>> ## LAPACK: /usr/lib/lapack/liblapack.so.3.0
>>> ##
>>> ## locale:
>>> ##  [1] LC_CTYPE=en_US.UTF-8   LC_NUMERIC=C
>>> ##  [3] LC_TIME=en_US.UTF-8LC_COLLATE=en_US.UTF-8
>>> ##  [5] LC_MONETARY=en_US.UTF-8LC_MESSAGES=en_US.UTF-8
>>> ##  [7] LC_PAPER=en_US.UTF-8   LC_NAME=C
>>> ##  [9] LC_ADDRESS=C   LC_TELEPHONE=C
>>> ## [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
>>> ##
>>> ## attached base packages:
>>> ## [1] stats graphics  grDevices utils datasets  methods   base
>>> ##
>>> ## loaded via a namespace (and not attached):
>>> ## [1] compiler_3.4.1
>>> tools::md5sum( fn1 )
>>> ## /home/jdnewmil/Downloads/Microdados ENEM 2009/Dados Enem
>>> 2009/DADOS_ENEM_2009.txt
>>> ##
>>> "83e61c96092285b60d7bf6b0dbc7072e"
>>> dat <- readLines( fn1 )
>>> length( dat )
>>> ## [1] 4148721
>>>
>>> No segfault occurs.
>>>
>>> On Sat, 15 Jul 2017, Anthony Damico wrote:
>>>
>>> hi, i realized that the segfault happens on the text file in a new R
>>>> session.  so, creating the segfault-generating text file requires a
>>>> contributed package, but prompting the actual segfault does not --
>>>> pretty
>>>> sure that means this is a base R bug?  submitted here:
>>>> https://bugs.r-project.org/bugzilla3/show_bug.cgi?id=17311  hopefully
>>>> i am
>>>> not doing something remarkably stupid.  the text file itself is 4GB so
>>>> cannot upload it to bugzilla, and from the R_AllocStringBugger error in
>>>> the
>>>> previous message, i think most or all of it needs to be there to trigger
>>>> the segfault.  thanks!
>>>>
>>>>
>>>> On Sat, Jul 15, 2017 at 10:32 AM, Anthony Damico 
>>>> wrote:
>>>>
>>>> hi, thanks Dr. Murdoch
>>>>>
>>>>>
>>>>> i'd appreciate if anyone on r-help could help me narrow this down?  i
>>>>> believe the segfault occurs because there's a single line with 4GB and
>>>>> also
>>>>> embedded nuls, but i am not sure how to artificially construct that?
>>>>>
>>>>>
>>>>> the lodown package can be removed from my example..  it is just 

Re: [R] readLines without skipNul=TRUE causes crash

2017-07-16 Thread Anthony Damico
hi, the text file that prompts the segfault is 4gb but only 80,937 lines

> file.info( "S:/temp/crash.txt")
size isdir mode   mtime
ctime   atime exe
S:/temp/crash.txt 4078192743 FALSE  666 2017-07-15 17:24:35 2017-07-15
17:19:47 2017-07-15 17:19:47  no




On Sun, Jul 16, 2017 at 6:34 AM, Duncan Murdoch 
wrote:

> On 16/07/2017 6:17 AM, Anthony Damico wrote:
>
>> thank you for taking the time to write this.  i set it running last
>> night and it's still going -- if it doesn't finish by tomorrow, i will
>> try to find a site to host the problem file and add that link to the bug
>> report so the archive package can be avoided at least.  i'm sorry for
>> the bother
>>
>>
> How big is that text file?  I wouldn't expect my script to take more than
> a few minutes even on a huge file.
>
> My script might have a bug...
>
> Duncan Murdoch
>
> On Sat, Jul 15, 2017 at 4:14 PM, Duncan Murdoch
>> mailto:murdoch.dun...@gmail.com>> wrote:
>>
>> On 15/07/2017 11:33 AM, Anthony Damico wrote:
>>
>> hi, i realized that the segfault happens on the text file in a
>> new R
>> session.  so, creating the segfault-generating text file requires
>> a
>> contributed package, but prompting the actual segfault does not --
>> pretty sure that means this is a base R bug?  submitted here:
>> https://bugs.r-project.org/bugzilla3/show_bug.cgi?id=17311
>> <https://bugs.r-project.org/bugzilla3/show_bug.cgi?id=17311>
>> hopefully i
>> am not doing something remarkably stupid.  the text file itself
>> is 4GB
>> so cannot upload it to bugzilla, and from the
>> R_AllocStringBugger error
>> in the previous message, i think most or all of it needs to be
>> there to
>> trigger the segfault.  thanks!
>>
>>
>> I don't want to download the big file or install the archive
>> package. Could you run the code below on the bad file?  If you're
>> right and it's only nulls that matter, this might allow me to create
>> a file that triggers the bug.
>>
>> f <-  # put the filename of the bad file here
>>
>> con <- file(f, open="rb")
>> zeros <- numeric()
>> repeat {
>>   bytes <- readBin(con, "int", 100, size=1)
>>   zeros <- c(zeros, count + which(bytes == 0))
>>   count <- count + length(bytes)
>>   if (length(bytes) < 100) break
>> }
>> close(con)
>> cat("File length=", count, "\n")
>> cat("Nulls:\n")
>> zeros
>>
>> Here's some code to recreate a file of the same length with nulls in
>> the same places, and spaces everywhere else:
>>
>> size <- count
>> f2 <- tempfile()
>> con <- file(f2, open="wb")
>> count <- 0
>> while (count < size) {
>>   nonzeros <- min(c(size - count, 100, zeros - 1))
>>   if (nonzeros) {
>> writeBin(rep(32L, nonzeros), con, size = 1)
>> count <- count + nonzeros
>>   }
>>   zeros <- zeros - nonzeros
>>   if (length(zeros) && min(zeros) == 1) {
>> writeBin(0L, con, size = 1)
>> count <- count + 1
>> zeros <- zeros[-1] - 1
>>   }
>> }
>> close(con)
>>
>> Duncan Murdoch
>>
>>
>>
>>
>>
>


__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] readLines without skipNul=TRUE causes crash

2017-07-16 Thread Anthony Damico
hi, yep, there are two problems -- but i think only the segfault is within
the scope of a base R issue?  i need to look closer at the corrupted
decompression and figure out whether i should talk to the brazilian
government agency that creates that .rar file or open an issue with the
archive package maintainer.  my goal in this thread is only to figure out
how to replicate the goofy text file so the r team can turn it into an
error instead of a segfault.

the original example i sent stores the .txt file somewhere inside the
tempdir(), but when i copy it over elsewhere on my machine, the md5sum()
gives the same result.  thanks again for looking at this

> tools::md5sum(infile)

C:\\Users\\AnthonyD\\AppData\\Local\\Temp\\RtmpIBy7qt/file_folder/Microdados
ENEM 2009/Dados Enem 2009/DADOS_ENEM_2009.txt
"30beb57419486108e98d42ec7a2f8b19"


> tools::md5sum( "S:/temp/crash.txt" )
 S:/temp/crash.txt
"30beb57419486108e98d42ec7a2f8b19"




On Sun, Jul 16, 2017 at 10:10 AM, Jeff Newmiller 
wrote:

> So you are saying there are two problems... one that produces a corrupt
> file from a valid compressed file, and one that segfaults when presented
> with that corrupt file? Can you please confirm the file name and run md5sum
> on it and share the result so we can tell when the file problem has been
> reproduced?
> --
> Sent from my phone. Please excuse my brevity.
>
> On July 16, 2017 3:21:21 AM PDT, Anthony Damico 
> wrote:
> >hi, thank you for attempting this. it looks like your unix machine
> >unzipped
> >the txt file without corruption -- if you copied over the same txt file
> >to
> >windows 7, i don't think that would reproduce the problem?  i think it
> >needs to be the corrupted text file where   R.utils::countLines(
> >txtfile
> >)   gives 809367.  i am able to reproduce on two distinct windows
> >machines
> >but no guarantee i'm not doing something dumb
> >
> >On Sat, Jul 15, 2017 at 6:29 PM, Jeff Newmiller
> >
> >wrote:
> >
> >> I am not able to reproduce your segfault on a Windows 7 platform
> >either:
> >>
> >> ##
> >> fn1 <- "d:/DADOS_ENEM_2009.txt"
> >> sessionInfo()
> >> ## R version 3.4.1 (2017-06-30)
> >> ## Platform: x86_64-w64-mingw32/x64 (64-bit)
> >> ## Running under: Windows 7 x64 (build 7601) Service Pack 1
> >> ##
> >> ## Matrix products: default
> >> ##
> >> ## locale:
> >> ## [1] LC_COLLATE=English_United States.1252
> >> ## [2] LC_CTYPE=English_United States.1252
> >> ## [3] LC_MONETARY=English_United States.1252
> >> ## [4] LC_NUMERIC=C
> >> ## [5] LC_TIME=English_United States.1252
> >> ##
> >> ## attached base packages:
> >> ## [1] stats graphics  grDevices utils datasets  methods
> >base
> >> ##
> >> ## loaded via a namespace (and not attached):
> >> ## [1] compiler_3.4.1
> >> tools::md5sum( fn1 )
> >> ## d:/DADOS_ENEM_2009.txt
> >> ## "83e61c96092285b60d7bf6b0dbc7072e"
> >> dat <- readLines( fn1 )
> >> length( dat )
> >> ## [1] 4148721
> >>
> >>
> >> On Sat, 15 Jul 2017, Jeff Newmiller wrote:
> >>
> >> I am not able to reproduce this on a Linux platform:
> >>>
> >>> ###3
> >>> fn1 <- "/home/jdnewmil/Downloads/Microdados ENEM 2009/Dados Enem
> >>> 2009/DADOS_ENEM_2009.txt"
> >>> sessionInfo()
> >>> ## R version 3.4.1 (2017-06-30)
> >>> ## Platform: x86_64-pc-linux-gnu (64-bit)
> >>> ## Running under: Ubuntu 14.04.5 LTS
> >>> ##
> >>> ## Matrix products: default
> >>> ## BLAS: /usr/lib/libblas/libblas.so.3.0
> >>> ## LAPACK: /usr/lib/lapack/liblapack.so.3.0
> >>> ##
> >>> ## locale:
> >>> ##  [1] LC_CTYPE=en_US.UTF-8   LC_NUMERIC=C
> >>> ##  [3] LC_TIME=en_US.UTF-8LC_COLLATE=en_US.UTF-8
> >>> ##  [5] LC_MONETARY=en_US.UTF-8LC_MESSAGES=en_US.UTF-8
> >>> ##  [7] LC_PAPER=en_US.UTF-8   LC_NAME=C
> >>> ##  [9] LC_ADDRESS=C   LC_TELEPHONE=C
> >>> ## [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
> >>> ##
> >>> ## attached base packages:
> >>> ## [1] stats graphics  grDevices utils datasets  methods
> >base
> >>> ##
> >>> ## loaded via a namespace (and not attached):
> >>> ## [1] compiler_3.4.1
> >>> tools::md5sum( f

Re: [R] readLines without skipNul=TRUE causes crash

2017-07-17 Thread Anthony Damico
hi, thanks again for taking the time.  since corrupted compression prompted
the segfault for me in the first place, i've just posted the text file
as-is.  it's a 2.4GB file so to be avoided on a metered internet
connection.  i've updated the bugzilla report at
https://bugs.r-project.org/bugzilla3/show_bug.cgi?id=17311 with more
relevant info.  these lines of code crash both windows R 3.4.1 and also
linux R 3.3.3 for me.  thanks again


# consider changing `tempfile()` to a permanent location
# so you don't lose the large downloaded file after the crash
tf <- tempfile()
download.file( "https://sisyphus.project.cwi.nl/r-bug-17311-crash.txt";
, tf , mode = 'wb' )
sessionInfo()
x <- readLines( tf )




On Sun, Jul 16, 2017 at 2:22 PM, Jeff Newmiller 
wrote:

> I am stuck. The archive package won't compile for me on Ubuntu, and the
> CRANextra repo seems to be down so I cannot install packages on Windows
> right now. Perhaps you can zip the corrupt text file and put it online
> somewhere? Don't use the archive package to pack it since there seem to be
> issues with that tool on your machine.
>
> I would discourage you from harassing the Brazilian government about their
> RAR file because the RAR file seems fine (no NUL characters appear in the
> text file) when extracted using the file-roller archive tool on Ubuntu.
> --
> Sent from my phone. Please excuse my brevity.
>
> On July 16, 2017 9:37:17 AM PDT, Anthony Damico 
> wrote:
> >hi, yep, there are two problems -- but i think only the segfault is
> >within
> >the scope of a base R issue?  i need to look closer at the corrupted
> >decompression and figure out whether i should talk to the brazilian
> >government agency that creates that .rar file or open an issue with the
> >archive package maintainer.  my goal in this thread is only to figure
> >out
> >how to replicate the goofy text file so the r team can turn it into an
> >error instead of a segfault.
> >
> >the original example i sent stores the .txt file somewhere inside the
> >tempdir(), but when i copy it over elsewhere on my machine, the
> >md5sum()
> >gives the same result.  thanks again for looking at this
> >
> >> tools::md5sum(infile)
> >
> >C:\\Users\\AnthonyD\\AppData\\Local\\Temp\\RtmpIBy7qt/file_
> folder/Microdados
> >ENEM 2009/Dados Enem 2009/DADOS_ENEM_2009.txt
> >"30beb57419486108e98d42ec7a2f8b19"
> >
> >
> >> tools::md5sum( "S:/temp/crash.txt" )
> > S:/temp/crash.txt
> >"30beb57419486108e98d42ec7a2f8b19"
> >
> >
> >
> >
> >On Sun, Jul 16, 2017 at 10:10 AM, Jeff Newmiller
> >
> >wrote:
> >
> >> So you are saying there are two problems... one that produces a
> >corrupt
> >> file from a valid compressed file, and one that segfaults when
> >presented
> >> with that corrupt file? Can you please confirm the file name and run
> >md5sum
> >> on it and share the result so we can tell when the file problem has
> >been
> >> reproduced?
> >> --
> >> Sent from my phone. Please excuse my brevity.
> >>
> >> On July 16, 2017 3:21:21 AM PDT, Anthony Damico 
> >> wrote:
> >> >hi, thank you for attempting this. it looks like your unix machine
> >> >unzipped
> >> >the txt file without corruption -- if you copied over the same txt
> >file
> >> >to
> >> >windows 7, i don't think that would reproduce the problem?  i think
> >it
> >> >needs to be the corrupted text file where   R.utils::countLines(
> >> >txtfile
> >> >)   gives 809367.  i am able to reproduce on two distinct windows
> >> >machines
> >> >but no guarantee i'm not doing something dumb
> >> >
> >> >On Sat, Jul 15, 2017 at 6:29 PM, Jeff Newmiller
> >> >
> >> >wrote:
> >> >
> >> >> I am not able to reproduce your segfault on a Windows 7 platform
> >> >either:
> >> >>
> >> >> ##
> >> >> fn1 <- "d:/DADOS_ENEM_2009.txt"
> >> >> sessionInfo()
> >> >> ## R version 3.4.1 (2017-06-30)
> >> >> ## Platform: x86_64-w64-mingw32/x64 (64-bit)
> >> >> ## Running under: Windows 7 x64 (build 7601) Service Pack 1
> >> >> ##
> >> >> ## Matrix products: default
> >> >> ##
> >> >> ## locale:
> >> >> ## [1] LC_COLLATE=English_United

Re: [R] readLines without skipNul=TRUE causes crash

2017-07-17 Thread Anthony Damico
>>> dat <- readLines( fn1 )
>>> length( dat )
>>> ## [1] 4148721
>>>
>>>
>>> On Sat, 15 Jul 2017, Jeff Newmiller wrote:
>>>
>>> I am not able to reproduce this on a Linux platform:
>>>>
>>>> ###3
>>>> fn1 <- "/home/jdnewmil/Downloads/Microdados ENEM 2009/Dados Enem
>>>> 2009/DADOS_ENEM_2009.txt"
>>>> sessionInfo()
>>>> ## R version 3.4.1 (2017-06-30)
>>>> ## Platform: x86_64-pc-linux-gnu (64-bit)
>>>> ## Running under: Ubuntu 14.04.5 LTS
>>>> ##
>>>> ## Matrix products: default
>>>> ## BLAS: /usr/lib/libblas/libblas.so.3.0
>>>> ## LAPACK: /usr/lib/lapack/liblapack.so.3.0
>>>> ##
>>>> ## locale:
>>>> ##  [1] LC_CTYPE=en_US.UTF-8   LC_NUMERIC=C
>>>> ##  [3] LC_TIME=en_US.UTF-8LC_COLLATE=en_US.UTF-8
>>>> ##  [5] LC_MONETARY=en_US.UTF-8LC_MESSAGES=en_US.UTF-8
>>>> ##  [7] LC_PAPER=en_US.UTF-8   LC_NAME=C
>>>> ##  [9] LC_ADDRESS=C   LC_TELEPHONE=C
>>>> ## [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
>>>> ##
>>>> ## attached base packages:
>>>> ## [1] stats graphics  grDevices utils datasets  methods   base
>>>> ##
>>>> ## loaded via a namespace (and not attached):
>>>> ## [1] compiler_3.4.1
>>>> tools::md5sum( fn1 )
>>>> ## /home/jdnewmil/Downloads/Microdados ENEM 2009/Dados Enem
>>>> 2009/DADOS_ENEM_2009.txt
>>>> ##
>>>> "83e61c96092285b60d7bf6b0dbc7072e"
>>>> dat <- readLines( fn1 )
>>>> length( dat )
>>>> ## [1] 4148721
>>>>
>>>> No segfault occurs.
>>>>
>>>> On Sat, 15 Jul 2017, Anthony Damico wrote:
>>>>
>>>> hi, i realized that the segfault happens on the text file in a new R
>>>>> session.  so, creating the segfault-generating text file requires a
>>>>> contributed package, but prompting the actual segfault does not --
>>>>> pretty
>>>>> sure that means this is a base R bug?  submitted here:
>>>>> https://bugs.r-project.org/bugzilla3/show_bug.cgi?id=17311  hopefully
>>>>> i am
>>>>> not doing something remarkably stupid.  the text file itself is 4GB so
>>>>> cannot upload it to bugzilla, and from the R_AllocStringBugger error
>>>>> in the
>>>>> previous message, i think most or all of it needs to be there to
>>>>> trigger
>>>>> the segfault.  thanks!
>>>>>
>>>>>
>>>>> On Sat, Jul 15, 2017 at 10:32 AM, Anthony Damico 
>>>>> wrote:
>>>>>
>>>>> hi, thanks Dr. Murdoch
>>>>>>
>>>>>>
>>>>>> i'd appreciate if anyone on r-help could help me narrow this down?  i
>>>>>> believe the segfault occurs because there's a single line with 4GB
>>>>>> and also
>>>>>> embedded nuls, but i am not sure how to artificially construct that?
>>>>>>
>>>>>>
>>>>>> the lodown package can be removed from my example..  it is just for
>>>>>> file
>>>>>> download cacheing, so `lodown::cachaca` can be replaced with
>>>>>> `download.file`  my current example requires a huge download, so sort
>>>>>> of
>>>>>> painful to repeat but i'm pretty confident that's not the issue.
>>>>>>
>>>>>>
>>>>>> the archive::archive_extract() function unzips a (probably corrupt)
>>>>>> .RAR
>>>>>> file and creates a text file with 80,937 lines.  this file is 4GB:
>>>>>>
>>>>>>> file.size(infile)
>>>>>> [1] 4078192743
>>>>>>
>>>>>>
>>>>>> i am pretty sure that nearly all of that 4GB is contained on a single
>>>>>> line
>>>>>> in the file.  here's what happens when i create a file connection and
>>>>>> scan
>>>>>> through..
>>>>>>
>>>>>>> file_con <- file( infile , 'r' )
>>>>>>>
>>>>>>> first_80936_lines <- readLines( file_con , n = 80936 )
>>>>>>> scan( w , n = 1 , what = character() )

Re: [R] Import selected columns from sas7bdat file

2017-08-10 Thread Anthony Damico
hi, the sas universal viewer might be a free, non-R way to convert a
sas7bdat file to non-proprietary formats, not sure if it's windows-only.
those other formats should be easier to import only a subset of columns
into R..

https://support.sas.com/downloads/browse.htm?fil=&cat=74
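
if you do end up with a csv export, one way to bring in only a handful of
columns is a colClasses vector with "NULL" for the columns to skip -- a rough
sketch where the file name and column names are just placeholders:

# read only the header row, then skip every column not listed in `keep`
all_cols <- names( read.csv( "exported_file.csv" , nrows = 1 ) )
keep <- c( "id" , "age" , "totexp" )
x <- read.csv( "exported_file.csv" , colClasses = ifelse( all_cols %in% keep , NA , "NULL" ) )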

On Thu, Aug 10, 2017 at 7:42 AM, peter dalgaard  wrote:

> I had a look at this a while back and it didn't seem to be easy. The path
> of least resistance would seem to be to use SAS itself to create a data set
> with fewer columns, but of course that requires you to get access to SAS.
>
> Otherwise, I think you'd have to modify sas7bdat::read.sas7bdat to drop
> unselected columns. That function is pure R code, so it might not be quite
> as hard as it sounds.
>
> Incidentally, do teach your mailer to send plain text rather than HTML. It is not much
> of a problem this time, but HTML mails can become quite unreadable on the
> list.
>
>
> -pd
>
>
> > On 10 Aug 2017, at 12:24 , Utkarsh Singhal 
> wrote:
> >
> > Hello everyone,
> >
> > I want to import data from huge sas files with 100s of columns. The good
> > thing is that I am only interested in a few selected columns. Is there
> any
> > way to do that without loading the full dataset.
> >
> > I have tried two functions: (1) read.sas7bdat *[from library
> 'sas7bdat']*,
> > and (2) read_sas *[from library 'haven']. *But couldn't find what I am
> > looking for.
> >
> > Best regards,
> > Utkarsh Singhal
> > 91.96508.54333
> >
> >   [[alternative HTML version deleted]]
> >
> > __
> > R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
> > https://stat.ethz.ch/mailman/listinfo/r-help
> > PLEASE do read the posting guide http://www.R-project.org/
> posting-guide.html
> > and provide commented, minimal, self-contained, reproducible code.
>
> --
> Peter Dalgaard, Professor,
> Center for Statistics, Copenhagen Business School
> Solbjerg Plads 3, 2000 Frederiksberg, Denmark
> Phone: (+45)38153501
> Office: A 4.23
> Email: pd@cbs.dk  Priv: pda...@gmail.com
>
> __
> R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/
> posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] code to provoke a crash running rterm.exe on windows

2016-05-28 Thread Anthony Damico
hi, here's a minimal reproducible example that crashes my R 3.3.0 console
on a powerful windows server.  below the example, i've put the error (not
crash) that occurs on R 3.2.3.

should this be reported to http://bugs.r-project.org/ or am i doing
something silly?  thanx





# C:\Users\AnthonyD>"c:\Program Files\R\R-3.3.0\bin\x64\Rterm.exe"

# R version 3.3.0 (2016-05-03) -- "Supposedly Educational"
# Copyright (C) 2016 The R Foundation for Statistical Computing
# Platform: x86_64-w64-mingw32/x64 (64-bit)

# R is free software and comes with ABSOLUTELY NO WARRANTY.
# You are welcome to redistribute it under certain conditions.
# Type 'license()' or 'licence()' for distribution details.

  # Natural language support but running in an English locale

# R is a collaborative project with many contributors.
# Type 'contributors()' for more information and
# 'citation()' on how to cite R or R packages in publications.

# Type 'demo()' for some demos, 'help()' for on-line help, or
# 'help.start()' for an HTML browser interface to help.
# Type 'q()' to quit R.

sessionInfo()
# R version 3.3.0 (2016-05-03)
# Platform: x86_64-w64-mingw32/x64 (64-bit)
# Running under: Windows Server 2012 R2 x64 (build 9600)

# locale:
# [1] LC_COLLATE=English_United States.1252
# [2] LC_CTYPE=English_United States.1252
# [3] LC_MONETARY=English_United States.1252
# [4] LC_NUMERIC=C
# [5] LC_TIME=English_United States.1252

# attached base packages:
# [1] stats graphics  grDevices utils datasets  methods   base

memory.limit()
# [1] 229247

# works fine
grpsize = ceiling(10^5/26)

# simple data.frame
my_df <-
  data.frame(
  x=rep(LETTERS,each=26*grpsize),
  v=runif(grpsize*26),
  stringsAsFactors=FALSE
  )

# mis-match the number of elements
my_df <-
  data.frame(
  x=rep(LETTERS,each=26*grpsize),
  v=runif(grpsize*26),
  stringsAsFactors=FALSE
  )

# make this much bigger
grpsize = ceiling(10^8/26)

# simple data.frame
my_df <-
  data.frame(
  x=rep(LETTERS,each=grpsize),
  v=runif(grpsize*26),
  stringsAsFactors=FALSE
  )

# mis-match the number of elements
my_df <-
  data.frame(
  x=rep(LETTERS,each=26*grpsize),
  v=runif(grpsize*26),
  stringsAsFactors=FALSE
  )

# CONSOLE CRASH WITHOUT EXPLANATION
C:\Users\AnthonyD>



# # # # # running the exact same commands on r version 3.2.3 on windows:

C:\Users\AnthonyD>"C:\Program Files\R\R-3.2.3\bin\x64\Rterm.exe"

memory.limit()
# [1] 229247

grpsize = ceiling(10^8/26)

# mis-matched number of elements
my_df <-
  data.frame(
  x=rep(LETTERS,each=26*grpsize),
  v=runif(grpsize*26),
  stringsAsFactors=FALSE
  )
# Error in if (mirn && nrows[i] > 0L) { :
  # missing value where TRUE/FALSE needed
# In addition: Warning message:
# In as.data.frame.vector(x, ..., nm = nm) :
  # NAs introduced by coercion to integer range

# # # # but console does not crash # # # #

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] asking for large memory - crash running rterm.exe on windows

2016-05-28 Thread Anthony Damico
hi, thanks to you both!  note the large memory.limit() on the machine
before the crash (200+ gb) so i'm not sure it's a simple overloading
explosion?  i've filed a bug report..

https://bugs.r-project.org/bugzilla/show_bug.cgi?id=16927



On Saturday, May 28, 2016, Martin Maechler 
wrote:

> >>>>> Ben Bolker 
> >>>>> on Sat, 28 May 2016 15:42:45 + writes:
>
> > Anthony Damico  gmail.com> writes:
> >>
> >> hi, here's a minimal reproducible example that crashes my
> >> R 3.3.0 console on a powerful windows server.  below the
> >> example, i've put the error (not crash) that occurs on R
> >> 3.2.3.
> >>
> >> should this be reported to http://bugs.r-project.org/ or
> >> am i doing something silly?  thanx
>
>
> > From the R FAQ (9.1):
>
> > If R executes an illegal instruction, or dies with an
> > operating system error message that indicates a problem in
> > the program (as opposed to something like “disk full”),
> > then it is certainly a bug.
>
> >   So you could submit a bug report, *or* open a discussion
> > on r-de...@r-project.org (which I'd have said was a more
> > appropriate venue for this question in any case) ...
>
> Indeed.
> In this case, this is a known problem -- not just of R, but of
> many programs that you can run ---
> You are requesting (much) more memory than your computer has
> RAM, and in this situation -- depending on the OS ---
> your computer will kill R (what you saw) or your it will become
> very slow trying to shove all memory to R and start swapping
> (out to disk other running / sleeping processes on the
> computer).
>
> Both is very unpleasant...
> But it is you as R user who asked R to allocate an object of
> about 41.6 Gigabytes (26 * 1.6, see below).
>
> As Ben mentioned this may be worth a discussion on R-devel ...
> or you rather follow up the existing thread opened by Marius
> Hofert  three weeks ago, with subject
>  "[Rd] R process killed when allocating too large matrix (Mac OS X)"
>
>   -->  https://stat.ethz.ch/pipermail/r-devel/2016-May/072648.html
>
> His simple command to "crash R" was
>
>matrix(0, 1e5, 1e5)
>
> which for some of use gives an error such as
>
> > x <- matrix(0, 1e5,1e5)
> Error: cannot allocate vector of size 74.5 Gb
>
> but for others it had the same effect as your example.
> BTW: I repeat it here in a functionalized form with added
>  comments which makes apparent what's going on:
>
>
> ## Make simple data.frame
> mkDf <- function(grpsize, wrongSize = FALSE) {
> ne <- (if(wrongSize) 26 else 1) *grpsize
> data.frame(x = rep(LETTERS, each = ne),
>v = runif(grpsize*26), stringsAsFactors=FALSE)
> }
>
> g1 <- ceiling(10^5/26)
> d1 <- mkDf(g1) # works fine
> str(d1)
> ## 'data.frame':100022 obs. of  2 variables:
>
> dP <- mkDf(g1, wrong=TRUE)# mis-matching the number of elements
>
> str(dP) # is 26 times larger
> ## 'data.frame': 2600572 obs. of  2 variables: .
>
>
> # make this much bigger
> gLarge <- ceiling(10^8/26)
>
> dL <- mkDf(gLarge) # works "fine" .. (well, takes time!!)
> str(dL)
> ## 'data.frame': 10004 obs. of  2 variables:
> as.numeric(print(object.size(dL)) / 1e6)
> ## 162088 bytes
> ## [1] 1600.002  Mega  i.e.,  1.6 GBytes
>
> ## Well, this will be 26 times larger than already large ==> your R may
> crash *OR*
>  ## your computer may basically slow down to a crawl, when R requests all
> its memory...
> if(FALSE) ## ==> do *NOT* evaluate the following lightly !!
> dLL <- mkDf(gLarge, wrong=TRUE)
> # CONSOLE CRASH WITHOUT EXPLANATION
> # C:\Users\AnthonyD>
>

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] svykappa using the survey package

2016-06-20 Thread Anthony Damico
hi pradip, this should give you what you want


library(foreign)
library(survey)

tf <- tempfile()

download.file( "
https://meps.ahrq.gov/mepsweb/data_files/pufs/h163ssp.zip"; , tf , mode =
'wb' )

z <- unzip( tf , exdir = tempdir() )

x <- read.xport( z )

names( x ) <- tolower( names( x ) )

design <- svydesign(id=~varpsu,strat=~varstr, weights=~perwt13f,
data=x, nest=TRUE)

# include missings as "No" values here
design <-
update(design,
xbpchek53 = ifelse(bpchek53 ==1,'yes','no or missing'),
xcholck53 = ifelse(cholck53 ==1, 'yes','no or missing')
)

# subset out records that were missing for either variable
svykappa( ~ xbpchek53 + xcholck53 , subset(design, bpchek53 > 0 &
cholck53 > 0 ) )


















On Mon, Jun 20, 2016 at 7:49 PM, Muhuri, Pradip (AHRQ/CFACT) <
pradip.muh...@ahrq.hhs.gov> wrote:

> Hello,
>
> My goal is to calculate the weighted kappa measure of agreement between
> two factors  using the R  survey package.  I am getting the following error
> message (the console is appended below; sorry no data provided).
>
> > # calculate survey Kappa
> > svykappa(~xbpchek53+xcholck53, design)
> Error in names(probs) <- nms :
>   'names' attribute [15] must be the same length as the vector [8]
>
> I have followed the following major steps:
>
> 1) Used the "haven" package to read the sas data set into R.
> 2) Used the dplyr mutate() to create 2 new variables and converted to
> factors [required for the svykappa()?].
> 3) Created an object (named design) using the survey design variables and
> the data file.
> 4) Used the svykappa() to compute the kappa measure of agreement.
>
> I will appreciate if someone could give me hints on how to resolve the
> issue.
>
> Thanks,
>
> Pradip Muhuri
>
> ###  The detailed console is appended below
> 
>
> > setwd ("U:/A_PSAQ")
> > library(haven)
> > library(dplyr)
> > library(survey)
> > library(srvyr)
> > library(Hmisc)
> > my_hc2013_data <- read_sas("pc2013.sas7bdat")
> >
> > # Function to convert var names in upper cases to var names in lower
> cases
> > lower <- function (df) {
> +   names(df) <- tolower(names(df))
> +   df
> + }
> > my_hc2013_data <- lower(my_hc2013_data)
> >
> > # Check the contents - Hmisc package (as above) required
> > # contents(my_hc2013_data)
> >
> > # create two new variables
> > my_hc2013_data <- mutate(my_hc2013_data,
> +  xbpchek53 = ifelse(bpchek53 ==1, 1,
> + ifelse(bpchek53 %in% 2:6, 2,NA)),
> +  xcholck53 = ifelse(cholck53 ==1, 1,
> +ifelse(cholck53 %in% 2:6, 2,NA)))
> >
> > # convert the numeric variables to factors for the kappa measure
> > my_hc2013_data$xbpchek53 <- as.factor(my_hc2013_data$xbpchek53)
> > my_hc2013_data$xcholck53 <- as.factor(my_hc2013_data$xcholck53)
> >
> > # check whether the variables are factors
> > is.factor(my_hc2013_data$xbpchek53)
> [1] TRUE
> > is.factor(my_hc2013_data$xcholck53)
> [1] TRUE
> >
> >
> > # check the data from the cross table
> > addmargins(with(my_hc2013_data, table(bpchek53,xbpchek53 )))
> xbpchek53
> bpchek53 1 2   Sum
>  -9  0 0 0
>  -8  0 0 0
>  -7  0 0 0
>  -1  0 0 0
>  1   19778 0 19778
>  2   0  2652  2652
>  3   0  1014  1014
>  4   0   538   538
>  5   0   737   737
>  6   0   623   623
>  Sum 19778  5564 25342
> > addmargins(with(my_hc2013_data, table(cholck53,xcholck53 )))
> xcholck53
> cholck53 1 2   Sum
>  -9  0 0 0
>  -8  0 0 0
>  -7  0 0 0
>  -1  0 0 0
>  1   14850 0 14850
>  2   0  3153  3153
>  3   0  1170  1170
>  4   0   696   696
>  5   0   909   909
>  6   0  3764  3764
>  Sum 14850  9692 24542
> > addmargins(with(my_hc2013_data, table(xbpchek53,xcholck53 )))
>  xcholck53
> xbpchek53 1 2   Sum
>   1   14667  4379 19046
>   2 163  5225  5388
>   Sum 14830  9604 24434
> >
> > # create an object with design variables and data
> > design<-svydesign(id=~varpsu,strat=~varstr, weights=~perwt13f,
> data=my_hc2013_data, nest=TRUE)
> >
> > # calculate survey Kappa
> > svykappa(~xbpchek53+xcholck53, design)
> Error in names(probs) <- nms :
>   'names' attribute [15] must be the same length as the vector [8]
>
> #
>
> Pradip K. Muhuri,  AHRQ/CFACT
>  5600 Fishers Lane # 7N142A, Rockville, MD 20857
> Tel: 301-427-1564
>
>
>
>
> -Original Message-
> From: R-help [mailto:r-help-boun...@r-project.org] On Behalf Of Muhuri,
> Pradip (AHRQ/CFACT)
> Sent: Thursday, June 16, 2016 2:06 PM
> To: David Winsemius
> Cc: r-help@r-project.org
> Subject: Re: [R] dplyr's arrange function - 3 solutions receiv

Re: [R] svymean using the survey package - strata containing no subpopulation members

2016-06-22 Thread Anthony Damico
hi pradip, with meps you should be able to match precisely between r+survey
and those other languages[1]

if i had to guess, i would say that your sas and stata code is actually
doing the equivalent of this, which is not correct.  check the journal
article table #1 for syntax comparisons

design <- svydesign(id=~varpsu,strat=~varstr, weights=~perwt13f,
data=subset(x, racethx==4 & diabdx==1), nest=TRUE)
svymean(~ totexp13, design)
#Error in onestrat(x[index, , drop = FALSE], clusters[index],
nPSU[index][1],  :
#  Stratum (1004) has only one PSU at stage 1


more detailed discussion of lonely psu behavior at [2] including how to
override this error -- but i think it comes from faulty design
specification so should not be necessary?  thanks
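
in other words, build the design from the full file and subset the design
object rather than the data.frame -- roughly the same lines already in your
message below:

design <- svydesign(id=~varpsu,strat=~varstr, weights=~perwt13f, data=x, nest=TRUE)
svymean( ~ totexp13 , subset( design , racethx==4 & diabdx==1 ) )
# and if a lonely-psu error ever comes from a correctly-specified design, [2]
# describes overrides such as options( survey.lonely.psu = "adjust" )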


[1] https://journal.r-project.org/archive/2009-2/RJournal_2009-2_Damico.pdf
[2] http://faculty.washington.edu/tlumley/old-survey/exmample-lonely.html





On Wed, Jun 22, 2016 at 9:32 PM, Muhuri, Pradip (AHRQ/CFACT) <
pradip.muh...@ahrq.hhs.gov> wrote:

> Hi,
>
> Below is a reproducible example that produces the estimate of "totexp13"
> (total health care expenditure 2013) for the subpopulation that includes
> "Asians with diabetes diagnosed" in MEPS. The R script below downloads file
> from the web for processing.
>
> Issue/Question: The R/survey package does not seem to provide a NOTE
> regarding the number of strata containing NO SUBPOPULATION MEMBERS (in
> this case - Asians with diabetes diagnosed in MEPS 2013). Is there a way to
> get this count or ask R to provide this information?  Any hints will be
> appreciated.
>
> Acknowledgements:   The current R script is a tweaked-version of the code
> originally sent (on this forum) by Anthony Damico for another application.
> Thanks to Anthony!
>
> Good news: The estimate is almost the same as the estimates obtained from
> SAS, SUDAAN and STATA runs.
>
> Additional Information:  STATA provides a NOTE that " 84 strata omitted
> because no subpopulation members".
> SAS LOG (proc surveymeans) provides a NOTE that "Only one cluster in a
> stratum in domain Asian_with_diab for variable(s) TOTEXP13. The estimate of
> variance for TOTEXP13 will
>   omit this stratum".
>
>
> Thanks,
>
> Pradip Muhuri
>
>
>
>
> library(foreign)
> library(survey)
> library(dplyr)
>
> tf <- tempfile()
>
> download.file( "https://meps.ahrq.gov/mepsweb/data_files/pufs/h163ssp.zip";,
> tf , mode = 'wb' )
>
> z <- unzip( tf , exdir = tempdir() )
>
> x <- read.xport( z )
>
> names( x ) <- tolower( names( x ) )
>
> mydata <- select(x, varstr, varpsu, perwt13f, diabdx, totexp13, racethx)
>
> mydata[mydata <=0] <- NA
>
> design <- svydesign(id=~varpsu,strat=~varstr, weights=~perwt13f, data=x,
> nest=TRUE)
>
>
> # include missings as "No" values here
> #design <-
> #  update(design,
> #xbpchek53 = ifelse(bpchek53 ==1,'yes','no or missing'),
>   #   xcholck53 = ifelse(cholck53 ==1, 'yes','no or missing')
>   #)
>
> # get the estimate for "totexp13" for the subset that includes Asians with
> diabetes diagnosed
> svymean(~ totexp13, subset(design, racethx==4 & diabdx==1))
>
> Pradip K. Muhuri,  AHRQ/CFACT
> 5600 Fishers Lane # 7N142A, Rockville, MD 20857
> Tel: 301-427-1564
>
>
>
>
> [[alternative HTML version deleted]]
>
> __
> R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] r code for multilevel latent class analysis

2016-07-07 Thread Anthony Damico
start at
https://github.com/ajdamico/asdfree/blob/master/European%20Social%20Survey/structural%20equation%20modeling%20examples.R
maybe?
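
for the immediate poLCA error quoted below, one hedged guess: poLCA expects
the manifest variables to be existing columns of the data frame (coded as
consecutive integers starting at 1), so the recoding may need to happen in
the data rather than inside the formula, e.g.

# move the +1 recodes out of the formula and into the data.frame
mydata$ppltrst1 <- mydata$ppltrst + 1
mydata$pplfair1 <- mydata$pplfair + 1
mydata$pplhlp1  <- mydata$pplhlp + 1
lc <- poLCA( cbind( ppltrst1 , pplfair1 , pplhlp1 ) ~ cntry , mydata )

note that a country covariate makes this ordinary latent class regression,
not a true multilevel latent class model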

On Thu, Jul 7, 2016 at 6:26 AM, Cristina Cametti  wrote:

> Dear all,
>
> I am not able to find a reliable r code to run a multilevel latent class
> model. Indeed, I have to analyze how social trust (three variables form the
> ESS survey) might vary between countries (21 countries in my database). I
> tried to use the poLCA package but I am not sure if my code is right. This
> is my code:
> lca <- cbind(ppltrst+1,pplfair+1,pplhlp+1)~cntry
> lc <- poLCA(lca,mydata)
>
> However, I get an error message:
> Error in `[.data.frame`(data, , match(colnames(y), colnames(data))[j]) :
> undefined columns selected
>
> How can I solve this? Is the code completely wrong or I missed some
> passages?
> Thank you very much for your help!
>
> Cristina
> [[alternative HTML version deleted]]
>
> __
> R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] svytable: How do i create a table informing a third variable?

2016-09-02 Thread Anthony Damico
# mean
svymean( ~ income_variable , NN )
svyby( ~ income_variable , ~ age + sex , NN , svymean )

# median
svyquantile( ~ income_variable , NN , 0.5 )
svyby( ~ income_variable , ~ age + sex , NN , svyquantile , 0.5 )




On Fri, Sep 2, 2016 at 3:04 PM, Juan Ceccarelli Arias 
wrote:

> Hello
> Im analyzing a survey and i need to obtain some statistics per groups.
> Im able to create a table with sex and age. However, if i want to know how
> much income earns the population by sex and age, i can't.
> Im loading the dataset as describe the line below
> NN <- svydesign(ids = ~1, data = encuesta, weights = fact)
> Some simple table i can create
> table(svytable(~age+sex,design=NN))
> But im not able to handle the same tabulate referencing a income variable,
> in this case, wage.
> Can you help me?
> Thanks for your replies and time.
>
> [[alternative HTML version deleted]]
>
> __
> R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/
> posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Svyglm Error in Survey Package

2016-09-23 Thread Anthony Damico
hi could you make this a minimal reproducible example?

On Sep 24, 2016 12:03 PM, "Courtney Benjamin"  wrote:

> In attempting to use the svyglm call in the R Survey Package, I am
> receiving the error: Error in pwt[i] : invalid subscript type 'list'
>
> I have not been able to find a lot of information on how to resolve the
> error; one source advised it was related to how the subsetting command was
> executed.
>
> This was my initial attempt:
>
> mc1 <- 
> svyglm(F3ATTAINMENT~F1SES2QU+F1RGPP2,elsq1ch_brr,subset(elsq1ch_brr,BYSCTRL==1
> & G10COHRT==1),na.action)
> summary(mc1)
> This was my second approach trying to change up how I had subsetted the
> data:
> summary(mc1)
> samp1 <- subset(elsq1ch_brr,BYSCTRL==1 & G10COHRT==1)
> dim(samp1)
> mc1 <- svyglm(F3ATTAINMENT~F1SES2QU+F1RGPP2,elsq1ch_brr,subset=
> samp1,na.action)
> summary(mc1)?
>
> Both attempts resulted in the same error stated above.  Any advisement in
> how to resolve this error would be greatly appreciated.
> Sincerely,
> Courtney Benjamin
>
> ?
>
>
>
> Courtney Benjamin
>
> Broome-Tioga BOCES
>
> Automotive Technology II Teacher
>
> Located at Gault Toyota
>
> Doctoral Candidate-Educational Theory & Practice
>
> State University of New York at Binghamton
>
> cbenj...@btboces.org
>
> 607-763-8633
>
> [[alternative HTML version deleted]]
>
> __
> R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/
> posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Plot by FIPS Code using Shapefiles

2015-05-05 Thread Anthony Damico
hi, after running each individual line of code above, check that the object
still has the expected number of records and unique county fips codes.  it
looks like length( shapes$GEOID ) == 3233 but nrow( merged_data ) == 3109.
the way for you to debug this is for you to go through line by line after
creating each new object  :)

i'm also not sure it's safe to work with gis objects as you're doing, there
are some well-documented examples of working with tiger files here
https://github.com/davidbrae/swmap
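
for the very last step (using the objects from your code below), one hedged
alternative is to align the merged values to the shapefile's own GEOID order
with match(), so counties missing from max_change become NA instead of
triggering a length error:

shapes$change <- merged_data$change[ match( as.character( shapes$GEOID ) , merged_data$FIPS ) ]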



On Tue, May 5, 2015 at 11:00 AM, Shouro Dasgupta  wrote:

> I am trying to plot data by FIPS code using county shapes files.
>
> library(data.table)
> > library(rgdal)
> > library(colourschemes)
> > library(RColorBrewer)
> > library(maptools)
> > library(maps)
> > library(ggmap)
>
>
> I have data by FIPS code which looks like this:
> >
> >
> > dput(head(max_change))
> > structure(list(FIPS = c("01001", "01003", "01005", "01007", "01009",
> > "01011"), pred_hist = c(5.68493780563595e-06, 5.87686839563543e-06,
> > 5.68493780563595e-06, 5.84476370329784e-06, 5.89156133294344e-06,
> > 5.68493780563595e-06), pred_sim = c(5.60128903156804e-06,
> > 5.82369276823497e-06,
> > 5.60128903156804e-06, 5.75205304048323e-06, 5.80322399836766e-06,
> > 5.60128903156804e-06), change = c(-1.47141054005866, -0.904829303986895,
> > -1.47141054005866, -1.58621746782168, -1.49938750670105,
> -1.47141054005866
> > )), .Names = c("FIPS", "pred_hist", "pred_sim", "change"), class =
> > c("data.table",
> > "data.frame"), row.names = c(NA, -6L), .internal.selfref =  > 0x00110788>)
>
>
>  I add leading zeroes by:
>
> max_change <- as.data.table(max_change)
> max_change$FIPS <- sprintf("%05d",as.numeric(max_change$FIPS))
>
> I downloaded shapefiles from here:
> ftp://ftp2.census.gov/geo/tiger/TIGER2014/COUNTY/.
>
> I obtain the FIPS codes from the shapefiles and order them using:
>
> shapes_fips <- shapes$GEOID
> > shapes_fips <- as.data.table(shapes_fips)
> > setnames(shapes_fips, "shapes_fips", "FIPS")
> > shapes_fips <- shapes_fips[with(shapes_fips, order(FIPS)), ]
> > shapes_fips$FIPS <- as.character(shapes_fips$FIPS)
>
>
> Then I merge the FIPS codes with my original dataset using:
>
> >
> > merged_data <- merge(shapes_fips,max_change,by="FIPS",all.X=T, all.y=T)
> > merged_data <- as.data.table(merged_data)
>
>
> Which looks like this:
>
> structure(list(FIPS = c("01001", "01003", "01005", "01007", "01009",
> > "01011"), pred_hist = c(5.68493780563595e-06, 5.87686839563543e-06,
> > 5.68493780563595e-06, 5.84476370329784e-06, 5.89156133294344e-06,
> > 5.68493780563595e-06), pred_sim = c(5.60128903156804e-06,
> > 5.82369276823497e-06,
> > 5.60128903156804e-06, 5.75205304048323e-06, 5.80322399836766e-06,
> > 5.60128903156804e-06), change = c(-1.47141054005866, -0.904829303986895,
> > -1.47141054005866, -1.58621746782168, -1.49938750670105,
> -1.47141054005866
> > )), .Names = c("FIPS", "pred_hist", "pred_sim", "change"), sorted =
> > "FIPS", class = c("data.table",
> > "data.frame"), row.names = c(NA, -6L), .internal.selfref =  > 0x00110788>)
>
>
> But when I try to merge the data back to the SpatialPolygonsDataFrame called
> shapes, I get the following error:
>
> shapes$change <- merged_data$change
>
> Error in `[[<-.data.frame`(`*tmp*`, name, value = c(-1.47141054005866,  :
> >   replacement has 3109 rows, data has 3233
>
>
>  Apologies for the messy example, what am I doing wrong? Any help will be
> greatly appreciated. Thank you!
>
> Sincerely,
>
> Shouro
>
> [[alternative HTML version deleted]]
>
> __
> R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Plot by FIPS Code using Shapefiles

2015-05-05 Thread Anthony Damico
so check the unique number of fips codes in the objects before and after

> merged_data <- merge(shapes_fips,max_change,by="FIPS",all.X=T, all.y=T)

also note that all.X should be all.x and you might want to use FALSE for
one or both of those
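
e.g. something like this keeps every FIPS code from the shapefile (filling
the non-matches with NA) instead of dropping the rows that are absent from
max_change:

merged_data <- merge( shapes_fips , max_change , by = "FIPS" , all.x = TRUE , all.y = FALSE )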



On Tue, May 5, 2015 at 11:40 AM, Shouro Dasgupta  wrote:

> Hello,
>
> Thank you for your reply. My original data has 3109 FIPS codes. Is there a
> way to merge only this data into the shapefiles? I hope I am clear.
>
> Thank you also for the link, I am trying to do something like this:
> https://gist.github.com/reubano/1281134.
>
> Thanks again!
>
> Sincerely,
>
> Shouro
>
> On Tue, May 5, 2015 at 5:21 PM, Anthony Damico  wrote:
>
>> hi, after running each individual line of code above, check that the
>> object still has the expected number of records and unique county fips
>> codes.  it looks like length( shapes$GEOID ) == 3233 but nrow( merged_data
>> ) == 3109.  the way for you to debug this is for you to go through line by
>> line after creating each new object  :)
>>
>> i'm also not sure it's safe to work with gis objects as you're doing,
>> there are some well-documented examples of working with tiger files here
>> https://github.com/davidbrae/swmap
>>
>>
>>
>> On Tue, May 5, 2015 at 11:00 AM, Shouro Dasgupta 
>> wrote:
>>
>>> I am trying to plot data by FIPS code using county shapes files.
>>>
>>> library(data.table)
>>> > library(rgdal)
>>> > library(colourschemes)
>>> > library(RColorBrewer)
>>> > library(maptools)
>>> > library(maps)
>>> > library(ggmap)
>>>
>>>
>>> I have data by FIPS code which looks like this:
>>> >
>>> >
>>> > dput(head(max_change))
>>> > structure(list(FIPS = c("01001", "01003", "01005", "01007", "01009",
>>> > "01011"), pred_hist = c(5.68493780563595e-06, 5.87686839563543e-06,
>>> > 5.68493780563595e-06, 5.84476370329784e-06, 5.89156133294344e-06,
>>> > 5.68493780563595e-06), pred_sim = c(5.60128903156804e-06,
>>> > 5.82369276823497e-06,
>>> > 5.60128903156804e-06, 5.75205304048323e-06, 5.80322399836766e-06,
>>> > 5.60128903156804e-06), change = c(-1.47141054005866,
>>> -0.904829303986895,
>>> > -1.47141054005866, -1.58621746782168, -1.49938750670105,
>>> -1.47141054005866
>>> > )), .Names = c("FIPS", "pred_hist", "pred_sim", "change"), class =
>>> > c("data.table",
>>> > "data.frame"), row.names = c(NA, -6L), .internal.selfref = >> > 0x00110788>)
>>>
>>>
>>>  I add leading zeroes by:
>>>
>>> max_change <- as.data.table(max_change)
>>> max_change$FIPS <- sprintf("%05d",as.numeric(max_change$FIPS))
>>>
>>> I downloaded shapefiles from here:
>>> ftp://ftp2.census.gov/geo/tiger/TIGER2014/COUNTY/.
>>>
>>> I obtain the FIPS codes from the shapefiles and order them using:
>>>
>>> shapes_fips <- shapes$GEOID
>>> > shapes_fips <- as.data.table(shapes_fips)
>>> > setnames(shapes_fips, "shapes_fips", "FIPS")
>>> > shapes_fips <- shapes_fips[with(shapes_fips, order(FIPS)), ]
>>> > shapes_fips$FIPS <- as.character(shapes_fips$FIPS)
>>>
>>>
>>> Then I merge the FIPS codes with my original dataset using:
>>>
>>> >
>>> > merged_data <- merge(shapes_fips,max_change,by="FIPS",all.X=T, all.y=T)
>>> > merged_data <- as.data.table(merged_data)
>>>
>>>
>>> Which looks like this:
>>>
>>> structure(list(FIPS = c("01001", "01003", "01005", "01007", "01009",
>>> > "01011"), pred_hist = c(5.68493780563595e-06, 5.87686839563543e-06,
>>> > 5.68493780563595e-06, 5.84476370329784e-06, 5.89156133294344e-06,
>>> > 5.68493780563595e-06), pred_sim = c(5.60128903156804e-06,
>>> > 5.82369276823497e-06,
>>> > 5.60128903156804e-06, 5.75205304048323e-06, 5.80322399836766e-06,
>>> > 5.60128903156804e-06), change = c(-1.47141054005866,
>>> -0.904829303986895,
>>> > -1.47141054005866, -1.58621746782168, -1.49938750670105,
>>> -1.47141054005866
>>> > )), .Names = c("FIPS", "pred_hist", "pred_sim", "change"), s

Re: [R] confidence intervals for differences in proportions from complex survey design?

2015-05-10 Thread Anthony Damico
i don't know the answer to your larger question, but for confidence
intervals around proportions you might look at ?svyciprop.  one of the
method= options might yield the same result as your approximation, not sure
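
a small hedged sketch with the msdesign object from your example below -- the
method= choices are described in ?svyciprop and need not match the Wald
approximation exactly:

svyciprop( ~ I( depression == "yes" ) , msdesign , method = "logit" )
# per-group intervals could come from subsetting the design, e.g.
svyciprop( ~ I( depression == "yes" ) , subset( msdesign , sex == "M" ) , method = "logit" )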

On Mon, May 11, 2015 at 12:40 AM, Brown, Tony Nicholas <
tony.n.br...@vanderbilt.edu> wrote:

> All:
>
> I need to generate confidence intervals for differences in proportions
> using data from a complex survey design. An example follows where I attempt
> to estimate the difference in depression prevalence by sex.
>
> # Data might look something like this:
> Dfr<-data.frame(depression=sample(c("yes","no"), size=30, replace=TRUE),
> sex=sample(c("M","F"), size=30, replace=TRUE),
> cluster=rep(1:10, times=3),
> stratum=rep(1:5, each=2, times=3),
> pweight=runif(n=30, min=1, max=3))
> Dfr
> library(survey)
> msdesign<-svydesign(id=~cluster, strata=~stratum, weights=~pweight,
> nest=TRUE,
> data=Dfr)
> # When searching online, one recommendation was to use svyglm() to
> generate an
> # approximation as follows:
> confint(with(Dfr, svyglm(I(depression=="yes")~sex,
> family=gaussian(link=identity),
> msdesign)), level=0.95, method="Wald")
>
> This question has been asked before on the listserv (circa 2007) and I
> contacted the original poster, who indicated that they never received a
> reply.
>
> Here is the question as described by the original poster:
>
> "I'm trying to get confidence intervals of proportions (sometimes for
> subgroups) estimated from complex survey data. Because a function like
> prop.test() does not exist for the "survey" package I tried the following:
>
> 1) Define a survey object (PSU of clustered sample, population weights);
> 2) Use svyglm() of the package "survey" to estimate a binary logistic
> regression (family='binomial'): For the confidence interval of a single
> proportion regress the binary dependent variable on a constant (1), for
> confidence intervals of that variable for subgroups regress this
> variable on the groups (factor) variable;
> 3) Use predict() to obtain estimated logits and the respective standard
> errors (mod.dat specifying either the constant or the subgroups):
>
> pred=predict(model,mod.dat,type='link',se.fit=T)
>
> and apply the following to obtain the proportion with its confidence
> intervals (for example, for conf.level=.95):
>
> lo.e = pred[1:length(pred)]-qnorm((1+conf.level)/2)*SE(pred)
> hi.e = pred[1:length(pred)]+qnorm((1+conf.level)/2)*SE(pred)
> prop = 1/(1+exp(-pred[1:length(pred)]))
> lo = 1/(1+exp(-lo.e))
> hi = 1/(1+exp(-hi.e))
>
> I think that in that way I get CI's based on asymptotic normality -
> either for a single proportion or split up into subgroups.
>
> Question: Is this a correct or a defensible procedure? Or should I use a
> different approach? Note that this approach should also allow to
> estimate CI's for proportions of subgroups taking into account the
> complex survey design."
>
> Thanks in advance for any help that you can provide.
>
> Tony
>
>
> --
> Tony N. Brown, Ph.D.
> Associate Chair and Associate Professor of Sociology
> Google Scholar Profile: http://tinyurl.com/lozlht8
> LinkedIn Profile:
> https://www.linkedin.com/pub/tony-nicholas-brown/a6/64/31a
>
> __
> R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Households per Census block

2015-08-03 Thread Anthony Damico
hi, ccing the package maintainer.  one alternative is to pull the HU100
variable directly from the census bureau's summary files: that variable
starts at position 328 and ends at 336.  just modify this loop and you'll
get a table with one-record-per-census-block in every state.

https://github.com/davidbrae/swmap/blob/master/how%20to%20map%20the%20consumer%20expenditure%20survey.R#L104

(1) line 134 change the very last -9 to 9
(2) line 137 between "pop100" and "intptlat" add an "hu100"
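
or, as a rough standalone sketch (the file name is a placeholder), the
position 328-336 field can be pulled out with a negative skip width in
read.fwf:

# skip the first 327 characters of each record, then read the 9-character HU100 field
hu100 <- read.fwf( "state_summary_file.txt" , widths = c( -327 , 9 ) , col.names = "hu100" )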


summary file docs-

http://www.census.gov/prod/cen2010/doc/sf1.pdf#page=18



On Mon, Aug 3, 2015 at 11:55 AM, Keith S Weintraub  wrote:

> Folks,
>
> I am using the UScensus2010 package and I am trying to figure out the
> number of households per census block.
>
> There are a number of possible data downloads in the package but
> apparently I am not smart enough to figure out which data-set is
> appropriate and what functions to use.
>
> Any help or pointers or links would be greatly appreciated.
>
> Thanks for your time,
> Best,
> KW
>
> __
> R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] minimal reproducible read.fwf() example that crashes the console on windows 8 with 32-bit R

2015-08-15 Thread Anthony Damico
hi, if i copy and paste this (pretty straightforward) code into R 3.2.2's
32-bit console, the program dies.  if i use 64-bit R, the console doesn't
die, but the process ends with a weird line-ending warning.  i'm under the
impression that if the console crashes, it's a bug?  but i wanted to check
with r-help that i'm not doing something silly before filing a formal bug
report..

i get the same crash using 3.2.1 but do not need setInternet2( FALSE )

if i use 3.2.2 with setInternet2( TRUE ) then the download throws an
internet connectivity error (but the console does not crash)

thanks!






sessionInfo()
# R version 3.2.2 (2015-08-14)
# Platform: i386-w64-mingw32/i386 (32-bit)
# Running under: Windows 8 x64 (build 9200)

# locale:
# [1] LC_COLLATE=English_United States.1252
# [2] LC_CTYPE=English_United States.1252
# [3] LC_MONETARY=English_United States.1252
# [4] LC_NUMERIC=C
# [5] LC_TIME=English_United States.1252

# attached base packages:
# [1] stats graphics  grDevices utils datasets  methods   base

setInternet2( FALSE )

widths <- c(5, 2, -3, 2, 2, 1, 1, 1, 1, 1, 1, 5, -2, 2, 1, 1, 1, 2, 2,
1, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2,
2, 1, 1, 1, 1, 2, 2, 1, 1, 2, 2, 2, 2, -1, 1, 2, 1, 2, 1, 2,
2, 1, 1, 1, 5, 1, 1, 2, 2, 1, 2, 2, 1, 1, 2, 2, 2, 2, 2, 2, 1,
2, 1, 2, 2, 1, 1, 1, 5, 1, 1, 2, 2, 1, 2, 2, 1, 1, 2, 2, 2, 2,
2, 2, 1, 2, 1, 2, 2, 1, 1, 1, 5, 1, 1, 2, 2, 1, 1, 1, 5, 5, 5,
5, 5, 5, 1, 3, 5, 5, 3, 5, 5, 3, 5, 5, 3, 5, 5, 5, 1, 1, 1, 1,
1, 1, 1, 3, 4, 1, 1, 1, 2, 1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 1, 1,
2, 2, 2, 2, 2, 7, 7, 7, 7, 7, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
1, -2369, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8,
8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8,
8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8,
8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8,
8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8)

varnames <- c("SEQNUM", "RECTYPE", "PREG_NUM", "PREGTYPE", "NUMBIRTH",
"OUTCOME1",
"OUTCOME2", "OUTCOME3", "DELIVERY", "NEWFLAG", "B14MO", "B_15",
"B_16", "BOX7", "B_17", "B_18", "B_19", "B_20", "B_21", "B_22",
"B_23", "B_24", "B25A", "B25B", "B25C", "B25D", "B25E", "B25F",
"B_26", "B_27", "B_28", "B29A", "B29B", "B29C", "B29D", "B29E",
"B29F", "B29G", "B_30", "BOX8", "BLIVEBIR", "LASTPREG", "B12_1",
"B31LB_1", "B31OZ_1", "B32_1", "BOX10_1", "B33A_1", "B33B_1",
"B33C_1", "B33D_1", "B33E_1", "B33F_1", "B34_1", "B35_1", "B36_1",
"B37_1", "B38_1", "BOX11_1", "B39_1", "B40_1", "B41MO_1", "BOX12_1",
"B42_1", "B43_1", "B44_1", "B12_2", "B31LB_2", "B31OZ_2", "B32_2",
"BOX10_2", "B33A_2", "B33B_2", "B33C_2", "B33D_2", "B33E_2",
"B33F_2", "B34_2", "B35_2", "B36_2", "B37_2", "B38_2", "BOX11_2",
"B39_2", "B40_2", "B41MO_2", "BOX12_2", "B42_2", "B43_2", "B44_2",
"B12_3", "B31LB_3", "B31OZ_3", "B32_3", "BOX10_3", "B33A_3",
"B33B_3", "B33C_3", "B33D_3", "B33E_3", "B33F_3", "B34_3", "B35_3",
"B36_3", "B37_3", "B38_3", "BOX11_3", "B39_3", "B40_3", "B41MO_3",
"BOX12_3", "B42_3", "B43_3", "B44_3", "B_45", "B_46", "C12A",
"C13F1MO", "C13T1MO", "C13F2MO", "C13T2MO", "C13F3MO", "C13T3MO",
"C_14", "C15M1", "C16M1MO", "C17M1MO", "C15M2", "C16M2MO", "C17M2MO",
"C15M3", "C16M3MO", "C17M3MO", "C15M4", "C16M4MO", "C17M4MO",
"C18MO", "C_19", "C_20", "C_21", "C_22", "C_23", "C_24", "C_25",
"PRGLNGTH", "AGEPREG", "WANTWIFE", "WANTMAN", "OUTCOME", "YRPREG",
"FMAROUT", "LIVBABY1", "LIVBABY2", "LIVBABY3", "LOW1", "LOW2",
"LOW3", "PREGTEST", "PNCAREWK", "PNCARENO", "RACE", "CEND84",
"BIRTH071", "BIRTH072", "BIRTH073", "PREGNUM7", "PREGNUM8", "W_1",
"W_2", "W_3", "W_4", "W_5", "FLAG341", "FLAG372", "FLAG373",
"FLAG374", "FLAG375", "FLAG376", "FLAG426", "FLAG427", "FLAG614",
"FLAG621", "FLAG991", "FLAG992", "REPWGT1", "REPWGT2", "REPWGT3",
"REPWGT4", "REPWGT5", "REPWGT6", "REPWGT7", "REPWGT8", "REPWGT9",
"REPWGT10", "REPWGT11", "REPWGT12", "REPWGT13", "REPWGT14", "REPWGT15",
"REPWGT16", "REPWGT17", "REPWGT18", "REPWGT19", "REPWGT20", "REPWGT21",
"REPWGT22", "REPWGT23", "REPWGT24", "REPWGT25", "REPWGT26", "REPWGT27",
"REPWGT28", "REPWGT29", "REPWGT30", "REPWGT31", "REPWGT32", "REPWGT33",
"REPWGT34", "REPWGT35", "REPWGT36", "REPWGT37", "REPWGT38", "REPWGT39",
"REPWGT40", "REPWGT41", "REPWGT42", "REPWGT43", "REPWGT44", "REPWGT45",
"REPWGT46", "REPWGT47", "REPWGT48", "REPWGT49", "REPWGT50", "REPWGT51",
"REPWGT52", "REPWGT53", "REPWGT54", "REPWGT55", "REPWGT56", "REPWGT57",
"REPWGT58", "REPWGT59", "REPWGT60", "REPWGT61", "REPWGT62", "REPWGT63",
"REPWGT64", "REPWGT65", "REPWGT66", "REPWGT67", "REPWGT68", "REPWGT69",
"REPWGT70", "REPWGT71", "REPWGT72", "REPWGT73", "REPWGT74", "REPWGT75",
"REPWGT76", "REPWGT77", "REPWGT78", "REPWGT79", "REPWGT80", "REPWGT81",
"REPWGT82", "REPWGT83", "REPWGT84", "REPWGT85", "REPWGT86", "REPWGT87",
"REPWGT88", "REPWGT89", "REPWGT90", "REPWGT91", "REPWGT92", "REPWGT93",
"REPWGT94", "REPWGT95", "REPWGT96", "REPWGT97", "REPWGT98", "REPWGT99",
"REPWGT100")

x <-
read.fwf(
file = "
ftp://ftp.cdc.gov/pub/Healt

Re: [R] Missing Data Imputation for Complex Survey Data

2014-12-12 Thread Anthony Damico
the mitools package is compatible with the survey package..  asdfree.com
has complete step-by-step R code examples to work with govt microdata.
here are the ones with multiply imputed survey data.  :)

national health interview survey
national survey of children's health
consumer expenditure survey
program of international student assessment
survey of consumer finances
survey of business owners
program for the international assessment of adult competencies

once you have the survey design constructed properly, you can just execute
the svyglm like this:

https://github.com/ajdamico/usgsd/blob/d300884bd63dd05c61e8a6fa76ed7293adae55c2/Consumer
Expenditure Survey/2011 fmly intrvw - analysis examples.R#659
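
the general mitools + survey pattern behind those examples is roughly this
(object and variable names below are just placeholders): an imputationList
goes into svydesign, and MIcombine() pools the five svyglm fits

library(survey)
library(mitools)
imp <- imputationList( list( imp1 , imp2 , imp3 , imp4 , imp5 ) )
des <- svydesign( id = ~psu , strata = ~strata , weights = ~wgt , data = imp , nest = TRUE )
fits <- with( des , svyglm( outcome ~ predictor1 + predictor2 ) )
summary( MIcombine( fits ) )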




On Fri, Dec 12, 2014 at 7:14 PM, N F  wrote:
>
> Dear all,
> I've got a bit of a challenge on my hands. I've got survey data produced by
> a government agency for which I want to use the person-weights in my
> analyses. This is best accomplished by specifying weights in {survey} and
> then calculating descriptive statistics/models through functions in that
> package.
>
> However, there is also missingness in this data that I'd like to handle
> with imputation via {mi}. To properly use imputed datasets in regression,
> they need to be pooled using the lm.mi function in {mi}. However, I can't
> figure out how to carry out a regression on data that is properly weighted
> that has also had its missing values imputed, because both packages use
> their own mutually incompatible data objects. Does anyone have any thoughts
> on this? I've done a lot of reading and I'm not really seeing anything on
> point.
>
> Thanks in advance!
>
> [[alternative HTML version deleted]]
>
> __
> R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] R example codes for direct standardization of rates (Reference: Thoma's Lumley's survey package)

2014-12-30 Thread Anthony Damico
hi pradip hope you're doing well!  these two scripts have age adjustment
calculations, but neither are specific to nhis.  the nhanes example is
probably closer to what you're trying to do :)


https://github.com/ajdamico/usgsd/blob/master/National%20Health%20and%20Nutrition%20Examination%20Survey/2009-2010%20interview%20plus%20laboratory%20-%20download%20and%20analyze.R

https://github.com/ajdamico/usgsd/blob/master/National%20Vital%20Statistics%20System/replicate%20age-adjusted%20death%20rate.R
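
the survey package also has svystandardize() for directly standardizing an
existing design -- a rough sketch where the variable names and the standard
population counts are only placeholders:

library(survey)
std_design <- svystandardize( design , by = ~ agegrp , over = ~ smoking ,
population = c( 55901 , 77670 , 72816 , 45364 ) )
svyby( ~ died , ~ smoking , std_design , svymean )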



On Tue, Dec 30, 2014 at 2:55 PM, Muhuri, Pradip (SAMHSA/CBHSQ) <
pradip.muh...@samhsa.hhs.gov> wrote:

> Hello,
>
> I am looking for R  example codes to compute age-standardized death rates
> by smoking and psychological distress status using person-years of
> observation created from the National Health Interview Survey Linked
> Mortality Files.  Any help with the example codes or references will be
> appreciated.
>
> Thanks,
>
> Pradip
>
> Pradip K. Muhuri
> SAMHSA/CBHSQ
> 1 Choke Cherry Road, Room 2-1071
> Rockville, MD 20857
> Tel: 240-276-1070
> Fax: 240-276-1260
>
>
> [[alternative HTML version deleted]]
>
> __
> R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] multiple imputed files

2015-01-26 Thread Anthony Damico
hi nate and annelies, the survey of consumer finances and consumer
expenditure survey folders both have examples of how to run a glm on
multiply-imputed survey data.. but these examples are specifically for
complex sample survey data, which might not be what you're working with.  :)

https://github.com/ajdamico/usgsd/search?utf8=%E2%9C%93&q=svyglm



On Mon, Jan 26, 2015 at 10:30 AM, N F  wrote:

> Hi,
> I think you want the {mitools} package.
> http://cran.r-project.org/web/packages/mitools/mitools.pdf. Anthony
> Damico's site, asdfree.com, has a lot of good code examples using various
> government datasets.
>
> Nate
>
> On Mon, Jan 26, 2015 at 5:23 AM, hnlki  wrote:
>
> > Dear,
> >
> > My dataset consists out of 5 imputed files (that I did not imputed
> myself).
> > Is was wondering what is the best way to analyse them in R. I am aware
> that
> > packages to perform multiple imputation (like Mice & Amelia) exist, but
> > they
> > are used to perform MI. As my data is already imputed, I would like to
> know
> > how I can split it and how I should obtain pooled regression results. If
> I
> > can use the existing MI packages, how should I define my imputation
> > variable?
> >
> > Kind regards,
> >
> >
> >
> >
> > --
> > View this message in context:
> > http://r.789695.n4.nabble.com/multiple-imputed-files-tp4702289.html
> > Sent from the R help mailing list archive at Nabble.com.
> >
> > __
> > R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
> > https://stat.ethz.ch/mailman/listinfo/r-help
> > PLEASE do read the posting guide
> > http://www.R-project.org/posting-guide.html
> > and provide commented, minimal, self-contained, reproducible code.
> >
>
> [[alternative HTML version deleted]]
>
> __
> R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] multiple imputed files

2015-01-26 Thread Anthony Damico
On Mon, Jan 26, 2015 at 2:13 PM, hnlki  wrote:

> Thank you for your answers.  In fact I am using the HFCS dataset,



cool, so survey data.  asdfree.com is a good place for examples.  also, if
you can match any officially-published statistics, i would love to
collaborate on a post with you  :)


> but I need
> 3-SLS and SUR. Does the code work with system.fit as well?
>

i am not sure if those techniques have been implemented for survey data
with R.  check

http://r-survey.r-forge.r-project.org/survey/

you might be able to do it without the survey package, but doing so will
incorrectly reduce your variance calculations (and make you more confident
of a result than you should be), i believe


> My  5 imputed files are stacked together in one dataset. In order to use
>  (mitools) I need several imputed datasets. Does that mean
> that I have to split my dataset first, in order to create the
> imputationList?
>
>
yes, follow any of the examples with "mitools" on the asdfree.com github
repository where i create five data.frame objects before making an
imputationList object
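
a minimal sketch of that split, assuming the stacked file is a data.frame
called `stacked` with a hypothetical column `implicate` (numbered 1 through 5)
marking each imputation, and `y`, `x1`, `x2` standing in for your analysis
variables -- a plain lm() is used here only to show the mechanics of pooling:

library(mitools)

# split the stacked file into a list of five data.frames, one per implicate
imp_list <- imputationList( split( stacked , stacked$implicate ) )

# fit the same model to every implicate..
fits <- with( imp_list , lm( y ~ x1 + x2 ) )

# ..then pool the coefficient estimates and variances across implicates
summary( MIcombine( fits ) )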

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Subsetting data with svyglm

2015-02-11 Thread Anthony Damico
hi brennan, survey design objects can be subsetted with the same subset()
syntax as data.frame objects, so following jeff's advice maybe you want

svyglm( formula , design = subset( surveydesign , variable %in% c( 'value
a' , 'value b' ) ) )

for some examples of how to construct a survey design with public use data,
see http://github.com/ajdamico/usgsd
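
here is a runnable sketch of the same pattern using the `api` example data
that ships with the survey package -- the variable names and values are just
stand-ins for whatever you are subsetting on:

library(survey)
data(api)

# one-stage cluster sample from the package's example data
dclus1 <- svydesign( id = ~dnum , weights = ~pw , data = apiclus1 , fpc = ~fpc )

# keep only elementary and high schools, then fit the model on that subset
fit <- svyglm( api00 ~ ell + meals , design = subset( dclus1 , stype %in% c( 'E' , 'H' ) ) )
summary( fit )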


On Wed, Feb 11, 2015 at 11:49 PM, Jeff Newmiller 
wrote:

> This seems like a fundamental  misunderstanding on your part of how
> operators, and in particular logical expressions, work in computer
> languages. Consider some examples:
>
> 1+2 has a numeric answer because 1 and 2 are both numeric.
> 1+"a" has at the very least not a numeric answer because the values on
> either side of the "+" sign are not both numeric.
> TRUE | FALSE  has a logical type of answer because both sides of the
> logical "or" operator are logical.
> However, you are expressing something like
> TRUE | "a string" which might mean something but that something generally
> is not a logical type of answer.
>
> Try
> variable=="value a" | variable=="value b"
> or
> variable %in% c( "value a", "value b" )
>
> You would probably find that the Introduction to R document that comes
> with R has some enlightening examples in it. You might also find Pat Burns'
> "The R Inferno" entertaining as well (search for it in your favorite search
> engine).
> ---
> Jeff NewmillerThe .   .  Go Live...
> DCN:Basics: ##.#.   ##.#.  Live
> Go...
>   Live:   OO#.. Dead: OO#..  Playing
> Research Engineer (Solar/BatteriesO.O#.   #.O#.  with
> /Software/Embedded Controllers)   .OO#.   .OO#.  rocks...1k
> ---
> Sent from my phone. Please excuse my brevity.
>
> On February 11, 2015 8:42:58 PM EST, Brennan O'Banion <
> brennan.oban...@gmail.com> wrote:
> >I am aware that it is possible to specify a subset with a single
> >logical operator when constructing a model, such as:
> >svyglm(formula, design=data, subset=variable=="value").
> >
> >What I can't figure out is how to specify a subset with two or more
> >logical operators:
> >svyglm(formula, design=data, subset=variable=="value a"|"value b").
> >
> >Is it possible to specify a subset in this way using *glm without
> >having to, in my case, subset the original data, create a survey
> >design, and then fit a model?
> >
> >__
> >R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
> >https://stat.ethz.ch/mailman/listinfo/r-help
> >PLEASE do read the posting guide
> >http://www.R-project.org/posting-guide.html
> >and provide commented, minimal, self-contained, reproducible code.
>
> __
> R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Error in svychisq and svyttest with svrepdesign

2015-03-10 Thread Anthony Damico
hi anabela, please provide a complete reproducible example.  you need to
use ?dput  -- we are not able to import "dadosSPSS.sav" so we cannot
recreate your problem in order to help you.  thanks!

http://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example



On Tue, Mar 10, 2015 at 9:31 AM, Anabela Afonso  wrote:

> Dear Forum,
>
> I’m working with a complex sample and with replication weights. I defined
> my design svrepdesign function. I’m trying to run svychisq and
> svyttest function
> from the survey package and I get the error:
>
>
>
> Error in crossprod(x, y) :
>
>   requires numeric/complex matrix/vector arguments
>
>
>
> I can’t understand this error. I kindly ask if someone can help me out.
>
>
>
> Thanks in advance,
>
>
>
>
> Here is my code and some output:
>
> > library(foreign); library(survey)
>
> > dados<-read.spss("dadosSPSS.sav", use.value.labels=T, to.data.frame=T)
>
> > class(dados)
>
> [1] "data.frame"
>
> > str(dados)
>
> 'data.frame':7624 obs. of  4 variables:
>
>  $ Sex : Factor w/ 2 levels "Male","Female": 1 1 1 1 1 1 1 1 1 1 ...
>
>  $ Computer: Factor w/ 2 levels "Yes","NO": 1 1 2 1 1 1 1 1 1 2 ...
>
>  $ Color   : Factor w/ 3 levels "Red","Green",..: 1 1 1 1 1 1 1 1 1 1 ...
>
>  $ Number  : num  2 1 0 2 1 2 1 2 1 0 ...
>
>  $ final.w : num  1267 596 1143 1069 542 ...
>
> # Note: Variable Color with NA
>
> > repdes<-svrepdesign(data=dados, repweights=rep.w, scale=1, rscales=r.sc,
> type="JKn", weights=~final.w, combined.weights=F)
>
> > summary(repdes)
>
> Call: svrepdesign.default(data = dados, repweights = rep.w, scale = 1,
>
> rscales = r.sc, type = "JKn", weights = ~final.w, combined.weights =
> F)
>
> Stratified cluster jackknife (JKn) with 428 replicates.
>
> Variables:
>
> [1] "Sex"  "Computer" "Color""Number"   "final.w"
>
>
>
> > svytable(~Sex+Computer, repdes)
>
> Computer
>
> SexYesNO
>
>   Male   1501598.7 1063055.3
>
>   Female 1485933.1  810557.9
>
>
>
> > svytable(~Sex+Color, repdes)  # NA are ignored
>
> Color
>
> SexRed GreenYellow
>
>   Male   2060708.5  219678.4  286038.6
>
>   Female 1840511.7  229763.8  22.0
>
>
>
> > svychisq(~Sex+Computer, repdes)
>
> Error in crossprod(x, y) :
>
>   requires numeric/complex matrix/vector arguments
>
>
>
> > svychisq(~Sex+Color, repdes)
>
> Error in crossprod(x, y) :
>
>   requires numeric/complex matrix/vector arguments
>
>
>
> > svyttest(Number ~Sex, repdes)
>
> Error in crossprod(x, y) :
>
>   requires numeric/complex matrix/vector arguments
>
> [[alternative HTML version deleted]]
>
> __
> R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

[R] iconv() replaces invalid characters with " " instead of " " (two spaces instead of one) on unix?

2015-03-14 Thread Anthony Damico
hello, i am trying to replace non-ASCII characters in a character string
with a single space.  the iconv() function works as i expect it to on
windows, but on unix, non-ASCII characters are getting replaced with two
spaces instead of one.  i suppose i could write a workaround for my code,
but i'm wondering if i'm making some other mistake?

in the output below, this is the result i'm getting:
[1] "cancelaci  n"

and this is the result i want:
[1] "cancelaci n"

thanks!!

=

> getOption( "encoding" )
[1] "windows-1252"

> a <- "cancelación"
> iconv(a,"","ASCII")
[1] NA
> iconv(a,"","ASCII",sub=" ")
[1] "cancelaci  n"

=

> sessionInfo()
R version 3.1.2 (2014-10-31)
Platform: x86_64-unknown-linux-gnu (64-bit)

locale:
 [1] LC_CTYPE=en_US.UTF-8   LC_NUMERIC=C
 [3] LC_TIME=en_US.UTF-8LC_COLLATE=en_US.UTF-8
 [5] LC_MONETARY=en_US.UTF-8LC_MESSAGES=en_US.UTF-8
 [7] LC_PAPER=en_US.UTF-8   LC_NAME=C
 [9] LC_ADDRESS=C   LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C

attached base packages:
[1] stats graphics  grDevices utils datasets  methods   base

other attached packages:
 [1] R.utils_1.34.0R.oo_1.18.0   R.methodsS3_1.6.1 descr_1.0.4
 [5] SAScii_1.0downloader_0.3foreign_0.8-61MonetDB.R_0.9.5
 [9] digest_0.6.6  DBI_0.3.1

loaded via a namespace (and not attached):
[1] xtable_1.7-4

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] not a Stata version 5-12 .dta file

2015-03-19 Thread Anthony Damico
hi nicole, i have published easy to reproduce, well-documented code to
download and then analyze every file from every wave of the world values
survey here.  the download automation script should solve your problem, or
at least work around it  :)

http://www.asdfree.com/search/label/world%20values%20survey%20%28wvs%29





On Thu, Mar 19, 2015 at 6:09 PM, Nicole Ford  wrote:

> the text didn’t send for the R data file errors.  sorry about that.
>
>
> 2015-03-19 17:41:38.544 R[398:5728] Unable to simultaneously satisfy
> constraints:
> (
> " H:|-(0)-[NSView:0x6032da20]   (Names: '|':FIFinderView:0x6080003612c0
> )>",
> " H:[NSView:0x6032da20]-(0)-|   (Names: '|':FIFinderView:0x6080003612c0
> )>",
> " H:|-(0)-[FIFinderView:0x6080003612c0]   (Names:
> '|':NSNavFinderViewFileBrowser:0x608000375fc0 )>",
> " H:[FIFinderView:0x6080003612c0]-(0)-|   (Names:
> '|':NSNavFinderViewFileBrowser:0x608000375fc0 )>",
> " H:[NSNavFinderViewFileBrowser:0x608000375fc0(585)]>",
> "  (Names: '|':NSView:0x6032da20 )>",
> "  (Names: '|':NSView:0x6032da20 )>",
> " H:[FILocationPopUp:0x603fda00(207)]>",
> " FILocationPopUp:0x603fda00.centerX>",
> " H:[FILocationPopUp:0x603fda00]-(>=10)-[SGTSearchField:0x603bf800]>",
> " H:[SGTSearchField:0x603bf800]-(11)-|   (Names:
> '|':NSView:0x60133ec0 )>",
> " H:[SGTSearchField:0x603bf800(>=218)]>"
> )
>
> Will attempt to recover by breaking constraint
>  H:[SGTSearchField:0x603bf800(>=218)]>
>
> Set the NSUserDefault
> NSConstraintBasedLayoutVisualizeMutuallyExclusiveConstraints to YES to have
> -[NSWindow visualizeConstraints:] automatically called when this happens.
> And/or, break on objc_exception_throw to catch this in the debugger.
> > str(dat)
>  chr [1:2] ".Traceback" "WV3_Data_rdata_v_2014_09_21"
> > ls(dat)
> Error in as.environment(pos) :
>   no item called ".Traceback" on the search list
> > dat
> [1] ".Traceback"  "WV3_Data_rdata_v_2014_09_21"
> > sessionInfo()
> R version 3.1.3 (2015-03-09)
> Platform: x86_64-apple-darwin13.4.0 (64-bit)
> Running under: OS X 10.10.1 (Yosemite)
>
> locale:
> [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
>
> attached base packages:
> [1] stats graphics  grDevices utils datasets  methods   base
>
> other attached packages:
>  [1] readstata13_0.5-3 effects_3.0-3 lattice_0.20-30   lme4_1.1-7
>   Rcpp_0.11.2   Matrix_1.1-5  runjags_1.2.1-0   sm_2.2-5.4
>  [9] foreign_0.8-63car_2.0-25
>
> loaded via a namespace (and not attached):
>  [1] coda_0.16-1  colorspace_1.2-4 grid_3.1.3   MASS_7.3-39
> mgcv_1.8-4   minqa_1.2.3  nlme_3.1-120 nloptr_1.0.0
>  nnet_7.3-9
> [10] parallel_3.1.3   pbkrtest_0.4-2   quantreg_5.11SparseM_1.6
> splines_3.1.3tools_3.1.3
> >
>
>
>
>
> > On Mar 19, 2015, at 5:08 PM, Nicole Ford  wrote:
> >
> > Ista,
> >
> > I am pulling multiple countries and multiple waves, but here is one
> country in one wave.  I know if I can get one to work, I can get them all
> to work.  I have used WVS data in the past and never encountered any
> issues, so I am at a loss here.  Thanks again!
> >
> > http://www.worldvaluessurvey.org/WVSDocumentationWV3.jsp <
> http://www.worldvaluessurvey.org/WVSDocumentationWV3.jsp>
> >
> >
> > ~Nicole
> >
> >
> >
> >
> >
> >
> >> On Mar 19, 2015, at 4:59 PM, Ista Zahn  istaz...@gmail.com>> wrote:
> >>
> >> Is the file publicly available? What is the URL?
> >>
> >> Best,
> >> Ista
> >>
> >> On Thu, Mar 19, 2015 at 4:46 PM, Nicole Ford  > wrote:
> >>> Hello, Ista.
> >>>
> >>> Honestly, I am uncertain.  I don't have STATA -- I downloaded this
> from the data source website.
> >>>
> >>> I can't imagine it is 13 because the data are old (2006).
> >>>
> >>> I tried package readstata13 out of desperation, but didn't think it
> would resolve.
> >>>
> >>> Thanks for the suggestion!
> >>>
> >>> Sent from my iPhone
> >>>
>  On Mar 19, 2015, at 4:41 PM, Ista Zahn  istaz...@gmail.com>> wrote:
> 
>  Hi Nicole,
> 
>  Is it a stata 13 data file? If so your best bet is to open it in Stata
>  and use the "saveold" command to save it as a stata 12 file.
> 
>  Best,
>  Ista
> 
> > On Thu, Mar 19, 2015 at 3:00 PM, Nicole Ford  > wrote:
> > Hello,
> >
> > I recently updated to the newest version of R and I am encountering
> issues.  Please find my error and session info below.  My data are
> attached.  I have tried the readstata13 package just in case to no avail.
> Unless I am missing something, google isn’t helping.
> >
> >
> >
> > Thank you in advance.
> >
> >
> >
> > error:
> >> dat <- read.dta(file.choose())
> > Error in read.dta(file.choose()) : not a Stata version 5-12 .dta file
> > 2015-03-19 14:14:21.445 R[398:5728] Unable to simultaneously satisfy
> constraints:
> > (
> >   " H:|-(0)-[NS

Re: [R] not a Stata version 5-12 .dta file

2015-03-19 Thread Anthony Damico
doesn't just running this solve your problem?

https://github.com/ajdamico/usgsd/blob/master/World%20Values%20Survey/download%20all%20microdata.R


On Fri, Mar 20, 2015 at 12:04 AM, Nicole Ford  wrote:

> Anthony,
>
> Thanks for this.  The issue I am having is WVS didn’t save all of their
> stata files, it seems, as .dta.  Further, the .rdata files are not loading
> correctly, either, giving me .Traceback or crashes R when I try to source
> it.  I will poke around your link to see if it can provide any insight.
>
> ~n
>
>
>
>
>
>
>
> On Mar 19, 2015, at 9:09 PM, Anthony Damico  wrote:
>
> hi nicole, i have published easy to reproduce, well-documented code to
> download and then analyze every file from every wave of the world values
> survey here.  the download automation script should solve your problem, or
> at least work around it  :)
>
> http://www.asdfree.com/search/label/world%20values%20survey%20%28wvs%29
>
>
>
>
>
> On Thu, Mar 19, 2015 at 6:09 PM, Nicole Ford  wrote:
>
>> the text didn’t send for the R data file errors.  sorry about that.
>>
>>
>> 2015-03-19 17:41:38.544 R[398:5728] Unable to simultaneously satisfy
>> constraints:
>> (
>> "> H:|-(0)-[NSView:0x6032da20]   (Names: '|':FIFinderView:0x6080003612c0
>> )>",
>> "> H:[NSView:0x6032da20]-(0)-|   (Names: '|':FIFinderView:0x6080003612c0
>> )>",
>> "> H:|-(0)-[FIFinderView:0x6080003612c0]   (Names:
>> '|':NSNavFinderViewFileBrowser:0x608000375fc0 )>",
>> "> H:[FIFinderView:0x6080003612c0]-(0)-|   (Names:
>> '|':NSNavFinderViewFileBrowser:0x608000375fc0 )>",
>> "> H:[NSNavFinderViewFileBrowser:0x608000375fc0(585)]>",
>> ">  (Names: '|':NSView:0x6032da20 )>",
>> ">  (Names: '|':NSView:0x6032da20 )>",
>> "> H:[FILocationPopUp:0x603fda00(207)]>",
>> "> FILocationPopUp:0x603fda00.centerX>",
>> "> H:[FILocationPopUp:0x603fda00]-(>=10)-[SGTSearchField:0x603bf800]>",
>> "> H:[SGTSearchField:0x603bf800]-(11)-|   (Names:
>> '|':NSView:0x60133ec0 )>",
>> "> H:[SGTSearchField:0x603bf800(>=218)]>"
>> )
>>
>> Will attempt to recover by breaking constraint
>> > H:[SGTSearchField:0x603bf800(>=218)]>
>>
>> Set the NSUserDefault
>> NSConstraintBasedLayoutVisualizeMutuallyExclusiveConstraints to YES to have
>> -[NSWindow visualizeConstraints:] automatically called when this happens.
>> And/or, break on objc_exception_throw to catch this in the debugger.
>> > str(dat)
>>  chr [1:2] ".Traceback" "WV3_Data_rdata_v_2014_09_21"
>> > ls(dat)
>> Error in as.environment(pos) :
>>   no item called ".Traceback" on the search list
>> > dat
>> [1] ".Traceback"  "WV3_Data_rdata_v_2014_09_21"
>> > sessionInfo()
>> R version 3.1.3 (2015-03-09)
>> Platform: x86_64-apple-darwin13.4.0 (64-bit)
>> Running under: OS X 10.10.1 (Yosemite)
>>
>> locale:
>> [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
>>
>> attached base packages:
>> [1] stats graphics  grDevices utils datasets  methods   base
>>
>> other attached packages:
>>  [1] readstata13_0.5-3 effects_3.0-3 lattice_0.20-30   lme4_1.1-7
>> Rcpp_0.11.2   Matrix_1.1-5  runjags_1.2.1-0   sm_2.2-5.4
>>  [9] foreign_0.8-63car_2.0-25
>>
>> loaded via a namespace (and not attached):
>>  [1] coda_0.16-1  colorspace_1.2-4 grid_3.1.3   MASS_7.3-39
>> mgcv_1.8-4   minqa_1.2.3  nlme_3.1-120 nloptr_1.0.0
>>  nnet_7.3-9
>> [10] parallel_3.1.3   pbkrtest_0.4-2   quantreg_5.11SparseM_1.6
>> splines_3.1.3tools_3.1.3
>> >
>>
>>
>>
>>
>> > On Mar 19, 2015, at 5:08 PM, Nicole Ford  wrote:
>> >
>> > Ista,
>> >
>> > I am pulling multiple countries and multiple waves, but here is one
>> country in one wave.  I know if I can get one to work, I can get them all
>> to work.  I have used WVS data in the past and never encountered any
>> issues, so I am at a loss here.  Thanks again!
>> >
>> > http://www.worldvaluessurvey.org/WVSDocumentationWV3.jsp<
>> http://www.worldvaluessurvey.org/WVSDocumentationWV3.jsp>
>> >
>> >
>>

Re: [R] Having trouble with gdata read in

2015-03-25 Thread Anthony Damico
maybe

library(xlsx)
tf <- tempfile()
ami <- "http://www.ferc.gov/industries/electric/indus-act/demand-response/2008/survey/ami_survey_responses.xls"
download.file( ami , tf , mode = 'wb' )
ami.data2008 <- read.xlsx( tf , sheetIndex = 1 )





On Wed, Mar 25, 2015 at 5:01 PM, Benjamin Baker  wrote:

> Trying to read and clean up the FERC data on Advanced Metering
> infrastructure. Of course it is in XLS for the first two survey years and
> then converts to XLSX for the final two. Bad enough that it is all in
> excel, they had to change the survey design and data format as well. Still,
> I’m sorting through it. However, when I try and read in the 2008 data, I’m
> getting this error:
> ###
> Wide character in print at
> /Library/Frameworks/R.framework/Versions/3.1/Resources/library/gdata/perl/
> xls2csv.pl line 270.
> Warning message:
> In scan(file, what, nmax, sep, dec, quote, skip, nlines, na.strings,  :
>   EOF within quoted string
> ###
>
>
>
> Here is the code I’m running to get the data:
> ###
> install.packages("gdata")
> library("gdata")
> fileUrl <- "
> http://www.ferc.gov/industries/electric/indus-act/demand-response/2008/survey/ami_survey_responses.xls
> "
> download.file(fileUrl, destfile="./ami.data/ami-data2008.xls")
> list.files("ami.data")
> dateDown.2008 <- date()
> ami.data2008 <- read.xls("./ami.data/ami-data2008.xls", sheet=1,
> header=TRUE)
> ###
>
>
> Reviewed the data in the XLS file, and both “” and # are present within
> it. Don’t know how to get the read.xls to ignore them so I can read all the
> data into my data frame. Tried :
> ###
> ami.data2008 <- read.xls("./ami.data/ami-data2008.xls", sheet=1, quote="",
> header=TRUE)
> ###
>
>
> And it spits out “More columns than column names” output.
>
>
> Been searching this, and I can find some “solutions” for read.table, but
> nothing specific to read.xls
>
>
> Many thanks,
>
>
> Benjamin Baker
>
>
>
> —
> Sent from Mailbox
> [[alternative HTML version deleted]]
>
> __
> R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] Pseudo R squared for quantile regression with replicates

2014-09-18 Thread Anthony Damico
here is a reproducible example, mostly from ?withReplicates.  i think
something would have to be done using return.replicates=TRUE to manually
compute survey-adjusted residuals, but i'm not really sure what nor whether
the pseudo r^2 would be meaningful  :/


library(survey)
library(quantreg)

data(api)

## one-stage cluster sample
dclus1<-svydesign(id=~dnum, weights=~pw, data=apiclus1, fpc=~fpc)

## convert to bootstrap
bclus1<-as.svrepdesign(dclus1,type="bootstrap", replicates=100)

## median regression
fit <- withReplicates(bclus1, quote(coef(rq(api00~api99, tau=0.5,
weights=.weights, method="fn"))))

# # # no longer from ?withReplicates # # #
# from https://stat.ethz.ch/pipermail/r-help/2006-August/110386.html
rho <- function(u,tau=.5)u*(tau - (u < 0))

V <- sum(rho(fit$resid, fit$tau)) # # breaks
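
for reference, a minimal sketch of that koenker-machado pseudo r-squared on an
ordinary (unweighted) rq() fit, using the engel data shipped with quantreg --
the open question above is how to do the same thing with survey-adjusted
replicate residuals:

library(quantreg)
data(engel)

tau <- 0.5
rho <- function( u , tau = .5 ) u * ( tau - ( u < 0 ) )

f1 <- rq( foodexp ~ income , tau = tau , data = engel )   # full model
f0 <- rq( foodexp ~ 1 , tau = tau , data = engel )        # intercept-only model

V1 <- sum( rho( resid( f1 ) , tau ) )
V0 <- sum( rho( resid( f0 ) , tau ) )

1 - V1 / V0    # pseudo r-squared at this tau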


On Thu, Sep 18, 2014 at 1:55 PM, David L Carlson  wrote:

> It is hard to say because we do not have enough information. R has
> approximately 6,000 packages and you have not told us which ones you are
> using. You have not told us much about your data and you have not told us
> where to find the query from August 2006. The basic problem is that your
> "fit" is not the same as the "f" in the query. Your fit object is not very
> complicated. If you look at the output from str(fit) you will see that fit
> is an "atomic" vector (note the wording in your error message) with a
> series of attributes that are probably documented in the help pages for the
> functions you are using. There is nothing called resid inside fit. It is
> likely that the post you are looking at refers to the output from rq(...)
> or perhaps predict(rq(...)), but not the output from withReplicates(...,
> quote(coef(rq(... which is what fit is.
>
> -
> David L Carlson
> Department of Anthropology
> Texas A&M University
> College Station, TX 77840-4352
>
> -Original Message-
> From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org]
> On Behalf Of Donia Smaali Bouhlila
> Sent: Thursday, September 18, 2014 9:54 AM
> To: r-help@r-project.org
> Subject: [R] Pseudo R squared for quantile regression with replicates
>
> Hi,
>
>
> I am a new user of r software. I intend to do quantile regressions with
> complex survey data using replicate method. I have ran the following
> commands successfully:
>
>
>   mydesign
>
> <-svydesign(ids=~IDSCHOOL,strata=~IDSTRATE,data=TUN,nest=TRUE,weights=~TOTWGT)
> bootdesign <- as.svrepdesign(mydesign,type="auto",replicates=150)
>
>   fit <-
>     withReplicates(bootdesign, quote(coef(rq(Math1~Female+Age+calculator+computer+desk+
>       dictionary+internet+work+Book2+Book3+Book4+Book5+Pedu1+Pedu2+Pedu3+Pedu4+Born1+Born2,
>       tau=0.5, weights=.weights, method="fn"))))
>
>
>
>
> I want to get the pseudo R squared but I failed. I read a query dating from
> August 2006, [R] Pseudo R for Quant Reg and the answer to it:
>
>
> rho <- function(u,tau=.5)u*(tau - (u < 0))
>   V <- sum(rho(f$resid, f$tau))
>
>
>   I copied it and pasted it, replacing f by fit, and I get this error message:
> Error in fit$resid : $ operator is invalid for atomic vectors, I don't
> know what it means
>
> The fit object is likely to be quite complicated.  I used str() to see
> what it looks like:
>
>
>
> str (fit)
> Class 'svrepstat'  atomic [1:19] 713.24 -24.01 -18.37 9.05 7.71 ...
>..- attr(*, "var")= num [1:19, 1:19] 2839.3 10.2 -122.1 -332.4 -42.3
> ...
>.. ..- attr(*, "dimnames")=List of 2
>.. .. ..$ : chr [1:19] "(Intercept)" "Female" "Age" "calculator" ...
>.. .. ..$ : chr [1:19] "(Intercept)" "Female" "Age" "calculator" ...
>.. ..- attr(*, "means")= Named num [1:19] 710.97 -24.03 -18.3 9.39
> 7.58 ...
>.. .. ..- attr(*, "names")= chr [1:19] "(Intercept)" "Female" "Age"
> "calculator" ...
>..- attr(*, "statistic")= chr "theta"
>
> How can I retrieve the residuals?? and calculate the pseudo R squared??
>
>
> Any help please
>
>
> --
> Dr. Donia Smaali Bouhlila
> Associate-Professor
> Department of Economics
> Faculté des Sciences Economiques et de Gestion de Tunis
>
> __
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
> __
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Using "survey" package with ACS PUMS

2014-09-30 Thread Anthony Damico
hi michael, you probably need

options( "survey.replicates.mse" = TRUE )


i also think you don't want type = "Fay" and you do want scale = 4/80 and
rscales = rep( 1 , 80 )  as well, but what you have might be equivalent
(not sure)


regardless, this blog post details how to precisely replicate the census
bureau's estimates using the acs pums with R

http://www.asdfree.com/search/label/american%20community%20survey%20%28acs%29
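
roughly, that svrepdesign() call would look like the sketch below (assuming
the same `pums_p` data.frame built in your code -- treat this as a starting
point and check it against the blog post rather than as a definitive answer):

library(survey)

# nb: check names(pums_p) first -- in the pums csv the main weight is PWGTP and
# the replicate weights are pwgtp1 through pwgtp80, but capitalization can vary
pums_p.rep <-
    svrepdesign(
        weights = ~PWGTP ,
        repweights = 'pwgtp[0-9]+' ,   # regular expression matching the eighty replicate-weight columns
        scale = 4 / 80 ,
        rscales = rep( 1 , 80 ) ,
        type = 'JK1' ,
        mse = TRUE ,                   # same effect as options( survey.replicates.mse = TRUE )
        combined.weights = TRUE ,
        data = pums_p
    )

svytotal( ~SEX , pums_p.rep )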









On Tue, Sep 30, 2014 at 9:17 AM, 
wrote:

>
> I'm trying to reproduce some results from the American Community Survey
> PUMS data using the "survey" package. I'm using the one-year 2012 estimates
> for New Hampshire
> (http://www2.census.gov/acs2012_1yr/pums/csv_pnh.zip) and comparing to the
> estimates for user verification from
>
> http://www.census.gov/acs/www/Downloads/data_documentation/pums/Estimates/pums_estimates_12.csv
>
> Once the age groups are set up as specified in the verification estimates,
> the following SAS code produces the correct estimated totals with standard
> errors:
>
> proc surveyfreq data = acs2012 varmethod = jackknife;
>   weight pwgtp;
>   repweights pwgtp1 -- pwgtp80 / jkcoefs = 0.05;
>   table SEX agegroup;
> run;
>
> I've not been successful in reproducing the standard errors with R,
> although they are very close. My code follows; what revisions do I need to
> make?
>
> Thanks,
> Mike L.
>
> # load estimates for verification
> pums_est <- read.csv("pums_estimates_12.csv")
> pums_est[,4] <- as.integer(gsub(",", "", pums_est[,4]))
>
> # load PUMS data
> pums_p <- read.csv("ss12pnh.csv")
> # convert sex and age group to factors
> pums_p$SEX <- factor(pums_p$SEX, labels = c("M","F"))
> pums_p$agegrp <- cut(pums_p$AGEP,
>  c(0,5,10,15,20,25,35,45,55,60,65,75,85,100),
>  right = FALSE)
>
> # create replicate-weight survey object
> library(survey)
> pums_p.rep <- svrepdesign(repweights = pums_p[207:286],
>   weights = pums_p[,7],
>   combined.weights = TRUE,
>   type = "Fay", rho = 1 - 1/sqrt(4),
>   scale = 1, rscales = 1,
>   data = pums_p)
>
> # using type "JK1" with scale = 4/80 and rscales = rep(1,80)
> #   seems to produce the same results
>
> # total population by sex with SE's
> by.sex <- svyby(~SEX, ~ST, pums_p.rep, svytotal, na.rm = TRUE)
> round(by.sex[1,4:5])
> # se1  se2
> # 33 1606 1606
> # compare results with Census
> pums_est[966:967, 5]
> #[1] 1610 1610
>
> # total population by age group with SE's
> by.agegrp <- svyby(~agegrp, ~ST, pums_p.rep, svytotal, na.rm = TRUE)
> round((by.agegrp)[15:27])
> #   se1  se2  se3  se4  se5  se6  se7  se8  se9 se10 se11 se12 se13
> #33 874 2571 2613 1463 1398 1475 1492 1552 2191 2200  880 1700 1678
> # compare results with Census
> pums_est[968:980, 5]
> #  [1]  874 2578 2613 1463 1399 1476 1493 1555 2191 2200  880 1702 1684
>
> __
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] vcov function and cross terms

2014-10-15 Thread Anthony Damico
it might be slightly different, but i think the result is very close to a
tsl result (which hasn't been implemented)..  could you use this?

mns<-svyby(~api00+api99, ~stype, rclus1, svytotal,covmat=TRUE)
vcov(mns)
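
filling in the setup lines from the manual (quoted below) so that snippet runs
on its own:

library(survey)
data(api)
dclus1 <- svydesign( id = ~dnum , weights = ~pw , data = apiclus1 , fpc = ~fpc )
rclus1 <- as.svrepdesign( dclus1 )

# covmat=TRUE keeps the replicates, so vcov() can return the full matrix
# (including the cross terms) rather than only the diagonal
mns <- svyby( ~api00 + api99 , ~stype , rclus1 , svytotal , covmat = TRUE )
vcov( mns )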




On Wed, Oct 15, 2014 at 9:27 AM, Daniela Droguett <
daniela.droguett.l...@gmail.com> wrote:

> Hi,
>
> I would like to apply the vcov function from the survey package for the
> variables api00 and api99 grouped by the stype variable which can assume H,
> M and E categories.
>
> ​From the code in the survey package manual:​
>
> ​data(api)
> dclus1<-svydesign(id=~dnum, weights=~pw, data=apiclus1, fpc=~fpc)
> rclus1<-as.svrepdesign(dclus1)
> mns<-svyby(~api00, ~stype, rclus1, svymean,covmat=TRUE)
> vcov(mns)
>
> I have tried the following changes in order to get the variance matrix
> estimation (as part of Taylor Linearization).
>
> > data(api)
> > dclus1<-svydesign(id=~dnum, weights=~pw, data=apiclus1, fpc=~fpc)
>
> ​> mns<-svyby(~api00+api99, ~stype, dclus1, svytotal)
> > mns
>   stype api00 api99 se.api00 se.api99
> E E 3162561.8 2962356.8 842713.7 796474.0
> H H  293115.0  282283.9 104059.8 101492.9
> M M  534308.7  514982.0 108710.7 105036.6
> > vcov(mns)
>  E:api00 H:api00 M:api00  E:api99 H:api99
> M:api99
> E:api00 710166313660   0   00   0
> 0
> H:api000 10828434454   00   0
> 0
> M:api000   0 118180068320   0
> 0
> E:api990   0   0 634370797647   0
> 0
> H:api990   0   00 10300818294
> 0
> M:api990   0   00   0
> 11032691751
> Warning message:
> In vcov.svyby(mns) : Only diagonal elements of vcov() available
>
> How to obtain the cross terms in the matrix above?
>
> I have no clue on how to implement that.
>
> Thanks a lot!
>
> [[alternative HTML version deleted]]
>
> __
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] "survey" package -- doesn't appear to match svy

2014-10-28 Thread Anthony Damico
could you provide a minimal reproducible example?  perhaps use ?dput.


in general the survey package matches all other languages
http://journal.r-project.org/archive/2009-2/RJournal_2009-2_Damico.pdf


here's an example of a minimal reproducible example that does match
http://www.ats.ucla.edu/stat/stata/faq/svy_stata_subpop.htm


library(foreign)
library(survey)
x <- read.dta( "http://www.ats.ucla.edu/stat/stata/seminars/svy_stata_intro/strsrs.dta" )
y <- svydesign( ~ 1 , strat = ~strat , data = x , weights = ~ pw , fpc = ~fpc )
svytable( ~ yr_rnd , y )
z <- subset( y , yr_rnd == 1 )
svymean( ~ ell , z )



there are lots of examples of R matching official published statistics
(often computed with stata) here--
https://github.com/ajdamico/usgsd/search?utf8=%E2%9C%93&q=replication





On Mon, Oct 27, 2014 at 11:11 AM, Stephen Amrock 
wrote:

> Hi,
>
> I'm new to R and have encountered two issues in coding using the "survey"
> package:
>
> (1) Code from *svytable* using "survey" package does not correspond to
> Stata estimates from *svy: tab*. I call
>
> svyd.nation <- svydesign(ids = ~1, probs = ~wt_national, strata =
> ~stratum, data=nats.sub)
> svytable(formula = ~wpev, design = svyd.nation, Ntotal = 100)
>
> where the equivalent Stata would be:
>
> svyset [pw=wt_national], strata(stratum)
> svy: tab wpev
>
> These are inverse probability weights but Stata and R give different %s for
> the tabulations. I know from published results that the Stata code is
> correct.  Any ideas as to what I've done incorrectly in R?
>
> (2) Alternative weights---which I've verified in R have the same
> distribution as in Stata---produce a strange "inf" output.
>
> svyd.st <- svydesign(ids = ~1, probs = ~wt_state, strata = ~stratum,
> data=nats.sub)
> svytable(~wpev, design = svyd.st, Ntotal=100)
>
> This produces the following output:
>
> wpev
> 0 1
>
> The code
> svytable(~wpev, svyd.st)
>
> produces the output:
>
> wpev
> 0   1
> Inf Inf
>
> Why are these alternative weights -- which work in Stata when resetting
> svyset -- not working in R?
>
> Any insights would be much appreciated!! Many thanks!
> - S
>
> [[alternative HTML version deleted]]
>
> __
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] tapply error svyby function "survey" package

2014-11-12 Thread Anthony Damico
try resetting your levels?  if that doesn't work, please dput() an example
data set that we can test with :) thanks!

sii.design <- update( sii.design , d7 = factor( d7 ) )






On Wed, Nov 12, 2014 at 7:59 AM, Martin Canon 
wrote:

> Hi.
>
>
> I'm trying to calculate the weighted mean score of a quality of life
> measure (ovt) in patients with irritable bowel syndrome by their
> marital status (d7).
>
> This is a summary of the structure of the dataset:
>
> > str(sii.tesis)
> 'data.frame':1063 obs. of  75 variables:
>  $ id : int  51 52 53 54 55 56 57 58 59 60 ...
>  $ stratum: Factor w/ 6 levels "MEst","MAcad",..: 1 4 NA 4 4 1 6 NA 4
> 4 ...
>  $ expfc  : num  22.8 17.1 NA 17.1 17.1 ...
>  $ d6 : Factor w/ 3 levels "Estudiante","Profesor",..: 1 1 NA
> 1 1 1 3 NA 1 1 ...
>  $ d7 : Factor w/ 6 levels "Soltero","Casado",..: 1 1 NA 1 1 1
> 1 NA 1 1 ...
>  $ d7c: Factor w/ 2 levels "No estable","Estable": 1 1 NA 1 1
> 1 1 NA 1 1 ...
>  $ s1cm   : Factor w/ 2 levels "No","Si": 1 2 NA 1 1 1 2 NA 1 1 ...
>  $ ovt: num  NA 93.4 NA NA NA ...
>
> I declared the sampling design:
>
> > sii.design <- svydesign(
>   id = ~1,
>   strata = ~stratum,
>   weights = ~expfc,
>   data = subset(sii.tesis, !is.na(stratum)))
>
> Then I tried to get the result:
>
> > svyby(~ovt, ~d7, sii.design, svymean, na.rm = TRUE, level = 0.95)
>
> but i get the error:
>
> Error in tapply(1:NROW(x), list(factor(strata)), function(index) { :
>   arguments must have same length
>
>
> The length of both variables is the same. If the variable ovt exists,
> there is a d7 match in the data frame.
>
> I try the same thing using another variable instead - "role" (d6) -
> and it works.
>
> > svyby(~ovt, ~d6, sii.design, svymean, na.rm = TRUE, level = 0.95)
>d6  ovt   se
> Estudiante Estudiante 71.01805 1.370569
> Profesor Profesor 72.30923 6.518378
> Administrativo Administrativo 75.69102 3.715050
>
> If I use the recategorized d7 variable (d7c,  two levels only) it works
> too:
>
> > svyby(~ovt, ~d7c, sii.design, svymean, na.rm = TRUE, level = 0.95)
>   d7c  ovt  se
> No estable No estable 70.92344 1.37460
> Estable   Estable 74.53719 4.16954
>
>
> What could be the problem?
>
>
> Regards.
>
>
> Martin Canon
> Colombia, South America
>
> __
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] tapply error svyby function "survey" package

2014-11-12 Thread Anthony Damico
hi martin, sending the first 25 rows does not help if it does not re-create
the problem..  when i run the data you have provided, i do not encounter
your problem (see below).  someone else may be able to guess the issue, but
this would be a lot easier to solve if you can create a minimal
reproducible example

http://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example


sii.tesis <-
structure(list(id = c(51L, 52L, 53L, 54L, 55L, 56L, 57L, 58L,
59L, 60L, 61L, 62L, 63L, 64L, 65L, 66L, 67L, 68L, 69L, 70L, 71L,
73L, 74L, 75L, 76L), stratum = structure(c(1L, 4L, NA, 4L, 4L,
1L, 6L, NA, 4L, 4L, 1L, 1L, 1L, 6L, 6L, 3L, 3L, 6L, NA, 1L, 1L,
6L, 4L, 3L, 6L), .Label = c("MEst", "MAcad", "MAdm", "FEst",
"FAcad", "FAdm"), class = "factor"), expfc = c(22.8195266723633,
17.0644626617432, NA, 17.0644626617432, 17.0644626617432, 22.8195266723633,
5.1702127456665, NA, 17.0644626617432, 17.0644626617432, 22.8195266723633,
22.8195266723633, 22.8195266723633, 5.1702127456665, 5.1702127456665,
6.24137926101685, 6.24137926101685, 5.1702127456665, NA, 22.8195266723633,
22.8195266723633, 5.1702127456665, 17.0644626617432, 6.24137926101685,
5.1702127456665), d7 = structure(c(1L, 1L, NA, 1L, 1L, 1L, 1L,
NA, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, NA, 1L, 1L, 6L, 1L,
6L, 6L), .Label = c("Soltero", "Casado", "Separado", "Divorciado",
"Viudo", "Union libre"), class = "factor"), ovt = c(NA, 93.3823547363281,
NA, NA, NA, NA, 83.8235321044922, NA, NA, NA, NA, NA, NA, NA,
79.4117660522461, NA, NA, 19.1176471710205, NA, NA, NA, 85.2941207885742,
NA, NA, NA)), .Names = c("id", "stratum", "expfc", "d7", "ovt"
), row.names = c("1", "2", "3", "4", "5", "6", "7", "8", "9",
"10", "11", "12", "13", "14", "15", "16", "17", "18", "19", "20",
"21", "22", "23", "24", "25"), class = "data.frame")

 sii.design <- svydesign(
  id = ~1,
  strata = ~stratum,
  weights = ~expfc,
  data = subset(sii.tesis, !is.na(stratum)))

svyby(~ovt, ~d7, sii.design, svymean, na.rm = TRUE, level = 0.95)


# works fine---
> svyby(~ovt, ~d7, sii.design, svymean, na.rm = TRUE, level = 0.95)
 d7  ovt   se
Soltero Soltero 88.94329 3.333485
Casado   Casado 19.11765 0.00
Union libre Union libre 85.29412 0.00






On Wed, Nov 12, 2014 at 5:25 PM, Martin Canon 
wrote:

> Anthony, thanks for your reply.
>
> Resetting the levels didn't work.
>
> These are the first 25 rows of the dataset:
>
> structure(list(id = c(51L, 52L, 53L, 54L, 55L, 56L, 57L, 58L,
> 59L, 60L, 61L, 62L, 63L, 64L, 65L, 66L, 67L, 68L, 69L, 70L, 71L,
> 73L, 74L, 75L, 76L), stratum = structure(c(1L, 4L, NA, 4L, 4L,
> 1L, 6L, NA, 4L, 4L, 1L, 1L, 1L, 6L, 6L, 3L, 3L, 6L, NA, 1L, 1L,
> 6L, 4L, 3L, 6L), .Label = c("MEst", "MAcad", "MAdm", "FEst",
> "FAcad", "FAdm"), class = "factor"), expfc = c(22.8195266723633,
> 17.0644626617432, NA, 17.0644626617432, 17.0644626617432, 22.8195266723633,
> 5.1702127456665, NA, 17.0644626617432, 17.0644626617432, 22.8195266723633,
> 22.8195266723633, 22.8195266723633, 5.1702127456665, 5.1702127456665,
> 6.24137926101685, 6.24137926101685, 5.1702127456665, NA, 22.8195266723633,
> 22.8195266723633, 5.1702127456665, 17.0644626617432, 6.24137926101685,
> 5.1702127456665), d7 = structure(c(1L, 1L, NA, 1L, 1L, 1L, 1L,
> NA, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, NA, 1L, 1L, 6L, 1L,
> 6L, 6L), .Label = c("Soltero", "Casado", "Separado", "Divorciado",
> "Viudo", "Union libre"), class = "factor"), ovt = c(NA, 93.3823547363281,
> NA, NA, NA, NA, 83.8235321044922, NA, NA, NA, NA, NA, NA, NA,
> 79.4117660522461, NA, NA, 19.1176471710205, NA, NA, NA, 85.2941207885742,
> NA, NA, NA)), .Names = c("id", "stratum", "expfc", "d7", "ovt"
> ), row.names = c("1", "2", "3", "4", "5", "6", "7", "8", "9",
> "10", "11", "12", "13", "14", "15", "16", "17", "18", "19", "20",
> "21", "22", "23", "24", "25"), class = "data.frame")
>
> Regards.
>
> Martin
>
> On Wed, Nov 12, 2014 at 1:39 PM, Anthony Damico 
> wrote:
> > try resettin

Re: [R] SVYPLOT

2014-11-20 Thread Anthony Damico
survey:::svyplot.default   with style="grayhex"   calls
hexbin:::gplot.hexbin

an internet search turns up lots of people asking the question "how do i
set xlim and ylim on hexbin plots?" but i don't see any easy solutions.  :/
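
one crude workaround -- a display-only hack, not a tested or recommended
analysis step -- is to trim the underlying data to the window you want and
build a throwaway design just for the plot, so the hexagons only get binned
over that range (this also assumes the hexbin package is installed):

library(survey)
data(api)

# a throwaway design restricted to the plotting window (for display only)
apistrat_window <- subset( apistrat , api99 >= 500 & api99 <= 700 & api00 >= 500 & api00 <= 700 )
dstrat_window <- svydesign( id = ~1 , strata = ~stype , weights = ~pw , data = apistrat_window , fpc = ~fpc )

svyplot( api00 ~ api99 , design = dstrat_window , style = "grayhex" )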

On Thu, Nov 20, 2014 at 10:31 AM, Raphael Fraser 
wrote:

> Does not work when ,style="grayhex".
>
> library(survey)
> data(api)
> dstrat<-svydesign(id=~1,strata=~stype, weights=~pw, data=apistrat,
> fpc=~fpc)
> svyplot(api00~api99, design=dstrat, style="grayhex")
> svyplot(api00~api99, design=dstrat, style="grayhex", ylim=c(500, 700))
>
> On Thu, Nov 20, 2014 at 9:19 AM, Adams, Jean  wrote:
>
> > Raphael
> >
> > I just ran an example from the help file, and the xlim argument worked
> > fine.  Can you post a small example where the xlim argument doesn't work?
> >
> > Jean
> >
> > library(survey)
> > data(api)
> > dstrat<-svydesign(id=~1,strata=~stype, weights=~pw, data=apistrat,
> > fpc=~fpc)
> > svyplot(api00~api99, design=dstrat, style="bubble")
> > svyplot(api00~api99, design=dstrat, style="bubble", xlim=c(500, 700))
> >
> >
> > On Thu, Nov 20, 2014 at 12:54 AM, Raphael Fraser <
> raphael.fra...@gmail.com
> > > wrote:
> >
> >> How do I set the limits of my x and y axis in svyplot? xlim and ylim
> does
> >> not work.
> >>
> >> Regards,
> >> Raphael
> >>
> >> [[alternative HTML version deleted]]
> >>
> >> __
> >> R-help@r-project.org mailing list
> >> https://stat.ethz.ch/mailman/listinfo/r-help
> >> PLEASE do read the posting guide
> >> http://www.R-project.org/posting-guide.html
> >> and provide commented, minimal, self-contained, reproducible code.
> >>
> >
> >
>
> [[alternative HTML version deleted]]
>
> __
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] r function idea: minimize() to turn reproducible_example.R into minimal_reproducible_example.R

2015-09-10 Thread Anthony Damico
just going to throw this idea out there in case it's something that anyone
wants to pursue: if i have an R script and i'm hitting some unexpected
behavior, there should be some way to remove extraneous objects and
manipulations that never touch the line that i'm trying to reproduce.
automatically stepping through the code and removing things that never
affect the final line seems (difficult but) possible.  so if my_example.R
looks like this code and i didn't understand why i was hitting an error at
the third line..

x <- mtcars
y <- mean( mtcars$mpg )
mean( x[ , "hello" ] )

..the function i am envisioning would automatically remove the `y <- mean(
mtcars$mpg )` because that object and all subsequent objects do not affect
the error resulting from the third line.  in other words, pointing this
minimize() function to the error..

minimize( "my_example.R" , 'Error in `[.data.frame`(x, , "hello") :
undefined columns selected' )

..would find that the error happens on the third line, then follow things
backward and remove any command that does not touch the line that results
in the error.  so my reproducible example was three lines, but the minimal
reproducible example became two lines.

i understand it might be impossible to automate all of the minimizing, but
i think there is enough low-hanging fruit here that this could become a
quick and useful debugging tool for those of us trying to create
easier-to-reproduce code for members of this list.
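
to make the proposal concrete, here is a rough, entirely hypothetical sketch
of the greedy version -- it only handles top-level expressions, finds the
first error itself rather than taking the message as an argument, and makes no
attempt to be clever about side effects:

minimize <- function( script_file ){

    exprs <- as.list( parse( script_file ) )

    # evaluate a list of expressions in a fresh environment, returning the
    # position and message of the first error (or NULL if nothing breaks)
    run <- function( e_list ){
        env <- new.env()
        for( i in seq_along( e_list ) ){
            res <- tryCatch( eval( e_list[[ i ]] , envir = env ) , error = function( e ) e )
            if( inherits( res , "error" ) ) return( list( pos = i , msg = conditionMessage( res ) ) )
        }
        NULL
    }

    original <- run( exprs )
    if( is.null( original ) ) stop( "the script runs without an error" )

    # greedily drop each earlier expression, keeping the removal whenever
    # the very same error message still shows up
    keep <- seq_len( original$pos )
    for( i in rev( seq_len( original$pos - 1 ) ) ){
        attempt <- run( exprs[ setdiff( keep , i ) ] )
        if( !is.null( attempt ) && identical( attempt$msg , original$msg ) ) keep <- setdiff( keep , i )
    }

    # the (hopefully smaller) set of expressions that still reproduces the error
    exprs[ keep ]
}

# with the three-line my_example.R above, this should keep only the first and
# third expressions, since dropping `y <- mean( mtcars$mpg )` leaves the error intact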

thanks!

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] help with svychisq

2015-09-10 Thread Anthony Damico
could you try this, and then not use factor(age) elsewhere?

sv1 <- update( sv1 , age = factor( age ) )

if that doesn't work, is it possible for you to share a reproducible
example? thanks


On Thu, Sep 10, 2015 at 4:51 PM, Emanuele Mazzola  wrote:

> Hello,
>
> I’m having a weird issue with the function “svychisq” in package “survey",
> which would be very helpful for me in this case.
>
> I’m tabulating age categories (a factor variable subdivided into 4
> categories: [18,25), [25, 45), [45,65), [65, 85) ) with respect to
> ethnicity/race (another factor variable subdivided into “hispanic white”,
> “non hispanic black”, “hispanic black”).
>
> I’m perfectly able to get to the “svytable" object, which looks like this
>
> > svytable(~age+ETN, design=sv1)
>  ETN
> age   hisp black hispanic white non hisp black
>   [18,25)   26.97019  798.87444  183.61834
>   [25,45)  145.19650 4783.47678  854.82748
>   [45,65)  104.83682 2537.15021  595.04924
>   [65,85]          0.0            0.0            0.0
>
>  Since it has last row equal to 0 (which would give me troubles with the
> corresponding chi-square p-value), I try to get rid of it by using
>
> > svytable(~factor(age)+ETN, design=sv1)
>ETN
> factor(age) hisp black hispanic white non hisp black
> [18,25)   26.97019  798.87444  183.61834
> [25,45)  145.19650 4783.47678  854.82748
> [45,65)  104.83682 2537.15021  595.04924
>
> which exactly responds to what I’m looking for and to what I’m expecting.
>
> The design is built by using
>
> sv1 = svydesign(ids=~factor(age)+ETN, weights=~WTFA.n, data=totfor)
>
> Now, if I would like to evaluate the corresponding weighted chi squared
> test, I use
>
> svychisq(~factor(age)+ETN, design=sv1)
>
> but here’s what I get from R:
>
> > svychisq(~factor(age)+ETN, design=sv1)
> Error in `[.data.frame`(design$variables, , as.character(rows)) :
>   undefined columns selected
>
> Maybe it is a stupid question but I really can’t figure out where the
> error is.
>
> Could you please help me with this?
> Thanks in advance for any information you will provide me with!
>
> Emanuele
>
> ***
> Emanuele Mazzola, Ph.D.
> Department of Biostatistics & Computational Biology
> Dana-Farber Cancer Institute
> 450 Brookline Ave
> Mail Location: LC1056
> Office Location: Longwood Center, Room 1056
> Boston, MA 02215
> Office phone 617-582-7614
> Fax 617-632-2516
>
>
> [[alternative HTML version deleted]]
>
> __
> R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

[R] 32 bit windows version of r 3.2.2 crashes a lot at first internet connection attempts

2015-11-06 Thread Anthony Damico
hi, just throwing this out there.  it's not clear to me how to reproduce
the crashes, because they are sporadic (but common)

guessing it's related to the switched default of setInternet2(TRUE) but not
sure

here's a semi-absurd screenshot

http://s17.postimg.org/70omgtmi7/early_crashes.png


i had already submitted one (reproducible) bug on the topic, but not sure
of the best ways to find others lurking

https://bugs.r-project.org/bugzilla/show_bug.cgi?id=16513

thanks

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] .Internal(La_rs(x, FALSE)) crashes R after long (reproducible) script on windows only

2016-02-23 Thread Anthony Damico
hi, does anybody have a clue why .Internal(La_rs(x,FALSE)) is getting
corrupted (actual detonation occurs within La_solve_cmplx within Lapack.c)
on windows but not mac/unix?

i have provided two (long) scripts that reproduce the problem and a third
script modified to trigger the crash that unfortunately does not reproduce
the problem

http://stackoverflow.com/questions/35447971/internalla-rsx-false-crashes-r-after-long-reproducible-script-on-windows

thanks

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] .Internal(La_rs(x, FALSE)) crashes R after long (reproducible) script on windows only

2016-02-23 Thread Anthony Damico
hi, thank you,

at the point that the corruption exists, the line
`.Internal(La_rs(x,FALSE))` actually breaks without needing `eigen`

i have provided a reproducible example but agree it might not be minimal --
i did try removing various sections, each time the bug unfortunately
vanished.  note the february 22nd edit: even interspersing the script with
the line that triggers the crash prevents the crash in the first place!

i think this occurs in C and not R, and would appreciate pointers about how
one might do that?  the only advice i've have is rebuilding R with a debug
build and gdb, but this seems like a huge lift --  are there any shortcuts
here for someone mostly unfamiliar with C code and even setup?  general
advice on this thread might also help me crack this case  :)

stackoverflow.com/questions/35455135/general-suggestions-on-debugging-internal-in-r

thanks for your time

On Tue, Feb 23, 2016 at 8:22 AM, Duncan Murdoch 
wrote:

> On 23/02/2016 7:49 AM, Anthony Damico wrote:
>
>> hi, does anybody have a clue why .Internal(La_rs(x,FALSE)) is getting
>> corrupted (actual detonation occurs within La_solve_cmplx within Lapack.c)
>> on windows but not mac/unix?
>>
>> i have provided two (long) scripts that reproduce the problem and a third
>> script modified to trigger the crash that unfortunately does not reproduce
>> the problem
>>
>>
>> http://stackoverflow.com/questions/35447971/internalla-rsx-false-crashes-r-after-long-reproducible-script-on-windows
>>
>
> Just two comments:
>
>  - Your post suggests you're calling .Internal() yourself, but that's not
> the case.  So your question should be about why eigen() crashes R.
>
>  - If you need a long script to trigger the error, I'd assume there's
> something wrong in that script. Your script uses several contributed
> packages, so the problem could be there. Shorten it to a minimal
> reproducible example that doesn't use any contributed packages.  If you
> can't leave out the packages, try to reduce it to just one, and ask the
> maintainer of that package about it.
>
> Duncan Murdoch
>

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Loading large .pxt and .asc datasets causes issues.

2016-02-23 Thread Anthony Damico
hi eiko, LaF is incompatible with survey data -- that road is a dead-end.
the code below will painlessly load brfss into R.  review the link douglas
sent for analysis examples, and change `years.to.download <-` to 2006 only
if you just want a single year of microdata.  glhf


# install.packages( c("MonetDB.R", "MonetDBLite" , "survey" , "SAScii" , "descr" , "downloader" , "digest" ) , repos=c("http://dev.monetdb.org/Assets/R/", "http://cran.rstudio.com/") )

# setInternet2( FALSE )    # # only windows users need this line
# options( encoding = "windows-1252" )    # # only macintosh and *nix users need this line
library(downloader)
# setwd( "C:/My Directory/BRFSS/" )
years.to.download <- 1984:2014
source_url( "https://raw.githubusercontent.com/ajdamico/asdfree/master/Behavioral%20Risk%20Factor%20Surveillance%20System/download%20all%20microdata.R" , prompt = FALSE , echo = TRUE )





On Tue, Feb 23, 2016 at 4:39 PM, Federman, Douglas <
douglas.feder...@utoledo.edu> wrote:

> You might want to look at Anthony Damico's work at
>
>
> http://www.asdfree.com/search/label/behavioral%20risk%20factor%20surveillance%20system%20%28brfss%29
>
> --
> Better name for the general practitioner might be multispecialist.
> ~Martin H. Fischer (1879-1962)
>
>
> -Original Message-
> From: R-help [mailto:r-help-boun...@r-project.org] On Behalf Of Torvon
> Sent: Tuesday, February 23, 2016 2:13 PM
> To: r-help@r-project.org
> Subject: [R] Loading large .pxt and .asc datasets causes issues.
>
> Hi,
>
> I want to load a dataset into R. This dataset is available in two formats:
> .XPT and .ASC. The dataset is available at
> http://www.cdc.gov/brfss/annual_data/annual_2006.htm.
>
> They are about 40mb zipped, and about 500mb unzipped.
>
> I can get the .xpt data to load, using:
>
> > library(hmisc)
> > data <- sasxport.get("CDBRFS06.XPT")
>
> The data look fine, no error messages. However, the data only contains 302
> columns, which is less than it should have (according to the
> documentation). It does not contain my variables of interest, so either the
> documentation or the data file is wrong, and I want to make sure it's not
> the data file.
>
> Hence I wanted to see if I get the same results loading the .ASC file.
> However, multiple ways to do so have failed.
>
> > library(adehabitat)
> > import.asc("CDBRFS06.asc")
>
> Results in:
> Error in scan(file, what, nmax, sep, dec, quote, skip, nlines, na.strings,
> : scan() expected 'a real', got '1191.8808943.38209868648.960119'
>
> > library(SDMTools)
> > read.asc("CDBRFS06.asc")
>
> Results in:
> Error in scan(file, what, nmax, sep, dec, quote, skip, nlines, na.strings,
> : scan() expected 'a real', got '1191.8808943.38209868648.960119' In
> addition: Warning messages: 1: In scan(file, what, nmax, sep, dec, quote,
> skip, nlines, na.strings, : number of items read is not a multiple of the
> number of columns 2: In scan(file, what, nmax, sep, dec, quote, skip,
> nlines, na.strings, : number of items read is not a multiple of the number
> of columns 3: In scan(file, what, nmax, sep, dec, quote, skip, nlines,
> na.strings, : number of items read is not a multiple of the number of
> columns 4: In scan(file, what, nmax, sep, dec, quote, skip, nlines,
> na.strings, : number of items read is not a multiple of the number of
> columns 5: In scan(file, nmax = nl * nc, skip = 6, quiet = TRUE) : NAs
> introduced by coercion to integer range
>
> Thank you for your help.
>Eiko
>
> [[alternative HTML version deleted]]
>
> __
> R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>
> __
> R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] is this an R bug or a DBI bug?

2016-02-27 Thread Anthony Damico
this happens with both SQLite and MonetDBLite, so i assume it is not an
RSQLite bug.

notice the gc() in the no-crash version..

thanks


# initiate R with "C:\Program Files\R\R-3.2.3\bin\x64\Rterm.exe"
--max-mem-size=35M
library(RSQLite)
db <- dbConnect( SQLite() )
for( i in 1:1000 ) { dbWriteTable( db , 'x' , mtcars , append = TRUE ) }
# CRASH


# initiate R with "C:\Program Files\R\R-3.2.3\bin\x64\Rterm.exe"
--max-mem-size=35M
library(RSQLite)
db <- dbConnect( SQLite() )
for( i in 1:1000 ) { dbWriteTable( db , 'x' , mtcars , append = TRUE )
; gc() }
# no crash







> sessionInfo()
R version 3.2.3 (2015-12-10)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 7 x64 (build 7601) Service Pack 1

locale:
[1] LC_COLLATE=English_United States.1252
[2] LC_CTYPE=English_United States.1252
[3] LC_MONETARY=English_United States.1252
[4] LC_NUMERIC=C
[5] LC_TIME=English_United States.1252

attached base packages:
[1] stats graphics  grDevices utils datasets  methods   base

other attached packages:
[1] RSQLite_1.0.0 DBI_0.3.1

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] is this an R bug or a DBI bug?

2016-02-28 Thread Anthony Damico
tested this out on 3.2.0, 3.2.1, and 3.2.2 -- only happens on 3.2.3, so i
assume it was an R bug not a DBI bug.  submitted here:

https://bugs.r-project.org/bugzilla/show_bug.cgi?id=16734




On Sat, Feb 27, 2016 at 6:20 PM, Anthony Damico  wrote:

> this happens with both SQLite and MonetDBLite, so i assume it is not an
> RSQLite bug.
>
> notice the gc() in the no-crash version..
>
> thanks
>
>
> # initiate R with "C:\Program Files\R\R-3.2.3\bin\x64\Rterm.exe"
> --max-mem-size=35M
> library(RSQLite)
> db <- dbConnect( SQLite() )
> for( i in 1:1000 ) { dbWriteTable( db , 'x' , mtcars , append = TRUE )
> }
> # CRASH
>
>
> # initiate R with "C:\Program Files\R\R-3.2.3\bin\x64\Rterm.exe"
> --max-mem-size=35M
> library(RSQLite)
> db <- dbConnect( SQLite() )
> for( i in 1:1000 ) { dbWriteTable( db , 'x' , mtcars , append = TRUE )
> ; gc() }
> # no crash
>
>
>
>
> 
>
>
> > sessionInfo()
> R version 3.2.3 (2015-12-10)
> Platform: x86_64-w64-mingw32/x64 (64-bit)
> Running under: Windows 7 x64 (build 7601) Service Pack 1
>
> locale:
> [1] LC_COLLATE=English_United States.1252
> [2] LC_CTYPE=English_United States.1252
> [3] LC_MONETARY=English_United States.1252
> [4] LC_NUMERIC=C
> [5] LC_TIME=English_United States.1252
>
> attached base packages:
> [1] stats graphics  grDevices utils datasets  methods   base
>
> other attached packages:
> [1] RSQLite_1.0.0 DBI_0.3.1
>
>

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Using final sample weight in survey package

2016-04-04 Thread Anthony Damico
hi, probably not..  if your survey dataset has a complex design (like
clusters/strata), you need to include them in the `svydesign` call.
coercing an incorrect survey design into a replicate-weighted design will
not fix the problem of failing to account for the sampling strategy

On Mon, Apr 4, 2016 at 12:01 AM, José Fernando Zea  wrote:

> I have the final sample weight (expansion factor) from a socieconomic
> survey. I don't know the exact design used in the study ( (probably is a
> stratified two-stage design).
>
> To illustrate my problem I will use the next dataset which have a sample
> weight (but the design is not specified) and incorporate the design with
> svydesign and create some bootstrap replicates in order to be able to
> produce estimations.
>
> Is that correct?:
>
>
> load(url("http://knutur.at/wsmt/R/RData/small.RData";))
> library(survey)
> small.w <- svydesign(ids = ~1, data = small, weights = small$weight)
> design<-as.svrepdesign(small.w,type="bootstrap", replicates=100)
>
>
>
> Cordialmente
> Jose F. Zea
>
> [[alternative HTML version deleted]]
>
> __
> R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] Quantiles on multiply imputed survey data - mitools

2016-05-10 Thread Anthony Damico
is the `with` not passing make.formula( get( 'var_name' ) ) through to
svyquantile for some reason?  does this work?

MIcombine( with(des, svyquantile(~LBXTCD, .5)))


if that's not it, could you make a minimal reproducible example that
includes the data download?  code to download and import nhanes here

https://github.com/ajdamico/asdfree/tree/master/National%20Health%20and%20Nutrition%20Examination%20Survey



On Tue, May 10, 2016 at 4:33 PM, Anne Bichteler <
abichte...@toxstrategies.com> wrote:

> Hello, and thank you for considering this question:
>
> The svystat object created with multiply imputed NHANES data files is
> failing on calling survey::svyquantile. I'm wondering if I'm diagnosing the
> issue correctly, whether the behavior is expected, and whether y'all might
> have any ideas for workarounds.
>
> I'm following T. Lumley's general method outlined here:
> http://faculty.washington.edu/tlumley/old-survey/svymi.html, but with
> data files I've imputed myself on the 2001/2002 biennial. Each file has
> 1081 observations and no missing values.
>
> ### Create the survey design object with list of imputed data files
> ImputedList0102.
> des <- svydesign(id=~SDMVPSU, strat=~SDMVSTRA, weight=~WTSPO2YR,
> data=imputationList(ImputedList0102), nest=TRUE)
>
>
> ### Blood analyte of interest
> var_name <- "LBXTCD" # analyte in blood serum
>
> ### All is well calculating the mean:
> M <- with(des, svymean(make.formula(get('var_name'
> summary(M)
> Result <- MIcombine(M)
> Result$coefficients
> # LBXTCD
> # 17.41635
>
>
> ### but svystat object fails to calculate a 50th percentile:
> ### it fails when hard-coding the name rather than using make.formula;
> ### it fails regardless of number of files or choices in handling ties or
> interval type.
> ### There are 16 ties in each data file.
> M1 <- with(des, svyquantile(make.formula(get('var_name')), quantiles =
> c(.5)))
> summary(M1)
>
> # Length Class  Mode
> #[1,] 1  -none- numeric
> #[2,] 1  -none- numeric
> #[3,] 1  -none- numeric
>
>
> ### The quantile is successfully calculated on one file at a time,
> however, and is different for each file.
> ### (had thought perhaps there was a lack-of-variance issue). The quantile
> calculated on each file
> ### is the same regardless of interval.type.
> des_single1 <- svydesign(id=~SDMVPSU, strat=~SDMVSTRA, weight=~WTSPO2YR,
> data=ImputedList0102[[1]], nest=TRUE)
> svyquantile(make.formula(get('var_name')), des_single1, c(.5))
> # 0.5
> # LBXTCD 13.5554
>
>
> des_single2 <- svydesign(id=~SDMVPSU, strat=~SDMVSTRA, weight=~WTSPO2YR,
> data=ImputedList0102[[2]], nest=TRUE)
> svyquantile(make.formula(get('var_name')), des_single2, c(.5))
> # 0.5
> # LBXTCD 14.06154
>
> # The number of observations exceeding the 50th percentile differs for
> each file, which I can't claim to understand.
>
> # I removed the 16 ties, but no help. Do the ties and/or different number
> of observations above/below prevent the svydesigns from being combined?
> nrow(subset(ImputedList0102[[1]], LBXTCD > 13.5554))
> # [1] 516
> nrow(subset(ImputedList0102[[2]], LBXTCD > 14.06154))
> # [1] 512
>
>
> I'm hoping someone can point me to some gross error I'm making or another
> function parameter or data manipulation or another survey-savvy method
> altogether to calculate a 50th percentile across multiply imputed data
> files. Thanks for any advice,
>
> Brennan
>
> www.toxstrategies.com
> __
> R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Quantiles on multiply imputed survey data - mitools

2016-05-11 Thread Anthony Damico
hi, you want   se=T -- without it, svyquantile does not return standard errors, so MIcombine has no variances to combine:

M_quantile <- with(des_mult, svyquantile(make.formula(get('var_name')),
quantiles = c(.5),se=T))
MIcombine(M_quantile)



Multiple imputation results:
  with(des_mult, svyquantile(make.formula(get("var_name")), quantiles =
c(0.5),
se = T))
  MIcombine.default(M_quantile)
   results   se
LBXTCD 12.7978 6.917285






On Wed, May 11, 2016 at 12:09 PM, Anne Bichteler <
abichte...@toxstrategies.com> wrote:

> Thanks for looking. No, for the quantiles it fails to instantiate the
> collection of designs correctly, whether hard-coding the variable name or
> using make.formula. 'with' passes make.formula correctly when calculating
> the mean, e.g. this works:
>
> MIcombine( with(des, svymean(make.formula(get('var_name')
>
> # Here's a reproducible example.
>
> DF1 <- data.frame(SDMVPSU = c(1,1,1,1,1,2,2,2,2,2),
>   SDMVSTRA = c(22, 20, 24, 18, 20, 22, 20, 24, 18, 20),
>   WTSPO2YR = c(252605, 82199, 24946, 147236, 3679, 294959,
> 65085, 21765, 197775, 49931),
>   LBXTCD = c(20.4, 29.7, 8.8, 18.0, 22.2, 10.4, 43.9,
> 15.3, 13.8, 84.5))
>
> DF2 <- data.frame(SDMVPSU = c(1,1,1,1,1,2,2,2,2,2),
>   SDMVSTRA = c(22, 20, 24, 18, 20, 22, 20, 24, 18, 20),
>   WTSPO2YR = c(252605, 82199, 24946, 147236, 3679, 294959,
> 65085, 21765, 197775, 49931),
>   LBXTCD = c(21.9, 29.7, 9.2, 5.9, 32.8, 8.9, 43.9, 7.4,
> 10.5, 84.5))
>
> var_name <- "LBXTCD"
>
> # Individually svyquantile (and svymean) work:
> des_single1 <- svydesign(id=~SDMVPSU, strat=~SDMVSTRA, weight=~WTSPO2YR,
> data=DF1, nest=TRUE)
> svyquantile(make.formula(get('var_name')), des_single1, c(.5), na.rm =
> FALSE)
>
> des_single2 <- svydesign(id=~SDMVPSU, strat=~SDMVSTRA, weight=~WTSPO2YR,
> data=DF2, nest=TRUE)
> svyquantile(make.formula(get('var_name')), des_single2, c(.5), na.rm =
> FALSE)
>
> Imputed_list <- c()
> Imputed_list[[1]] <- DF1
> Imputed_list[[2]] <- DF2
>
> # svymean works (so the svydesign object is fine?) but svyquantile doesn't:
> des_mult <- svydesign(id=~SDMVPSU, strat=~SDMVSTRA, weight=~WTSPO2YR,
> data=imputationList(Imputed_list), nest=TRUE)
> M_mean <- with(des_mult, svymean(make.formula(get('var_name'
> summary(M_mean)
> M_quantile <- with(des_mult, svyquantile(make.formula(get('var_name')),
> quantiles = c(.5)))
> summary(M_quantile)
>
>
> Thanks again,
>
> Brennan
>
> www.toxstrategies.com
>
>
> From:  Anthony Damico 
> Date:  Tuesday, May 10, 2016 at 10:37 PM
> To:  Anne Bichteler 
> Cc:  "r-help@r-project.org" 
> Subject:  Re: [R] Quantiles on multiply imputed survey data - mitools
>
>
> is the `with` not passing make.formula( get( 'var_name' ) ) through to
> svyquantile for some reason?  does this work?
>
> MIcombine( with(des, svyquantile(~LBXTCD, .5)))
>
>
>
> if that's not it, could you make a minimal reproducible example that
> includes the data download?  code to download and import nhanes here
>
>
> https://github.com/ajdamico/asdfree/tree/master/National%20Health%20and%20Nutrition%20Examination%20Survey
>
>
>
>
>
> On Tue, May 10, 2016 at 4:33 PM, Anne Bichteler
>  wrote:
>
> Hello, and thank you for considering this question:
>
> The svystat object created with multiply imputed NHANES data files is
> failing on calling survey::svyquantile. I'm wondering if I'm diagnosing the
> issue correctly, whether the behavior is expected, and whether y'all might
> have any ideas for workarounds.
>
> I'm following T. Lumley's general method outlined here:
> http://faculty.washington.edu/tlumley/old-survey/svymi.html <
> http://faculty.washington.edu/tlumley/old-survey/svymi.html>, but with
> data files I've imputed myself on the 2001/2002 biennial. Each file has
> 1081 observations and no missing values.
>
> ### Create the survey design object with list of imputed data files
> ImputedList0102.
> des <- svydesign(id=~SDMVPSU, strat=~SDMVSTRA, weight=~WTSPO2YR,
> data=imputationList(ImputedList0102), nest=TRUE)
>
>
> ### Blood analyte of interest
> var_name <- "LBXTCD" # analyte in blood serum
>
> ### All is well calculating the mean:
> M <- with(des, svymean(make.formula(get('var_name'
> summary(M)
> Result <- MIcombine(M)
> Result$coefficients
> # LBXTCD
> # 17.41635
>
>
> ### but svystat object fails to calculate a 50th percentile:
> ### it fails when hard-coding the name rather than using make.formula;
&g

Re: [R] Significance of Svyrepdesign Object Warning

2016-10-23 Thread Anthony Damico
hi, great example.  i am ccing survey package author/maintainer dr.
lumley.  why do you have `na.action=na.exclude`?  if you remove it, things
work as expected--


library(RCurl)
library(survey)
data <- getURL("
https://raw.githubusercontent.com/cbenjamin1821/careertech-ed/master/elsq1adj.csv
")
elsq1ch <- read.csv(text = data)
#Specifying the svyrepdesign object which applies the BRR weights
elsq1ch_brr<-svrepdesign(variables = elsq1ch[,1:16], repweights =
elsq1ch[,18:217], weights = elsq1ch[,17], combined.weights = TRUE, type =
"BRR")
elsq1ch_brr
#Logistic regression call which yields a warning regarding svyrepdesign
object

# your warning
a <-
svyglm(formula=F3ATTAINB~F1PARED+BYINCOME+F1RACE+F1SEX+F1RGPP2+F1HIMATH+F1RTRCC,family="binomial",design=elsq1ch_brr,subset=BYSCTRL==1&G10COHRT==1,na.action=na.exclude)
summary(a)

# works fine
a <-
svyglm(formula=F3ATTAINB~F1PARED+BYINCOME+F1RACE+F1SEX+F1RGPP2+F1HIMATH+F1RTRCC,family="binomial",design=elsq1ch_brr,subset=BYSCTRL==1&G10COHRT==1)
summary(a)



the mismatch of vectors generating that warning happens inside

debug(survey:::summary.svrepglm)

[..snip..]

Browse[2]> length(presid)
[1] 12614
Browse[2]> length(object$survey.design$pweights)
[1] 8397


and including vs excluding the na.action=na.exclude gives you a
slightly different dispersion parameter calculation

(Dispersion parameter for binomial family taken to be 0.7756235)

(Dispersion parameter for binomial family taken to be 0.7849244)


not sure if the two survey:::residuals.sv* methods should deal with the
na.action= parameter?


thanks

On Sun, Oct 23, 2016 at 11:56 AM, Courtney Benjamin 
wrote:

> Hello R Users,
>
> I am using Lumley's Survey Package in R to analyze complex survey data
> that involves 200 balanced repeated replicate (BRR) weight variables.  I
> have ensured that my svyrepdesign object that specifies the application of
> the BRR weights to the data set is accurate and I have matched the
> published standard errors of the data set.
>
> When doing a logistic regression through the svyglm call, I receive the
> following warning:
>
> In object$survey.design$pweights * presid^2 :
>   longer object length is not a multiple of shorter object length?
> I have search around quite a bit online and have not been able to find any
> good interpretation of its meaning.  I want to be sure that I am not making
> some type of mistake that is causing this warning to be produced.  Any
> advisement is greatly appreciated.
> The following is an MRE that can be pasted into the R console:
> library(RCurl)
> library(survey)
> data <- getURL("https://raw.githubusercontent.com/
> cbenjamin1821/careertech-ed/master/elsq1adj.csv")
> elsq1ch <- read.csv(text = data)
> #Specifying the svyrepdesign object which applies the BRR weights
> elsq1ch_brr<-svrepdesign(variables = elsq1ch[,1:16], repweights =
> elsq1ch[,18:217], weights = elsq1ch[,17], combined.weights = TRUE, type =
> "BRR")
> elsq1ch_brr
> #Logistic regression call which yields a warning regarding svyrepdesign
> object
> svyglm(formula=F3ATTAINB~F1PARED+BYINCOME+F1RACE+F1SEX+
> F1RGPP2+F1HIMATH+F1RTRCC,family="binomial",design=
> elsq1ch_brr,subset=BYSCTRL==1&G10COHRT==1,na.action=na.exclude)
> allCC <- summary(svyglm(formula=F3ATTAINB~F1PARED+BYINCOME+
> F1RACE+F1SEX+F1RGPP2+F1HIMATH+F1RTRCC,family="binomial",
> design=elsq1ch_brr,subset=BYSCTRL==1&G10COHRT==1,na.action=na.exclude))
> allCC
>
> #Session Info
> #R version 3.3.1 (2016-06-21)
> #Platform: x86_64-w64-mingw32/x64 (64-bit)
> #Running under: Windows >= 8 x64 (build 9200)
>
> #locale:
> #  [1] LC_COLLATE=English_United States.1252  LC_CTYPE=English_United
> States.1252
> #[3] LC_MONETARY=English_United States.1252 LC_NUMERIC=C
> #[5] LC_TIME=English_United States.1252
> #attached base packages:
> #  [1] grid  stats graphics  grDevices utils datasets
> methods   base
> #other attached packages:
> #[1] survey_3.31-2   survival_2.39-4 Matrix_1.2-6RCurl_1.95-4.8
> bitops_1.0-6
> #loaded via a namespace (and not attached):
> #[1] tools_3.3.1 splines_3.3.1   knitr_1.14  lattice_0.20-33
>
>
> Courtney Benjamin
>
> Broome-Tioga BOCES
>
> Automotive Technology II Teacher
>
> Located at Gault Toyota
>
> Doctoral Candidate-Educational Theory & Practice
>
> State University of New York at Binghamton
>
> cbenj...@btboces.org
>
> 607-763-8633
>
> [[alternative HTML version deleted]]
>
> __
> R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/
> posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE

Re: [R] Resetting Baseline Level of Predictor in svyglm Function

2016-11-01 Thread Anthony Damico
hi, i think you want

elsq1ch_brr <- update( elsq1ch_brr , F1HIMATH = relevel(F1HIMATH,"PreAlg or Less") )





On Mon, Oct 31, 2016 at 9:05 PM, Courtney Benjamin 
wrote:

> Hello R Users:
>
> I am using the survey package in R for modeling with complex survey data.
> I am trying to reset the baseline level of certain predictor variables
> being used in a logistic regression without success. The following is a
> reproducible example:
>
> library(RCurl)
> library(survey)
>
> data <- getURL("https://raw.githubusercontent.com/
> cbenjamin1821/careertech-ed/master/elsq1adj.csv")
> elsq1ch <- read.csv(text = data)
>
> #Specifying the svyrepdesign object which applies the BRR weights
> elsq1ch_brr<-svrepdesign(variables = elsq1ch[,1:16], repweights =
> elsq1ch[,18:217], weights = elsq1ch[,17], combined.weights = TRUE, type =
> "BRR")
> elsq1ch_brr
>
> #Log. Reg. model
> allCC <- svyglm(formula=F3ATTAINB~F1PARED+BYINCOME+F1RACE+F1SEX+
> F1RGPP2+F1HIMATH+F1RTRCC,family="binomial",design=
> elsq1ch_brr,subset=BYSCTRL==1&G10COHRT==1,na.action=na.omit)
> summary(allCC)
>
> ##Attempting to reset baseline level for predictor variable
> #Both attempts did not work
> elsq1ch$F1HIMATH <- C(elsq1ch$F1HIMATH,contr.treatment, base=1)
> elsq1ch$F1HIMATH <- relevel(elsq1ch$F1HIMATH,"PreAlg or Less")
>
> #Log. Reg. model with no changes in baseline levels for the predictors
> allCC <- svyglm(formula=F3ATTAINB~F1PARED+BYINCOME+F1RACE+F1SEX+
> F1RGPP2+F1HIMATH+F1RTRCC,family="binomial",design=
> elsq1ch_brr,subset=BYSCTRL==1&G10COHRT==1,na.action=na.omit)
> summary(allCC)
>
>
> Any guidance is greatly appreciated.?
>
> Sincerely,
>
> Courtney?
>
> Courtney Benjamin
>
> Broome-Tioga BOCES
>
> Automotive Technology II Teacher
>
> Located at Gault Toyota
>
> Doctoral Candidate-Educational Theory & Practice
>
> State University of New York at Binghamton
>
> cbenj...@btboces.org
>
> 607-763-8633
>
> [[alternative HTML version deleted]]
>
> __
> R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/
> posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Archer-Lemeshow Goodness of Fit Test for Survey Data with Log. Regression

2016-11-17 Thread Anthony Damico
great minimal reproducible example, thanks.  does something like this work?



#Log. Reg. model-all curric. concentrations including F1RTRCC as a predictor
allCC <-
svyglm(formula=F3ATTAINB~F1PARED+BYINCOME+F1RACE+F1SEX+F1RGPP2+F1HIMATH+F1RTRCC,family="binomial",design=elsq1ch_brr,subset=BYSCTRL==1&G10COHRT==1,na.action=na.exclude)
summary(allCC)

r <- residuals(allCC, type="response")
f<-fitted(allCC)
your_g<- cut(f, c(-Inf, quantile(f,  (1:9)/10), Inf))

elsq1ch[ elsq1ch$BYSCTRL==1&elsq1ch$G10COHRT==1 , 'your_g' ] <- your_g
elsq1ch[ elsq1ch$BYSCTRL==1&elsq1ch$G10COHRT==1 , 'r' ] <- r
newdesign<-svrepdesign(variables = elsq1ch, repweights = elsq1ch[,18:217],
weights = elsq1ch[,17], combined.weights = TRUE, type = "BRR")

decilemodel<- svyglm(r~your_g,
design=newdesign,subset=BYSCTRL==1&G10COHRT==1)
regTermTest(decilemodel, ~your_g)




On Wed, Nov 16, 2016 at 10:15 PM, Courtney Benjamin 
wrote:

> ?Hello R Experts,
>
> I am trying to implement the Archer-Lemeshow GOF Test for survey data on a
> logistic regression model using the survey package based upon an R Help
> Archive post that I found where Dr. Thomas Lumley advised how to do it:
> http://r.789695.n4.nabble.com/Goodness-of-t-tests-for-
> Complex-Survey-Logistic-Regression-td4668233.html
>
> Everything is going well until I get to the point where I have to add the
> objects 'r' and 'g' as variables to the data frame by either using the
> transform function or the update function to update the svrepdesign
> object.  The log. regression model involved uses a subset of data and some
> of the values in the data frame are NA, so that is affecting my ability to
> add 'r' and 'g' as variables; I am getting an error because I only have
> 8397 rows for the new variables and 16197 in the data frame and svrepdesign
> object.  I am not sure how to overcome this error.
>
> The following is a MRE:
>
> ##Archer Lemeshow Goodness of Fit Test for Complex Survey Data with
> Logistic Regression
>
> library(RCurl)
> library(survey)
>
> data <- getURL("https://raw.githubusercontent.com/
> cbenjamin1821/careertech-ed/master/elsq1adj.csv")
> elsq1ch <- read.csv(text = data)
>
> #Specifying the svyrepdesign object which applies the BRR weights
> elsq1ch_brr<-svrepdesign(variables = elsq1ch[,1:16], repweights =
> elsq1ch[,18:217], weights = elsq1ch[,17], combined.weights = TRUE, type =
> "BRR")
> elsq1ch_brr
>
> ##Resetting baseline levels for predictors
> elsq1ch_brr <- update( elsq1ch_brr , F1HIMATH = relevel(F1HIMATH,"PreAlg
> or Less") )
> elsq1ch_brr <- update( elsq1ch_brr , BYINCOME = relevel(BYINCOME,"0-25K") )
> elsq1ch_brr <- update( elsq1ch_brr , F1RACE = relevel(F1RACE,"White") )
> elsq1ch_brr <- update( elsq1ch_brr , F1SEX = relevel(F1SEX,"Male") )
> elsq1ch_brr <- update( elsq1ch_brr , F1RTRCC = relevel(F1RTRCC,"Academic")
> )
>
> #Log. Reg. model-all curric. concentrations including F1RTRCC as a
> predictor
> allCC <- svyglm(formula=F3ATTAINB~F1PARED+BYINCOME+F1RACE+F1SEX+
> F1RGPP2+F1HIMATH+F1RTRCC,family="binomial",design=
> elsq1ch_brr,subset=BYSCTRL==1&G10COHRT==1,na.action=na.omit)
> summary(allCC)
>
> #Recommendations from Lumley (from R Help Archive) on implementing the
> Archer Lemeshow GOF test
> r <- residuals(allCC, type="response")
> f<-fitted(allCC)
> g<- cut(f, c(-Inf, quantile(f,  (1:9)/10), Inf))
>
> # now create a new design object with r and g added as variables
> #This is the area where I am having problems as my model involves a subset
> and some values are NA as well
> #I am also not sure if I am naming/specifying the new variables of r and g
> properly
> transform(elsq1ch,r=r,g=g)
> elsq1ch_brr <- update(elsq1ch_brr,tag=g,tag=r)
> #then:
> decilemodel<- svyglm(r~g, design=newdesign)
> regTermTest(decilemodel, ~g)
> #is the F-adjusted mean residual test from the Archer Lemeshow paper
>
> Thank you,
> Courtney
>
> ?
>
> Courtney Benjamin
>
> Broome-Tioga BOCES
>
> Automotive Technology II Teacher
>
> Located at Gault Toyota
>
> Doctoral Candidate-Educational Theory & Practice
>
> State University of New York at Binghamton
>
> cbenj...@btboces.org
>
> 607-763-8633
>
> [[alternative HTML version deleted]]
>
> __
> R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/
> posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Archer-Lemeshow Goodness of Fit Test for Survey Data with Log. Regression

2016-11-18 Thread Anthony Damico
hi, my code does not subset the survey design on the line that creates the
svrepdesign().  subsetting in order to create a variable while your data is
still a data.frame is probably okay, so long as you expect the observations
outside of the subset to be NAs like they are in this case.
nrow(elsq1ch_brr)==nrow(newdesign)

On Fri, Nov 18, 2016 at 5:06 AM, Courtney Benjamin 
wrote:

> ​Thank you, Anthony.  Your approach does work; however, I am concerned to
> some degree about subsetting the data prior to creating the new svrepdesign
> as I know that it is not recommended to subset the data prior to creating
> the svrepdesign object.  I am not sure if this is a significant concern in
> this context as the model was fitted with the original svrepdesign that was
> created prior to subsetting any data and the new svrepdesign is being used
> to run the diagnostic for the model.  Any thoughts on that issue?
>
> Also, from my understanding of the outcome of the diagnostic, low p values
> indicate a poor model fit.
>
> Sincerely,
>
> Courtney
>
>
> Courtney Benjamin
>
> Broome-Tioga BOCES
>
> Automotive Technology II Teacher
>
> Located at Gault Toyota
>
> Doctoral Candidate-Educational Theory & Practice
>
> State University of New York at Binghamton
>
> cbenj...@btboces.org
>
> 607-763-8633
> --
> *From:* Anthony Damico 
> *Sent:* Thursday, November 17, 2016 4:28 AM
> *To:* Courtney Benjamin
> *Cc:* r-help@r-project.org
> *Subject:* Re: [R] Archer-Lemeshow Goodness of Fit Test for Survey Data
> with Log. Regression
>
> great minimal reproducible example, thanks.  does something like this work?
>
>
>
> #Log. Reg. model-all curric. concentrations including F1RTRCC as a
> predictor
> allCC <- svyglm(formula=F3ATTAINB~F1PARED+BYINCOME+F1RACE+F1SEX+
> F1RGPP2+F1HIMATH+F1RTRCC,family="binomial",design=
> elsq1ch_brr,subset=BYSCTRL==1&G10COHRT==1,na.action=na.exclude)
> summary(allCC)
>
> r <- residuals(allCC, type="response")
> f<-fitted(allCC)
> your_g<- cut(f, c(-Inf, quantile(f,  (1:9)/10), Inf))
>
> elsq1ch[ elsq1ch$BYSCTRL==1&elsq1ch$G10COHRT==1 , 'your_g' ] <- your_g
> elsq1ch[ elsq1ch$BYSCTRL==1&elsq1ch$G10COHRT==1 , 'r' ] <- r
> newdesign<-svrepdesign(variables = elsq1ch, repweights =
> elsq1ch[,18:217], weights = elsq1ch[,17], combined.weights = TRUE, type =
> "BRR")
>
> decilemodel<- svyglm(r~your_g, design=newdesign,subset=
> BYSCTRL==1&G10COHRT==1)
> regTermTest(decilemodel, ~your_g)
>
>
>
>
> On Wed, Nov 16, 2016 at 10:15 PM, Courtney Benjamin 
> wrote:
>
>> ?Hello R Experts,
>>
>> I am trying to implement the Archer-Lemeshow GOF Test for survey data on
>> a logistic regression model using the survey package based upon an R Help
>> Archive post that I found where Dr. Thomas Lumley advised how to do it:
>> http://r.789695.n4.nabble.com/Goodness-of-t-tests-for-Comple
>> x-Survey-Logistic-Regression-td4668233.html
>>
>> Everything is going well until I get to the point where I have to add the
>> objects 'r' and 'g' as variables to the data frame by either using the
>> transform function or the update function to update the svrepdesign
>> object.  The log. regression model involved uses a subset of data and some
>> of the values in the data frame are NA, so that is affecting my ability to
>> add 'r' and 'g' as variables; I am getting an error because I only have
>> 8397 rows for the new variables and 16197 in the data frame and svrepdesign
>> object.  I am not sure how to overcome this error.
>>
>> The following is a MRE:
>>
>> ##Archer Lemeshow Goodness of Fit Test for Complex Survey Data with
>> Logistic Regression
>>
>> library(RCurl)
>> library(survey)
>>
>> data <- getURL("https://raw.githubusercontent.com/cbenjamin1821/
>> careertech-ed/master/elsq1adj.csv")
>> elsq1ch <- read.csv(text = data)
>>
>> #Specifying the svyrepdesign object which applies the BRR weights
>> elsq1ch_brr<-svrepdesign(variables = elsq1ch[,1:16], repweights =
>> elsq1ch[,18:217], weights = elsq1ch[,17], combined.weights = TRUE, type =
>> "BRR")
>> elsq1ch_brr
>>
>> ##Resetting baseline levels for predictors
>> elsq1ch_brr <- update( elsq1ch_brr , F1HIMATH = relevel(F1HIMATH,"PreAlg
>> or Less") )
>> elsq1ch_brr <- update( elsq1ch_brr , BYINCOME = relevel(BYINCOME,"0-25K")
>> )
>> elsq1ch_brr <- update( elsq1ch_brr , F1RACE = relevel(F1RACE,"White") )
&g

[R] [R-pkgs] New package queuecomputer

2016-12-01 Thread Anthony Ebert
Dear R users,

queuecomputer is a new R package now available on CRAN. It is a very
fast method for simulating queueing networks.

The user supplies the arrival and service times and the departure
times are computed deterministically. The name queuecomputer is meant
in the sense that the package 'computes queues'.

The page for the package:

https://CRAN.R-project.org/package=queuecomputer

The github repo is:

https://github.com/AnthonyEbert/queuecomputer

All feedback is welcome anthonyebert+c...@gmail.com

Please send any reports of bugs to anthonyebert+c...@gmail.com or
create an issue on github.

Kind regards,

Anthony Ebert

___
R-packages mailing list
r-packa...@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-packages

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Can you get the DEFT from svyratio?

2016-12-07 Thread Anthony Damico
hi, your code isn't runnable at

fpc= ~M + Nbar)

On Wed, Dec 7, 2016 at 5:03 PM, Chris Webb  wrote:

> To Dr. Lumley or anyone who may know the answer,
>
> I am trying to obtain ratio estimates from Levy and Lemeshow's Sampling of
> Populations 4th ed. page 281. The results in the book are from STATA.
> According to the STATA output, the DEFT is 0.830749
>
> I can recreate all of the results except for DEFT. For svytotal and svymean
> I can use the option deff="replace" to obtain DEFT results (by taking the
> square root), but I get an error when using this option with svyratio. The
> problem can be my poor understanding of how to calculate DEFT, but perhaps
> it's not implemented for svyratio?
>
>
> R code to that fails:
>
> library(survey)
>
> # Creating the dataset
> df_tbl_10_1 <-
>   data.frame(
> center = c(1,1,1,2,2,2,3,3,3,4,4,4,5,5,5),
> nurse = c(rep(c(1,2,3),5)),
> seen = c(58,44,18,42,63,10,13,18,37,16,32,10,25,23,23),
> referred = c(5,6,6,3,19,2,12,6,30,5,14,4,17,9,14)
>   )
> df_tbl_10_2 <- df_tbl_10_1[c(2,3,4,6,10,11),]
>
> # Defining the cluster sampling design
> svy_tbl_10_2 <-
>   svydesign(id=~center + nurse,
> data=df_tbl_10_2,
> fpc= ~M + Nbar)
>
> # Ratio estimates
> svyratio(~referred, ~seen, svy_tbl_10_2)
> confint(svyratio(~referred, ~seen, svy_tbl_10_2), df=degf(svy_tbl_10_2))
>
> # DEFF
> deff(svyratio(~referred, ~seen, svy_tbl_10_2, deff=TRUE))
>
> # DEFT (fails)
> sqrt(deff(svyratio(~referred, ~seen, svy_tbl_10_2, deff="replace")))
>
> Fail message:
> Error in if (deff) deffs <- matrix(ncol = nd, nrow = nn) : argument is not
> interpretable as logical
>
>
> For other individuals, I have included code that will calculate DEFF and
> DEFT for svytotal on page 280, This code doesn't fail.
>
> svytotal(~referred, svy_tbl_10_2)
> confint(svytotal(~referred, svy_tbl_10_2), df=degf(svy_tbl_10_2))
> deff(svytotal(~referred, svy_tbl_10_2, deff=TRUE))
> sqrt(deff(svytotal(~referred, svy_tbl_10_2, deff="replace")))
>
>
>
> To recap: can you get the DEFT from svyratio?
>
> Sincerely,
> Chris Webb
>
> [[alternative HTML version deleted]]
>
> __
> R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/
> posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Problem in calculation of subpopulation mean in Survey package (Specify survey design with replicate weights)

2016-12-24 Thread Anthony Damico
hi, please revise your minimal reproducible example.  the objects `W` and
`bootstrap.results` do not exist.  thanks
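
in the meantime, one way to avoid hand-building `W` and `bootstrap.results` at all is to construct a svydesign() from the sampling weights and let as.svrepdesign() generate the bootstrap replicate weights -- a rough sketch with your example `dat`:

library(survey)

# sampling weight as the inverse of the selection probability
dat$samp_wgt <- 1 / dat$ProbSelect

des <- svydesign( id = ~ 1 , weights = ~ samp_wgt , data = dat )

rep_des <- as.svrepdesign( des , type = "bootstrap" , replicates = 100 )

# with only six records the bootstrap standard error will be unstable,
# but the mechanics are the same on the full ~500-record file
svymean( ~ density , subset( rep_des , group == 'groupH' ) , na.rm = TRUE )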

On Fri, Dec 23, 2016 at 4:56 PM, Kristi Glover 
wrote:

> Hi R users,
>
> I got a problem when I was trying to calculate the population mean for
> different groups (classes) in using "svrepdesign", It worked when I used
> entire rows  but when I introduced by classes (groups or strata), it did
> not work out. I read several documents but I could not figure it out to the
> fix the problem.
>
> Actually, I do have about 500 data points (rows) in which there are
> several classes (groups). I developed the design using the svrepdesign, but
> when I introduced group in calculating mean, it says
>
> ""Error in qr.default(weights(design, "analysis"), tol = 1e-05) :
>   NA/NaN/Inf in foreign function call (arg 1)"
>
> I think I have to introduce groups in the design and I tried it different
> ways but it did not work it out. Would you mind to give some hints?I would
> be very grateful if you could give some hints.
>
> here I have attached a example (format of the data) and the code I used.
> Thanks for your help.
> have a great holidays.
>
> MG
> ___
>
> library(survey)
>
> dat<-structure(list(ID = 1:6, group = structure(c(8L, 7L, 7L, 8L,
> 7L, 7L), .Label = c("groupA", "groupB", "groupC", "groupD", "groupE",
> "groupF", "groupG", "groupH"), class = "factor"), ProbSelect = c(0.72,
> 0.62, 0.62, 0.72, 0.72, 0.62), density = c(28, 227, 65, 132,
> 13, 227), totalSampled = c(96L, 96L, 96L, 96L, 96L, 96L), pop = c(166L,
> 166L, 166L, 166L, 166L, 166L), wgt = c(2.42, 2.79, 2.79, 2.42,
> 2.42, 2.79)), .Names = c("ID", "group", "ProbSelect", "density",
> "totalSampled", "pop", "wgt"), row.names = c(NA, 6L), class = "data.frame")
>
> dat
>
> design_dat<-svrepdesign(data=dat, type="bootstrap", weights =
> I(1/dat$ProbSelect), repweights=W[,-1],scale=bootstrap.results$scale,
> rscale=bootstrap.results$rscales, combined.weights=TRUE)
>
> svymean(~ density,design=subset(design_dat, group=='groupH'),na.rm=T,
> digits=2)
>
> "Error in qr.default(weights(design, "analysis"), tol = 1e-05) :
>   NA/Na
>
>
> [[alternative HTML version deleted]]
>
> __
> R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/
> posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] how to extract weighted data in "survey" package

2017-01-01 Thread Anthony Damico
# load the survey library
library(survey)

# load the apistrat data.frame
data(api)

# look at the first six records
head(apistrat)

# look at the weight column only
apistrat$pw





On Sun, Jan 1, 2017 at 9:49 AM, Kristi Glover 
wrote:

> Hi R Users,
>
> Happy New Year
>
>
> I wanted to see the data after  raw data was adjusted/weighted but I could
> not get it.  Any suggestions?
>
> I would like to see which data points got more weight after the design was
> used.
>
>
> I have given you an example what I tried but I was not successful .
>
>
> Thanks
>
>
> library(survey)
>
> data(api)
>
>
> rawData<-data.frame(API00=apistrat$api00, API99=apistrat$api99)
>
> head(rawData)
>
> dstrat<-svydesign(id=~1,strata=~stype, weights=~pw, data=apistrat,
> fpc=~fpc)
>
> svyplot(api00~api99, design=dstrat, style="bubble")
>
>
> adjustedData<-data.frame(API00=(~api00, design=dstrat),API99=(~api99,
> design=dstrat ))
>
> head(adjustedData)
>
>
>
>
> [[alternative HTML version deleted]]
>
> __
> R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/
> posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] how to extract weighted data in "survey" package

2017-01-01 Thread Anthony Damico
sum(apistrat$api00*apistrat$pw)/sum(apistrat$pw)
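
i.e. the survey point estimate is just the weighted mean.  a quick check with the same apistrat example:

library(survey)
data(api)
dstrat <- svydesign(id=~1, strata=~stype, weights=~pw, data=apistrat, fpc=~fpc)

# same point estimate either way -- the design information only changes the standard error
svymean(~api00, dstrat)
sum(apistrat$api00 * apistrat$pw) / sum(apistrat$pw)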

On Sun, Jan 1, 2017 at 11:11 AM, Kristi Glover 
wrote:

> Thank You Anthony for the message.
>
> Why did not I get the same values in the following examples?
>
> To get the adjusted value, should not we just multiphy by weight? For
> example, I multiplied "api00" by "column "pw" (mean(apistrat$api00*
> apistrat$pw/100)) but I did not get the same value as of survey package
> given. I think I did mistake. Any suggestions?
>
>
>
>
> # load the survey library
>
> library(survey)
>
>
> # load the apistrat data.frame
>
> data(api)
>
>
> # look at the first six records
>
> head(apistrat)
>
>
> # look at the weight column only
>
> apistrat$pw
>
> # calcualet mean using raw data and afetr adjusted
>
>
> svymean(~api00, dstrat)
>
>
> mean(apistrat$api00*apistrat$pw/100)
>
>
>
>
>
>
> --
> *From:* Anthony Damico 
> *Sent:* January 1, 2017 8:00 AM
> *To:* Kristi Glover
> *Cc:* R-help
> *Subject:* Re: [R] how to extract weighted data in "survey" package
>
>
> # load the survey library
> library(survey)
>
> # load the apistrat data.frame
> data(api)
>
> # look at the first six records
> head(apistrat)
>
> # look at the weight column only
> apistrat$pw
>
>
>
>
>
> On Sun, Jan 1, 2017 at 9:49 AM, Kristi Glover 
> wrote:
>
>> Hi R Users,
>>
>> Happy New Year
>>
>>
>> I wanted to see the data after  raw data was adjusted/weighted but I
>> could not get it.  Any suggestions?
>>
>> I would like to see which data points got more weight after the design
>> was used.
>>
>>
>> I have given you an example what I tried but I was not successful .
>>
>>
>> Thanks
>>
>>
>> library(survey)
>>
>> data(api)
>>
>>
>> rawData<-data.frame(API00=apistrat$api00, API99=apistrat$api99)
>>
>> head(rawData)
>>
>> dstrat<-svydesign(id=~1,strata=~stype, weights=~pw, data=apistrat,
>> fpc=~fpc)
>>
>> svyplot(api00~api99, design=dstrat, style="bubble")
>>
>>
>> adjustedData<-data.frame(API00=(~api00, design=dstrat),API99=(~api99,
>> design=dstrat ))
>>
>> head(adjustedData)
>>
>>
>>
>>
>> [[alternative HTML version deleted]]
>>
>> __
>> R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide http://www.R-project.org/posti
>> ng-guide.html
>> and provide commented, minimal, self-contained, reproducible code.
>>
>
>

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] if i paste this into my windows 3.3.2 R console, it crashes

2017-01-07 Thread Anthony Damico
hi, should i file this on https://bugs.r-project.org/  ?  thanks



# crash R with this command
dir.create( "C:/My Directory/PEW/Hispanic Trends/2015/2013 Recontact Survey
of Asian Ame
ricans  Field dates: 10/16/13 - 10/31/13 Respondents:
Nationally-rep
resentative sample of 802 Asian Americans ages 18 and older. Margin of
Error: +/
- 5.0 percentage points at the 95% confidence interval. This survey focused
on p
olitics, attitudes regarding immigration legislation, illegal immigration,
and n
aturalization./"  , recursive = TRUE , showWarnings = FALSE )



> sessionInfo()
R version 3.3.2 (2016-10-31)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 10586)

locale:
[1] LC_COLLATE=English_United States.1252
[2] LC_CTYPE=English_United States.1252
[3] LC_MONETARY=English_United States.1252
[4] LC_NUMERIC=C
[5] LC_TIME=English_United States.1252

attached base packages:
[1] stats graphics  grDevices utils datasets  methods   base

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] if i paste this into my windows 3.3.2 R console, it crashes

2017-01-07 Thread Anthony Damico
haha no doubt.  just want to confirm this counts as a reportable bug (
https://www.r-project.org/bugs.html) before bothering r-core




On Sat, Jan 7, 2017 at 5:51 AM, Jim Lemon  wrote:

> Hi Anthony,
> I think you have included most of the forbidden characters in Windows
> folder names and while I am too lazy to count the characters, you may
> have exceeded the 259 character limit as well. Are there really
> embedded EOLs as well? This is truly a masterpiece of computer
> disobedience.
>
> Jim
>
>
> On Sat, Jan 7, 2017 at 8:31 PM, Anthony Damico  wrote:
> > hi, should i file this on https://bugs.r-project.org/  ?  thanks
> >
> >
> >
> > # crash R with this command
> > dir.create( "C:/My Directory/PEW/Hispanic Trends/2015/2013 Recontact
> Survey
> > of Asian Ame
> > ricans  Field dates: 10/16/13 - 10/31/13 Respondents:
> > Nationally-rep
> > resentative sample of 802 Asian Americans ages 18 and older. Margin of
> > Error: +/
> > - 5.0 percentage points at the 95% confidence interval. This survey
> focused
> > on p
> > olitics, attitudes regarding immigration legislation, illegal
> immigration,
> > and n
> > aturalization./"  , recursive = TRUE , showWarnings = FALSE )
> >
> >
> >
> >> sessionInfo()
> > R version 3.3.2 (2016-10-31)
> > Platform: x86_64-w64-mingw32/x64 (64-bit)
> > Running under: Windows 10 x64 (build 10586)
> >
> > locale:
> > [1] LC_COLLATE=English_United States.1252
> > [2] LC_CTYPE=English_United States.1252
> > [3] LC_MONETARY=English_United States.1252
> > [4] LC_NUMERIC=C
> > [5] LC_TIME=English_United States.1252
> >
> > attached base packages:
> > [1] stats graphics  grDevices utils datasets  methods   base
> >
> > [[alternative HTML version deleted]]
> >
> > __
> > R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
> > https://stat.ethz.ch/mailman/listinfo/r-help
> > PLEASE do read the posting guide http://www.R-project.org/
> posting-guide.html
> > and provide commented, minimal, self-contained, reproducible code.
>

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] if i paste this into my windows 3.3.2 R console, it crashes

2017-01-07 Thread Anthony Damico
thanks!  https://bugs.r-project.org/bugzilla/show_bug.cgi?id=17206

On Sat, Jan 7, 2017 at 7:34 AM, Duncan Murdoch 
wrote:

> On 07/01/2017 5:51 AM, Jim Lemon wrote:
>
>> Hi Anthony,
>> I think you have included most of the forbidden characters in Windows
>> folder names and while I am too lazy to count the characters, you may
>> have exceeded the 259 character limit as well. Are there really
>> embedded EOLs as well? This is truly a masterpiece of computer
>> disobedience.
>>
>
> I haven't looked closely to see what goes wrong, but the problems you list
> should lead to an error message, not a crash.  So yes, Anthony should
> report this as a bug.  Anthony, if you don't have a bug reporting account
> you won't be able to do so; write to me privately and I'll create one for
> you (with your choice of associated email address).
>
> Duncan Murdoch
>
>
>
>> Jim
>>
>>
>> On Sat, Jan 7, 2017 at 8:31 PM, Anthony Damico 
>> wrote:
>>
>>> hi, should i file this on https://bugs.r-project.org/  ?  thanks
>>>
>>>
>>>
>>> # crash R with this command
>>> dir.create( "C:/My Directory/PEW/Hispanic Trends/2015/2013 Recontact
>>> Survey
>>> of Asian Ame
>>> ricans  Field dates: 10/16/13 - 10/31/13 Respondents:
>>> Nationally-rep
>>> resentative sample of 802 Asian Americans ages 18 and older. Margin of
>>> Error: +/
>>> - 5.0 percentage points at the 95% confidence interval. This survey
>>> focused
>>> on p
>>> olitics, attitudes regarding immigration legislation, illegal
>>> immigration,
>>> and n
>>> aturalization./"  , recursive = TRUE , showWarnings = FALSE )
>>>
>>>
>>>
>>> sessionInfo()
>>>>
>>> R version 3.3.2 (2016-10-31)
>>> Platform: x86_64-w64-mingw32/x64 (64-bit)
>>> Running under: Windows 10 x64 (build 10586)
>>>
>>> locale:
>>> [1] LC_COLLATE=English_United States.1252
>>> [2] LC_CTYPE=English_United States.1252
>>> [3] LC_MONETARY=English_United States.1252
>>> [4] LC_NUMERIC=C
>>> [5] LC_TIME=English_United States.1252
>>>
>>> attached base packages:
>>> [1] stats graphics  grDevices utils datasets  methods   base
>>>
>>> [[alternative HTML version deleted]]
>>>
>>> __
>>> R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>> PLEASE do read the posting guide http://www.R-project.org/posti
>>> ng-guide.html
>>> and provide commented, minimal, self-contained, reproducible code.
>>>
>>
>> __
>> R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide http://www.R-project.org/posti
>> ng-guide.html
>> and provide commented, minimal, self-contained, reproducible code.
>>
>>
>

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Does "coeftest" correctly use weights from "svydesign" in "svyglm" object?

2017-02-08 Thread Anthony Damico
hi, that setup is not correct.  see examples in

https://github.com/ajdamico/asdfree/tree/master/European%20Social%20Survey

On Feb 8, 2017 11:54 PM, "André Grow"  wrote:

> Dear all,
>
>
>
> I am using data from the European Social Survey (ESS) and I would like to
> calculate country-level cluster-robust standard errors for a regression
> model in R that includes country fixed effects and employs the design
> weights that come with the ESS.
>
>
>
> To correctly use the weights, I use the 'survey' package and the functions
> 'svydesign' and 'svyglm'. This step looks like this:
>
>
>
> design_1 <- svydesign(id=~1, weights=~dweight, data=ESS)
>
>
>
> m1 <- svyglm(y ~ cntry + x, design = design_1)
>
>
>
> My question is: when I now apply the functions 'cluster.vcov' and
> 'coeftest'
> from the packages 'lmtest' and 'multiwayvcov' to the model m1, do the
> resulting standard errors correctly account for the design weights? This
> step looks like this:
>
>
>
> vcov_m1 <- cluster.vcov(m1, ESS$cntry)
>
>
>
> coeftest(m1, vcov_m1)
>
>
>
> Note that I do not use 'cntry' as an id variable in the svydesign function,
> because then I cannot include country dummies in the regression model.
>
>
>
> Thanks in advance for your feedback!
>
>
> [[alternative HTML version deleted]]
>
> __
> R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/
> posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

[R] > quit('no')\nError: cannot allocate vector of size 512 Kb

2017-03-04 Thread Anthony Damico
this one is cute

[damico@rocks010 ~]$ ulimit -v 15
[damico@rocks010 ~]$ R

R version 3.3.2 (2016-10-31) -- "Sincere Pumpkin Patch"
Copyright (C) 2016 The R Foundation for Statistical Computing
Platform: x86_64-redhat-linux-gnu (64-bit)

R is free software and comes with ABSOLUTELY NO WARRANTY.
You are welcome to redistribute it under certain conditions.
Type 'license()' or 'licence()' for distribution details.

  Natural language support but running in an English locale

R is a collaborative project with many contributors.
Type 'contributors()' for more information and
'citation()' on how to cite R or R packages in publications.

Type 'demo()' for some demos, 'help()' for on-line help, or
'help.start()' for an HTML browser interface to help.
Type 'q()' to quit R.

Error in dyn.load(file, DLLpath = DLLpath, ...) :
  unable to load shared object
'/usr/lib64/R/library/grDevices/libs/grDevices.so':
  /usr/lib64/R/library/grDevices/libs/grDevices.so: failed to map
segment from shared object
Error in dyn.load(file, DLLpath = DLLpath, ...) :
  unable to load shared object
'/usr/lib64/R/library/grDevices/libs/grDevices.so':
  /usr/lib64/R/library/grDevices/libs/grDevices.so: failed to map
segment from shared object
In addition: Warning message:
package ‘grDevices’ in options("defaultPackages") was not found
Error in dyn.load(file, DLLpath = DLLpath, ...) :
  unable to load shared object
'/usr/lib64/R/library/grDevices/libs/grDevices.so':
  /usr/lib64/R/library/grDevices/libs/grDevices.so: failed to map
segment from shared object
In addition: Warning message:
package ‘graphics’ in options("defaultPackages") was not found
During startup - Warning message:
package ‘stats’ in options("defaultPackages") was not found
> quit('no')
Error: cannot allocate vector of size 512 Kb
> sessionInfo()
Error: cannot allocate vector of size 512 Kb



separate session obviously

> sessionInfo()
R version 3.3.2 (2016-10-31)
Platform: x86_64-redhat-linux-gnu (64-bit)
Running under: Fedora 24 (Twenty Four)

locale:
 [1] LC_CTYPE=en_US.UTF-8   LC_NUMERIC=C
 [3] LC_TIME=en_US.UTF-8LC_COLLATE=en_US.UTF-8
 [5] LC_MONETARY=en_US.UTF-8LC_MESSAGES=en_US.UTF-8
 [7] LC_PAPER=en_US.UTF-8   LC_NAME=C
 [9] LC_ADDRESS=C   LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C

attached base packages:
[1] stats graphics  grDevices utils datasets  methods   base

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

[R] first readline() instance getting skipped on windows with R 3.3.3

2017-03-10 Thread Anthony Damico
hi, i'm curious if anyone else has noticed a change in behavior of
readline()?

i have a function in an R package that calls readline() here:

https://github.com/ajdamico/lodown/blob/master/R/mics.R#L126

after upgrading to 3.3.3, the function appeared to start ignoring that
readline() call.  my function runs some internet and plotting commands, so
i might be doing something wrong here.  i figured out that i can work
around the problem by adding an empty readline("") call earlier in the same
function.. so readline() is just getting skipped at the first occurrence
for some reason.  this line works around the problem, which seems odd:

https://github.com/ajdamico/lodown/blob/master/R/mics.R#L94
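
condensed, the workaround amounts to something like this (a sketch, not the actual package code -- the prompt text is made up):

# dummy call that absorbs whatever makes the first readline() return immediately
readline( "" )

# the real prompt now waits for user input as expected
answer <- readline( "press enter to continue: " )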


i've been unsuccessful reproducing this more succinctly.  latest R news
indicates lots of changes to this function:
https://cran.r-project.org/doc/manuals/r-release/NEWS.html


here's my extSoftVersion() and sessionInfo()

thanks!


> extSoftVersion()
                       zlib                       bzlib                          xz
                    "1.2.8"        "1.0.6, 6-Sept-2010"                     "5.0.8"
                       PCRE                         ICU                         TRE
          "8.38 2015-11-23"                      "55.1"   "TRE 0.8.0 R_fixes (BSD)"
                      iconv                    readline
                "win_iconv"                          ""






> sessionInfo()
R version 3.3.3 (2017-03-06)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 10586)

locale:
[1] LC_COLLATE=English_United States.1252  LC_CTYPE=English_United
States.1252LC_MONETARY=English_United States.1252
LC_NUMERIC=C   LC_TIME=English_United
States.1252

attached base packages:
[1] grid  stats graphics  grDevices utils datasets  methods
base

other attached packages:
[1] survey_3.31-5   survival_2.40-1 Matrix_1.2-7.1  lodown_0.1.0

loaded via a namespace (and not attached):
[1] httr_1.2.1  R6_2.2.0curl_2.3splines_3.3.3
jpeg_0.1-8  jsonlite_1.1lattice_0.20-34

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] R and GNUPLOT

2008-08-22 Thread Fristachi, Anthony
Hello,

 I am trying to send commands to GNUPLOT to load a .plt file that was generated 
with a C++ software module called with R, and to replot.

 I am able to open the program with shell.exec (below), but I am at a loss as to what to do next.  Any suggestions?

shell.exec("C:\\ gnuplot \\ bin \\ wgnuplot.exe")

Thanks

>>>---
Tony Fristachi
Battelle, Statistics & Information Analysis
505 King Avenue, Rm 11-7-056
Columbus, OH 43201
Email: [EMAIL PROTECTED]
Phone: 614/424.4910   Fax: 614/458.4910
Cell: 513/375.3263
Web: 
http://www.battelle.org/solutions/default.aspx?Nav_Area=Tech&Nav_SectionID=3

"We are what we repeatedly do. Excellence then, is not an act, but a habit."
Aristotle


[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] Re-assigning variables stored as character strings in another variable

2010-02-22 Thread Anthony Damico
Is there any way to get the last line of this code to double the contents of
a and b without naming them directly?

#create variables a and b
a<-5
b<-10

#store variable names a and b in variables c and d
c<-"a"
d<-"b"

e<-c(c,d)

#loop through both variables
for (i in e){

#print the numbers five and ten using only the variables c and d
#this line works fine
print(eval(parse(text=i)))*2

#re-assign variables a and b using only variables c and d
#this line does not work -
parse(text=i) <- eval(parse(text=i))*2
}
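
A sketch of one alternative approach, using assign() and get() instead of parse()/eval():

#read and write variables whose names are stored as character strings
for (i in e){
    assign(i, get(i) * 2)
}

a   # now 10
b   # now 20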

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] summated scale

2010-03-20 Thread Anthony Lopez
hello,

I am new to R (convert from Stata) and I am wondering if there is an R
command/function for generating a summated scale from, for example, four
separate variables in a data set.  In other words, I want to transform the
variables x1, x2, x3, and x4 (which have high inter-item reliability) into
one new summated variable x'.  Can't seem to find it in the psy or psych
package ...
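
a minimal sketch (not part of the original post), assuming the four items live
in a data frame called dat:

dat$xsum <- rowSums(dat[, c("x1", "x2", "x3", "x4")])
# or, if a mean scale is preferred:
# dat$xmean <- rowMeans(dat[, c("x1", "x2", "x3", "x4")])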

Thanks for any suggestions!

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] contradictory output between ncv.test() and gvlma()

2010-04-05 Thread Anthony Lopez
Can anyone tell me why the ncv.test output and the gvlma output would be
contradictory on the question of heteroscedasticity?  Below, the ncv.test
output reveals a problem with heteroscedasticity, but the gvlma output says
that the assumptions are acceptable.  How is this reconciled?

> ncv.test(defmodA)
Non-constant Variance Score Test
Variance formula: ~ fitted.values
Chisquare = 7.360374    Df = 1     p = 0.00666769

> gvlma(defmodA)

Call:
lm(formula = DefPunWmn1 ~ DefPersBenef, data = Data)

Coefficients:
 (Intercept)  DefPersBenef
      1.2579        0.1572


ASSESSMENT OF THE LINEAR MODEL ASSUMPTIONS
USING THE GLOBAL TEST ON 4 DEGREES-OF-FREEDOM:
Level of Significance =  0.05

Call:
 gvlma(x = defmodA)

 Value   p-value   Decision
Global Stat37.3746 1.508e-07 Assumptions NOT satisfied!
Skewness   32.8916 9.744e-09 Assumptions NOT satisfied!
Kurtosis2.6248 1.052e-01Assumptions acceptable.
Link Function   0.3684 5.439e-01Assumptions acceptable.
Heteroscedasticity  1.4899 2.222e-01Assumptions acceptable.

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] categorical variable in scatterplot (car)

2010-04-24 Thread Anthony Lopez
Hello R folks,

I am encountering a problem with the following scatterplot function from the
car package:

> scatterplot(y~x|z)

where y and x are continuous (interval) random variables and z is a
categorical variable.  When z is a categorical variable coded 1 or 2, I
(appropriately) get a scatterplot of y by x, coded by z.  Similarly, when z
is a categorical variable coded 1, 2, or 3, there is again, no problem.
 However, when z is a categorical variable coded 0 or 1, the scatterplot

> scatterplot(y~x|z)

is exactly identical to the one generated by

> scatterplot(y~x)

It is not possible that this is due to the fact that there is no difference
between the categories.  It is as if R doesn't "see" that I want it coded by
z.  But this only happens when one of the categories of z is coded "0" (i.e.
zero).  Any ideas why this is so, or how I can fix this without recoding my
variable?

Thank you!

Anthony

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] categorical variable in scatterplot (car)

2010-04-25 Thread Anthony Lopez
Thank you both for your helpful reply!  And apologies for the lack of a
reproducible example - I would/could send one now, but I believe Peter's
example will suffice (and thank you for that).  And making z a factor worked
perfectly.  Thank you!

Anthony

On Sun, Apr 25, 2010 at 10:20 AM, Peter Ehlers  wrote:

> Hi John,
>
> The problem seems to be with the order in which the 'levels' of
> the conditioning variable appear. Here's a reproducible example:
>
>  Prestige$tp<- with(Prestige, ifelse(type == "prof", 0, 1))
>
>  scatterplot(prestige ~ income | tp, data=Prestige)
>
> Note that I've just switched the 0/1 from your example.
>
> A quick look at scatterplot.formula suggests that wrapping
> the 'X[, 3]' in this line
>
>   scatterplot(X[, 2], X[, 1], groups = X[, 3], xlab = xlab,
>
> inside an as.factor() would solve the problem.
>
>  -Peter
>
>
>
> On 2010-04-25 8:40, John Fox wrote:
>
>> Dear Peter and Anthony,
>>
>> Thanks, Peter, for answering the question, but scatterplot() should work
>> even if z is not a factor, and does for me in the following example:
>>
>>  library(car)
>>> Prestige$tp<- with(Prestige, ifelse(type == "prof", 1, 0))
>>> scatterplot(prestige ~ income | tp, data=Prestige)
>>>
>>
>> So, Anthony, the usual advice about providing a reproducible example seems
>> applicable here.
>>
>> Regards,
>>  John
>>
>> 
>> John Fox
>> Senator William McMaster
>>   Professor of Social Statistics
>> Department of Sociology
>> McMaster University
>> Hamilton, Ontario, Canada
>> web: socserv.mcmaster.ca/jfox
>>
>>
>>  -Original Message-
>>> From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org]
>>>
>> On
>>
>>> Behalf Of Peter Ehlers
>>> Sent: April-24-10 11:57 PM
>>> To: Anthony Lopez
>>> Cc: R-help@r-project.org
>>> Subject: Re: [R] categorical variable in scatterplot (car)
>>>
>>> On 2010-04-24 21:30, Anthony Lopez wrote:
>>>
>>>> Hello R folks,
>>>>
>>>> I am encountering a problem with the following scatterplot function from
>>>>
>>> the
>>>
>>>> car package:
>>>>
>>>>  scatterplot(y~x|z)
>>>>>
>>>>
>>>> where y and x are continuous (interval) random variables and z is a
>>>> categorical variable.  When z is a categorical variable coded 1 or 2, I
>>>> (appropriately) get a scatterplot of y by x, coded by z.  Similarly,
>>>>
>>> when z
>>
>>> is a categorical variable coded 1, 2, or 3, there is again, no problem.
>>>>   However, when z is a categorical variable coded 0 or 1, the
>>>>
>>> scatterplot
>>
>>>
>>>>  scatterplot(y~x|z)
>>>>>
>>>>
>>>> is exactly identical to the one generated by
>>>>
>>>>  scatterplot(y~x)
>>>>>
>>>>
>>>> It is not possible that this is due to the fact that there is no
>>>>
>>> difference
>>
>>> between the categories.  It is as if R doesn't "see" that I want it
>>>>
>>> coded
>>
>>> by
>>>
>>>> z.  But this only happens when one of the categories of z is coded "0"
>>>>
>>> (i.e.
>>>
>>>> zero).  Any ideas why this is so, or how I can fix this without recoding
>>>>
>>> my
>>
>>> variable?
>>>>
>>>
>>> Make z a factor (which it really should be anyway).
>>>
>>>   -Peter Ehlers
>>>
>>>
>>>> Thank you!
>>>>
>>>> Anthony
>>>>
>>>>[[alternative HTML version deleted]]
>>>>
>>>> __
>>>> R-help@r-project.org mailing list
>>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>>> PLEASE do read the posting guide http://www.R-project.org/posting-
>>>>
>>> guide.html
>>>
>>>> and provide commented, minimal, self-contained, reproducible code.
>>>>
>>>>
>>>>
>>> --
>>> Peter Ehlers
>>> University of Calgary
>>>
>>> __
>>> R-help@r-project.org mailing list
>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>> PLEASE do read the posting guide
>>>
>> http://www.R-project.org/posting-guide.html
>>
>>> and provide commented, minimal, self-contained, reproducible code.
>>>
>>
>>
>>
>>
> --
> Peter Ehlers
> University of Calgary
>

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] Prediction from BMA

2009-10-07 Thread Anthony Staines
Dear colleagues,

I am doing a little bit of work using the BMA package to examine
predictive models for types of pain - the outcome variable is a binary
(Yes/No) for the type of pain experienced by a patient.
Our observation is that BMA makes a lot of sense in this application, as
the data suggests that there are several well-fitting possible logistic
models, each with posterior probabilities close to 0.1, as well as a
straggle of somewhat less likely models.

I am unsure how best to produce predictions from the BMA output - i.e.
the posterior means of model coefficients. There isn't a predict.BMA or
similar, that I can see. What's the recommended way to produce
predictions (i.e. fitted values, or estimates from new data) from these
BMA models?
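
a minimal sketch (not part of the original post) of the usual workaround:
multiply a model matrix for the new data by the posterior-mean coefficients and
apply the inverse link.  it assumes `fit` is the fitted BMA object, and the
`postmean` component name is an assumption -- check the BMA documentation for
the version in use.

Xnew <- model.matrix(~ ., data = newdata)   # same predictors, same order, as in the fitted model
eta  <- drop(Xnew %*% fit$postmean)         # linear predictor at the posterior-mean coefficients
phat <- plogis(eta)                         # inverse-logit, for a binary (Yes/No) outcome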

Best wishes,
Anthony Staines
-- 
Anthony Staines, Professor of Health Systems Research,
School of Nursing, Dublin City University, Dublin 9,Ireland.
Tel:- +353 1 700 7807. Mobile:- +353 86 606 9713

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] Memory Problems with CSV and Survey Objects

2009-10-23 Thread Anthony Damico
I'm working with a 350MB CSV file on a server that has 3GB of RAM, yet I'm
hitting a memory error when I try to store the data frame into a survey
design object, the R object that stores data for complex sample survey data.

When I launch R, I execute the following line from Windows:
"C:\Program Files\R\R-2.9.1\bin\Rgui.exe" --max-mem-size=2047M
Anything higher, and I get an error message saying the maximum has been set
to 2047M.

Here are the commands:
> library(survey)

#this step takes more than five minutes
> data08<-read.csv("data08.csv",header=TRUE,nrows=210437)

> object.size(data08)
#329877112 bytes

#Looking at Windows Task Manager, Mem Usage for Rgui.exe is already 659,632K

> brr.dsgn <-svrepdesign( data = data08 , repweights = data08[, grep(
"^repwgt" , colnames( data08)) ], type = "BRR" , combined.weights = TRUE ,
weights = data08$mainwgt )
#Error: cannot allocate vector of size 254.5 Mb

#The survey design object does not get created.

#This also causes Windows Task Manager, Mem Usage to spike to 1,748,136K

#And here are some memory diagnostics
> memory.limit()
[1] 2047
> memory.size()
[1] 1449.06
> gc()
   used  (Mb) gc trigger   (Mb)  max used   (Mb)
Ncells   131148   3.6 593642   15.9  15680924  418.8
Vcells 45479988 347.0  173526492 1324.0 220358611 1681.3
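
one commonly suggested way to speed up read.csv() and trim its memory overhead
is to declare the column types up front (a sketch, not from the original post;
the column counts and types below are hypothetical):

cc <- c(rep("numeric", 150), rep("character", 2))   # one entry per column, in file order
data08 <- read.csv("data08.csv", header = TRUE, nrows = 210437,
                   colClasses = cc, stringsAsFactors = FALSE)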

A description of the survey package can be found here:
http://faculty.washington.edu/tlumley/survey/

I tried creating a work-around by using the database-backed survey objects
(DB SO), included in the survey package to conserve memory on larger
datasets like this one.  Unfortunately, I don't think the survey package
supports database connections for replicate weight designs yet, since I've
only been able to get a database connection working after creating a
svydesign object and not a svrepdesign object - and also because neither the
DB SO website nor the svrepdesign help page make any mention of those
parameters.

The DB SOs are described in detail here:
http://faculty.washington.edu/tlumley/survey/svy-dbi.html

Any advice would be truly appreciated.

Thanks,
 Anthony Damico

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] Problem Removing Border Lines in Maps Package

2009-10-24 Thread Anthony Damico
I'm working with the nationwide county maps data, and trying to remove the
internal county boundary lines.  The only map() function parameter that I've
found that gets me anywhere close to my desired result leaves small white
segments on parts of the map.  I believe this is due to the low resolution,
because when I look at individual states, the lty parameter solves the
problem.  Does anyone have any idea how I might draw a United States county
map without these white borders making the map look sloppy?

library(maps)

#county map of new jersey, with invisible county borders -- correct example
map("county","new jersey", fill=TRUE , col=palette() , lty=0)

#county map of the united states, with white county borders still visible in some places
map("county", fill=TRUE , col=palette() , lty=0)

Thanks

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] Maptools runs out of memory installing the help files for spCbind-methods

2010-01-26 Thread Anthony Staines
Hi,
I'd be grateful for help with the following:-

Running R version 2.9.2 (2009-08-24) on Gentoo Linux, on an
x86 PC. I am trying to install maptools, (via CRAN from
maptools_0.7-29.tar.gz'), for the first time.

All runs smoothly until the installation gets to
"*** installing help indices" for spCbind-methods, about two
thirds of the way through the help files, at which point the
installation hangs until R runs out of memory.
The last few lines of the output are :-
"sp2tmaptexthtmllatex   example
spCbind-methodstexthtmllatex   example
Out of memory!"
and it freezes.

Memory available as reported by
> gc()
 used (Mb) gc trigger (Mb) max used (Mb)
Ncells 133838  3.6 35  9.4   35  9.4
Vcells  81771  0.7 786432  6.0   406077  3.1

top, in another shell, reports that I am using almost all
the memory (2G) and almost all the swap (1G)

the command
perl /usr/lib/R/share/perl/build-help.pl --txt --html
--latex is running as well as R.

I tried this on an amd64 Linux system, and got exactly the
same results.

I've never seen this before, and I can't find any similar
issues in the bug tracker, or on the forums.

Any suggestions? Am I missing something that it needs?
Anthony
-- 
Anthony Staines, Professor of Health Systems Research,
School of Nursing, Dublin City University, Dublin 9,Ireland.
Tel:- +353 1 700 7807. Mobile:- +353 86 606 9713

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] [R-pkgs] rameritrade

2020-10-16 Thread Anthony Trevisan
Hello,

I recently created a package that allows R users to trade through the TD
Ameritrade API. With billions in assets and over 11 million clients, I am
sure some R users could leverage the API.

Best regards,
Tony

https://cran.r-project.org/web/packages/rameritrade/index.html

https://tonytrevisan.github.io/rameritrade/index.html

[[alternative HTML version deleted]]

___
R-packages mailing list
r-packa...@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-packages

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] [R-pkgs] fmpcloudr package

2020-11-05 Thread Anthony Trevisan
Hello,

As recommended in https://r-pkgs.org/release.html, I wanted to notify the R 
community about a new package for accessing financial data metrics. The package 
fmpcloudr (https://CRAN.R-project.org/package=fmpcloudr) interacts with the FMP API. 

FMP offers historical pricing data for indexes, stocks, ETFs, mutual funds, 
currencies, and crypto. Other financial metrics are available, such as technical 
indicators (EMA, RSI, etc), company financials, and 13F filings. It's a great data 
source and could be beneficial to many R users. You can find more details about 
the package here: https://tonytrevisan.github.io/fmpcloudr/


Thank you for supporting such an incredible open source community. Learning R 
has dramatically reshaped my career path, which would have been impossible 
without the R community.

Best regards,
Tony



[[alternative HTML version deleted]]

___
R-packages mailing list
r-packa...@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-packages

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] [R-pkgs] Automated Investing - etrader

2020-12-01 Thread Anthony Trevisan
Hello,

I wanted to notify the R community about a new package for trading on the
ETRADE platform through an API. The package `etrader` (
https://CRAN.R-project.org/package=etrader
) interacts with the ETRADE API
to allow for authentication, trading, account details and more. You can
find more details about the package here:
https://tonytrevisan.github.io/etrader/.

This completes a suite of three finance packages I developed to allow for
automated trading and investing. `rameritrade` and `etrader` allow for
trading on TD Ameritrade and ETRADE which collectively hold more than 16
million accounts and $1.5 trillion in Assets. `fmpcloudr` is a package to
pull data from Financial Modeling Prep (
https://financialmodelingprep.com/developer/docs/companies-key-stats-free-api/)
to perform in-depth analysis on a range of financial metrics.

I have a number of articles and 'How To' guides on my blog that show
someone how to fully automate their retirement investing (
https://tonytrevisan.github.io/blog.html). I hope these tools prove useful
to the R community.

Best regards,
Tony

[[alternative HTML version deleted]]

___
R-packages mailing list
r-packa...@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-packages

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] issue running svyglm after subsetting: NA/NaN/Inf in foreign function call (arg 1)

2022-09-30 Thread Anthony Damico
hi, that error happens before svyglm() because the second argument to
subset() isn't a logical test.  run `subset(rclus1, as.factor(stype=="E"))` and
you'll see the same error.  if you remove the "as.factor" -- `subset(rclus1,
(stype=="E"))` -- then svyglm() simply fails to converge, but i think that's
just too many variables without enough sample?  thanks




On Thu, Sep 29, 2022 at 4:18 PM Felippe Marcondes <
felippemarcon...@gmail.com> wrote:

> Hello,
>
> I am attempting to run 1 svyglm model for each of the levels of a factor
> variable.
> When I use the subset function in the survey design object, I get
> the following error:
>
> Error in qr.default(weights(design, "analysis"), tol = 1e-05) :
>   NA/NaN/Inf in foreign function call (arg 1)
>
> I am using the api data for a minimal reproducible example.
>
> # loading package and data
> library(survey)
> data(api)
>
> # creating the svyrep design object
> dclus1<-svydesign(id=~dnum, weights=~pw, data=apiclus1, fpc=~fpc)
>
> rclus1<-as.svrepdesign(dclus1)
>
>
> # attempting to sun svyglm model with subsetted design object:
> t <- svyglm(awards ~ comp.imp + api99 + api00 + cname + cnum + meals + ell,
> design = subset(rclus1, as.factor(stype=="E")), family = quasibinomial)
>
> I get the following error:
> Error in qr.default(weights(design, "analysis"), tol = 1e-05) :
>   NA/NaN/Inf in foreign function call (arg 1)
>
> How do I properly subset the design object by each level of the stype
> variable for the svyglm model to run?
>
> Thanks,
>
> Felippe
>
> [[alternative HTML version deleted]]
>
> __
> R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Is there any design based two proportions z test?

2024-01-17 Thread Anthony Damico
hi, this guide to analyzing changes in prevalence rates over time with
complex survey data might also help?  thanks

http://asdfree.com/trend-analysis-of-complex-survey-data.html
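
a quick sketch (not from the original thread) of the calculation John Fox
describes below, using the prevalence estimates and confidence intervals quoted
in the question:

p1 <- 0.110; se1 <- (0.119 - 0.101) / (2 * 1.96)   # 2011: back the SE out of the 95% CI
p2 <- 0.101; se2 <- (0.109 - 0.094) / (2 * 1.96)   # 2017: same idea
est_diff <- p1 - p2
se_diff  <- sqrt(se1^2 + se2^2)                    # independent samples, so variances add
ci       <- est_diff + c(-1.96, 1.96) * se_diff    # 95% CI for the difference
z        <- est_diff / se_diff
p_value  <- 2 * pnorm(-abs(z))                     # two-sided z test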



On Wed, Jan 17, 2024, 9:15 AM John Fox  wrote:

> Dear Md Kamruzzaman,
>
> To answer your second question first, you could just use the svychisq()
> function. The difference-of-proportion test is equivalent to a chisquare
> test for the 2-by-2 table.
>
> You don't say how you computed the confidence intervals for the two
> separate proportions, but if you have their standard errors (and if not,
> you should be able to infer them from the confidence intervals) you can
> compute the variance of the difference as the sum of the variances
> (squared standard errors), because the two proportions are independent,
> and from that the confidence interval for their difference.
>
> I hope this helps,
> John
> --
> John Fox, Professor Emeritus
> McMaster University
> Hamilton, Ontario, Canada
> web: https://www.john-fox.ca/
>
> On 2024-01-16 10:21 p.m., Md. Kamruzzaman wrote:
> >
> > Hello Everyone,
> > I was analysing big survey data using survey packages on RStudio. Survey
> > package allows survey data analysis with the design effect.The survey
> > package included functions for all other statistical analysis except
> > two-proportion z tests.
> >
> > I was trying to calculate the difference in prevalence of Diabetes and
> > Prediabetes between the year 2011 and 2017 (with 95%CI). I was able to
> > calculate the weighted prevalence of diabetes and prediabetes in the Year
> > 2011 and 2017 and just subtracted the prevalence of 2011 from the
> > prevalence of 2017 to get the difference in prevalence. But I could not
> > calculate the 95%CI of the difference in prevalence considering the
> weight
> > of the survey data.
> >
> > I was also trying to see if this difference in prevalence is
> statistically
> > significant. I could do it using the simple two-proportion z test without
> > considering the weight of the sample. But I want to do it considering the
> > weight of the sample.
> >
> >
> > Example: Prevalence of Diabetes:
> >   2011: 11.0 (95%CI 10.1-11.9)
> >   2017: 10.1 (95%CI 9.4-10.9)
> >   Diff: 0.9% (95%CI: ??)
> >   Proportion Z test P Value: ??
> > Your cooperation will be highly appreciated.
> >
> > Thanks in advance.
> >
> > With Regards
> >
> > **
> >
> > *Md Kamruzzaman*
> >
> > *PhD **Research Fellow (**Medicine**)*
> > Discipline of Medicine and Centre of Research Excellence in Translating
> > Nutritional Science to Good Health
> > Adelaide Medical School | Faculty of Health and Medical Sciences
> > The University of Adelaide
> > Adelaide SA 5005
> >
> >  [[alternative HTML version deleted]]
> >
> > __
> > R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
> > https://stat.ethz.ch/mailman/listinfo/r-help
> > PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> > and provide commented, minimal, self-contained, reproducible code.
>
> __
> R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Creating a map in R using ACS PUMS data

2013-09-12 Thread Anthony Damico
the smallest boundary in the 1-year acs files is public use microdata area
(puma), but the 3- and 5-year public use microdata samples (pums) go down
to some counties, i believe..

http://www.census.gov/acs/www/guidance_for_data_users/estimates/



i think you just need to download the census bureau's shapefiles for the
geography you want to map

http://www.census.gov/geo/maps-data/data/tiger-line.html



at the point you have a shapefile and you have created the data that you
actually want to map, maybe start exploring your mapping options in R here--

http://flowingdata.com/category/visualization/mapping/
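
a minimal sketch (not part of the original post) of reading a downloaded
TIGER/Line shapefile and drawing it; the file name below is hypothetical:

library(maptools)
pumas <- readShapePoly("tl_2013_06_puma10.shp")   # puma boundaries from the tiger/line page
plot(pumas)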



there's lots of people working with the ACS in R

http://www.asdfree.com/search/label/american%20community%20survey%20%28acs%29

http://cran.r-project.org/web/packages/UScensus2010/UScensus2010.pdf

http://cran.r-project.org/web/packages/acs/acs.pdf






On Thu, Sep 12, 2013 at 9:49 AM, Chris Schatschneider wrote:

> Hello all - does anyone know if there is a package in R that will allow me
> to create a map of the US (or individual states) that uses the American
> Community Survey PUMS boundaries?
>
> Thanks
>
> [[alternative HTML version deleted]]
>
> __
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] accumulate() function in R?

2013-09-14 Thread Anthony Damico
maybe ?cumsum

z <- 1:10
cumsum( z )
z <- sort( z )
cumsum( z )[ cumsum( z ) < 30 ]






On Sat, Sep 14, 2013 at 10:36 PM,  wrote:

> I came from Python, newly learning R. is there something like accumulate()
> in R?
>
> Example:
> accumulate([1,2,3,4,5]) --> 1 3 6 10 15
>
> Or perhaps I should show the problem. The problem I am trying to solve, is
> to select elements from X until it accumulate to 30. My solution is:
>
>  X = c(1,3,4,5,8,15,35,62,78,99)
>> X[sapply(seq_len(length(X)), function(x) { sum(X[1:x])}) < 30]
>>
> [1] 1 3 4 5 8
>
> Is this already the shortest/canonical way to do it in R?
>
>
> ---
>
> VFEmail.net - http://www.vfemail.net
> $14.95 ONETIME Lifetime accounts with Privacy Features! 15GB disk! No
> bandwidth quotas!
> Commercial and Bulk Mail Options!
>
> __
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] how to get a value from a list (using paste function)?

2012-12-18 Thread Anthony Damico
# use the mtcars data frame as your starting list.  save it to x
x <- as.list( mtcars )

# just print one column, by hand.
x$wt

# ..dynamically choose the column you want
colname <- 'wt'

# this breaks
get( paste( 'x$' , colname , sep = "" ) )

# this works, but doesn't do what you want, since it's not dynamic
get( 'x' )$wt

# why not access the list dynamically without paste() or get()  ?  ;)
x[[ colname ]]  # double brackets return the column itself, just like x$wt

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Order variables automatically

2013-01-01 Thread Anthony Damico
# create an example data frame
yourdata <-
data.frame(
cat1 = c( 1 , 0 , 1 ) ,
cont1 = c( 0 , 1 , 0 ) ,
cat2 = c( 0 , 0 , 1 )
)
# if this doesn't work for you,
# please ?dput some example data in the future :)

# figure out which variables contain the word 'cat'
vars.to.order <- grep( 'cat' , names( yourdata ) )

# convert all of those columns to factor..
yourdata[ , vars.to.order ] <- lapply( yourdata[ , vars.to.order ], factor )
# ..and then to ordered factor
yourdata[ , vars.to.order ] <- lapply( yourdata[ , vars.to.order ], ordered )

# confirm the results of the new data frame
class( yourdata )  # yourdata is a data frame..

sapply( yourdata , class )  # here's the class of each column

yourdata  # here's the whole data set printed to the screen

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Sas by function in R

2013-01-03 Thread Anthony Damico
https://www.google.com/search?q=multiple+histograms+R  turns up a lot of
possible answers..  what's your desired output and how does what you're
trying to do differ from what's already been described on the web?  :)
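
for the record, a minimal sketch (not from the original thread) of one by-group
histogram approach, assuming the data sit in a data frame called dat with
columns plot and d:

library(lattice)
histogram(~ d | factor(plot), data = dat)   # one panel of d per level of plot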


On Thu, Jan 3, 2013 at 2:50 AM, catalin roibu wrote:

> Hello,
> It's an alternative to use SAS by function in R?
> I want to plot d histograms by plot.from example bellow:
> Thank you!
>plot d
> 1  1  16.3
> 2  1  25.0
> 3  1  57.8
> 4  1  17.0
> 5  2  10.8
> 13 2  96.4
> 17 3  76.0
> 18 3  32.0
> 19 3  11.0
> 20 3  11.0
> 24 3 106.0
> 25 3  12.5
> 21 4  19.3
> 22 4  12.0
> 26 4  15.0
> 27 5  99.3
> 32 7  11.0
> 36 8  18.5
> 38 8  77.0
> 4110   9.8
> 4210 101.5
> 4310  18.0
> 4410   6.0
> 4710  12.3
> 4911  80.8
> 5011  14.0
> 5112  12.5
> 5312  10.8
> 5713  14.5
> 6013  13.3
> 6113  12.5
> 6213 124.1
> 6814  92.3
> 6914  20.3
> 7014   9.5
> 7115   9.0
> 7516 104.1
> 8216  15.0
> 8317  62.8
> 8417  16.3
> 8617  95.2
> 8717  17.8
> 8918  14.0
> 9018 107.3
> 9118  22.5
> 9419  22.8
> 9519  16.5
> 9619  15.3
> 9821  47.8
> 101   22 110.1
> 103   22  14.3
> 107   23  73.3
> 108   24  14.5
> 109   24  93.9
> 115   24  68.5
> 114   25  96.4
> 124   26  44.5
> 122   27  18.0
> 125   27   7.0
> 126   27  14.0
> 127   28  13.8
> 128   28  80.0
> 138   29  43.5
> 139   30  69.8
> 143   30  12.8
> 144   31 102.2
> 146   31  15.8
> 147   31  63.0
> 148   31  63.5
> 149   32  31.3
> 150   32   6.3
> 153   33  82.1
> 289   34 112.7
> 165   35  80.0
> 170   36   5.0
> 171   36  80.0
> 172   37  11.5
> 173   37  95.5
> 176   37  31.3
> 180   38   7.0
> 181   38  54.0
> 182   39   6.5
> 183   39  20.5
> 187   39  63.0
> 188   39  60.0
> 190   40   9.0
> 198   41  44.0
> 199   41  64.0
> 204   42  70.0
> 205   43  37.0
> 207   43  57.3
> 213   43  21.3
> 209   44  10.5
> 217   44  43.5
> 218   44  33.8
> 219   45  21.8
> 225   45  19.3
> 226   45  13.5
> 227   45  77.0
> 228   45  24.0
> 231   46  16.3
> 236   46  31.5
> 237   46   8.3
> 238   47  70.3
> 242   47  17.0
> 249   49  16.5
> 250   49   7.8
> 256   51  62.8
> 257   51  64.8
> 258   52  94.9
> 264   53  94.5
> 266   53  36.8
> 273   56  19.5
> 274   56  12.8
> 277   56  62.5
> 278   56  40.0
> 279   57  17.0
> 282   57  70.5
> 283   58  10.5
> 284   58  48.0
> 285   58  59.8
> 286   59  10.3
> 288   59  16.8
> 293   59  19.8
> 297   60 108.5
> 299   61 106.0
> 301   61  22.8
> 300   62  15.3
> 302   62  17.3
> 306   63  57.5
> 309   63  15.8
> 310   63 104.9
> 311   63   7.8
> 312   63   5.0
> 313   63   5.0
> 316   63   4.8
> 314   64   6.3
> 315   64   5.0
> 317   64   9.5
> 318   64  21.3
> 323   64   5.0
> 326   64  54.0
> 324   65  17.5
> 325   65  68.0
> 333   65   6.8
> 334   65  13.8
> 335   66  80.5
> 336   66  15.5
> 343   66  28.8
> 344   66  12.5
> 350   67  27.0
> 351   67  21.8
> 352   67  25.3
> 355   68  85.3
> 359   68  18.0
> 363   70  10.8
> 364   70  14.8
> 370   71  13.0
> 374   72  90.1
> 377   72  33.8
> 378   72   7.8
> 379   72  14.3
> 381   73  72.0
> 392   74  66.0
> 393   74  24.8
> 390   75  14.8
> 391   75  93.3
> 396   75  12.5
> 397   75  15.0
> 398   76  22.0
> 399   76  90.4
> 400   76  53.0
> 404   77  21.3
> 405   78  11.5
> 406   78  60.3
> 407   78  65.5
> 408   78  69.0
> 409   78  30.8
> 410   78  70.0
> 422   79  16.5
> 423   79  15.8
> 424   79  13.5
> 425   79  22.0
> 426   79  51.5
> 427   80  55.0
> 428   80  26.3
> 429   80  55.3
> 430   80  24.8
> 431   80  55.0
> 435   80  35.0
> 436   80  13.3
> 6 81  11.8
> 7 81  16.5
> 8 81  13.8
> 9 81  15.0
> 1081  59.5
> 1182  26.5
> 1282  11.0
> 1482   7.8
> 1582   6.5
> 1683   5.3
> 2384  18.8
> 2885  29.0
> 2985  56.3
> 3086  71.3
> 3186  24.0
> 3387  12.0
> 3487  32.5
> 3588  96.1
> 3788  51.5
> 3989  55.8
> 4089  29.3
> 4689  16.8
> 4590  42.8
> 4891  85.9
> 5291  16.8
> 5991  15.8
> 5492   9.0
> 5592  12.3
> 5692  11.5
> 5892  11.8
> 6392  85.8
> 6493  13.5
> 6593  38.8
> 6693   9.0
> 6794  54.8
> 7295   8.8
> 7395  43.5
> 7495  12.3
> 7695  13.0
> 7795  80.5
> 7896  13.5
> 7996  75.0
> 8097  68.8
> 8197  22.8
> 8597  39.5
> 8897  43.3
> 9298  15.8
> 9399  14.8
> 97   100  59.5
> 99   101  91.4
> 100  102  80.9
> 102  102  79.6
> 104  102  84.7
> 105  102  15.0
> 106  102  14.3
> 110  104  16.8
> 111  104  16.3
> 112  104  13.3
> 113  104  52.0
> 116  105  14.0
> 117  106  85.3
> 118  106  14.0
> 119  107  78.8
> 120  107  10.8
> 121  107  15.5
> 123  107  67.0
> 130  107  22.8
> 131  107  15.0
> 129  108  91.4
> 132  108  19.3
> 133  109  12.5
> 134  109  16.0
> 135  109  10.3
> 136  109  57.8
> 137  109  59.0
> 140  109  12.0
> 141  110  16.5
> 142  110  58.5
> 145 

Re: [R] Version Controlled CRAN Packages

2013-01-03 Thread Anthony Damico
are you looking for the 'old sources' link shown on every package homepage
on CRAN?

http://cran.r-project.org/web/packages/ggplot2/  > 'old sources' >
http://cran.r-project.org/src/contrib/Archive/ggplot2/
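
a sketch (not part of the original thread) of installing a specific archived
release; install_version() availability and the version number shown are
assumptions:

install.packages("remotes")
remotes::install_version("ggplot2", version = "0.9.3.1", repos = "https://cran.r-project.org")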



On Thu, Jan 3, 2013 at 9:33 AM, Mario Bourgoin  wrote:

> Dear Sir or Madam,
>
> The group of people with whom I work is now convinced of the usefulness of
> using R and its packages to meet our needs for statistical analysis.  It
> has become important that R programs and scripts we create today can be run
> by someone else tomorrow, so need to use version-control.  For this to work
> well, we need to version-control not just our code, but also R and the CRAN
> packages we use.  (We only use CRAN for now.)  Fortunately, R is under
> Subversion, and many CRAN packages are under Subversion in R-Forge.
> However, many CRAN packages do not appear to be available from R-Forge.
>
> 1- Are all CRAN packages available from some repository under version
> control?  (My guess is ``no.'')
> 2- Is there an identifier on CRAN that flags a package as under version
> control in a repository?  (My guess is ``no.'')
> 3- How does CRAN do version control for non-repository packages?  (My guess
> is ``through the generosity of volunteer administrators'' though I would
> prefer that some version control software be involved.)
> 4- Should we decide to create a local source repository to meet our needs?
> (My guess is ``that depends.'')
> 5- Where might I find examples of groups creating and maintaining local
> source repositories for R and its packages?
>
> Sincerely,
> --
> Mario Bourgoin
>
> [[alternative HTML version deleted]]
>
> __
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Inserting percentile values in a data frame

2013-01-03 Thread Anthony Damico
it doesn't work because you're trying to save something of the wrong length
into your data frame

> I used agr1$quantile <- quantile(agr1$cnt, probs=c(.50, .75, .90, .95, .99))

# these two numbers should be the same
nrow( agr1 )
length( agr1$quantile )

# these two numbers should be the same
length( c(.50, .75, .90, .95, .99) )
length( quantile(agr1$cnt, probs=c(.50, .75, .90, .95, .99)) )

# but all four of those numbers aren't the same.  there's one answer for
each percentile, not one answer for each row.

# if you want the entire column in your agr1 data frame to contain the same
number,
# you could potentially add quantiles one at a time
agr1$median <- quantile( agr1$cnt , 0.5 )
agr1$p75 <- quantile( agr1$cnt , 0.75 )
# but that may look silly, since it's the same number over and over
head( agr1 )
# why not save them elsewhere?  :)
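
for example (a sketch, not part of the original reply), keep them in a separate
named vector:

qs <- quantile( agr1$cnt , probs = c(.50, .75, .90, .95, .99) )
qs[ "75%" ]   # pull out a single percentile when needed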

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

