from:"ramoss"

[R] DTM Package removeSparseTerms function question

2014-01-16 Thread ramoss

In inspect(removeSparseTerms(dtm, 0.4)), does anyone know how the sparse argument ("A numeric for the maximal allowed sparsity") works? I.e., what is the difference between, say, 0.2, 0.4 & 0.6? Thanks for your help -- View this message in context: http://r.789695.n4.nabble.com/DTM-Package-r
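A rough base-R sketch of what the threshold seems to do, based on my reading of the tm documentation: a term's sparsity is the share of documents in which it does not occur, and terms at or above the `sparse` value are dropped. The document count, the term names and the helper `keep_terms` are all made up for illustration; this mimics the rule and does not call tm itself.

```r
# Hypothetical corpus: 10 documents, with the number of documents
# each (made-up) term appears in.
n_docs   <- 10
doc_freq <- c(common = 9, frequent = 7, medium = 5, rare = 1)

# Sparsity of a term = fraction of documents that do NOT contain it.
sparsity <- 1 - doc_freq / n_docs

# Assumed rule: keep a term only if it occurs in more than
# n_docs * (1 - sparse) documents, i.e. its sparsity is below `sparse`.
keep_terms <- function(sparse) {
  names(doc_freq)[doc_freq > n_docs * (1 - sparse)]
}

keep_terms(0.2)  # strictest: only terms present in almost every document
keep_terms(0.4)
keep_terms(0.6)  # most lenient of the three
```

Under this reading, a smaller value keeps fewer terms, so 0.2 gives a much smaller matrix than 0.6.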

Re: [R] How do you transform a dataframe to a corpus?

2014-01-10 Thread ramoss

The column length is 4000 bytes long if that helps. -- View this message in context: http://r.789695.n4.nabble.com/How-do-you-transform-a-dataframe-to-a-corpus-tp4683396p4683402.html Sent from the R help mailing list archive at Nabble.com.

[R] How do you transform a dataframe to a corpus?

2014-01-10 Thread ramoss

Hi; I have a data frame complaints w/ dimensions 1133529 x 1 (1.13m obs, 1 col) & I am trying to transform it into a corpus using the following code: myCorpus <- Corpus(VectorSource(complaints$text)) Error in .Source(readPlain, encoding, length(x), FALSE, names(x), 0, TRUE, : vectorized sources

[R] Package TM & dataframes

2014-01-10 Thread ramoss

Hi, I am trying to use the package TM on a dataframe & get the following error: complaints <- tm_map(complaints, tolower) Error in UseMethod("tm_map", x) : no applicable method for 'tm_map' applied to an object of class "data.frame" Tm doesn't work on dataframes? My data frame consists of 1

[R] Removing rows w/ smaller value from data frame

2013-05-23 Thread ramoss

Hello, I have a column called max_date in my data frame and I only want to keep the bigger values for the same activity. How can I do that? data frame:
activity  max_dt
A         2013-03-05
B         2013-03-28
A         2013-03-28
C         2013-03-28
B         2013-03-01

Re: [R] how to merge 2 data frame if you want to exclude mutual obs

2013-05-13 Thread ramoss

Thanks Adam your solution worked perfectly. Thank you all for your responses. -- View this message in context: http://r.789695.n4.nabble.com/how-to-merge-2-data-frame-if-you-want-to-exclude-mutual-obs-tp4666975p4666985.html Sent from the R help mailing list archive at Nabble.com.

Re: [R] how to merge 2 data frame if you want to exclude mutual obs

2013-05-13 Thread ramoss

To clarify: So if in data frame A you have
Tdate     symbol  TA
12/12/12  AX      123
12/11/12  ZZ      A4R
12/12/12  WQ      B8R
Data frame B
Tdate     symbol  TA
12/12/12  AX      123
12/11/12  ZZ

[R] how to merge 2 data frame if you want to exclude mutual obs

2013-05-13 Thread ramoss

In the example below, I am merging 2 data frames & I want everything in the first one(all) all2 <- merge(all,spets, by.x=c("tdate","symbol"), by.y=c("tdate","symbol"),all.x=TRUE) What if I want to exclude everything in y? I tried below but doesn't seem to work. all2 <- merge(all,spets, by.x=c("tda

Re: [R] subsetting by is not

2013-05-09 Thread ramoss

I want to clarify we are talking about 2 variables in a data frame here. -- View this message in context: http://r.789695.n4.nabble.com/subsetting-by-is-not-tp4666706p4666707.html Sent from the R help mailing list archive at Nabble.com.

[R] subsetting by is not

2013-05-09 Thread ramoss

Hello, I have a simple question: I know how to subset by is: buy1 <- subset(buy,buybdge==badge) How do I subset if I don't want buybdge to equal badge? Thanks ahead for your help -- View this message in context: http://r.789695.n4.nabble.com/subsetting-by-is-not-tp4666706.html Sent from

[R] Stat question: How to deal w/ negative outliers?

2013-04-12 Thread ramoss

Hello all, I have a question: I am using the interquartile range method to spot outliers & it gives me values of, say, 234 & -120 for the higher & lower benchmarks. I don't have any issues w/ the higher end. However, I don't have any negative values. My lowest possible value is 0. Should I consider 0

Re: [R] Using PLYR to apply a custom function to a data frame

2013-04-10 Thread ramoss

Thanks everyone. The mutate function worked great: all2<- mutate(all1,upper=p75+1.5*(p75-p25),lower=p25-1.5*(p75-p25)) -- View this message in context: http://r.789695.n4.nabble.com/Using-PLYR-to-apply-a-custom-function-to-a-data-frame-tp4663897p4663902.html Sent from the R help mailing list
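For readers without plyr, the same fences can be sketched with base R's `transform`. The data frame below is a made-up stand-in for the poster's `all1`; only the column names `p25` and `p75` come from the message.

```r
# Hypothetical per-group quartiles standing in for the poster's `all1`.
all1 <- data.frame(
  ACTIVITY = c("A", "B"),
  p25 = c(10, 40),
  p75 = c(30, 100)
)

# Upper and lower outlier fences: 1.5 times the interquartile range
# beyond the third and first quartiles respectively.
all2 <- transform(
  all1,
  upper = p75 + 1.5 * (p75 - p25),
  lower = p25 - 1.5 * (p75 - p25)
)

print(all2)
```

Note that the lower fence can come out negative even when the data cannot, which is the situation raised in the later outlier question.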

[R] Using PLYR to apply a custom function to a data frame

2013-04-10 Thread ramoss

Hello, I am still struggling w/ the PLYR syntax. I am trying to build a customized function to detect outliers in a data frame based on the interquartile range method. My data frame is called "ALL" & I am trying to create two new variables in my data frame: upper=q3+ 1.5*(q3-q1) & lower=q1-1.5*(q3-

[R] How to perform a grouped shapiro wilk test on dataframe

2013-04-05 Thread ramoss

Hello, I was wondering if it is possible to perform on a dataframe called 'all' a Shapiro-Wilk normality test for COUNTS by variable Group ACTIVITY? Could it be done using plyr? I saw an eg that applies to an array but not to a dataframe: lapply(split(dataset1$Height,dataset1$Group),shapiro.t

Re: [R] Can package plyr also calculate the mode?

2013-04-04 Thread ramoss

When I run yy <- ddply(all,"ACTIVIT", summarise, mode=mode(COUNTS)) I get:
ACTIVIT  mode
XX       numeric
ZZ       numeric
& so on. -- View this message in context: http://r.789695.n4.nabble.com/Can-package-plyr-also-calculate-the-mode-tp4663235p

Re: [R] Can package plyr also calculate the mode?

2013-04-04 Thread ramoss

When I put in mode=mode(COUNTS) I get the value "numeric" as an answer. I think it's giving me the data type not the mode. -- View this message in context: http://r.789695.n4.nabble.com/Can-package-plyr-also-calculate-the-mode-tp4663235p4663301.html Sent from the R help mailing list archive at
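The guess above is right: base R's `mode()` reports how an object is stored, not the most frequent value. A minimal sketch of a statistical mode follows; `stat_mode` is a hypothetical helper, not a base R or plyr function, and on ties it returns the smallest of the tied values.

```r
# Hypothetical helper: most frequent value of a vector.
stat_mode <- function(x) {
  tab <- table(x)
  # names() of a table are character, so convert back for numeric input
  top <- names(tab)[which.max(tab)]
  if (is.numeric(x)) as.numeric(top) else top
}

counts <- c(3, 5, 5, 7, 5, 3)

mode(counts)       # storage mode of the vector
stat_mode(counts)  # most frequent value
```

Inside the poster's call this would presumably read `ddply(all, "ACTIVIT", summarise, mode = stat_mode(COUNTS))`, assuming the helper is defined first.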

[R] Can package plyr also calculate the mode?

2013-04-03 Thread ramoss

I am trying to replicate the SAS proc univariate in R. I got most of the stats I needed for a by grouping in a data frame using: all1 <- ddply(all,"ACT_NAME", summarise, mean=mean(COUNTS), sd=sd(COUNTS), q25=quantile(COUNTS,.25),median=quantile(COUNTS,.50), q75=quantile(COUNTS,.75),

[R] What is SAS options missing=0 equivalent in R?

2013-04-02 Thread ramoss

I have a dataframe & wish to convert the NA (missing values) to zero . In SAS I would use options missing=0 to convert all my obs in a dataset. How can I accomplish the same thing in R? Can it be done? Thanks for any thoughts on this. -- View this message in context: http://r.789695.n4.nabble

[R] Left join in R

2013-04-01 Thread ramoss

I have never used the data.table package. I am trying to do the following SQL left join in R: create table all as select a.* from dates b left outer join activitycount a on a.tdate=b.tdate and a.activity=b.activity

[R] Data frame question

2013-04-01 Thread ramoss

Hello, I have 2 data frames: activity and dates. Activity contains a variable listing all activities: activityA, activityB etc. The dates contain all the valid business dates. I need to combine the 2 so that I get a single data frame activitydat that contains the activity name along w/ every

[R] Subset in, not in

2013-01-10 Thread ramoss

Hello, I need to subset my dataframe into 2 parts; in: mm <- subset(agr1, subset=lmpcrd %in% c(11697,149823,7654)) not in: but where do I stick the " !" in the above? I've tried every position. Thanks for your help. -- View this message in context: http://r.789695.n4.nabble.com/Subs
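One way this is commonly written: the `!` negates the whole `%in%` test, so it goes in front of the parenthesised expression. The data frame below is invented; only the name `agr1`, the column `lmpcrd` and the three ids come from the message.

```r
# Made-up stand-in for the poster's data frame.
agr1 <- data.frame(
  lmpcrd = c(11697, 22222, 149823, 33333, 7654),
  cnt    = 1:5
)
ids <- c(11697, 149823, 7654)

# "in": rows whose lmpcrd is one of the ids
mm_in <- subset(agr1, lmpcrd %in% ids)

# "not in": negate the whole %in% test
mm_out <- subset(agr1, !(lmpcrd %in% ids))
```

The parentheses are optional here because `!` binds more loosely than `%in%`, but they make the intent easier to read.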

[R] Inserting percentile values in a data frame

2013-01-03 Thread ramoss

Hello, I need to calculate and insert the values for the 50,75,90,95 & 99 percentiles in a data frame for each row. I used agr1$quantile <- quantile(agr1$cnt, probs=c(.50, .75, .90, .95, .99)) but that didn't work. How can I calculate the percentile for my variable "cnt" , insert & name the percent

[R] Help w/ FF package to upload large file.

2012-12-31 Thread ramoss

Hello, Does anyone here know how to use this package? The documentation is most confusing. I have a large CSV file w/ 6.8M obs & 19 variables. I am having memory issues trying to upload it to Greenplum using: sqlSave(chann, rave, tablename="mossader_dev.rave", rownames=F, colnames=T) How can I write

Re: [R] subset data frame by variable with missing value

2012-11-30 Thread ramoss

I found the answer; Its mymissing <- subset(mydata,is.na(myvar)) -- View this message in context: http://r.789695.n4.nabble.com/subset-data-frame-by-variable-with-missing-value-tp4651439p4651440.html Sent from the R help mailing list archive at Nabble.com.
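A small sketch of why `is.na()` is needed rather than a comparison with `NA`. The data frame is invented; the names `mydata`, `myvar` and `mymissing` follow the message.

```r
# Made-up data with two missing values in myvar.
mydata <- data.frame(id = 1:4, myvar = c("a", NA, "b", NA))

# Rows where myvar is missing
mymissing <- subset(mydata, is.na(myvar))

# Comparing with NA yields NA for every row, and subset() drops
# rows whose condition is NA, so this selects nothing.
wrong <- subset(mydata, myvar == NA)
```

The complement, rows where the variable is present, would be `subset(mydata, !is.na(myvar))`.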

[R] subset data frame by variable with missing value

2012-11-30 Thread ramoss

Hello, I have a variable in a data frame that contains NA values. I just want to subset so that I get the obs where that variable is missing. In SAS I would do: data missing; set test; if myvalue=' '; run; How can I perform this simple task in R? Thanks in advance for your help. -- View

Re: [R] Can you have a by variable in Lag function as in SAS

2012-11-16 Thread ramoss

Thank you again all responders. Dan your solution was both easy & miraculous. -- View this message in context: http://r.789695.n4.nabble.com/Can-you-have-a-by-variable-in-Lag-function-as-in-SAS-tp4649647p4649773.html Sent from the R help mailing list archive at Nabble.com.

[R] Can you have a by variable in Lag function as in SAS

2012-11-15 Thread ramoss

Hello, I want to use lag on a time variable but I have to take date into consideration, i.e. I don't want days to overlap: I don't want my first time of today to match my last time of yesterday. In SAS I would use : data x; set y; by date tim; previous=lag(tim); if first.date then d

[R] Using lubridate to increment date by business days only

2012-11-13 Thread ramoss

Hello, I know how to increment a date by calendar date: ticker$ldate <- ticker$tdate + days(5) How do I increment it by business days only so that weekends are not counted? So for example Friday November 2 + 5 days becomes Friday November 9 & not Wednesday November 7. Thanks for your help. -- Vi

Re: [R] Creating a new by variable in a dataframe

2012-10-19 Thread ramoss

Thanks for all the help guys. This worked for me: all6 <- arrange(all6, tdate,event_tim) lt <- ddply(all6,.(tdate),tail,1) lt$last_trans <-'Y' all6 <-merge(all6,lt, by.x=c("tdate","event_tim"), by.y=c("tdate","event_tim"),all.x=TRUE) -- View this message in context: http://r.789695.n4.nabbl

[R] Creating a new by variable in a dataframe

2012-10-19 Thread ramoss

Hello, I have a dataframe w/ 3 variables of interest: transaction,date(tdate) & time(event_tim). How could I create a 4th variable (last_trans) that would flag the last transaction of the day for each day? In SAS I use: proc sort data=all6; by tdate event_tim; run; /*Create last transacti

[R] How to replicate SAS by group processing in R

2012-10-10 Thread ramoss

Hello, I am trying to re-code all my programs from SAS into R. In SAS I use the following code: proc sort data=upper; by tdate stock_symbol expire strike; run; data upper1; set upper; by tdate stock_symbol expire strike; if first.expire then output; rename strike=astrike; run; on the

Re: [R] Conditional operations in R

2012-09-18 Thread ramoss

Thanks to all who responded, particularly to Michael. Your solution was the easiest to understand & to implement. This worked beautifully: cmtot <- arrange(cmtot, -PCTTOT)#sort by descending top <- with(cmtot,which.max(cumsum(PCTTOT) >= 50)) topcm <- cmtot[seq(1,top),] -- View this message
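The same idea can be sketched in base R, without plyr's `arrange`. The percentages are the ones listed in the original question of this thread; the row order and the 50 cutoff are as in the message above.

```r
# Client shares from the original question, deliberately unsorted.
cmtot <- data.frame(
  client = c("G", "A", "B", "C", "D", "E", "F"),
  PCTTOT = c(4, 15, 10, 10, 9, 8, 6),
  stringsAsFactors = FALSE
)

# Sort by descending share (base-R equivalent of arrange(cmtot, -PCTTOT)).
cmtot <- cmtot[order(-cmtot$PCTTOT), ]

# which.max() on a logical vector returns the position of the first TRUE,
# i.e. the first row at which the running total reaches 50.
top <- which.max(cumsum(cmtot$PCTTOT) >= 50)

# Keep the clients up to and including that row.
topcm <- cmtot[seq_len(top), ]
```

One caveat: if the running total never reaches the cutoff, `which.max` returns 1 rather than signalling a problem, so it may be worth checking `any(cumsum(cmtot$PCTTOT) >= 50)` first.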

[R] Conditional operations in R

2012-09-18 Thread ramoss

Hello, I am a newbie to R coming from SAS background. I am trying to program the following: I have a monthly data frame with 2 variables:
client  pct_total
A       15%
B       10%
C       10%
D       9%
E       8%
F       6%
G       4%
I need to come up w/ a monthly list o

Re: [R] Cannot install package xlsx

2012-09-14 Thread ramoss

It looks like they are all corrupted. I tried several other CRAN sites across the world. How can we notify the package owner? -- View this message in context: http://r.789695.n4.nabble.com/Cannot-install-package-xlsx-tp4643054p4643142.html Sent from the R help mailing list archive at Nabble.co

Re: [R] Passing values to sqlQuery like SAS macro

2012-09-13 Thread ramoss

Thanks I was doing something similar in SAS. I was looping macro based on a dataset containing the values: data _null_; set summary2; mindat=put(datepart(mindate),date9.); min_date='mindat_'|| trim(left(_n_)); put mindate= mindat= min_date=; /*check values in log*/ call symput (min_

[R] Cannot install package xlsx

2012-09-13 Thread ramoss

I get following error message: trying URL 'http://cran.stat.ucla.edu/bin/windows/contrib/2.15/xlsx_0.4.2.zip' Content type 'application/zip' length 365611 bytes (357 Kb) opened URL downloaded 357 Kb Error in read.dcf(file.path(pkgname, "DESCRIPTION"), c("Package", "Type")) : cannot open the co

[R] Passing values to sqlQuery like SAS macro

2012-09-13 Thread ramoss

Hello, We lost our SAS licence & I am busy transferring my old SAS programs to the R environment. I am very new to R. In 1 program I was creating SAS macro vars & passing them into a SQL query to run against the server. There are 3 variables firm, begindt, enddt. # of values for each varies month to

[R] FF package & downloading a large file using sqlQuery

2012-09-06 Thread ramoss

I am new to R and am encountering memory issues while trying to download a large table from Greenplum, using sqlQuery. Is there any way this FF package can help me create a large dataframe in R while downloading from the server? The FF documentation is very confusing. Thanks for any help

[R] help w/ uploading table from R to Greenplum

2012-09-05 Thread ramoss

Hi, Does anyone know how to upload a table to Greenplum & have it be distributed? I know how to upload using sqlSave(chann, d, tablename="castaneg.wh_d", rownames=F, colnames=T) but how can I make my table be distributed randomly on the server? In SAS you can use the option "distribute_on=rand

[R] Conditional merging in R & if then statement

2012-08-31 Thread ramoss

1) I am wondering how the following SQL statement can be written in R language w/o using sqldf: create table detail2 as select a.* from detail a, pdetail b where a.TDATE=b.TDATE and(a.STIM >= b.STIM and a.STIM <=b.MAXTIM) 2) when I try if/then in R it only applies to the 1st row & not t

Re: [R] Deduping in R by multiple variables

2012-08-30 Thread ramoss

Thanks for your help guys. I was referring to the variables the wrong way. This worked for me: idx <- !duplicated(detail2[,c("TDATE","FIRM","CM","BRANCH", "BEGTIME", "ENDTIME","OTYPE","OCOND", "ACCTYP","OSIDE","SHARES","STOCKS", "ST

[R] Deduping in R by multiple variables

2012-08-29 Thread ramoss

I have a dataset w/ 184K obs & 16 variables. In SAS I proc sort nodupkey it in seconds by 11 variables. I tried to do the same thing in R using both the unique & then the !duplicated functions but it just hangs there & I get no output. Does anyone know how to solve this? This is how I tried to d

[R] if then in R versus SAS

2012-08-24 Thread ramoss

I am new to R and I have the following SAS statements: if otype='M' and ocond='1' and entry='a.Prop' then MOC=1; else MOC=0; How would I translate that into R code? Thanks in advance -- View this message in context: http://r.789695.n4.nabble.com/if-then-in-R-versus-SAS-tp4641225.html Sent
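A common way to express a SAS-style row-wise if/then/else in R is the vectorised `ifelse()`, which evaluates the condition for every row at once. The sketch below uses an invented data frame named `orders`; only the column names and the tested values come from the SAS statement above.

```r
# Made-up rows; only the first satisfies all three conditions.
orders <- data.frame(
  otype = c("M", "M", "L"),
  ocond = c("1", "2", "1"),
  entry = c("a.Prop", "a.Prop", "a.Prop"),
  stringsAsFactors = FALSE
)

# Use the vectorised & (not &&) so the test is applied to each row.
orders$MOC <- ifelse(
  orders$otype == "M" & orders$ocond == "1" & orders$entry == "a.Prop",
  1, 0
)
```

A plain `if (...) ... else ...` only looks at a single condition value, which is likely why an if/then attempt appears to act on the first row only.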

Re: [R] Concatenating data frames in R versus SAS

2012-08-24 Thread ramoss

I used summary <-rbind.fill(agency,prop) & it worked like a charm. Thanks everyone. -- View this message in context: http://r.789695.n4.nabble.com/Concatenating-data-frames-in-R-versus-SAS-tp4641138p4641219.html Sent from the R help mailing list archive at Nabble.com.

[R] Concatenating data frames in R versus SAS

2012-08-23 Thread ramoss

I am trying to concatenate 2 datasets that don't have exactly the same columns. In SAS I did: data summary; set agency prop; run; No problem. In R I get the error message: summary <-rbind(agency,prop) Error in match.names(clabs, names(xi)) : names do not match previous names But when I use rbin.fi

[R] Merging data in R compared to SAS

2012-08-22 Thread ramoss

Hello, I am a SAS user new to R. What is the R equivalent to following SAS statements: 1) data all; merge test1(in=a) test2(in=b) ; by account_id; if a; run; 2) proc sort data=all nodupkey; by account_id; run; 3) data all test1onnly test2only; merge test1(in=a)

46 matches
