In inspect(removeSparseTerms(dtm, 0.4)), does anyone know how the sparse
argument ("A numeric for the maximal allowed sparsity") works? I.e., what is
the difference between, say, 0.2, 0.4 & 0.6?
Thanks for your help.
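A rough way to picture the threshold, as a base R sketch and not the tm implementation: as far as I understand it, removeSparseTerms() keeps a term only if the share of documents in which the term does not occur is below the sparse value, so a smaller value keeps fewer terms. The toy matrix and the helper keep_terms() below are made up for illustration.

```r
# Toy document-term matrix: 10 documents (rows) x 4 terms (columns).
# 1 = term occurs in the document, 0 = it does not.
m <- cbind(alpha = rep(1, 10),                 # absent from 0 of 10 docs
           beta  = rep(c(1, 0), c(7, 3)),      # absent from 3 of 10 docs
           gamma = rep(c(1, 0), c(5, 5)),      # absent from 5 of 10 docs
           delta = rep(c(1, 0), c(1, 9)))      # absent from 9 of 10 docs

# Hypothetical helper: names of the terms whose sparsity (share of
# documents with a zero count) is below the threshold.
keep_terms <- function(m, sparse) {
  sparsity <- colMeans(m == 0)
  names(sparsity)[sparsity < sparse]
}

kept_02 <- keep_terms(m, 0.2)   # only the term present everywhere survives
kept_04 <- keep_terms(m, 0.4)   # terms missing from under 40% of docs
kept_06 <- keep_terms(m, 0.6)   # terms missing from under 60% of docs
print(list(kept_02, kept_04, kept_06))
```

Read this way, 0.2 is the strictest of the three settings and 0.6 the most permissive.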
--
View this message in context:
http://r.789695.n4.nabble.com/DTM-Package-r
The column is 4000 bytes long, if that helps.
--
View this message in context:
http://r.789695.n4.nabble.com/How-do-you-transform-a-dataframe-to-a-corpus-tp4683396p4683402.html
Sent from the R help mailing list archive at Nabble.com.
Hi;
I have a data frame complaints w/ dimensions 1133529 x 1 (1.13m obs, 1 col) &
I am trying to transform it into a corpus
using the following code: myCorpus <- Corpus(VectorSource(complaints$text))
Error in .Source(readPlain, encoding, length(x), FALSE, names(x), 0, TRUE,
:
vectorized sources
Hi,
I am trying to use the package tm on a data frame & get the following error:
complaints <- tm_map(complaints, tolower)
Error in UseMethod("tm_map", x) :
no applicable method for 'tm_map' applied to an object of class
"data.frame"
tm doesn't work on data frames? My data frame consists of 1
Hello,
I have a column called max_date in my data frame and I only want to keep the
bigger values for the same activity. How can I do that?
data frame:
activity  max_dt
A         2013-03-05
B         2013-03-28
A         2013-03-28
C         2013-03-28
B         2013-03-01
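One possible base R approach, as a sketch only: sort by date in descending order and keep the first row seen for each activity. The data frame name df is an assumption; the column names follow the table above.

```r
# Rebuild the example table (df is a hypothetical name).
df <- data.frame(
  activity = c("A", "B", "A", "C", "B"),
  max_dt   = as.Date(c("2013-03-05", "2013-03-28", "2013-03-28",
                       "2013-03-28", "2013-03-01")),
  stringsAsFactors = FALSE
)

# Newest dates first, then keep the first (i.e. latest) row per activity.
df_sorted <- df[order(df$max_dt, decreasing = TRUE), ]
latest    <- df_sorted[!duplicated(df_sorted$activity), ]
latest    <- latest[order(latest$activity), ]
print(latest)
```

If an activity has several rows with the same latest date, this keeps only one of them.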
Thanks Adam your solution worked perfectly. Thank you all for your
responses.
To clarify:
So if in data frame A you have
Tdate     symbol  TA
12/12/12  AX      123
12/11/12  ZZ      A4R
12/12/12  WQ      B8R
Data frame B
Tdate     symbol  TA
12/12/12  AX      123
12/11/12  ZZ
In the example below, I am merging 2 data frames & I want everything in the
first one (all):
all2 <- merge(all,spets, by.x=c("tdate","symbol"),
by.y=c("tdate","symbol"),all.x=TRUE)
What if I want to exclude everything in y? I tried below but doesn't seem to
work.
all2 <- merge(all,spets, by.x=c("tda
I want to clarify that we are talking about 2 variables in a data frame here.
Hello,
I have a simple question:
I know how to subset by is: buy1 <- subset(buy,buybdge==badge)
How do I subset if I don't want buybdge to equal badge?
Thanks ahead for your help
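A minimal sketch of the "is not" case: the comparison operator != is the negation of ==. The values in buy are invented for illustration; only the column names come from the question.

```r
# Toy data: buybdge and badge agree on rows 1 and 3 only.
buy <- data.frame(buybdge = c(1, 2, 3, 4),
                  badge   = c(1, 5, 3, 6))

buy1 <- subset(buy, buybdge == badge)   # rows where the two are equal
buy2 <- subset(buy, buybdge != badge)   # rows where they differ
print(buy2)
```

Note that subset() drops rows where the condition evaluates to NA, so rows with a missing buybdge or badge end up in neither result.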
Hello all,
I have a question: I am using the interquartile method to spot outliers &
it gives me values of, say, 234 & -120 for the higher & lower benchmarks.
I don't have any issues w/ the higher end. However, I don't have any
negative values. My lowest possible value is 0. Should I consider 0
Thanks everyone. The mutate function worked great:
all2<- mutate(all1,upper=p75+1.5*(p75-p25),lower=p25-1.5*(p75-p25))
Hello,
I am still struggling w/ the plyr syntax. I am trying to build a customized
function to detect outliers in a data frame based on the interquartile
method. My data frame is called "ALL" & I am trying to create two new
variables in my data frame:
upper=q3+ 1.5*(q3-q1) & lower=q1-1.5*(q3-
Hello,
I was wondering if it is possible to perform, on a data frame called 'all', a
Shapiro-Wilk normality test for COUNTS by the grouping variable
ACTIVITY. Could it be done using plyr? I saw an example that applies to an
array but not to a data frame:
lapply(split(dataset1$Height,dataset1$Group),shapiro.t
When I run yy <- ddply(all,"ACTIVIT", summarise, mode=mode(COUNTS))
I get:
ACTIVIT  mode
XX       numeric
ZZ       numeric
& so on.
When I put in mode=mode(COUNTS) I get the value "numeric" as an answer. I
think it's giving me the data type not the mode.
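That reading is right: base R's mode() reports the storage mode of an object, not the most frequent value. A small sketch of a statistical mode, assuming ties should go to the value seen first; stat_mode() and the toy data frame counts_df are hypothetical names.

```r
# Hypothetical helper: most frequent value of x (first one wins on ties).
stat_mode <- function(x) {
  ux <- unique(x)
  ux[which.max(tabulate(match(x, ux)))]
}

storage  <- mode(c(3, 5, 5, 2, 5, 3))        # "numeric": the storage mode
most_frq <- stat_mode(c(3, 5, 5, 2, 5, 3))   # 5: the most frequent value

# Per-group version on a made-up data frame with the thread's column names.
counts_df <- data.frame(ACTIVIT = c("XX", "XX", "XX", "ZZ", "ZZ", "ZZ"),
                        COUNTS  = c(1, 1, 2, 4, 7, 7),
                        stringsAsFactors = FALSE)
by_group <- tapply(counts_df$COUNTS, counts_df$ACTIVIT, stat_mode)
print(by_group)
```

The same helper could presumably be passed to ddply in place of mode, e.g. as mode=stat_mode(COUNTS).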
I am trying to replicate the SAS proc univariate in R. I got most of the
stats I needed for a by grouping in a data frame using:
all1 <- ddply(all,"ACT_NAME", summarise, mean=mean(COUNTS), sd=sd(COUNTS),
q25=quantile(COUNTS,.25),median=quantile(COUNTS,.50),
q75=quantile(COUNTS,.75),
I have a data frame & wish to convert the NA (missing values) to zero. In SAS
I would use options missing=0 to convert all my obs in a dataset. How can I
accomplish the same thing in R? Can it be done? Thanks for any thoughts on
this.
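One common base R idiom, shown as a sketch on an invented data frame df with numeric columns only: index the data frame with is.na() and assign zero.

```r
# Toy data frame with missing values in both columns.
df <- data.frame(a = c(1, NA, 3),
                 b = c(NA, 5, 6))

# is.na(df) is a logical matrix; assigning through it replaces every NA.
df[is.na(df)] <- 0
print(df)
```

Unlike the SAS option, this changes the stored values rather than only how they are displayed, and factor columns would need separate handling.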
I have never used the data.table package. I am trying to do the following
SQL left join in R
create table all as select a.*
from dates b left outer join activitycount a on
a.tdate=b.tdate
and a.activity=b.activity
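A sketch of a comparable left join using base R's merge() instead of data.table: all.x=TRUE keeps every row of the first data frame. The contents of dates and activitycount, and the cnt column, are invented for illustration.

```r
# Every valid date/activity pair (the "left" table).
dates <- data.frame(tdate    = c("2013-01-02", "2013-01-03", "2013-01-04"),
                    activity = "A",
                    stringsAsFactors = FALSE)

# Observed counts; 2013-01-03 has no row here.
activitycount <- data.frame(tdate    = c("2013-01-02", "2013-01-04"),
                            activity = "A",
                            cnt      = c(10, 7),
                            stringsAsFactors = FALSE)

# Left join on both keys: unmatched rows of dates get NA in cnt.
joined <- merge(dates, activitycount,
                by = c("tdate", "activity"), all.x = TRUE)
print(joined)
```

One difference from the SQL: merge() returns the key columns from dates, so unmatched rows still carry their tdate and activity.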
Hello,
I have 2 data frames: activity and dates. Activity contains a variable
listing all activities: activityA, activityB etc.
The dates contain all the valid business dates. I need to combine the 2 so
that I get a single data frame activitydat that contains the activity name
along w/ every
Hello,
I need to subset my data frame into 2 parts:
in: mm <- subset(agr1, subset=lmpcrd %in% c(11697,149823,7654))
not in: but where do I stick the "!" in the above? I've tried every
position.
Thanks for your help.
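A sketch of the "not in" case: the ! goes in front of the whole %in% expression, wrapped in parentheses for readability. The rows of agr1 are invented; the three ids come from the question.

```r
# Toy data: three rows match the id list, two do not.
agr1 <- data.frame(lmpcrd = c(11697, 222, 149823, 333, 7654),
                   cnt    = 1:5)
ids <- c(11697, 149823, 7654)

mm_in  <- subset(agr1, lmpcrd %in% ids)      # "in"
mm_out <- subset(agr1, !(lmpcrd %in% ids))   # "not in"
print(mm_out)
```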
Hello
I need to calculate and insert the values for the 50,75,90,95 & 99
percentiles in a data frame for each row.
I used agr1$quantile <- quantile(agr1$cnt, probs=c(.50, .75, .90, .95, .99))
but that didn't work.
How can I calculate the percentile for my variable "cnt", insert & name the
percent
Hello,
Does anyone here know how to use this package? The documentation is most confusing.
I have a large CSV file w/ 6.8M obs & 19 variables. I am having memory
issues trying to upload it to Greenplum using:
sqlSave(chann, rave, tablename="mossader_dev.rave", rownames=F, colnames=T)
How can I write
I found the answer.
It's: mymissing <- subset(mydata,is.na(myvar))
Hello,
I have a variable in a data frame that contains NA values. I just want to
subset so that I get the obs where that variable is missing.
In SAS I would do:
data missing;
set test;
if myvalue=' ';
run;
How can I perform this simple task in R?
Thanks in advance for your help.
Thank you again all responders. Dan your solution was both easy & miraculous.
Hello,
I want to use lag on a time variable but I have to take date into
consideration, i.e. I don't want days to overlap:
I don't want my first time of today to match my last time of yesterday.
In SAS I would use :
data x;
set y;
by date tim;
previous=lag(tim);
if first.date then
d
Hello,
I know how to increment a date by calendar date:
ticker$ldate <- ticker$tdate + days(5)
How do I increment it by business days only so that weekends are not
counted?
So for example Friday November 2 + 5 days becomes Friday November 9 & not
Wednesday Nov 7.
Thanks for your help.
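A minimal base R sketch of the idea: step forward one calendar day at a time and count only Monday to Friday. The helper add_business_days() is hypothetical, it ignores public holidays, and it handles a single date rather than a whole column. The year 2012 is assumed for the example because 2 November falls on a Friday in that year.

```r
# Hypothetical helper: add n business days (Mon-Fri) to a single Date.
add_business_days <- function(date, n) {
  stopifnot(n >= 0)
  while (n > 0) {
    date <- date + 1
    wday <- as.POSIXlt(date)$wday      # 0 = Sunday, 6 = Saturday
    if (wday != 0 && wday != 6) {
      n <- n - 1                       # only weekdays use up the budget
    }
  }
  date
}

start  <- as.Date("2012-11-02")        # a Friday (assumed year)
result <- add_business_days(start, 5)
print(result)
```

For a column such as ticker$tdate, one option would be to apply the helper element by element; holiday calendars would need a dedicated package or a lookup table.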
Thanks for all the help guys.
This worked for me:
all6 <- arrange(all6, tdate,event_tim)
lt <- ddply(all6,.(tdate),tail,1)
lt$last_trans <-'Y'
all6 <-merge(all6,lt, by.x=c("tdate","event_tim"),
by.y=c("tdate","event_tim"),all.x=TRUE)
Hello,
I have a dataframe w/ 3 variables of interest: transaction,date(tdate) &
time(event_tim).
How could I create a 4th variable (last_trans) that would flag the last
transaction of the day for each day?
In SAS I use:
proc sort data=all6;
by tdate event_tim;
run;
/*Create last transacti
Hello,
I am trying to re-code all my programs from SAS into R.
In SAS I use the following code:
proc sort data=upper;
by tdate stock_symbol expire strike;
run;
data upper1;
set upper;
by tdate stock_symbol expire strike;
if first.expire then output;
rename strike=astrike;
run;
on the
Thanks to all who responded, particularly to Michael. Your solution was the
easiest to understand & to implement.
This worked beautifully:
cmtot <- arrange(cmtot, -PCTTOT) # sort by descending
top <- with(cmtot,which.max(cumsum(PCTTOT) >= 50))
topcm <- cmtot[seq(1,top),]
Hello,
I am a newbie to R coming from SAS background. I am trying to program the
following:
I have a monthly data frame with 2 variables:
client pct_total
A 15%
B 10%
C 10%
D 9%
E 8%
F 6%
G 4%
I need to come up w/ a monthly list o
It looks like they are all corrupted. I tried several other CRAN sites across
the world. How can we notify the package owner?
Thanks, I was doing something similar in SAS. I was looping a macro based on a
dataset containing the values:
data _null_;
set summary2;
mindat=put(datepart(mindate),date9.);
min_date='mindat_'|| trim(left(_n_));
put mindate= mindat= min_date=; /*check values in log*/
call symput (min_
I get the following error message:
trying URL
'http://cran.stat.ucla.edu/bin/windows/contrib/2.15/xlsx_0.4.2.zip'
Content type 'application/zip' length 365611 bytes (357 Kb)
opened URL
downloaded 357 Kb
Error in read.dcf(file.path(pkgname, "DESCRIPTION"), c("Package", "Type")) :
cannot open the co
Hello,
We lost our SAS licence & I am busy transferring my old SAS programs to the R
environment. I am very new to R. In 1 program I was creating SAS macro
vars & passing them into a SQL query to run against the server. There are 3
variables: firm, begindt, enddt. # of values for each varies month to
I am new to R and am encountering memory issues while trying to download a
large table from Greenplum, using sqlQuery. Is there any way the ff
package can help me create a large data frame in R while downloading from the
server?
The ff documentation is very confusing. Thanks for any help
Hi,
Does anyone know how to upload a table to Greenplum & have it be
distributed?
I know how to upload using sqlSave(chann, d, tablename="castaneg.wh_d",
rownames=F, colnames=T)
but how can I make my table be distributed randomly on the server?
In SAS you can use the option "distribute_on=rand
1) I am wondering how the following SQL statement can be written in R
w/o using sqldf:
create table detail2 as
select a.*
from detail a,
pdetail b
where a.TDATE=b.TDATE
and(a.STIM >= b.STIM and a.STIM <=b.MAXTIM)
2) When I try if/then in R it only applies to the 1st row & not t
Thanks for your help guys. I was referring to the variables the wrong way.
This worked for me:
idx <- !duplicated(detail2[,c("TDATE","FIRM","CM","BRANCH",
"BEGTIME", "ENDTIME","OTYPE","OCOND",
"ACCTYP","OSIDE","SHARES","STOCKS",
"ST
I have a dataset w/ 184K obs & 16 variables. In SAS I proc sort nodupkey it
in seconds by 11 variables.
I tried to do the same thing in R using both the unique & then the
!duplicated functions but it just hangs there & I get no output. Does
anyone know how to solve this?
This is how I tried to d
I am new to R and I have the following SAS statements:
if otype='M' and ocond='1' and entry='a.Prop' then MOC=1;
else MOC=0;
How would I translate that into R code?
Thanks in advance
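A sketch of one common translation: ifelse() is vectorised, so it evaluates the condition for every row at once, whereas a plain if() looks at a single value. The rows of df are invented; the column names and values come from the SAS statements above.

```r
# Toy data: only the first row satisfies all three conditions.
df <- data.frame(otype = c("M", "M", "L"),
                 ocond = c("1", "2", "1"),
                 entry = c("a.Prop", "a.Prop", "a.Prop"),
                 stringsAsFactors = FALSE)

# & is the element-wise "and"; ifelse() returns 1 or 0 per row.
df$MOC <- ifelse(df$otype == "M" & df$ocond == "1" & df$entry == "a.Prop",
                 1, 0)
print(df)
```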
I used summary <-rbind.fill(agency,prop) & it worked like a charm. Thanks
everyone.
I am trying to concatenate 2 datasets that don't have exactly the same
columns.
In SAS I did: data summary;
set agency prop;
run;
No problem.
In R I get an error message:
summary <-rbind(agency,prop)
Error in match.names(clabs, names(xi)) :
names do not match previous names
But when I use rbin.fi
Hello,
I am a SAS user new to R. What is the R equivalent to following SAS
statements:
1) data all;
merge test1(in=a)
test2(in=b)
;
by account_id;
if a;
run;
2) proc sort data=all nodupkey;
by account_id;
run;
3) data all test1onnly test2only;
merge test1(in=a)