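Try the stringr package. This should work:
library(stringr)
chemical <- c("basic", "alkalin", "alkali", "acid", " ph ", "hss")
# Combine the terms into one regular expression: "basic|alkalin|..."
chemical_match <- str_c(chemical, collapse = "|")
chemical_match
# Label every concept that contains at least one of the terms
concept_df$match[str_detect(concept_df$concept, chemical_match)] <- "chemical"
concept_df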
> concept_df
conc
I have a vector of values, and have written a function that takes each
value in that vector, generates a normal distribution with that value as
the mean, and then finds the interval at different levels. However, these
intervals don't seem to be right (too narrow).
### CREATE PREDICTION INTERVALS
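A minimal sketch of the interval calculation, assuming each value is the mean of a normal distribution with one known standard deviation `sd` (a hypothetical input, since the post's own function is not shown). It uses exact quantiles from `qnorm()` instead of simulated draws. If intervals built this way still look too narrow, it is worth checking whether `sd` describes individual observations, which is what a prediction interval needs, or the standard error of a mean, which is smaller.

```r
# Central intervals around each mean at several levels.
# Assumption: every value shares the same known standard deviation `sd`.
normal_intervals <- function(means, sd, levels = c(0.80, 0.90, 0.95)) {
  rows <- lapply(levels, function(level) {
    z <- qnorm(1 - (1 - level) / 2)  # about 1.96 when level = 0.95
    data.frame(mean  = means,
               level = level,
               lower = means - z * sd,
               upper = means + z * sd)
  })
  do.call(rbind, rows)  # one row per mean and level
}

normal_intervals(c(10, 20), sd = 2)
```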
So I have two data frames.
The first one is a recommendation data frame and the second is a melted
list with a pairing of OpportunityIds and ProductIds. There are multiple
ProductIds per OpportunityId. What I want to do is merge based on
ProductId so that I can add the OpportunityId to the
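A sketch of the join with `merge()`, using small hypothetical data frames (`recs`, `pairs`, and the `Score` column are made up for illustration):

```r
# Hypothetical recommendation data: one row per ProductId
recs <- data.frame(ProductId = c("P1", "P2", "P3"),
                   Score     = c(0.9, 0.7, 0.4))

# Hypothetical melted pairing: several ProductIds per OpportunityId
pairs <- data.frame(OpportunityId = c("O1", "O1", "O2", "O2"),
                    ProductId     = c("P1", "P2", "P2", "P3"))

# Inner join on ProductId: each recommendation row is repeated once
# for every OpportunityId that contains that product
merged <- merge(recs, pairs, by = "ProductId")
merged
```

By default `merge()` keeps only matching rows; `all.x = TRUE` would also keep recommendations whose ProductId has no OpportunityId.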
I'm getting an "invalid first argument" error for the following. However,
con is an actual connection and is set up properly. So what does this error
actually refer to?
library(dplyr)
con <- RSQLServer::src_sqlserver("***", database = "***")
myData <- con %>%
tbl("table") %>%
group_by( work_d
I'm using the knitr package to post an .Rmd file to WordPress. This is the first time
I'm working on this type of project, and I'm having the following error/issue.
Can anyone help identify the issue? I've done a number of Google searches
but haven't seen similar issues. I also tried to use the newPost function for
t
I have a user-defined function that I'm using alongside a PostgreSQL
connection to summarize some data. I've connected to the local machine with
no problem. However, the connection keeps throwing the following error when
I attempt to use it. Can anyone point to what I could be doing wrong?
> ds_su
A very simple question that I want to confirm.
Let's say that I have a response variable. What are the appropriate ways
that it can be coded for a logistic regression model?
1. It can be 0/1 and a factor
2. It can be 1/2 and a factor
3. It can be characters and a factor, where the second letter takes
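One way to check this empirically, on made-up data, is to fit the same model with several codings of the response and compare the coefficients. The `glm()` documentation states that, for the binomial family, the response may be a factor, with the first level treated as failure and all other levels as success. So 0/1, 1/2, and character codings all work as long as the level you mean by "failure" comes first; `levels()` shows the order.

```r
# Small made-up data set (not from the original post)
x <- 1:10
y <- c(0, 0, 1, 0, 0, 1, 1, 0, 1, 1)

m_num <- glm(y ~ x, family = binomial)              # numeric 0/1
m_f01 <- glm(factor(y) ~ x, family = binomial)      # factor with levels "0", "1"
m_f12 <- glm(factor(y + 1) ~ x, family = binomial)  # factor with levels "1", "2"
# Character labels: "no" sorts before "yes", so "no" is the failure level
m_chr <- glm(factor(ifelse(y == 1, "yes", "no")) ~ x, family = binomial)

# All four codings give the same coefficients
rbind(coef(m_num), coef(m_f01), coef(m_f12), coef(m_chr))
```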
Let's say I have a corpus and want to find the two-, three-, etc. word phrases
that occur most frequently in the data. I normally do this in the following
manner but am getting an error message and am having some difficulty
diagnosing what is going wrong. Given the following data, I'd just want a
coun
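Since the original approach is not shown, here is a base-R sketch of n-gram counting (a swapped-in technique, not necessarily the method from the post). The `count_ngrams()` helper and the three-sentence corpus are hypothetical.

```r
# Count the n-word phrases (n-grams) in a character vector, base R only
count_ngrams <- function(texts, n = 2) {
  grams <- unlist(lapply(texts, function(txt) {
    # Lower-case, then split on anything that is not a letter or apostrophe
    words <- strsplit(tolower(txt), "[^a-z']+")[[1]]
    words <- words[words != ""]
    if (length(words) < n) return(character(0))
    starts <- seq_len(length(words) - n + 1)
    vapply(starts, function(i) paste(words[i:(i + n - 1)], collapse = " "),
           character(1))
  }))
  sort(table(grams), decreasing = TRUE)  # most frequent phrases first
}

corpus <- c("The cat sat on the mat", "The cat ran off", "A dog sat on the mat")
count_ngrams(corpus, n = 2)
count_ngrams(corpus, n = 3)
```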
I have the following data frame. Using the stringr package, I've attempted
to map the URLs to some specific elements that are in each URL. I then
used the reshape package to join two different data frames. The next step
is to transform the two columns in the mydt data frame (forester and
customer_
> dices the URL.
>
> library(XML)
> parseURI('http://www.mdd.com/food/pizza/index.html')
>
> Might that help?
>
> Cheers,
> Ben
>
> On Mar 6, 2014, at 12:23 PM, Abraham Mathew wrote:
>
> > Let's say that I have the following character vector with a s
should be
/food/pizza/index.html
build-your-own/index.html
/special-deals.html
If anyone has a solution using the stringr package, that'd be of interest
also.
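As an alternative to `parseURI()`, a base-R sketch that strips the scheme and host with a regular expression and keeps the path. The second URL is made up for illustration.

```r
# Remove "scheme://host" from the front of each URL, leaving the path
url_path <- function(urls) {
  sub("^[a-z]+://[^/]+", "", urls, ignore.case = TRUE)
}

urls <- c("http://www.mdd.com/food/pizza/index.html",
          "https://www.mdd.com/special-deals.html")
url_path(urls)
```

With stringr, `str_replace(urls, "^https?://[^/]+", "")` should do the same for http and https URLs.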
Thanks.
--
*Abraham Mathew**Analytics Strategist*
*Minneapolis, MN*
*720-648-0108*
*abmathe...@gmail.com *
*Twitter <https://twit
Let's say I have the following data frame, and the date column has two
different ways in which the date is presented. How can I use as.Date or the
lubridate package to have one date structure for the entire column?
df = data.frame(Date=c("5/1/13","8/1/13","9/1/13","Apr-10",
"Apr-11","Apr-1
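One way to do this with base `as.Date()`: parse with the first format, then fill the failures with the second. This sketch assumes "Apr-10" means April 2010 (the post does not say whether it is month-year or month-day) and pins those dates to the first of the month. The full date vector in the post is truncated, so the example uses a few of its values.

```r
# Parse a vector mixing "m/d/yy" strings with "Mon-yy" strings.
# Assumption: "Apr-10" is April 2010, stored as 2010-04-01.
parse_mixed_dates <- function(x) {
  # %b month abbreviations depend on the locale, so switch to "C" temporarily
  old <- Sys.getlocale("LC_TIME")
  Sys.setlocale("LC_TIME", "C")
  on.exit(Sys.setlocale("LC_TIME", old))

  out <- as.Date(x, format = "%m/%d/%y")  # NA where this format does not fit
  mon_yy <- is.na(out)
  if (any(mon_yy)) {
    out[mon_yy] <- as.Date(paste0("01-", x[mon_yy]), format = "%d-%b-%y")
  }
  out
}

parse_mixed_dates(c("5/1/13", "8/1/13", "9/1/13", "Apr-10", "Apr-11"))
```

In lubridate, `parse_date_time()` accepts several `orders` at once and may be a more compact alternative.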
I'm trying to educate myself about predictive analytics and am using R to
generate a linear model with the following data.
age <- c(23, 19, 25, 10,9, 12, 11,8)
steroid <- c(27.1, 22.1, 21.9, 10.7, 7.4, 18.8, 14.7, 5.7)
gpa <- c( 2.1, 2.9, 2.8, 3.5, 3.2, 3.9, 2.8, 2.6)
sample
ulate predict() such that I can get a similar output as ^^.
mod1 = glm(posted ~ amount, data=ndat, family=binomial(link="probit"))
summary(mod1)
Can anyone help?
Thanks!
I want to construct a logit model, plot the probability curve with the
confidence intervals, and then I want to print out a data frame with the
predictor, response value, predicted value, the low CI predicted value, and
the high CI predicted value. So it should look something like:
value low_ci
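A sketch of one standard approach on made-up data (`dat`, `amount`, and `posted` are hypothetical stand-ins): compute a Wald interval on the link scale with `predict(..., type = "link", se.fit = TRUE)` and transform the endpoints with the model's inverse link, which keeps the bounds inside [0, 1]. Because it uses `family(mod)$linkinv`, the same code should also work for a probit model.

```r
# Made-up data for illustration only
set.seed(1)
dat <- data.frame(amount = seq(10, 100, length.out = 40))
dat$posted <- rbinom(40, 1, plogis(-3 + 0.06 * dat$amount))

mod <- glm(posted ~ amount, data = dat, family = binomial(link = "logit"))

# Predict on the log-odds scale with standard errors, build the interval
# there, then map fit and bounds back to probabilities
pr  <- predict(mod, newdata = dat, type = "link", se.fit = TRUE)
z   <- qnorm(0.975)
inv <- family(mod)$linkinv

out <- data.frame(amount    = dat$amount,
                  response  = dat$posted,
                  predicted = inv(pr$fit),
                  low_ci    = inv(pr$fit - z * pr$se.fit),
                  high_ci   = inv(pr$fit + z * pr$se.fit))
head(out)

# Probability curve with dashed confidence bounds
plot(out$amount, out$predicted, type = "l", ylim = c(0, 1),
     xlab = "amount", ylab = "P(posted = 1)")
lines(out$amount, out$low_ci, lty = 2)
lines(out$amount, out$high_ci, lty = 2)
```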
LC_MONETARY=English_United States.1252
[4] LC_NUMERIC=C LC_TIME=English_United States.1252
attached base packages:
[1] grid stats graphics grDevices utils datasets methods base
other attached packages:
[1] effects_2.2-1 colorspace_1.1-
","GOOD"))
dat$won = factor(dat$won)
dat$sold = factor(dat$sold)
dat$insured = factor(dat$insured)
dat$credit = factor(dat$credit)
highlight.opts <- list(nodes = c("won","sold","insured","credit"),
col = "red"
nt","rent","own"),
income=c(50,20,20,50,50), gender=c("M","M","F","F","F"))
df$sell = as.factor(df$sell)
df$home = as.factor(df$home)
df$income = as.factor(df$income)
df$gender = as.factor(df$gender)
str(df)
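The repeated `factor()` calls in these snippets can be collapsed into one step with `lapply()`. A sketch with a hypothetical data frame:

```r
# Hypothetical data frame standing in for `dat`/`df` in the posts above
df <- data.frame(sell   = c(1, 0, 1, 1, 0),
                 home   = c("rent", "own", "rent", "rent", "own"),
                 income = c(50, 20, 20, 50, 50),
                 gender = c("M", "M", "F", "F", "F"))

# Convert several columns to factors at once instead of one line each
cols <- c("sell", "home", "income", "gender")
df[cols] <- lapply(df[cols], factor)
str(df)
```

Converting a numeric column such as `income` to a factor makes the model treat each distinct value as a separate category, which may or may not be intended.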
m1
not use
>
> ?predict.glm ## with type = "response" ?
>
> -- Bert
>
> On Mon, Aug 13, 2012 at 12:39 PM, Abraham Mathew
> wrote:
> > I'm trying to run a logit model and plot the probability curve for a
> number
> > of the important predictors. I'm trying to
or any of the other predictors).
However, what I want to do is generate the same plot, with won on the y axis
and income on the x axis, but with the curves showing the probabilities for age
and home.
Not seeing how to do this in the effects documentation. Help!
Thanks.
5,8,3,5,4,2,3,5), purchase=c(6,3,4,5,5,5,6,2,3,7),
sold=c(0,1,0,0,0,1,1,0,0,1))
f
Thanks.
ls(dat$final_purchase_amount)
character(0)
Can anyone point to what I'm doing wrong?
Thanks!
ny diagnostic test to determine the overall misclassification rate of an
NB classifier, and if there is a function in R that is available to
implement it?
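The overall misclassification rate can be read off a confusion matrix: it is the share of cases whose predicted class differs from the true class. A sketch with hypothetical labels; with a fitted naive Bayes model, `predicted` would come from `predict()` on that model, ideally on held-out data so the rate is not optimistic.

```r
# Hypothetical true and predicted class labels
actual    <- factor(c("spam", "ham", "ham", "spam", "ham", "spam", "ham", "ham"))
predicted <- factor(c("spam", "ham", "spam", "spam", "ham", "ham", "ham", "ham"),
                    levels = levels(actual))  # same level order as `actual`

# Confusion matrix: rows are true classes, columns are predictions
conf <- table(actual = actual, predicted = predicted)
conf

# Misclassification rate = off-diagonal counts / total
misclass_rate <- 1 - sum(diag(conf)) / sum(conf)
misclass_rate
```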
Thanks,
Abraham
ub to find a solution, and there doesn't seem
to be anything helpful in the stringr package for this task.
Thanks
I've been looking into the effects package, and it seems to be a great tool
for plotting the probabilities of the response variable by the predictors.
However, I'm wondering if I can use the effects package to plot the
probabilities on the y axis and one predictor on the x axis, with the curve having t
"l")
I'm not sure how to proceed when I have 10 or so predictors in the logit
model. Do I simply expand the expand.grid() function to include all the
variables?
So my question is: how do I form a plot of a logit probability curve when I
have 10 predictors?
It would be nice to do
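A common way to handle many predictors is to vary one focal predictor over its range and hold every other predictor at a single typical value, rather than expanding all of them, which multiplies the grid size. This sketch uses made-up data; the `focal_grid()` helper and its choice of typical values (mean for numeric columns, first level for factors) are assumptions, not the only reasonable choice.

```r
# Made-up data with several predictors (stand-in for a 10-predictor model)
set.seed(42)
n <- 200
dat <- data.frame(x1 = rnorm(n), x2 = rnorm(n), x3 = runif(n),
                  grp = factor(sample(c("a", "b", "c"), n, replace = TRUE)))
dat$y <- rbinom(n, 1, plogis(0.5 * dat$x1 - 0.8 * dat$x2 + dat$x3))
mod <- glm(y ~ x1 + x2 + x3 + grp, data = dat, family = binomial)

# Vary `focal` over its range; hold numeric columns at their mean and
# factors at their first level
focal_grid <- function(data, focal, n_points = 50) {
  others <- lapply(data, function(col) {
    if (is.numeric(col)) mean(col) else factor(levels(col)[1], levels = levels(col))
  })
  others[[focal]] <- seq(min(data[[focal]]), max(data[[focal]]),
                         length.out = n_points)
  do.call(expand.grid, others)
}

preds <- setdiff(names(dat), "y")
grid <- focal_grid(dat[preds], "x1")
grid$prob <- predict(mod, newdata = grid, type = "response")
plot(grid$x1, grid$prob, type = "l", xlab = "x1", ylab = "Predicted probability")
```

With ggplot2, the same grid can be drawn with `ggplot(grid, aes(x1, prob)) + geom_line()`.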
Let's say I have a variable, day, which is saved as a factor with 7 levels,
and I use it in a logistic regression model. I ran the model using the car
package in R and printed out the results.
mod1 = glm(factor(status1) ~ factor(day), data=mydat,
family=binomial(link="logit"))
print(summary(mod1))
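With R's default treatment contrasts, `summary()` reports one coefficient per non-reference level of `day`, each a log-odds difference from the reference (first) level; `glm()` itself comes from the stats package rather than car. A sketch on made-up data showing odds ratios and how `relevel()` changes the baseline:

```r
# Made-up data: a binary status and a 7-level day factor
set.seed(7)
days <- c("Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun")
mydat <- data.frame(day = factor(sample(days, 300, replace = TRUE), levels = days))
mydat$status1 <- rbinom(300, 1, ifelse(mydat$day %in% c("Sat", "Sun"), 0.6, 0.4))

mod1 <- glm(status1 ~ day, data = mydat, family = binomial(link = "logit"))

# Exponentiated coefficients: intercept = odds on Mon (the reference level),
# the rest = odds ratios of each day versus Mon
exp(coef(mod1))

# Use Sunday as the baseline instead and refit
mydat$day <- relevel(mydat$day, ref = "Sun")
mod2 <- glm(status1 ~ day, data = mydat, family = binomial(link = "logit"))
names(coef(mod2))
```

Releveling only changes which comparisons are reported; the fitted probabilities stay the same.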
2" "00:05:22"
[23] "00:05:28" "00:05:44" "00:05:54" "00:06:54" "00:06:54" "00:07:10" "00:08:15" "00:08:26"
What I am trying to do is group the data into one-hour incr
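A sketch for binning "HH:MM:SS" strings by hour, using some of the times shown above. `strptime()` parses them (attaching today's date, which does not matter here) and its `$hour` component gives the bin.

```r
# A few of the time stamps from the post
times <- c("00:05:22", "00:05:28", "00:05:44", "00:05:54", "00:06:54",
           "00:06:54", "00:07:10", "00:08:15", "00:08:26")

# Hour of each time stamp (0-23)
hour <- strptime(times, format = "%H:%M:%S")$hour

# Count per one-hour bin, keeping all 24 bins, then show the non-empty ones
counts <- table(factor(hour, levels = 0:23))
counts[counts > 0]
```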
"))
On Thu, Dec 22, 2011 at 8:23 AM, Abraham Mathew wrote:
>
> I'm working on a logistic regression in R with the car package but keep
> getting the following error message.
> It's only a warning and not an error, but I'm just not sure how to
> resolve the issues
ot(s.out)
When I run with mbid as 300, I get 49%.
At 500, it's 49%, and at 700 it's 50%.
At 1500, it's 51%.
These results are just really weird.
I was expecting an exponential curve when I plotted
mbid by probability of winning, but that doesn't seem
to be the case.
...
$ mbid: int 700 300 700 300 500 300 300 700 300 300 ...
Can anyone tell me what I should do to fix the warnings?
> Read me: www.talgalili.com (Hebrew) | www.biostatistics.co.il (Hebrew) |
> www.r-statistics.com (English)
>
> ------
>
>
>
>
> On Wed, Dec 21, 2011 at 12:48 PM, Abraham Mathew wrote:
> On Wed, Dec 21, 2011 at 12:04 PM, Abraham Mathew wrote:
>
>>
>> I looked into what you suggested an
> On Wed, Dec 21, 2011 at 6:59 AM, Abraham Mathew wrote:
>
>> Let's say I have a linear model and I want to find the a
combination of values in the independent variables.
So, the expected price when:
weather=1, gender=male
weather=1, gender=female
weather=2, gender=male
etc.
Can anyone help with this problem?
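`expand.grid()` builds every combination of predictor levels, and `predict()` then returns the model's expected price for each row. The sketch below uses made-up sales data; `sales`, `price`, and the coefficients used to simulate it are hypothetical.

```r
# Made-up sales data (not from the original post)
set.seed(3)
sales <- data.frame(weather = factor(sample(1:3, 60, replace = TRUE)),
                    gender  = factor(sample(c("male", "female"), 60, replace = TRUE)))
sales$price <- 20 + 3 * as.integer(sales$weather) +
  ifelse(sales$gender == "male", 2, 0) + rnorm(60)

fit <- lm(price ~ weather + gender, data = sales)

# Every combination of the predictor levels, then the expected price for each
combos <- expand.grid(weather = levels(sales$weather),
                      gender  = levels(sales$gender))
combos$expected_price <- predict(fit, newdata = combos)
combos
```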
ran everything but the explanatory variable as a
numeric variable. Now I'm trying everything, with no luck.
lternative solution that I can use to generate
the probabilities.
dimensions
I googled this problem and couldn't find anything, except a question I asked
about this same problem 1.5 years ago. I just don't remember what I
did to solve the problem.
Help!
s(educ)[c(3,5)]] <- "Advanced Degree"
educ2[educ %in% levels(educ)[c(6,8)]] <- "Other"
educ2 = factor(educ2)
levels(educ2)
The above code is how I regrouped the variable. How can I regroup it so
that its levels are ordered from lowest to highest? What if they're numeric?
This mean
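`factor()` keeps levels in sorted order unless told otherwise, so listing the levels explicitly sets the order. The labels below are hypothetical, since the original regrouping code is truncated. Numeric-looking labels need to be sorted as numbers, not as text.

```r
# Hypothetical education labels after regrouping
educ2 <- factor(c("High School", "Advanced Degree", "Bachelor", "Other",
                  "High School", "Bachelor"))
levels(educ2)  # sorted order by default

# List the levels explicitly in low-to-high order
educ2 <- factor(educ2, levels = c("High School", "Bachelor",
                                  "Advanced Degree", "Other"))
levels(educ2)

# Numeric labels sort as text ("10" before "2"), so sort them as numbers
x <- factor(c("9", "10", "2", "10"))
x <- factor(x, levels = as.character(sort(as.numeric(levels(x)))))
levels(x)
```

Adding `ordered = TRUE` to `factor()` also marks the levels as ordered, if models or comparisons should treat them that way.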
First, I am no expert but I am analyzing some marketing data.
I have information on two versions of the same site, and I have data
on the number of times people filled out a form on each version
of the site.
Sample data:
Site 1 Site 2
Fill
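If the number of visitors to each version is also known, a two-sample test of proportions is one standard way to compare the fill rates. The counts below are hypothetical; `fisher.test()` on the 2x2 table is an alternative when counts are small.

```r
# Hypothetical counts: form fills and visitors for each version of the site
fills    <- c(site1 = 48, site2 = 71)
visitors <- c(site1 = 1000, site2 = 1050)

# Test whether the two fill rates differ (chi-squared with continuity correction)
res <- prop.test(fills, visitors)
res$estimate  # observed fill rate for each site
res$p.value
```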
't have a package installed that is necessary for RODBC. What is that
package?
I didn't learn about data tables until recently. (They're never covered in
any intro R books).
In any case, I'm not sure what the difference (if any) is between a data
frame and a data table.
Can anyone provide a brief explanation?
Is one preferred over another or is it just dependent on the tas
Let's say I have the following data frame.
df = data.frame(word = c("David", "James", "Sara", "Jamie", "Jon"))
df
I was trying to place brackets, [ ], around each string.
I'll be exporting it with write.table and quote = FALSE, so it will
eventually look like:
[David]
[James]
[Sara]
Can
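`paste0()` can add the brackets to every element at once. A sketch using the data frame from the post; `file = ""` sends the `write.table()` output to the console.

```r
df <- data.frame(word = c("David", "James", "Sara", "Jamie", "Jon"))

# Wrap every string in square brackets
df$word <- paste0("[", df$word, "]")

# quote = FALSE stops write.table() from adding double quotes;
# row.names/col.names = FALSE leave just the bracketed words
write.table(df, file = "", quote = FALSE,
            row.names = FALSE, col.names = FALSE)
```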
I'm creating a function in R. However, I have a large number of function
parameters, and I need to find an efficient solution for running the function
with all the parameters. So in the following function, I have about 20
parameters that I assign to the function, with almost all the values being diff
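One approach is to keep the argument values in a named list and call the function with `do.call()`, which matches list names to argument names. The five-argument `score()` function is a hypothetical stand-in for the roughly 20-parameter function in the post.

```r
# Hypothetical function with many arguments
score <- function(a, b, k, d = 1, e = 0) {
  (a + b * k) / d + e
}

# Keep the argument values together in a named list...
params <- list(a = 1, b = 2, k = 3, d = 2, e = 0.5)
do.call(score, params)

# ...or, for several parameter sets, one row per call in a data frame
param_sets <- data.frame(a = 1:3, b = 2, k = 3, d = c(1, 2, 4), e = 0)
results <- vapply(seq_len(nrow(param_sets)),
                  function(i) do.call(score, as.list(param_sets[i, ])),
                  numeric(1))
results
```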
I'm trying to install the XML package on Ubuntu 10.10, and I keep getting
a warning message that XML could not be found and had a non-zero exit
status. How can I fix this problem?
> install.packages()
Loading Tcl/Tk interface ... done
--- Please select a CRAN mirror for use in this session ---
Instal
I'm trying to develop a stacked bar plot in R with ggplot2.
My data:
conv = c(10, 4.76, 17.14, 25, 26.47, 37.5, 20.83, 25.53, 32.5, 16.7, 27.33)
click = c(20, 42, 35, 28, 34, 48, 48, 47, 40, 30, 30)
date = c("July 7", "July 8", "July 9", "July 10", "July 11", "July 12",
"July 13",
"July 14", "Jul
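ggplot2 stacks bars by a grouping variable, so the usual first step is to reshape the two vectors into long format, with one row per date and metric. In this sketch the date labels after July 14 (truncated in the post) are assumed to continue daily to July 17 to match the 11 values. `geom_col()` needs ggplot2 2.2.0 or later; older versions use `geom_bar(stat = "identity")`.

```r
conv  <- c(10, 4.76, 17.14, 25, 26.47, 37.5, 20.83, 25.53, 32.5, 16.7, 27.33)
click <- c(20, 42, 35, 28, 34, 48, 48, 47, 40, 30, 30)
# Assumption: the truncated date vector continues one day at a time
date  <- paste("July", 7:17)

# Long format: one row per date and metric; factor levels keep date order
long <- data.frame(date   = factor(rep(date, 2), levels = date),
                   metric = rep(c("conv", "click"), each = length(date)),
                   value  = c(conv, click))

if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  p <- ggplot(long, aes(x = date, y = value, fill = metric)) +
    geom_col()  # bars for the same date are stacked by default
  print(p)
}
```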
This is a very basic question, so please bear with me.
I've been learning about AB Testing, which is largely used in internet
marketing to examine the effectiveness of certain aspects of ads, websites,
etc. Here are a couple of links for people who want to know more about AB testing:
http://visualwebsi
I have a data frame that looks as follows.
df <- data.frame(city = c("Houston", "Houston", "El Paso", "Waco", "Houston",
"Plano", "Plano"))
What I want to do is get a list of the city values. Currently, when I run
df$city, I get all the values.
I just want to know the four cities that appear.
So ins
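`unique()` returns the distinct values, and `table()` also shows how often each appears. The sketch uses the corrected data frame from the post. In R versions before 4.0.0, `data.frame()` converts strings to factors by default, in which case `levels(df$city)` also lists the distinct cities.

```r
df <- data.frame(city = c("Houston", "Houston", "El Paso", "Waco", "Houston",
                          "Plano", "Plano"))

# Distinct values, in order of first appearance
unique(df$city)

# Distinct values with their counts
table(df$city)
```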
I'm working with some data, and am trying to generate it in the following
format.
                 state  city  zipcode
I like pizza         0     0        0
I live in Denver     0     1        0
I have a number of strings in a vector, and want the output to be separated
by commas.
> t
[1] "35004" "35005" "35006" "35007" "35010" "35014" "35016"
So I want it to look like:
"35004", "35005", "35006", "35007",...
Can anyone help? I initially thought strsplit would be the correct
funct
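`strsplit()` splits strings apart, whereas this task joins them, which is what `paste()` with `collapse` does. A sketch (the vector is renamed `zips` here to avoid confusion with `base::t()`):

```r
zips <- c("35004", "35005", "35006", "35007", "35010", "35014", "35016")

# Add double quotes around each value, then join everything with ", "
quoted <- paste0('"', zips, '"', collapse = ", ")
cat(quoted, sep = "\n")
```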
I'm trying to find the total number of letters in a row of a data frame.
Let's say I have the following data frame.
f1 <- data.frame(keyword = c("I live in Denver", "I live in Kansas City, MO",
"Pizza is good"))
The following function gives me the number of characters in each string.
So for "I live
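`nchar()` counts every character, including spaces and punctuation. To count letters only, one option is to strip non-letters with `gsub()` first. A sketch using the corrected data frame from the post:

```r
f1 <- data.frame(keyword = c("I live in Denver", "I live in Kansas City, MO",
                             "Pizza is good"),
                 stringsAsFactors = FALSE)  # nchar() needs character, not factor

# All characters, including spaces and the comma
nchar(f1$keyword)

# Letters only: drop everything that is not a letter, then count
f1$letters <- nchar(gsub("[^[:alpha:]]", "", f1$keyword))
f1
```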
I have a repetitive task in R, and I'm trying to find a more efficient way to
perform the following task.
lst <- list(roots = c("car insurance", "auto insurance"),
roots2 = c("insurance"), prefix = c("cheap", "budget"),
prefix2 = c("low cost"), suffix = c("quote", "quotes
I passed it as an argument to the function because every week I'll need to
add keywords to the lst, and that function will make the process more
automated.
On Thu, Jun 9, 2011 at 10:21 AM, Sarah Goslee wrote:
> On Thu, Jun 9, 2011 at 11:53 AM, Abraham Mathew
> wrote:
> >
a bad idea, because
> it guarantees that nobody but you can ever use it. And why would you,
> rather than passing the working directory as an argument if it's
> crucial?
>
> Sarah
>
>
> On Thu, Jun 9, 2011 at 11:14 AM, Abraham Mathew
> wrote:
> > I have a rea
I have a really long function, and at the end of the function, I am using an
if statement to tag certain keywords based on whether they contain certain
values.
However, the if statement doesn't seem to work.
When I had split up the commands into various functions, it worked fine, b
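The function itself is truncated, so this is only a guess at the cause: `if ()` tests a single TRUE/FALSE value, so it cannot tag each element of a vector (R 4.2.0 and later stop with an error when the condition has length greater than one). `ifelse()` works element by element. A sketch with hypothetical keywords:

```r
keywords <- c("cheap car insurance", "auto insurance quote", "budget car rates")

# ifelse() evaluates the test for each element and picks a tag per element
tag <- ifelse(grepl("car", keywords), "car", "other")
tag
```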
I'm trying to run a function inside a function but get an error message.
lst <- list(roots = c("car insurance", "auto insurance"),
roots2 = c("insurance"), prefix = c("cheap", "budget"),
prefix2 = c("low cost"), suffix = c("quote", "quotes"),
suffix2 = c("rate", "rates"), suffix3 = c("comparison")
I'm writing a function and keep getting the following error message.
myfunc <- function(lst) {
lst <- list(roots = c("car insurance", "auto insurance"),
roots2 = c("insurance"), prefix = c("cheap", "budget"),
prefix2 = c("low cost"), suffix = c("quote", "quotes"),
suffix2 = c("rate", "rates"), suf
I have a series of strings and I am trying to find all combinations and then
assign 1 or 0 to them based on whether they contain the words car or budget.
I want the data to look like:
                           car  budget
cheap car insurance quote    1       0
budget car insurance quote   1       1
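`grepl()` returns TRUE/FALSE per string, and `as.integer()` turns that into 1/0. A sketch that reproduces the table above; the `\\b` word boundaries keep, for example, "scar" from counting as "car".

```r
keywords <- c("cheap car insurance quote", "budget car insurance quote")

# 1 if the keyword contains the whole word, 0 otherwise
flags <- data.frame(
  car       = as.integer(grepl("\\bcar\\b", keywords)),
  budget    = as.integer(grepl("\\bbudget\\b", keywords)),
  row.names = keywords
)
flags
```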
<- one(roots, suffix)
> rbind(d1, d2)
>
> To see a potential flaw in your function (at least as far as console
> output is concerned), try
> rbind(d1, one(roots, suffix))
>
> HTH,
> Dennis
>
> On Tue, Jun 7, 2011 at 3:30 PM, Abraham Mathew
> wrote:
> > Let
Let's say that I'm trying to write a function that will allow me to
automate a process where I examine all possible combinations of various string
groupings. Each time I run the one function, I want to append the new values to
the end of a data frame. The data frame will basically be one column w
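Growing a data frame from inside a function only works if the function returns it, because R functions do not modify their arguments in place. A common pattern is to collect each result in a list and bind once at the end. `combine_words()` is a hypothetical helper:

```r
# Hypothetical helper: every pairing of two string groups, one per row
combine_words <- function(a, b) {
  g <- expand.grid(a = a, b = b, stringsAsFactors = FALSE)
  data.frame(keyword = paste(g$a, g$b), stringsAsFactors = FALSE)
}

# Collect each run's result in a list, then bind them all at once
pieces <- list()
pieces[[length(pieces) + 1]] <- combine_words(c("cheap", "budget"), "car insurance")
pieces[[length(pieces) + 1]] <- combine_words("car insurance", c("quote", "quotes"))
all_keywords <- do.call(rbind, pieces)
all_keywords
```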
I'm running R 2.13 on Ubuntu 10.10
I have a data set which is comprised of character strings.
site = readLines('http://www.census.gov/tiger/tms/gazetteer/zips.txt')
dat <- c("01, 35004, AL, ACMAR, 86.51557, 33.584132, 6055, 0.001499")
dat
I want to loop through the data and construct a data fra
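Because each element is one comma-separated record, `read.csv()` can parse the character vector directly through its `text` argument, without a loop. A sketch on the sample record from the post; `colClasses` keeps codes such as "01" and "35004" as text so leading zeros survive.

```r
dat <- c("01, 35004, AL, ACMAR, 86.51557, 33.584132, 6055, 0.001499")

# Parse the records; strip.white removes the spaces after each comma
zips <- read.csv(text = dat, header = FALSE, strip.white = TRUE,
                 colClasses = c(rep("character", 4), rep("numeric", 4)))
zips
```

The same call should work on the full `site` vector returned by `readLines()`.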
I have the following data:
prefix <- c("cheap", "budget")
roots <- c("car insurance", "auto insurance")
suffix <- c("quote", "quotes")
prefix2 <- c("cheap", "budget")
roots2 <- c("car insurance", "auto insurance")
roots3 <- c("car insurance", "auto insurance")
suffix3 <- c("quote", "quotes")
df
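`expand.grid()` generates every prefix/root/suffix combination, and `do.call(paste, ...)` pastes each row into one phrase. A sketch using three of the vectors from the post:

```r
prefix <- c("cheap", "budget")
roots  <- c("car insurance", "auto insurance")
suffix <- c("quote", "quotes")

# One row per combination of prefix, root, and suffix
combos <- expand.grid(prefix = prefix, roots = roots, suffix = suffix,
                      stringsAsFactors = FALSE)

# Paste the three columns of each row into a single keyword
keywords <- do.call(paste, combos)
keywords
```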
Let's say that I have a string and I want to know if a single word
is present in the string. I've written the following function to see if
the word "Geico" is mentioned in the string "Cheap Geico car insurance".
However, it doesn't work, and I assume it has something to do with the any()
function.
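The post's function is not shown, but a single `grepl()` call with word-boundary anchors is enough for this check; splitting the string and using `%in%` is another option. A sketch with a hypothetical `has_word()` helper:

```r
# TRUE if `word` appears as a whole word in `string` (case-insensitive)
has_word <- function(string, word) {
  grepl(paste0("\\b", word, "\\b"), string, ignore.case = TRUE)
}

has_word("Cheap Geico car insurance", "Geico")

# Alternative: split on spaces and test membership (case-sensitive)
"Geico" %in% strsplit("Cheap Geico car insurance", " ")[[1]]
```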
I have a data frame in R with the following values.
cars
autocar
cars info
what is that
donna drive
car
telephone
i need car...
I want to select all values which contain 'car', values with three
words, and those keywords with car that contain three words.
The first part is done with:
sqldf("SE
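A base-R sketch of all three selections, as an alternative to sqldf; it treats words as runs of characters separated by whitespace.

```r
x <- c("cars", "autocar", "cars info", "what is that", "donna drive",
       "car", "telephone", "i need car...")

has_car     <- grepl("car", x, fixed = TRUE)  # contains "car" anywhere
n_words     <- lengths(strsplit(x, "\\s+"))   # number of whitespace-separated words
three_words <- n_words == 3

x[has_car]
x[three_words]
x[has_car & three_words]
```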
Hello Folks,
I'm working on trying to scrape my first web site and ran into an issue
because I really don't know anything about regular expressions in R.
library(XML)
library(RCurl)
site <- "http://thisorthat.com/leader/month";
site.doc <- htmlParse(site, ?, xmlValue)
At the ?, I realize that
I'm using the subset() function in R.
dat <- data.frame(one=c(6,7,8,9,10), Number=c(5,15,13,1,13))
subset(dat, Number >= 10)
However, I want to find the number of rows that meet the Number >= 10
condition.
I've done this in the past with something like colSums or rowSums or another
similar fun
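A comparison returns TRUE/FALSE for each row, and `sum()` counts the TRUE values, so no subset is needed. Add `na.rm = TRUE` to `sum()` if the column may contain NAs.

```r
dat <- data.frame(one = c(6, 7, 8, 9, 10), Number = c(5, 15, 13, 1, 13))

# Count rows meeting the condition
sum(dat$Number >= 10)

# Equivalent: count the rows of the subset
nrow(subset(dat, Number >= 10))
```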
02
unfortunately, can I delete the Year and Month Columns.
Once that's done, I can reconfigure the columns
Abraham
On Thu, Apr 28, 2011 at 11:00 AM, Abraham Mathew wrote:
Hi folks, I have a simple question that I just can't solve.
I'm trying to merge two columns in my data frame.
> sessionInfo()
R version 2.13.0 (2011-04-13)
Platform: i686-pc-linux-gnu (32-bit)
> head(dat)
Year Month Number
2002   Jan      0
2002   Feb      0
2002 March      0
2002 April
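A sketch with the first rows of the data shown above: `paste()` combines the two columns, and assigning `NULL` deletes a column.

```r
dat <- data.frame(Year = c(2002, 2002, 2002), Month = c("Jan", "Feb", "March"),
                  Number = c(0, 0, 0))

# Combine Year and Month into one column, then drop the originals
dat$YearMonth <- paste(dat$Year, dat$Month)
dat$Year  <- NULL
dat$Month <- NULL

# Put the combined column first
dat <- dat[c("YearMonth", "Number")]
dat
```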
Hi Folks,
I'm new to the Linux world and am having some trouble installing the RSQLite
package.
SQLite is installed, but some dependencies(?) seem to be missing.
Can anyone help?
> sessionInfo()
R version 2.13.0 (2011-04-13)
Platform: i686-pc-linux-gnu (32-bit)
> install.packages()
Installing
> on your Ubuntu machine.
>
> - Phil
>
>
>
>
> On Mon, 25 Apr 2011, Abraham Mathew wrote:
Hello folks,
Here is info on what system I'm working on.
> sessionInfo()
R version 2.13.0 (2011-04-13)
Platform: i686-pc-linux-gnu (32-bit)
I'm trying to install the XML package. However, I end up with the following
error message.
> install.packages("XML")
checking for xml2-config... no