[R] any r package can handle factor levels not in the test set

2015-01-12 Thread HelponR
It looks like gbm and glm both have this issue.

I wonder whether any R package is immune to this?

In practice, it is very common for test data to contain values unseen in the
training data. It looks like I have to give up R?

Thanks!

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] any r package can handle factor levels not in the test set

2015-01-13 Thread HelponR
Thanks for your reply, but I cannot control the data.
I am dealing with real-world stream data. It is very common for the test
data (the data you apply the model to for prediction) to have new values that
were not seen in the training data.
If I coded this myself, I would return a random guess or just the intercept in
such a situation. But it seems most R packages return an error and exit.
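
One way to get the fallback described above is to recode unseen levels to NA before calling predict, then fill the resulting NA predictions with a default. This is only a sketch using lm from base R, not a feature of any particular package. The toy data, the variable names, and the choice of the overall training mean (the intercept of an intercept-only model) as the default are all assumptions made for illustration.

```r
# Toy training data: x is a factor predictor with levels a, b, c.
train <- data.frame(
  x = factor(c("a", "b", "c", "a", "b", "c")),
  y = c(1.0, 2.0, 3.0, 1.2, 2.1, 2.9)
)
fit <- lm(y ~ x, data = train)

# New data arriving at prediction time; "g" was never seen in training.
test <- data.frame(x = c("a", "c", "g"), stringsAsFactors = FALSE)

# The factor levels the model was fitted with are stored in fit$xlevels.
known <- fit$xlevels$x

# Recode against the known levels: anything else becomes NA.
test$x <- factor(test$x, levels = known)

# With NA in the predictor, predict() returns NA for that row
# instead of stopping with an error.
pred <- predict(fit, newdata = test)

# Fallback (an assumption): use the overall training mean for those rows.
fallback <- mean(train$y)
pred[is.na(pred)] <- fallback
pred
```

Whether a mean, a random guess, or a missing value is the right answer for an unseen level depends on the application, so the last step is the part to adapt.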

On Mon, Jan 12, 2015 at 6:08 PM, Richard M. Heiberger wrote:

> You need to define the levels of the training set to include all
> levels that you might see.
> Something like this
>
> > A <- factor(letters[1:5])
> > B <- factor(letters[c(1,3,5,7,9)])
> > A
> [1] a b c d e
> Levels: a b c d e
> > B
> [1] a c e g i
> Levels: a c e g i
> > training <- factor(A, levels=unique(c(levels(A), levels(B))))
> > training
> [1] a b c d e
> Levels: a b c d e g i
> >
>
> In the future please "provide commented, minimal, self-contained,
> reproducible code."
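
The suggestion quoted above can also be written with union() and applied to both factors, so that training and test data share one level set. This is a small sketch of that idea; the names all_levels, testing, and unseen are made up for illustration. Note that it only helps when the full set of possible levels is known in advance, and that levels with no training observations still carry no information for the model.

```r
A <- factor(letters[1:5])
B <- factor(letters[c(1, 3, 5, 7, 9)])

# One shared level set, in order of first appearance.
all_levels <- union(levels(A), levels(B))

# Re-declare both factors against the shared level set.
training <- factor(A, levels = all_levels)
testing  <- factor(B, levels = all_levels)

# Both factors now agree on their levels.
same_levels <- identical(levels(training), levels(testing))

# Levels that occur in B but never in A: these have zero training rows.
unseen <- setdiff(levels(B), levels(A))
table(training)[unseen]
```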


Re: [R] any r package can handle factor levels not in the training set

2015-01-13 Thread HelponR
Sorry, I noticed that the email subject is not accurate.

To be specific, when I call predict, there are error messages like

factor x has new levels 1, 2

Here x is an attribute (an independent variable), not the outcome.

I wonder whether the incremental packages (if any) solve this problem? Maybe it
is time to write my own package.
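
For reference, an error of this kind can be reproduced with a few lines of base R, and the affected rows can be found before calling predict. This is a sketch with made-up data and lm standing in for whatever model is actually used; the names newdata, msg, and is_new are assumptions for illustration.

```r
# Training data where x only takes the values 3, 4, 5.
train <- data.frame(
  x = factor(c(3, 4, 5, 3, 4, 5)),
  y = c(1.0, 2.0, 3.0, 1.2, 2.1, 2.9)
)
fit <- lm(y ~ x, data = train)

# Data to predict on; the values 1 and 2 never occurred in training.
newdata <- data.frame(x = factor(c(1, 2, 3)))

# predict() stops here; capture the error message instead of exiting.
msg <- tryCatch(
  predict(fit, newdata = newdata),
  error = function(e) conditionMessage(e)
)
msg

# Flag the rows whose level the model has never seen.
is_new <- !(as.character(newdata$x) %in% fit$xlevels$x)
is_new

# Predict only for the rows that the model can handle.
ok_rows <- newdata[!is_new, , drop = FALSE]
ok_rows$x <- factor(ok_rows$x, levels = fit$xlevels$x)
pred_ok <- predict(fit, newdata = ok_rows)
```

Checking against fit$xlevels first makes it possible to route the flagged rows to a default prediction rather than letting the whole call fail.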



[R] how to scientific notation numbers

2008-03-21 Thread HelponR
Hi, sorry to bother you with a question.

I have a file with each line like this:

 6.5500e+004  2.82350001e+000  3.2000e+001
1.1580e+003  2.4400e+002  5.9800e+002
2.2700e+002  3.9031e+001 -1.2137e+002


However, when I use read.table, it does not read the file correctly. It reads
in 18 variables instead of 9.

I am so frustrated. I tried searching the archive, but it seems nobody else has
had this problem.
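
As a starting point for diagnosing this, the sketch below writes the nine sample values to a temporary file as a single whitespace-separated line (an assumption about the layout, since the real file is not shown) and checks how many fields R sees. The values in this notation parse as ordinary numbers, so count.fields on the real file may help show where the extra columns come from.

```r
# The nine sample values, assumed here to sit on one physical line.
txt <- paste(
  " 6.5500e+004  2.82350001e+000  3.2000e+001",
  "1.1580e+003  2.4400e+002  5.9800e+002",
  "2.2700e+002  3.9031e+001 -1.2137e+002"
)

# Write them to a temporary file so the example is self-contained.
tf <- tempfile(fileext = ".txt")
writeLines(txt, tf)

# Number of whitespace-separated fields on each line of the file.
n_fields <- count.fields(tf)
n_fields

# Read the file; header = FALSE because the first line is data.
dat <- read.table(tf, header = FALSE)
ncol(dat)

# Every column is numeric, and the exponents are interpreted as expected.
all_numeric <- all(sapply(dat, is.numeric))
dat$V1
dat$V9

unlink(tf)
```

Running count.fields and readLines on the first few lines of the actual file would show whether each physical line really holds 9 fields or 18.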

Thank you!

U
