Re: [R] a fast way to do my job

Bert Gunter Sat, 10 Aug 2024 13:11:30 -0700

Is it because I failed to add a column of ones for an intercept to
the x matrix? Unlike lm(), lm.fit() does not add an intercept itself.
That would be my bad.
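A minimal sketch of the intercept point, on simulated data (purity2, y and the sample size are hypothetical stand-ins, not the poster's objects). lm() builds its design matrix with an intercept column, while lm.fit() fits exactly the x it is given, so the residuals agree only after a column of ones is bound to the regressor.

```r
## Hypothetical simulated data: one "gene" y and a purity2-like regressor
set.seed(1)
n <- 751
purity2 <- runif(n)
y <- rnorm(n, mean = 10)

## lm() fits y ~ 1 + purity2 (intercept included by default)
res_lm <- unname(residuals(lm(y ~ purity2)))

## lm.fit() with a one-column x fits y ~ 0 + purity2 (no intercept)
res_noint <- unname(lm.fit(x = matrix(purity2, ncol = 1), y = y)$residuals)

## Binding a column of ones reproduces the model that lm() fits
X <- cbind(1, purity2)
res_int <- unname(lm.fit(x = X, y = y)$residuals)

isTRUE(all.equal(res_lm, res_int))     # TRUE
isTRUE(all.equal(res_lm, res_noint))   # FALSE
cor(res_lm, res_noint)                 # well below 1 for these data
```

Algebraically, the no-intercept residuals equal the lm() residuals plus a + (c - b) * purity2, where a and c are the intercept and slope from lm() and b is the no-intercept slope. Since b - c = a * sum(purity2) / sum(purity2^2), the two versions diverge more for genes whose fitted intercept is large relative to their residual spread. That would be consistent with correlations that vary by gene, such as 0.55, 0.89 and 0.99, though only the actual code can confirm it.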


-- Bert


On Sat, Aug 10, 2024 at 12:59 PM Bert Gunter <bgunter.4...@gmail.com> wrote:
>
> Probably because you inadvertently ran different models. Without your code, I 
> haven't a clue.
>
>
> On Sat, Aug 10, 2024, 12:29 Yuan Chun Ding <ycd...@coh.org> wrote:
>>
>> Hi Bert and Ben,
>>
>> Yes, running lm.fit using the matrix format is much faster. I read a couple
>> of online comments explaining why it is faster.
>>
>> However, the residuals from the lm function and the lm.fit function differ
>> for the three variables (genes) I tested, with Pearson correlations of
>> 0.55, 0.89, and 0.99.
>>
>> I have not found the reason.
>>
>> Thanks,
>>
>> Ding
>>
>>
>> From: Bert Gunter <bgunter.4...@gmail.com>
>> Sent: Friday, August 9, 2024 7:11 PM
>> To: Ben Bolker <bbol...@gmail.com>
>> Cc: Yuan Chun Ding <ycd...@coh.org>; r-help@r-project.org
>> Subject: Re: [R] a fast way to do my job
>>
>> Better idea, Ben!
>>
>> lm.fit() also works as you might expect with a matrix y, producing the
>> same results as the above:
>>
>> ## first make sure your regressor is a matrix:
>> pur2 <- matrix(purity2, ncol = 1)
>> ## convert the data frame variables into a matrix
>> dat <- as.matrix(gem751be.rpkm[ , 74:35164])
>> ## then
>> result <- residuals(lm.fit(x = pur2, y = dat))
>>
>> Cheers,
>>
>> Bert
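A sketch of the matrix-response lm.fit() call with the intercept column included, on simulated data (dat, purity2 and the dimensions are hypothetical stand-ins). lm.fit() accepts a matrix y and fits every column against the same design matrix using a single QR decomposition, returning an n x p residual matrix whose columns match the one-gene-at-a-time fits.

```r
## Hypothetical simulated stand-ins: 751 samples, 50 "genes" in columns
set.seed(2)
n <- 751; p <- 50
purity2 <- runif(n)
dat <- matrix(rnorm(n * p, mean = 5), nrow = n,
              dimnames = list(paste0("s", 1:n), paste0("gene", 1:p)))

## Design matrix with an intercept column, as lm(y ~ purity2) would use
X <- cbind(1, purity2)

## One call fits all p regressions; residuals come back as an n x p matrix
res_all <- lm.fit(x = X, y = dat)$residuals
dim(res_all)   # 751 50

## Column j matches the single-response fit for gene j
res_g7 <- lm.fit(x = X, y = dat[, 7])$residuals
isTRUE(all.equal(unname(res_all[, 7]), unname(res_g7)))   # TRUE
```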
>>
>> On Fri, Aug 9, 2024 at 6:38 PM Ben Bolker <bbol...@gmail.com> wrote:
>> >
>> > You can also fit a linear model with a matrix-valued response
>> > variable, which should be even faster (not sure off the top of my head
>> > how to get the residuals and reshape them to the dimensions you want).
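One way to do what Ben describes, sketched on simulated data (genes, purity2 and the sizes are hypothetical stand-ins). Putting a matrix on the left-hand side of an lm() formula gives a multivariate "mlm" fit with the usual intercept, and residuals() on it returns a samples x genes matrix, so t() is the only reshaping needed to reach the genes x samples layout that the original rbind() loop builds.

```r
## Hypothetical simulated stand-ins for purity2 and the gene columns
set.seed(3)
n <- 751; p <- 20
purity2 <- runif(n)
genes <- matrix(rnorm(n * p), nrow = n,
                dimnames = list(paste0("sample", 1:n), paste0("gene", 1:p)))

## A matrix response gives p separate regressions on purity2,
## each with an intercept, in one "mlm" object
fit <- lm(genes ~ purity2)
class(fit)          # "mlm" "lm"

## residuals() returns an n x p matrix: samples in rows, genes in columns
res <- residuals(fit)
dim(res)            # 751 20

## Transpose to genes x samples
res_by_gene <- t(res)
dim(res_by_gene)    # 20 751
```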
>> >
>> > On Fri, Aug 9, 2024 at 9:31 PM Bert Gunter <bgunter.4...@gmail.com> wrote:
>> > >
>> > > See ?lm.fit.
>> > > I must be missing something, because:
>> > >
>> > > results <- sapply(74:35164, \(i) residuals(lm.fit(purity2,
>> > > gem751be.rpkm[, i])))
>> > >
>> > > would give you a 751 x 35091 matrix of the residuals from each of the
>> > > regressions.
>> > > I assume it will be considerably faster than all the overhead you are
>> > > carrying in your current code, but of course you'll have to try it and
>> > > see. ... Assuming that I have interpreted your request correctly.
>> > > Ignore if not.
>> > >
>> > > Cheers,
>> > > Bert
>> > >
>>
>> > > On Fri, Aug 9, 2024 at 4:50 PM Yuan Chun Ding via R-help
>> > > <r-help@r-project.org> wrote:
>> > > >
>> > > > Dear R users,
>> > > >
>> > > > I am running the code below. gem751be.rpkm is a data frame with
>> > > > 751 samples (rows) by 35164 variables (columns): 73 phenotypic
>> > > > variables in columns 1 to 73 and 35091 genomic variables (genes) in
>> > > > columns 74 to 35164. What I need to do is calculate the residuals
>> > > > for each gene from the simple linear regression model
>> > > > genelist[i] ~ purity2.
>> > > >
>> > > > The code below runs but takes a long time, even though I have an
>> > > > expensive ThinkStation Windows computer.
>> > > > Can you suggest a faster way to do it?
>> > > >
>> > > > Thank you,
>> > > >
>> > > > Ding
>> > > >
>>
>> > > > ---------------------------------------------------------------------------------
>> > > >
>> > > > gem751be.rpkm <- merge(gem751be10, as.data.frame(t(rna849.fpkm2)),
>> > > >                        by.x = "id2", by.y = 0)
>> > > > row.names(gem751be.rpkm) <- gem751be.rpkm$id3
>> > > > colnames(gem751be.rpkm) <- gsub(colnames(gem751be.rpkm), pattern = "-", replacement = "_")
>> > > > genelist <- gem751be.rpkm %>% dplyr::select(74:35164)
>> > > > residuals <- NULL
>> > > > for (i in 1:length(genelist)) {
>> > > >     #i=1
>> > > >     formula <- reformulate("purity2", response = names(genelist)[i])
>> > > >     model <- lm(formula, data = gem751be.rpkm)
>> > > >     resi <- as.data.frame(residuals(model))
>> > > >     colnames(resi)[1] <- names(genelist)[i]
>> > > >     resi <- as.data.frame(t(resi))
>> > > >     residuals <- rbind(residuals, resi)
>> > > > }
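The loop above grows residuals with rbind() on every iteration, which copies the accumulated data frame each time. A minimal sketch of the same per-gene lm() loop writing into a preallocated genes x samples matrix, on a simulated stand-in data frame (df, gene_cols and the sizes are hypothetical). It removes the repeated copying but still pays lm()'s per-call overhead, which the matrix-response approaches in this thread avoid.

```r
## Hypothetical simulated stand-in for gem751be.rpkm: a purity2 column
## followed by gene columns; sizes are illustrative only
set.seed(4)
n <- 751; p <- 30
df <- data.frame(purity2 = runif(n),
                 matrix(rnorm(n * p), nrow = n,
                        dimnames = list(NULL, paste0("gene", 1:p))))
gene_cols <- names(df)[-1]

## Preallocate the genes x samples result instead of growing it with rbind()
res_mat <- matrix(NA_real_, nrow = length(gene_cols), ncol = n,
                  dimnames = list(gene_cols, rownames(df)))

## Same model as the original loop: gene ~ purity2, one lm() call per gene
for (g in gene_cols) {
  fit <- lm(reformulate("purity2", response = g), data = df)
  res_mat[g, ] <- residuals(fit)
}
dim(res_mat)   # 30 751
```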

______________________________________________
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
