Hello,

Another thing to note is that regexpr is likely to take (much) more time than either ifelse or as.integer.
The regexpr call therefore dominates the run time, which leaves little room to optimize the rest of the code.
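
A rough way to check this is to time the pattern match separately from the 0/1 conversion. The sketch below is only illustrative: the vector x and the pattern "ab" are made-up stand-ins for d0$X0 and d1[i,1], and the timings will differ from machine to machine.

```r
# Hypothetical toy data standing in for d0$X0 and one pattern from d1.
set.seed(1)
x <- paste0(sample(letters, 2e5, replace = TRUE),
            sample(letters, 2e5, replace = TRUE))
pattern <- "ab"

# Time the pattern match on its own.
t_match <- system.time(pos <- regexpr(pattern, x))

# Time only the conversion of the match positions to 0/1.
t_ifelse <- system.time(v1 <- ifelse(pos > 0, 1, 0))
t_asint  <- system.time(v2 <- as.integer(pos > 0))

# Elapsed seconds for each step (values depend on the machine).
print(c(regexpr    = t_match[["elapsed"]],
        ifelse     = t_ifelse[["elapsed"]],
        as.integer = t_asint[["elapsed"]]))

# Both conversions produce the same 0/1 values.
stopifnot(all(v1 == v2))
```

If the regexpr row is the largest of the three, switching between ifelse and as.integer cannot change the total by much.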

Rui Barradas

On 4/30/2018 4:25 PM, MacQueen, Don wrote:
Luca,

If speed is important, you might improve performance by making d0 into a true matrix, rather than a data frame (assuming d0 is indeed a data frame at this point). Although data frames may look like matrices, they aren’t, and they have some overhead that matrices don’t.  I don’t think you would be able to use the [[nm]] syntax with a matrix, but [ , nm] should work, provided the matrix has column names. Or you could perhaps index by column number.
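
A minimal sketch of the matrix approach, assuming small made-up stand-ins for d0 and d1 (their real contents are not shown in this thread). The matrix name m and the column name pat are hypothetical.

```r
# Hypothetical toy data; the real d0 and d1 are not shown in the thread.
d0 <- data.frame(X0 = c("apple pie", "banana split", "apple and banana"),
                 stringsAsFactors = FALSE)
d1 <- data.frame(pat = c("apple", "banana"), stringsAsFactors = FALSE)

# Preallocate an integer matrix with named columns V1, V2, ...
m <- matrix(0L, nrow = nrow(d0), ncol = nrow(d1),
            dimnames = list(NULL, paste0("V", seq_len(nrow(d1)))))

for (i in seq_len(nrow(d1))) {
  nm <- paste0("V", i)
  # [ , nm] indexes a matrix column by name; [[nm]] is for lists/data frames.
  m[, nm] <- as.integer(regexpr(d1[i, 1], d0$X0) > 0)
}

print(m)

# Recombine with the character column once the loop is done.
result <- cbind(d0, m)
```

Whether this is faster than the data frame version for a 5954x899 result would have to be measured; the sketch only shows the indexing syntax.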

I had a project some years ago in which I reduced calculation time a lot by extracting the numeric columns of a data frame and working with them, then recombining them with the character columns. R’s performance working with data frames has improved since then, so I really don’t know if it would make a difference for your task.

-Don

--

Don MacQueen

Lawrence Livermore National Laboratory

7000 East Ave., L-627

Livermore, CA 94550

925-423-1062

Lab cell 925-724-7509

From: Luca Meyer <lucam1...@gmail.com>
Date: Monday, April 30, 2018 at 8:08 AM
To: Rui Barradas <ruipbarra...@sapo.pt>
Cc: "MacQueen, Don" <macque...@llnl.gov>, array R-help <r-help@r-project.org>
Subject: Re: [R] How to visualise what code is processed within a for loop

Hi Rui

Thank you for your suggestion.

I have timed the code you suggested against the code supplied by Don, and the results are almost identical: to populate a 5954x899 0/1 matrix on my machine, your procedure took 79 secs, while the one with ifelse took 80 secs. Unfortunately, that is not a significant time saving.

Nevertheless thank you for your contribution.

Kind regards,

Luca

2018-04-28 23:18 GMT+02:00 Rui Barradas <ruipbarra...@sapo.pt>:

    I forgot to explain the reason for my suggestion.

    The logical condition returns FALSE/TRUE, which R codes as 0/1.
    So all you have to do is coerce to integer.

    This works because the ifelse would return a 1 or a 0 depending on
    the condition, meaning exactly the same values. And it is more
    efficient, since ifelse creates both vectors, the true part and the
    false part, and then indexes those vectors in order to return the
    appropriate values. This is twice the work and uses a great deal
    of memory.
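
A small sketch of that equivalence, assuming a made-up character vector x and the pattern "apple". The values agree, although the storage types differ: ifelse(cond, 1, 0) gives doubles, while as.integer gives integers.

```r
# Hypothetical example data.
x <- c("apple pie", "banana split", "apple and banana")

# regexpr() returns the match position, or -1 when there is no match,
# so "> 0" is TRUE exactly where the pattern was found.
cond <- regexpr("apple", x) > 0

via_ifelse  <- ifelse(cond, 1, 0)  # builds the yes and no vectors, then picks
via_integer <- as.integer(cond)    # direct coercion: FALSE -> 0, TRUE -> 1

# Same values either way.
stopifnot(all(via_ifelse == via_integer))

# The storage types are not the same.
print(typeof(via_ifelse))
print(typeof(via_integer))
```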

    Rui Barradas

    On 4/28/2018 10:12 PM, Rui Barradas wrote:

        Hello,

        instead of ifelse, the following is exactly the same and much
        more efficient.

        d0[[nm]] <- as.integer(regexpr(d1[i,1], d0$X0) > 0)


        Hope this helps,

        Rui Barradas

        On 4/28/2018 8:45 PM, Luca Meyer wrote:

            Thanks Don,

                  for (i in 1:10){
                    nm <- paste0("V", i)
                    d0[[nm]] <- ifelse( regexpr(d1[i,1], d0$X0) > 0, 1, 0)
                  }

            is exactly what I needed.

            Best regards,

            Luca


            2018-04-25 23:03 GMT+02:00 MacQueen, Don <macque...@llnl.gov>:

                Your code doesn't make sense to me in a couple of ways.

                Inside the loop, the first line assigns a value to an object
                named "t". Then, the second line does the same thing, assigns
                a value to an object named "t".

                The value of the object named "t" after the second line will
                be the output of the ifelse() expression, whatever that is.
                This has the effect of making the first line irrelevant.
                Whatever value t has after the first line is replaced by
                whatever it gets from the second line.

                It looks like the first line inside the loop is constructing
                the name of a data frame column, and storing that name as a
                character string. However, the second line doesn't use that
                name at all. If your goal is to update the contents of a
                column, you need to assign something to that column in the
                next line. Instead you assign it to the object named "t".

                What you're looking for will be more along the lines of
                this:

                      for (i in 1:10){
                        nm <- paste0("V", i)
                        d0[[nm]] <- ifelse( regexpr(d1[i,1], d0$X0) > 0, 1, 0)
                      }

                This may not be a complete solution, since I have no idea
                what the contents or structure of d1 are, or what the
                regexpr() is expected to return.

                And notice the use of double brackets, [[ and ]]. This is one
                way to reference a column of a data frame when you have the
                column's name stored in a variable. Another way is d0[ , nm]
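
A runnable sketch of the difference, assuming tiny made-up versions of d0 and d1 (the real ones are not shown in the thread). The name tmp replaces "t" here, and the column name pat is hypothetical.

```r
# Hypothetical toy data with two pre-existing 0 columns.
d0 <- data.frame(X0 = c("apple pie", "banana split", "apple and banana"),
                 V1 = 0, V2 = 0, stringsAsFactors = FALSE)
d1 <- data.frame(pat = c("apple", "banana"), stringsAsFactors = FALSE)

# Original pattern: the result lands in a throwaway object, so d0 is untouched.
for (i in 1:2) {
  tmp <- paste("d0$V", i, sep = "")                   # only the string "d0$V1"
  tmp <- ifelse(regexpr(d1[i, 1], d0$X0) > 0, 1, 0)   # overwrites that string
}
stopifnot(all(d0$V1 == 0), all(d0$V2 == 0))

# Corrected pattern: build the column name, then assign to that column.
for (i in 1:2) {
  nm <- paste0("V", i)
  d0[[nm]] <- ifelse(regexpr(d1[i, 1], d0$X0) > 0, 1, 0)
}
print(d0)

# d0[[nm]] and d0[ , nm] pull out the same column.
nm <- "V2"
stopifnot(identical(d0[[nm]], d0[, nm]))
```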


                A couple of additional comments:

                   "t" is a poor choice of object name, because it is one of
                R's built-in functions (immediately after starting a fresh
                session of R, with nothing left over from any previous
                session, type help("t") and see what you get).

                   ifelse() is intended for use on vectors, not scalars, and
                it looks like maybe you're using it on a scalar (can't be
                sure about this, though).

                For example, ifelse() is designed for this kind of usage:

                    ifelse( c(TRUE, FALSE, TRUE) , 1:3, 11:13)

                [1]  1 12  3

                Although it works ok for these

                    ifelse(TRUE, 3, 4)

                [1] 3

                    ifelse(FALSE, 3, 4)

                [1] 4

                They are not really what it is intended for.
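
As a hedged illustration of the vector/scalar distinction: for a single TRUE/FALSE condition, a plain if ... else expression is the usual tool, and ifelse() with a length-one test returns only one element even when the alternatives are longer. The variable names below are made up for the example.

```r
flag <- TRUE

# Scalar condition: plain if/else returns the chosen value.
y_scalar <- if (flag) 3 else 4

# Vector condition: ifelse() picks element by element.
y_vector <- ifelse(c(TRUE, FALSE, TRUE), 1:3, 11:13)

# Length-one test with longer alternatives: the result has the length
# of the test, so only the first element of 1:3 comes back.
y_short <- ifelse(flag, 1:3, 11:13)

print(y_scalar)
print(y_vector)
print(y_short)
```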

-- Don MacQueen


                On 4/24/18, 12:30 AM, "R-help on behalf of Luca Meyer"
                <r-help-boun...@r-project.org on behalf of lucam1...@gmail.com> wrote:

                      Hi,

                      I am trying to debug the following code:

                      for (i in 1:10){
                        t <- paste("d0$V",i,sep="")
                        t <- ifelse(regexpr(d1[i,1],d0$X0)>0,1,0)
                      }

                      and I would like to see what code R is actually
                      processing. How can I do that?

                      More to the point, I am trying to update my variables
                      d0$V1 to d0$V10 according to the presence or absence of
                      some text (contained in the file d1) within the d0$X0
                      variable.

                      The code seems to run OK: if I add print(table(t)) within
                      the loop, I can see that the ifelse procedure is working
                      and that a 1 is assigned to some cases within the d0$V1
                      to d0$V10 variable range. But when I check d0$V1 to
                      d0$V10 after the for loop, they are all still equal to
                      zero...

                      Thanks,

                      Luca

                          [[alternative HTML version deleted]]

                      ______________________________________________
                      R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
                      https://stat.ethz.ch/mailman/listinfo/r-help
                      PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
                      and provide commented, minimal, self-contained, reproducible code.



