When the system locks up, CPU usage is about 50%-60% and physical memory
usage is about 33%.

I did try it on a subset of the data, e.g. i in 1:100, and it works,
although it takes about 5-10 min.
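
For anyone following along, here is a minimal sketch of the two debugging
steps Jim suggests in the message quoted below: printing 'i' every 'n'
iterations with flush.console(), and checking for rows that have no match
in b. The toy df1 and b, the names n_report and new, and the choice to
leave NA for unmatched rows are assumptions for illustration only. It
matches on Time2 alone and leaves out the Date and next-day handling of
the real loop.

```r
# Hypothetical stand-ins for df1 and b; the real tables are much larger.
b <- data.frame(
  Time2 = as.POSIXct("2013-10-17 06:00:34", tz = "UTC") + 60 * (0:49),
  Power = 1:50
)
df1 <- data.frame(
  start = b$Time2[c(2, 10, 1)],
  end   = b$Time2[c(5, 12, 3)]
)
df1$start[3] <- df1$start[3] + 1  # deliberately has no match in b

n_report <- 2                     # report every n_report rows (assumed value)
new <- rep(NA_real_, nrow(df1))   # fill a plain vector, attach it once at the end

for (i in seq_len(nrow(df1))) {
  begin <- which(b$Time2 == df1$start[i])
  end   <- which(b$Time2 == df1$end[i])
  # which() returns integer(0) when nothing matches, and begin:end would
  # then stop with "argument of length 0"; skip such rows and keep NA.
  if (length(begin) == 1L && length(end) == 1L) {
    new[i] <- sum(b$Power[begin:end])
  }
  if (i %% n_report == 0) {
    cat("processed row", i, "of", nrow(df1), "\n")
    flush.console()  # force buffered output to appear in the R GUI
  }
}
df1$new <- new
```

Rows left as NA can then be inspected separately with
df1[is.na(df1$new), ] to see which start or end times are missing from b.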

Thanks for your help!


On Fri, Oct 18, 2013 at 12:49 PM, jim holtman <jholt...@gmail.com> wrote:

> When the system locks up, what do you see in the Task Manager?  Is it
> consuming CPU and memory?  On the example data you sent, you won't get
> a match on the time since there is no match for the first entry in
> df1 in the 'b' dataframe.  This leads to an error that you are not
> checking for.  Have you tried it with a small subset to see if it
> locks up in the same way?  Put a counter in the loop so that every 'n'
> iterations the value of 'i' is printed out.  Make sure you have
> 'flush.console()' after the print statement to ensure it gets to the
> GUI even if you have the writes buffered.  You should be able to debug
> with some of these pointers.
>
> Jim Holtman
> Data Munger Guru
>
> What is the problem that you are trying to solve?
> Tell me what you want to do, not how you want to do it.
>
>
> On Fri, Oct 18, 2013 at 1:07 PM, Ye Lin <ye...@lbl.gov> wrote:
> > Thanks for your advice Jim!
> >
> > I tried Rprof, but since the code just freezes the system, I have not
> > been able to get results so far: I had to close R after waiting for a
> > long time. I am confused that the same code would behave differently on
> > the same system.
> >
> > I tried out the foreach package as well but didn't notice significant
> > improvement. Is it that my code is not efficient, or is there something
> > wrong with my system, or has something changed on it?
> >
> > Thanks!
> >
> >
> >
> > On Fri, Oct 18, 2013 at 7:14 AM, jim holtman <jholt...@gmail.com> wrote:
> >>
> >> You might want to use the profiler (Rprof) on a subset of your code to
> >> see where time is being spent.  Find a subset that runs for a minute,
> >> or so, and enable profiling for the test.  Take a look and see which
> >> functions are taking the time. This will be a start.  You can also
> >> watch the task monitor while the application is running to see how
> >> fast it is using the CPU and memory.  If you are going around a loop a
> >> number of times, you can put some monitoring 'cat' statements that
> >> will periodically print out the memory and CPU used.  So these are
> >> some of the techniques to start looking at things in your program.
> >> Also data.frames are very costly to 'index' into.  You might want to
> >> consider converting to a matrix (where possible since all columns have
> >> to have the same mode).  This can provide significant improvement.
> >> This is something that you will be able to see when you use the
> >> profiling tool since it will probably show a lot of time in the
> >> functions that handle dataframes.
> >>
> >> Jim Holtman
> >> Data Munger Guru
> >>
> >> What is the problem that you are trying to solve?
> >> Tell me what you want to do, not how you want to do it.
> >>
> >>
> >> On Fri, Oct 18, 2013 at 9:23 AM, Ye Lin <ye...@lbl.gov> wrote:
> >> > Thanks for your help David!
> >> >
> >> > I was running the same code the other day and it worked fine, although
> >> > it took a while as well. You are right that dff should be df1, and
> >> > maybe because it's only a portion of my data it gives the "length 0"
> >> > error.
> >> >
> >> > About CPU usage, I got it by pressing Ctrl+Alt+Delete and it showed
> >> > that CPU usage is really high. Is there any way to figure out why R is
> >> > taxing my system?
> >> >
> >> > Thanks!
> >> >
> >> > Ye
> >> >
> >> > On Thursday, October 17, 2013, David Winsemius wrote:
> >> >
> >> >>
> >> >> On Oct 17, 2013, at 2:56 PM, Ye Lin wrote:
> >> >>
> >> >> > Hey R professionals,
> >> >> >
> >> >> > I have a large dataset and I want to run a loop on it, basically
> >> >> > creating a new column which gathers information from another
> >> >> > reference table.
> >> >> >
> >> >> > When I run the code, R just freezes and does not even respond after
> >> >> > 30 min, which is really unusual. I tried sapply as well, but it does
> >> >> > not improve at all.
> >> >> >
> >> >> > I am running R 3.0.2 on Windows 7.  I checked the system: when I run
> >> >> > the code, my CPU usage is about 25%-30%, which is taxing my desktop.
> >> >>
> >> >> A guess: It's not your CPU use ... it's your RAM use. You've probably
> >> >> exhausted your RAM and your system has paged out to virtual memory.
> >> >> >
> >> >> > Here is my code:
> >> >> >
> >> >> > #df1 is the data set I want to add a new column to#
> >> >> > #b is the reference table#
> >> >> >
> >> >> > for (i in (1:nrow(df1))) {
> >> >> >  begin=which(b$Time2==df1$start[i] & b$Date==df1$Date[i])
> >> >> >  date=unlist(strsplit(as.character(dff$end[i])," "))[1]
> >> >> >   end=ifelse(date=="2013-10-17",
> >> >> >   which(b$Time2==df1$end[i] & b$Date==df1$Date[i]),
> >> >> >   which(b$Time2==df1$end[i]-3600*24 &
> >> >> > b$Date==as.Date(df1$Date[i])+1))
> >> >> >    df1$new[i] <- sum(b[begin:end,]$Power)
> >> >> > }
> >> >> >
> >> >>
> >> >> I get:
> >> >> Error in strsplit(as.character(dff$end[i]), " ") : object 'dff' not
> >> >> found
> >> >>
> >> >> If I change the dff to df1, I get:
> >> >> Error in begin:end : argument of length 0
> >> >>
> >> >> --
> >> >> David.
> >> >> > And here is a mimic sample of df1 & b:
> >> >> >
> >> >> > df1 <- structure(list(Date = structure(c(1369699200, 1369699200,
> >> >> > 1369699200,
> >> >> > 1369699200, 1369699200), tzone = "UTC", class = c("POSIXct",
> >> >> > "POSIXt")), start = structure(c(1381991205, 1381990247, 1382010454,
> >> >> > 1382007281, 1381992288), tzone = "UTC", class = c("POSIXct",
> >> >> > "POSIXt")), end = structure(c(1381992405, 1381993727, 1382010694,
> >> >> > 1382007461, 1381992468), tzone = "UTC", class = c("POSIXct",
> >> >> > "POSIXt"))), .Names = c("Date", "start", "end"), row.names = c(NA,
> >> >> > -5L), class = "data.frame")
> >> >> >
> >> >> >
> >> >> > b <- structure(list(Date = structure(c(1369699200, 1369699200,
> >> >> 1369699200,
> >> >> > 1369699200, 1369699200, 1369699200, 1369699200, 1369699200,
> >> >> > 1369699200,
> >> >> > 1369699200, 1369699200, 1369699200, 1369699200, 1369699200,
> >> >> > 1369699200,
> >> >> > 1369699200, 1369699200, 1369699200, 1369699200, 1369699200,
> >> >> > 1369699200,
> >> >> > 1369699200, 1369699200, 1369699200, 1369699200, 1369699200,
> >> >> > 1369699200,
> >> >> > 1369699200, 1369699200, 1369699200, 1369699200, 1369699200,
> >> >> > 1369699200,
> >> >> > 1369699200, 1369699200, 1369699200, 1369699200, 1369699200,
> >> >> > 1369699200,
> >> >> > 1369699200, 1369699200, 1369699200, 1369699200, 1369699200,
> >> >> > 1369699200,
> >> >> > 1369699200, 1369699200, 1369699200, 1369699200, 1369699200),
> >> >> > tzone = "UTC", class = c("POSIXct",
> >> >> > "POSIXt")), Time2 = structure(c(1381989634, 1381989694, 1381989754,
> >> >> > 1381989814, 1381989874, 1381989934, 1381989994, 1381990054,
> >> >> > 1381990114,
> >> >> > 1381990174, 1381990234, 1381990294, 1381990354, 1381990414,
> >> >> > 1381990474,
> >> >> > 1381990534, 1381990594, 1381990654, 1381990714, 1381990774,
> >> >> > 1381990834,
> >> >> > 1381990894, 1381990954, 1381991014, 1381991074, 1381991134,
> >> >> > 1381991194,
> >> >> > 1381991254, 1381991314, 1381991374, 1381991434, 1381991494,
> >> >> > 1381991554,
> >> >> > 1381991614, 1381991674, 1381991734, 1381991794, 1381991854,
> >> >> > 1381991914,
> >> >> > 1381991974, 1381992034, 1381992094, 1381992154, 1381992214,
> >> >> > 1381992274,
> >> >> > 1381992334, 1381992394, 1381992454, 1381992514, 1381992574),
> >> >> > tzone = "UTC", class = c("POSIXct",
> >> >> > "POSIXt")), Power = c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12,
> >> >> > 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28,
> >> >> > 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44,
> >> >> > 45, 46, 47, 48, 49, 50)), .Names = c("Date", "Time2", "Power"
> >> >> > ), row.names = c(NA, -50L), class = "data.frame")
> >> >> >
> >> >> > Thanks for your help!
> >> >> >
> >> >> >       [[alternative HTML version deleted]]
> >> >> >
> >> >> > ______________________________________________
> >> >> > R-help@r-project.org mailing list
> >> >> > https://stat.ethz.ch/mailman/listinfo/r-help
> >> >> > PLEASE do read the posting guide
> >> >> http://www.R-project.org/posting-guide.html
> >> >> > and provide commented, minimal, self-contained, reproducible code.
> >> >>
> >> >> David Winsemius
> >> >> Alameda, CA, USA
> >> >>
> >> >>
> >> >
> >
> >
>

