-----Original Message-----
From: ruipbarra...@sapo.pt
Sent: Mon, 17 Sep 2012 16:04:15 +0100
To: jrkrid...@inbox.com, ridav...@gmail.com, jrkrid...@inbox.com
Subject: Re: [R] [newbie] aggregating table() results and simplifying
code with loop
Hello,
Now I believe I've got it. (So-so believe.)
ddply is a good idea, in my first post I was using xtabs to the same
effect, but ddply makes it a lot easier.
I had made a big mistake, though. I thought that only sequences of FALSE
were valid, hence rle(). In a follow-up post, Davide listed all
combinations of T/F that matter. The code below uses them to create a df
of cases to count with ddply. The creation is done by merge, on the T/F
columns (function f1) and the rest is just to apply to 6 periods of 5
years each, after applying to covers, after applying to years of crop
type.
Sounds confusing, and it probably is but I've separated all of it in
small functions.
Each of the functions is in fact very simple, and all one needs is to
remember that the inner most function is reached after a series of
*apply statements, doing one thing at a time.
#------------------------------------------------------------------------------
# data.frame of target sequences of T/F per crop type and years
# (from an earlier post)
wanted <-
structure(list(V2 = c(TRUE, FALSE, TRUE, FALSE, FALSE, TRUE,
FALSE, FALSE, FALSE, TRUE, TRUE, FALSE, TRUE, TRUE, TRUE, FALSE,
TRUE, FALSE, TRUE), V3 = c(FALSE, FALSE, FALSE, TRUE, FALSE,
FALSE, TRUE, FALSE, TRUE, FALSE, TRUE, TRUE, TRUE, FALSE, TRUE,
FALSE, TRUE, TRUE, TRUE), V4 = c(FALSE, FALSE, FALSE, FALSE,
FALSE, FALSE, FALSE, TRUE, FALSE, TRUE, TRUE, TRUE, TRUE, TRUE,
TRUE, TRUE, TRUE, TRUE, FALSE), V5 = c(FALSE, FALSE, FALSE, FALSE,
TRUE, TRUE, FALSE, FALSE, TRUE, FALSE, TRUE, TRUE, TRUE, TRUE,
FALSE, TRUE, FALSE, TRUE, TRUE), V6 = c(FALSE, TRUE, TRUE, FALSE,
FALSE, FALSE, TRUE, FALSE, FALSE, TRUE, TRUE, TRUE, FALSE, TRUE,
TRUE, TRUE, FALSE, FALSE, TRUE), V7 = c(" return", " return",
" return", " return", " return", " return", " return", " return",
" return", " return", " mono-succession", " mono-succession",
" mono-succession", " mono-succession", " mono-succession","
mono-succession",
" mono-succession", " mono-succession", " crops"), V8 = c(5L,
5L, 4L, 4L, 4L, 3L, 3L, 3L, 2L, 2L, 5L, 4L, 4L, 3L, 3L, 3L, 3L,
3L, 2L)), .Names = c("V2", "V3", "V4", "V5", "V6", "V7", "V8"
), row.names = c(NA, -19L), class = "data.frame")
want <- split(wanted, wanted$V7)
#--------------------------------------------------------------------
# The functions go from f5 down to f1 (f5 calls f4, f4 calls f3, etc.)
#
f1 <- function(DF, Mat, currcols, WS){
mcols <- names(T80[, ycols])[currcols] # names for 'merge'
names(DF)[1:5] <- mcols
m <- merge(DF, data.frame(Mat[, mcols], WS = WS))
ddply(m, .(WS), summarize, length(V8))
}
f2 <- function(w, tf, WS){
l1 <- lapply(yrs, function(yy) t(f1(w, tf, yy, WS)))
l2 <- lapply(l1, function(x){
nms <- as.character(x["WS",])
resline[ nms ] <- x[2, ]
resline})
res <- do.call(rbind, l2)
data.frame(res)
}
f3 <- function(Cover, w){
tf <- T80[, ycols] == Cover
inx <- as.vector(apply(tf, 1, function(.x) any(.x)))
tf <- tf[inx, ]
WS <- T80$WS[inx]
# Special care with empty sets
if(sum(tf) > 0){
res <- f2(w, tf, WS)
na <- as.vector(apply(res, 1, function(.x) all(is.na(.x))))
res <- res[!na, ]
if(!is.null(res) && nrow(res) > 0){
res$Cover <- Cover
res$Period <- (1:6)[!na]
}else res <- NULL
}else res <- NULL
res
}
f4 <- function(w){
res <- do.call(rbind, lapply(covers, f3, w))
# Special care with empty sets
if(!is.null(nrow(res)) && nrow(res) > 0){
res <- data.frame(res)
res$Years <- w$V8[1]
}else res <- NULL
res
}
f5 <- function(w){
sp <- split(w, w$V8)
res <- do.call( rbind, lapply(sp, f4) )
res <- data.frame(res)
res
}
T80 <- read.table("sample.txt", header = TRUE, sep = ";")
ycols <- grep("y", names(T80))
ws <- unique(T80$WS)
covers <- as.character(unique(unlist(T80[ycols])))
yrs <- lapply(0:5, `+`, 1:5)
resline <- rep(NA, length(ws))
names(resline) <- ws
result <- lapply(want, f5)
(Ok, maybe two or three things at a time.)
I hope this is it!
Have fun.
Rui Barradas
Em 17-09-2012 14:19, John Kane escreveu:
Comments in-line below.
Sorry to be so long getting back to you but sleep intervened.
-----Original Message-----
From: ridav...@gmail.com
Sent: Sun, 16 Sep 2012 21:24:15 +0200
To: jrkrid...@inbox.com
Subject: Re: [R] [newbie] aggregating table() results and simplifying
code with loop
Thank you John,
you are giving me two precious tips (in addition, well explained!):
1. to use the package plyr (I didn't know it before, but it seems to
make the deal!)
2. a smart and promising way to use it
I can finally plot the partial results, to have a first glance and
compare to them
==========================================================
# once the sequences for a given crop have been assembled in a
dataframe (e.g., maizedata)
library(latticeExtra) # load the package for "improved" graphics
dotplot(mzcount$crop_pattern) # to plot the occurrence of all the
retrieved sequences
#once the target pattern have been subset (e.g., m51)
dotplot(m51$WS ~ m51$count) # to plot the occurrence of the target
pattern(s) per watershed
Ah, so that's what WS stands for! Do you have an abstract or short
summary of what you are doing in English or French? My Itallian is
limited to about 3 words or what I can guess from French/Latin.
I think that you are misunderstanding what is happening in m51. It is
just summarizing the total counts across those two conditions. I think
either I made a logic error or was not understanding what was needed. In
any case the dotplot is not plotting the actual occurances but the
number of times the crop_pattern was found. The count is the actual
number of occurances per WS per crop_pattern
Have a look at my new effort. it is still ugly but I think it more
accurately supplies some of what you want.
Two caveats : 1. This is still very rough and an better programer may
have a better approach.
2. I used the package ggplot2 to do the graphs. One more thing to learn
but I am not used to lattice and I am used to ggplot2. Rather than spend
a lot of time with lattice or
2. The last faceted plots are not intended to be of real use, just my
quick look to see if anything looked like I expected. A bit of thinking
probably would give somethimg much better
==========================================================
This help me retrieving all the possible patterns for the different
land covers. Hence, you made me able to improve the subset of
patterns.
Tomorrow morning I'm going to test the tips for all the land covers
for all the 5years time-periods. Even if still labour-intensive, the
solution you propose it's surely a steady improvement.
Now only the second question remains: is there a way "to clean" this
approach?... or probably not ?...
Yes I suspect that there are lots of ways to 'clean' the process and
automate it so that it would apply to all three data sets quite easily.
Some I think I can help with and some you may need other people's help.
For example, once we have a working model to handle one or two
conditions It should be relatively easy to use an apply() or a loop to
handle all of them and so on..
Well, I'm off to work now so I probably won't be able to get back to
much before late evening my time ( probably after you are asleep) I
think I'm 6 hours ahead of you.
##===================Revised approach====================
# load the various packages (plyr, latticeExtra, ggplot2, reshape2)
library(plyr)
library(latticeExtra)
library(ggplot2)
library(reshape2)
# sample data
T80<- read.csv("/home/john/rdata/sample.csv", header = TRUE, sep = ";")
# Davide's actual read statement
# T80<-read.table(file="C:/sample.txt", header=T, sep=";")
# Looking for Maize
pattern <- c("2Ma", "2Ma","2Ma", "2Ma","2Ma")
# one row examples to see that is happening
T80[1,3:7]
T80[1, 3:7] == pattern
T80[405, 3:7]
T80[405, 3:7] == pattern
T80[55, 3:7] == pattern
# now we apply the patterns to the entire data set.
pp1 <- T80[, 3:7] == pattern
# paste the TRUEs and FALSEs together to form a single variable
concatdat <- paste(pp1[, 1], pp1[, 2], pp1[, 3], pp1[, 4],pp1[,5] ,
sep = "+")
# Assmble new data frame.
maizedata <- data.frame(T80$WS, concatdat)
names(maizedata) <- c("WS", "crop_pattern")
str(maizedata)
maizedata$crop_pattern <- as.character(maizedata$crop_pattern)
pattern_count <- ddply(maizedata, .(crop_pattern), summarize, npattern
= length(crop_pattern))
str(pattern_count)
head(pattern_count); dim(pattern_count) # quick look at data.frame and
its size.
# FALSE+FALSE+FALSE+FALSE+FALSE accounts for 21,493 values.
which(pattern_count$npattern == max(pattern_count$npattern)) # this
does the same as looking at the data
# not needed here but useful for larger datasets.
# If we graph pattern_count as it stands we lose any useful detail
because of that outlier.
p <- ggplot(pattern_count , aes(crop_pattern, npattern )) +
geom_point() +
coord_flip()
p
(pattern1 <- pattern_count[-1,]) # Drop the offending
FALSE+FALSE+FALSE+FALSE+FALSE
dim(pattern1) # Okay now we have the maize patterns, without the WS who
had no maize at all.
p <- ggplot(pattern1 , aes(crop_pattern, npattern )) + geom_point()
+
coord_flip()
p
newmaize <- subset(maizedata, maizedata$crop_pattern !=
"FALSE+FALSE+FALSE+FALSE+FALSE")
dim(newmaize) ; head(newmaize)
str(newmaize)
summaize_by_WS <- ddply(newmaize, .(WS), summarize, crop_pattern_ws =
length(crop_pattern))
p <- ggplot( summaize_by_WS , aes(summaize_by_WS )) + geom_point +
coord_flip()
p
summaize_by_WS_and_crop <- ddply(newmaize, .(WS, crop_pattern),
summarize, crop_pattern_ws = length(crop_pattern))
# crappy graph but just try to see what we might get. probablly need to
subset or use better grid layout.
p <- ggplot( summaize_by_WS_and_crop , aes(crop_pattern,
crop_pattern_ws )) + geom_point() +
coord_flip() + facet_grid(WS ~ . )
p
# save the last graph to look at it in a graphics package== still
terrible
ggsave( "/home/john/Rjunk/crop.png")
##
Thanks again for the help.
Cheers,
Dd
***********************************************************
Davide Rizzo
website :: http://sites.google.com/site/ridavide/
On Sun, Sep 16, 2012 at 8:24 PM, John Kane <jrkrid...@inbox.com> wrote:
Hi Davide,
I had some time this afternoon and I wonder if this approach is llkely
to get the results you want? As before it is not complete but I think
it holds promise.
On the other hand Rui is a much better programer than I am so he may
have a much cleaner solution. My way still looks labour-intensive at
the moment.
I am using the plyr package which you will probably have to install.
load.packages("plyr") should do it.
==========================================================
# load the plyr package -
library(plyr)
# sample data
T80<- read.csv("/home/john/rdata/sample.csv", header = TRUE, sep =
";")
# Davide's actual read statement
# T80<-read.table(file="C:/sample.txt", header=T, sep=";")
# Looking for Maize
pattern <- c("2Ma", "2Ma","2Ma", "2Ma","2Ma")
# one row examples to see that is happening
T80[1,3:7]
T80[1, 3:7] == pattern
T80[405, 3:7]
T80[405, 3:7] == pattern
T80[55, 3:7] == pattern
# now we apply the patterns to the entire data set.
pp1 <- T80[, 3:7] == pattern
# paste the TRUEs and FALSEs together to form a single variable
concatdat <- paste(pp1[, 1], pp1[, 2], pp1[, 3], pp1[, 4],pp1[,5] ,
sep = "+")
# Assmble new data frame.
maizedata <- data.frame(T80$WS, concatdat)
names(maizedata) <- c("WS", "crop_pattern")
mzcount <- ddply(maizedata, .(WS, crop_pattern), summarize, count =
length(crop_pattern))
mzcount # This is all the data not just the relevant maise patterns
# This seems to be getting us somewhere though we are not not there
yet
# Does this subset look like we are going in the right direction?
m51 <- subset(mzcount,
mzcount$crop_pattern == "FALSE+FALSE+FALSE+FALSE+TRUE"
| mzcount$crop_pattern == "TRUE+FALSE+FALSE+FALSE+FALSE")
m51 <- ddply(m51, .(WS), summarize, count = sum(count))
m51
=================================================================
John Kane
Kingston ON Canada
-----Original Message-----
From: ridav...@gmail.com
Sent: Sat, 15 Sep 2012 19:00:29 +0200
To: jrkrid...@inbox.com, ruipbarra...@sapo.pt
Subject: Re: [R] [newbie] aggregating table() results and simplifying
code with loop
Thanks Rui, thanks John for your very different solutions.
I'll try to break my questions into smaller steps following your
tips.
However, not everything is clear for me... so before giving you a
feed-back I need to study further your answers. For the moment I
could
specify that I'm looking for the following 19 patterns:
1. True, False, False, False, False # return period of 5 years (1/2)
2. False, False, False, False, True # return period of 5 years (2/2)
3. True, False, False, False, True # return period of 4 years (1/3)
4. False, True, False, False, False # return period of 4 years (2/3)
5. False, False, False, True, False # return period of 4 years (3/3)
6. True, False, False, True, False # return period of 3 years (1/3)
7. False, True, False, False, True # return period of 3 years (2/3)
8. False, False, True, False, False # return period of 3 years (3/3)
9. False, True, False, True, False # return period of 2 years (1/2)
10. True, False, True, False, True # return period of 2 years (1/2)
11. True, True, True, True, True # mono-succession of 5 years
12. False, True, True, True, True # mono-succession of 4 years (1/2)
13. True, True, True, True, False # mono-succession of 4 years (2/2)
14. True, False, True, True, True # mono-succession of 3 years (1/5)
15. True. True. True. False, True # mono-succession of 3 years (2/5)
16. False, False, True, True, True # mono-succession of 3 years (3/5)
17. True, True, True, False, False # mono-succession of 3 years (4/5)
18. False, True, True, True, False # mono-succession of 3 years (5/5)
19. True, True, False, True, True # crops repeated two years
In particular, I want to apply all these 19 patterns to 7 (out of 11)
land covers: 2BC, 2Co, 2Ma, 2We, 2MG, 2ML, 2PG. The pattern are so
structured: True means presence of a given land cover (iteratively,
one of the seven listed above), False means any other land-cover
(amidst the remainder 10).
Thanks again for any further help.
Greetings,
Dd
***********************************************************
Davide Rizzo
website :: http://sites.google.com/site/ridavide/
On Sat, Sep 15, 2012 at 5:51 PM, John Kane <jrkrid...@inbox.com>
wrote:
I have not seen any replies to your questions so I will suggest an
approach that may work if I can get a function to work.
If I understand what you want, you have a pattern something like
this:
pattern1 <- c("2Ma", "no2Ma","no2Ma", "no2Ma","no2Ma")
pattern2 <- c("no2Ma", 'no2Ma', "no2Ma", "no2Ma", "2Ma")
for each five year period where 2Ma stands to Maize, one of 11
different
grains
1AU 2BC 2Co 2Ma 2MG 2ML 2oc 2PG 2SA 2We 3sN
and what you want to know is if each year gives a pattern like
check1 <- c(TRUE, FALSE, FALSE, FALSE, FALSE)
check2 <- c(FALSE, FALSE, FALSE, FALSE, TRUE)
If I understand the patterns you only care for the two above, is
that
correct?
I am running out of time today but I think that this approach will
get
you started
===========================================================
T80<-read.table(file="C:/sample.txt", header=T, sep=";")
# Reminder of just what we want to get as a final result.
check1 <- c(TRUE, FALSE, FALSE, FALSE, FALSE)
check2 <- c(FALSE, FALSE, FALSE, FALSE, TRUE)
pattern1 <- c("2Ma", "2Ma","2Ma", "2Ma","2Ma")
# one row examples to see that is happening
T80[1,3:7]
T80[1, 3:7] == pattern1
T80[405, 3:7]
T80[405, 3:7] == pattern1
# now we apply the patterns to the entire data set.
pp1 <- T80[, 3:7] == pattern1
pp2 <- T80[, 3:7] == pattern2
# reassign the WS values so we know where the data is from
WSnames <- rep(T80$WS, 2)
# Assmble new data frame.
maizedata <- data.frame(WSnames, rbind(pp1,pp2))
========================================================
Now, assuming this runs for you and I have not made a serious
mistake
in
logic, kyou should be able to do some subsetting (?subset) to
extract
only the
check1 and check2 patterns above.
This is where I ran into trouble as I don't have the time this
morning
to work out the subsetting conditions. It looks tricking and you
probably need a couple of subsetting moves.
It's not a pretty solutlion and, particularly, I expect someone
could
clean it up to make the subsetting easier or even unnecessary but I
hope
it helps.
Once you have extracted what you want use apply() or perhaps the
plyr
package to aggregate the results.
Repeat for all grains. Actually look into setting the whole thing
up
as
a function. You should be able to write the program once as a
function
and do a loop or an apply() to do all 11 grains in one go.
Best of luck.
John Kane
Kingston ON Canada
-----Original Message-----
From: ridav...@gmail.com
Sent: Thu, 13 Sep 2012 15:36:28 +0200
To: r-help@r-project.org
Subject: [R] [newbie] aggregating table() results and simplifying
code
with loop
Dear all,
I'm looking for primary help at aggregating table() results and at
writing a loop (if useful)
My dataset ( http://goo.gl/gEPKW ) is composed of 23k rows, each
one
representing a point in the space of which we know the land cover
over
10 years (column y01 to y10).
I need to analyse it with a temporal sliding window of 5 years (y01
to
y05, y02 to y06 and so forth)
For each period I'm looking for specific sequences (e.g., Maize,
-noMaize, -noMaize, -noMaize, -noMaize) to calculate the "return
time"
of principal land covers: barley (2BC), colza (2Co), maize (2Ma),
etc.
I define the "return time" as the presence of a given land cover
according to a given sequence. Hence, each return time could
require
the sum of different sequences (e.g., a return time of 5 years
derives
from the sum of [2Ma,no2Ma,no2Ma,no2Ma,no2Ma] +
[no2Ma,no2Ma,no2Ma,no2Ma,2Ma]).
I need to repeat the calculation for each land cover for each time
window. In addition, I need to repeat the process over three
datasets
(the one I give is the first one, the second one is from year 12 to
year 24, the third one from year 27 to year 31. So I have breaks in
the monitoring of land cover that avoid me to create a continuous
dataset). At the end I expect to aggregate the sum for each spatial
entity (column WS)
I've started writing the code for the first crop in the first 5yrs
period (http://goo.gl/FhZNx) then copying and pasting it for each
crop
then for each time window...
Moreover I do not know how to aggregate the results of table(). (NB
sometimes I have a different number of WS per table because a given
sequence could be absent in a given spatial entity... so I have the
following warning msg: number of columns of result is not a
multiple
of vector length (arg 1)). Therefore, I'm "obliged" to copy&paste
the
table corresponding to each sequence....
FIRST QUEST. How to aggregate the results of table() when the
number
of columns is different?
Or the other way around: Is there a way to have a table where each
row
reports the number of points per time return per WS? something like
WS1 WS2 WS3 WS4 ... WS16 crop period
23 15 18 43 ... 52 Ma5 01
18 11 25 84 ... 105 Ma2 01
... ... ... ... ... ... ... ...
... ... ... ... ... ... Co5 01
... ... ... ... ... ... ... ...
... ... ... ... ... ... Ma5 02
... ... ... ... ... ... ... ...
In this table each row should represent a return time for a given
land
cover a given period (one of the 6 time window of 5 years)?
SECOND QUEST. Could a loop (instead of a modular copy/paste code)
improve the time/reliability of the calculation? If yes, could you
please indicate me some entry-level references to write it?
I am aware this are newbie's questions, but I have not be able to
solve them using manuals and available sources.
Thank you in advance for your help.
Greetings,
Dd
PS
R: version 2.14.2 (2012-02-29)
OS: MS Windows XP Home 32-bit SP3
*****************************
Davide Rizzo
post-doc researcher
INRA UR055 SAD-ASTER
website :: http://sites.google.com/site/ridavide/
____________________________________________________________
FREE 3D EARTH SCREENSAVER - Watch the Earth right on your desktop!
Check it out at http://www.inbox.com/earth