On Jul 22, 2013, at 4:18 AM, Dark wrote:

> Hi all,
> 
> For a project we have to process some very large CSV files (up to 40 GB).
> To reduce their size and improve performance, I wanted to store
> them as RData files.
> Since the data was too big, I decided to split the CSV and save the parts as
> separate .rda files.
> So far so good. Now I want to bind them all together to save as one .rda file
> again, and this is surprisingly difficult.
> 
> First I load my rda files into my environment:
> load(paste(rdaoutputdir, "file1.rda", sep=""))
> load(paste(rdaoutputdir, "file2.rda", sep=""))
> load(paste(rdaoutputdir, "file3.rda", sep=""))
> etc
> 
> Then I try to combine them into one object.
> 
> Using rbind like this gives memory allocation problems ('Error: cannot
> allocate vector of size')
> objectToSave <- rbind(object1, object2, object3)
> 
> Using pre-allocation gives me a factor level error. I used this code:
>       nextrow <- nrow(object1) + 1
>       object1[nextrow:(nextrow + nrow(object2) - 1), ] <- object2
>       # we need to ensure unique row names
>       row.names(object1) <- 1:nrow(object1)
>       rm(object2)
>       gc()
> 
> 15! warning messages:
> 1: In `[<-.factor`(`*tmp*`, iseq, value = structure(c(1L,  ... :
>  invalid factor level, NA generated
> 2: In `[<-.factor`(`*tmp*`, iseq, value = structure(c(1L,  ... :
>  invalid factor level, NA generated
> 

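The warning messages suggest that corresponding factor columns in object1, 
object2, and object3 do not share the same levels. Assigning a value that is 
not among a factor's existing levels produces NA, which is what you are seeing.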
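
One way to check this is to compare the levels column by column. The following 
is only a sketch on made-up data: the toy data frames and the column name 'grp' 
are assumptions standing in for your real objects.

```r
# Toy stand-ins for object1 and object2 (hypothetical data; 'grp' is a made-up column)
object1 <- data.frame(id = 1:3, grp = factor(c("a", "b", "a")))
object2 <- data.frame(id = 4:5, grp = factor(c("b", "c")))

# Names of the columns in object1 that are factors
factor_cols <- names(object1)[sapply(object1, is.factor)]

# For each factor column, find levels present in object2 but absent from object1
missing_levels <- lapply(factor_cols, function(col) {
  setdiff(levels(object2[[col]]), levels(object1[[col]]))
})
names(missing_levels) <- factor_cols
print(missing_levels)
```

Any column with a non-empty entry in missing_levels would be expected to 
trigger the "invalid factor level" warning when object2 is assigned into object1.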

> What can I do?

You can identify which columns are factors and give each of those columns, in 
every object, the same set of levels: the union of the levels found across 
all the objects.
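
A minimal sketch of that approach, again on made-up data ('grp' and the toy 
data frames are assumptions, not your real objects), followed by the same 
row-append you used:

```r
# Toy stand-ins for object1 and object2 (hypothetical data; 'grp' is a made-up column)
object1 <- data.frame(id = 1:3, grp = factor(c("a", "b", "a")))
object2 <- data.frame(id = 4:5, grp = factor(c("b", "c")))

# Give every factor column the union of the levels from both objects
factor_cols <- names(object1)[sapply(object1, is.factor)]
for (col in factor_cols) {
  all_levels <- union(levels(object1[[col]]), levels(object2[[col]]))
  object1[[col]] <- factor(object1[[col]], levels = all_levels)
  object2[[col]] <- factor(object2[[col]], levels = all_levels)
}

# Append the rows of object2 below object1, as in the original post
nextrow <- nrow(object1) + 1
object1[nextrow:(nextrow + nrow(object2) - 1), ] <- object2
row.names(object1) <- 1:nrow(object1)
```

With more than two objects, the union would need to be taken over all of them 
before any rows are appended.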

OR:

Depending on the contents of that factor you could convert to character before 
the rbind operation. If the levels are not particularly long (in character 
length), that procedure might not expand the memory footprint very much.
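
The character route could look something like this. It is a sketch on toy 
data, and to_character() is a helper name made up for this example:

```r
# Toy stand-ins for object1 and object2 (hypothetical data; 'grp' is a made-up column)
object1 <- data.frame(id = 1:3, grp = factor(c("a", "b", "a")))
object2 <- data.frame(id = 4:5, grp = factor(c("b", "c")))

# Hypothetical helper: convert every factor column of a data frame to character
to_character <- function(df) {
  is_fac <- sapply(df, is.factor)
  df[is_fac] <- lapply(df[is_fac], as.character)
  df
}

object1 <- to_character(object1)
object2 <- to_character(object2)

# Character columns carry no level set, so there is nothing to mismatch
combined <- rbind(object1, object2)
```

You can compare object.size() before and after the conversion on one of your 
real pieces to see how much the footprint actually changes.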

-- 
David
> 
> Regards Derk
> 
> 


David Winsemius
Alameda, CA, USA

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
