Hello,

Are you looking for something like the following? It builds on Andrew's code (quoted below) to download and untar the files.



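# Read one gzipped file; on a warning or error, return the condition
# object instead of stopping, so the remaining files are still read.
read_one_gz_file <- function(x, path){
  fl <- file.path(path, x)
  tryCatch({
    read.table(gzfile(fl))
  },
  warning = function(w) w,
  error = function(e) e
  )
}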

URL <- "https://ftp.ncbi.nlm.nih.gov/geo/series/GSE162nnn/GSE162562/suppl/GSE162562_RAW.tar"
FILE <- file.path(tempdir(), basename(URL))
utils::download.file(URL, FILE, mode = "wb")
utils::untar(FILE, exdir = dirname(FILE))

fls <- list.files(path = dirname(FILE), pattern = "\\.gz$")
length(fls)
#[1] 108

data_list <- lapply(fls, read_one_gz_file, path = dirname(FILE))
length(data_list)
#[1] 108

head(data_list[[1]])
#        V1  V2
#1     A1BG   4
#2 A1BG-AS1  52
#3     A1CF  12
#4      A2M 645
#5  A2M-AS1 113
#6    A2ML1  21
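
Because read_one_gz_file() returns the condition object rather than stopping when a file cannot be read, it may be worth checking that every element of data_list really is a data.frame before going on. A minimal sketch, using a toy list in place of the real data_list (the names toy_list and ok are only for illustration):

```r
# Toy stand-in for data_list: two successful reads and one failure,
# mimicking what tryCatch(..., error = function(e) e) would return.
toy_list <- list(
  data.frame(V1 = c("A1BG", "A2M"), V2 = c(4L, 645L)),
  simpleError("cannot open the connection"),
  data.frame(V1 = c("A1BG", "A2M"), V2 = c(7L, 12L))
)

# TRUE where the element is a data.frame, FALSE where it is a condition
ok <- vapply(toy_list, is.data.frame, logical(1))

which(!ok)  # positions of the elements that are not data.frames
sum(ok)     # number of elements read successfully
```

With the real data, the same vapply() call on data_list should give all TRUE; any FALSE position would point to the corresponding entry of fls.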



I don't understand what you mean by aggregating the files, but if you want them all in one data.frame, maybe this will do it.



sapply(data_list, ncol) # All files have 2 columns

# create a column with the original dataset name
data_list <- lapply(seq_along(data_list), function(i){
  dftmp <- data_list[[i]]
  dftmp$dataset <- sub("\\.txt\\.gz$", "", fls[i])
  dftmp
})

# put all data sets in one data.frame
df1 <- do.call(rbind, data_list)

dim(df1)  # Over 2.8 million rows, 3 columns
head(df1) # see the first 6 rows
#        V1  V2                 dataset
#1     A1BG   4 GSM4954457_A_1_Asymptom
#2 A1BG-AS1  52 GSM4954457_A_1_Asymptom
#3     A1CF  12 GSM4954457_A_1_Asymptom
#4      A2M 645 GSM4954457_A_1_Asymptom
#5  A2M-AS1 113 GSM4954457_A_1_Asymptom
#6    A2ML1  21 GSM4954457_A_1_Asymptom
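
If "aggregate" instead means one table with a row per gene and a column per sample, merging on the gene column is one possible approach. The following is only a sketch with toy data: toy_list, sample_names, renamed and wide are hypothetical names, and it assumes each file has the gene symbol in V1 and the count in V2, as in the output above.

```r
# Toy stand-ins for two of the per-sample data.frames
toy_list <- list(
  data.frame(V1 = c("A1BG", "A1CF", "A2M"), V2 = c(4L, 12L, 645L)),
  data.frame(V1 = c("A1BG", "A1CF", "A2M"), V2 = c(7L, 0L, 310L))
)
sample_names <- c("sample_1", "sample_2")

# Name the count column of each data.frame after its sample
renamed <- Map(function(d, nm) {
  names(d) <- c("gene", nm)
  d
}, toy_list, sample_names)

# Merge all data.frames on the gene column, keeping genes
# that are missing from some samples (filled with NA)
wide <- Reduce(function(x, y) merge(x, y, by = "gene", all = TRUE), renamed)
wide
```

With the real data this would have to be applied to the two-column data.frames, that is, before the dataset column is added, with the sample names derived from fls. Repeated merge() calls over 108 files may be slow, so this is meant as an illustration of the idea rather than the fastest way to do it.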




Hope this helps,

Rui Barradas


At 01:16 on 24/08/21, Anas Jamshed wrote:
Sir, after that I want to run:
#get the list of sample names
GSMnames <- t(list.files("~/Desktop/GSE162562_RAW", full.names = FALSE))

#remove .txt from file/sample names
GSMnames <- gsub(pattern = ".txt", replacement = "", GSMnames, fixed = TRUE)

#make a vector of the list of files to aggregate
files <- list.files("~/Desktop/GSE162562_RAW", full.names = TRUE)


but it does not work, because running utils::untar(FILE, exdir =
dirname(FILE)) creates another 108 archives.
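
The 108 archives mentioned here are the individual .gz files inside the tar file. If uncompressed copies on disk are wanted on every platform, base R connections can stand in for the external gunzip command. A minimal sketch, assuming nothing beyond base R; gunzip_base is a hypothetical helper name, not a function from any package, and the toy file only exists to make the example self-contained:

```r
# Decompress one .gz file using base R only (no external gunzip needed).
gunzip_base <- function(gz_path, chunk = 1e6) {
  out_path <- sub("\\.gz$", "", gz_path)
  con_in <- gzfile(gz_path, open = "rb")
  on.exit(close(con_in), add = TRUE)
  con_out <- file(out_path, open = "wb")
  on.exit(close(con_out), add = TRUE)
  # Copy the decompressed bytes in chunks until the input is exhausted
  repeat {
    buf <- readBin(con_in, what = "raw", n = chunk)
    if (length(buf) == 0L) break
    writeBin(buf, con_out)
  }
  invisible(out_path)
}

# Toy demonstration: create a small gzipped file, then decompress it
demo_dir <- file.path(tempdir(), "gunzip_demo")
dir.create(demo_dir, showWarnings = FALSE)
gz_file <- file.path(demo_dir, "toy_sample.txt.gz")
con <- gzfile(gz_file, open = "wt")
writeLines(c("A1BG\t4", "A2M\t645"), con)
close(con)

gz_files <- list.files(demo_dir, pattern = "\\.gz$", full.names = TRUE)
txt_files <- vapply(gz_files, gunzip_base, character(1))
readLines(txt_files[[1]])
```

Note also that the code in this thread extracts into dirname(FILE), which is a temporary directory, not ~/Desktop/GSE162562_RAW, so list.files() would need to point at the directory that was actually used as exdir.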

On Tue, Aug 24, 2021 at 2:03 AM Andrew Simmons <akwsi...@gmail.com> wrote:

Hello,


I tried downloading that file using 'utils::download.file' (which worked),
but 'utils::untar' then kept complaining about a "damaged archive". However,
it seemed to work when I downloaded the archive manually. In the end, the
solution I found is that you have to specify the mode in which you're
downloading the file. Something like:


URL <- "https://ftp.ncbi.nlm.nih.gov/geo/series/GSE162nnn/GSE162562/suppl/GSE162562_RAW.tar"
FILE <- file.path(tempdir(), basename(URL))


utils::download.file(URL, FILE, mode = "wb")
utils::untar(FILE, exdir = dirname(FILE))


worked perfectly for me. It also still seems to work on Ubuntu, but you
can let us know if you find it doesn't. I hope this helps!



On Mon, Aug 23, 2021 at 3:20 PM Anas Jamshed <anasjamshed1...@gmail.com>
wrote:

I am trying this URL:
"https://ftp.ncbi.nlm.nih.gov/geo/series/GSE162nnn/GSE162562/suppl/GSE162562_RAW.tar"

but it is not giving me any file.

On Mon, Aug 23, 2021 at 11:42 PM Andrew Simmons <akwsi...@gmail.com>
wrote:

Hello,


I don't think you need to use a system command directly; I think
'utils::untar' is all you need. I tried the same thing myself, something
like:


URL <- "https://exiftool.org/Image-ExifTool-12.30.tar.gz"
FILE <- file.path(tempdir(), basename(URL))


utils::download.file(URL, FILE)
utils::untar(FILE, exdir = dirname(FILE))


and it makes a folder "Image-ExifTool-12.30". It seems to work perfectly
fine in Windows 10 x64 build 19042. Can you send the specific file (or
provide a URL to the specific file) that isn't working for you?

On Mon, Aug 23, 2021 at 12:53 PM Anas Jamshed <anasjamshed1...@gmail.com>
wrote:

I have the file GSE162562_RAW. First I untar it
with untar("GSE162562_RAW.tar"),
then I run:
  system("gunzip ~/Desktop/GSE162562_RAW/*.gz")


This runs fine on Linux but not on Windows. What changes should I
make so that this command runs on Windows as well?


______________________________________________
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide
http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


