Hi Bert,
Sorry, I was in a hurry to get home yesterday afternoon, so I just posted my
question hoping to get some advice.
Here is what I had before going home yesterday.
---------------------------------------------------------------
setwd("C:/Awork/VNTR/GETXdata/GTEx_genotypes")
file_list <- list.files(pattern="\\.out$")
#read all 652 files into RStudio; I found that NOT all files have the same
#number of rows
for (i in seq_along(file_list)){
assign( substr(file_list[i], 1, nchar(file_list[i]) -4) ,
read.delim(file_list[i], header=FALSE))
}
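A possible alternative to assign(), sketched under assumptions: keeping the files in one named list makes it easy to compare row counts across samples. The demo folder, file names and contents below are made up for illustration; only read.delim() and the .out suffix come from the code above.

```r
# Made-up demo data: two small .out files in a temporary folder
demo_dir <- file.path(tempdir(), "out_demo_list")
dir.create(demo_dir, showWarnings = FALSE)
writeLines(c("4", "25/48", "201", "2/2"), file.path(demo_dir, "GTEX-AAAA.out"))
writeLines(c("4", "1/2"), file.path(demo_dir, "GTEX-BBBB.out"))

files <- list.files(demo_dir, pattern = "\\.out$", full.names = TRUE)

# One list element per file, named after the sample (file name minus ".out")
raw <- lapply(files, read.delim, header = FALSE, stringsAsFactors = FALSE)
names(raw) <- sub("\\.out$", "", basename(files))

# Number of rows in each file; table(n_rows) would summarise them for 652 files
n_rows <- vapply(raw, nrow, integer(1))
print(n_rows)
```

With the real data, files would come from list.files() in the genotype folder instead of the temporary demo folder.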
#the first file, GTEX_1117F, in the following format, one column and 19482 rows
#4 is marker id, 25/48 is its marker value;
# V1
# 4
# 25/48
# 201
# 2/2
# ...
# 648589
# None
#to make this one-column file into a two-column file as below,
#so the first column is the marker id and the second is the corresponding
#marker value for the sample GTEX_1117F
# VNTRid GTEX_1117F
# 4 25/48
# 201 2/2
# ... ...
# 648589 None
for (i in seq_along(file_list)){
sample_id <- gsub("-","_", substr(file_list[i], 1, nchar(file_list[i])-4))
temp <- read.delim(file_list[i], header=FALSE, stringsAsFactors=FALSE)
v <- as.character(temp$V1)
odd <- seq(1, length(v)-1, 2)   #rows holding marker ids
even <- seq(2, length(v), 2)    #rows holding marker values
#indexing by odd/even fills both columns at once, so no inner loop is needed
output <- data.frame(VNTRid=v[odd], value=v[even], stringsAsFactors=FALSE)
colnames(output)[2] <- sample_id
assign(sample_id, output)
}
Yesterday, I intended to reshape the output above from long to wide using
VNTRid as the key.
Since not all files have the same number of rows, the reshaped files would
not bind correctly with the rbind function.
On my way to work this morning, I changed my intention: I will not reshape to
wide format, and I actually like the long format I generated. I will read in a
VNTR marker annotation file with VNTRid in the first column and marker
locations on human chromosomes in the second column; this annotation file
should include all the VNTR markers. I know the VNTRid values in the
annotation file are the same as those in the 652 files I read in.
Do you know a good way to merge all those 652 files (with two columns each)?
Thank you,
Ding
#merge all 652 files into one file with VNTRid as the first column; the 2nd to
#653rd columns are genotypes, with the sample ID as header
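One way such a merge might look, as a minimal sketch rather than a definitive answer: read every file into a two-column data frame, then left-join each one onto the annotation table by VNTRid. The helper read_sample(), the demo files and the annotation contents are all made up for illustration; markers absent from a sample simply come out as NA.

```r
# Hypothetical helper: one single-column .out file -> two-column data frame
read_sample <- function(path) {
  v <- readLines(path)
  v <- v[nzchar(v)]                       # drop empty lines, if any
  stopifnot(length(v) %% 2 == 0)          # ids and values must pair up
  out <- data.frame(VNTRid = v[c(TRUE, FALSE)],   # odd lines: marker ids
                    value  = v[c(FALSE, TRUE)],   # even lines: marker values
                    stringsAsFactors = FALSE)
  names(out)[2] <- gsub("-", "_", sub("\\.out$", "", basename(path)))
  out
}

# Made-up demo files with different numbers of rows
demo_dir <- file.path(tempdir(), "out_demo_merge")
dir.create(demo_dir, showWarnings = FALSE)
writeLines(c("4", "25/48", "201", "2/2", "648589", "None"),
           file.path(demo_dir, "GTEX-AAAA.out"))
writeLines(c("4", "1/2", "648589", "3/3"),
           file.path(demo_dir, "GTEX-BBBB.out"))

# Made-up annotation: VNTRid plus a placeholder location column
annot <- data.frame(VNTRid = c("4", "201", "648589"),
                    location = c("chr1", "chr2", "chr3"),
                    stringsAsFactors = FALSE)

files <- list.files(demo_dir, pattern = "\\.out$", full.names = TRUE)
samples <- lapply(files, read_sample)

# Start from the annotation and left-join one sample at a time
merged <- Reduce(function(acc, s) merge(acc, s, by = "VNTRid", all.x = TRUE),
                 samples, annot)
print(merged)
```

Repeated merge() calls can be slow for 652 files; looking up each sample with match() against the annotation VNTRid column would be another option.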
From: Bert Gunter [mailto:[email protected]]
Sent: Thursday, December 19, 2019 6:52 PM
To: Yuan Chun Ding
Cc: [email protected]
Subject: Re: [R] data reshape
Did you even make an attempt to do this? -- or would you like us to do all your
work for you?
If you made an attempt, show us your code and errors.
If not, we usually expect you to try on your own first.
If you have no idea where to start, perhaps you need to spend some more time
with tutorials to learn basic R functionality before proceeding.
Bert
"The trouble with having an open mind is that people keep coming along and
sticking things into it."
-- Opus (aka Berkeley Breathed in his "Bloom County" comic strip )
On Thu, Dec 19, 2019 at 6:01 PM Yuan Chun Ding <[email protected]> wrote:
Hi R users,
I have a folder (called genotype) with 652 files; the file names are
GTEX-1A3MV.out, GTEX-1A3MX.out, GTEX-1B8SF.out, etc. Each file contains a
single column of data without a header, as below:
201
2/2
238
3/4
245
1/2
.....
983255
3/3
983766
None
A total of 20528 rows;
I need to read all those 652 files in the genotype folder and then reshape the
one column in each file as:
SampleID    201  238  245  ....  983255  983766
GTEX-1A3MV  2/2  3/4  1/2  ....  3/3     None
There are 10264 data columns plus the sample ID column, so 10265 columns in
total after data reshaping.
After reading those 652 files and reshaping the single column in each file, I
will stack them with the rbind function, so that I have a file with 653 rows
and 10265 columns.
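The wide layout described above could be sketched as follows, assuming some files may lack some markers: each file becomes a named vector (names are marker ids), and every sample is then looked up against the union of all ids, so missing markers become NA and the rows always line up. The helper read_genotypes() and the demo files are made up for illustration.

```r
# Hypothetical helper: named character vector, names = marker ids
read_genotypes <- function(path) {
  v <- readLines(path)
  v <- v[nzchar(v)]                          # drop empty lines, if any
  stopifnot(length(v) %% 2 == 0)             # ids and values must pair up
  setNames(v[c(FALSE, TRUE)], v[c(TRUE, FALSE)])
}

# Made-up demo files; the second one lacks marker 238
demo_dir <- file.path(tempdir(), "out_demo_wide")
dir.create(demo_dir, showWarnings = FALSE)
writeLines(c("201", "2/2", "238", "3/4", "245", "1/2"),
           file.path(demo_dir, "GTEX-AAAA.out"))
writeLines(c("201", "1/1", "245", "None"),
           file.path(demo_dir, "GTEX-BBBB.out"))

files <- list.files(demo_dir, pattern = "\\.out$", full.names = TRUE)
geno <- lapply(files, read_genotypes)
names(geno) <- sub("\\.out$", "", basename(files))

# Union of marker ids over all samples, in order of first appearance
all_ids <- unique(unlist(lapply(geno, names), use.names = FALSE))

# Indexing by name returns NA for markers a sample does not have
wide <- t(vapply(geno, function(g) unname(g[all_ids]),
                 character(length(all_ids))))
colnames(wide) <- all_ids
print(wide)
```

The result is a character matrix with one row per sample and one column per marker; as.data.frame(wide) would convert it if a data frame is preferred.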
Thank you,
Ding
______________________________________________
[email protected] mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.