?merge  ## note the all.x option

Example:

> a <- data.frame(x = 1:3, y1 = 11:13)
> b <- data.frame(x = c(1,3), y2 = 21:22)
> merge(a,b, all.x = TRUE)
  x y1 y2
1 1 11 21
2 2 12 NA
3 3 13 22

Bert Gunter

"The trouble with having an open mind is that people keep coming along
and sticking things into it."
-- Opus (aka Berkeley Breathed in his "Bloom County" comic strip )

On Fri, Dec 20, 2019 at 9:00 AM Yuan Chun Ding <ycd...@coh.org> wrote:

> Hi Bert,
>
> Sorry, I was in a hurry to go home yesterday afternoon, so I just
> posted my question hoping to get some advice.
>
> Here is what I got yesterday before going home.
>
> ---------------------------------------------------------------
>
> setwd("C:/Awork/VNTR/GETXdata/GTEx_genotypes")
>
> # pattern is a regular expression, so match a literal ".out" suffix
> file_list <- list.files(pattern = "\\.out$")
>
> # read all 652 files into RStudio; I found that NOT all files have
> # the same number of rows
> for (i in seq_along(file_list)){
>   assign(substr(file_list[i], 1, nchar(file_list[i]) - 4),
>          read.delim(file_list[i], header = FALSE))
> }
>
> # the first file, GTEX_1117F, has the following format: one column
> # and 19482 rows
> # 4 is a marker id, 25/48 is its marker value;
> # V1
> # 4
> # 25/48
> # 201
> # 2/2
> # ...
> # 648589
> # None
>
> # turn this one-column file into a two-column file as below,
> # so the first column is the marker id and the second is the
> # corresponding marker value for the sample GTEX_1117F
> # VNTRid   GTEX_1117F
> # 4        25/48
> # 201      2/2
> # ...      ...
> # 648589   None
>
> for (i in seq_along(file_list)){
>   temp <- read.delim(file_list[i], header = FALSE)
>   even <- seq(2, length(temp$V1), 2)
>   odd <- seq(1, length(temp$V1) - 1, 2)
>   output <- matrix(0, ncol = 2, nrow = length(temp$V1)/2)
>   colnames(output) <- c("VNTRid",
>                         substr(file_list[i], 1, nchar(file_list[i]) - 4))
>   # parentheses matter: 1:length(x)/2 is evaluated as (1:length(x))/2
>   for (j in 1:(length(temp$V1)/2)){
>     output[j,1] <- as.character(temp$V1)[odd[j]]
>     output[j,2] <- as.character(temp$V1)[even[j]]}
>   assign(gsub("-", "_", substr(file_list[i], 1, nchar(file_list[i]) - 4)),
>          as.data.frame(output))
> }
>
> Yesterday, I intended to reshape the output above from long to wide
> using VNTRid as the key. Since not all files have the same number of
> rows, after reshaping those files would not bind correctly with the
> rbind function.
>
> On my way to work this morning, I changed my intention; I will not
> reshape to wide format and actually like the long format I generated.
> I will read in a VNTR marker annotation file with VNTRid in the first
> column and marker locations in human chromosomes in the second column;
> this annotation file should include all the VNTR markers. I know the
> VNTRids in the annotation file are the same as the VNTRids in the 652
> files I read in.
>
> Do you know a good way to merge all those 652 files (with two columns)?
>
> Thank you,
>
> Ding
>
> # merge all 652 files into one file with VNTRid as the first column;
> # the 2nd to 653rd columns are genotypes with the sample ID as header,
> # so
>
> *From:* Bert Gunter [mailto:bgunter.4...@gmail.com]
> *Sent:* Thursday, December 19, 2019 6:52 PM
> *To:* Yuan Chun Ding
> *Cc:* r-help@r-project.org
> *Subject:* Re: [R] data reshape
>
> Did you even make an attempt to do this?
> -- or would you like us to do all your work for you?
>
> If you made an attempt, show us your code and errors.
>
> If not, we usually expect you to try on your own first.
>
> If you have no idea where to start, perhaps you need to spend some
> more time with tutorials to learn basic R functionality before
> proceeding.
>
> Bert
>
> On Thu, Dec 19, 2019 at 6:01 PM Yuan Chun Ding <ycd...@coh.org> wrote:
>
> Hi R users,
>
> I have a folder (called genotype) with 652 files; the file names are
> GTEX-1A3MV.out, GTEX-1A3MX.out, GTEX-1B8SF.out, etc. Each file holds
> only one column of data without a header, as below:
>
> 201
> 2/2
> 238
> 3/4
> 245
> 1/2
> .....
> 983255
> 3/3
> 983766
> None
>
> A total of 20528 rows.
>
> I need to read all those 652 files in the genotype folder and then
> reshape the one column in each file as:
>
> SampleID    201  238  245  ....  983255  983766
> GTEX-1A3MV  2/2  3/4  1/2        3/3     None
>
> There are 10264 data columns plus the sample ID column, so 10265
> columns in total after data reshaping.
>
> After reading those 652 files and reshaping the one column in each
> file, I will stack them with the rbind function; then I have a file
> with a dimension of 653 rows, 10265 columns.
>
> Thank you,
>
> Ding
>
> [[alternative HTML version deleted]]
>
> ______________________________________________
> R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
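
For the follow-up question about merging all 652 two-column data frames, the two-frame merge(a, b, all.x = TRUE) hint at the top of this thread can be extended to a whole list with Reduce(). What follows is only a sketch under assumptions, not anything posted in the thread: the sample names S1..S3, the "location" column and the toy values are hypothetical stand-ins for the real per-sample data frames and the annotation file.

```r
# Toy stand-ins for the per-sample two-column data frames
# (hypothetical sample names S1..S3; the real ones come from the .out files)
s1 <- data.frame(VNTRid = c("4", "201", "648589"),
                 S1 = c("25/48", "2/2", "None"), stringsAsFactors = FALSE)
s2 <- data.frame(VNTRid = c("4", "648589"),
                 S2 = c("24/48", "3/3"), stringsAsFactors = FALSE)
s3 <- data.frame(VNTRid = c("201", "648589"),
                 S3 = c("2/3", "None"), stringsAsFactors = FALSE)

# Annotation table assumed to list every VNTRid
# (the column name "location" and its values are made up)
annot <- data.frame(VNTRid = c("4", "201", "648589"),
                    location = c("chr1", "chr2", "chr3"),
                    stringsAsFactors = FALSE)

samples <- list(s1, s2, s3)

# Start from the annotation table and left-join each sample in turn;
# all.x = TRUE keeps every annotated marker and fills gaps with NA
merged <- Reduce(function(acc, d) merge(acc, d, by = "VNTRid", all.x = TRUE),
                 samples, annot)
print(merged)
```

Starting from the annotation table means markers missing from a given sample show up as NA rather than dropping the row. With 652 data frames the repeated merges may be slow, so this is worth trying on a handful of files first.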