Please read the posting guide linked below for how to post questions that are likely to receive useful responses. "Help me do a gene enrichment analysis" does not conform to the pg. We usually expect posters to offer their best attempts, preferably in a reproducible example ("a reprex" -- see here for discussion: https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example ), to provide help.
Also note that per the pg: "For questions about functions in standard packages distributed with R (see the FAQ Add-on packages in R), ask questions on R-help. If the question relates to a contributed package , e.g., one downloaded from CRAN, try contacting the package maintainer first. You can also use find("functionname") and packageDescription("packagename") to find this information. Only send such questions to R-help or R-devel if you get no reply or need further assistance. This applies to both requests for help and to bug reports." Finally, note that "topgo" is a Bioconductor **package** (not "library"), and you should almost certainly post questions about its use on their support site not here: https://www.bioconductor.org/help/ (Do read their pg before posting, of course) Bert Gunter "The trouble with having an open mind is that people keep coming along and sticking things into it." -- Opus (aka Berkeley Breathed in his "Bloom County" comic strip ) On Mon, Aug 30, 2021 at 9:22 AM Anas Jamshed <anasjamshed1...@gmail.com> wrote: > > I have done this analysis: > > #You can select Working Directory according to your choice > setwd("D:") > > #Check Working Directory > getwd() > > #Creation of object(folder) exdir > exdir <- path.expand("~/GSE162562_RAW") > dir.create(exdir, showWarnings = FALSE) > > URL <- " > https://ftp.ncbi.nlm.nih.gov/geo/series/GSE162nnn/GSE162562/suppl/GSE162562_RAW.tar > " > FILE <- file.path(tempdir(), basename(URL)) > > #Downlaod the Raw Data from GEO > utils::download.file(URL, FILE, mode = "wb") > > #unzip the files and storing them in GSE162562_RAW folder which we created > already > utils::untar(FILE, exdir = exdir) > > #Check your GSE162562_RAW folder after this , it must contains 108 samples > in compressed format > > unlink(FILE, recursive = TRUE, force = TRUE) > > #listing of samples > listed_files <- list.files(exdir, pattern=".gz", full.names=TRUE) > > > #/ load files into R: > loaded <- lapply(listed_files, function(x) read.delim(x, header=FALSE, > row.names = "V1")) > > #/ bind everything into a single count matrix and assign colnames: > raw_counts <- do.call(cbind, loaded) > colnames(raw_counts) <- gsub(".txt.*", "", basename(listed_files)) > > > > #/ there is some nonsense in these files, probably due to the tool that was > used (HTSeq?), > #/ such as rows with names "__no_feature", lets remove that: > raw_counts <- raw_counts[!grepl("^__", rownames(raw_counts)),] > > # / View raw_counts after filtering > raw_counts > #There are total 26364 genes after filtering > > #Store Raw counts in CSV File > #I am sacing them in my Desktop.You can save according to your choice > write.csv(raw_counts,"C:\\Users\\USER\\Desktop\\countsdata.csv") > > #load library > library(DESeq2) > > #/ make a DESeq2 object. We can parse the group membership (Mild etc) from > the colnames, > #/ which are based on the original filenames: > > dds <- DESeqDataSetFromMatrix( > countData=raw_counts, > > colData=data.frame(group=factor(unlist(lapply(strsplit(colnames(raw_counts), > split="_"), function(x) x[4])))), > design=~group) > > > #View Deseq2 object > dds > > #/ some Quality Control via PCA using the in-built PCA function from DESeq2: > vsd <- vst(dds, blind=TRUE) > #/ plot the PCA: > plotPCA(vsd, "group") > > > #/ extract the data: > pcadata <- plotPCA(vsd, "group", returnData=TRUE) > > #view pca data > pcadata > > #/ there is a very obvious batch effect, that we can correct, simply > everything that is greater or less than zero in the first principal > component, very obvious just by looking at the plot: > dds$batch <- factor(ifelse(pcadata$PC1 > 0, "batch1", "batch2")) > > #/ lets use limma::removeBatchEffect to see whether it can be removed: > library(limma) > vsd2 <- vsd > assay(vsd2) <- removeBatchEffect(assay(vsd), batch=dds$batch) > > > #/ plot PCA again, looks much better. this means we modify our design to > include batch and group, > plotPCA(vsd2, "group") > > #include batch and group both in design > design(dds) <- ~batch + group > > #Running the differential expression pipeline > dds <- DESeq(dds) > > #Building the results table > res <- results(dds) > res > > #Saving Results in CSV file > > write.csv(res , "dds.csv") > > mcols(res, use.names = TRUE) > > #We can also summarize the results with the following line of code, which > reports some additional information: > summary(res) > > #Total 1236 DEGs have been found when we se p_value less than 0.1 > > table(res$padj < 0.01) > > res.05 <- results(dds, alpha = 0.05) > > summary(res.05) > > table(res.05$padj < 0.05) > > sum(res$pvalue < 0.05, na.rm=TRUE) > > sum(!is.na(res$pvalue<0.05)) > > sum(res$padj < 0.1, na.rm=TRUE) > > sum(res$padj < 0.05, na.rm=TRUE) > > resSig <- subset(res, padj < 0.1) > > head(resSig[ order(resSig$log2FoldChange),]) > > head(resSig[ order(resSig$log2FoldChange, decreasing = TRUE), ]) > > #Save DEGS at padj < 0.1 > write.csv(resSig,"d...@0.1.csv") > > resSig0.05 <- subset(res, padj < 0.05) > > #Save DEGS at padj < 0.01 > write.csv(resSig0.05,"d...@0.05.csv") > > > Now I want to do gene set enrichment analysis by using topgo library and > also want to make a heatmap and Venn diagram. Please help me > > [[alternative HTML version deleted]] > > ______________________________________________ > R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. ______________________________________________ R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.