Hi Armin --

See the help page for esearch

  http://www.ncbi.nlm.nih.gov/entrez/query/static/esearch_help.html

especially the 'retmax' key.

A couple of other thoughts on this thread...

1) using the full path, e.g.,

  ids <- xpathApply(doc, "/eSearchResult/IdList/Id", xmlValue)

is likely to lead to less grief in the long run, as you'll only select
elements of the node you're interested in, rather than any element,
anywhere in the document, labeled 'Id'

2) From a different post in the thread, things like

On Dec 16, 2007 2:53 PM, David Winsemius <dwinsemius at comcast.net> wrote:
[snip]
> get.info<- function(doc){
>    df<-cbind(
>       Abstract = unlist(xpathApply(doc, "//AbstractText", xmlValue)),
>       Journal = unlist(xpathApply(doc, "//Title", xmlValue)),
>       Pmid = unlist(xpathApply(doc, "//PMID", xmlValue))
>    )
>    return(df)
> }

will lead to more trouble, because they assume that AbstractText, etc.
occur exactly once in each record. It would seem better to extract the
relevant node, and query that, probably defining appropriate
defaults. I started with

  xpath_or_na <- function(doc, q) {
      res <- xpathApply(doc, q, xmlValue)
      if (length(res)==1) res[[1]]
      else NA_character_
  }

  citn <- function(citation){
      Abstract <- xpath_or_na(citation,
                              "/MedlineCitation/Article/Abstract/AbstractText")
      Journal <- xpath_or_na(citation,
                             "/MedlineCitation/Article/Journal/Title")
      Pmid <- xpath_or_na(citation,
                          "/MedlineCitation/PMID")
      c(Abstract=Abstract, Journal=Journal, Pmid=Pmid)
  }

  medline_q <- "/PubmedArticleSet/PubmedArticle/MedlineCitation"
  res <- xpathApply(doc, medline_q, citn)

One would still have to coerce res into a data.frame. It is also worth
thinking about each of the lines in citn -- e.g., the Journal/Title
query clearly only applies to journals. Eventually one wants to consult
the DTD (basically, the contract spelling out the content) of the
document, confirm that the xpath queries will perform correctly, and
verify that the document actually conforms to its DTD.
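For the coercion of res into a data.frame, here is a minimal sketch of
one way it might go. Two things in it are assumptions of mine rather
than anything established above: the little XML document is made up
just to exercise the code (it is not real PubMed output), and the
queries are written relative to the citation node ("./...") on the
assumption that, depending on the version of the XML package, an
absolute path may be resolved from the document root even when a node
is passed to xpathApply.

```r
library(XML)

# Made-up two-record document; the second record has no abstract.
txt <- '<PubmedArticleSet>
  <PubmedArticle><MedlineCitation>
    <PMID>1</PMID>
    <Article><Journal><Title>J One</Title></Journal>
      <Abstract><AbstractText>First abstract.</AbstractText></Abstract>
    </Article>
  </MedlineCitation></PubmedArticle>
  <PubmedArticle><MedlineCitation>
    <PMID>2</PMID>
    <Article><Journal><Title>J Two</Title></Journal></Article>
  </MedlineCitation></PubmedArticle>
</PubmedArticleSet>'
doc <- xmlParse(txt, asText = TRUE)

# Return the single matching value, or NA if there are zero or several.
xpath_or_na <- function(node, q) {
    res <- xpathApply(node, q, xmlValue)
    if (length(res) == 1) res[[1]] else NA_character_
}

# Queries are relative to the MedlineCitation node passed in (assumption).
citn <- function(citation) {
    c(Abstract = xpath_or_na(citation, "./Article/Abstract/AbstractText"),
      Journal  = xpath_or_na(citation, "./Article/Journal/Title"),
      Pmid     = xpath_or_na(citation, "./PMID"))
}

medline_q <- "/PubmedArticleSet/PubmedArticle/MedlineCitation"
res <- xpathApply(doc, medline_q, citn)

# Each element of res is a named character vector of the same length,
# so the records can be stacked row-wise and converted.
df <- as.data.frame(do.call(rbind, res), stringsAsFactors = FALSE)
```

Because every record yields the same three named entries, a missing
abstract shows up as an NA in its own row instead of shifting the
columns out of step.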
Following my own advice, I quickly found that doing things 'more
right' becomes quite complicated, and suddenly became satisfied with
the information I can get out of the 'annotate' package.

Martin

"Armin Goralczyk" <[EMAIL PROTECTED]> writes:

> On Dec 15, 2007 6:31 PM, David Winsemius <[EMAIL PROTECTED]> wrote:
>
>> After quite a bit of hacking (in the sense of ineffective chopping with
>> a dull ax), I finally came up with:
>>
>> pm.srch<- function (){
>>
>>   srch.stem<-"http://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi?db=pubmed&term="
>>   query<-readLines(con=file.choose())
>>   query<-gsub("\\\"","",x=query)
>>   doc<-xmlTreeParse(paste(srch.stem,query,sep=""),isURL = TRUE,
>>                     useInternalNodes = TRUE)
>>   return(sapply(c("//Id"), xpathApply, doc = doc, fun = xmlValue) )
>> }
>>
>> pm.srch()  #choosing the search-file
>>       //Id
>>  [1,] "18046565"
>>  [2,] "17978930"
>>  [3,] "17975511"
>>  [4,] "17935912"
>>  [5,] "17851940"
>>  [6,] "17765779"
>>  [7,] "17688640"
>>  [8,] "17638782"
>>  [9,] "17627059"
>> [10,] "17599582"
>> [11,] "17589729"
>> [12,] "17585283"
>> [13,] "17568846"
>> [14,] "17560665"
>> [15,] "17547971"
>> [16,] "17428551"
>> [17,] "17419899"
>> [18,] "17419519"
>> [19,] "17385606"
>> [20,] "17366752"
>
> I tried the example above, but only the first 20 PMIDs will be
> returned. How can I circumvent this (I guess it's a restriction from
> PubMed)?
> --
> Armin Goralczyk, M.D.
> --
> Universitätsmedizin Göttingen
> Abteilung Allgemein- und Viszeralchirurgie
> Rudolf-Koch-Str. 40
> 39099 Göttingen
> --
> Dept. of General Surgery
> University of Göttingen
> Göttingen, Germany
> --
> http://www.chirurgie-goettingen.de
> ______________________________________________
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
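
On the 20-ID limit in the quoted question: esearch returns 20 IDs by
default, and the 'retmax' key in the URL raises that. A rough sketch of
building such a URL follows; esearch_url is a made-up helper name, the
search term is only an example, and the retmax value of 500 is
arbitrary.

```r
# Hypothetical helper: build an esearch URL with an explicit retmax.
esearch_url <- function(term, retmax = 100L, db = "pubmed") {
    stem <- "http://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"
    paste0(stem,
           "?db=", db,
           # encode spaces and other reserved characters in the term
           "&term=", URLencode(term, reserved = TRUE),
           # as.integer() keeps large values from printing as e.g. 1e+05
           "&retmax=", as.integer(retmax))
}

url <- esearch_url("bladder cancer", retmax = 500)

# With network access, the result could then be parsed as in pm.srch:
# library(XML)
# doc <- xmlTreeParse(url, isURL = TRUE, useInternalNodes = TRUE)
# ids <- unlist(xpathApply(doc, "/eSearchResult/IdList/Id", xmlValue))
```

The parsing step is left commented out because it needs a live
connection; the esearch help page is the place to check what values of
retmax the server will actually honour.
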
-- 
Martin Morgan
Computational Biology / Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N.
PO Box 19024 Seattle, WA 98109

Location: Arnold Building M2 B169
Phone: (206) 667-2793