On Sat, Jan 14, 2012 at 12:41 PM, Milan Bouchet-Valat <nalimi...@club.fr> wrote:
> On Saturday, January 14, 2012 at 12:24 -0600, Andy Adamiec wrote:
> > Hi Milan,
> >
> > The Solr XML files are not in a typical format; here is an example:
> > http://www.omegahat.org/RSXML/solr.xml
> > I'm not sure how to parse the documents without using the solrDocs.R
> > function, or how to make the function compatible with the tm package.
> Indeed, this doesn't seem to be easy to parse using the generic XML
> source from tm. So it will be easier for you to create your own custom
> source from scratch. Have a look at the source.R and reader.R files in
> the tm source: you need to replicate the behavior of one of the sources.
>
> The code should include the following functions:
>
> readSorl <- FunctionGenerator(function(...) {
>     function(elem, language, id) {
>         # Use elem$content, which contains an item set by SorlSource() below,
>         # and create a PlainTextDocument() from it,
>         # putting the data where appropriate (text, meta-data)
>     }
> })
>
> SorlSource <- function(x) {
>     # Parse the XML file using functions from solrDocs.R, and
>     # create "content", which is a list with one item for each document,
>     # to pass to readSorl() one by one
>
>     s <- tm:::.Source(readSorl, "UTF-8", length(content), FALSE,
>                       seq(1, length(content)), 0, FALSE)
>     s$Content <- content
>     s$URI <- match.call()$x
>     class(s) <- c("SorlSource", "Source")
>     s
> }
>
> getElem <- function(x) UseMethod("getElem", x)
> getElem.SorlSource <- function(x) {
>     list(content = x$Content[[x$Position]], uri = match.call()$x)
> }
>
> eoi <- function(x) UseMethod("eoi", x)
> eoi.SorlSource <- function(x) length(x$Content) <= x$Position
>
>
> Hope this helps
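
To make the getElem()/eoi() contract in the quoted code a bit more concrete, here is a minimal sketch in base R only (it does not load tm). ToySource, stepNext and collect_all are hypothetical names made up for this illustration. The driver loop is an assumption on my part: it supposes that the consumer advances Position by one before each getElem() call, which is what a starting Position of 0 and the eoi() test above suggest. It is not a copy of tm's own loop.

```r
# Toy stand-in for a tm-style source, base R only.
# ToySource, stepNext and collect_all are hypothetical names for this sketch.
ToySource <- function(content) {
  # Position starts at 0, i.e. "before the first element"
  structure(list(Content = content, Position = 0L),
            class = c("ToySource", "Source"))
}

# Return the element at the current position
getElem <- function(x) UseMethod("getElem", x)
getElem.ToySource <- function(x) {
  list(content = x$Content[[x$Position]])
}

# End of input: TRUE once Position has reached the last element
eoi <- function(x) UseMethod("eoi", x)
eoi.ToySource <- function(x) length(x$Content) <= x$Position

# Assumed stepping rule: move one element forward, return the updated source
stepNext <- function(x) UseMethod("stepNext", x)
stepNext.ToySource <- function(x) {
  x$Position <- x$Position + 1L
  x
}

# Assumed driver loop: step, fetch, repeat until eoi() is TRUE
collect_all <- function(src) {
  out <- list()
  while (!eoi(src)) {
    src <- stepNext(src)
    out[[length(out) + 1L]] <- getElem(src)$content
  }
  out
}

docs <- collect_all(ToySource(list("first doc", "second doc", "third doc")))
```

In a real source, the list passed in would be the "content" list built from the Solr XML, and the reader (readSorl in the quoted code) would turn each element into a document instead of just collecting it.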