: Date: Wed, 20 May 2009 16:45:25 -0400
: From: Eric Pugh
: Subject: XPath query support in Solr Cell

Not sure if you figured this out, but your error is coming from curl, not
from Solr. curl has a "feature" where it can hit multiple URLs that differ
only by a sequential number in a range, and it reads the square brackets in
your XPath as one of those ranges. Check the "URL" section of "man curl"
for all the details.

Full URI escaping of the square brackets (to %5B and %5D) should work
however ... it works for me anyway.

: So I am trying to filter down what I am indexing, and the basic XPath queries
: don't work. For example, working with tutorial.pdf this indexes all the
: <div/>:
:
: curl
: http://localhost:8983/solr/update/extract?ext.idx.attr=true\&ext.def.fl=text\&ext.map.div=foo_t\&ext.capture=div\&ext.literal.id=126\&ext.xpath=\/xhtml:html\/xhtml:body\/descendant:node\(\)
: -F "tutori...@tutorial.pdf"
:
: However, if I want to only index the first div, I expect to do this:
:
: budapest:site epugh$ curl
: http://localhost:8983/solr/update/extract?ext.idx.attr=true\&ext.def.fl=text\&ext.map.div=foo_t\&ext.capture=div\&ext.literal.id=126\&ext.xpath=\/xhtml:html\/xhtml:body\/xhtml:div[1]
: -F "tutori...@tutorial.pdf"
:
: But I keep getting back an issue from curl. My attempts to escape the [1]
: have failed. Any suggestions?
:
: curl: (3) [globbing] error: bad range specification after pos 174
:
: Eric
:
: PS,
: Also, this site seems to be okay as a place to upload your html and practice
: xpath:
:
: http://www.whitebeam.org/library/guide/TechNotes/xpathtestbed.rhtm
:
: I did have to strip out the namespace stuff though.
:
: -----------------------------------------------------
: Eric Pugh | Principal | OpenSource Connections, LLC | 434.466.1467 |
: http://www.opensourceconnections.com
: Free/Busy: http://tinyurl.com/eric-cal

-Hoss
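
For reference, here is a rough sketch of the escaping described above. It only builds and prints the URL; the variable names (xpath, escaped, url) are made up for illustration, and the host, port and ext.* parameters are simply copied from the original command, so adjust them to your setup.

```shell
#!/bin/sh
# XPath from the original message; the [1] predicate is what curl
# mistakes for a URL range.
xpath='/xhtml:html/xhtml:body/xhtml:div[1]'

# Percent-encode only the square brackets: [ becomes %5B, ] becomes %5D.
escaped=$(printf '%s' "$xpath" | sed -e 's/\[/%5B/g' -e 's/]/%5D/g')

# Double-quoting the URL means the & separators need no backslashes.
url="http://localhost:8983/solr/update/extract?ext.idx.attr=true&ext.def.fl=text&ext.map.div=foo_t&ext.capture=div&ext.literal.id=126&ext.xpath=${escaped}"

echo "$url"

# The request itself would then be (with the same -F argument as before):
#   curl "$url" -F ...
```

If you would rather leave the brackets unescaped, curl also has a -g / --globoff option that turns off the range handling altogether.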