I am trying to narrow down what gets indexed, but even basic XPath queries don't work. For example, with tutorial.pdf, this indexes all the <div/> elements:

curl http://localhost:8983/solr/update/extract?ext.idx.attr=true\&ext.def.fl=text\&ext.map.div=foo_t\&ext.capture=div\&ext.literal.id=126\&ext.xpath=\/xhtml:html\/xhtml:body\/descendant:node\(\) -F "tutori...@tutorial.pdf"

However, to index only the first div, I would expect this to work:

budapest:site epugh$ curl http://localhost:8983/solr/update/extract?ext.idx.attr=true\&ext.def.fl=text\&ext.map.div=foo_t\&ext.capture=div\&ext.literal.id=126\&ext.xpath=\/xhtml:html\/xhtml:body\/xhtml:div[1] -F "tutori...@tutorial.pdf"

But curl keeps returning an error, and my attempts to escape the [1] have failed. Any suggestions?

curl: (3) [globbing] error: bad range specification after pos 174
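
One thing that may be worth trying, offered as a sketch rather than a confirmed fix: curl treats [ and ] in a URL as a glob range, which is where the "[globbing]" message comes from. The -g (--globoff) option turns that off, and percent-encoding the brackets as %5B and %5D is an alternative. The variable names and the post_first_div helper below are made up for illustration, the single quotes stand in for the backslash escapes used above, and the -F value is left exactly as it appears above. Whether the extract handler then accepts the XPath is untested here.

```shell
#!/bin/sh
# Same request as above, single-quoted so the shell leaves &, ( and [ alone.
url='http://localhost:8983/solr/update/extract?ext.idx.attr=true&ext.def.fl=text&ext.map.div=foo_t&ext.capture=div&ext.literal.id=126&ext.xpath=/xhtml:html/xhtml:body/xhtml:div[1]'

# Form field copied as-is from the commands above (its name is truncated there).
form_field='tutori...@tutorial.pdf'

# Option 1 (hypothetical helper): disable curl's URL globbing with -g,
# so [1] is sent literally instead of being parsed as a range.
post_first_div() {
    curl -g "$url" -F "$form_field"
}

# Option 2: percent-encode the brackets so curl never sees a glob range.
encoded_url=$(printf '%s' "$url" | sed -e 's/\[/%5B/g' -e 's/]/%5D/g')
echo "$encoded_url"
```

Calling post_first_div sends the request with globbing off; the echoed URL can be passed to curl without -g.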

Eric

PS,
Also, this site seems to work well as a place to upload your HTML and practice XPath:

http://www.whitebeam.org/library/guide/TechNotes/xpathtestbed.rhtm

I did have to strip out the namespace stuff though.




-----------------------------------------------------
Eric Pugh | Principal | OpenSource Connections, LLC | 434.466.1467 | 
http://www.opensourceconnections.com
Free/Busy: http://tinyurl.com/eric-cal



