If you really, really need to do XML-smart queries, go ahead and buy MarkLogic. I've worked with the principal folks there, and they are really sharp. Their engine is awesome. XML search is hard, and you can't take a regular search engine, even a really good one, and make it do full XML without tons of work.
If, as Erik and Matt suggest, you can discover a substantially simpler (and flat) search schema that makes your users happy, then go ahead and use Solr.

wunder

On 5/27/09 7:00 PM, "Matt Mitchell" <goodie...@gmail.com> wrote:

> I've been experimenting with the XML + Solr combo too. What I've found to be
> a good working solution is to:
>
> - pick out the nodes you want as Solr documents (every div1 or div2, etc.)
> - index the text only (with lots of metadata fields)
> - add a field for either the XPath to that node, or
>   save the individual nodes (at index time) into separate files and store
>   the name of the file in the Solr doc
>
> You could even store the chunked XML in a non-tokenized, stored field in
> the Solr document, as long as the XML isn't too huge.
>
> So when you do your search, you get all of the power of Solr. Then use the
> XPath field or the filename field to load the chunk, then transform.
>
> Matt
>
> On Wed, May 27, 2009 at 8:25 PM, Erik Hatcher
> <e...@ehatchersolutions.com> wrote:
>
>> On May 27, 2009, at 4:56 PM, Yosvanys Aponte wrote:
>>
>>> I understand what you say,
>>> but the problem I have is that
>>>
>>> users can make queries like this:
>>>
>>> //tei.2//p"[quijote"]
>>
>> A couple of problems with this... For one, there's no query parser that'll
>> interpret that syntax as you mean it in Solr. Also, indexing the
>> hierarchical structure (of TEI, which I'm painfully familiar with) requires
>> flattening or doing lots of overlapped indexing of fields that represent the
>> hierarchy at various levels.
>>
>> In my experience with the TEI domain, users don't *really* want to query
>> like that, even though they'll say they do, because it's the only way they're
>> used to doing it.
>>
>> Perhaps step back and ask yourself and your users what is really desired
>> from the search application you're building. What's the goal? What needs
>> to be displayed? What type of query entry form will they be typing into?
>>
>> Erik
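For illustration, Matt's chunking approach might look roughly like the sketch below, assuming Python's standard xml.etree.ElementTree and un-namespaced TEI P4 (root element TEI.2). The function names chunk_xml and load_chunk and the field names id, xpath, text and xml are hypothetical; map them onto whatever your Solr schema actually defines. Each div1/div2 becomes one flat document: the text goes in a tokenized field, while a positional XPath and the serialized chunk go in stored, non-tokenized fields so the hit can be loaded back and transformed after the search.

```python
import xml.etree.ElementTree as ET


def chunk_xml(xml_text, doc_id, chunk_tags=("div1", "div2")):
    """Turn each chunk element into a flat dict ready to send to Solr.

    Field names (id, xpath, text, xml) are hypothetical; use whatever your
    schema defines.
    """
    root = ET.fromstring(xml_text)
    docs = []

    def walk(elem, path):
        counts = {}  # 1-based position of each child among same-named siblings
        for child in elem:
            if not isinstance(child.tag, str):  # skip comments/PIs if a parser kept them
                continue
            counts[child.tag] = counts.get(child.tag, 0) + 1
            child_path = f"{path}/{child.tag}[{counts[child.tag]}]"
            if child.tag in chunk_tags:
                # tostring() would include the tail text after the closing tag; drop it.
                tail, child.tail = child.tail, None
                chunk = ET.tostring(child, encoding="unicode")
                child.tail = tail
                docs.append({
                    "id": f"{doc_id}#{child_path}",
                    "xpath": child_path,  # stored, not tokenized
                    # Text only, joined with spaces so adjacent elements don't run together.
                    "text": " ".join(" ".join(child.itertext()).split()),
                    "xml": chunk,  # stored, not tokenized (skip if chunks are huge)
                })
            walk(child, child_path)  # nested chunks (div2 inside div1) become docs too

    walk(root, f"/{root.tag}")
    return docs


def load_chunk(xml_text, xpath):
    """Fetch a chunk back from the source XML using the stored xpath field."""
    root = ET.fromstring(xml_text)
    prefix = f"/{root.tag}/"
    if not xpath.startswith(prefix):
        return None
    # ElementTree's find() takes paths relative to the root and supports [n] predicates.
    return root.find(xpath[len(prefix):])


if __name__ == "__main__":
    sample = (
        "<TEI.2><text><body>"
        "<div1 n='1'><head>Capitulo I</head><p>En un lugar de la Mancha</p></div1>"
        "<div1 n='2'><p>don Quijote</p></div1>"
        "</body></text></TEI.2>"
    )
    for doc in chunk_xml(sample, doc_id="quijote"):
        print(doc["xpath"], "|", doc["text"])
    print(load_chunk(sample, "/TEI.2/text[1]/body[1]/div1[2]").get("n"))
```

Posting the dicts to Solr's update handler is left out. Note that positional XPaths go stale if the source file is edited, which is one reason to prefer Matt's alternative of storing the chunk itself (or a separate file name) when the XML is small enough.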