Nice and timely topic for me. You may find this interesting:
http://www.jroller.com/otis/entry/xml_dbs_vs_search_engines

Otis
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch

----- Original Message ----
> From: Walter Underwood <wunderw...@netflix.com>
> To: solr-user@lucene.apache.org
> Sent: Wednesday, May 27, 2009 10:53:16 PM
> Subject: Re: term vectors
>
> If you really, really need to do XML-smart queries, go ahead and buy
> MarkLogic. I've worked with the principal folk there and they are
> really sharp. Their engine is awesome. XML search is hard, and you
> can't take a regular search engine, even a really good one, and make
> it do full XML without tons of work.
>
> If, as Erik and Matt suggest, you can discover a substantially simpler
> (and flat) search schema that makes your users happy, then go ahead and
> use Solr.
>
> wunder
>
> On 5/27/09 7:00 PM, "Matt Mitchell" wrote:
>
> > I've been experimenting with the XML + Solr combo too. What I've found to be
> > a good working solution is to:
> >
> > pick out the nodes you want as solr documents (every div1 or div2 etc.)
> > index the text only (with lots of metadata fields)
> > add a field for either the xpath to that node, or
> > save the individual nodes (at index time) into separate files and store
> > the name of the file in the solr doc
> > You could even store the chunked XML in a non-tokenized, stored field in
> > the solr document as long as the XML isn't too huge.
> >
> > So when you do your search, you get all of the power of solr. Then use the
> > xpath field or the filename field to load the chunk, then transform.
> >
> > Matt
> >
> > On Wed, May 27, 2009 at 8:25 PM, Erik Hatcher
> > wrote:
> >
> >>
> >> On May 27, 2009, at 4:56 PM, Yosvanys Aponte wrote:
> >>
> >>> i understand what you say
> >>> but the problem i have is
> >>>
> >>> user can make query like this:
> >>>
> >>> //tei.2//p"[quijote"]
> >>>
> >>
> >> A couple of problems with this... for one, there's no query parser that'll
> >> interpret that syntax as you mean it in Solr. And also, indexing the
> >> hierarchical structure (of TEI, which I'm painfully familiar with) requires
> >> flattening or doing lots of overlapped indexing of fields that represent
> >> the hierarchy at various levels.
> >>
> >> In my experience with the TEI domain, users don't *really* want to query
> >> like that even though they'll say they do because it's the only way they're
> >> used to doing it.
> >>
> >> Perhaps step back and ask yourself and your users what is really desired
> >> from the search application you're building. What's the goal? What needs
> >> to be displayed? What type of query entry form will they be typing into?
> >>
> >> Erik
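
For what it's worth, here is a rough sketch of the chunking step Matt describes in the quoted message: pick out every div1, keep only its text, and record an XPath that points back at the node. It uses only Python's standard library. The function name and the field names (id, xpath, text) are my own assumptions for illustration, not anything Solr or TEI prescribes, and sending the resulting documents to Solr is left out.

```python
import xml.etree.ElementTree as ET


def chunk_to_solr_docs(xml_text, tag="div1", source_id="doc"):
    """Split an XML document into flat, Solr-style dicts, one per `tag` element.

    Field names are hypothetical; adapt them to your own schema.
    """
    root = ET.fromstring(xml_text)
    docs = []
    # iter() walks the tree in document order, so the n-th hit here is the
    # same node that the XPath expression (//tag)[n] selects.
    for position, node in enumerate(root.iter(tag), start=1):
        # Text only: join all text nodes and collapse whitespace.
        # Caveat: joining with a space can split words that contain
        # inline markup, which is acceptable for a sketch.
        text = " ".join(" ".join(node.itertext()).split())
        docs.append({
            "id": f"{source_id}-{tag}-{position}",
            "xpath": f"(//{tag})[{position}]",
            "text": text,
        })
    return docs


SAMPLE = """<TEI.2><text><body>
<div1 type="chapter"><head>Capitulo I</head><p>En un lugar de la Mancha, don Quijote.</p></div1>
<div1 type="chapter"><head>Capitulo II</head><p>La primera salida.</p></div1>
</body></text></TEI.2>"""

if __name__ == "__main__":
    for doc in chunk_to_solr_docs(SAMPLE):
        print(doc)
```

At search time you would query the text field as usual, then use the stored xpath value to pull the matching chunk out of the original file and transform it for display.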