Nice and timely topic for me.

You may find this interesting:

http://www.jroller.com/otis/entry/xml_dbs_vs_search_engines

 Otis
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch



----- Original Message ----
> From: Walter Underwood <wunderw...@netflix.com>
> To: solr-user@lucene.apache.org
> Sent: Wednesday, May 27, 2009 10:53:16 PM
> Subject: Re: term vectors
> 
> If you really, really need to do XML-smart queries, go ahead and buy
> MarkLogic. I've worked with the principal folks there and they are
> really sharp. Their engine is awesome. XML search is hard, and you
> can't take a regular search engine, even a really good one, and make
> it do full XML without tons of work.
> 
> If, as Erik and Matt suggest, you can discover a substantially simpler
> (and flat) search schema that makes your users happy, then go ahead and
> use Solr.
> 
> wunder
> 
> On 5/27/09 7:00 PM, "Matt Mitchell" wrote:
> 
> > I've been experimenting with the XML + Solr combo too. What I've found to be
> > a good working solution is to:
> > 
> > pick out the nodes you want as Solr documents (every div1 or div2, etc.)
> > index the text only (with lots of metadata fields)
> > add a field for either the XPath to that node, or
> >   save the individual nodes (at index time) into separate files and store
> >   the name of the file in the Solr doc
> >   You could even store the chunked XML in a non-tokenized, stored field in
> >   the Solr document, as long as the XML isn't too huge.
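A minimal Python sketch of the chunking step described above, using only the standard library's `xml.etree.ElementTree`. This is not Matt's code: it assumes every `div1` becomes one Solr document, and the field names (`id`, `xpath`, `text`, `xml`) and the tiny TEI-like sample are hypothetical. Sending the resulting dicts to Solr is left to whatever client you already use.

```python
import copy
import xml.etree.ElementTree as ET


def chunk_documents(xml_text, chunk_tag="div1"):
    """Turn every <chunk_tag> element into a flat, Solr-style document dict."""
    root = ET.fromstring(xml_text)
    docs = []

    def walk(elem, path):
        if elem.tag == chunk_tag:
            # Serialize a shallow copy without its tail: ET.tostring() would
            # otherwise append the text that follows the element.
            chunk = copy.copy(elem)
            chunk.tail = None
            docs.append({
                "id": f"chunk-{len(docs) + 1}",  # hypothetical unique key
                "xpath": path,                   # where to find the node again
                # Text only: all descendant text, whitespace-normalized.
                "text": " ".join(" ".join(elem.itertext()).split()),
                # Optional raw XML for a stored, non-tokenized field.
                "xml": ET.tostring(chunk, encoding="unicode"),
            })
        # Number same-named siblings so every step gets a positional predicate.
        seen = {}
        for child in elem:
            seen[child.tag] = seen.get(child.tag, 0) + 1
            walk(child, f"{path}/{child.tag}[{seen[child.tag]}]")

    walk(root, f"/{root.tag}")
    return docs


# Hypothetical miniature TEI-like document.
SAMPLE = """<tei.2><text><body>
<div1 type="part"><head>Primera parte</head><p>En un lugar de la Mancha</p></div1>
<div1 type="part"><head>Segunda parte</head><p>Don Quijote sale otra vez</p></div1>
</body></text></tei.2>"""

for doc in chunk_documents(SAMPLE):
    print(doc["id"], doc["xpath"], doc["text"])
```

Joining descendant text with spaces keeps `<head>` and `<p>` text from running together, at the cost of splitting words that are broken up by inline markup; a real indexer would need a smarter rule.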
> > 
> > So when you do your search, you get all of the power of solr. Then use the
> > xpath field or the filename field to load the chunk, then transform.
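The query side can be sketched under the same assumptions: a search hit carries a hypothetical `xpath` field written at index time, the source XML is reopened, and the chunk is pulled back out by that path. `load_chunk` only understands simple absolute paths with one positional predicate per step, and `to_html` is a toy stand-in for the real transform (XSLT, most likely), not a suggestion of how that transform should look.

```python
import xml.etree.ElementTree as ET
from html import escape


def load_chunk(xml_text, xpath):
    """Return the element a search hit's (hypothetical) xpath field points at.

    Only handles absolute paths such as '/tei.2/text[1]/body[1]/div1[2]'.
    """
    root = ET.fromstring(xml_text)
    steps = xpath.strip("/").split("/")
    if steps[0] != root.tag:
        raise ValueError(f"path starts at {steps[0]!r}, document root is {root.tag!r}")
    if len(steps) == 1:
        return root
    # Element.find() rejects absolute paths, so search relative to the root.
    chunk = root.find("/".join(steps[1:]))
    if chunk is None:
        raise KeyError(f"no element at {xpath}")
    return chunk


def to_html(chunk):
    """Toy stand-in for the real transform: <head> -> <h2>, <p> -> <p>."""
    html_tags = {"head": "h2", "p": "p"}
    out = []
    for child in chunk:
        tag = html_tags.get(child.tag)
        if tag is not None:
            text = escape(" ".join(" ".join(child.itertext()).split()))
            out.append(f"<{tag}>{text}</{tag}>")
    return "\n".join(out)


# Hypothetical source document and a pretend Solr hit pointing into it.
SAMPLE = """<tei.2><text><body>
<div1 type="part"><head>Primera parte</head><p>En un lugar de la Mancha</p></div1>
<div1 type="part"><head>Segunda parte</head><p>Don Quijote sale otra vez</p></div1>
</body></text></tei.2>"""

hit = {"id": "chunk-2", "xpath": "/tei.2/text[1]/body[1]/div1[2]"}
print(to_html(load_chunk(SAMPLE, hit["xpath"])))
```

Storing the filename instead of a path works the same way, except that the lookup becomes "open this file" rather than "walk this path".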
> > 
> > Matt
> > 
> > On Wed, May 27, 2009 at 8:25 PM, Erik Hatcher wrote:
> > 
> >> 
> >> On May 27, 2009, at 4:56 PM, Yosvanys Aponte wrote:
> >> 
> >>> I understand what you're saying, but the problem I have is that
> >>> users can make queries like this:
> >>> 
> >>> //tei.2//p["quijote"]
> >>> 
> >> 
> >> A couple of problems with this... for one, there's no query parser that'll
> >> interpret that syntax as you mean it in Solr.  And also, indexing the
> >> hierarchical structure (of TEI, which I'm painfully familiar with) requires
> >> flattening or doing lots of overlapped indexing of fields that represent
> >> the hierarchy at various levels.
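To make the flattening point concrete, here is a hypothetical sketch in Python: one flat document per `p` element, with the names of all its ancestors copied into a multi-valued field. Under that assumption, "//tei.2//p containing quijote" turns into an ordinary Boolean query such as `text:quijote AND element:p AND ancestors:"tei.2"`. The field names are made up, and `matches()` is only a local stand-in for what Solr would evaluate against such a schema.

```python
import xml.etree.ElementTree as ET


def flatten(xml_text, leaf_tag="p"):
    """One flat doc per <leaf_tag>, with every ancestor's element name copied in."""
    root = ET.fromstring(xml_text)
    docs = []

    def walk(elem, ancestors):
        if elem.tag == leaf_tag:
            docs.append({
                "element": elem.tag,
                "ancestors": list(ancestors),  # multi-valued field, outermost first
                "text": " ".join(" ".join(elem.itertext()).split()),
            })
        for child in elem:
            walk(child, ancestors + [elem.tag])

    walk(root, [])
    return docs


def matches(doc, word, element, ancestor):
    """Local stand-in for: text:<word> AND element:<element> AND ancestors:"<ancestor>"."""
    return (word.lower() in doc["text"].lower().split()
            and doc["element"] == element
            and ancestor in doc["ancestors"])


# Hypothetical miniature TEI-like document.
SAMPLE = """<tei.2><text><body>
<div1 type="part"><head>Primera parte</head><p>En un lugar de la Mancha</p></div1>
<div1 type="part"><head>Segunda parte</head><p>Don Quijote sale otra vez</p></div1>
</body></text></tei.2>"""

# //tei.2//p containing "quijote", rewritten as a flat Boolean query.
query = 'text:quijote AND element:p AND ancestors:"tei.2"'
print(query)
for d in flatten(SAMPLE):
    if matches(d, "quijote", "p", "tei.2"):
        print(d["ancestors"], d["text"])
```

Ancestor names alone cannot express positional steps such as `//div1[2]//p`. Supporting those means indexing yet more fields per level, which is exactly the extra work warned about in this thread.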
> >> 
> >> In my experience with the TEI domain, users don't *really* want to query
> >> like that even though they'll say they do because it's the only way they're
> >> used to doing it.
> >> 
> >> Perhaps step back and ask yourself and your users what is really desired
> >> from the search application you're building.  What's the goal?  What needs
> >> to be displayed?  What type of query entry form will they be typing into?
> >> 
> >>        Erik
> >> 
> >> 
