Hola Saša,
You don't have to recreate logic for proximity (I assume that by that you mean proximity of words/terms for phrase queries), if you have a text field with all your content.

Otis
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch

----- Original Message ----
> From: Saša Mutić <[EMAIL PROTECTED]>
> To: solr-user@lucene.apache.org
> Sent: Thursday, October 2, 2008 3:43:33 PM
> Subject: Re: complex XML structure problem
>
> Bok Otis,
>
> I was thinking about this approach, but was wondering if there is more
> elegant approach where I wouldn't have to recreate logic for proximity and
> quoted complex queries (identification of neighbor hits and quote queries
> for highlighting and positioning on image).
>
> If nobody comes up with better approach, I will use something similar as you
> described.
>
> Thanks for fast response :)
>
> Kind Regards,
> Saša
>
>
> On Thu, Oct 2, 2008 at 5:51 PM, Otis Gospodnetic wrote:
>
> > Bok Saša,
> >
> > It sounds like you need to keep per-word metadata, plus the raw content so
> > you can full-text search it.
> > If so, consider keeping the meta data elsewhere - e.g. different index,
> > external DB, etc.
> > For full-text search you probably want to index the full content, something
> > like:
> >
> > article
> > Une date..........
> > 123
> >
> >
> > You could create another index with words and each word Document have an ID
> > of their "parent" (e.g. the article's ID), so you do a query against the
> > above index, get the IDs of matches, and then get words for those matches.
> > Of course, you can also use a RDBMS or some other storage for the second
> > part.
> >
> > Otis
> > --
> > Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch
> >
> >
> > ----- Original Message ----
> > > From: Saša Mutić
> > > To: solr-user@lucene.apache.org
> > > Sent: Thursday, October 2, 2008 6:14:14 AM
> > > Subject: complex XML structure problem
> > >
> > > Hello,
> > >
> > > I would appreciate any suggestions on solving following problem:
> > >
> > > I'm trying to index newspaper. After processing logical structure and
> > > articles, I have similar structure to this...
> > >
> > >
> > > date="18560301">
> > >
> > > type="TEXT" cont="0"/>
> > >
> > > type="TEXT" cont="0"/>
> > >
> > > type="TEXT" cont="0"/>
> > > ...
> > >
> > > date="18560301">
> > >
> > > type="ADVERTISEMENT" cont="0"/>
> > > ...
> > >
> > > Obviously, I would like to have all the benefits of full-text search with
> > > proximity and other advanced options.
> > > After going through SCHEMA.XML and docs, I can see that I should split each
> > > "word" into something like this...
> > >
> > > ARTICLE
> > > 201
> > > 5
> > > 6
> > > 18560301
> > > Une
> > > 1137
> > > 147
> > > 1665
> > > 951
> > > 1
> > > TEXT
> > > 0
> > >
> > >
> > > However, if I use this approach, it seems like I lost some core
> > > functionality of search...
> > >
> > > - multiword searching ? For example searching for "Une date" ? Since each
> > > word is treated as standalone document ?
> > >
> > > - Proximity search ?
> > >
> > > ... and so on.
> > >
> > > So I guess this approach isn't solution to my goal. Does anyone have some
> > > recommendations on how to solve this ?
> > >
> > > Goal would be to receive results that would have mentioned "attributes" for
> > > each hit...so for previous example "Une date", I would receive hits with all
> > > attributes that would allow me to correctly position them on image (t,l,b,r
> > > as coordinates for example).
> > >
> > > Kind Regards,
> > >
> > > Sasha
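
The two-store idea discussed in this thread can be sketched roughly as follows. This is only an illustration under assumptions, not anything Solr provides: the full-text index is assumed to have already returned the IDs of matching articles, and the second store is a plain in-memory dict. The names Word, WORDS_BY_ARTICLE and phrase_boxes are hypothetical, and the coordinates are made-up sample values in (t, l, b, r) order. It only handles exact phrases; the proximity matching itself stays in the full-text index.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Word:
    """One word of an article with its position on the page image."""
    text: str
    t: int  # top
    l: int  # left
    b: int  # bottom
    r: int  # right


# Hypothetical second store: article ID -> words in reading order.
# In practice this could be another index or an RDBMS table.
WORDS_BY_ARTICLE = {
    "123": [
        Word("Une", 10, 20, 30, 60),
        Word("date", 10, 65, 30, 110),
        Word("importante", 10, 115, 30, 200),
    ],
    "124": [
        Word("Une", 50, 20, 70, 60),
        Word("autre", 50, 65, 70, 120),
        Word("date", 50, 125, 70, 170),
    ],
}


def phrase_boxes(article_ids, phrase):
    """For articles already matched by the full-text index, return the
    runs of consecutive words that spell out the phrase, so their
    coordinates can be used for highlighting on the image."""
    terms = [term.lower() for term in phrase.split()]
    hits = {}
    if not terms:
        return hits
    for article_id in article_ids:
        words = WORDS_BY_ARTICLE.get(article_id, [])
        texts = [w.text.lower() for w in words]
        # Slide a window of len(terms) words over the article.
        for i in range(len(texts) - len(terms) + 1):
            if texts[i:i + len(terms)] == terms:
                hits.setdefault(article_id, []).append(words[i:i + len(terms)])
    return hits
```

For example, phrase_boxes(["123", "124"], "Une date") would return one run for article "123" (the words "Une" and "date" with their boxes) and nothing for "124", where the two words are not adjacent.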