Hi Saša,

It sounds like you need to keep per-word metadata, plus the raw content so that you can run full-text search against it. If so, consider keeping the metadata elsewhere, e.g. in a different index, an external DB, etc. For full-text search you probably want to index the full content, something like:

<field name="type">article</field>
<field name="content">Une date..........</field>
<field name="id">123</field>

You could create another index of words, where each word Document carries the ID of its "parent" (e.g. the article's ID). You then run a query against the index above, get the IDs of the matching articles, and fetch the words for those matches. Of course, you can also use an RDBMS or some other storage for the second part.

Otis
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch

----- Original Message ----
> From: Saša Mutić <[EMAIL PROTECTED]>
> To: solr-user@lucene.apache.org
> Sent: Thursday, October 2, 2008 6:14:14 AM
> Subject: complex XML structure problem
>
> Hello,
>
> I would appreciate any suggestions on solving following problem:
>
> I'm trying to index newspaper. After processing logical structure and
> articles, I have similar structure to this...
>
> date="18560301">
> type="TEXT" cont="0"/>
> type="TEXT" cont="0"/>
> type="TEXT" cont="0"/>
> ...
> date="18560301">
> type="ADVERTISEMENT" cont="0"/>
> ...
>
> Obviously, I would like to have all the benefits of full-text search with
> proximity and other advanced options.
> After going through SCHEMA.XML and docs, I can see that I should split each
> "word" into something like this...
>
> ARTICLE
> 201
> 5
> 6
> 18560301
> Une
> 1137
> 147
> 1665
> 951
> 1
> TEXT
> 0
>
> However, if I use this approach, it seems like I lost some core
> functionality of search...
>
> - multiword searching ? For example searching for "Une date" ? Since each
> word is treated as standalone document ?
>
> - Proximity search ?
>
> ... and so on.
>
> So I guess this approach isn't solution to my goal. Does anyone have some
> recommendations on how to solve this ?
>
> Goal would be to receive results that would have mentioned "attributes" for
> each hit...so for previous example "Une date", I would receive hits with all
> attributes that would allow me to correctly position them on image (t,l,b,r
> as coordinates for example).
>
> Kind Regards,
>
> Sasha
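
P.S. In case it helps, here is a rough Python sketch of the two-step lookup suggested at the top of this message: query the article-level index first, then fetch the per-word records by parent ID. It is only an in-memory illustration and does not talk to Solr. The `articles` and `words` structures, the function names `find_articles` and `word_boxes`, and all coordinate values are made up for the example; in practice step 1 would be a phrase query against the full-text index and step 2 a lookup in the second index or an RDBMS.

```python
# Stand-in for the full-text index: one entry per article (hypothetical data).
articles = {
    "123": {"type": "article", "content": "Une date importante"},
    "124": {"type": "article", "content": "Une autre page"},
}

# Stand-in for the second store: one record per word, each carrying the ID of
# its parent article, its position in the text, and made-up t,l,b,r coordinates.
words = [
    {"parent_id": "123", "pos": 0, "text": "Une", "t": 10, "l": 20, "b": 30, "r": 60},
    {"parent_id": "123", "pos": 1, "text": "date", "t": 10, "l": 65, "b": 30, "r": 110},
    {"parent_id": "123", "pos": 2, "text": "importante", "t": 10, "l": 115, "b": 30, "r": 220},
    {"parent_id": "124", "pos": 0, "text": "Une", "t": 40, "l": 20, "b": 60, "r": 60},
    {"parent_id": "124", "pos": 1, "text": "autre", "t": 40, "l": 65, "b": 60, "r": 120},
    {"parent_id": "124", "pos": 2, "text": "page", "t": 40, "l": 125, "b": 60, "r": 170},
]


def find_articles(phrase):
    """Step 1: return IDs of articles whose content contains the phrase.

    A naive token-sequence match that stands in for a real phrase query.
    """
    needle = phrase.lower().split()
    hits = []
    for article_id, doc in articles.items():
        tokens = doc["content"].lower().split()
        for i in range(len(tokens) - len(needle) + 1):
            if tokens[i:i + len(needle)] == needle:
                hits.append(article_id)
                break
    return hits


def word_boxes(article_id, phrase):
    """Step 2: return (text, t, l, b, r) for the words of the first phrase match."""
    needle = phrase.lower().split()
    # Fetch the word records of this article only, ordered by position.
    records = sorted(
        (w for w in words if w["parent_id"] == article_id),
        key=lambda w: w["pos"],
    )
    for i in range(len(records) - len(needle) + 1):
        window = records[i:i + len(needle)]
        if [w["text"].lower() for w in window] == needle:
            return [(w["text"], w["t"], w["l"], w["b"], w["r"]) for w in window]
    return []


if __name__ == "__main__":
    for article_id in find_articles("Une date"):
        print(article_id, word_boxes(article_id, "Une date"))
```

The point of the split is that phrase and proximity matching happen on the full article text, while the per-word attributes are only looked up for the handful of articles that actually matched.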