Hello,

I would appreciate any suggestions on solving the following problem:

I'm trying to index a newspaper. After processing the logical structure and
articles, I have a structure similar to this...

<article id="201" article_type="ARTICLE" pub_id="5" iss_id="6"
date="18560301">
   <word t="1137" l="147" b="1665" r="951" content="Une" page="1"
type="TEXT" cont="0"/>
   <word t="1136" l="213" b="1664" r="1017" content="date" page="1"
type="TEXT" cont="0"/>
   <word t="1133" l="292" b="1661" r="1096" content="nouvelle" page="1"
type="TEXT" cont="0"/>
...
<article id="207" article_type="ADVERTISEMENT" pub_id="5" iss_id="6"
date="18560301">
   <word t="1749" l="1094" b="1825" r="1731" content="INTÉRIEUR" page="4"
type="ADVERTISEMENT" cont="0"/>
...

Obviously, I would like to have all the benefits of full-text search with
proximity and other advanced options.
After going through schema.xml and the docs, it looks like I should index
each "word" as its own document, something like this...
    <doc>
        <field name="type">ARTICLE</field>
        <field name="id">201</field>
        <field name="pub_id">5</field>
        <field name="iss_id">6</field>
        <field name="date">18560301</field>
        <field name="content">Une</field>
        <field name="t">1137</field>
        <field name="l">147</field>
        <field name="b">1665</field>
        <field name="r">951</field>
        <field name="page">1</field>
        <field name="wordttype">TEXT</field>
        <field name="cont">0</field>
    </doc>
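
For reference, this is roughly how I would generate those per-word
documents. It is only a sketch in Python using the standard library; the
function name word_docs is made up for illustration, and I'm assuming each
<article> element is properly closed in the real file.

```python
import xml.etree.ElementTree as ET

# Two words from the sample above, with the article element closed.
SAMPLE = """<article id="201" article_type="ARTICLE" pub_id="5" iss_id="6" date="18560301">
   <word t="1137" l="147" b="1665" r="951" content="Une" page="1" type="TEXT" cont="0"/>
   <word t="1136" l="213" b="1664" r="1017" content="date" page="1" type="TEXT" cont="0"/>
</article>"""


def word_docs(article_xml):
    """Return one flat dict per <word>, copying the article attributes into each."""
    article = ET.fromstring(article_xml)
    docs = []
    for word in article.iter("word"):
        docs.append({
            # Article-level attributes, repeated for every word.
            # Note: every word of an article ends up with the same "id".
            "type": article.get("article_type"),
            "id": article.get("id"),
            "pub_id": article.get("pub_id"),
            "iss_id": article.get("iss_id"),
            "date": article.get("date"),
            # Word-level attributes.
            "content": word.get("content"),
            "t": word.get("t"),
            "l": word.get("l"),
            "b": word.get("b"),
            "r": word.get("r"),
            "page": word.get("page"),
            "wordttype": word.get("type"),
            "cont": word.get("cont"),
        })
    return docs


docs = word_docs(SAMPLE)
for doc in docs:
    print(doc["id"], doc["content"], doc["t"], doc["l"], doc["b"], doc["r"])
```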

However, if I use this approach, it seems I lose some core search
functionality...

- Multi-word (phrase) searching? For example, a search for "Une date" would
not match anything, since each word is indexed as a standalone document.

- Proximity search?

... and so on.

So I guess this approach isn't the right way to reach my goal. Does anyone
have recommendations on how to solve this?

The goal is to get results that carry the attributes mentioned above for
each hit. So for the previous example, "Une date", I would receive hits with
all the attributes needed to position them correctly on the page image
(t, l, b, r as coordinates, for example).
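
To make the goal concrete, here is a toy sketch in plain Python of the
behaviour I'm after. It is not Solr and not a proposed implementation; the
names Word and find_phrase are hypothetical, and matching is a simple
case-insensitive comparison of whitespace-separated tokens. The point is
only that the article text stays together as one ordered sequence, while
each token keeps its own coordinates.

```python
from dataclasses import dataclass


@dataclass
class Word:
    content: str
    t: int
    l: int
    b: int
    r: int
    page: int


# Words of article 201 in reading order (values from the sample above).
ARTICLE_201 = [
    Word("Une", 1137, 147, 1665, 951, 1),
    Word("date", 1136, 213, 1664, 1017, 1),
    Word("nouvelle", 1133, 292, 1661, 1096, 1),
]


def find_phrase(words, phrase):
    """Return a list of hits; each hit is the run of Word objects matching the phrase."""
    terms = phrase.lower().split()
    if not terms:
        return []
    tokens = [w.content.lower() for w in words]
    hits = []
    # Slide a window of len(terms) over the article and compare token by token.
    for start in range(len(tokens) - len(terms) + 1):
        if tokens[start:start + len(terms)] == terms:
            hits.append(words[start:start + len(terms)])
    return hits


hits = find_phrase(ARTICLE_201, "Une date")
for hit in hits:
    for w in hit:
        # Each matched word still carries what is needed to draw a box on the page.
        print(w.content, "page", w.page, "box", (w.t, w.l, w.b, w.r))
```

With per-word documents there is no such ordered sequence to slide over,
which is why I think phrase and proximity queries stop working.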

Kind Regards,

Sasha
