Hello,

Thank you for your answer, Shawn.

I tried to simplify my problem, but I realize I chose a bad example: I
don't process phone numbers, and I do process unstructured documents.

My GATE application might return several annotations for the same group of
words (because I'm using an ontology). For example, I will have an
Animal annotation that marks the words "cat", "catfish" and "eider" as
Animal(s), and (depending on the ontology used) the "cat" annotation will
have 2 features: Animal.class=mammal and Animal.class=cat; the "catfish"
annotation will have 1 feature, Animal.class=fish; and the more specific
term "eider" will have 2 features: Animal.class=bird and Animal.class=duck.

I don't want 1 Solr "document" for each animal; I really want 1 index
entry for each actual document. I'd like to be able to query my Solr index
for "bird" and get all the documents containing the term "bird", or any
subclass or instance (like "duck" or "eider"). Since all the words "bird",
"duck" and "eider" appearing in my documents will be tagged as Animal and
there will be an annotation with Animal.class=bird, it is easy to get Solr
to return the right documents.

But since I get something like:

<result>
  <doc>
    <str name="id">hdfs://...</str>
    <arr name="animal">
      <str>cat</str>
      <str>cat</str>
      <str>catfish</str>
      <str>eider</str>
      <str>eider</str>
    </arr>
    <arr name="class">
      <str>mammal</str>
      <str>cat</str>
      <str>fish</str>
      <str>bird</str>
      <str>duck</str>
    </arr>
    <arr name="instance">
      <str>http://.../Animal#catfish</str>
      <str>http://.../Animal#eider</str>
      <str>http://.../Animal#eider</str>
    </arr>
  </doc>
  <doc>
       ...
  </doc>
  <doc>
       ...
  </doc>
</result>

... when I want to generate a snippet of the document and highlight the
terms that made Solr return the document (for example, when the first
document contains "eider" and the user is searching for "bird"), I'd
like to highlight the term "eider" in the snippet, but I don't know how to
do that. Having a correspondence between my Solr "animal" and "class"
fields (for example, an id attribute that would link them: <str
id="5">eider</str> and the same id for the class "bird") would make it
easier to highlight my term "eider".
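
To illustrate the kind of correspondence I mean, here is a minimal sketch in
Python. It assumes the values of "animal" and "class" line up position by
position, as they happen to in the response above; I have not confirmed that
Solr guarantees this. The function name terms_for_class and the hard-coded
lists are hypothetical, just for illustration.

```python
def terms_for_class(animals, classes, wanted):
    """Return the distinct animal terms whose paired class equals `wanted`.

    Assumption: animals[i] and classes[i] describe the same annotation.
    """
    matches = []
    for term, cls in zip(animals, classes):
        # Keep each term once, in order of first appearance.
        if cls == wanted and term not in matches:
            matches.append(term)
    return matches


# Values copied from the first <doc> of the response above.
animals = ["cat", "cat", "catfish", "eider", "eider"]
classes = ["mammal", "cat", "fish", "bird", "duck"]

# Terms to highlight when the user searches for "bird".
print(terms_for_class(animals, classes, "bird"))  # prints ['eider']
```

With something like this, the client could pass the returned terms to whatever
builds the snippet, but it only works if the pairing between the two fields is
reliable, which is exactly what I'm missing.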

What do you think?

Thanks!
Jim
