Re: Feed index with analyzer output

Lox Sat, 02 Jul 2011 09:57:01 -0700

Yes, from a utilitarian perspective you're absolutely right.
Mine is actually more of an academic exercise.


Let me be clearer about the steps I would like to take:
1) Call the Solr analyzer, which returns an XML response in the
following format (just a snippet as an example)

<lst name="attributeNames">
        <lst name="index">
         <lst name="incomingArc|1.6 outgoingArc|1.6">
          <arr name="org.apache.lucene.analysis.WhitespaceTokenizer">
                <lst>
                <str name="text">incomingArc|1.6</str>
                <str name="type">word</str>
                <int name="start">0</int>
                <int name="end">15</int>
                <int name="position">1</int>
                </lst>
                <lst>
                <str name="text">outgoingArc|1.6</str>
                <str name="type">word</str>
                <int name="start">16</int>
                <int name="end">31</int>
                <int name="position">2</int>
                </lst>
          </arr>
          <arr name="org.apache.lucene.analysis.payloads.DelimitedPayloadTokenFilter">
                <lst>
                <str name="text">incomingArc</str>
                <str name="type">word</str>
                <int name="start">0</int>
                <int name="end">15</int>
                <int name="position">1</int>
                <str name="payload">org.apache.lucene.index.Payload:org.apache.lucene.index.Payload@ffe807d2</str>
                </lst>
                <lst>

etc.....

2) Next, I would like to extract the information I need from that response
and tell Solr directly what to index, including which tokens carry which
payload, without any further analysis being performed.
I know that Solr does all of this internally, starting from the original
text, but is there a way to skip that phase by telling it up front, for a
given field, what the tokens and their payloads are? They would be
stored internally exactly as before, except that I would have performed the
two steps (analysis and indexing) in two separate phases, with my application
orchestrating both.
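
To make the extraction part of step 2 concrete, here is a minimal Python sketch, assuming the analysis response looks like the snippet above. The function name `extract_tokens` and the trimmed `SAMPLE` string are made up for illustration. Since the response only shows the payload as an object reference, the sketch recovers the value from the `term|payload` text emitted by the WhitespaceTokenizer stage.

```python
import xml.etree.ElementTree as ET

# Trimmed, well-formed sample modeled on the analysis response quoted above
# (illustrative only; the real response contains more stages and attributes).
SAMPLE = """
<lst name="index">
  <lst name="incomingArc|1.6 outgoingArc|1.6">
    <arr name="org.apache.lucene.analysis.WhitespaceTokenizer">
      <lst>
        <str name="text">incomingArc|1.6</str>
        <int name="position">1</int>
      </lst>
      <lst>
        <str name="text">outgoingArc|1.6</str>
        <int name="position">2</int>
      </lst>
    </arr>
  </lst>
</lst>
""".strip()


def extract_tokens(xml_text, stage_suffix="WhitespaceTokenizer", delimiter="|"):
    """Return (position, term, payload) tuples from one analysis stage.

    Hypothetical helper: it picks the <arr> whose name ends with
    stage_suffix and splits each token text on the payload delimiter.
    """
    root = ET.fromstring(xml_text)
    tokens = []
    for arr in root.iter("arr"):
        if not arr.get("name", "").endswith(stage_suffix):
            continue
        for lst in arr.findall("lst"):
            # Map each child's name attribute to its text content.
            fields = {child.get("name"): child.text for child in lst}
            term, _, payload = fields["text"].partition(delimiter)
            tokens.append(
                (int(fields["position"]), term, float(payload) if payload else None)
            )
    return tokens


if __name__ == "__main__":
    for position, term, payload in extract_tokens(SAMPLE):
        print(position, term, payload)
```

This covers only the parsing half; how to hand the resulting tokens to Solr without re-analysis is still the open question.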

I don't know whether building the documents with SolrJ could help... maybe
that's the way to go?
Or is there a particular XML format I can send to Solr? For example, something
like:

<add>
   <doc>
     <field name="id">0001</field>
     <field name="text">
         <rawValue>this is text</rawValue>
         <token pos="1" payload="2.0">this</token>
         <token pos="2" payload="1.0">is</token>
         <token pos="3" payload="2.5">text</token>
     </field>
   </doc>
</add>

Does it make sense? Or maybe I'm dreaming? :)
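
For comparison, here is a rough Python sketch of what I can already do with the regular `<add>` format: re-serialize the tokens as a `term|payload` string, so that a field analyzed with WhitespaceTokenizer plus DelimitedPayloadTokenFilter (as in the analysis output above) would rebuild the same tokens. It does not skip analysis, it only keeps it trivial. The helper names `delimited_value` and `build_add_xml` are hypothetical, and the assumption is that the `text` field is configured with that analyzer chain.

```python
import xml.etree.ElementTree as ET


def delimited_value(tokens, delimiter="|"):
    """Join (term, payload) pairs into a whitespace-separated string."""
    return " ".join(f"{term}{delimiter}{payload}" for term, payload in tokens)


def build_add_xml(doc_id, field_name, tokens):
    """Build a standard Solr <add> document with one delimited-payload field."""
    add = ET.Element("add")
    doc = ET.SubElement(add, "doc")
    id_field = ET.SubElement(doc, "field", name="id")
    id_field.text = doc_id
    field = ET.SubElement(doc, "field", name=field_name)
    field.text = delimited_value(tokens)
    return ET.tostring(add, encoding="unicode")


if __name__ == "__main__":
    # Same tokens and payloads as in the made-up XML example above.
    tokens = [("this", 2.0), ("is", 1.0), ("text", 2.5)]
    print(build_add_xml("0001", "text", tokens))
```

Token positions are implied by word order here, so this cannot express gaps or stacked tokens the way the hypothetical `<token pos="...">` elements could.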

Thank you for answering!


--
View this message in context: 
http://lucene.472066.n3.nabble.com/Feed-index-with-analyzer-output-tp3131771p3132556.html
Sent from the Solr - User mailing list archive at Nabble.com.
