Re: Reusable tokenstream

2017-11-23 Thread Roxana Danger
That's great!! Got it. Thank you very much.

On Wed, Nov 22, 2017 at 5:07 PM, Emir Arnautović <emir.arnauto...@sematext.com> wrote:
> Hi Roxana,
> The idea with the update request processor is to have the following parameters:
> * inputField - document field with text to analyse
> * sharedAnalysis - fie

Re: Reusable tokenstream

2017-11-22 Thread Emir Arnautović
Hi Roxana,
The idea with the update request processor is to have the following parameters:
* inputField - document field with the text to analyse
* sharedAnalysis - field type with the shared analysis definition
* targetFields - comma-separated list of fields where the results should be stored
* fieldSpecificAnalysis
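
To make the parameter list above more concrete, here is a minimal plain-Java sketch of the "analyze once, derive several fields" idea. It does not use the Solr or Lucene APIs. The class name, the `process` method, the toy lowercase-and-split analysis, and the field names `text` and `tokens_nostop` are all assumptions made up for illustration. The per-field steps stand in for whatever field-specific analysis each target field needs. A real update request processor would run the field type's analyzer from the schema instead.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.function.UnaryOperator;
import java.util.stream.Collectors;

public class SharedAnalysisSketch {

    // Hypothetical stand-in for the shared analysis: lowercase the text
    // and split it on anything that is not a letter or a digit.
    public static List<String> sharedAnalysis(String text) {
        List<String> tokens = new ArrayList<>();
        for (String t : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+")) {
            if (!t.isEmpty()) {
                tokens.add(t);
            }
        }
        return tokens;
    }

    // Runs the shared analysis once on inputField, then applies each
    // target field's own (hypothetical) step to the same token list and
    // stores the result as a space-joined string in that target field.
    public static Map<String, String> process(
            Map<String, String> doc,
            String inputField,
            Map<String, UnaryOperator<List<String>>> targetFields) {
        List<String> shared =
                Collections.unmodifiableList(sharedAnalysis(doc.get(inputField)));
        Map<String, String> out = new LinkedHashMap<>(doc);
        for (Map.Entry<String, UnaryOperator<List<String>>> e : targetFields.entrySet()) {
            List<String> fieldTokens = e.getValue().apply(shared);
            out.put(e.getKey(), String.join(" ", fieldTokens));
        }
        return out;
    }

    public static void main(String[] args) {
        Map<String, String> doc = new LinkedHashMap<>();
        doc.put("text", "The Quick, brown fox");

        Set<String> stopWords = new HashSet<>(Arrays.asList("the"));
        Map<String, UnaryOperator<List<String>>> targets = new LinkedHashMap<>();
        // First target keeps the shared tokens unchanged.
        targets.put("tokens", ts -> ts);
        // Second target adds one extra filter on top of the shared tokens.
        targets.put("tokens_nostop", ts -> ts.stream()
                .filter(t -> !stopWords.contains(t))
                .collect(Collectors.toList()));

        Map<String, String> result = process(doc, "text", targets);
        System.out.println(result.get("tokens"));        // the quick brown fox
        System.out.println(result.get("tokens_nostop")); // quick brown fox
    }
}
```

The point of the sketch is only that the expensive shared step runs once per document and every target field starts from its output. How the resulting values are then indexed is outside what this example covers.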

Re: Reusable tokenstream

2017-11-22 Thread Roxana Danger
Mikhail,
Yes, I've just seen your message:

"Hello, Roxana. You are probably looking for TeeSinkTokenFilter, but I believe the idea is cumbersome to implement in Solr. Also, there is a preanalyzed field which can keep a tokenstream in external form."

This is the answer I was looking for. Thanks a lot.

Re: Reusable tokenstream

2017-11-22 Thread Roxana Danger
Hi Emir,
In this case I need more control at the Lucene level, so I have to use the Lucene IndexWriter directly and therefore cannot use Solr for importing. Or is there any way I can add a tokenstream to a SolrInputDocument (is there any other class exposed by Solr during indexing that I can use for this p

Re: Reusable tokenstream

2017-11-22 Thread Mikhail Khludnev
Roxana,
Have you seen my response in the "tokenstream reusable" thread? reusableTokenStream doesn't help you. TokenStream is stateless, it holds the attributes

Re: Reusable tokenstream

2017-11-22 Thread Emir Arnautović
Hi Roxana,
I think you can use https://lucene.apache.org/core/5_4_0/analyzers-common/org/apache/lucene/analysis/sinks/TeeSinkTokenFilter.html as suggested earlier.
HTH,
Emir

Re: Reusable tokenstream

2017-11-22 Thread Roxana Danger
Hi Emir,
Many thanks for your reply. The UpdateProcessor can do this work, but is analyzer.reusableTokenStream the way to obtain a previously generated tokens

Re: Reusable tokenstream

2017-11-22 Thread Emir Arnautović
Hi Roxana,
I don't think that it is possible. In some cases (yours seems like a good fit) you could create a custom update request processor that would do the shared analysis (you can have it defined in the schema) and, after analysis, use those tokens to create new values for those two fields and remo

Reusable tokenstream

2017-11-22 Thread Roxana Danger
Hello all,
I would like to reuse the tokenstream generated for one field to create a new tokenstream for another field (adding a few filters to the available tokenstream), without running the whole analysis again. The particular application is:
- I have a field *tokens* that uses an