Hi Pratik,
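ShingleFilter should do that: it reads several tokens from the previous
filter and emits shingles built from them, so its source shows how to
consume more than one token.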
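
In case it helps, here is a minimal plain-Java sketch of the lookahead rule
from your original mail, applied to a list of strings rather than a real
TokenStream. The class name LookaheadDropSketch and the method filter are
made up for illustration, and two things are assumptions on my side: empty
tokens always pass through, and the last token is kept because it has no
right neighbour. In an actual Lucene TokenFilter the same idea would mean
reading one token ahead in incrementToken() and buffering it with
captureState()/restoreState().

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration class, not part of Solr or Lucene.
class LookaheadDropSketch {

    // Drops a non-empty token when the token to its right is non-empty.
    // Assumptions: empty tokens always pass through, and the last token
    // is kept because it has no right neighbour.
    static List<String> filter(List<String> tokens) {
        List<String> out = new ArrayList<>();
        for (int i = 0; i < tokens.size(); i++) {
            String current = tokens.get(i);
            boolean hasNext = i + 1 < tokens.size();
            boolean nextIsNonEmpty = hasNext && !tokens.get(i + 1).isEmpty();
            if (!current.isEmpty() && nextIsNonEmpty) {
                continue; // right neighbour is a non-empty string: drop
            }
            out.add(current);
        }
        return out;
    }

    public static void main(String[] args) {
        // "" stands for the [] tokens in the example stream.
        List<String> input =
                Arrays.asList("", "a", "ab", "", "c", "", "d", "de", "def");
        System.out.println(filter(input));
    }
}
```

For the example stream this keeps ab, c and def plus the empty tokens, and
drops a, d and de.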

Emir
--
Monitoring - Log Management - Alerting - Anomaly Detection
Solr & Elasticsearch Consulting Support Training - http://sematext.com/



> On 10 Feb 2020, at 18:57, Pratik Patel <pra...@semandex.net> wrote:
> 
> Thanks for the reply Emir.
> 
> I will be exploring the option of creating a custom filter. It's good to
> know that we can consume more than one token from the previous filter and
> emit a different number of tokens. Do you know of any existing filter in
> Solr which does something similar? It would be very helpful to see how
> more than one token can be consumed. I can implement my custom logic once
> I have access to multiple tokens from the previous filter.
> 
> Thanks
> Pratik
> 
> On Mon, Feb 10, 2020 at 2:47 AM Emir Arnautović <
> emir.arnauto...@sematext.com> wrote:
> 
>> Hi Pratik,
>> You might be able to do some of the required things using
>> PatternReplaceCharFilter, but as you can see it operates on the input
>> string, not at the token level. Your best bet is a custom token filter.
>> Not sure how familiar you are with how token filters work, but you have
>> access to tokens from the previous filter and you can implement any logic
>> you want: for example, you consume three tokens and emit tokens based on
>> the adjacent tokens.
>> 
>> HTH,
>> Emir
>> 
>> 
>> 
>>> On 7 Feb 2020, at 19:27, Pratik Patel <pra...@semandex.net> wrote:
>>> 
>>> Hello Everyone,
>>> 
>>> Let's say I have an analyzer which has the following token stream as its
>>> output.
>>> 
>>> *token stream : [], a, ab, [], c, [], d, de, def .....*
>>> 
>>> Now let's say I want to add another filter which will drop certain tokens
>>> based on whether the adjacent token on the right side is [] or some string.
>>> 
>>> For a given token,
>>>    drop it (or replace it with an empty string) if there is a non-empty
>>>    string token on its right, and
>>>    keep it if there is an empty string token on its right.
>>> 
>>> Based on this, the resulting token stream would look like this.
>>> 
>>> *desired output stream : [], a<dropped>, ab, [], c, [], d<dropped>,
>>> de<dropped>, def *
>>> 
>>> 
>>> *Is there any filter available in Solr with which this can be achieved?*
>>> *If writing a custom filter is the only possible option, then I want to
>>> know whether it's possible to access adjacent tokens in the custom filter.*
>>> 
>>> *Any idea about this would be really helpful.*
>>> 
>>> Thanks,
>>> Pratik
>> 
>> 
