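Hi Pratik,
Shingle filter should do that: ShingleFilter reads several tokens from the previous filter and emits a different number of tokens, so its source shows how more than one token can be consumed.

Emir
--
Monitoring - Log Management - Alerting - Anomaly Detection
Solr & Elasticsearch Consulting Support Training - http://sematext.com/
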
> On 10 Feb 2020, at 18:57, Pratik Patel <pra...@semandex.net> wrote:
>
> Thanks for the reply Emir.
>
> I will be exploring the option of creating a custom filter. It's good to
> know that we can consume more than one token from the previous filter and
> emit a different number of tokens. Do you know of any existing filter in
> Solr which does something similar? It would be greatly helpful to see how
> more than one token can be consumed. I can implement my custom logic once
> I have access to multiple tokens from the previous filter.
>
> Thanks
> Pratik
>
> On Mon, Feb 10, 2020 at 2:47 AM Emir Arnautović <
> emir.arnauto...@sematext.com> wrote:
>
>> Hi Pratik,
>> You might be able to do some of the required things using
>> PatternReplaceCharFilter, but as you can see it does not operate at the
>> token level but on the input string. Your best bet is a custom token
>> filter. Not sure how familiar you are with how token filters work, but
>> you have access to tokens from the previous filter and you can implement
>> any logic you want: you consume three tokens and emit tokens based on
>> adjacent tokens.
>>
>> HTH,
>> Emir
>>
>>> On 7 Feb 2020, at 19:27, Pratik Patel <pra...@semandex.net> wrote:
>>>
>>> Hello Everyone,
>>>
>>> Let's say I have an analyzer which has the following token stream as
>>> its output.
>>>
>>> *token stream : [], a, ab, [], c, [], d, de, def .....*
>>>
>>> Now let's say I want to add another filter which will drop certain
>>> tokens based on whether the adjacent token on the right side is [] or
>>> some string.
>>>
>>> for a given token,
>>>        drop it (or replace it with an empty string) if there is a
>>>        non-empty string token on its right and
>>>        keep it if there is an empty string token on its right.
>>>
>>> Based on this, the resulting token stream would be like this.
>>>
>>> *desired output stream : [], [a]<dropped>, ab, [], c, [], d<dropped>,
>>> de<dropped>, def *
>>>
>>> *Is there any Filter available in Solr with which this can be achieved?*
>>> *If writing a custom filter is the only possible option, then I want to
>>> know whether it's possible to access adjacent tokens in the custom filter?*
>>>
>>> *Any idea about this would be really helpful.*
>>>
>>> Thanks,
>>> Pratik
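
The rule from the original question (blank out a token when the token on its right is non-empty, keep it when the token on its right is empty) only needs a one-token lookahead. Below is a minimal sketch of that buffering logic in plain Java, with no Lucene types, so it is not a drop-in Solr filter. The class name LookaheadFilterSketch and the helper filter() are hypothetical, tokens are modelled as plain strings with "" standing in for [], and keeping the last token of the stream is an assumption, since the thread does not say what should happen at the end. In a real Lucene TokenFilter the same idea would presumably live in incrementToken(), holding the read-ahead token with captureState() and restoreState().

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;

/** Sketch only: pull-based filter with a one-token lookahead (hypothetical name). */
final class LookaheadFilterSketch implements Iterator<String> {
    private final Iterator<String> input; // stands in for the previous filter
    private String pending;               // token already read but not yet emitted
    private boolean hasPending;

    LookaheadFilterSketch(Iterator<String> input) {
        this.input = input;
        if (input.hasNext()) {
            pending = input.next();
            hasPending = true;
        }
    }

    @Override
    public boolean hasNext() {
        return hasPending;
    }

    @Override
    public String next() {
        if (!hasPending) {
            throw new NoSuchElementException();
        }
        String current = pending;
        if (input.hasNext()) {
            // Read one token ahead so the right-hand neighbour can be inspected.
            pending = input.next();
            if (!pending.isEmpty()) {
                // Right neighbour is a non-empty string: replace current with "".
                current = "";
            }
        } else {
            // Assumption: the last token has no right neighbour and is kept.
            hasPending = false;
        }
        return current;
    }

    /** Convenience helper (hypothetical): run the sketch over a whole list. */
    static List<String> filter(List<String> tokens) {
        List<String> out = new ArrayList<>();
        new LookaheadFilterSketch(tokens.iterator()).forEachRemaining(out::add);
        return out;
    }

    public static void main(String[] args) {
        // The token stream from the question, with "" in place of [].
        List<String> in = Arrays.asList("", "a", "ab", "", "c", "", "d", "de", "def");
        System.out.println(filter(in));
    }
}
```

For the stream in the question this leaves only ab, c and def as non-empty tokens, with every other position blanked. Dropping tokens instead of blanking them would mean looping inside next() until a token to keep is found; the sketch replaces them with empty strings so that token positions stay aligned with the input.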