Prioritizing search results in Solr that start with the search string

2020-02-09 Thread Yirmiyahu Fischer
Posted a bounty on the question
https://stackoverflow.com/questions/59811749/prioritizing-search-results-in-solr-that-start-with-the-search-string
Would appreciate some input.


Re: Prioritizing search results in Solr that start with the search string

2020-02-09 Thread Erick Erickson
Have you thought about EdgeNGramTokenFilter? You’d have to use KeywordTokenizer
on a text type rather than a string field. The idea would be to enforce some
reasonable length limit on the input, say 16 characters (straw man). This is on
the _index_ side.

Then for querying, do not use EdgeNGramTokenFilter; rather, use
TruncateTokenFilter with the same limit as above and boost the heck out of it.
This’ll be significantly more performant since it’s a simple token match,
though it also makes your index larger.
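
A rough sketch of the idea above, in plain Python rather than Solr configuration: it only simulates what the two analysis chains would produce, to show why a "starts with" check becomes a single exact token match. The function names, the lowercasing step, and the sample titles are assumptions made for illustration; the 16-character limit is the straw-man value from the message.

```python
MAX_LEN = 16  # straw-man length limit from the message above


def index_tokens(value, max_len=MAX_LEN):
    """Index side: treat the whole value as one token (keyword tokenizer),
    lowercase it (an assumption), and emit its edge n-grams of length
    1..max_len, i.e. every prefix up to the limit."""
    whole = value.lower()
    return {whole[:n] for n in range(1, min(len(whole), max_len) + 1)}


def query_token(query, max_len=MAX_LEN):
    """Query side: one token for the whole input, lowercased and truncated
    to the same limit. No n-grams are generated here."""
    return query.lower()[:max_len]


def starts_with_match(value, query):
    """True when the truncated query equals one of the indexed prefixes."""
    return query_token(query) in index_tokens(value)


if __name__ == "__main__":
    for title in ["Solr in Action", "Apache Solr"]:
        print(title, starts_with_match(title, "solr"))
```

Because the query is truncated to the same limit as the indexed prefixes, queries longer than the limit still match on their first 16 characters. The extra index size comes from storing up to 16 tokens per field value.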

Best,
Erick

> On Feb 9, 2020, at 5:04 AM, Yirmiyahu Fischer wrote:
> 
> Posted a bounty on the question
> https://stackoverflow.com/questions/59811749/prioritizing-search-results-in-solr-that-start-with-the-search-string
> Would appreciate some input.



Re: Solr Analyzer : Filter to drop tokens based on some logic which needs access to adjacent tokens

2020-02-09 Thread Emir Arnautović
Hi Pratik,
You might be able to do some of the required things using
PatternReplaceCharFilter, but as you can see it operates on the input string
rather than at the token level. Your best bet is a custom token filter. Not
sure how familiar you are with how token filters work, but you have access to
tokens from the previous filter and you can implement any logic you want: you
consume three tokens and emit tokens based on adjacent tokens.
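
A minimal sketch of that lookahead logic, in plain Python rather than as a Lucene TokenFilter: it buffers one token so that each token can be emitted only after its right-hand neighbour is known. Representing empty tokens as "" and keeping the last token unchanged are assumptions, since the question does not say what should happen at the end of the stream. A real custom filter would do the same buffering inside its incrementToken() method.

```python
def blank_before_nonempty(tokens):
    """Yield each token, replaced by "" when the token to its right is
    non-empty, and unchanged when the token to its right is empty."""
    it = iter(tokens)
    try:
        current = next(it)
    except StopIteration:
        return  # empty stream: nothing to emit
    for nxt in it:
        # current can only be emitted now that its right neighbour is known
        yield "" if (current and nxt) else current
        current = nxt
    # Assumption: the final token has no right neighbour and is kept as-is.
    yield current


if __name__ == "__main__":
    stream = ["", "a", "ab", "", "c", "", "d", "de", "def"]
    print(list(blank_before_nonempty(stream)))
```

Only one token of lookahead is needed for the rule as stated in the question; a rule that depends on both neighbours would keep the previous token as well.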

HTH,
Emir
--
Monitoring - Log Management - Alerting - Anomaly Detection
Solr & Elasticsearch Consulting Support Training - http://sematext.com/



> On 7 Feb 2020, at 19:27, Pratik Patel wrote:
> 
> Hello Everyone,
> 
> Let's say I have an analyzer which has following token stream as an output.
> 
> *token stream : [], a, ab, [], c, [], d, de, def .*
> 
> Now let's say I want to add another filter which will drop certain tokens
> based on whether the adjacent token on the right side is [] or some string.
> 
> For a given token,
> drop it/replace it by an empty string if there is a non-empty string
> token on its right, and
> keep it if there is an empty string token on its right.
> 
> based on this, the resulting token stream would be like this.
> 
> *desired output stream : [], [a], ab, [], c, [], d,
> de, def *
> 
> 
> *Is there any Filter available in solr with which this can be achieved?*
> *If writing a custom filter is the only possible option, then I want to know
> whether it's possible to access adjacent tokens in the custom filter?*
> 
> *Any idea about this would be really helpful.*
> 
> Thanks,
> Pratik