RE: E-Commerce Search: tf-idf, tie-break and boolean model

2017-10-17 Thread Junte Zhang
My take on e-commerce search. Similarity matching with a vector-space, probabilistic, or Boolean ranking model matters less here than in web search or other domains with full-text search. The reason is the content. Usually very short texts, highly structured, and often not

RE: tf function query

2017-10-05 Thread Junte Zhang
I am afraid this is not possible: you cannot get frequencies for phrases unless the phrases are created as tokens (i.e. using n-grams or shingles) and indexed. If someone has a solution for this, then I am interested as well. /JZ -Original Message- From: Dmitry Kan [mai
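To illustrate the point about shingles, here is a minimal sketch in plain Python (not Solr code) of why a phrase only gets a frequency once it exists as a token. The helper names `shingles` and `phrase_frequency` are made up for this example, and whitespace splitting stands in for a real analyzer.

```python
from collections import Counter


def shingles(tokens, size=2):
    """Return word n-grams (shingles) of the given size, each as one string."""
    return [" ".join(tokens[i:i + size]) for i in range(len(tokens) - size + 1)]


def phrase_frequency(text, phrase):
    """Count how often a phrase occurs in text by treating shingles as tokens.

    Assumption: lowercasing plus whitespace tokenization is good enough here.
    """
    tokens = text.lower().split()
    phrase_tokens = phrase.lower().split()
    counts = Counter(shingles(tokens, len(phrase_tokens)))
    return counts[" ".join(phrase_tokens)]


if __name__ == "__main__":
    doc = "new york is not the same as new york city"
    print(phrase_frequency(doc, "new york"))
```

The phrase is counted only because every two-word sequence was turned into a single token first, which is roughly what indexing shingles achieves.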

RE: multi language search engine in solr

2017-09-11 Thread Junte Zhang
Having the language already separated makes it a lot easier. You could add a language suffix (e.g. the three-letter ISO 639-2/B code, see https://en.wikipedia.org/wiki/List_of_ISO_639-1_codes) to each field that holds language-specific content. Or else you could have copied an entire field to their language
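A rough sketch of the suffix idea, assuming each document is in a single known language. The helper names and the `title` and `language` field names are hypothetical; only the ISO 639-2/B codes (`eng`, `ger`, `fre`) are real.

```python
def localized_field(base, lang_code):
    """Build a per-language field name, e.g. 'title' + 'ger' -> 'title_ger'."""
    return f"{base}_{lang_code}"


def build_document(doc_id, lang_code, fields):
    """Return a document dict whose text fields carry the language suffix.

    Assumption: the schema defines matching fields (or a dynamic field
    pattern per language) with a language-specific analyzer.
    """
    doc = {"id": doc_id, "language": lang_code}
    for name, value in fields.items():
        doc[localized_field(name, lang_code)] = value
    return doc


if __name__ == "__main__":
    print(build_document("1", "ger", {"title": "Suchmaschinen im Handel"}))
```

At query time the same suffix would select which fields to search, so each language is matched by its own analysis chain.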

RE: Search by similarity?

2017-08-25 Thread Junte Zhang
If you already have the title of the document, then you could run that title as a new query against the whole index and exclude the source document from the results as a filter. You could use the DisMax query parser: https://cwiki.apache.org/confluence/display/solr/The+DisMax+Query+Parser And
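The suggestion above could be sketched as the following request parameters. `q`, `defType`, `qf`, `fq` and `rows` are standard Solr parameters; the field names `title` and `id` and the helper name are assumptions made for this example, not something taken from the original thread.

```python
from urllib.parse import urlencode


def similar_by_title_params(title, source_id, query_fields="title", id_field="id", rows=10):
    """Build Solr parameters that run a title as a DisMax query and
    filter out the source document itself.
    """
    # Escape backslashes and quotes so the id can be used as a quoted term.
    safe_id = source_id.replace("\\", "\\\\").replace('"', '\\"')
    return {
        "q": title,                          # the title is the new query
        "defType": "dismax",                 # use the DisMax query parser
        "qf": query_fields,                  # fields to match against (assumed)
        "fq": f'-{id_field}:"{safe_id}"',    # exclude the source document
        "rows": rows,
    }


if __name__ == "__main__":
    params = similar_by_title_params("E-Commerce Search", "doc-42")
    # Query string to append to a select handler URL of your own core.
    print(urlencode(params))
```

Putting the exclusion in `fq` rather than in `q` keeps it out of scoring.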

RE: EdgeNGramFilterFactory did not work

2017-08-21 Thread Junte Zhang
You have to specify, in your request, the field for which you configured this analyzer. If you omit the field and fall back on the catch-all field, your filter factory is not applied. /JZ -Original Message- From: Guilleret Florian [mailto:guilleret.flor...@gmail.com] Sent: Thursday, August 1
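As a small illustration of naming the field in the request, the sketch below builds a fielded `q` parameter. The field name `name_edge` is hypothetical and stands for whichever field uses the edge n-gram analyzer in your schema.

```python
def fielded_query(field, term):
    """Return Solr parameters that target one field explicitly.

    Assumption: term is a single alphanumeric token, so no escaping of
    query syntax characters is needed; anything else is rejected.
    """
    if not term.isalnum():
        raise ValueError("this sketch only handles plain alphanumeric terms")
    return {"q": f"{field}:{term}"}


if __name__ == "__main__":
    # Searches the analyzed field instead of the catch-all default field.
    print(fielded_query("name_edge", "lap"))
```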