Hi there,

I’m developing a custom Java application with Lucene 8.5.0.

I've tried to use DelimitedBoostTokenFilterFactory but I have a problem, so
please help me if I'm doing something wrong.

I’m using StandardAnalyzer for search, and my SynonymGraphFilter is
configured as below:

Map<String, String> synonymParam = new HashMap<>();
synonymParam.put("synonyms", synonymFileName);
synonymParam.put("ignoreCase", "true");
synonymParam.put("format", "solr");
synonymParam.put("expand", "true");
synonymParam.put("tokenizerFactory", "org.apache.lucene.analysis.core.StandardTokenizerFactory");

Map<String, String> delimitedBoostTokenFilterMap = new HashMap<>();
delimitedBoostTokenFilterMap.put("delimiter", "|");

Analyzer customAnalyzer = CustomAnalyzer.builder(Paths.get(synonymFolder))
        .withTokenizer(StandardTokenizerFactory.NAME)
        .addTokenFilter(SynonymGraphFilterFactory.NAME, synonymParam)
        .addTokenFilter(DelimitedBoostTokenFilterFactory.NAME, delimitedBoostTokenFilterMap)
        .build();


Here’s my debug output:

Query:  +spanOr([spanNear([morphology_term_original_name:tumor,
morphology_term_original_name:0.8], 0, true),
spanNear([morphology_term_original_name:neoplasm,
morphology_term_original_name:0.7], 0, true),
spanNear([morphology_term_original_name:tumour,
morphology_term_original_name:0.6], 0, true)])
(spanOr([spanNear([morphology_term_pathognomonic:tumor,
morphology_term_pathognomonic:0.8], 0, true),
spanNear([morphology_term_pathognomonic:neoplasm,
morphology_term_pathognomonic:0.7], 0, true)

The problem seems to be in using StandardTokenizerFactory. It doesn't
handle the delimiter character properly, so each synonym and its boost value
end up as two separate tokens (see the spanNear clauses above). Because of
that the final query is wrong: it has no boost factors and looks like the
one I already described in my previous mail. It should look like this:

+Synonym(morphology_term_original_name_key:neoplasm^0.7
morphology_term_original_name_key:tumor^0.8
morphology_term_original_name_key:tumour^0.6)

A synonym row looks like this:

neoplasm|0.7, tumor|0.8, tumour|0.6
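
To make the intent explicit, here is a minimal plain-Java sketch of how I
expect each entry of that row to be read: the part before the delimiter is
the term and the part after it is the boost. It uses no Lucene classes and
is not how DelimitedBoostTokenFilter is implemented; the class name
SynonymRowSketch and the methods parseRow and describe are made up for
illustration only.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Illustration only: hypothetical helper, not part of Lucene.
public class SynonymRowSketch {

    // Splits a row such as "neoplasm|0.7, tumor|0.8" into term -> boost pairs,
    // keeping the order in which the entries appear in the row.
    static Map<String, Float> parseRow(String row, char delimiter) {
        Map<String, Float> boosts = new LinkedHashMap<>();
        for (String entry : row.split(",")) {
            String trimmed = entry.trim();
            if (trimmed.isEmpty()) {
                continue;
            }
            int idx = trimmed.lastIndexOf(delimiter);
            if (idx < 0) {
                // No delimiter: assume a neutral boost of 1.0 (my assumption).
                boosts.put(trimmed, 1.0f);
            } else {
                String term = trimmed.substring(0, idx);
                float boost = Float.parseFloat(trimmed.substring(idx + 1));
                boosts.put(term, boost);
            }
        }
        return boosts;
    }

    // Renders the pairs in the same shape as the query I expect to get.
    static String describe(String field, Map<String, Float> boosts) {
        StringBuilder sb = new StringBuilder("Synonym(");
        boolean first = true;
        for (Map.Entry<String, Float> e : boosts.entrySet()) {
            if (!first) {
                sb.append(' ');
            }
            sb.append(field).append(':').append(e.getKey())
              .append('^').append(e.getValue());
            first = false;
        }
        return sb.append(')').toString();
    }

    public static void main(String[] args) {
        Map<String, Float> boosts =
                parseRow("neoplasm|0.7, tumor|0.8, tumour|0.6", '|');
        System.out.println(
                describe("morphology_term_original_name_key", boosts));
    }
}
```

In other words, "tumor|0.8" should stay together as one unit until the boost
filter sees it, instead of arriving as the two tokens "tumor" and "0.8".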


I also tried rows like the ones below, but the result is the same:


neoplasm => neplasm|0.7, tumor|0.8, tumour|0.6

tumor => neplasm|0.7, tumor|0.8, tumour|0.6

tumour => neplasm|0.7, tumor|0.8, tumour|0.6




Thanks in advance!
