Re: partial search help request

Philip Smith Wed, 05 Aug 2020 05:08:48 -0700

Hello,
I've had a breakthrough with my partial string search problem, though I don't
understand why.


I found yet another example,
https://medium.com/aubergine-solutions/partial-string-search-in-apache-solr-4b9200e8e6bb
which uses a different tokenizer, WhitespaceTokenizerFactory, together with
NGramFilterFactory:

<fieldType name="text_ngrm" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.NGramFilterFactory" minGramSize="1" maxGramSize="50"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>

The analysis results look very different, and so far the search returns the
desired results.
[image: image.png]
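A rough way to see why this field type behaves differently is to simulate its two analyzers in plain Python. This is an approximation, not Lucene's code, and the title below is hypothetical. Unlike EdgeNGramFilterFactory, NGramFilterFactory indexes every substring of each word (not just the leading ones), so a partial word matches wherever it occurs inside a word, at the cost of many more indexed terms. Note also that this chain has no stemmer.

```python
# Rough simulation of the text_ngrm analyzers in plain Python (not Lucene itself).
# Index side: whitespace split -> NGram (1..50 chars) -> lowercase.
# Query side: whitespace split -> lowercase.


def ngrams(token, min_gram=1, max_gram=50):
    """Every substring of token whose length is between min_gram and max_gram."""
    grams = set()
    for start in range(len(token)):
        for length in range(min_gram, max_gram + 1):
            end = start + length
            if end > len(token):
                break
            grams.add(token[start:end])
    return grams


def index_terms(text):
    terms = set()
    for token in text.split():  # WhitespaceTokenizer: split on whitespace only
        terms |= {gram.lower() for gram in ngrams(token)}  # NGram, then LowerCase
    return terms


def query_terms(text):
    return [token.lower() for token in text.split()]


def matches(title, query):
    # Simplification: every query term must appear among the indexed terms.
    indexed = index_terms(title)
    return all(term in indexed for term in query_terms(query))


title = "Education Week"  # hypothetical title, not from the real index
for q in ["edu", "educa", "educatio", "education", "cation"]:
    print(q, matches(title, q))  # all True: infixes such as "cation" match too
```

The simplification in `matches` (all query terms must be present) ignores edismax's scoring and mm handling; it only shows which terms exist in the index.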

I don't understand why the other examples, which worked for other people,
weren't working for me. Is it something to do with version 8?
StandardTokenizerFactory didn't work, and with KeywordTokenizerFactory the
search didn't even match the full search term.
If anyone can shed any light on this, I'd be grateful.
Thanks.


On Wed, Aug 5, 2020 at 7:12 PM Philip Smith <phi...@keep.edu.hk> wrote:

> Hello,
> I'm new to Solr and to this user group. Any help with this problem
> would be greatly appreciated.
>
> I'm trying to get partial keyword search working. This seems like a fairly
> common problem, and I've found numerous Google results offering solutions,
> for instance
> https://stackoverflow.com/questions/28753671/how-to-configure-solr-to-do-partial-word-matching
> but when I attempt to implement them I don't get the desired results.
>
> I'm running Solr 8.5.2 in standalone mode and editing the configs manually.
>
> I have configured the title field as
>
> <field name="title" type="edge_ngram_test_5" indexed="true" stored="true"
> multiValued="false"/>
>
> I have also tried it with the parameter omitTermFreqAndPositions="true".
>
> The field type definition is:
>
> <fieldType name="edge_ngram_test_5" class="solr.TextField" omitNorms="false">
>   <analyzer type="index">
>     <tokenizer class="solr.StandardTokenizerFactory"/>
>     <filter class="solr.StopFilterFactory" words="stopwords.txt"/>
>     <filter class="solr.LowerCaseFilterFactory"/>
>     <filter class="solr.PorterStemFilterFactory"/>
>     <filter class="solr.EdgeNGramFilterFactory" minGramSize="2" maxGramSize="35"/>
>   </analyzer>
>   <analyzer type="query">
>     <tokenizer class="solr.StandardTokenizerFactory"/>
>     <filter class="solr.StopFilterFactory" words="stopwords.txt"/>
>     <filter class="solr.LowerCaseFilterFactory"/>
>     <filter class="solr.PorterStemFilterFactory"/>
>   </analyzer>
> </fieldType>
>
> I'm using edismax and searching on title.
>
>
> http://localhost:8983/solr/events/select?defType=edismax&df=title&fl=title&q=educatio
>
> When using edge_ngram_test_5:
>
>   edu         correctly finds 4 results
>   educa       finds 0
>   educat      finds 0
>   educati     finds 0
>   educatio    finds 0
>   education   correctly finds 4
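One plausible explanation for this pattern, sketched as a plain-Python simulation (not Lucene): in edge_ngram_test_5 the PorterStemFilterFactory runs before EdgeNGramFilterFactory at index time, so the n-grams are built from the stem rather than from the full word. The stem values are hard-coded here as an assumption; to my understanding the Porter stemmer reduces "education" to "educ" and leaves the partial words unchanged, which the Analysis screen can confirm. Stop-word handling is ignored.

```python
# Simulation (not Lucene) of edge_ngram_test_5 for single-word titles and queries.
# ASSUMPTION: Porter stems are hard-coded, not computed. The belief is that
# PorterStemFilter maps "education" -> "educ" and leaves the partial words as-is.
ASSUMED_PORTER_STEMS = {"education": "educ"}  # hypothetical lookup table


def porter_stem(token):
    return ASSUMED_PORTER_STEMS.get(token, token)  # unknown words pass through


def edge_ngrams(token, min_gram=2, max_gram=35):
    # EdgeNGramFilterFactory: leading substrings of length min_gram..max_gram
    return {token[:n] for n in range(min_gram, min(max_gram, len(token)) + 1)}


def index_terms(word):
    # Index chain: lowercase -> stem -> edge n-grams (the stem is what gets n-grammed)
    return edge_ngrams(porter_stem(word.lower()))


def query_term(word):
    # Query chain: lowercase -> stem, with no n-gramming
    return porter_stem(word.lower())


indexed = index_terms("Education")
print(sorted(indexed))  # ['ed', 'edu', 'educ']
for q in ["edu", "educa", "educat", "educati", "educatio", "education"]:
    term = query_term(q)
    print(q, "->", term, term in indexed)
```

Under that assumption the sketch reproduces the reported results: only "edu" (a prefix of the stem) and "education" (which stems back to "educ") find anything. It would also be consistent with text_ngrm working, since that chain has no stemmer; removing PorterStemFilterFactory from the n-gram field type seems the first thing to try.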
>
> Steps taken after each change to the schema:
> 1. bin/solr restart
> 2. reimport data
> 3. Core Admin > Reload core
>
> In the admin UI, when I check the schema, I see the correct value for the
> title field: Type edge_ngram_test_5.
>
> In the admin Analysis screen, when I analyse the text:
>
> [image: image.png]
> it appears to be breaking the word down into letters, which I would guess
> is the correct step.
>
> These are the query results:
> [image: image.png]
>
> It looks like the correct filters are being applied and the search term
> isn't being altered. I don't understand enough to determine why the query
> can't find the document when the term appears to have been indexed. Any
> advice is very welcome, as I've spent hours trying to get this working.
>
>
> I've also tried with:
> <fieldType name="edge_n2_kw_text" class="solr.TextField" omitNorms="true"
>     positionIncrementGap="100">
>   <analyzer type="index">
>     <tokenizer class="solr.KeywordTokenizerFactory"/>
>     <filter class="solr.LowerCaseFilterFactory"/>
>     <filter class="solr.EdgeNGramFilterFactory" minGramSize="2" maxGramSize="25"/>
>   </analyzer>
>   <analyzer type="query">
>     <tokenizer class="solr.KeywordTokenizerFactory"/>
>     <filter class="solr.LowerCaseFilterFactory"/>
>   </analyzer>
> </fieldType>
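With KeywordTokenizerFactory the whole title becomes a single token, so the edge n-grams are prefixes of the entire title rather than of each word. A plain-Python sketch (not Lucene) of edge_n2_kw_text, using hypothetical titles, shows two consequences: a word in the middle of a title is never matched, and a title longer than maxGramSize="25" is never indexed in full. Whether a multi-word query reaches the query analyzer as one string depends on edismax's sow (split on whitespace) setting, so the query side here is an assumption.

```python
# Simulation (not Lucene) of edge_n2_kw_text.
# Index: KeywordTokenizer (whole title = one token) -> LowerCase -> EdgeNGram 2..25
# Query: KeywordTokenizer -> LowerCase (assumes the query reaches it as one string)


def edge_ngrams(token, min_gram=2, max_gram=25):
    return {token[:n] for n in range(min_gram, min(max_gram, len(token)) + 1)}


def index_terms(title):
    return edge_ngrams(title.lower())


def query_term(query):
    return query.lower()


title = "Annual Education Fair"  # hypothetical title
print(query_term("annual edu") in index_terms(title))  # True: prefix of the whole title
print(query_term("education") in index_terms(title))   # False: not a prefix of the title

long_title = "International Education Week 2020"  # hypothetical, 33 characters
print(query_term(long_title) in index_terms(long_title))  # False: longest gram is 25 chars
```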
>
> <fieldType name="text_edgengram_prod" class="solr.TextField"
>     positionIncrementGap="100">
>   <analyzer type="index">
>     <tokenizer class="solr.KeywordTokenizerFactory"/>
>     <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>
>     <filter class="solr.PorterStemFilterFactory"/>
>     <filter class="solr.EdgeNGramFilterFactory" minGramSize="1" maxGramSize="30"/> <!-- RDH - removed side="front"-->
>   </analyzer>
>   <analyzer type="query">
>     <tokenizer class="solr.KeywordTokenizerFactory"/>
>     <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>
>     <filter class="solr.PorterStemFilterFactory"/>
>     <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
>   </analyzer>
> </fieldType>
>
>
> <fieldType name="edge_ngram_test_4" class="solr.TextField"
>     positionIncrementGap="100">
>   <analyzer type="index">
>     <tokenizer class="solr.KeywordTokenizerFactory"/>
>     <filter class="solr.SnowballPorterFilterFactory" language="English"/>
>     <filter class="solr.EdgeNGramFilterFactory" minGramSize="1" maxGramSize="25"/>
>   </analyzer>
>   <analyzer type="query">
>     <tokenizer class="solr.KeywordTokenizerFactory"/>
>   </analyzer>
> </fieldType>
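Neither text_edgengram_prod nor edge_ngram_test_4 includes a LowerCaseFilterFactory, and indexed terms are matched case-sensitively, so with capitalized titles the n-grams keep their capital letters while a lowercase query does not. A plain-Python sketch (not Lucene) with a hypothetical title; stemming and stop words are deliberately left out, on the assumption that the stemmers do not change letter case.

```python
# Simulation (not Lucene): KeywordTokenizer -> EdgeNGram with NO LowerCaseFilter.
# Stemming and stop words are omitted; only letter case is examined here.


def edge_ngrams(token, min_gram=1, max_gram=25):
    return {token[:n] for n in range(min_gram, min(max_gram, len(token)) + 1)}


indexed = edge_ngrams("Education Week")  # hypothetical stored title
print("Educ" in indexed)       # True: grams keep the capital "E"
print("educ" in indexed)       # False: a lowercase query term finds nothing
print("education" in indexed)  # False, even for the full word
```

If this is what happens, adding LowerCaseFilterFactory to both analyzers, as edge_n2_kw_text does, would remove this particular mismatch.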
>
>
> Thanks in advance for any insights offered.
> Kind regards,
> Phil.
>
