partial search help request

2020-08-05 Thread Philip Smith
Hello,
I'm new to Solr and to this user group. Any help with this problem would be
greatly appreciated.

I'm trying to get partial keyword search working. This seems like a
fairly common problem; I've found numerous Google results offering
solutions, for instance
https://stackoverflow.com/questions/28753671/how-to-configure-solr-to-do-partial-word-matching
but when I attempt to implement them I don't get the desired results.

I'm running Solr 8.5.2 in standalone mode, manually editing the configs.

I have configured the title field as



I have also tried it with the parameter omitTermFreqAndPositions="true".

The field type definition is:

I'm using edismax and searching on title.

http://localhost:8983/solr/events/select?defType=edismax&df=title&fl=title&q=educatio
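To see how Solr actually interprets each of these prefixes, one option is to append debug=query to the same request and read the parsedquery entry of the response (a technique Erick suggests later in the thread). The Python sketch below only builds such URLs and reads that entry from an already-decoded response; the helper names are hypothetical, it assumes the default JSON response format of Solr 8.x, and fetch_json needs a running Solr to be useful.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Core URL taken from the query above; adjust host, port and core as needed.
CORE_URL = "http://localhost:8983/solr/events"


def select_url(q, df="title", def_type="edismax", debug=True):
    """Build a /select URL; debug=query asks Solr to report how it parsed q."""
    params = {"defType": def_type, "df": df, "fl": df, "q": q}
    if debug:
        params["debug"] = "query"
    return f"{CORE_URL}/select?{urlencode(params)}"


def fetch_json(url):
    """Fetch and decode a Solr response (requires a running Solr)."""
    with urlopen(url) as resp:
        return json.load(resp)


def parsed_query(response):
    """Return the 'parsedquery' entry from a decoded debug response, if any."""
    return response.get("debug", {}).get("parsedquery")


print(select_url("educatio"))
# http://localhost:8983/solr/events/select?defType=edismax&df=title&fl=title&q=educatio&debug=query
```

Passing def_type="lucene" takes edismax out of the picture, which is the "fewer moving parts" simplification Erick recommends.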

When using edge_ngram_test_5:

edu        correctly finds 4 results
educa      finds 0
educat     finds 0
educati    finds 0
educatio   finds 0
education  correctly finds 4
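One explanation consistent with this pattern (and with the fix reported later in the thread, removing PorterStemFilter) is that the title is stemmed before being split into edge n-grams, while the query term is stemmed but not n-grammed. The Python sketch below models that under stated assumptions: the chain order, min_gram=1 and max_gram=35 are assumptions, since the actual field type definition did not survive, and the stemmer is a one-word lookup table standing in for Porter, not Lucene's filter.

```python
# Hypothetical model of the field type: at index time the title is
# lowercased, stemmed and then expanded into edge n-grams; at query time the
# term is lowercased and stemmed only. The real definition was not preserved.

# Stand-in for PorterStemFilter, covering only the words tested here: the
# Porter algorithm reduces "education" to "educ", while the truncated
# prefixes match none of its suffix rules and pass through unchanged.
PORTER_STEMS = {"education": "educ"}


def stem(token):
    return PORTER_STEMS.get(token, token)


def edge_ngrams(token, min_gram=1, max_gram=35):
    """Leading substrings of token, from min_gram up to max_gram characters."""
    return {token[:n] for n in range(min_gram, min(max_gram, len(token)) + 1)}


def matches(query, indexed_word, use_stemmer=True):
    """True if the analysed query term equals one of the indexed grams."""
    analyse = stem if use_stemmer else (lambda token: token)
    grams = edge_ngrams(analyse(indexed_word.lower()))
    return analyse(query.lower()) in grams


QUERIES = ["edu", "educa", "educat", "educati", "educatio", "education"]
for q in QUERIES:
    print(q, matches(q, "education"), matches(q, "education", use_stemmer=False))
```

With the stemmer, "education" is indexed only as e, ed, edu and educ, so only edu and education (which itself stems to educ) match, mirroring the 4/0/0/0/0/4 results above. Without the stemmer every prefix matches.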

Steps taken after each change to the schema:
bin/solr restart
reimport data
Core Admin > Reload core
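In standalone mode the reload step can also be scripted through the CoreAdmin API (action=RELOAD). A minimal Python sketch, assuming the default port, the events core and the default JSON response format; the helper names are hypothetical. Index-time analysis only applies to documents indexed after the new schema has been loaded, so the restart or reload needs to happen before the reimport; a reload on its own does not re-analyse documents that are already indexed.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen


def core_reload_url(core, base="http://localhost:8983/solr"):
    """CoreAdmin API request that reloads one core's config and schema."""
    return f"{base}/admin/cores?{urlencode({'action': 'RELOAD', 'core': core})}"


def reload_core(core):
    """Send the reload request to a running Solr and return the decoded JSON."""
    with urlopen(core_reload_url(core)) as resp:
        return json.load(resp)


print(core_reload_url("events"))
# http://localhost:8983/solr/admin/cores?action=RELOAD&core=events
```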

In the Admin UI's Schema screen, I see the correct value:
Type: edge_ngram_test_5

In the Admin UI's Analysis screen, when I run the analysis on the text:

[image: image.png]
it appears to break the word down letter by letter, which I would guess is
the correct step.

These are the query results:
[image: image.png]

It looks like the correct filters are being applied and the search term
isn't being altered. I don't understand enough to determine why the query
can't find the document when the term appears to have been indexed. Any
advice is very welcome, as I've spent hours trying to get this working.


I've also tried with:

Thanks in advance for any insights offered.
Kind regards,
Phil.


Re: partial search help request

2020-08-05 Thread Philip Smith
Hello,
I've had a breakthrough with my partial string search problem, though I
don't understand why.

I found yet another example,
https://medium.com/aubergine-solutions/partial-string-search-in-apache-solr-4b9200e8e6bb
which uses a different tokenizer, WhitespaceTokenizerFactory:

The analysis results look very different, and it seems to return the
desired results so far.
[image: image.png]

I don't understand why the examples that worked for other people didn't
work for me. Is it something in version 8? StandardTokenizerFactory didn't
work, and with KeywordTokenizerFactory it wasn't even matching the full
search term. If anyone can shed any light, I'd be grateful.
Thanks.


Re: partial search help request

2020-08-05 Thread Philip Smith
Great advice, Erick, much appreciated.

I removed PorterStemFilter as you suggested, and it now works as expected.
It was also very useful to learn about avoiding KeywordTokenizerFactory,
the limitations of WhitespaceTokenizer, and the testing approach.

Best,
Phil

On Wed, Aug 5, 2020 at 8:37 PM Erick Erickson wrote:

> First of all, lots of attachments are stripped by the mail server, so a
> number of yours didn’t come through (your field definitions did), which
> means we can’t see your results.
>
> KeywordTokenizerFactory is something I’d avoid at this point. It doesn’t
> break up the input at all, so input of “my dog has fleas” is indexed as
> exactly one token, “my dog has fleas”, which is usually not what people want.
>
> For the other problems, I’d suggest several ways to narrow down the issue.
>
> 1> Remove PorterStemFilter and see what you get. This is something of a
> long shot, but I’ve seen it cause unexpected results because the
> algorithmic nature of the stemmer doesn’t quite match your assumptions.
>
> 2> Add &debug=query to your URL and look particularly at the “parsed
> query” section. That’ll show you exactly how the search string was
> transmogrified prior to search, and it often offers clues.
>
> 3> Don’t use edismax to start. What you’ve shown looks correct; this is
> just on the theory that starting with something simpler means fewer moving
> parts.
>
>
> Also, be a little careful with WhitespaceTokenizer. It’s fine for controlled
> experiments where you’re tightly controlling the input, but going to prod
> with it has some issues. The tokenizer works fine; it’s just that it’ll
> include, say, the period at the end of a sentence with the last word of the
> sentence…
>
> Best,
> Erick
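Erick's two tokenizer warnings can be illustrated with a toy comparison. The functions below are simplified stand-ins written for illustration (hypothetical names, not Lucene code); in particular, the regular expression is only a rough approximation of StandardTokenizer, which follows Unicode word-break rules.

```python
import re

SENTENCE = "my dog has fleas."


def keyword_tokens(text):
    # Like KeywordTokenizer: the entire input is emitted as one token.
    return [text]


def whitespace_tokens(text):
    # Like WhitespaceTokenizer: split on whitespace only, so punctuation
    # stays attached to the neighbouring word.
    return text.split()


def rough_standard_tokens(text):
    # Very rough approximation of StandardTokenizer for simple English text:
    # keep runs of word characters and drop punctuation.
    return re.findall(r"\w+", text)


for name, tokenize in [
    ("keyword", keyword_tokens),
    ("whitespace", whitespace_tokens),
    ("standard (approx.)", rough_standard_tokens),
]:
    print(name, tokenize(SENTENCE))
```

With the keyword version there is no separate token for dog to match, and with the whitespace version the last token is "fleas." rather than "fleas", which is the period problem Erick describes.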

Suggester not suggesting but spellchecker is

2020-08-06 Thread Philip Smith
Hello,

Any advice on the following issue, where the suggester isn't suggesting
anything, would be very welcome.

I can get suggestions when using a spellchecker but not when using the
suggester. Both are querying the same suggestion field.

I'm using the following settings:

I've tried using different field types for the suggestion field. I read
that it shouldn't be heavily processed (with stemmers, for instance), so I
tried string, but it made no difference.

I've tried about six different examples from online, and none return
results. Below is one of them; the others were variations using
FuzzyLookupFactory instead.

  
  
  mySuggester
  FSTLookupFactory
  DocumentDictionaryFactory
  suggestion
  string

  


  

  true
  10
 mySuggester


  suggest5

  

http://localhost:8983/solr/events/suggest5?suggest.dictionary=mySuggester&suggest=true&suggest.build=true&suggest.q=edu
http://localhost:8983/solr/events/suggest5?q=edu

Both return:

{
  "responseHeader":{
    "status":0,
    "QTime":4},
  "suggest":{"mySuggester":{
      "edu":{
        "numFound":0,
        "suggestions":[]}}}}


The spellchecker returns a couple of results for this.
I restarted Solr after every change.
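To repeat the two suggester requests above while changing the config, a small Python sketch can build them and pull numFound and the suggested terms out of the decoded JSON. The handler suggest5 and dictionary mySuggester come from the config above; the helper names are hypothetical, and each suggestion entry is assumed to carry a term key, as in Solr's standard suggester response.

```python
from urllib.parse import urlencode

CORE_URL = "http://localhost:8983/solr/events"


def suggest_url(prefix, dictionary="mySuggester", handler="suggest5", build=False):
    """Build a request for the suggest handler shown above."""
    params = {"suggest": "true", "suggest.dictionary": dictionary, "suggest.q": prefix}
    if build:
        # Rebuilds the lookup structure from the index; this can be slow on
        # large indexes, so it is usually sent once rather than every time.
        params["suggest.build"] = "true"
    return f"{CORE_URL}/{handler}?{urlencode(params)}"


def read_suggestions(response, dictionary, prefix):
    """Return (numFound, terms) for one dictionary and prefix."""
    entry = response["suggest"][dictionary][prefix]
    return entry["numFound"], [s["term"] for s in entry["suggestions"]]


empty = {"suggest": {"mySuggester": {"edu": {"numFound": 0, "suggestions": []}}}}
print(suggest_url("edu", build=True))
print(read_suggestions(empty, "mySuggester", "edu"))  # (0, [])
```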

This is the setup for the spellchecker:

  
  string
  
  suggest
  org.apache.solr.spelling.suggest.Suggester
 FuzzyLookupFactory
 string
  suggestion
  0.1
  spellchecker
  freq
  true
  0.5
  
  
 

  
  
  on
  suggest
  true
  true
  10
  true
  10
  5
  
  
  suggest
  
  


I'm trying to update an old version 4 config to work with 8.5.
Am I missing out by not being able to use the dedicated suggester?
Is the field type for the suggestion field optimal?

Many thanks in advance.

Best,
Phil.