Hi Matt,
On 21/04/2020 13:41, matthew sporleder wrote:
Sorry for the vague question and I appreciate the book recommendations
-- I actually think I am mostly confused about suggest vs spellcheck
vs morelikethis as they relate to what I referred to as "expected"
behavior (like from a typed-in search bar).
Suggest - here are some results that might match based on what you've
typed so far (usually powered by a behind-the-scenes search of the index
with some restrictions). Note the difference between this and
autocompletion, which suggests complete search terms from the index
based on the partial word you've typed so far.
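To make the autocompletion half of that distinction concrete, here's a toy sketch in Python - the vocabulary is invented, and Solr's actual Suggester component is far richer (index-backed, weighted, fuzzy), but the core idea is completing a partial word from terms the index already contains:

```python
# Toy sketch of term-level autocompletion: suggest complete index terms
# from a typed prefix. Not Solr's Suggester implementation -- just an
# illustration of the idea. The vocabulary below is made up.
def autocomplete(prefix, vocabulary, limit=5):
    """Return up to `limit` indexed terms starting with `prefix`."""
    prefix = prefix.lower()
    matches = sorted(t for t in vocabulary if t.startswith(prefix))
    return matches[:limit]

vocab = ["solr", "solrcloud", "solrconfig", "search", "suggester"]
print(autocomplete("sol", vocab))  # -> ['solr', 'solrcloud', 'solrconfig']
```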
Spellcheck - the word you typed isn't anywhere in the index, so I've
used an edit distance algorithm to suggest a few words you might have
meant that *are* in the index (note this isn't spelling correction, as
the engine doesn't necessarily have the corrected form in its index).
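The edit-distance idea can be sketched like this - Solr's spellcheck component is more sophisticated (and configurable), and the index terms below are invented, but it shows why suggestions come from the index rather than a dictionary:

```python
# Toy sketch of spellcheck-style suggestion: propose index terms within
# a small edit distance of the typed word. The index terms are made up;
# Solr's SpellCheckComponent does this with much more machinery.
def edit_distance(a, b):
    """Classic Levenshtein distance via dynamic programming."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            curr.append(min(prev[j] + 1,                 # deletion
                            curr[j - 1] + 1,             # insertion
                            prev[j - 1] + (ca != cb)))   # substitution
        prev = curr
    return prev[-1]

def spell_suggest(word, index_terms, max_dist=2):
    """Index terms within max_dist edits of the (unrecognised) word."""
    scored = [(edit_distance(word, t), t) for t in index_terms]
    return [t for d, t in sorted(scored) if d <= max_dist]

terms = ["search", "schema", "shard", "score"]
print(spell_suggest("serch", terms))  # -> ['search']
```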
Morelikethis - here are some results that share some characteristics
with the document you're looking at, e.g. they're indexed by some of the
same terms.
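A toy version of that "shared terms" idea, for illustration only - Solr's MoreLikeThis component picks out "interesting" terms and weighs them by tf-idf, whereas this sketch (with an invented corpus) just counts overlap:

```python
# Toy sketch of the MoreLikeThis idea: rank other documents by how many
# terms they share with the current one. Solr's MLT component weighs
# terms by tf-idf; this version just counts raw overlap. Corpus invented.
def more_like_this(source_id, corpus):
    """corpus: {doc_id: set of terms}. Rank other docs by shared terms."""
    source = corpus[source_id]
    scored = [(len(source & terms), doc_id)
              for doc_id, terms in corpus.items() if doc_id != source_id]
    scored.sort(key=lambda p: (-p[0], p[1]))  # most overlap first
    return [doc_id for score, doc_id in scored if score > 0]

corpus = {
    "a": {"solr", "search", "index", "facet"},
    "b": {"solr", "search", "ranking"},
    "c": {"cooking", "recipes"},
    "d": {"index", "facet", "solr"},
}
print(more_like_this("a", corpus))  # -> ['d', 'b']  ('c' shares nothing)
```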
For reference we have been using solr as search in some form for
almost 10 years and it's always been great in finding things based on
clear keywords, programmatic-type discovery, a nosql/distributed k:v
(actually really really good at this) but has always fallen short
(imho and also our fault, obviously) in the "typed in a search query"
experience.
I'm guessing you're bumping into the problem that most people type very
little into a search bar, and expect the engine to magically know what
they meant. It doesn't, of course, so it has to suggest some ways for the
user to tell it more specific information - facets for example, or some
of the features above.
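Faceting is worth a concrete picture: it shows the user counts of matching documents per field value so they can narrow a vague query. The documents and field names below are invented - in Solr you'd get this from the real facet parameters (facet=true&facet.field=...) rather than computing it client-side:

```python
# Toy sketch of what faceting gives the user: counts of matching
# documents per field value. Documents and field names are invented;
# in Solr the engine computes this server-side from facet parameters.
from collections import Counter

def facet_counts(docs, field):
    """Count matching documents by the given field's value."""
    return Counter(doc[field] for doc in docs if field in doc)

results = [
    {"title": "Solr in Action", "category": "books"},
    {"title": "Solr sticker", "category": "merch"},
    {"title": "Relevant Search", "category": "books"},
]
print(facet_counts(results, "category"))
```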
We are in the midst of re-developing our internal content ranking
system, and it has me grasping at how to *really* elevate our game in
terms of giving an excellent human-driven discovery vs our current
behavior of: "here is everything we have that contains those words,
minus ones I took out".
I think you need to look at several angles:
- What defines a 'good' result in your world/for your content?
- Who judges this? How do you record this? Human/clicks/both?
- What Solr features *could* help - and how are you going to test that
they actually do, using the two points above?
We think that building up this measurement-driven, experimental process
is absolutely key to improving relevance.
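As a concrete starting point for that measurement loop, here's a minimal sketch: record human judgements per (query, document) and score each candidate configuration with a simple metric like precision@k. The judgements, document ids and configurations below are all invented - the point is only the shape of the comparison:

```python
# Toy sketch of the measurement-driven loop: human judgements recorded
# as doc_id -> relevant?, then each configuration's ranking scored with
# precision@k. All ids and judgements here are invented examples.
def precision_at_k(ranked_ids, judgements, k=10):
    """Fraction of the top-k results judged relevant (unjudged = not)."""
    top = ranked_ids[:k]
    if not top:
        return 0.0
    return sum(judgements.get(doc_id, False) for doc_id in top) / len(top)

judgements = {"d1": True, "d2": False, "d3": True}  # from human raters
old_config = ["d2", "d1", "d4"]   # ranking from the current setup
new_config = ["d1", "d3", "d2"]   # ranking from a candidate setup
print(precision_at_k(old_config, judgements, k=3))  # 1 of 3 relevant
print(precision_at_k(new_config, judgements, k=3))  # 2 of 3 relevant
```

Once you have this in place, every schema or config change becomes a testable experiment rather than a guess.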
Cheers
Charlie
On Tue, Apr 21, 2020 at 5:35 AM Charlie Hull <char...@flax.co.uk> wrote:
Hi Matt,
Are you looking for a good, general purpose schema and config for Solr?
Well, there's the problem: you need to define what you mean by general
purpose. Every search application will have its own requirements and
they'll be slightly different to every other application. Yes, there
will be some commonalities too. I guess by "as a human might expect one
to behave" you mean "a bit like how Google works" but unfortunately
Google is a poor example: you won't have Google's money or staff or
platform in your company, nor are you likely to be building a
massive-scale web search engine, so at best you can just take
inspiration from it, not replicate it.
In practice, what a lot of people do is start with an example setup
(perhaps from one of the examples supplied with Solr, e.g.
'techproducts') and adapt it, or they might start with the Solr
configset provided by another framework, e.g. Drupal (yay! Pink
Ponies!). Unfortunately the standard example configsets are littered
with comments that say things like 'Here is how you *could* do XYZ, but
please don't actually attempt it this way', and with other config
sections that, if you un-comment them, may just get you into further
trouble. The configset has grown rather than been built, and to my mind
there's a good argument for
starting with an absolutely minimal Solr configset and only adding
things in as you need them and understand them (see
https://lucene.472066.n3.nabble.com/minimal-solrconfig-example-td4322977.html
for some background and a great presentation from Alex Rafalovitch on
the examples).
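To give a flavour of what "absolutely minimal" means, here's a sketch of a stripped-down schema - the field and type names are just examples, and you should check the Solr Reference Guide for your version before using anything like it:

```xml
<!-- Sketch of a minimal schema: only fields and types you understand
     and need. Names here are examples, not a recommendation. -->
<schema name="minimal" version="1.6">
  <field name="id" type="string" indexed="true" stored="true" required="true"/>
  <field name="title" type="text_basic" indexed="true" stored="true"/>
  <uniqueKey>id</uniqueKey>

  <fieldType name="string" class="solr.StrField" sortMissingLast="true"/>
  <fieldType name="text_basic" class="solr.TextField" positionIncrementGap="100">
    <analyzer>
      <tokenizer class="solr.StandardTokenizerFactory"/>
      <filter class="solr.LowerCaseFilterFactory"/>
    </analyzer>
  </fieldType>
</schema>
```

Everything else - copyFields, extra analysis chains, request handlers - gets added only when you know why you need it.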
You're also going to need some background on *why* all these features
should be used, and for that I'd recommend my colleague Doug's book
Relevant Search https://www.manning.com/books/relevant-search - or maybe
our training (quick plug: we're running some online training in a couple
of weeks
https://opensourceconnections.com/blog/2020/05/05/tlre-solr-remote/ )
Hope this helps,
Cheers
Charlie
On 20/04/2020 23:43, matthew sporleder wrote:
Is there a comprehensive/big set of tips for making solr into a
search-engine as a human would expect one to behave? I poked around
in the nutch github for a minute and found this:
https://github.com/apache/nutch/blob/9e5ae7366f7dd51eaa76e77bee6eb69f812bd29b/src/plugin/indexer-solr/schema.xml
but I was wondering if I was missing a very obvious document
somewhere.
I guess I'm looking for things like:
use suggester here, use spelling there, use DocValues around here, DIY
pagerank, etc
Thanks,
Matt
--
Charlie Hull
OpenSource Connections, previously Flax
tel/fax: +44 (0)8700 118334
mobile: +44 (0)7767 825828
web: www.o19s.com