On 2/9/2010 2:57 PM, Ahmet Arslan wrote:
I'm trying to improve the search box on our website by
adding an autosuggest field. The dataset is a set of
properties around the world (mostly in Europe), and the search box is
intended to be filled in with a country, region or city name.
To do this I've created a separate, simple core with one
document per geographic location. For example, the document
for the country "France" contains several fields, including
the number of properties (so we can show the approximate
number of results in the autosuggest box), the name of
the country in several languages and some other
bookkeeping information. The name of each location is stored
in two fields: "name", which simply contains the canonical
name of the country, region or city, and "names", a
multivalued field containing the name in several different
languages. Both fields use an EdgeNGramFilter during
analysis so the query "Fr" can match "France".
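To make the edge n-gram idea concrete, here is a minimal Python sketch of what such an index-time chain produces. It is not the Solr implementation; the tokenization (runs of word characters), the accent folding and the gram sizes 2 to 15 are all assumptions chosen for illustration (some folding must be in the real chain, since the query "rho" matches "Rhône" in the output below). Each gram carries start and end offsets into the original field value, which is what a highlighter relies on to place its tags.

```python
import re
import unicodedata


def fold(text: str) -> str:
    """Lowercase and strip diacritics, e.g. "Rhô" -> "rho".

    Assumption: the real analysis chain folds accents in some way.
    """
    decomposed = unicodedata.normalize("NFD", text.lower())
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def edge_ngrams(value: str, min_gram: int = 2, max_gram: int = 15):
    """Yield (gram, start, end) for each front edge n-gram of each word.

    start/end are offsets into the original field value. The min_gram and
    max_gram defaults are hypothetical, not taken from the actual schema.
    """
    value = unicodedata.normalize("NFC", value)
    # Hypothetical tokenization: runs of word characters, so "Rhône-Alpes"
    # splits into "Rhône" and "Alpes".
    for match in re.finditer(r"\w+", value):
        word = match.group()
        for size in range(min_gram, min(max_gram, len(word)) + 1):
            yield fold(word[:size]), match.start(), match.start() + size
```

For example, `list(edge_ngrams("France"))` contains `("fr", 0, 2)`, which is why a query for "Fr" finds the France document.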
This all seems to work; the autosuggest box gives
appropriate suggestions. But when I turn on highlighting, the
results are less than desirable. For example, the query "rho"
using dismax (and hl.snippets=5) returns the
following:
<lst name="5119">
  <arr name="names">
    <str><em>Rég</em>ion Rhône-Alpes</str>
    <str><em>Rhô</em>ne-Alpes</str>
    <str><em>Rhô</em>ne-Alpes</str>
    <str><em>Rhô</em>ne-Alpes</str>
    <str><em>Rhô</em>ne-Alpes</str>
  </arr>
  <arr name="name">
    <str><em>Rég</em>ion Rhône-Alpes</str>
  </arr>
</lst>
<lst name="5440">
  <arr name="names">
    <str><em>Dép</em>artement du Rhône</str>
    <str><em>Dép</em>artement du Rhône</str>
    <str><em>Rhô</em>ne</str>
    <str><em>Dép</em>artement du Rhône</str>
    <str><em>Rhô</em>ne</str>
  </arr>
  <arr name="name">
    <str><em>Dép</em>artement du Rhône</str>
  </arr>
</lst>
As you can see, no matter where the match is, the first three
characters are highlighted, which is obviously wrong for many
of the fields. Is this caused by the NGramFilterFactory, or
am I doing something wrong?
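One explanation consistent with this output (an unverified assumption, not confirmed anywhere in this thread) is that the filter gives each gram offsets relative to the start of its own token instead of the start of the field value. "Rhône" starts at character 7 of "Région Rhône-Alpes", so its gram "rhô" should span characters 7 to 10; with token-relative offsets it spans 0 to 3, and the highlighter marks "Rég". Values where the matching word comes first, such as "Rhône-Alpes", look correct either way. The Python sketch below models that hypothesis; the `token_relative` flag, the function names and the tokenization are all hypothetical.

```python
import re
import unicodedata


def fold(text: str) -> str:
    """Lowercase and strip diacritics (assumed to be part of the chain)."""
    decomposed = unicodedata.normalize("NFD", text.lower())
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def edge_ngrams(value: str, token_relative: bool = False,
                min_gram: int = 2, max_gram: int = 15):
    """Yield (gram, start, end) for each front edge n-gram of each word.

    token_relative=True models the hypothesized bug: offsets are measured
    from the start of the token instead of the start of the field value.
    """
    for match in re.finditer(r"\w+", value):
        word = match.group()
        base = 0 if token_relative else match.start()
        for size in range(min_gram, min(max_gram, len(word)) + 1):
            yield fold(word[:size]), base, base + size


def highlight(value: str, query: str, token_relative: bool = False) -> str:
    """Wrap the characters covered by the first matching gram in <em> tags."""
    value = unicodedata.normalize("NFC", value)
    target = fold(query)
    for gram, start, end in edge_ngrams(value, token_relative):
        if gram == target:
            return f"{value[:start]}<em>{value[start:end]}</em>{value[end:]}"
    return value
```

With correct offsets, `highlight("Région Rhône-Alpes", "rho")` gives `Région <em>Rhô</em>ne-Alpes`; with `token_relative=True` it gives `<em>Rég</em>ion Rhône-Alpes`, and "Département du Rhône" becomes `<em>Dép</em>artement du Rhône`, matching the response above. If the hypothesis holds, the problem lies in how the filter computes offsets rather than in the highlighting parameters.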
I used https://issues.apache.org/jira/browse/SOLR-357 for this some time ago, and
it gave correct highlights.
I just ran a test with the NGramFilter removed (and the index rebuilt), which
did give correct highlighting results, but then I had to query using the whole
word. I'll try the PrefixingFilterFactory next, although according to the
comments it's nothing but a subset of the EdgeNGramFilterFactory, so
unless I'm configuring it wrong it should yield the same results...
However, we are now using
http://www.ajaxupdates.com/mootools-autocomplete-ajax-script/. It automatically
bolds the matching characters without using Solr highlighting.
Using a purely JavaScript-based solution isn't really an option for us, as
it wouldn't handle the diacritical marks without a lot of
transliteration brouhaha.
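For comparison, here is roughly what that client-side transliteration involves: fold both the label and the typed text to unaccented lowercase, find the match in the folded label, then map it back to positions in the original label so the accented characters are the ones wrapped. This is a Python sketch using the standard `unicodedata` module, with hypothetical function names; a JavaScript port would need its own equivalent of the folding step. NFD decomposition also does not cover every letter (for example "ø" has no decomposition), which is part of why a complete solution needs real transliteration tables.

```python
import unicodedata


def fold_char(ch: str) -> str:
    """Lowercase one character and drop combining marks ("ô" -> "o")."""
    decomposed = unicodedata.normalize("NFD", ch.lower())
    return "".join(c for c in decomposed if not unicodedata.combining(c))


def bold_matches(label: str, typed: str) -> str:
    """Bold the first accent- and case-insensitive match of `typed` in `label`."""
    label = unicodedata.normalize("NFC", label)
    folded_chars, origin = [], []
    for i, ch in enumerate(label):
        # Remember which original character each folded character came from.
        for f in fold_char(ch):
            folded_chars.append(f)
            origin.append(i)
    folded = "".join(folded_chars)
    needle = "".join(fold_char(ch) for ch in unicodedata.normalize("NFC", typed))
    pos = folded.find(needle) if needle else -1
    if pos == -1:
        return label
    start = origin[pos]
    end = origin[pos + len(needle) - 1] + 1
    return f"{label[:start]}<b>{label[start:end]}</b>{label[end:]}"
```

A plain case-insensitive search misses the match here, since `"rho" in "Région Rhône-Alpes".lower()` is `False`, whereas `bold_matches("Région Rhône-Alpes", "rho")` returns `Région <b>Rhô</b>ne-Alpes`.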
Regards,
gwk