Hello,

We are long-time Nutch users (since 0.7).
We have now made the big jump from 0.9 to 1.5 and Solr 4.0.


We use it to index different websites and then provide site-specific search for each of them.

Currently we index the sites and store them all in one Solr instance.
The different sites are separated via the host field in Solr; this works fine.

An important point is that each site can have text in multiple languages (for example en, de, fr, cn etc.).
We separate them via the lang field (this works fine).
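
To illustrate the setup described above, here is a minimal sketch of how a search restricted to one site and one language might be built with Solr filter queries. The host and lang fields are the ones mentioned above; the base URL, the example host name and the function name build_select_url are only assumptions for illustration, not our actual configuration.

```python
from urllib.parse import urlencode


def build_select_url(base_url, query, host, lang):
    """Build a Solr /select URL restricted to one site and one language.

    base_url and the function name are hypothetical; host and lang are
    the index fields described in this message.
    """
    params = [
        ("q", query),
        ("fq", f"host:{host}"),  # filter query: only documents of this site
        ("fq", f"lang:{lang}"),  # filter query: only documents in this language
        ("wt", "json"),
    ]
    return f"{base_url}/select?{urlencode(params)}"


# Hypothetical example values
url = build_select_url("http://localhost:8983/solr", "content:search",
                       "www.example.com", "de")
print(url)
```

Filtering the search results this way is the part that works for us; the question below is about getting the spellchecker to respect the same separation.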

We now wish to integrate the spellchecker to provide the "Did you mean..." functionality. This only partly works, since it always uses one word list covering all sites and all languages. We would need a word list/spellchecker (based on the content field) that is separate for each site and language.

What would be a clean way to solve this requirement?

If we create one Solr instance per site, we would at least get the word list separated by site,
but we would still have the problem of separating it by language.


Any ideas or hints?

With best regards


--
Aarboard AG    Phone: +41 32 332 97 14
Egliweg 10     Fax:   +41 32 332 97 15
2560 Nidau
Switzerland    www.aarboard.ch
