Re: Short DismaxRequestHandler Question

2010-05-07 Thread MitchK

Okay, let me be more specific:
I have a custom StopWordFilter and a WordMarkingFilter.

The WordMarkingFilter is a simple implementation that determines what type a
word is.
The StopWordFilter (my implementation) removes specific types of words *and*
all markers from all words.

This leads to the deletion of some parts of sentences.

In my disMaxQuery I specified some fields with such filters and some
without.


a) what docs should *not* match the query you listed
In this case: docs where only Solr OR development occurs should not match.
It does not matter whether the two words occur in different fields.


b) what queries should *not* match the doc you listed
Actually "Solr Development Lucidworks" should not match, for example
(assuming that "lucidworks" does not occur in a field like content).
In this case, the user searches for development-work with Solr in relation
to LucidWorks.
Solr does not know about the relation, however with the 100%mm-definition, I
can tell Solr
something like this in a more easier way.


c) what types of URLs you've already tried 
Those I have shown here. No more.

Let me be sure that I have understood your explanation of how the
DisMaxRequestHandler works.
If I have 4 fields:
name, colour, category, manufacturer

And an example doc like this:
title: iPhone
colour: black
category: smartphone
manufacturer: apple

And I have a dismax query like this:
q=apple iPhone & qf=title^5 manufacturer & mm=100%
Then the whole thing will match (assuming that iPhone and/or apple are not
stopwords)?

If yes, then the problem is my filter definition.
There were some threads with discussions about such problems with the
standard StopWordFilter.

Another example:
title: "Solr in a production environment"
cat: "tutorial"

At index time, title is reduced to: "Solr production environment".
A query like "using Solr in a production environment"
will be reduced to "Solr production environment".
This will work, as I have understood it, because both the indexed terms and
the query are the same.

However, if I have a "content" field that indexes the content of the text
without my markerFilter, this won't work, because the parsed query strings
are different??? I don't understand the problem.

example:
title: "Solr in a production environment"
cat: "tutorial"
content: "here is some text about using Solr in production. This fieldType
consists of a lowerCaseFilter and a standard-StopWordFilter to delete all
words like 'the, and, in' etc."

Please note that "environment" does not occur in the content field.
So a parsed query string would look like:
"using Solr in a production environment" -> "using Solr production
environment" (stopwords are removed).
This won't match, because the word "environment" does not occur in the
content field? And because of that, the whole doc does not match?

If you are confused about my examples and questions - I was trying to
understand the explanations that were described here:
http://lucene.472066.n3.nabble.com/DisMax-request-handler-doesn-t-work-with-stopwords-td478128.html#a478128

Thank you for help.

- Mitch


Embedded Solr search query

2010-05-07 Thread Eric Grobler
Hello Solr community,

When a user searches on our web page, we need to run 3 related but different
queries.
For SEO reasons we cannot use Ajax, so at the moment we run the 3 queries
sequentially inside a PHP script.
Although Solr is super fast, the extra network overhead can make the 3
queries 400ms slower than they need to be.

Thus my question is:
Is there a way to send 1 query string to Solr with 2 or more
embedded search queries, where Solr will split and execute the queries and
return the results of the multiple searches in 1 go?

In other words, instead of:
-  send searchQuery1
   get result1
-  send searchQuery2
   get result2
...

you run:
- send searchQuery1+searchQuery2
- get result1+result2

Thanks and Regards
Eric


RE: Embedded Solr search query

2010-05-07 Thread caman

Why not write a custom request handler which can parse, split, execute and
combine results to your queries?
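Something along these lines could be a starting point -- a rough, untested
sketch against the Solr 1.4 handler API (the class, the q1/q2/... parameter
names and the delegation to the handler registered as "standard" are all just
assumptions for illustration):

import org.apache.solr.common.params.CommonParams;
import org.apache.solr.common.params.ModifiableSolrParams;
import org.apache.solr.core.SolrCore;
import org.apache.solr.handler.RequestHandlerBase;
import org.apache.solr.request.LocalSolrQueryRequest;
import org.apache.solr.request.SolrQueryRequest;
import org.apache.solr.request.SolrQueryResponse;  // org.apache.solr.response in later versions
import org.apache.solr.request.SolrRequestHandler;

/**
 * Runs the sub-queries passed as q1, q2, ... against the handler registered
 * as "standard" and returns every result set in one combined response.
 */
public class MultiQueryHandler extends RequestHandlerBase {

  @Override
  public void handleRequestBody(SolrQueryRequest req, SolrQueryResponse rsp)
      throws Exception {
    SolrCore core = req.getCore();
    SolrRequestHandler standard = core.getRequestHandler("standard");

    for (int i = 1; ; i++) {
      String q = req.getParams().get("q" + i);
      if (q == null) break;

      // reuse the incoming params, but swap in the i-th query string
      ModifiableSolrParams params = new ModifiableSolrParams(req.getParams());
      params.set(CommonParams.Q, q);

      SolrQueryRequest subReq = new LocalSolrQueryRequest(core, params);
      try {
        SolrQueryResponse subRsp = new SolrQueryResponse();
        core.execute(standard, subReq, subRsp);
        // expose each sub-result under its own key in the combined response
        rsp.add("result" + i, subRsp.getValues());
      } finally {
        subReq.close();
      }
    }
  }

  // SolrInfoMBean boilerplate required by RequestHandlerBase
  @Override public String getDescription() { return "runs several queries in one request"; }
  @Override public String getSource() { return "sketch"; }
  @Override public String getSourceId() { return "sketch"; }
  @Override public String getVersion() { return "sketch"; }
}

Register it in solrconfig.xml like any other requestHandler and send a single
request with q1=...&q2=... parameters.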

 

 

 



Re: Embedded Solr search query

2010-05-07 Thread Svein Parnas
Or send the queries in parallel from the PHP script (use CURL).

Svein


2010/5/7 caman :
>
> Why not write a custom request handler which can parse, split, execute and
> combine results to your queries?


Re: Embedded Solr search query

2010-05-07 Thread Eric Grobler
Hi Caman,

I was hoping someone has done it already :-)
I am also new to Solr/lucene, can you perhaps point me to a request handler
example page?

Thanks and Regards
Eric

On Fri, May 7, 2010 at 9:05 AM, caman wrote:

>
> Why not write a custom request handler which can parse, split, execute and
> combine results to your queries?


Re: Embedded Solr search query

2010-05-07 Thread Eric Grobler
Hi Svein,
Yes, we thought of sending parallel queries, but you still have the extra
network overhead.

Regards
Eric

On Fri, May 7, 2010 at 9:11 AM, Svein Parnas  wrote:

> Or send the queries in parallel from the PHP script (use CURL).
>
> Svein


Re: Short DismaxRequestHandler Question

2010-05-07 Thread MitchK

Btw: This thread helps a lot to understand the difference between qf and pf
:-)
http://lucene.472066.n3.nabble.com/Dismax-query-phrases-td489994.html#a489995


Long Lucene queries

2010-05-07 Thread Pooja Verlani
Hi all,

In my web-app, I have to fire a query that's too long due to the various
boosts I have to give. The size changes according to the query, and many
times I get a blank page as I probably cross Lucene's character limit. Is it
possible to post it to Solr some other way? Should I be using POST instead of
a GET here? Any other better suggestions?

Regards,
Pooja


Re: Long Lucene queries

2010-05-07 Thread Erik Hatcher


On May 7, 2010, at 6:56 AM, Pooja Verlani wrote:


A few options:

 * Use POST (except you won't see the params in the log files)

 * Tomcat: raise the maximum HTTP header size on the connector (see the
   sketch below)

 * Jetty: raise the connector's header buffer size (see the sketch below)


Or, possibly a lot of your query params can be put into  
solrconfig.xml, and you send over just what changed.  You can do some  
tricks with param substitution to streamline this stuff in some  
cases.  Some examples of what you're sending over would help us see  
where some improvements could be made.
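For the Tomcat and Jetty options, the relevant settings are typically
something like the following (the values are only examples; double-check
your container's documentation):

  Tomcat (server.xml):
    <Connector port="8080" maxHttpHeaderSize="65536" ... />

  Jetty (jetty.xml, on the connector):
    <Set name="headerBufferSize">65536</Set>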


Erik



schema.xml question

2010-05-07 Thread Antonello Mangone
Hello everyone, my question is:
Is it possible in schema.xml to set a group of fields to use as the default
field, to query in "OR" or in "AND"?

example:



RE: schema.xml question

2010-05-07 Thread Markus Jelsma
You could write your own requestHandler in solrconfig.xml, it'll allow you to 
predefine parameters for your configured search components.
 

Re: schema.xml question

2010-05-07 Thread Antonello Mangone
For the moment I don't know how to do it, but I'll follow your suggestion :)
Thank you very much ...
PS: I'm just a novice.

2010/5/7 Markus Jelsma 

> You could write your own requestHandler in solrconfig.xml, it'll allow you
> to predefine parameters for your configured search components.
>

RE: How to load Core Properties after Core creation?

2010-05-07 Thread Ankit Bhatnagar

What properties are you adding?
Do you have persistent = true?

Ankit


-Original Message-
From: Ying Huang [mailto:yhu...@capitaliq.com] 
Sent: Thursday, May 06, 2010 6:33 PM
To: solr-user@lucene.apache.org
Subject: How to load Core Properties after Core creation?

Hi All,

Does anyone know if there is any way to create a new Core with specified 
properties or to alter and reload Core Properties for a Core without restarting 
the service?

I tried to do this in three steps:

1)  Create a new core;

2)  Edit solr.xml directly to add properties into the core;

3)  Call RELOAD handler to reload the new core and the specified properties.


However, the reloading doesn't seem to work and the properties added don't 
apply for the newly created core. We're using a nightly build of Solr 1.4 with 
Lucene 2.9.1., and I'm using Core Properties in solr.xml for CoreAdmin. These 
properties can be used in solrconfig.xml (It's discussed here 
http://wiki.apache.org/solr/CoreAdmin#property). Is there any workaround for it?

Thanks,
Ying





Help indexing PDF files

2010-05-07 Thread Leonardo Azize Martins
Hi,

I am new to Solr.
I would like to index some PDF files.

How can I do this using the example schema from version 1.4.0?

Regards,
Leo


RE: Re: schema.xml question

2010-05-07 Thread Markus Jelsma
A requestHandler works as a URL that can have predefined parameters. By 
default you will be querying the /select/ requestHandler. It, for instance, 
predefines the default number of rows to return (10) and returns all fields of 
a document (*).

 

  <requestHandler name="standard" class="solr.SearchHandler" default="true">
    <lst name="defaults">
      <str name="echoParams">explicit</str>
      <int name="rows">10</int>
      <str name="fl">*</str>
    </lst>
  </requestHandler>

But you can also define more complex requestHandlers. The default configuration 
adds the dismax requestHandler (/dismax/), but it's actually the same as the 
default requestHandler if you were to define all those configured parameters in 
your URL. So by defining the parameters in solrconfig.xml, you won't need 
to pass them in your query. You can of course override predefined parameters, 
with the exception of parameters defined inside an invariants block.
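As an illustration, a handler along these lines would let you query a fixed
group of fields with AND-like or OR-like behaviour without passing the
parameters on every request (the handler name and field names are just
placeholders):

  <requestHandler name="grouped" class="solr.SearchHandler">
    <lst name="defaults">
      <str name="defType">dismax</str>
      <str name="qf">title description keywords</str>
      <!-- mm=100% requires all query terms (AND-like), mm=1 requires any (OR-like) -->
      <str name="mm">100%</str>
    </lst>
  </requestHandler>

A query like /select?qt=grouped&q=foo+bar then searches all three fields
without the client having to send qf or mm itself.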

 

Check the documentation [1] on this subject, but I would suggest you study the 
shipped solrconfig.xml [2] configuration file; it offers a better explanation of 
the subject.

 

[1]: http://wiki.apache.org/solr/SolrConfigXml

[2]: 
http://svn.apache.org/repos/asf/lucene/dev/trunk/solr/example/solr/conf/solrconfig.xml

 

 

Cheers,
 

RE: Re: schema.xml question

2010-05-07 Thread Markus Jelsma
I forgot, there is actually a proper wiki page on this subject:

http://wiki.apache.org/solr/SolrRequestHandler

 


 

RE: Help indexing PDF files

2010-05-07 Thread Markus Jelsma
Hi,

 

 

The wiki page [1] on this subject will get you started.

 

[1]: http://wiki.apache.org/solr/ExtractingRequestHandler

 

 

Cheers
 


Re: Help indexing PDF files

2010-05-07 Thread Leonardo Azize Martins
I am using this page, but in my downloaded version there is no site
directory.

Thanks


RE: Re: Help indexing PDF files

2010-05-07 Thread Markus Jelsma
You don't need it, you can use any PDF file.
 


Re: increase(change) relevancy

2010-05-07 Thread MitchK

Hi Ramzesua,

take a look at the FunctionQuery example that influences relevancy via the
popularity field of the example directory.

http://wiki.apache.org/solr/FunctionQuery#Using_FunctionQuery
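For example, with the dismax handler you could add a boost function roughly
like the wiki example (the field names and constants here are only an
illustration):

  defType=dismax
  q=ipod
  qf=name^2 description
  bf=recip(rord(popularity),1,1000,1000)

The bf function is added to the relevance score, so documents with a higher
popularity value are ranked higher.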

Kind regards
- Mitch


Re: Example of using "stream.file" to post a binary file to solr

2010-05-07 Thread Chris Hostetter
: Sorry. That is what I meant. But, I put it wrongly. I have not been  
: able to find examples of using solrj, for this.

did you look at the link i included?

: > To POST a raw stream using SolrJ you need to use the
: > ContentStreamUpdateRequest...
: >
: > 
http://wiki.apache.org/solr/ExtractingRequestHandler#Sending_documents_to_Solr


-Hoss



Re: Example of using "stream.file" to post a binary file to solr

2010-05-07 Thread Sandhya Agarwal
Yes, I did. But, I don't find a solrj example there. The example in  
the doc uses curl.

- Sent from iPhone



RE: How to load Core Properties after Core creation?

2010-05-07 Thread Ying Huang
Thanks for your reply, Ankit.

I'm adding properties like "masterEnabled/slaveEnabled", "pollInterval", 
"autoCommitTime", etc., so that I can easily configure these properties 
per Core and use them in solrconfig.xml. 

I'm also using persistent = true, and that's exactly the reason that whenever I 
unload a Core, the Core will be wiped out from solr.xml. And when I re-create 
the Core after that, I need to re-add and reload the properties. Btw, I'm doing 
this because I want to switch "Live" and "Backup" Cores and manipulate Core 
Configurations at the same time.

-Ying 






Re: Example of using "stream.file" to post a binary file to solr

2010-05-07 Thread Praveen Agrawal
Sandhya,
Chris's link (with the anchor name) goes directly to the SolrJ example.
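For reference, it boils down to something like this with SolrJ (an untested
sketch; the file name, id and field mappings are just placeholders):

import java.io.File;

import org.apache.solr.client.solrj.SolrServer;
import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;
import org.apache.solr.client.solrj.request.AbstractUpdateRequest;
import org.apache.solr.client.solrj.request.ContentStreamUpdateRequest;

public class ExtractExample {
  public static void main(String[] args) throws Exception {
    SolrServer server = new CommonsHttpSolrServer("http://localhost:8983/solr");

    // stream the raw file to the ExtractingRequestHandler
    ContentStreamUpdateRequest up = new ContentStreamUpdateRequest("/update/extract");
    up.addFile(new File("some-document.pdf"));
    up.setParam("literal.id", "doc1");
    up.setParam("uprefix", "attr_");
    up.setParam("fmap.content", "attr_content");
    up.setAction(AbstractUpdateRequest.ACTION.COMMIT, true, true);
    server.request(up);
  }
}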




RE: Help indexing PDF files

2010-05-07 Thread caman

Take a look at the Tika library (Solr's ExtractingRequestHandler uses Tika
under the hood).

 



RE: Embedded Solr search query

2010-05-07 Thread caman

I would just look at the SOLR source code and see how the standard search
handler and the dismax search handler are implemented.
Look under the package 'org.apache.solr.handler'.


Re: Help indexing PDF files

2010-05-07 Thread Leonardo Azize Martins
I have Solr on machine A.

On machine B I run the command below:
curl "http://10.33.19.201:8983/solr/update/extract?&extractOnly=true" --data-binary @VPSX_V1_R10.pdf

and I get the response:
java.lang.IllegalStateException: Form too large

What am I doing wrong?
Is this the right or best way to send PDF files to be indexed?

Regards,
Leo





CommonsHttpSolrServer vs EmbeddedSolrServer

2010-05-07 Thread Blargy

Can someone please explain to me the use cases where one would use one over
the other?

All I got from the wiki (in reference to Embedded) was: "If you need to use
solr in an embedded application, this is the recommended approach. It allows
you to work with the same interface whether or not you have access to HTTP."

I had a use case (detailed here:
http://lucene.472066.n3.nabble.com/Custom-DIH-variables-td777696.html#a777696)
where I tried creating a new server via the current core but I kept getting
a "SEVERE: java.util.concurrent.RejectedExecutionException... SEVERE: Too
many close [count:-3] on org.apache.solr.core.SolrCore...". Maybe my
implementation was off??? 

Is there any detailed documentation on SolrJ usage beyond the wiki? Any
books? Thanks






Re: Sanity check on numeric types and which of them to use

2010-05-07 Thread wojtekpia



> 3) The only reason to use a "sint" field is for backward compatibility
> and/or to use sortMissingFirst/SortMissingLast, correct?
> 

I'm using sint so I can facet and sort facets numerically. 


Re: Help indexing PDF files

2010-05-07 Thread Leonardo Azize Martins
Hi,

Sorry, I am a newbie.

Using these two commands it works:

curl "http://10.33.19.201:8983/solr/update/extract?stream.file=C:\\temp\\VPSX_V1_R10.pdf&stream.contentType=application/pdf&literal.id=M4968\\C$\\temp\\VPSX_V1_R10.pdf&commit=true"

curl 'http://10.33.19.201:8983/solr/update/extract?literal.id=doc1&commit=true' -F "te...@vpsx_v1_r10.pdf"

Thanks for all the help.


Going ahead, what is the best choice for indexing a Windows share?
Using stream.file or not?
Should I index all files every time, or check whether a file has changed and,
if so, index only that one?

Regards,
Leo





Re: Can I use per field analyzers and dynamic fields?

2010-05-07 Thread Chris Hostetter
: 
: The "source" of my problems is the fact that I do not know in advance the
: field names. Users are allowed to decide they own field names, they can,
: at runtime, add new fields and different Lucene documents might have
: different field names.

I would suggest you abstract away the field names your users pick and the 
underlying field names you use when dealing with solr -- so create the list 
of fieldTypes you want to support (with all of the individual analyzer 
configurations that are valid) and then create a dynamicField 
corresponding to each one.

then if your user tells you they want an "author" field associated with 
the type "text_en" you can map that in your application to 
"author_text_en" at both indexing and query time.

This will also let you map the same "logical field names" (from your 
user's perspective) to different "internal field names" (from Solr's 
perspective) based on usage -- searching the "author" field might be 
against "author_text_en" but sorting on "author" might use 
"author_string".

(Some notes were drafted up a while back on making this kind of field name 
aliasing a feature of Solr, but nothing ever came of it...
  http://wiki.apache.org/solr/FieldAliasesAndGlobsInParams
)

-Hoss



Re: Commit takes 1 to 2 minutes, CPU usage affects other apps

2010-05-07 Thread Chris Hostetter

: The measurement was done outside our Solr client which sends the update
: and then the commit to the handler. I also see the update-URL call in
: the Tomcat Manager taking up that amount of time.

so it's the full request time, and would be inclusive of any postCommit 
event handlers -- that's important to know.  the logs will help clear up 
whether the underlying "commit" is really taking up a large amount of time 
or if it's some postCommit event (like spellcheck index building, or 
snapshooting, etc...)

: > what do your Solr logs say about the commit, and the subsequent 
: > newSearcher?
: 
: How can I get such logs? I was asking for in the second mail do this

it depends on what servlet container you use to run Solr, and how it's 
configured.  In the simple "java -jar start.jar" jetty setup used for 
the Solr example, jetty dumps them to STDOUT in your console.  But most
servlet containers will write log messages to a file someplace (and even 
jetty will log to a file if it's configured to do so -- most production 
instances are)


-Hoss



Re: Re: schema.xml question

2010-05-07 Thread Antonello Mangone
Thank you very much for your suggestions, I'll study immediatly ...


2010/5/7 Markus Jelsma 

> I forgot, there is actually a proper wiki page on this subject:
>
> http://wiki.apache.org/solr/SolrRequestHandler
>
>
>
>
>
> -Original message-
> From: Antonello Mangone 
> Sent: Fri 07-05-2010 15:26
> To: solr-user@lucene.apache.org;
> Subject: Re: schema.xml question
>
> For the moment I don't know how to do it, but I'll follow your suggestion
> :)
> Thank you very much ...
> ps. I'm just a novel
>
> 2010/5/7 Markus Jelsma 
>
> > You could write your own requestHandler in solrconfig.xml, it'll allow
> you
> > to predefine parameters for your configured search components.
> >
> > -Original message-
> > From: Antonello Mangone 
> > Sent: Fri 07-05-2010 15:17
> > To: solr-user@lucene.apache.org;
> > Subject: schema.xml question
> >
> > Hello everyone, my question is 
> > Is it possible in schema.xml set a group of fields to use as a default
> > field
> > to query in "OR" or  in "AND" ???
> >
> > example:
> >
> > 
> >

Re: SEVERE: java.util.concurrent.RejectedExecutionException

2010-05-07 Thread Chris Hostetter

: I am working with creating my own custom dataimport handler evaluator class
: and I keep running across this error when I am trying to delta-import. It
: told me to post this exception to the mailing list so thats what I am doing
: ;)
: 
: [java] SEVERE: java.util.concurrent.RejectedExecutionException
...
:  [java] SEVERE: Too many close [count:-1] on
: org.apache.solr.core.solrc...@15db4492. Please report this exception to
: solr-user@lucene.apache.org
...
: Has anyone run across this before? Why is this happening and how can it be
: fixed? 

If your custom class doesn't call "open()" on a SolrCore, then you 
should not call "close()"  -- 99% of all Solr users (even the ones writing 
plugins) should never need to call either of those methods.

(FYI: the "please report" in the stack trace is just in case someone 
encounters that error in a "stock" instance of Solr -- it would indicate a 
fairly serious error in the basic Solr implementation)



-Hoss



Re: schema.xml question

2010-05-07 Thread Chris Hostetter

: 
: 



...

<copyField source="..." dest="group_name"/>

<defaultSearchField>group_name</defaultSearchField>

-Hoss



Re: schema.xml question

2010-05-07 Thread Antonello Mangone
It seems like a copyField, but it is a group that I want ... and in your
version it is not a group. I want the possibility to search in a group of
fields using "AND" or "OR".



Re: Short DismaxRequestHandler Question

2010-05-07 Thread Chris Hostetter

: The StopWordFilter (my implementation) removes specific types of words *and*
: all markers from all words.
: 
: This leads to a deletion of some parts of sentences.

Ah, yes i think you're running into the same confusion people have with 
dismax and stopwords -- there was a blog about this recently that 
explained it much better than i've ever been able to...

http://bibwild.wordpress.com/2010/04/14/solr-stop-wordsdismax-gotcha/
>> As long as each of those solr fields is configured for stopwords (and 
>> the same) stopwords, everything Just Works the way you’d expect.  But 
>> if one of those fields does not have stopwords configured, then 
>> (depending on your mm settings), you can easily end up getting zero 
>> hits for any (non-phrase) query clause that is a stopword.  This kind 
>> of makes sense when you think about it — since at least one field 
>> didn’t have stopwords, there was a clause included for that stopword 
>> you entered. 

(the blog post makes an incorrect assumption after that -- but the 
paragraph above is dead on)

: Let me be sure, that I have understood your part about how the
: DisMaxRequestHandler works.
: If I got 4 fields:
: name, colour, category, manufacturer
: 
: And an example-doc like this:
: title: iPhone
: colour: black
: category: smartphone
: manufacturer: apple
: 
: And I got a dismax-query like this:
: q=apple iPhone & qf=title^5 manufacturer & mm=100% 
: Than the whole thing will match (assumed that iPhone and /or apple where no
: stopwords)?

correct

: Another example:
: title: "Solr in a production environment"
: cat: "tutorial"
: 
: At index-time, title is reduced to: "Solr production environment".
: A query like this "using Solr in a production environment"
: will be reduced to "Solr production environment".

...not necessarily.  if you only have one field in your qf, and that 
field defines "using", "in" and "a" as stopwords then that may be what 
your query turns into.

: However, if I got a "content" field, that indexes the content of the text
: without my markerFilter, this won't work, because the parsed query-strings
: are different??? I don't understand the problem 

(FWIW: "parsed query-strings" is an ambiguous statement -- it could be 
referring to the Query object you get when parsing query strings, or it 
could refer to the toString value of the Query object you get after 
parsing.)

The query string is not parsed differently for each of your qf fields, it 
is parsed exactly once, and each "chunk" of the string (ie: a "word" or 
quoted phrase) is passed to the analyzer for each field -- if any one of 
those fields produces a valid stream of tokens for that input (ie: it's 
not a stopword) then that constitutes one clause -- even if only one field 
says it's a valid clause, it's still a valid clause, and it's factored in 
to the "min-should-match" (mm) amount.

Mike Klass explained this really well in a previous thread about stop 
words and dismax, where he showed the detailed query structure

http://old.nabble.com/Re%3A-DisMax-request-handler-doesn%27t-work-with-stopwords--p11016770.html

...hopefully that structure will help make the behavior you are seeing 
clear.  I suggest you add debugQuery=true to your queries that are 
failing, and look closely at the parsedQuery_toString -- pay attention to 
the structure, and note how many clauses exist for the main boolean query 
-- note the clauses of that query, and where you have clauses consisting 
exclusively of stopwords (in fields where stopwords are not removed).  If 
it's still not making sense please post that exact debug output.
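For illustration, something along these lines (host, port and core path are
placeholders; the other params are taken from your earlier example):

http://localhost:8983/solr/select?defType=dismax&qf=title^5+manufacturer&mm=100%25&q=apple+iPhone&debugQuery=true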

-Hoss


Re: Highlighting Performance On Large Documents

2010-05-07 Thread Lance Norskog
Do you have these options turned on when you index the text field:
termVectors/termPositions/termOffsets?

Highlighting needs the information created by these analysis options.
If they are not turned on, Solr has to load the document text and run the
analyzer again with these options on, use that data to create the
highlighting, then throw away the reanalyzed data. Without these
options, you are basically re-indexing the document every time you highlight
it.

http://www.lucidimagination.com/search/out?u=http%3A%2F%2Fwiki.apache.org%2Fsolr%2FFieldOptionsByUseCase
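That is, something along these lines in schema.xml (field and type names are
whatever your schema already uses), followed by re-indexing:

  <field name="plainText" type="text" indexed="true" stored="true"
         termVectors="true" termPositions="true" termOffsets="true"/>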

On Wed, May 5, 2010 at 5:01 PM, Koji Sekiguchi  wrote:
> (10/05/05 22:08), Serdar Sahin wrote:
>>
>> Hi,
>>
>> Currently, there are similar topics active in the mailing list, but it I
>> did
>> not want to steal the topic.
>>
>> I have currently indexed 100.000 documents, they are microsoft office/pdf
>> etc documents I convert them to TXT files before indexing. Files are
>> between
>> 1-500 pages. When I search something and filter it to retrieve documents
>> that has more than 100 pages, and activate highlighting, it takes 0.8-3
>> seconds, depending on the query. (10 result per page) If I retrieve
>> documents that has 1-5 pages, it drops to 0.1 seconds.
>>
>> If I disable highlighting, it drops to 0.1-0.2 seconds, even on the large
>> documents, which is more than enough. This problem mostly happens where
>> there are no caches, on the first query. I use this configuration for
>> highlighting:
>>
>>
>>  $query->addHighlightField('description')->addHighlightField('plainText');
>>     $query->setHighlightSimplePre('');
>>     $query->setHighlightSimplePost('');
>>     $query->setHighlightHighlightMultiTerm(TRUE);
>>     $query->setHighlightMaxAnalyzedChars(1);
>>     $query->setHighlightSnippets(2);
>>
>> Do you have any suggestions to improve response time while highlighting is
>> active? I have read couple of articles you have previously provided but
>> they
>> did not help.
>>
>> And for the second question, I retrieve these fields:
>>
>>     $query->addField('title')->addField('cat')->addField('thumbs_up')->
>>             addField('thumbs_down')->addField('lang')->addField('id')->
>>
>>  addField('username')->addField('view_count')->addField('pages')->
>>             addField('no_img')->addField('date');
>>
>> If I can't solve the highlighting problem on large documents, I can simply
>> disable it and retrieve first x characters from the plainText (full text)
>> field, but is it possible to retrieve first x characters without using the
>> highlighting feature? When I use this;
>>     $query->setHighlight(TRUE);
>>     $query->setHighlightAlternateField('plainText');
>>     $query->setHighlightMaxAnalyzedChars(0);
>>     $query->setHighlightMaxAlternateFieldLength(256);
>>
>> It still takes 2 seconds if I retrieve 10 rows that has 200-300 pages. The
>> highlighting still works so it might be the source of the problem, I want
>> to
>> completely disable it and retrieve only the first 256 characters of the
>> plainText field. Is it possible? It may remove some overhead give better
>> performance.
>>
>> I personally prefer the highlighting solution but I also would like to
>> hear
>> the solution for this problem. For the same query, if I disable
>> highlighting
>> and without retrieving (but still searching) the plainText field, it drops
>> to 0.0094 seconds. So I think if I can get the first 256 characters
>> without
>> using the highlighting, I will get better performance.
>>
>> Any suggestions regarding with these two problems will highly appreciated.
>>
>> Thanks,
>>
>> Serdar Sahin
>>
>>
>
> Hi Serdar,
>
> There are a few things I think of you can try.
>
> 1. Provide another field for highlighting and use copyField
> to copy plainText to the highlighting field. When using copyField,
> specify maxChars attribute to limit the length of the copy of plainText.
> This should work on Solr 1.4.
>
> 2. If you can use branch_3x version of Solr, try FastVectorHighlighter.
>
> Koji
>
> --
> http://www.rondhuit.com/en/
>
>



-- 
Lance Norskog
goks...@gmail.com


Re: caching repeated OR'd terms

2010-05-07 Thread Lance Norskog
I would suggest benchmarking this before doing any more complex
design. A field with only 10k unique integer or string values will
search very very quickly.

On Thu, May 6, 2010 at 7:54 AM, Nagelberg, Kallin
 wrote:
> Hey everyone,
>
> I'm having some difficulty figuring out the best way to optimize for a 
> certain query situation. My documents have a many-valued field that stores 
> lists of IDs. All in all there are probably about 10,000 distinct IDs 
> throughout my index. I need to be able to query and find all documents that 
> contain a given set of IDs. Ie, I want to find all documents that contain IDs 
> 3, 202, 3030 or 505. Currently I'm implementing this like so:
>
> q= (myfield:3) OR (myfield:202) OR (myfield:3030) OR (myfield:505).
>
> It's possible that there could be upwards of hundreds of terms, although 90% 
> of the time it will be under 10. Ideally I would like to do this with a 
> filter query, but I have read that it is impossible to cache OR'd terms in a 
> fq, though this feature may come soon. The problem is that the combinations 
> of OR'd terms will almost always be unique, so the query cache will have a 
> very low hit rate. It would be great if the individual terms could be cached 
> individually, but I'm not sure how to accomplish that.
>
> Any suggestions would be welcome!
> -Kallin Nagelberg
>
>



-- 
Lance Norskog
goks...@gmail.com


Re: Custom DIH variables

2010-05-07 Thread Lance Norskog
Using a core via both the Embedded front end and the HTTP front end seems
dangerous.  SOLR-1499 does an HTTP call for the same info.

https://issues.apache.org/jira/browse/SOLR-1499

On Thu, May 6, 2010 at 8:18 PM, Blargy  wrote:
>
> So I came up with the following class.
>
> public class LatestTimestampEvaluator extends Evaluator {
>
>  private static final Logger logger =
> Logger.getLogger(LatestTimestampEvaluator.class.getName());
>
>  @Override
>  public String evaluate(String expression, Context context) {
>
>    List params = EvaluatorBag.parseParams(expression,
> context.getVariableResolver());
>    String field = params.get(0).toString();
>
>    SolrCore core = context.getSolrCore();
>    CoreContainer container = new CoreContainer();
>    container.register(core, false);
>    EmbeddedSolrServer server = new EmbeddedSolrServer(container,
> core.getName());
>
>    SolrQuery query = new SolrQuery("*:*");
>    query.addSortField(field, SolrQuery.ORDER.desc);
>    query.setRows(1);
>
>    try {
>      QueryResponse response = null;
>      response = server.query(query);
>
>      SolrDocument document = response.getResults().get(0);
>      Date date = (Date) document.getFirstValue(field);
>      String timestamp = new Timestamp(date.getTime()).toString();
>      logger.info(timestamp);
>
>      return timestamp;
>    } catch (Exception exception) {
>      logger.severe(exception.getMessage());
>      logger.severe(DocumentUtils.stackTraceToString(exception));
>
>      return null;
>    } finally {
>      core.close();
>      container.shutdown();
>    }
>  }
> }
>
> and I am calling it within my dataconfig file like so...
>
> <dataConfig>
>   <function name="latest_timestamp"
>             class="com.mycompany.solr.handler.dataimport.LatestTimestampEvaluator"/>
> ...
>   <entity ...
>           deltaQuery="select id from items where updated_on >
> '${dataimporter.functions.latest_timestamp('updated_on')}'">
>     ...
>   </entity>
> </dataConfig>
> 
>
> I was hoping someone can
>
> 1) Comment on the above class. How does it suck? This was my first time
> working with SolrJ.
> 2) It seems to work fine when there is only one entity using that function
> but when there are more entities using that function (which is my use case)
> I get a "SEVERE: java.util.concurrent.RejectedExecutionException" exception.
> Can someone explain why this is happening and how I can fix it. I added the
> full stack trace to a separate thread here:
> http://lucene.472066.n3.nabble.com/SEVERE-java-util-concurrent-RejectedExecutionException-tp782768p782768.html
>
> Thanks for your help!
> --
> View this message in context: 
> http://lucene.472066.n3.nabble.com/Custom-DIH-variables-tp777696p782769.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>



-- 
Lance Norskog
goks...@gmail.com


Re: Custom DIH variables

2010-05-07 Thread Blargy

Thanks for the tip Lance. Just for reference, why is it dangerous to use the
HTTP method? I realized that the embedded method is probably not the way to
go (obviously since I was getting that "SEVERE:
java.util.concurrent.RejectedExecutionException") 
