Hello,

I am starting to wonder whether the module that provides Finnish language support (lingsoft) might be the cause. As I said earlier, I inherited this project, so my understanding of all its bells and whistles is a bit limited.

Some selected parts from the schema.xml file:

<schema name="example" version="1.2">
...
<fieldType name="suomi" class="solr.TextField" positionIncrementGap="100">
<analyzer type="query">
<tokenizer class="solr.WhitespaceTokenizerFactory"/>
<filter class="solr.LowerCaseFilterFactory"/>
<filter class="solr.WordDelimiterFilterFactory"
                generateWordParts="1"
                generateNumberParts="1"
                />
<filter class="lingSoft.LSFactory"/>
<filter class="solr.PositionFilterFactory" />
</analyzer>
<analyzer type="index">
<tokenizer class="solr.WhitespaceTokenizerFactory"/>
<filter class="solr.LowerCaseFilterFactory"/>
<filter class="solr.WordDelimiterFilterFactory"
                generateWordParts="1"
                generateNumberParts="1"
                />
<filter class="lingSoft.LSFactory"/>
<filter class="solr.WordDelimiterFilterFactory"
                generateWordParts="1"
                generateNumberParts="1"
                catenateWords="0"
                preserveOriginal="1"
                />
</analyzer>
</fieldType>
...
<field name="text_fi" type="suomi" indexed="true" stored="true" multiValued="true" required="false" />
...
<dynamicField name="*_t" type="text" indexed="true" stored="true" multiValued="true"/>
...
<!-- dynamic field for finnish language support with the lingsoft transformation -->
<dynamicField name="*_fi" type="suomi" indexed="true" stored="true" multiValued="true" />
....
<dynamicField name="ignored_*" type="ignored" multiValued="true"/>
<dynamicField name="attr_*" type="textgen" indexed="true" stored="true" multiValued="true"/>
<dynamicField name="random_*" type="random" />
<dynamicField name="*" type="text" multiValued="true" index="true" stored="true" />
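
A minimal sketch of how the lingsoft suspicion could be tested in isolation, assuming a scratch copy of the field type is acceptable in this schema. The names `suomi_nolingsoft` and `*_fitest` are hypothetical and not part of the existing configuration. Everything else is copied from the query analyzer of `suomi` above, with only the `lingSoft.LSFactory` filter left out. Without that filter the terms will presumably no longer be reduced to base forms (asuntojen to asunto), so the output is only useful for comparing query structure, not relevance.

```xml
<!-- Hypothetical test type: the "suomi" query analyzer chain without lingSoft.LSFactory -->
<fieldType name="suomi_nolingsoft" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.WordDelimiterFilterFactory"
            generateWordParts="1"
            generateNumberParts="1"
            />
    <filter class="solr.PositionFilterFactory" />
  </analyzer>
</fieldType>
<!-- Hypothetical dynamic field used only for the comparison -->
<dynamicField name="*_fitest" type="suomi_nolingsoft" indexed="true" stored="true" multiValued="true" />
```

Comparing the `parsedquery` debug output for a quoted phrase on such a field against `text_fi` would show whether the phrase structure is lost with or without the lingsoft filter. The other candidate visible in the query analyzer is `solr.PositionFilterFactory`, which by default sets the position increment of every token after the first to zero; dropping that filter instead would be the complementary test.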

Best regards,
Lauri Hyttinen


On 11/03/2011 10:09 PM, Chris Hostetter wrote:
Interesting, in the case where you use quotes...

: +<result name="response" numFound="6888" start="0" maxScore="3.0879765">
        ...
:</lst><str name="rawquerystring">"asuntojen hinnat"</str>
:<str name="querystring">"asuntojen hinnat"</str>

...there is one DisjunctionMaxQuery (expected) for the entire phrase,
but in the sub-clauses for each individual field the clauses coming from
your "_fi" fields are just building boolean "OR" queries of the terms from
your phrase (instead of building an actual phrase query)...

:<str name="parsedquery">+DisjunctionMaxQuery((table.title_t:"asuntojen
: hinnat"^2.0 | title_t:"asuntojen hinnat"^2.0 | ingress_t:"asuntojen hinnat" |
: (text_fi:asunto text_fi:hinta) | (table.description_fi:asunto
: table.description_fi:hinta) | table.description_t:"asuntojen hinnat" |
: graphic.title_t:"asuntojen hinnat"^2.0 | ((graphic.title_fi:asunto
: graphic.title_fi:hinta)^2.0) | ((table.title_fi:asunto
: table.title_fi:hinta)^2.0) | table.contents_t:"asuntojen hinnat" |
: text_t:"asuntojen hinnat" | (ingress_fi:asunto ingress_fi:hinta) |
: (table.contents_fi:asunto table.contents_fi:hinta) | ((title_fi:asunto
: title_fi:hinta)^2.0))~0.01) () type:tie^6.0 type:kuv^2.0 type:tau^2.0
: FunctionQuery((1.0/(3.16E-11*float(ms(const(1319437912691),date(date.modified_dt)))+1.0))^100.0)</str>

...is this perhaps a side effect of the new autoGeneratePhraseQueries
option? ... you are explicitly specifying a quoted phrase, but
maybe somewhere in the code path of the dismax parser that information is
getting lost?

can you post the details of your schema.xml?  (ie: the "version" property
on the schema file, and the dynamicField/field + fieldType definitions for
all these fields)

In contrast, your unquoted example is working exactly as i'd expect.  a
DisjunctionMaxQuery is built for each clause of the input, and the two
DisjunctionMaxQuery objects are then combined in a BooleanQuery where the
minNrShouldMatch property is set to "2"....

: +<result name="response" numFound="1065" start="0"
: maxScore="2.230382"></result>
        ...
:<str name="rawquerystring">asuntojen hinnat</str>
:<str name="querystring">asuntojen hinnat</str>
:
:<str name="parsedquery">+((DisjunctionMaxQuery((table.title_t:asuntojen^2.0 |
: title_t:asuntojen^2.0 | ingress_t:asuntojen | text_fi:asunto |
: table.description_fi:asunto | table.description_t:asuntojen |
: graphic.title_t:asuntojen^2.0 | graphic.title_fi:asunto^2.0 |
: table.title_fi:asunto^2.0 | table.contents_t:asuntojen | text_t:asuntojen |
: ingress_fi:asunto | table.contents_fi:asunto | title_fi:asunto^2.0)~0.01)
: DisjunctionMaxQuery((table.title_t:hinnat^2.0 | title_t:hinnat^2.0 |
: ingress_t:hinnat | text_fi:hinta | table.description_fi:hinta |
: table.description_t:hinnat | graphic.title_t:hinnat^2.0 |
: graphic.title_fi:hinta^2.0 | table.title_fi:hinta^2.0 |
: table.contents_t:hinnat | text_t:hinnat | ingress_fi:hinta |
: table.contents_fi:hinta | title_fi:hinta^2.0)~0.01))~2) () type:tie^6.0
: type:kuv^2.0 type:tau^2.0
: FunctionQuery((1.0/(3.16E-11*float(ms(const(1319438484878),date(date.modified_dt)))+1.0))^100.0)</str>


-Hoss



--
Lauri Hyttinen
Tietopalvelusuunnittelija
Tilastokeskus
Yksikkö
Käyntiosoite: Työpajankatu 13, 00580 Helsinki
Postiosoite: PL 3 A, 00022 Tilastokeskus
puh. 09 1734 0000
lauri.hytti...@tilastokeskus.fi
www.tilastokeskus.fi
