> but Mick Semb Wever will be taking over this job for the next two weeks.

Back from holidays and taking over where Glenn-Erik left off. I'm very new
to Solr, so please bear with me.

I'll run through our setup from scratch.

Our test list has 9 entries:
 "abcd efgh ijkl", "abcd efgh", "efgh ijkl", "abcd", "efgh", "ijkl",
"ijkl efgh", "efgh abcd", and "ijkl efgh abcd".

I'm using a trunk build of Solr, with example/solr as the Solr home.

Editing schema.xml to index these entries as type="string", with
defaultOperator="OR", gives the expected exact-match behaviour, provided
the queries are quoted, e.g. /solr/select/?q="abcd efgh ijkl"

I then change type="string" to type="shingleString", along with

> <fieldType name="shingleString" class="solr.StrField" 
> positionIncrementGap="100" omitNorms="true" >
>       <analyzer type="index">
>         <tokenizer class="solr.WhitespaceTokenizerFactory"/>
>       </analyzer>
>       <analyzer type="query">
>         <tokenizer class="solr.WhitespaceTokenizerFactory"/>
>         <filter class="solr.ShingleFilterFactory" outputUnigrams="true" 
> outputUnigramIfNoNgram="true" maxShingleSize="99" />
>       </analyzer>
> </fieldType>

I never get any hits with quoted queries.
Without quotes I only get hits on the unigrams.

I get the same outcomes using class="solr.TextField" on the fieldType
and, in the index analyzer, class="solr.KeywordTokenizerFactory" on the
tokenizer.

In fact the ShingleFilter does nothing at all here: commenting the
filter line out leads to exactly the same behaviour.

What am I missing to get shingles actually matching the indexed entries?
It seems to me that if this were solved, it would work without having to
use quoted queries.
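
To make the behaviour I'm hoping for concrete, here is a minimal Python sketch. It is not Solr code: the shingles() and expected_hits() helpers are hypothetical names, and it assumes each indexed entry is kept whole as a single term while the unquoted query is split on whitespace and then shingled.

```python
# Hypothetical sketch of the intended matching, not Solr's implementation.
ENTRIES = [
    "abcd efgh ijkl", "abcd efgh", "efgh ijkl", "abcd", "efgh", "ijkl",
    "ijkl efgh", "efgh abcd", "ijkl efgh abcd",
]


def shingles(tokens, max_size=99, output_unigrams=True):
    """Return every run of adjacent tokens (word n-gram), joined by one space."""
    min_size = 1 if output_unigrams else 2
    out = []
    for start in range(len(tokens)):
        for size in range(min_size, max_size + 1):
            end = start + size
            if end > len(tokens):
                break
            out.append(" ".join(tokens[start:end]))
    return out


def expected_hits(query):
    """Entries that equal one of the query's shingles (OR semantics)."""
    # Assumption: index side keeps each entry as one term,
    # query side is whitespace-split and then shingled.
    terms = set(shingles(query.split()))
    return [entry for entry in ENTRIES if entry in terms]


hits = expected_hits("abcd efgh ijkl")
print(hits)
```

Under those assumptions the unquoted query abcd efgh ijkl would hit the first six test entries and none of the three reversed ones.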

I have been using the analysis.jsp tool.
Everything looks good except that quotes are captured into the words and
shingles, e.g.

> term position 1                  2             3
> term text     "abcd              efgh          ijkl"
>               "abcd efgh         efgh ijkl"
>               "abcd efgh ijkl"

This would explain why quoted queries are not working: the
ShingleFilter produces tokens with the " character in them. But here I
would at least have expected a hit against efgh.
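
A small Python sketch of what the analysis page appears to be doing with the quoted input. It is only an illustration under stated assumptions: shingles_by_position() is a hypothetical helper, it splits on whitespace only, and it ignores whatever the query parser does with the quotes before analysis.

```python
# Hypothetical sketch: whitespace split plus shingling, grouped by the
# position of the first token in each shingle. Not Solr code.
def shingles_by_position(text, max_size=99):
    tokens = text.split()  # quotes stay attached to the neighbouring word
    table = {}
    for start in range(len(tokens)):
        longest = min(max_size, len(tokens) - start)
        table[start + 1] = [
            " ".join(tokens[start:start + size])
            for size in range(1, longest + 1)
        ]
    return table


table = shingles_by_position('"abcd efgh ijkl"')
for position, terms in sorted(table.items()):
    print(position, terms)
```

In this sketch the only term free of a quote character is the unigram efgh at position 2, which is why I would have expected that one entry to match.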

~mck

-- 
"He who joyfully marches to music in rank and file has already earned my
contempt. He has been given a large brain by mistake, since for him the
spinal cord would suffice." Albert Einstein 
| semb.wever.org | sesat.no | sesam.no |
