The query parser first splits on whitespace, so each individual word of your query (short, red, evil, fox) gets its own TokenStream. The shingle filter therefore only ever sees one token at a time and has no neighbouring token to build a shingle from.
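
To illustrate the point, here is a minimal sketch in plain Python, not Lucene or Solr code. The `shingle` and `analyze` helpers are hypothetical stand-ins for the query analyzer in the quoted schema (whitespace tokenizer, lowercase filter, bigram shingles with outputUnigrams="false" and outputUnigramIfNoNgram="true"), and the "_" separator is only an assumption that mirrors the notation used in the quoted message.

```python
def shingle(tokens, size=2, sep="_", unigram_if_no_ngram=True):
    """Toy stand-in for a shingle filter (hypothetical helper, not the Solr API).

    Joins each run of `size` adjacent tokens. If no shingle can be formed and
    `unigram_if_no_ngram` is set, the original tokens are returned instead.
    """
    shingles = [sep.join(tokens[i:i + size])
                for i in range(len(tokens) - size + 1)]
    if not shingles and unigram_if_no_ngram:
        return list(tokens)
    return shingles


def analyze(text):
    """Toy query analyzer: whitespace-tokenize, lowercase, then shingle."""
    return shingle(text.lower().split())


query = "short red evil fox"

# Analyzing the whole string in one pass: every token has a neighbour,
# so bigram shingles are produced.
whole = analyze(query)

# Mimicking the query parser: split on whitespace first, then analyze each
# word on its own. Each analysis sees a single token, so no shingle forms.
per_word = [analyze(word) for word in query.split()]

print(whole)
print(per_word)
```

The first result contains the shingles you expected, while the second contains only the individual words. Quoting the query so that it reaches the analyzer as a single phrase is the usual way to keep the words together.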

On Fri, Jun 4, 2010 at 6:21 PM, Greg Bowyer <gbow...@shopzilla.com> wrote:

> Hi all
>
> Interesting and by the looks of things very solid project you have here
> with SOLR, however ..
>
> I have an index that contains a large number of "phrases" that I need to
> search for over, each of these phrases is fairly small being on average
> about 4 words long.
>
> The search terms that I am given to search these phrases are very long,
> and quite arbitrary, sometimes the search terms will be up to 25 words
> long.
>
> As such the performance of my index when built naively is sporadic
> sometimes searches are very fast on average they are somewhat slower.
>
> I have attempted to improve this situation by using shingling for the
> phrases and the related search queries, in my schema I have the following
>
> <fieldType name="bigramed_phrase" class="solr.TextField"
>            positionIncrementGap="100">
>   <analyzer type="index">
>     <tokenizer class="solr.WhitespaceTokenizerFactory"/>
>     <filter class="solr.LowerCaseFilterFactory"/>
>     <filter class="solr.ShingleFilterFactory" outputUnigrams="true"
>             outputUnigramIfNoNgram="true" />
>   </analyzer>
>   <analyzer type="query">
>     <tokenizer class="solr.WhitespaceTokenizerFactory"/>
>     <filter class="solr.LowerCaseFilterFactory"/>
>     <filter class="solr.ShingleFilterFactory" outputUnigrams="false"
>             outputUnigramIfNoNgram="true" />
>   </analyzer>
> </fieldType>
>
> In the indexes, as seen with luke I do indeed have a large range of
> shingled terms.
>
> When I run the analyser for either query or index terms I also see the
> breakdown with the shingled terms correctly displayed.
>
> However when I attempt to use this in a query I do not see the terms
> applied in the debug output, for example with the term "short red evil
> fox" I would expect to see the shingles
> 'short_red' 'red_evil' 'evil_fox'
>
> but instead I get the following
>
> "debug":{
>   "rawquerystring":"short red evil fox",
>   "querystring":"short red evil fox",
>   "parsedquery":"+() ()",
>   "parsedquery_toString":"+() ()",
>   "explain":{},
>   "QParser":"DisMaxQParser",
>   "altquerystring":null,
>   "boostfuncs":null,
>   "filter_queries":["atomId:(8235 100000914 100000911 )"],
>   "parsed_filter_queries":["atomId:8235 atomId:100000914 atomId:100000911"],
>   "timing":{ ......
>
> Does anyone know what I could be doing wrong here, is it a bug in the
> debug output, a stupid mistake misconception or piece of idiocy on my
> part or something else.
>
> Many thanks
>
> -- Greg Bowyer

--
Robert Muir
rcm...@gmail.com