Re: Document match with no highlight

Phong Dais Thu, 12 May 2011 11:06:17 -0700

Hi,

I read the link provided and I'll need some time to digest what it is
saying.


Here's my "text" fieldtype.

<fieldtype name="text" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true"
            words="stopwords.txt" enablePositionIncrements="true"/>
    <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1"
            generateNumberParts="1" catenateWords="1" catenateNumbers="1"
            catenateAll="0" splitOnCaseChange="1"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.SnowballPorterFilterFactory" language="English"
            protected="protwords.txt"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt"
            ignoreCase="true" expand="true"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true"
            words="stopwords.txt" enablePositionIncrements="true"/>
    <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1"
            generateNumberParts="1" catenateWords="0" catenateNumbers="0"
            catenateAll="0" splitOnCaseChange="1"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.SnowballPorterFilterFactory" language="English"
            protected="protwords.txt"/>
  </analyzer>
</fieldtype>

Also, I figured out which value in DOC_TEXT causes this issue to occur.
With a DOC_TEXT of (without the quotes):
"0176 R3 1.5 TO "

Searching for "3 1 15" returns a match with "empty" highlight.
Searching for "3 1 15"~1 returns a match with highlight.
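
One way to reason about this is to model what the index-time and query-time
analyzers above produce for this value. The Python below is only a rough
sketch, not Lucene's implementation: the function names are hypothetical, the
stop, synonym and stemming filters are ignored, and placing the catenated
number on the position of its last part is an assumption about how the
word delimiter filter behaves in this Solr version.

```python
import re
from itertools import groupby


def word_delimiter(token, catenate_numbers):
    """Simplified word-delimiter model for one whitespace token.

    Returns (term, position_increment) pairs. ASSUMPTION: a catenated
    number run is stacked (increment 0) on the position of its last part.
    """
    # Split on non-alphanumerics and on letter/digit transitions.
    parts = re.findall(r"[A-Za-z]+|[0-9]+", token)
    out = []
    for is_number, run in groupby(parts, key=str.isdigit):
        run = list(run)
        out.extend((part.lower(), 1) for part in run)
        if is_number and catenate_numbers and len(run) > 1:
            out.append(("".join(run), 0))
    return out


def analyze(text, catenate_numbers):
    """Map each term to the set of positions it occupies."""
    positions = {}
    pos = 0
    for token in text.split():
        for term, inc in word_delimiter(token, catenate_numbers):
            pos += inc
            positions.setdefault(term, set()).add(pos)
    return positions


def query_terms(text):
    """Query-side terms (catenateNumbers="0" in the query analyzer)."""
    return [t for tok in text.split() for t, _ in word_delimiter(tok, False)]


def exact_phrase_match(index_positions, terms):
    """True if the terms occur at consecutive positions (slop 0)."""
    starts = index_positions.get(terms[0], set())
    return any(
        all(p + i in index_positions.get(t, set()) for i, t in enumerate(terms))
        for p in starts
    )


index = analyze("0176 R3 1.5 TO", catenate_numbers=True)
print(exact_phrase_match(index, query_terms("3 1 15")))  # True

index_no_cat = analyze("0176 R3 1.5 TO", catenate_numbers=False)
print(exact_phrase_match(index_no_cat, query_terms("3 1 15")))  # False
```

Under these assumptions the indexed "15" exists only because of
catenateNumbers="1" on the index side, and it shares a position with "5".
That would explain the phrase match, and it looks like the overlapping-token
situation Pierre mentioned, though the sketch does not show that this is what
breaks the highlighter.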

Can anyone see anything that I'm missing?

Thanks,
P.


On Thu, May 12, 2011 at 12:27 PM, Pierre GOSSE <pierre.go...@arisem.com> wrote:

> > Since you're using the standard "text" field, this should NOT be your
> > case.
>
> Sorry for the missing NOT in my previous sentence. You should have the same
> issue given what you said, but still, it sounds very similar.
>
> Are you sure your fieldtype "text" has nothing special? A tokenizer or
> filter that could add some tokens to your indexed text but not to your
> query, for example a WordDelimiter present in <index> and not in <query>?
>
> Pierre
>
> -----Original Message-----
> From: Pierre GOSSE [mailto:pierre.go...@arisem.com]
> Sent: Thursday, 12 May 2011 18:21
> To: solr-user@lucene.apache.org
> Subject: RE: Document match with no highlight
>
> > In fact if I did "3 1 15"~1 I do get a snippet also.
>
> Strange, I had a very similar problem, but with overlapping tokens. Since
> you're using the standard "text" field, this should be your case.
>
> Maybe you could have a look at this issue, since it sounds very familiar to
> me:
> https://issues.apache.org/jira/browse/LUCENE-3087
>
> Pierre
>
> -----Original Message-----
> From: Phong Dais [mailto:phong.gd...@gmail.com]
> Sent: Thursday, 12 May 2011 17:26
> To: solr-user@lucene.apache.org
> Subject: Re: Document match with no highlight
>
> Hi,
>
> <field name="DOC_TEXT" type="text" indexed="true" stored="true"/>
>
> The type "text" is the default one that came with the default Solr 1.4
> install, without any modifications.
>
> If I remove the quotes I do get snippets.  In fact if I did "3 1 15"~1 I do
> get a snippet also.
>
> Hope that helps.
>
> P.
>
> On Thu, May 12, 2011 at 9:09 AM, Ahmet Arslan <iori...@yahoo.com> wrote:
>
> > > URL:
> > > http://localhost:8983/solr/select?indent=on&version=2.2&q=DOC_TEXT%3A%223+1+15%22&fq=&start=0&rows=10&fl=DOC_TEXT%2Cscore&qt=standard&wt=standard&debugQuery=on&explainOther=&hl=on&hl.fl=DOC_TEXT&hl.maxAnalyzedChars=-1
> > >
> > > XML:
> > > <?xml version="1.0" encoding="UTF-8"?>
> > > <response>
> > >   <lst name="responseHeader">
> > >     <int name="status">0</int>
> > >     <int name="QTime">19</int>
> > >     <lst name="params">
> > >       <str name="explainOther"/>
> > >       <str name="indent">on</str>
> > >       <str name="hl.fl">DOC_TEXT</str>
> > >       <str name="wt">standard</str>
> > >       <str name="hl.maxAnalyzedChars">-1</str>
> > >       <str name="hl">on</str>
> > >       <str name="rows">10</str>
> > >       <str name="version">2.2</str>
> > >       <str name="debugQuery">on</str>
> > >       <str name="fl">DOC_TEXT,score</str>
> > >       <str name="start">0</str>
> > >       <str name="q">DOC_TEXT:"3 1 15"</str>
> > >       <str name="qt">standard</str>
> > >       <str name="fq"/>
> > >     </lst>
> > >   </lst>
> > >   <result name="response" numFound="1" start="0" maxScore="0.035959315">
> > >     <doc>
> > >       <float name="score">0.035959315</float>
> > >       <arr name="DOC_TEXT"><str> ... </str></arr>
> > >     </doc>
> > >   </result>
> > >   <lst name="highlighting">
> > >     <lst name="123456"/>
> > >   </lst>
> > >   <lst name="debug">
> > >     <str name="rawquerystring">DOC_TEXT:"3 1 15"</str>
> > >     <str name="querystring">DOC_TEXT:"3 1 15"</str>
> > >     <str name="parsedquery">PhraseQuery(DOC_TEXT:"3 1 15")</str>
> > >     <str name="parsedquery_toString">DOC_TEXT:"3 1 15"</str>
> > >     <lst name="explain">
> > >       <str name="123456">
> > > 0.035959315 = fieldWeight(DOC_TEXT:"3 1 15" in 0), product of:
> > >   1.0 = tf(phraseFreq=1.0)
> > >   0.92055845 = idf(DOC_TEXT: 3=1 1=1 15=1)
> > >   0.0390625 = fieldNorm(field=DOC_TEXT, doc=0)
> > >       </str>
> > >     </lst>
> > >     <str name="QParser">LuceneQParser</str>
> > >     <arr name="filter_queries">
> > >       <str/>
> > >     </arr>
> > >     <arr name="parsed_filter_queries"/>
> > >     <lst name="timing">
> > >       ...
> > >     </lst>
> > >   </lst>
> > > </response>
> >
> >
> > Nothing looks suspicious.
> >
> > Can you provide two more things: the fieldType of DOC_TEXT and the
> > field definition of DOC_TEXT?
> >
> > Also, do you get a snippet from the same doc when you remove the quotes
> > from your query?
> >
> >
>
