Thank you, Aloke and Bryan!  I'll give this a try and I'll report back on what 
happens!

- Eric

On Sep 9, 2013, at 2:32 AM, Aloke Ghoshal <alghos...@gmail.com> wrote:

> Hi Eric,
> 
> As Bryan suggests, you should look at setting hl.fragsize and
> hl.maxAnalyzedChars appropriately for long documents.
> 
> One issue I find with your search request is that in trying to
> highlight across three separate fields, you have added each of them as
> a separate request param:
> hl.fl=contents&hl.fl=title&hl.fl=original_url
> 
> The way to do it, per the wiki
> (http://wiki.apache.org/solr/HighlightingParameters#hl.fl), is to pass
> them as a single comma- (or space-) separated value:
> hl.fl=contents,title,original_url
> 
> Regards,
> Aloke
> 
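A minimal sketch of Aloke's suggestion, assuming the request is assembled in Python (the thread itself uses a Rails app, so this is illustrative only). The field names and values are taken from the logged query; the variable names are hypothetical.

```python
from urllib.parse import urlencode

# Fields to highlight, as named in the thread.
highlight_fields = ["contents", "title", "original_url"]

params = {
    "q": "Unangan",
    "hl": "true",
    # One hl.fl parameter holding a comma-separated list,
    # instead of three separate hl.fl parameters.
    "hl.fl": ",".join(highlight_fields),
    "hl.fragsize": "600",
}

# safe="," keeps the commas readable instead of encoding them as %2C.
query_string = urlencode(params, safe=",")
print(query_string)
```

The resulting string can be appended to the /select path of the Solr webapp.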
> On 9/9/13, Bryan Loofbourrow <bloofbour...@knowledgemosaic.com> wrote:
>> Eric,
>> 
>> Your example document is quite long. Are you setting hl.maxAnalyzedChars?
>> If you don't, the highlighter you appear to be using will not look past
>> the first 51,200 characters of the document for snippet candidates.
>> 
>> http://wiki.apache.org/solr/HighlightingParameters#hl.maxAnalyzedChars
>> 
>> -- Bryan
>> 
>> 
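A rough way to check whether Bryan's explanation applies to a given document is sketched below. It assumes the stored field text is available as a Python string and uses a plain substring search, which only approximates what the highlighter's analyzer does. The function name and sample text are hypothetical; the 51,200 figure is the default Bryan cites.

```python
DEFAULT_MAX_ANALYZED_CHARS = 51200  # default limit mentioned in the thread


def first_match_is_reachable(text, term, max_analyzed_chars=DEFAULT_MAX_ANALYZED_CHARS):
    """Return True if the first case-insensitive occurrence of `term`
    starts inside the first `max_analyzed_chars` characters of `text`."""
    position = text.lower().find(term.lower())
    return 0 <= position < max_analyzed_chars


# Hypothetical long document whose only match sits at offset 70,000.
long_doc = "filler " * 10000 + "Unangan"

print(first_match_is_reachable(long_doc, "unangan"))
print(first_match_is_reachable(long_doc, "unangan", max_analyzed_chars=100000))
```

If the first check fails and the second succeeds, raising hl.maxAnalyzedChars on the request is a plausible fix for that document.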
>>> -----Original Message-----
>>> From: Eric O'Hanlon [mailto:elo2...@columbia.edu]
>>> Sent: Sunday, September 08, 2013 2:01 PM
>>> To: solr-user@lucene.apache.org
>>> Subject: Re: Some highlighted snippets aren't being returned
>>> 
>>> Hi again Everyone,
>>> 
>>> I didn't get any replies to this, so I thought I'd re-send in case anyone
>>> missed it and has any thoughts.
>>> 
>>> Thanks,
>>> Eric
>>> 
>>> On Aug 7, 2013, at 1:51 PM, Eric O'Hanlon <elo2...@columbia.edu> wrote:
>>> 
>>>> Hi Everyone,
>>>> 
>>>> I'm facing an issue in which my solr query is returning highlighted
>>>> snippets for some, but not all results.  For reference, I'm searching
>>>> through an index that contains web crawls of human-rights-related
>>>> websites.  I'm running solr as a webapp under Tomcat and I've included
>>>> the query's solr params from the Tomcat log:
>>>> 
>>>> ...
>>>> webapp=/solr-4.2
>>>> path=/select
>>>> 
>>>> params={facet=true&sort=score+desc&group.limit=10&spellcheck.q=Unangan&f.mimetype_code.facet.limit=7&hl.simple.pre=<code>&q.alt=*:*&f.organization_type__facet.facet.limit=6&f.language__facet.facet.limit=6&hl=true&f.date_of_capture_yyyy.facet.limit=6&group.field=original_url&hl.simple.post=</code>&facet.field=domain&facet.field=date_of_capture_yyyy&facet.field=mimetype_code&facet.field=geographic_focus__facet&facet.field=organization_based_in__facet&facet.field=organization_type__facet&facet.field=language__facet&facet.field=creator_name__facet&hl.fragsize=600&f.creator_name__facet.facet.limit=6&facet.mincount=1&qf=text^1&hl.fl=contents&hl.fl=title&hl.fl=original_url&wt=ruby&f.geographic_focus__facet.facet.limit=6&defType=edismax&rows=10&f.domain.facet.limit=6&q=Unangan&f.organization_based_in__facet.facet.limit=6&q.op=AND&group=true&hl.usePhraseHighlighter=true} hits=8 status=0 QTime=108
>>>> ...
>>>> 
>>>> For the query above (which can be simplified to say: find all documents
>>>> that contain the word "unangan" and return facets, highlights, etc.), I
>>>> get five search results.  Only three of these are returning highlighted
>>>> snippets.  Here's the "highlighting" portion of the solr response (note:
>>>> printed in ruby notation because I'm receiving this response in a Rails
>>>> app):
>>>> 
>>>> --------
>>>> "highlighting"=>
>>>>  {"20100602195444/http://www.kontras.org/uu_ri_ham/UU%20Nomor%2023%20Tahun%202002%20tentang%20Perlindungan%20Anak.pdf"=>
>>>>   {},
>>>>  "20100902203939/http://www.kontras.org/uu_ri_ham/UU%20Nomor%2023%20Tahun%202002%20tentang%20Perlindungan%20Anak.pdf"=>
>>>>   {},
>>>>  "20111202233029/http://www.kontras.org/uu_ri_ham/UU%20Nomor%2023%20Tahun%202002%20tentang%20Perlindungan%20Anak.pdf"=>
>>>>   {},
>>>>  "20100618201646/http://www.komnasham.go.id/portal/files/39-99.pdf"=>
>>>>   {"contents"=>
>>>>     ["...actual snippet is returned here..."]},
>>>>  "20100902235358/http://www.komnasham.go.id/portal/files/39-99.pdf"=>
>>>>   {"contents"=>
>>>>     ["...actual snippet is returned here..."]},
>>>>  "20110302213056/http://www.komnasham.go.id/publikasi/doc_download/2-uu-no-39-tahun-1999"=>
>>>>   {"contents"=>
>>>>     ["...actual snippet is returned here..."]},
>>>>  "20110302213102/http://www.komnasham.go.id/publikasi/doc_view/2-uu-no-39-tahun-1999?tmpl=component&format=raw"=>
>>>>   {"contents"=>
>>>>     ["...actual snippet is returned here..."]},
>>>>  "20120303113654/http://www.iwgia.org/iwgia_files_publications_files/0028_Utimut_heritage.pdf"=>
>>>>   {}}
>>>> --------
>>>> 
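When troubleshooting a response like the one above, it can help to list the document keys whose highlighting entry came back empty. The sketch below assumes the response has been parsed into a Python dict (for example by requesting wt=json rather than wt=ruby); the helper name is hypothetical and the two sample keys are copied from the response above.

```python
def docs_without_snippets(highlighting):
    """Return the keys whose highlighting entry has no fields at all."""
    return [doc_id for doc_id, fields in highlighting.items() if not fields]


# Two entries from the "highlighting" section, one with a snippet and one without.
highlighting = {
    "20100618201646/http://www.komnasham.go.id/portal/files/39-99.pdf": {
        "contents": ["...actual snippet is returned here..."]
    },
    "20120303113654/http://www.iwgia.org/iwgia_files_publications_files/0028_Utimut_heritage.pdf": {},
}

for doc_id in docs_without_snippets(highlighting):
    print("no snippet:", doc_id)
```

The listed keys are the documents worth inspecting for length or for matches that fall late in the text.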
>>>> I have eight (as opposed to five) results above because I'm also doing a
>>>> grouped query, grouping by a field called "original_url", and this leads
>>>> to five grouped results.
>>>> 
>>>> I've confirmed that my highlight-lacking results DO contain the word
>>>> "unangan", as expected, and this term is appearing in a text field that's
>>>> indexed and stored, and being searched for all text searches.  For
>>>> example, one of the search results is for a crawl of this document:
>>>> http://www.iwgia.org/iwgia_files_publications_files/0028_Utimut_heritage.pdf
>>>> 
>>>> And if you view that document on the web, you'll see that it does
>>>> contain "unangan".
>>>> 
>>>> Has anyone seen this before?  And does anyone have any good suggestions
>>>> for troubleshooting/fixing the problem?
>>>> 
>>>> Thanks!
>>>> 
>>>> - Eric
>> 
> 
