Setting hl.maxAnalyzedChars did it!  I wasn't setting that param, and I'm
working with some very long documents.  I also made the hl.fl param formatting
change that you suggested, Aloke.
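
In case it helps anyone who finds this thread later, here is a rough sketch of
the two changes, written in Python rather than taken from my actual Rails code.
The helper name and the 10,000,000 limit are made up for illustration; pick a
value that covers your own longest document.

```python
from urllib.parse import urlencode

# Hypothetical limit (an assumption, not a recommended value): it only needs
# to be at least as large as the longest stored field you want highlighted.
MAX_ANALYZED_CHARS = 10_000_000


def highlight_params(query, fields, max_analyzed_chars=MAX_ANALYZED_CHARS):
    """Build Solr highlighting params (hypothetical helper for illustration)."""
    return {
        "q": query,
        "hl": "true",
        # One hl.fl param holding a comma-separated list of fields,
        # instead of repeating hl.fl once per field.
        "hl.fl": ",".join(fields),
        "hl.fragsize": 600,
        # Without this, the highlighter stops looking for snippets after
        # its default character limit, which long documents easily exceed.
        "hl.maxAnalyzedChars": max_analyzed_chars,
    }


params = highlight_params("Unangan", ["contents", "title", "original_url"])
query_string = urlencode(params)
print(query_string)
```

The resulting string can be appended to the /select request alongside the
other params from the original query.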

Thanks again!

- Eric

On Sep 11, 2013, at 3:10 AM, Eric O'Hanlon <elo2...@columbia.edu> wrote:

> Thank you, Aloke and Bryan!  I'll give this a try and I'll report back on 
> what happens!
> 
> - Eric
> 
> On Sep 9, 2013, at 2:32 AM, Aloke Ghoshal <alghos...@gmail.com> wrote:
> 
>> Hi Eric,
>> 
>> As Bryan suggests, you should look at appropriately setting up the
>> fragSize & maxAnalyzedChars for long documents.
>> 
>> One issue I find with your search request is that in trying to
>> highlight across three separate fields, you have added each of them as
>> a separate request param:
>> hl.fl=contents&hl.fl=title&hl.fl=original_url
>> 
>> The way to do it
>> (http://wiki.apache.org/solr/HighlightingParameters#hl.fl) is to pass
>> them as a single comma- (or space-) separated value:
>> hl.fl=contents,title,original_url
>> 
>> Regards,
>> Aloke
>> 
>> On 9/9/13, Bryan Loofbourrow <bloofbour...@knowledgemosaic.com> wrote:
>>> Eric,
>>> 
>>> Your example document is quite long. Are you setting hl.maxAnalyzedChars?
>>> If you don't, the highlighter you appear to be using will not look past
>>> the first 51,200 characters of the document for snippet candidates.
>>> 
>>> http://wiki.apache.org/solr/HighlightingParameters#hl.maxAnalyzedChars
>>> 
>>> -- Bryan
>>> 
>>> 
>>>> -----Original Message-----
>>>> From: Eric O'Hanlon [mailto:elo2...@columbia.edu]
>>>> Sent: Sunday, September 08, 2013 2:01 PM
>>>> To: solr-user@lucene.apache.org
>>>> Subject: Re: Some highlighted snippets aren't being returned
>>>> 
>>>> Hi again Everyone,
>>>> 
>>>> I didn't get any replies to this, so I thought I'd re-send in case anyone
>>>> missed it and has any thoughts.
>>>> 
>>>> Thanks,
>>>> Eric
>>>> 
>>>> On Aug 7, 2013, at 1:51 PM, Eric O'Hanlon <elo2...@columbia.edu> wrote:
>>>> 
>>>>> Hi Everyone,
>>>>> 
>>>>> I'm facing an issue in which my solr query is returning highlighted
>>>>> snippets for some, but not all results.  For reference, I'm searching
>>>>> through an index that contains web crawls of human-rights-related
>>>>> websites.  I'm running solr as a webapp under Tomcat and I've included
>>>>> the query's solr params from the Tomcat log:
>>>>> 
>>>>> ...
>>>>> webapp=/solr-4.2
>>>>> path=/select
>>>>> params={facet=true&sort=score+desc&group.limit=10&spellcheck.q=Unangan&
>>>>> f.mimetype_code.facet.limit=7&hl.simple.pre=<code>&q.alt=*:*&
>>>>> f.organization_type__facet.facet.limit=6&
>>>>> f.language__facet.facet.limit=6&hl=true&
>>>>> f.date_of_capture_yyyy.facet.limit=6&group.field=original_url&
>>>>> hl.simple.post=</code>&facet.field=domain&
>>>>> facet.field=date_of_capture_yyyy&facet.field=mimetype_code&
>>>>> facet.field=geographic_focus__facet&
>>>>> facet.field=organization_based_in__facet&
>>>>> facet.field=organization_type__facet&facet.field=language__facet&
>>>>> facet.field=creator_name__facet&hl.fragsize=600&
>>>>> f.creator_name__facet.facet.limit=6&facet.mincount=1&qf=text^1&
>>>>> hl.fl=contents&hl.fl=title&hl.fl=original_url&wt=ruby&
>>>>> f.geographic_focus__facet.facet.limit=6&defType=edismax&rows=10&
>>>>> f.domain.facet.limit=6&q=Unangan&
>>>>> f.organization_based_in__facet.facet.limit=6&q.op=AND&group=true&
>>>>> hl.usePhraseHighlighter=true} hits=8 status=0 QTime=108
>>>>> ...
>>>>> 
>>>>> For the query above (which can be simplified to say: find all documents
>>>>> that contain the word "unangan" and return facets, highlights, etc.), I
>>>>> get five search results.  Only three of these are returning highlighted
>>>>> snippets.  Here's the "highlighting" portion of the solr response (note:
>>>>> printed in ruby notation because I'm receiving this response in a Rails
>>>>> app):
>>>>> 
>>>>> --------
>>>>> "highlighting"=>
>>>>> {"20100602195444/http://www.kontras.org/uu_ri_ham/UU%20Nomor%2023%20Tahun%202002%20tentang%20Perlindungan%20Anak.pdf"=>
>>>>>  {},
>>>>> "20100902203939/http://www.kontras.org/uu_ri_ham/UU%20Nomor%2023%20Tahun%202002%20tentang%20Perlindungan%20Anak.pdf"=>
>>>>>  {},
>>>>> "20111202233029/http://www.kontras.org/uu_ri_ham/UU%20Nomor%2023%20Tahun%202002%20tentang%20Perlindungan%20Anak.pdf"=>
>>>>>  {},
>>>>> "20100618201646/http://www.komnasham.go.id/portal/files/39-99.pdf"=>
>>>>>  {"contents"=>
>>>>>    ["...actual snippet is returned here..."]},
>>>>> "20100902235358/http://www.komnasham.go.id/portal/files/39-99.pdf"=>
>>>>>  {"contents"=>
>>>>>    ["...actual snippet is returned here..."]},
>>>>> "20110302213056/http://www.komnasham.go.id/publikasi/doc_download/2-uu-no-39-tahun-1999"=>
>>>>>  {"contents"=>
>>>>>    ["...actual snippet is returned here..."]},
>>>>> "20110302213102/http://www.komnasham.go.id/publikasi/doc_view/2-uu-no-39-tahun-1999?tmpl=component&format=raw"=>
>>>>>  {"contents"=>
>>>>>    ["...actual snippet is returned here..."]},
>>>>> "20120303113654/http://www.iwgia.org/iwgia_files_publications_files/0028_Utimut_heritage.pdf"=>
>>>>>  {}}
>>>>> --------
>>>>> 
>>>>> I have eight (as opposed to five) results above because I'm also doing a
>>>>> grouped query, grouping by a field called "original_url", and this leads
>>>>> to five grouped results.
>>>>> 
>>>>> I've confirmed that my highlight-lacking results DO contain the word
>>>>> "unangan", as expected, and this term is appearing in a text field that's
>>>>> indexed and stored, and being searched for all text searches.  For
>>>>> example, one of the search results is for a crawl of this document:
>>>>> http://www.iwgia.org/iwgia_files_publications_files/0028_Utimut_heritage.pdf
>>>>> 
>>>>> And if you view that document on the web, you'll see that it does
>>>>> contain "unangan".
>>>>> 
>>>>> Has anyone seen this before?  And does anyone have any good suggestions
>>>>> for troubleshooting/fixing the problem?
>>>>> 
>>>>> Thanks!
>>>>> 
>>>>> - Eric
>>> 
>> 
> 
> 
