Hi Eric,

As Bryan suggests, you should look at setting hl.fragsize and
hl.maxAnalyzedChars appropriately for long documents.
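As a rough sketch only (in Ruby, since the response is being consumed from a
Rails app), the two parameters could be added to the request along these
lines. hl.fragsize and hl.maxAnalyzedChars are the actual Solr parameter
names; the helper name highlight_params and the values shown are assumptions
for illustration and would need tuning for your documents:

```ruby
require "uri"

# Hypothetical helper: returns the highlighting parameters for a Solr request.
# The values are placeholders, not recommendations.
def highlight_params(max_analyzed_chars: 1_000_000, frag_size: 600)
  {
    "hl" => "true",
    "hl.fragsize" => frag_size,                  # snippet size in characters
    "hl.maxAnalyzedChars" => max_analyzed_chars  # how far into a field to look
  }
end

# Merge the highlighting parameters into the rest of the query and encode
# them as a URL query string.
query = URI.encode_www_form({ "q" => "Unangan" }.merge(highlight_params))
puts query
```

The resulting string can be appended to the /select URL. Raising
hl.maxAnalyzedChars makes the highlighter analyze more text per document, so
expect some extra cost on very long fields.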
One issue I find with your search request is that, in trying to highlight
across three separate fields, you have added each of them as a separate
request param:

hl.fl=contents&hl.fl=title&hl.fl=original_url

The way to do it (http://wiki.apache.org/solr/HighlightingParameters#hl.fl)
is to pass them as a single comma- (or space-) separated value:

hl.fl=contents,title,original_url

Regards,
Aloke

On 9/9/13, Bryan Loofbourrow <bloofbour...@knowledgemosaic.com> wrote:
> Eric,
>
> Your example document is quite long. Are you setting hl.maxAnalyzedChars?
> If you don't, the highlighter you appear to be using will not look past
> the first 51,200 characters of the document for snippet candidates.
>
> http://wiki.apache.org/solr/HighlightingParameters#hl.maxAnalyzedChars
>
> -- Bryan
>
>
>> -----Original Message-----
>> From: Eric O'Hanlon [mailto:elo2...@columbia.edu]
>> Sent: Sunday, September 08, 2013 2:01 PM
>> To: solr-user@lucene.apache.org
>> Subject: Re: Some highlighted snippets aren't being returned
>>
>> Hi again Everyone,
>>
>> I didn't get any replies to this, so I thought I'd re-send in case
>> anyone missed it and has any thoughts.
>>
>> Thanks,
>> Eric
>>
>> On Aug 7, 2013, at 1:51 PM, Eric O'Hanlon <elo2...@columbia.edu> wrote:
>>
>> > Hi Everyone,
>> >
>> > I'm facing an issue in which my solr query is returning highlighted
>> > snippets for some, but not all results. For reference, I'm searching
>> > through an index that contains web crawls of human-rights-related
>> > websites. I'm running solr as a webapp under Tomcat and I've included
>> > the query's solr params from the Tomcat log:
>> >
>> > ...
>> > webapp=/solr-4.2
>> > path=/select
>> >
>> > params={facet=true&sort=score+desc&group.limit=10&spellcheck.q=Unangan&f.mimetype_code.facet.limit=7&hl.simple.pre=<code>&q.alt=*:*&f.organization_type__facet.facet.limit=6&f.language__facet.facet.limit=6&hl=true&f.date_of_capture_yyyy.facet.limit=6&group.field=original_url&hl.simple.post=</code>&facet.field=domain&facet.field=date_of_capture_yyyy&facet.field=mimetype_code&facet.field=geographic_focus__facet&facet.field=organization_based_in__facet&facet.field=organization_type__facet&facet.field=language__facet&facet.field=creator_name__facet&hl.fragsize=600&f.creator_name__facet.facet.limit=6&facet.mincount=1&qf=text^1&hl.fl=contents&hl.fl=title&hl.fl=original_url&wt=ruby&f.geographic_focus__facet.facet.limit=6&defType=edismax&rows=10&f.domain.facet.limit=6&q=Unangan&f.organization_based_in__facet.facet.limit=6&q.op=AND&group=true&hl.usePhraseHighlighter=true}
>> > hits=8 status=0 QTime=108
>> > ...
>> >
>> > For the query above (which can be simplified to say: find all
>> > documents that contain the word "unangan" and return facets,
>> > highlights, etc.), I get five search results. Only three of these
>> > are returning highlighted snippets.
>> > Here's the "highlighting" portion of the solr response (note:
>> > printed in ruby notation because I'm receiving this response in a
>> > Rails app):
>> >
>> > --------
>> > "highlighting"=>
>> > {"20100602195444/http://www.kontras.org/uu_ri_ham/UU%20Nomor%2023%20Tahun%202002%20tentang%20Perlindungan%20Anak.pdf"=>
>> >   {},
>> >  "20100902203939/http://www.kontras.org/uu_ri_ham/UU%20Nomor%2023%20Tahun%202002%20tentang%20Perlindungan%20Anak.pdf"=>
>> >   {},
>> >  "20111202233029/http://www.kontras.org/uu_ri_ham/UU%20Nomor%2023%20Tahun%202002%20tentang%20Perlindungan%20Anak.pdf"=>
>> >   {},
>> >  "20100618201646/http://www.komnasham.go.id/portal/files/39-99.pdf"=>
>> >   {"contents"=>
>> >     ["...actual snippet is returned here..."]},
>> >  "20100902235358/http://www.komnasham.go.id/portal/files/39-99.pdf"=>
>> >   {"contents"=>
>> >     ["...actual snippet is returned here..."]},
>> >  "20110302213056/http://www.komnasham.go.id/publikasi/doc_download/2-uu-no-39-tahun-1999"=>
>> >   {"contents"=>
>> >     ["...actual snippet is returned here..."]},
>> >  "20110302213102/http://www.komnasham.go.id/publikasi/doc_view/2-uu-no-39-tahun-1999?tmpl=component&format=raw"=>
>> >   {"contents"=>
>> >     ["...actual snippet is returned here..."]},
>> >  "20120303113654/http://www.iwgia.org/iwgia_files_publications_files/0028_Utimut_heritage.pdf"=>
>> >   {}}
>> > --------
>> >
>> > I have eight (as opposed to five) results above because I'm also
>> > doing a grouped query, grouping by a field called "original_url",
>> > and this leads to five grouped results.
>> >
>> > I've confirmed that my highlight-lacking results DO contain the word
>> > "unangan", as expected, and this term is appearing in a text field
>> > that's indexed and stored, and being searched for all text searches.
>> > For example, one of the search results is for a crawl of this
>> > document:
>> > http://www.iwgia.org/iwgia_files_publications_files/0028_Utimut_heritage.pdf
>> >
>> > And if you view that document on the web, you'll see that it does
>> > contain "unangan".
>> >
>> > Has anyone seen this before? And does anyone have any good
>> > suggestions for troubleshooting/fixing the problem?
>> >
>> > Thanks!
>> >
>> > - Eric