Zip up all your configs 

Bill Bell
Sent from mobile


On Sep 8, 2013, at 3:00 PM, "Eric O'Hanlon" <elo2...@columbia.edu> wrote:

> Hi again Everyone,
> 
> I didn't get any replies to this, so I thought I'd re-send in case anyone 
> missed it and has any thoughts.
> 
> Thanks,
> Eric
> 
> On Aug 7, 2013, at 1:51 PM, Eric O'Hanlon <elo2...@columbia.edu> wrote:
> 
>> Hi Everyone,
>> 
>> I'm facing an issue in which my Solr query returns highlighted snippets
>> for some, but not all, results.  For reference, I'm searching an index
>> that contains web crawls of human-rights-related websites.  I'm running
>> Solr as a webapp under Tomcat, and I've included the query's Solr params
>> from the Tomcat log:
>> 
>> ...
>> webapp=/solr-4.2
>> path=/select
>> params={facet=true&sort=score+desc&group.limit=10&spellcheck.q=Unangan&f.mimetype_code.facet.limit=7&hl.simple.pre=<code>&q.alt=*:*&f.organization_type__facet.facet.limit=6&f.language__facet.facet.limit=6&hl=true&f.date_of_capture_yyyy.facet.limit=6&group.field=original_url&hl.simple.post=</code>&facet.field=domain&facet.field=date_of_capture_yyyy&facet.field=mimetype_code&facet.field=geographic_focus__facet&facet.field=organization_based_in__facet&facet.field=organization_type__facet&facet.field=language__facet&facet.field=creator_name__facet&hl.fragsize=600&f.creator_name__facet.facet.limit=6&facet.mincount=1&qf=text^1&hl.fl=contents&hl.fl=title&hl.fl=original_url&wt=ruby&f.geographic_focus__facet.facet.limit=6&defType=edismax&rows=10&f.domain.facet.limit=6&q=Unangan&f.organization_based_in__facet.facet.limit=6&q.op=AND&group=true&hl.usePhraseHighlighter=true}
>>  hits=8 status=0 QTime=108
>> ...
>> 
>> The query above can be summarized as: find all documents that contain the
>> word "unangan" and return facets, highlights, etc.  I get five search
>> results, but only three of them come back with highlighted snippets.
>> Here's the "highlighting" portion of the Solr response (note: printed in
>> Ruby notation because I'm receiving this response in a Rails app):
>> 
>> --------
>> "highlighting"=>
>> {"20100602195444/http://www.kontras.org/uu_ri_ham/UU%20Nomor%2023%20Tahun%202002%20tentang%20Perlindungan%20Anak.pdf"=>
>>   {},
>>  "20100902203939/http://www.kontras.org/uu_ri_ham/UU%20Nomor%2023%20Tahun%202002%20tentang%20Perlindungan%20Anak.pdf"=>
>>   {},
>>  "20111202233029/http://www.kontras.org/uu_ri_ham/UU%20Nomor%2023%20Tahun%202002%20tentang%20Perlindungan%20Anak.pdf"=>
>>   {},
>>  "20100618201646/http://www.komnasham.go.id/portal/files/39-99.pdf"=>
>>   {"contents"=>
>>     ["...actual snippet is returned here..."]},
>>  "20100902235358/http://www.komnasham.go.id/portal/files/39-99.pdf"=>
>>   {"contents"=>
>>     ["...actual snippet is returned here..."]},
>>  "20110302213056/http://www.komnasham.go.id/publikasi/doc_download/2-uu-no-39-tahun-1999"=>
>>   {"contents"=>
>>     ["...actual snippet is returned here..."]},
>>  "20110302213102/http://www.komnasham.go.id/publikasi/doc_view/2-uu-no-39-tahun-1999?tmpl=component&format=raw"=>
>>   {"contents"=>
>>     ["...actual snippet is returned here..."]},
>>  "20120303113654/http://www.iwgia.org/iwgia_files_publications_files/0028_Utimut_heritage.pdf"=>
>>   {}}
>> --------
>> 
>> The highlighting section above lists eight documents (as opposed to five)
>> because I'm also doing a grouped query, grouping by a field called
>> "original_url"; the eight matching documents collapse into five groups.
>> 
>> I've confirmed that my highlight-lacking results DO contain the word
>> "unangan", as expected.  The term appears in a text field that is indexed
>> and stored, and that is searched by all text queries.  For example,
>> one of the search results is for a crawl of this document: 
>> http://www.iwgia.org/iwgia_files_publications_files/0028_Utimut_heritage.pdf
>> 
>> And if you view that document on the web, you'll see that it does contain 
>> "unangan".
>> 
>> Has anyone seen this before?  And does anyone have any good suggestions for 
>> troubleshooting/fixing the problem?
>> 
>> Thanks!
>> 
>> - Eric
> 
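To narrow down which captures are missing snippets, the "highlighting" hash quoted above can be split into documents with and without highlights. The following is only a minimal sketch: the helper names `partition_highlights` and `original_url` are hypothetical, and the ID layout (a 14-digit capture timestamp, a slash, then the original URL) is an assumption read off the sample response, not something the thread confirms.

```ruby
# Hypothetical helper: split the "highlighting" hash from a Solr response
# (already parsed from wt=ruby) into IDs that received snippets and IDs
# whose highlight entry is empty.
def partition_highlights(highlighting)
  with, without = highlighting.partition { |_id, fields| !fields.empty? }
  { with_snippets: with.map(&:first), without_snippets: without.map(&:first) }
end

# Assumption: IDs look like "<14-digit timestamp>/<original_url>", as in the
# sample response. Stripping the timestamp recovers the grouping key.
def original_url(doc_id)
  doc_id.sub(%r{\A\d{14}/}, "")
end

# Two entries copied from the quoted response, one with and one without snippets.
highlighting = {
  "20100618201646/http://www.komnasham.go.id/portal/files/39-99.pdf" =>
    { "contents" => ["...actual snippet is returned here..."] },
  "20120303113654/http://www.iwgia.org/iwgia_files_publications_files/0028_Utimut_heritage.pdf" =>
    {}
}

result = partition_highlights(highlighting)

# Collapse the un-highlighted captures to their original URLs, one per group.
missing_urls = result[:without_snippets].map { |id| original_url(id) }.uniq

puts "No snippets for:"
missing_urls.each { |url| puts "  #{url}" }
```

With the list of un-highlighted URLs in hand, it becomes easier to compare those documents against the highlighted ones, for example by checking how far into the stored "contents" field the term first appears. One setting that may be worth ruling out (a guess, not something established in this thread) is Solr's `hl.maxAnalyzedChars` parameter, which limits how much of a field the highlighter examines.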
