Thanks, Jack, for the explanation. But let's say my requirement is to return every occurrence of the search term, along with a text snippet around each one, for every document within the search scope. How would we go about achieving that with Solr?
Thanks & Regards,
Soumya.

-----Original Message-----
From: Jack Krupansky [mailto:j...@basetechnology.com]
Sent: 29 January 2013 08:00 PM
To: solr-user@lucene.apache.org
Subject: Re: Issue with mutiple records in full text search

The number of "hits" of a term in a Solr document affects the score, but the document still counts as only one "hit" in the numFound count. Solr doesn't track "hits" for individual term occurrences. The exception is that you can check the term frequency of a specific term in a specific document, if you want, using a function query, tf(field,term), which can also be included in the &fl field list.

To be clear: Solr has no concept of "records", just documents and fields.

-- Jack Krupansky

-----Original Message-----
From: Soumyanayan Kar
Sent: Tuesday, January 29, 2013 9:01 AM
To: solr-user@lucene.apache.org
Subject: Issue with mutiple records in full text search

Hi,

We are trying to use Solr for a text-based search solution in a web application. The documents being indexed are essentially text-based files such as *.txt, *.pdf, etc. We are using the Tika extraction plugin to extract the text content from the files, and we store it in a "text_general" type field in the Solr schema file.
Relevant part of the schema file:

<field name="CaseId" type="long" indexed="true" stored="true" required="true"/>
<field name="CaseTitle" type="string" indexed="false" stored="true" required="true"/>
<field name="CaseNumber" type="string" indexed="false" stored="true" required="true"/>
<field name="MediaType" type="int" indexed="true" stored="true" required="true"/>
<field name="MediaId" type="string" indexed="true" stored="true" required="true"/>
<field name="CaptionName" type="string" indexed="false" stored="true" required="true"/>
<field name="MediaPath" type="string" indexed="false" stored="true" required="true"/>
<field name="MimeType" type="string" indexed="false" stored="true" required="false"/>
<field name="DocumentNumber" type="string" indexed="false" stored="true" required="false"/>
<field name="DeponentFullName" type="string" indexed="false" stored="true" required="false"/>
<field name="DepositionDate" type="date" indexed="false" stored="true" required="false"/>
<field name="DocCreatedDate" type="date" indexed="false" stored="true" required="false"/>
<field name="DocModifiedDate" type="date" indexed="false" stored="true" required="false"/>
<field name="Content" type="text_general" indexed="false" stored="true" required="true"/>
<field name="WorkgroupIdList" type="text_general" indexed="true" stored="true" required="true" multiValued="true"/>
<field name="ContentSearch" type="text_general" indexed="true" stored="false" multiValued="true"/>
<field name="_version_" type="long" indexed="true" stored="true"/>
<uniqueKey>MediaId</uniqueKey>
<copyField source="Content" dest="ContentSearch"/>

We are using a .NET-based solution and the SolrNet client to communicate with Solr. The Content field is meant to store the text content of the file, and the ContentSearch field will be used for executing the search.
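Applying Jack's tf(field,term) suggestion to this schema might look like the minimal Python sketch below. It only builds the request URL. The Solr base URL (http://localhost:8983/solr/collection1) and the particular fields listed in fl are assumptions for illustration. Because tf() counts indexed terms, the term should be passed in its analyzed form (for text_general, lowercased). Each matching document still comes back once, but with an extra per-document value holding the term's occurrence count in ContentSearch.

```python
from urllib.parse import urlencode


def build_tf_query(base_url: str, term: str) -> str:
    """Build a /select URL that searches ContentSearch for `term` and also
    asks Solr to return, per document, the term frequency of `term`.

    Assumes `term` is a single, already-analyzed token with no quotes in it.
    """
    params = {
        # Match documents whose indexed ContentSearch field contains the term.
        "q": f"ContentSearch:{term}",
        # tf(field,term) is a function query; listing it in fl adds a
        # per-document value with the raw count of `term` in that field.
        # MediaId and CaptionName are just example stored fields from the schema.
        "fl": f"MediaId,CaptionName,score,tf(ContentSearch,'{term}')",
        "wt": "json",
    }
    return f"{base_url.rstrip('/')}/select?{urlencode(params)}"


# Hypothetical Solr core URL; adjust to the real deployment.
url = build_tf_query("http://localhost:8983/solr/collection1", "case")
print(url)
```

With the a.txt / b.pdf example from the original message, this would still report numFound as 2, but summing the tf values across the returned documents would give the total number of indexed occurrences.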
While the documents are being indexed properly, when we execute a search we get only the first occurrence of the search term returned for each document. For example, suppose a.txt and b.pdf are indexed, and the search term "case" occurs multiple times in both documents (a.txt: 7 hits, b.pdf: 10 hits). A search for "case" across both documents returns two records, which are the first occurrences of the search term in the respective documents, whereas we expect it to return 17 hits. We used Luke to inspect the indexed records but could not find anything obviously wrong. Is this related to the type (text_general) of the search field, or to the way we load the entire content of each file into a single index document?

Thanks & Regards,
Soumya.
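Since Solr returns one result per document, one possible way to get every occurrence with surrounding text is to post-process the stored Content field (which the schema marks stored="true") on the client. The Python sketch below is only an illustration of that idea, not a Solr feature: find_occurrences is a hypothetical helper, and its whole-word, case-insensitive regex does not reproduce text_general's analysis chain, so its counts can differ from what tf() reports. The same logic would port directly to C# on the SolrNet side.

```python
import re
from typing import List, Tuple


def find_occurrences(text: str, term: str, context: int = 40) -> List[Tuple[int, str]]:
    """Return (offset, snippet) for every whole-word, case-insensitive
    occurrence of `term` in `text`, with up to `context` characters of
    surrounding text on each side."""
    pattern = re.compile(r"\b" + re.escape(term) + r"\b", re.IGNORECASE)
    results = []
    for m in pattern.finditer(text):
        start = max(0, m.start() - context)
        end = min(len(text), m.end() + context)
        results.append((m.start(), text[start:end]))
    return results


# Hypothetical stored Content values, standing in for what Solr would
# return when Content is included in the fl list.
docs = {
    "a.txt": "The case was filed in March. A second case followed.",
    "b.pdf": "Case notes: see the case file.",
}

total = 0
for name, content in docs.items():
    hits = find_occurrences(content, "case", context=15)
    total += len(hits)
    for offset, snippet in hits:
        print(f"{name} @ {offset}: ...{snippet}...")
print("total occurrences:", total)
```

The trade-off is that the full Content of every matching document has to be transferred to the client. Solr's own highlighting component (hl, hl.fl, hl.snippets, hl.fragsize parameters) is the server-side alternative worth checking, though it needs the highlighted field to be stored.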