Thanks Jack for the explanation.

But let's say my requirement is to return all occurrences of the search
term, along with the text snippet around each one, for every document in
the search scope. How do we go about achieving that with Solr?

Thanks & Regards,

Soumya.



-----Original Message-----
From: Jack Krupansky [mailto:j...@basetechnology.com] 
Sent: 29 January 2013 08:00 PM
To: solr-user@lucene.apache.org
Subject: Re: Issue with mutiple records in full text search

The number of "hits" of a term in a Solr document impacts the score, but
still only counts as one "hit" in the numFound count. Solr doesn't track
"hits" for individual term occurrences, except that you could check the
"term frequency" of a specific term in a specific document if you wanted,
using a function query - tf(field,term) - which can also be included in the
&fl field list.
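
As a rough sketch of the tf() approach described above, the following Python
builds a select URL that requests the term frequency as a pseudo-field in fl.
The host, the core name "collection1", and the helper build_tf_query are
assumptions for illustration only; the field names MediaId and ContentSearch
are taken from the schema quoted in the original message.

```python
from urllib.parse import urlencode


def build_tf_query(base_url, field, term, id_field="MediaId"):
    """Build a Solr /select URL that returns tf(field,term) for each hit.

    base_url is a hypothetical core URL; adjust it to the actual deployment.
    """
    params = {
        "q": f"{field}:{term}",
        # tf(field,term) in fl returns the term frequency per matching document
        "fl": f"{id_field},score,tf({field},{term})",
        "wt": "json",
    }
    return f"{base_url}/select?{urlencode(params)}"


# Assumed local Solr instance and core name, for illustration only.
url = build_tf_query(
    "http://localhost:8983/solr/collection1", "ContentSearch", "case"
)
print(url)
```

Each returned document would then carry its own term frequency, while numFound
still counts documents rather than occurrences.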

To be clear - Solr has no concept of "records", just documents and fields.

-- Jack Krupansky

-----Original Message-----
From: Soumyanayan Kar
Sent: Tuesday, January 29, 2013 9:01 AM
To: solr-user@lucene.apache.org
Subject: Issue with mutiple records in full text search

Hi,



We are trying to use Solr for a text-based search solution in a web
application. The documents being indexed are essentially text-based files
such as *.txt, *.pdf, etc. We use the Tika extraction plugin to extract the
text content from the files and store it in a "text_general" type field in
the Solr schema file. Relevant part of the schema file:



<field name="CaseId" type="long" indexed="true" stored="true" required="true"/>
<field name="CaseTitle" type="string" indexed="false" stored="true" required="true"/>
<field name="CaseNumber" type="string" indexed="false" stored="true" required="true"/>
<field name="MediaType" type="int" indexed="true" stored="true" required="true"/>
<field name="MediaId" type="string" indexed="true" stored="true" required="true"/>
<field name="CaptionName" type="string" indexed="false" stored="true" required="true"/>
<field name="MediaPath" type="string" indexed="false" stored="true" required="true"/>
<field name="MimeType" type="string" indexed="false" stored="true" required="false"/>
<field name="DocumentNumber" type="string" indexed="false" stored="true" required="false"/>
<field name="DeponentFullName" type="string" indexed="false" stored="true" required="false"/>
<field name="DepositionDate" type="date" indexed="false" stored="true" required="false"/>
<field name="DocCreatedDate" type="date" indexed="false" stored="true" required="false"/>
<field name="DocModifiedDate" type="date" indexed="false" stored="true" required="false"/>
<field name="Content" type="text_general" indexed="false" stored="true" required="true"/>
<field name="WorkgroupIdList" type="text_general" indexed="true" stored="true" required="true" multiValued="true"/>
<field name="ContentSearch" type="text_general" indexed="true" stored="false" multiValued="true"/>
<field name="_version_" type="long" indexed="true" stored="true"/>

<uniqueKey>MediaId</uniqueKey>

<copyField source="Content" dest="ContentSearch"/>



Our solution is .NET-based, and we use the SolrNet client to communicate
with Solr.



The Content field stores the text content of the file, and the
ContentSearch field is used for executing the search.

The documents are indexed properly, but when we execute a search we get
only the first occurrence of the search term returned for each document.

For example, suppose a.txt and b.pdf are indexed and the search term "case"
occurs multiple times in both documents (a.txt - 7 hits, b.pdf - 10 hits).
When we search for "case" against both documents, we get two records
returned, which are the first occurrences of the search term in the
respective docs, whereas we expect this to return 17 hits.
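
The gap between the two numbers in this example can be sketched in plain
Python. This is not Solr code; the sample texts and the helper
count_occurrences are invented for illustration. It only shows that the number
of matching documents and the sum of per-document occurrences are different
quantities.

```python
import re


def count_occurrences(text, term):
    """Count whole-word, case-insensitive occurrences of term in text."""
    pattern = rf"\b{re.escape(term)}\b"
    return len(re.findall(pattern, text, flags=re.IGNORECASE))


# Invented sample contents standing in for the indexed files.
docs = {
    "a.txt": "The case was closed. A new case opened; that case is pending.",
    "b.pdf": "Case notes: the case file covers one case only. No other file.",
}

per_doc = {name: count_occurrences(text, "case") for name, text in docs.items()}
matching_docs = sum(1 for n in per_doc.values() if n > 0)  # one per document
total_occurrences = sum(per_doc.values())  # sum over all documents

print(per_doc, matching_docs, total_occurrences)
```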



We used Luke to inspect the index records but could not find anything
obviously wrong.

Does this have something to do with the type (text_general) of the search
field, or with the way we load the entire content of a file into one index
document?



Thanks & Regards,



Soumya.





