Re: solr finds allways all documents

robert rottermann Tue, 21 Aug 2012 08:43:17 -0700

Thanks Jack,

On 08/20/2012 06:41 PM, Jack Krupansky wrote:

How are you ingesting the office documents? SolrCell, or some other method?

I am using pytika, a Python module that uses Tika to extract the content.
I then add it to the index using a Python tool called sunburnt.

Do you have CopyFields?

Yes I have a copy field like this:
    <copyField source="fulltext" dest="text"/>

What fields are you querying on?

on fulltext
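
To check that a query really is restricted to the fulltext field, it can help to build the request by hand and look at the parsed query that Solr reports. The following is only a minimal sketch: the base URL assumes a default single-core install on localhost, and build_select_url is a hypothetical helper name, not part of pytika or sunburnt.

```python
from urllib.parse import urlencode

# Assumption: default single-core Solr 3.x install; adjust host, port and path.
SOLR_SELECT_URL = "http://localhost:8983/solr/select"


def build_select_url(term, field="fulltext", rows=10):
    """Return a select URL that searches `term` in one explicit field.

    Hypothetical helper for illustration. `term` is used as-is, so Solr
    query syntax characters in it are not escaped.
    """
    params = [
        ("q", "%s:%s" % (field, term)),  # name the field explicitly
        ("fl", "docid"),                 # only the document ID is needed
        ("rows", str(rows)),
        ("wt", "json"),
        ("debugQuery", "on"),            # ask Solr to include the parsed query
    ]
    return SOLR_SELECT_URL + "?" + urlencode(params)


if __name__ == "__main__":
    print(build_select_url("invoice"))
```

Opening the printed URL in a browser and reading the debug section of the response shows which field and which terms were actually searched.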


What does your "text" field type look like?

    <fieldType name="text" class="solr.TextField" positionIncrementGap="100">

      <analyzer type="index">
        <tokenizer class="solr.WhitespaceTokenizerFactory"/>
        <!-- in this example, we will only use synonyms at query time
        <filter class="solr.SynonymFilterFactory" synonyms="index_synonyms.txt" ignoreCase="true" expand="false"/>
        -->
        <!-- Case insensitive stop word removal.
          add enablePositionIncrements=true in both the index and query
          analyzers to leave a 'gap' for more accurate phrase queries.
        -->
        <filter class="solr.StopFilterFactory"
                ignoreCase="true"
                words="stopwords.txt"
                enablePositionIncrements="true"
                />

        <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="1" catenateNumbers="1" catenateAll="0" splitOnCaseChange="1"/>

        <filter class="solr.LowerCaseFilterFactory"/>

        <filter class="solr.SnowballPorterFilterFactory" language="English" protected="protwords.txt"/>

      </analyzer>
      <analyzer type="query">
        <tokenizer class="solr.WhitespaceTokenizerFactory"/>

        <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/>

        <filter class="solr.StopFilterFactory"
                ignoreCase="true"
                words="stopwords.txt"
                enablePositionIncrements="true"
                />

        <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="0" catenateNumbers="0" catenateAll="0" splitOnCaseChange="1"/>

        <filter class="solr.LowerCaseFilterFactory"/>

        <filter class="solr.SnowballPorterFilterFactory" language="English" protected="protwords.txt"/>

      </analyzer>
    </fieldType>


thanks again
robert

-- Jack Krupansky
-----Original Message----- From: robert rottermann
Sent: Monday, August 20, 2012 10:39 AM
To: solr-user@lucene.apache.org
Cc: robert rottermann
Subject: solr finds allways all documents

Hi there,
I am new to Solr et al. Besides, I am a Java noob.

What I am doing:
I want to do full-text retrieval on office documents. The metadata of
these documents is maintained in PostgreSQL.
So the only information I need to get out of Solr is a document ID.

My problem now is that my index seems to be built badly.
Nearly whatever I look up returns all documents.

I would be very glad if somebody could give me an idea of what I should change.


thanks
Robert


What I am using is the sample configuration that comes with Solr 3.6.
I removed all the fields and added the following:

<fields>

    <field name="docid" type="string" indexed="true" stored="true" required="true"/>
    <field name="docnum" type="text" indexed="true" stored="true" required="false"/>
    <field name="titel" type="text" indexed="true" stored="true" required="false"/>
    <field name="fsname" type="text" indexed="true" stored="true" required="false"/>
    <field name="directory" type="text" indexed="true" stored="true" required="false"/>
    <field name="fulltext" type="text" indexed="true" stored="false" required="false"/>
    <dynamicField name="*" type="ignored" />
</fields>
<!-- Field to use to determine and enforce document uniqueness.
    Unless this field is marked with required="false", it will be a
    required field
-->
<uniqueKey>docid</uniqueKey>
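
For illustration only, a document matching the fields above could be expressed in Solr's XML update format roughly as follows. This is a sketch, not what sunburnt does internally: build_add_xml is a hypothetical helper, and the sample values are made up.

```python
import xml.etree.ElementTree as ET


def build_add_xml(docid, fulltext, **optional_fields):
    """Build an <add><doc>...</doc></add> message for one document.

    Hypothetical helper: `docid` is the uniqueKey from the schema above,
    `fulltext` is the extracted text, and any other field from the schema
    (docnum, titel, fsname, directory) can be passed as a keyword argument.
    """
    add = ET.Element("add")
    doc = ET.SubElement(add, "doc")
    fields = {"docid": docid, "fulltext": fulltext}
    fields.update(optional_fields)
    for name, value in fields.items():
        if value is None:
            continue  # leave out optional fields that have no value
        field = ET.SubElement(doc, "field", name=name)
        field.text = value  # ElementTree escapes &, < and > on output
    return ET.tostring(add, encoding="unicode")


if __name__ == "__main__":
    # Made-up sample values, only to show the shape of the message.
    print(build_add_xml("42", "text extracted by Tika", titel="Report"))
```

Because of the catch-all dynamicField of type "ignored", any field name not listed in the schema would be accepted and silently dropped, so a misspelled field name would not raise an error.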
