Thanks Jack,

On 08/20/2012 06:41 PM, Jack Krupansky wrote:
> How are you ingesting the office documents? SolrCell, or some other method?

I am using pytika, a Python module that uses Tika to extract the content.
I then add it using a Python tool called sunburnt.

> Do you have CopyFields?

Yes, I have a copy field like this:
    <copyField source="fulltext" dest="text"/>

> What fields are you querying on?

On fulltext.

> What does your "text" field type look like?

<fieldType name="text" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <!-- in this example, we will only use synonyms at query time
    <filter class="solr.SynonymFilterFactory" synonyms="index_synonyms.txt" ignoreCase="true" expand="false"/>
    -->
    <!-- Case insensitive stop word removal.
      add enablePositionIncrements=true in both the index and query
      analyzers to leave a 'gap' for more accurate phrase queries.
    -->
    <filter class="solr.StopFilterFactory"
            ignoreCase="true"
            words="stopwords.txt"
            enablePositionIncrements="true"
            />
    <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="1" catenateNumbers="1" catenateAll="0" splitOnCaseChange="1"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.SnowballPorterFilterFactory" language="English" protected="protwords.txt"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
    <filter class="solr.StopFilterFactory"
            ignoreCase="true"
            words="stopwords.txt"
            enablePositionIncrements="true"
            />
    <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="0" catenateNumbers="0" catenateAll="0" splitOnCaseChange="1"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.SnowballPorterFilterFactory" language="English" protected="protwords.txt"/>
  </analyzer>
</fieldType>


thanks again
robert
-- Jack Krupansky
-----Original Message-----
From: robert rottermann
Sent: Monday, August 20, 2012 10:39 AM
To: solr-user@lucene.apache.org
Cc: robert rottermann
Subject: solr finds allways all documents

Hi there,
I am new to Solr et al. Besides, I am a Java noob.

What I am doing:
I want to do full-text retrieval on office documents. The metadata of
these documents is maintained in PostgreSQL.
So the only information I need to get out of Solr is a document ID.
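
For illustration, a lookup that asks Solr for nothing but the document ID could be sketched in Python roughly as follows. This is only a sketch under assumptions: the base URL http://localhost:8983/solr and the helper name build_select_url are hypothetical, and the term is assumed to be a single word without Lucene special characters. It only builds the request URL, using the standard q, fl, rows and wt parameters of the select handler, and does not contact a server.

```python
from urllib.parse import urlencode


def build_select_url(term, base_url="http://localhost:8983/solr", rows=10):
    """Build a Solr select URL that searches 'fulltext' and returns only 'docid'.

    base_url is an assumed location of the Solr instance, not taken from the
    original message. The term is passed through as-is, so it should be a
    single word without Lucene query syntax characters.
    """
    params = [
        ("q", "fulltext:" + term),  # restrict the query to the fulltext field
        ("fl", "docid"),            # return only the document ID
        ("rows", str(rows)),        # limit the number of hits returned
        ("wt", "json"),             # ask for a JSON response
    ]
    return base_url.rstrip("/") + "/select?" + urlencode(params)


if __name__ == "__main__":
    print(build_select_url("invoice"))
```

Naming the field explicitly in q means the result does not depend on whatever default search field the schema declares.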

My problem now is that my index seems to be built badly.
Nearly whatever I look up returns all documents.

I would be very glad if somebody could give me an idea of what I should change.

thanks
Robert


What I am using is the sample configuration that comes with Solr 3.6.
I removed all the fields and added the following:

<fields>

    <field name="docid" type="string" indexed="true" stored="true" required="true"/>
    <field name="docnum" type="text" indexed="true" stored="true" required="false"/>
    <field name="titel" type="text" indexed="true" stored="true" required="false"/>
    <field name="fsname" type="text" indexed="true" stored="true" required="false"/>
    <field name="directory" type="text" indexed="true" stored="true" required="false"/>
    <field name="fulltext" type="text" indexed="true" stored="false" required="false"/>
    <dynamicField name="*" type="ignored" />
</fields>
<!-- Field to use to determine and enforce document uniqueness.
    Unless this field is marked with required="false", it will be a
    required field
-->
<uniqueKey>docid</uniqueKey>
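
To make the field list above concrete, here is a rough Python sketch of an update message for one document in Solr's XML <add> format. It is not how pytika or sunburnt build their requests; the helper name build_add_message and the sample values are hypothetical, and only the field names come from the schema above.

```python
import xml.etree.ElementTree as ET

# Field names copied from the <fields> section of the schema above.
SCHEMA_FIELDS = ("docid", "docnum", "titel", "fsname", "directory", "fulltext")


def build_add_message(doc):
    """Return a Solr XML <add> message for one document.

    doc is a dict mapping field names to values. Fields that are not in the
    schema are skipped, and docid must be present because it is the uniqueKey.
    """
    if not doc.get("docid"):
        raise ValueError("docid is required (it is the uniqueKey)")
    add = ET.Element("add")
    doc_el = ET.SubElement(add, "doc")
    for name in SCHEMA_FIELDS:
        value = doc.get(name)
        if value is None:
            continue  # optional field not supplied
        field = ET.SubElement(doc_el, "field", name=name)
        field.text = str(value)
    return ET.tostring(add, encoding="unicode")


if __name__ == "__main__":
    # Sample values are made up for illustration.
    print(build_add_message({"docid": "42", "fulltext": "extracted text"}))
```

Printing the message for a real document is a quick way to check that the extracted text actually ends up in fulltext and that each document carries its own docid.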


