Re: solr finds allways all documents

Erick Erickson Tue, 21 Aug 2012 09:03:25 -0700

OK, one other piece of information that would help a lot (or maybe lead you
to the answer). Attach &debugQuery=on to the URL and look at the debug
information, particularly the parsed query down below. I'm going to guess that
you're searching on something that is actually found in nearly all your
documents, you just don't realize it <G>. Look for the XML section like:
<lst name="debug">
The parsed form of the query should be immediately below that.
The <lst name="explain"> section can probably be ignored for now.
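
A rough sketch of that check in Python, in case it helps. The host, port, and
select path are assumptions (the stock example setup), the helper names are
made up for illustration, and the sample response is trimmed and invented, not
real output from your index:

```python
from urllib.parse import urlencode
import xml.etree.ElementTree as ET

# Assumed base URL of the stock example setup; adjust to your own instance.
SOLR_SELECT_URL = "http://localhost:8983/solr/select"


def build_debug_url(query, base_url=SOLR_SELECT_URL):
    """Return a select URL with debugQuery=on appended (hypothetical helper)."""
    params = {"q": query, "debugQuery": "on", "wt": "xml"}
    return base_url + "?" + urlencode(params)


def parsed_query(response_xml):
    """Pull the parsedquery string out of the <lst name="debug"> section."""
    root = ET.fromstring(response_xml)
    debug = root.find("./lst[@name='debug']")
    if debug is None:
        return None
    node = debug.find("./str[@name='parsedquery']")
    return node.text if node is not None else None


# Trimmed, invented response for illustration only.
SAMPLE_RESPONSE = """<response>
  <lst name="debug">
    <str name="rawquerystring">fulltext:report</str>
    <str name="querystring">fulltext:report</str>
    <str name="parsedquery">fulltext:report</str>
    <lst name="explain"/>
  </lst>
</response>"""

if __name__ == "__main__":
    print(build_debug_url("fulltext:report"))
    print(parsed_query(SAMPLE_RESPONSE))
```

If the parsed query names a field or term you did not expect, that is usually
the place to start looking.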



Of course I've often been waaaay off base before...

Best
Erick

On Tue, Aug 21, 2012 at 11:42 AM, robert rottermann
<robert.rotterm...@gmx.ch> wrote:
> Thanks Jack,
>
>
> On 08/20/2012 06:41 PM, Jack Krupansky wrote:
>>
>> How are you ingesting the office documents? SolrCell, or some other method?
>>
> I am using pytika, a python module that uses Tika to extract the content.
> I then add it using a python tool called sunburnt.
>>
>> Do you have CopyFields?
>
> Yes I have a copy field like this:
>     <copyField source="fulltext" dest="text"/>
>
>
>
>> What fields are you querying on?
>
> on fulltext
>
>>
>> What does your "text" field type look like?
>>
>     <fieldType name="text" class="solr.TextField"
> positionIncrementGap="100">
>       <analyzer type="index">
>         <tokenizer class="solr.WhitespaceTokenizerFactory"/>
>         <!-- in this example, we will only use synonyms at query time
>         <filter class="solr.SynonymFilterFactory"
> synonyms="index_synonyms.txt" ignoreCase="true" expand="false"/>
>         -->
>         <!-- Case insensitive stop word removal.
>           add enablePositionIncrements=true in both the index and query
>           analyzers to leave a 'gap' for more accurate phrase queries.
>         -->
>         <filter class="solr.StopFilterFactory"
>                 ignoreCase="true"
>                 words="stopwords.txt"
>                 enablePositionIncrements="true"
>                 />
>         <filter class="solr.WordDelimiterFilterFactory"
> generateWordParts="1" generateNumberParts="1" catenateWords="1"
> catenateNumbers="1" catenateAll="0" splitOnCaseChange="1"/>
>         <filter class="solr.LowerCaseFilterFactory"/>
>         <filter class="solr.SnowballPorterFilterFactory" language="English"
> protected="protwords.txt"/>
>       </analyzer>
>       <analyzer type="query">
>         <tokenizer class="solr.WhitespaceTokenizerFactory"/>
>         <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt"
> ignoreCase="true" expand="true"/>
>         <filter class="solr.StopFilterFactory"
>                 ignoreCase="true"
>                 words="stopwords.txt"
>                 enablePositionIncrements="true"
>                 />
>         <filter class="solr.WordDelimiterFilterFactory"
> generateWordParts="1" generateNumberParts="1" catenateWords="0"
> catenateNumbers="0" catenateAll="0" splitOnCaseChange="1"/>
>         <filter class="solr.LowerCaseFilterFactory"/>
>         <filter class="solr.SnowballPorterFilterFactory" language="English"
> protected="protwords.txt"/>
>       </analyzer>
>     </fieldType>
>
>
> thanks again
> robert
>
>> -- Jack Krupansky
>> -----Original Message----- From: robert rottermann
>> Sent: Monday, August 20, 2012 10:39 AM
>> To: solr-user@lucene.apache.org
>> Cc: robert rottermann
>> Subject: solr finds allways all documents
>>
>> Hi there,
>> I am new to Solr et al. Besides, I am a Java noob.
>>
>> What I am doing:
>> I want to do full-text retrieval on office documents. The metadata of
>> these documents is maintained in PostgreSQL.
>> So the only information I need to get out of Solr is a document ID.
>>
>> My problem now is that my index seems to be built badly.
>> (Nearly) whatever I look up returns all documents.
>>
>> I would be very glad if somebody could give me an idea what I should
>> change.
>>
>> thanks
>> Robert
>>
>>
>> What I am using is the sample configuration that comes with solr 3.6.
>> I removed all the fields and added the following:
>>
>> <fields>
>>
>>     <field name="docid" type="string" indexed="true" stored="true"
>> required="true"/>
>>     <field name="docnum" type="text" indexed="true" stored="true"
>> required="false"/>
>>     <field name="titel" type="text" indexed="true" stored="true"
>> required="false"/>
>>     <field name="fsname" type="text" indexed="true" stored="true"
>> required="false"/>
>>     <field name="directory" type="text" indexed="true" stored="true"
>> required="false"/>
>>     <field name="fulltext" type="text" indexed="true" stored="false"
>> required="false"/>
>>     <dynamicField name="*" type="ignored" />
>> </fields>
>> <!-- Field to use to determine and enforce document uniqueness.
>>     Unless this field is marked with required="false", it will be a
>> required field
>> -->
>> <uniqueKey>docid</uniqueKey>
>>
>>
>
