Re: Query for exact part of sentence

Arkadi Colson Mon, 13 Feb 2012 05:05:38 -0800

It is still not working after reindexing. Below is the output of the field analysis. Any idea what could be wrong?


Index Analyzer

org.apache.solr.analysis.HTMLStripCharFilterFactory{luceneMatchVersion=LUCENE_35}

text    "123 456"

org.apache.solr.analysis.KeywordTokenizerFactory{luceneMatchVersion=LUCENE_35}

position    1
term text    "123 456"
startOffset    0
endOffset    9

org.apache.solr.analysis.StopFilterFactory{words=stopwords_en.txt,stopwords_du.txt, ignoreCase=true,enablePositionIncrements=true, luceneMatchVersion=LUCENE_35}

position    1
term text    "123 456"
startOffset    0
endOffset    9

org.apache.solr.analysis.WordDelimiterFilterFactory{splitOnCaseChange=1, generateNumberParts=1, catenateWords=1,luceneMatchVersion=LUCENE_35, generateWordParts=1, catenateAll=0,catenateNumbers=1}

position       1         1            2
term text      123       123456       456
startOffset    1         1            5
endOffset      4         8            8
type           word      word         word
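
To see where the three tokens above come from, here is a simplified Python sketch of what WordDelimiterFilterFactory does to a purely numeric token. It is not Lucene's implementation: the function name is hypothetical, and it only splits runs of digits, ignoring letters, case changes and the other WDF options. With catenateNumbers=1, the parts 123 and 456 keep positions 1 and 2, and the catenated 123456 is added at position 1, spanning both parts.

```python
import re


def word_delimiter_numbers(text, catenate_numbers=True):
    """Simplified sketch of WordDelimiterFilterFactory on a numeric token.

    Not Lucene's implementation: it only splits runs of digits on any
    non-digit character and ignores letters, case changes and the other
    WDF options. Returns (term, position, startOffset, endOffset) tuples.
    """
    parts = [(m.group(), m.start(), m.end()) for m in re.finditer(r"[0-9]+", text)]
    tokens = [(term, pos, start, end)
              for pos, (term, start, end) in enumerate(parts, start=1)]
    if catenate_numbers and len(parts) > 1:
        # catenateNumbers=1: add the joined digits at the first part's
        # position, spanning from the first part's start to the last part's end.
        joined = "".join(term for term, _, _ in parts)
        tokens.append((joined, 1, parts[0][1], parts[-1][2]))
    # Group by position (the stable sort keeps 123 ahead of 123456).
    tokens.sort(key=lambda t: t[1])
    return tokens


if __name__ == "__main__":
    # Same input as the analysis page above, quotes included.
    for token in word_delimiter_numbers('"123 456"'):
        print(token)
    # ('123', 1, 1, 4)
    # ('123456', 1, 1, 8)
    # ('456', 2, 5, 8)
```

Calling it with catenate_numbers=False drops the 123456 token and leaves only the two parts.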



On 01/31/2012 06:20 PM, zarni aung wrote:

Did you rebuild the index?  That would help since the index analyzer has
been changed.

On Tue, Jan 31, 2012 at 9:53 AM, Arkadi Colson <ark...@smartbit.be> wrote:

The text field type in the schema configuration looks like this. I changed
catenateNumbers to 0, but it still doesn't work as expected.

<fieldType name="text" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <!-- in this example, we will only use synonyms at query time
    <filter class="solr.SynonymFilterFactory" synonyms="index_synonyms.txt"
            ignoreCase="true" expand="false"/>
    -->
    <!-- Case insensitive stop word removal.
         add enablePositionIncrements=true in both the index and query
         analyzers to leave a 'gap' for more accurate phrase queries.
    -->
    <filter class="solr.StopFilterFactory"
            ignoreCase="true"
            words="stopwords_en.txt"
            enablePositionIncrements="true"
            />
    <filter class="solr.StopFilterFactory"
            ignoreCase="true"
            words="stopwords_du.txt"
            enablePositionIncrements="true"
            />
    <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1"
            generateNumberParts="1" catenateWords="1" catenateNumbers="1"
            catenateAll="0" splitOnCaseChange="1"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.SnowballPorterFilterFactory" language="English"
            protected="protwords.txt"/>
    <filter class="solr.SnowballPorterFilterFactory" language="Dutch" />
    <filter class="solr.NGramFilterFactory" minGramSize="3"
            maxGramSize="15"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt"
            ignoreCase="true" expand="true"/>
    <filter class="solr.StopFilterFactory"
            ignoreCase="true"
            words="stopwords_en.txt"
            enablePositionIncrements="true"
            />
    <filter class="solr.StopFilterFactory"
            ignoreCase="true"
            words="stopwords_du.txt"
            enablePositionIncrements="true"
            />
    <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1"
            generateNumberParts="1" catenateWords="0" catenateNumbers="0"
            catenateAll="0" splitOnCaseChange="1"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.SnowballPorterFilterFactory" language="English"
            protected="protwords.txt"/>
    <filter class="solr.SnowballPorterFilterFactory" language="Dutch" />
  </analyzer>
</fieldType>
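
The index analyzer above ends with an NGramFilterFactory (minGramSize="3", maxGramSize="15"), so every indexed token is also stored as its substrings of length 3 to 15. A minimal Python sketch of the grams produced for one token (hypothetical helper; Lucene's emission order and position handling may differ by version) makes it easy to check which indexed terms a short query term can hit. Assuming smsc_content uses this text type, the token 1234 from the second document yields the gram 123.

```python
def ngrams(term, min_gram=3, max_gram=15):
    """Sketch of the grams an n-gram filter emits for a single token.

    Mirrors minGramSize/maxGramSize from the index analyzer; grams are
    listed by size, then by start offset. Tokens shorter than min_gram
    produce no grams at all.
    """
    grams = []
    for size in range(min_gram, min(max_gram, len(term)) + 1):
        for start in range(len(term) - size + 1):
            grams.append(term[start:start + size])
    return grams


if __name__ == "__main__":
    # Tokens of the two indexed documents from the original question.
    for term in ["123", "456", "789", "1234", "5678", "9011"]:
        print(term, ngrams(term))
```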



On 01/31/2012 03:03 PM, Erick Erickson wrote:

Unless you provide your schema configuration, there's
not much to go on here. Two things though:

1>   look at the admin/analysis page to see how your
      data is broken up into tokens.
2>   at a guess you have WordDelimiterFilterFactory
      in your chain and perhaps catenateNumbers="1"

Best
Erick
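
Point 1 can also be scripted instead of using the admin page. A minimal sketch, assuming Solr's FieldAnalysisRequestHandler is registered at /analysis/field (as in the example solrconfig.xml; check your own config), with the host taken from the shards parameter further down and the field name smsc_content taken from the query. The helper name is hypothetical; it only builds the request URL, which can then be opened in a browser or fetched with any HTTP client.

```python
from urllib.parse import urlencode


def field_analysis_url(base_url, field, value, query=None):
    """Build a request URL for Solr's field analysis handler.

    Assumes the handler is registered at /analysis/field in solrconfig.xml.
    analysis.fieldvalue is run through the index analyzer and
    analysis.query (if given) through the query analyzer.
    """
    params = {
        "analysis.fieldname": field,
        "analysis.fieldvalue": value,
        "wt": "json",
    }
    if query is not None:
        params["analysis.query"] = query
    return base_url.rstrip("/") + "/analysis/field?" + urlencode(params)


if __name__ == "__main__":
    print(field_analysis_url("http://solr01:8983/solr", "smsc_content",
                             "123 456 789", query="123 456"))
```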

On Mon, Jan 30, 2012 at 3:21 AM, Arkadi Colson <ark...@smartbit.be> wrote:

Hi

I'm using the PECL PHP class to query Solr and was wondering how to query
for an exact part of a sentence.

There are 2 data items indexed in Solr:
1327497476: 123 456 789
1327497521: 1234 5678 9011

However, when running the query, both data items are returned, as you can
see below. Any idea why?

Thanks!

SolrObject Object
(
    [responseHeader] =>     SolrObject Object
        (
            [status] =>     0
            [QTime] =>     5016
            [params] =>     SolrObject Object
                (
                    [debugQuery] =>     true
                    [shards] =>     solr01:8983/solr,solr02:8983/solr,solr03:8983/solr
                    [fl] =>     id,smsc_module,smsc_ssid,smsc_description,smsc_content,smsc_courseid,smsc_date_created,smsc_date_edited,score,metadata_stream_size,metadata_stream_source_info,metadata_stream_name,metadata_stream_content_type,last_modified,author,title,subject
                    [sort] =>     smsc_date_created asc
                    [indent] =>     on
                    [start] =>     0
                    [q] =>     (smsc_content:\"123 456\" || smsc_description:\"123 456\")&& (smsc_module:Intradesk)&& (smsc_date_created:[2011-12-25T10:29:51Z TO NOW])&& (smsc_ssid:38)
                    [distrib] =>     true
                    [wt] =>     xml
                    [version] =>     2.2
                    [rows] =>     55
                )

        )

    [response] =>     SolrObject Object
        (
            [numFound] =>     2
            [start] =>     0
            [docs] =>     Array
                (
                    [0] =>     SolrObject Object
                        (
                            [smsc_module] =>     Intradesk
                            [smsc_ssid] =>     38
                            [id] =>     1327497476
                            [smsc_courseid] =>     0
                            [smsc_date_created] =>     2011-12-25T10:29:51Z
                            [smsc_date_edited] =>     2011-12-25T10:29:51Z
                            [score] =>     10.028017
                        )

                    [1] =>     SolrObject Object
                        (
                            [smsc_module] =>     Intradesk
                            [smsc_ssid] =>     38
                            [id] =>     1327497521
                            [smsc_courseid] =>     0
                            [smsc_date_created] =>     2011-12-25T10:29:51Z
                            [smsc_date_edited] =>     2011-12-25T10:29:51Z
                            [score] =>     5.541335
                        )

                )

        )
    [debug] =>     SolrObject Object
        (
            [rawquerystring] =>     (smsc_content:\"123 456\" || smsc_description:\"123 456\")&& (smsc_module:Intradesk)&& (smsc_date_created:[2011-12-25T10:29:51Z TO NOW])&& (smsc_ssid:38)
            [querystring] =>     (smsc_content:\"123 456\" || smsc_description:\"123 456\")&& (smsc_module:Intradesk)&& (smsc_date_created:[2011-12-25T10:29:51Z TO NOW])&& (smsc_ssid:38)
            [parsedquery] =>     +(smsc_content:123 smsc_content:456 smsc_description:123 smsc_content:456) +smsc_module:intradesk +smsc_date_created:[2011-12-25T10:29:51Z TO 2012-01-25T13:33:21.098Z] +smsc_ssid:38
            [parsedquery_toString] =>     +(smsc_content:123 smsc_content:456 smsc_description:123 smsc_content:456) +smsc_module:intradesk +smsc_date_created:[2011-12-25T10:29:51 TO 2012-01-25T13:33:21.098] +smsc_ssid:`#8;#0;#0;#0;&
            [QParser] =>     LuceneQParser
            [timing] =>     SolrObject Object
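
In Lucene query syntax, matching an exact part of a sentence is a phrase query, and a phrase on a field would be printed in parsedquery as smsc_content:"123 456"; in the debug output above, 123 and 456 instead show up as separate term clauses. The PHP client code is not shown in the thread, so the following is only a Python sketch of the expected query-string shape, with hypothetical helper names, not a fix for the PECL class itself. The surrounding double quotes are left unescaped; only quotes and backslashes inside the phrase are escaped.

```python
def phrase_clause(field, phrase):
    """Build a Lucene-syntax phrase clause such as smsc_content:"123 456".

    Only backslashes and double quotes inside the phrase are escaped;
    the surrounding quotes stay unescaped so the query parser treats
    the text as one phrase rather than as separate terms.
    """
    inner = phrase.replace("\\", "\\\\").replace('"', '\\"')
    return f'{field}:"{inner}"'


def exact_text_query(fields, phrase):
    # OR the same phrase across several fields, e.g. content and description.
    return "(" + " || ".join(phrase_clause(f, phrase) for f in fields) + ")"


if __name__ == "__main__":
    print(exact_text_query(["smsc_content", "smsc_description"], "123 456"))
    # (smsc_content:"123 456" || smsc_description:"123 456")
```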

--
Smartbit bvba
Hoogstraat 13
B-3670 Meeuwen
T: +32 11 64 08 80
F: +32 89 46 81 10
W: http://www.smartbit.be
E: ark...@smartbit.be

