Re: Some text not indexed in solr4.4

Utkarsh Sengar Tue, 24 Sep 2013 13:55:28 -0700

WordDelimiterFilterFactory was the culprit. Removing that fixed the problem.
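
For anyone who finds this thread later, here is a minimal sketch of the letter-number split described on the wiki page quoted below. It is not Solr's code. It only imitates the one documented rule ("SD500" -> "SD", "500"), and the function name split_letter_number is made up for illustration. If the filter behaves this way on my field, "dc44" would be stored as "dc" and "44", which may be why the phrase query stopped matching.

```python
import re


def split_letter_number(token):
    """Imitate splitting a token on letter-number transitions.

    Hypothetical helper for illustration only; the real filter has
    many more rules and options (e.g. splitOnNumerics).
    """
    # Each consecutive run of letters or of digits becomes its own part.
    return re.findall(r"[A-Za-z]+|[0-9]+", token)


for token in ["SD500", "dc44", "dyson"]:
    print(token, "->", split_letter_number(token))
```

A token with no digits, such as "dyson", comes back unchanged, which fits the fact that the other text was still searchable.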



Thanks,
-Utkarsh


On Tue, Sep 24, 2013 at 12:17 PM, Utkarsh Sengar <utkarsh2...@gmail.com> wrote:

> @Furkan Yes, I have run a commit, and other text is searchable.
> I'm not sure what you mean about MultiPhraseQuery. It is mentioned in
> the context of SynonymFilterFactory, RemoveDuplicatesTokenFilterFactory and
> PositionFilterFactory. Which part are you referring to?
>
> @Jason I get this response (I have multi-core setup) by hitting this URL:
> http://SOLR_SERVER/solr/prodinfo/terms?terms.fl=text&terms.prefix=dc
>
> <response><lst name="responseHeader"><int name="status">0</int><int 
> name="QTime">0</int></lst><lst name="terms"><lst 
> name="text"/></lst></response>
>
> I'm not sure how to interpret this response. I get the same response for
> any prefix, e.g. a, b, iph.
>
> My guess is this is happening due to WordDelimiterFilterFactory here:
> https://gist.github.com/utkarsh2012/6167128#file-schema-xml-L16, what do
> you think? Is dc44 somehow being split at query time?
> The example here says:
> http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#solr.WordDelimiterFilterFactory
> -> Split on letter-number transitions (can be turned off - see
> splitOnNumerics parameter) "SD500" -> "SD", "500"
>
> I will test it out and update this thread with my findings.
>
> Thanks,
> -Utkarsh
>
>
>
> On Tue, Sep 17, 2013 at 5:10 PM, Jason Hellman <
> jhell...@innoventsolutions.com> wrote:
>
>> Utkarsh,
>>
>> Check to see if the value is actually indexed into the field by using the
>> Terms request handler:
>>
>> http://localhost:8983/solr/terms?terms.fl=text&terms.prefix=d
>>
>> (adjust the prefix to whatever you're looking for)
>>
>> This should get you going in the right direction.
>>
>> Jason
>>
>>
>> On Sep 17, 2013, at 2:20 PM, Utkarsh Sengar <utkarsh2...@gmail.com>
>> wrote:
>>
>> > I have a copyField called allText with type text_general:
>> > https://gist.github.com/utkarsh2012/6167128#file-schema-xml-L68
>> >
>> > I have ~100 documents which have the text: dyson and dc44 or dc41 etc.
>> >
>> > For example:
>> > "title": "Dyson DC44 Animal Digital Slim Cordless Vacuum"
>> > "description": "The DC44 Animal is the new Dyson Digital Slim vacuum
>> > cleaner  the cordless machine that doesn’t lose suction. It has been
>> > engineered for floor to ceiling cleaning. DC44 Animal has a detachable
>> > long-reach wand  which is balanced for floor to ceiling cleaning.   The
>> > motorized floor tool has twice the power of the DC35 floor tool  to drive
>> > the bristles deeper into the carpet pile with more force. It attaches to
>> > the wand or directly to the machine for cleaning awkward spaces. The brush
>> > bar has carbon fiber filaments for removing fine dust from hard floors.
>> > DC44 Animal has a run time of 20 minutes or 8 minutes on Boost mode.
>> > Powered by the Dyson digital motor  DC44 Animal has a fade-free nickel
>> > manganese cobalt battery and Root Cyclone technology for constant powerful
>> > suction.",
>> > UPC: 0879957006362
>> >
>> > The documents are indexed.
>> >
>> > Analysis says it's indexed: http://i.imgur.com/O52ino1.png
>> > But when I search for allText:"dyson dc44" I get no results, response:
>> > http://pastie.org/8334220
>> >
>> > Any suggestions about the problem? I am out of ideas about how to debug
>> > this.
>> >
>> > --
>> > Thanks,
>> > -Utkarsh
>>
>>
>
>
> --
> Thanks,
> -Utkarsh
>



-- 
Thanks,
-Utkarsh
