Re: Question about textTight

Yonik Seeley Tue, 28 Oct 2008 10:55:33 -0700

These query parsing results don't match the config you've posted.
Double-check the type of the "name" field, and make sure you have restarted
Solr since changing schema.xml.
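
If it helps to compare what Solr actually parsed against what the schema
should produce, here is a minimal Python sketch that pulls the debug entries
out of a debugQuery=on response. SAMPLE_RESPONSE is a trimmed copy of the
output quoted below, and debug_entries is a made-up helper name, not part of
Solr or any client library.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the debug section quoted in this thread; a real response
# would come from a Solr query made with debugQuery=on.
SAMPLE_RESPONSE = """<?xml version="1.0" encoding="UTF-8"?>
<response>
<lst name="debug">
<str name="rawquerystring">name:(stm 0810 m_*)</str>
<str name="querystring">name:(stm 0810 m_*)</str>
<str name="parsedquery">+name:stm +name:0810 +name:m_*</str>
<str name="parsedquery_toString">+name:stm +name:0810 +name:m_*</str>
<lst name="explain"/>
</lst>
</response>
"""


def debug_entries(response_xml):
    """Return the <str> entries of the "debug" list as a dict (hypothetical helper)."""
    # Parse from bytes so the encoding declaration in the prolog is honoured.
    root = ET.fromstring(response_xml.encode("utf-8"))
    debug = root.find("lst[@name='debug']")
    if debug is None:
        # The response was not produced with debugQuery=on.
        return {}
    return {el.get("name"): el.text for el in debug.findall("str")}


if __name__ == "__main__":
    entries = debug_entries(SAMPLE_RESPONSE)
    print("raw:   ", entries["rawquerystring"])
    print("parsed:", entries["parsedquery"])
```

Putting rawquerystring and parsedquery side by side makes it easier to see
which terms the field's analyzer changed and which it left alone.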


-Yonik

On Tue, Oct 28, 2008 at 11:25 AM, Stephen Weiss <[EMAIL PROTECTED]> wrote:
> Thanks for the reply.  I've been looking at the debug page... and I really
> don't see any clues there (maybe I don't know how to read it).
>
> <?xml version="1.0" encoding="UTF-8"?>
> <response>
>
> <lst name="responseHeader">
> <int name="status">0</int>
> <int name="QTime">1</int>
> <lst name="params">
>  <str name="wt">standard</str>
>  <str name="rows">10</str>
>
>  <str name="start">0</str>
>  <str name="explainOther"/>
>  <str name="hl.fl"/>
>  <str name="indent">on</str>
>  <str name="q">name:(stm 0810 m_*)</str>
>  <str name="fl">*,score</str>
>  <str name="qt">standard</str>
>
>  <str name="debugQuery">on</str>
>  <str name="version">2.2</str>
> </lst>
> </lst>
> <result name="response" numFound="0" start="0" maxScore="0.0"/>
> <lst name="debug">
> <str name="rawquerystring">name:(stm 0810 m_*)</str>
> <str name="querystring">name:(stm 0810 m_*)</str>
>
> <str name="parsedquery">+name:stm +name:0810 +name:m_*</str>
> <str name="parsedquery_toString">+name:stm +name:0810 +name:m_*</str>
> <lst name="explain"/>
> </lst>
> </response>
>
> I mean, as far as I can tell, that seems right.  I think I'm missing
> something here.
>
> The wiki page is awesome though, thank you.  The catenateAll option does
> seem to do what I thought it did... but should I perhaps just remove any kind
> of filter or analyzer on this field?  It's really not a big deal if someone
> has to get the dashes and underscores exactly right - it's a worse problem
> if they do get them right, but it still doesn't work (usually they copy and
> paste these from an e-mail or something).  Just in general, it's never
> really critical for someone to search by parts of the filename - except for
> searching with a wildcard (that is, stm0810m_* and the like), and it would be
> a lot easier if they didn't have to put spaces where letters change to
> numbers and vice versa.
>
> Thanks again for your input.
>
> --
> Steve
>
> On Oct 28, 2008, at 10:49 AM, Feak, Todd wrote:
>
>> You may want to take a very close look at what the WordDelimiterFilter
>> is doing. I believe the underscore is dropped entirely during indexing
>> AND searching as it's not alphanumeric.
>>
>> Wiki doco here
>> http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters?highlight=(tokenizer)#head-1c9b83870ca7890cd73b193cefed83c283339089
>>
>> The admin analysis page and query debug will help a lot to see what's
>> going on.
>>
>> -Todd
>>
>> -----Original Message-----
>> From: Stephen Weiss [mailto:[EMAIL PROTECTED]
>> Sent: Monday, October 27, 2008 10:32 PM
>> To: solr-user@lucene.apache.org
>> Subject: Question about textTight
>>
>> Hi,
>>
>> So I've been using the textTight field to hold filenames, and I've run
>> into a weird problem.  Basically, people want to search by part of a
>> filename (say, the filename is stm0810m_ws_001ftws and they want to
>> find everything starting with stm0810m_, i.e. stm0810m_*).  I'm hoping
>> someone might have done this before (I bet someone has).
>>
>> Lots of things work - you can search for stm0810m_ws_001ftws and get a
>> result, or (stm 0810 m*), or various other combinations.  What does
>> not work is searching for (stm0810m_*) or (stm 0810 m_*) or anything
>> like that - a problem, because often they don't want things with ma_
>> or mx_, but just m_.  It's almost like underscores just break
>> everything, and escaping them does nothing.
>>
>> Here's the field definition (it should be what came with my solr):
>>
>>    <fieldType name="textTight" class="solr.TextField"
>> positionIncrementGap="100" >
>>      <analyzer>
>>        <tokenizer class="solr.WhitespaceTokenizerFactory"/>
>>        <filter class="solr.SynonymFilterFactory"
>> synonyms="synonyms.txt" ignoreCase="true" expand="false"/>
>>        <filter class="solr.StopFilterFactory" ignoreCase="true"
>> words="stopwords.txt"/>
>>        <filter class="solr.WordDelimiterFilterFactory"
>> generateWordParts="0" generateNumberParts="0" catenateWords="1"
>> catenateNumbers="1" catenateAll="0"/>
>>        <filter class="solr.LowerCaseFilterFactory"/>
>>        <filter class="solr.EnglishPorterFilterFactory"
>> protected="protwords.txt"/>
>>        <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
>>      </analyzer>
>>    </fieldType>
>>
>> and usage:
>>
>>   <field name="name" type="textTight"
>>          indexed="true" stored="true" omitNorms="true"
>>          />
>>
>>
>> Now, I thought textTight would be good because it's the one best
>> suited for SKUs, but I guess I'm wrong.  What should I be using for
>> this?  Would changing any of these "generateWordParts" or
>> "catenateAll" options help?  I can't seem to find any documentation so
>> I'm really not sure what they would do, but reindexing this whole thing
>> will take quite some time so I'd rather know what will actually work
>> before I just start changing things.
>>
>> Thanks so much for any insight!
>>
>> --
>> Steve
>>
>
>
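
Todd's point above about WordDelimiterFilter can be illustrated with a rough
Python approximation. This is not Solr's code: word_delimiter_sketch is a
hypothetical helper that only mimics the documented effect of
generateWordParts="0", generateNumberParts="0", catenateWords="1",
catenateNumbers="1", catenateAll="0" on a plain ASCII token. It assumes
single-part runs are kept as they are, and it ignores case-change splitting
and the later lowercase, stemming, and duplicate-removal filters.

```python
import re
from itertools import groupby


def word_delimiter_sketch(token):
    """Approximate the textTight WordDelimiterFilter settings (assumption, not Solr code).

    Non-alphanumeric characters such as "_" act only as delimiters and are
    dropped. Adjacent letter runs are joined (catenateWords=1) and adjacent
    digit runs are joined (catenateNumbers=1); letters and digits are never
    joined with each other (catenateAll=0).
    """
    # Split into letter runs and digit runs, discarding every other character.
    parts = re.findall(r"[A-Za-z]+|[0-9]+", token)
    # Join consecutive parts of the same kind (all letters or all digits).
    return ["".join(run) for _, run in groupby(parts, key=str.isdigit)]


if __name__ == "__main__":
    for name in ("stm0810m_ws_001ftws", "stm0810m_"):
        print(name, "->", word_delimiter_sketch(name))
```

Under these assumptions no indexed token would contain an underscore, so a
prefix such as m_* has nothing to match, while m* could still match a joined
token like mws.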
