Re: Only copy string up to certain character symbol?

Erick Erickson Fri, 31 Oct 2014 11:41:34 -0700

In addition to Alexandre's comment, your index chain looks suspect:

  <filter class="solr.EdgeNGramFilterFactory" minGramSize="4" maxGramSize="15" side="front" />
  <charFilter class="solr.PatternReplaceCharFilterFactory" pattern="(\/.+?$)" replacement=""/>


The pattern replace is declared after the n-gram filter, which reads as if it
were meant to run on the grams, not the full input. A

solr.PatternReplaceCharFilterFactory

works on the entire input string before tokenization is even done, so it
belongs at the top of the chain, ahead of the tokenizer.
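
As a rough sketch of that ordering (not a tested config), the analyzers below
list the char filter first, then the tokenizer, then the token filters. The
pattern \s*/.*$ is an assumption and not the pattern from the original post;
it is meant to also drop the whitespace in front of the slash.

```xml
<fieldType name="title_only" class="solr.TextField">
  <analyzer type="index">
    <!-- Char filters see the raw input string before tokenization. -->
    <!-- Assumed pattern: removes optional whitespace, the first "/", and everything after it. -->
    <charFilter class="solr.PatternReplaceCharFilterFactory"
                pattern="\s*/.*$" replacement=""/>
    <!-- KeywordTokenizer emits the remaining text as a single token. -->
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <!-- Front edge n-grams (4 to 15 characters) are built from the trimmed title. -->
    <filter class="solr.EdgeNGramFilterFactory" minGramSize="4" maxGramSize="15" side="front"/>
  </analyzer>
  <analyzer type="query">
    <charFilter class="solr.PatternReplaceCharFilterFactory"
                pattern="\s*/.*$" replacement=""/>
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
```

This only changes the indexed terms. The Analysis page in the admin UI is a
quick way to check what each stage produces for a sample title.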

That said, Alexandre's comment is spot on. If your evidence that the regex is
not being respected is that the document returns the whole input, that is
because stored="true" stores the raw input, which has nothing to do with the
analysis chain. The input is split off for storage before any kind of
analysis processing.
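
If the stored value itself should be trimmed, Alexandre's update processor
suggestion (quoted below) could look roughly like this in solrconfig.xml.
This is a sketch under assumptions: the chain name title-only-chain, the
source field title, and the destination field title_only are hypothetical
names, and the pattern is illustrative.

```xml
<updateRequestProcessorChain name="title-only-chain">
  <!-- Copy the raw title into the (assumed) destination field. -->
  <processor class="solr.CloneFieldUpdateProcessorFactory">
    <str name="source">title</str>
    <str name="dest">title_only</str>
  </processor>
  <!-- Remove everything from the first "/" onward, so the stored and
       indexed values are both trimmed. -->
  <processor class="solr.RegexReplaceProcessorFactory">
    <str name="fieldName">title_only</str>
    <str name="pattern">\s*/.*$</str>
    <str name="replacement"></str>
  </processor>
  <processor class="solr.LogUpdateProcessorFactory"/>
  <processor class="solr.RunUpdateProcessorFactory"/>
</updateRequestProcessorChain>
```

The chain only runs for update requests that select it, for example through
the update.chain request parameter. Since the clone step does the copying, the
copyField for that field would presumably be dropped.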

On Fri, Oct 31, 2014 at 9:33 AM, Alexandre Rafalovitch
<arafa...@gmail.com> wrote:
> copyField can copy only part of the string, but the cutoff is defined by
> character count. If you want to use regular expressions, you may be
> better off doing the copy in the UpdateRequestProcessor chain instead:
> http://www.solr-start.com/info/update-request-processors/#RegexReplaceProcessorFactory
>
> What you are doing (RegEx in the chain) only affects "indexed"
> representation of the text. Not the stored content. I suspect that's
> not what you want.
>
> Regards,
>    Alex.
> Personal: http://www.outerthoughts.com/ and @arafalov
> Solr resources and newsletter: http://www.solr-start.com/ and @solrstart
> Solr popularizers community: https://www.linkedin.com/groups?gid=6713853
>
>
> On 31 October 2014 11:49, hschillig <mouseywi...@live.com> wrote:
>> So I have a title field that commonly looks like this:
>>
>> Personal legal forms simplified : the ultimate guide to personal legal forms
>> / Daniel Sitarz.
>>
>> I made a copyField that is of type "title_only". I want to ONLY copy the
>> text "Personal legal forms simplified : the ultimate guide to personal legal
>> forms".. so everything before the "/" symbol. I have it like this in my
>> schema.xml:
>>
>> <fieldType name="title_only" class="solr.TextField">
>>     <analyzer type="index">
>>         <tokenizer class="solr.KeywordTokenizerFactory"/>
>>         <filter class="solr.LowerCaseFilterFactory"/>
>>         <filter class="solr.EdgeNGramFilterFactory" minGramSize="4"
>> maxGramSize="15" side="front" />
>>         <charFilter class="solr.PatternReplaceCharFilterFactory"
>> pattern="(\/.+?$)" replacement=""/>
>>     </analyzer>
>>     <analyzer type="query">
>>         <tokenizer class="solr.KeywordTokenizerFactory"/>
>>         <filter class="solr.LowerCaseFilterFactory"/>
>>         <charFilter class="solr.PatternReplaceCharFilterFactory"
>> pattern="(\/.+?$)" replacement=""/>
>>     </analyzer>
>> </fieldType>
>>
>> My regex seems to be off, though, as the field still holds the entire value
>> when I reindex and restart Solr. Thanks for any help!
>>
>>
>>
>> --
>> View this message in context: 
>> http://lucene.472066.n3.nabble.com/Only-copy-string-up-to-certain-character-symbol-tp4166857.html
>> Sent from the Solr - User mailing list archive at Nabble.com.
