match string fields with embedded hyphens

Teresa McMains Fri, 03 Apr 2020 12:41:07 -0700

Forgive me if this is unclear; I am very new here.

I am working with a customer who needs to query various account and customer
ID fields whose values may or may not contain embedded dashes. They want to be
able to search either with or without the dashes, and with either the full
value or only part of it.


So we may have an account or customer ID like

1234-56AB45

And they would like to retrieve this by searching for any of the following:
1234-56AB45   (full string match)
1234-56       (partial string match)
123456AB45    (full string, no dashes)
123456        (partial string, no dashes)

I've defined this field type in schema.xml as:


<!-- String replace field for account number searches -->
<fieldType name="TrimmedString" class="solr.TextField" omitNorms="true">
  <analyzer>
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <!-- Normalizes token text to upper case -->
    <filter class="solr.UpperCaseFilterFactory"/>
    <!-- Removes anything that isn't a letter or digit -->
    <filter class="solr.PatternReplaceFilterFactory" pattern="[^A-Za-z0-9]" replacement="" replace="all"/>
  </analyzer>
</fieldType>
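
To reason about this chain, it may help to trace what it does to each input. The following is a rough Python emulation, not Solr itself. It assumes that the single <analyzer> (no type attribute) is applied at both index and query time, and that Java's regex and upper-casing behave like Python's for plain ASCII IDs. The name `analyze` is hypothetical.

```python
import re
from typing import List

# Rough, hypothetical emulation of the TrimmedString analyzer chain (not Solr code).
NON_ALNUM = re.compile(r"[^A-Za-z0-9]")


def analyze(value: str) -> List[str]:
    """Return the tokens the chain would produce for `value` (assumed behavior)."""
    # KeywordTokenizerFactory: the entire input is emitted as one token.
    token = value
    # UpperCaseFilterFactory: upper-case the token text.
    token = token.upper()
    # PatternReplaceFilterFactory with replace="all": delete every non-alphanumeric character.
    token = NON_ALNUM.sub("", token)
    return [token]


if __name__ == "__main__":
    print("indexed:", analyze("1234-56AB45"))
    for query in ["1234-56AB45", "1234-56", "123456AB45", "123456"]:
        print(f"query {query!r}: {analyze(query)}")
```

Under those assumptions, each of the four search inputs reduces to a single token, either 123456AB45 (the same as the indexed token) or 123456.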

But the behavior I see is completely unexpected:
Full string match works fine on the customer's DEV environment but not in QA (which is running the same version of Solr).
Partial string match works for some ID fields but not others.
Partial string match never works when the user does not enter the dashes.

I don't even know where to begin; the behavior is not consistent enough to
give me a sense of where the problem lies.

So perhaps I will just ask: how would you define a fieldType that ignores
special characters such as hyphens and underscores (or anything
non-alphanumeric) and works for both full and partial string searches?
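
One possible reading of the partial-match results, assuming a partial search reaches Solr as a plain term query rather than as a prefix or wildcard query: because KeywordTokenizerFactory indexes the whole ID as one token, a term query matches only when the entire normalized value is equal. The Python sketch that follows contrasts that with a prefix comparison. `normalize`, `term_match` and `prefix_match` are hypothetical helpers, not Solr APIs. Whether Solr applies the same normalization to prefix or wildcard terms depends on its multi-term analysis, which this sketch does not model.

```python
import re

NON_ALNUM = re.compile(r"[^A-Za-z0-9]")


def normalize(value: str) -> str:
    # Same steps as the TrimmedString chain: upper-case, then drop non-alphanumerics.
    return NON_ALNUM.sub("", value.upper())


def term_match(indexed: str, query: str) -> bool:
    # Single-token field + plain term query: the whole normalized values must be equal.
    return normalize(indexed) == normalize(query)


def prefix_match(indexed: str, query: str) -> bool:
    # Prefix-style comparison: the normalized query only has to begin the indexed token.
    return normalize(indexed).startswith(normalize(query))


if __name__ == "__main__":
    stored = "1234-56AB45"
    for query in ["1234-56AB45", "1234-56", "123456AB45", "123456"]:
        print(f"{query:12} term={term_match(stored, query)} prefix={prefix_match(stored, query)}")
```

With the stored ID 1234-56AB45, `term_match` is true only for the two full-string inputs, while `prefix_match` is true for all four.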

Thank you.
