: I am working with a customer who needs to be able to query various
: account/customer ID fields which may or may not have embedded dashes.
: But they want to be able to search by entering the dashes or not and by
: entering partial values or not.
:
: So we may have an account or customer ID like
:
: 1234-56AB45
:
: And they would like to retrieve this by searching for any of the following:
: 1234-56AB45 (full string match)
: 1234-56 (partial string match)
: 123456AB45 (full string but no dashes)
: 123456 (partial string no dashes)
To answer your last question first...

: So perhaps I will just ask - how would you define a fieldType which
: should ignore special characters like hyphens or underscores (or
: anything non-alphanumeric) and works for full string or partial string
: search?

This is pretty much exactly what the "Word Delimiter Filter" was designed
for, and I encourage you to play with it and its various options and see
what happens...

https://lucene.apache.org/solr/guide/8_5/filter-descriptions.html#word-delimiter-graph-filter

You'll definitely need to enable some "non-default" options (like
catenateNumbers="1") to ensure that you get indexed terms like "123456"
from input "1234-56AB45".

One thing that's not entirely clear from your question & input is how you
define "partial string" ... for example: are you expecting a query of "12"
to match your input document? Because WDF won't help with that.

: But the behavior I see is completely unexpected. Full string match works
: fine on the customer's DEV environment but not in QA (which is running
: the same version of SOLR)

I guarantee you there is some difference between your DEV and QA
environments: either in terms of the documents in the index, or the schema
THAT WAS USED WHEN INDEXING THE DOCS (which might have been changed after
the indexing happened), or the "current" schema being used when the queries
are getting parsed, or the default request options in solrconfig.xml ...
something is absolutely different.

: Partial string match works for some ID fields but not others
: A Partial string match when the user does not enter the dashes just never works

I'm assuming these last 2 comments refer to behavior you see on *both* your
DEV and QA instances?

Depending on your definition of "partial string" (see the question I asked
above), I _think_ the analyzer you have should work -- at least for all the
examples you've provided.
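To make the catenateNumbers point concrete, here is a rough Python approximation (not Lucene's actual implementation) of the terms WordDelimiterGraphFilter would index for the example ID. It assumes generateWordParts, generateNumberParts, splitOnNumerics, catenateNumbers, catenateAll and preserveOriginal are all enabled, and it deliberately ignores case-change splitting, token positions and graph output. The function name is hypothetical.

```python
import re


def approx_word_delimiter_terms(text, catenate_numbers=True, catenate_all=True,
                                preserve_original=True):
    """Rough approximation of the terms WordDelimiterGraphFilter emits.

    Assumes generateWordParts=1, generateNumberParts=1 and splitOnNumerics=1;
    ignores case-change splitting, token positions and graph structure.
    """
    # Split on any non-alphanumeric delimiter and on letter/digit transitions:
    # "1234-56AB45" -> ["1234", "56", "AB", "45"]
    parts = re.findall(r"[A-Za-z]+|[0-9]+", text)
    terms = list(parts)

    if catenate_numbers:
        # Join each maximal run of adjacent number parts ("1234", "56" -> "123456").
        run = []
        for part in parts + [""]:  # "" is a sentinel that flushes the last run
            if part.isdigit():
                run.append(part)
            else:
                if len(run) > 1:
                    terms.append("".join(run))
                run = []

    if catenate_all:
        # Join every part, dropping the delimiters ("123456AB45").
        terms.append("".join(parts))

    if preserve_original:
        # Keep the untouched input as well ("1234-56AB45").
        terms.append(text)

    # Drop duplicates while keeping first-seen order.
    return list(dict.fromkeys(terms))


if __name__ == "__main__":
    terms = approx_word_delimiter_terms("1234-56AB45")
    print(terms)
    # ['1234', '56', 'AB', '45', '123456', '123456AB45', '1234-56AB45']
    for query in ["1234-56AB45", "123456AB45", "123456", "12"]:
        print(query, query in terms)
```

Under these assumptions the full ID, the dash-free ID and "123456" all show up as indexed terms, while "12" does not, which is the caveat about a query like "12" mentioned above. A query such as "1234-56" is itself split at query time into "1234" and "56" (both indexed), so whether it matches depends on how the query parser combines those pieces.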
The missing piece of information is *how* you are querying: what query
parser you are using, what exactly the input looks like; and also the
output: what does "never works" mean? ... does it match 0 docs? does it
match docs you don't expect?

It would help to see the exact request URLs you are trying, with
"debug=true&echoParams=all" added, and the full output of those requests,
so we can see things like the response header (to confirm what default
params might be getting added), the query parser debug info (to double
check how your query is being parsed), and the "explain" info (to see why
any docs that match unexpectedly are matching).

More tips on details that can be useful to include to "help us help you"...

https://cwiki.apache.org/confluence/display/SOLR/UsingMailingLists

-Hoss
http://www.lucidworks.com/
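If it helps, a small Python helper for building such a request might look like this. The base URL, the collection name `customers` and the field `account_id` are hypothetical placeholders; the only parameters taken from the advice above are `q`, `debug=true` and `echoParams=all`. Passing the parser explicitly (for example `defType=edismax`) is optional and simply removes any doubt about which query parser is in play.

```python
from urllib.parse import urlencode


def debug_select_url(base_url, collection, query, **extra_params):
    """Build a Solr /select URL with debug=true and echoParams=all added.

    base_url and collection are placeholders for your own deployment.
    """
    params = {"q": query, "debug": "true", "echoParams": "all"}
    params.update(extra_params)  # e.g. defType="edismax" to pin the parser explicitly
    return f"{base_url.rstrip('/')}/{collection}/select?{urlencode(params)}"


if __name__ == "__main__":
    print(debug_select_url("http://localhost:8983/solr", "customers", "account_id:123456"))
    # http://localhost:8983/solr/customers/select?q=account_id%3A123456&debug=true&echoParams=all
```

Sending the same URL to both DEV and QA and comparing the echoed params and debug sections side by side is one way to narrow down where the two environments differ.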