Data storage, and textual analysis

Gora Mohanty Tue, 19 Jan 2010 10:41:44 -0800

Hi,

Another simple query. I have set up a field to hold phonetic
equivalents, with the relevant part of schema.xml looking like:
<analyzer>
 <tokenizer class="solr.WhitespaceTokenizerFactory"/>
 <filter class="solr.WordDelimiterFilterFactory"
 generateWordParts="1" generateNumberParts="0" catenateWords="1"
 catenateNumbers="0" catenateAll="0"/>
 <filter class="solr.LowerCaseFilterFactory"/>
 <filter class="com.srijan.search.solr.analysis.AspellFilterFactory"/>
</analyzer>


Here, com.srijan.search.solr.analysis.AspellFilterFactory is
a custom filter that provides a phonetic soundslike equivalent for
Indian languages transliterated into English. However, that is
irrelevant here, as the issue below holds even if I use the standard
solr.DoubleMetaphoneFilterFactory.
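
For reference, the same chain with the stock phonetic filter substituted
for the custom one would look roughly like the fragment below. This is
only a sketch: the inject="false" attribute (which makes the filter emit
the phonetic code in place of the original token, rather than alongside
it) is an assumption chosen for illustration, not a setting taken from
the configuration above.

```xml
<analyzer>
 <tokenizer class="solr.WhitespaceTokenizerFactory"/>
 <filter class="solr.WordDelimiterFilterFactory"
  generateWordParts="1" generateNumberParts="0" catenateWords="1"
  catenateNumbers="0" catenateAll="0"/>
 <filter class="solr.LowerCaseFilterFactory"/>
 <!-- Assumed setting: inject="false" keeps only the phonetic code -->
 <filter class="solr.DoubleMetaphoneFilterFactory" inject="false"/>
</analyzer>
```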

I have a data source where all text is upper-case. From various
Solr-related discussions found through Google, I would have thought
that fields of this type would be stored in their lower-case,
soundslike form. Instead, the data (as seen through the Solr admin
interface, or through a front-end search) seems to be stored as is,
in upper case.

The Solr admin analysis view does show the index and query
conversions as I would expect. Also, phonetic matches and matches
with lower-case input both work properly. I am just curious as to
how this works, given that the values I see appear unchanged.

Regards,
Gora
