> Example data:
> 01/23/2011 05:12:34 [Test] a=1; hello_there=50;
> data=[1,5,30%];
>
> I would love to be able to just "grep" the data - ie. if I
> search for "ello", it finds and returns "ello", and if I
> search for "hello_there=5", it would match too.
>
> Here's what I'm using now:
>
>   <fieldType name="text_sy" class="solr.TextField">
>     <analyzer>
>       <tokenizer class="solr.StandardTokenizerFactory"/>
>       <filter class="solr.LowerCaseFilterFactory"/>
>       <filter class="solr.WordDelimiterFilterFactory"
>               generateWordParts="0" generateNumberParts="0"
>               catenateWords="0" catenateNumbers="0" catenateAll="0"
>               splitOnCaseChange="0"/>
>     </analyzer>
>   </fieldType>
>
> The problem with this is that if I search for a substring,
> I don't get anything back. For example, searching for
> "ello" or "*ello*" doesn't return. Any ideas?
>
> http://localhost:8983/solr/select?q=*ello*&start=0&rows=50&hl.maxAnalyzedChars=2147483647&hl.useFastVectorHighlighter=true&hl=true&hl.fl=body&hl.snippets=1&hl.fragsize=400
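
For substring matching, you need NGramFilterFactory in the index-time analyzer:

  <filter class="solr.NGramFilterFactory" minGramSize="1" maxGramSize="15"/>

You may also want to use WhitespaceTokenizerFactory instead of StandardTokenizerFactory, so that a token such as hello_there=50; is kept intact rather than split at punctuation such as "=". The Analysis page in the Solr admin UI shows what each tokenizer and filter does to your text.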
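
Putting the suggestion above together, a field type might look roughly like the sketch below. The name "text_substring" is hypothetical, and splitting the analyzer into separate index and query chains (with no n-gram filter on the query side) is my assumption about a reasonable setup, not something your schema already has.

```xml
<!-- Sketch only: "text_substring" is a made-up name for illustration. -->
<fieldType name="text_substring" class="solr.TextField">
  <!-- Index time: split on whitespace, lowercase, then emit every
       1- to 15-character substring of each token. -->
  <analyzer type="index">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.NGramFilterFactory" minGramSize="1" maxGramSize="15"/>
  </analyzer>
  <!-- Query time (assumption): no n-gram filter, so the typed term is
       looked up as a whole against the indexed grams. -->
  <analyzer type="query">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
```

With a setup like this you would search for the plain term (ello) rather than *ello*. Two caveats: a query term longer than maxGramSize has no matching gram in the index, and existing documents have to be re-indexed after the index-time analyzer changes. A minGramSize of 1 also makes the index considerably larger, so raise it if you never search for very short substrings.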