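I'm relatively new to Solr, and I'm having trouble indexing some fields coming out of the Solr Cell extraction handler.

First question: what does the extraction handler do with text? For example, if I throw it an Excel file, what am I going to get back as input to Solr processing? Is anything done to the text at all?

Second question: I've got fields like AA_12345 (two chars, underscore, several digits) and AA.44.A3 (two chars, period, two digits, period, char, digit). I'd like these to match in a variety of ways. For example, the first should match AA, AA12345, AA_12345, and 12345. The second should match AA.44.A3, AA44A3, 44, A3, etc., all in both upper and lower case, of course. What's the best way to filter and index?

I've tried the following workflow:

1) whitespace tokenizer
2) trim filter
3) word delimiter filter, with generate number parts, generate word parts, catenate numbers, catenate words, split on case change, and preserve original all set
4) lowercase filter

But I get very mixed results. AA_12345 doesn't work in any form, and AA.44.A3 is mixed: the whole thing matches, and "A3" matches, but "AA" does not. I've also got a simple string ("abcde") that goes into the same field type, and that string doesn't work at all.

Any help would be appreciated.

Thanks!
-harry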
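
P.S. To make the target behaviour concrete, here is a small Python sketch of the tokens I'd want indexed for each value. It only illustrates the goal and is not what Solr's word delimiter filter actually does. The function name desired_tokens is made up for this post, and it assumes that splitting happens only on non-alphanumeric characters (so "A3" stays in one piece).

```python
import re


def desired_tokens(value):
    """Return the set of lowercased tokens a value should be findable by.

    Hypothetical helper for illustration only, not Solr behaviour.
    """
    # Split on any run of non-alphanumeric characters (underscore, period, ...).
    parts = [p for p in re.split(r"[^A-Za-z0-9]+", value) if p]
    tokens = set(parts)            # individual parts, e.g. "AA", "12345"
    tokens.add("".join(parts))     # catenated form, e.g. "AA12345"
    tokens.add(value)              # the original value, e.g. "AA_12345"
    # Lowercase everything so matching is case-insensitive.
    return {t.lower() for t in tokens}


if __name__ == "__main__":
    for sample in ("AA_12345", "AA.44.A3", "abcde"):
        print(sample, sorted(desired_tokens(sample)))
```

A query would go through the same lowercasing, so "aa", "AA" and "Aa" would all hit the first value.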
First question - what does the extraction handler do with text? For example, if i throw it an excel file, what am I going to get back as input to solr processing? is anything done at all? Second question. I've got fields like AA_12345 - two chars, underscore, several digits, and AA.44.A3 - 2 chars, period, 2 numbers, period, char, number. I'd like these to match in a variety of different ways. For example, the first should match AA, AA12345, AA_12345, and 12345. The second should match AA.44.A3, AA44A3, 44, A3, etc. all both in upper and lower case, of course. What's the best way to filter and index? I've tried the following workflow 1) whitespace tokenizer 2) trim filter 3) word delimiter filter, with generate number parts, generate word parts, catenate numbesr, catenate words, split on case change, and prserve originals all set. 4) lowercase filter but I get very mixed results. the AA_12345 doesn't work in any form, and theAA.44.A3 is mixed: the whole thing matches, and "A3" matches, but "AA" does not. I've also got a simple string ("abcde") that goes into the same field type, and that string doesn't work at all? Any help would be appreciated. thanks! -harry