:
: <field name="text_t" type="text" indexed="true" stored="true"
: multiValued="true" termVectors="true" termPositions="true"
: termOffsets="true"/>
:
: It uses the text field type as its defined in Solr schema. I didn't
: change it.

Which version of Solr? (The schema is just an example, and the field types
in the example schema change between versions as new analysis components
are added and best practices are re-evaluated.)

: The input text is a 6 page UTF-8 text document, the relevant line the
: term seems to be related to. Just a sentence with no specific
: boundaries.

Did you try pasting that text into the analysis page to see exactly what
your "text_t" field does with it at analysis time, as was suggested?

My best hunch is that the "spaces" are not your typical basic "space"
character (hex 20), and maybe the tokenizer you are using doesn't tokenize
on them, but then perhaps something like the word delimiter filter treats
them as non-word characters and chews them up. Without knowing the exact
fieldtype analyzer and the specific Unicode characters used in the text,
that's just a guess.

(Tip: if you use the JSON response writer (wt=json) when looking at the
stored field value, it will help you see exactly what characters were in
the original values by showing you the Unicode escapes.)

-Hoss
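
P.S. If it helps to check the "not a plain space" hunch outside of Solr, here is a minimal Python sketch. It has nothing to do with Solr's own analysis chain: the helper name `unusual_spaces` and the sample string are made up for illustration, and the set of Unicode categories it flags (separators and format characters) is an assumption about what might be hiding in the document.

```python
import json
import unicodedata


def unusual_spaces(text):
    """Return (index, code point, Unicode name) for every separator or
    format character in text other than plain space, tab, CR and LF."""
    hits = []
    for i, ch in enumerate(text):
        if ch in " \t\r\n":
            continue  # ordinary whitespace, not what we are hunting for
        # Zs/Zl/Zp are the space, line and paragraph separators;
        # Cf covers invisible format characters such as zero-width joiners.
        if unicodedata.category(ch) in ("Zs", "Zl", "Zp", "Cf"):
            hits.append((i, "U+%04X" % ord(ch), unicodedata.name(ch, "<unnamed>")))
    return hits


# Hypothetical sample: a no-break space and a thin space that look like
# ordinary spaces when printed.
sample = "foo\u00a0bar\u2009baz qux"

for hit in unusual_spaces(sample):
    print(hit)

# Python's json module escapes non-ASCII characters by default, which makes
# the odd characters visible in much the same way as reading escaped JSON.
print(json.dumps(sample))
```

Run it over the relevant sentence from the source document; anything it reports is a character worth pasting into the analysis page to see how the tokenizer and filters handle it.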