1> I'm assuming that when you "see" this data you're looking at the stored value, right? That's a verbatim copy of whatever you sent to the field. If it looks wrong, I'm guessing it's a character-encoding mismatch between the source and whatever you use to display it.
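A minimal sketch in Python of the "verbatim copy" point. It assumes the value in the question was copied from a JSON response, where a real newline character is escaped as `\n` and a quote as `\"`. The `café` string is a made-up example of a display-side charset mismatch, not data from this thread.

```python
import json

# Assumption: this fragment was copied from a JSON response, so "\n" is the
# JSON escape for a real newline character that is part of the stored value.
json_fragment = r'" \n \nstream_source_info MARLON BRANDO.rtf \nstream_content_type application/rtf \n"'

# Decoding the JSON string literal gives back the characters actually stored.
stored = json.loads(json_fragment)
print(repr(stored))
print(stored.count("\n"))  # real newline characters in the stored value

# Hypothetical display-side mismatch: UTF-8 bytes read back as Latin-1.
raw = "café".encode("utf-8")
print(raw.decode("utf-8"))    # correct charset
print(raw.decode("latin-1"))  # wrong charset garbles the text
```

The stored bytes never change; only the decoding step on the reading side differs.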
2> How are you extracting this data? I think there are Tika options that can (and do) mush fields together.

Best,
Erick

On Thu, Nov 24, 2016 at 7:54 AM, Furkan KAMACI <furkankam...@gmail.com> wrote:
> Hi,
>
> I'm testing Solr 4.9.1 and have indexed documents with it. The content field
> in the schema uses the text_general field type, unmodified from the original.
> I do not copy any fields to content. When I check the data I see content
> values like:
>
> " \n \nstream_source_info MARLON BRANDO.rtf \nstream_content_type
> application/rtf \nstream_size 13580 \nstream_name MARLON BRANDO.rtf
> \nContent-Type application/rtf \nresourceName MARLON BRANDO.rtf \n \n
> \n 1. Vivien Leigh and Marlon Brando in \"A Streetcar Named Desire\"
> directed by Elia Kazan \n"
>
> My questions:
>
> 1) Is it usual to have these newline characters?
> 2) Is it usual to have file metadata (i.e. stream_source_info,
> stream_content_type) at the beginning of the content, or is that related
> to the tool I use to post data to Solr?
>
> Kind Regards,
> Furkan KAMACI