Hi Erick,

1) I am looking at the stored data via the Solr Admin UI. I send a query and check what is in the content field.
2) I can debug the Tika settings if you think it is not the desired behaviour for such metadata fields to be combined into the content field.

PS: Is there any solution to get rid of it other than using PatternCaptureGroupFilterFactory?

Kind Regards,
Furkan KAMACI

On Thu, Nov 24, 2016 at 6:31 PM, Erick Erickson <erickerick...@gmail.com> wrote:

> 1> I'm assuming when you "see" this data you're looking at the stored
> data, right? It's a verbatim copy of whatever you sent to the field.
> I'm guessing it's a character-encoding mismatch between the source and
> what you use to display.
>
> 2> How are you extracting this data? There are Tika options I think
> that can/do mush fields together.
>
> Best,
> Erick
>
> On Thu, Nov 24, 2016 at 7:54 AM, Furkan KAMACI <furkankam...@gmail.com> wrote:
> > Hi,
> >
> > I'm testing Solr 4.9.1. I've indexed documents via it. The content field in
> > the schema has the text_general field type, which is not modified from the
> > original. I do not copy any fields to content. When I check the data I see
> > content values like:
> >
> > " \n \nstream_source_info MARLON BRANDO.rtf \nstream_content_type
> > application/rtf \nstream_size 13580 \nstream_name MARLON BRANDO.rtf
> > \nContent-Type application/rtf \nresourceName MARLON BRANDO.rtf \n \n
> > \n 1. Vivien Leigh and Marlon Brando in \"A Streetcar Named Desire\"
> > directed by Elia Kazan \n"
> >
> > My questions:
> >
> > 1) Is it usual to have those newline characters?
> > 2) Is it usual to have file metadata at the beginning of the content (i.e.
> > stream_source_info, stream_content_type), or is it related to the tool that
> > I use to post data to Solr?
> >
> > Kind Regards,
> > Furkan KAMACI
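
On the PS question: one possible client-side workaround is to strip the leading metadata lines after fetching the stored value. This is not a Solr or Tika setting and was not proposed in this thread. Below is a minimal Python sketch. It assumes the metadata always appears as leading lines that begin with one of the keys seen in the sample above. The names `METADATA_KEYS` and `strip_leading_metadata` are hypothetical.

```python
# Keys observed in the sample content from this thread.
# Assumption: other documents may carry additional keys; extend as needed.
METADATA_KEYS = (
    "stream_source_info",
    "stream_content_type",
    "stream_size",
    "stream_name",
    "Content-Type",
    "resourceName",
)


def strip_leading_metadata(content: str) -> str:
    """Drop leading blank lines and lines whose first token is a known key."""
    lines = content.split("\n")
    start = 0
    for line in lines:
        stripped = line.strip()
        first_token = stripped.split(" ", 1)[0]
        if stripped == "" or first_token in METADATA_KEYS:
            start += 1  # still inside the metadata/blank prefix
        else:
            break  # first line of real body text
    return "\n".join(lines[start:]).strip()


if __name__ == "__main__":
    sample = (
        " \n \nstream_source_info MARLON BRANDO.rtf \n"
        "stream_content_type application/rtf \nstream_size 13580 \n"
        "stream_name MARLON BRANDO.rtf \nContent-Type application/rtf \n"
        "resourceName MARLON BRANDO.rtf \n \n \n"
        ' 1. Vivien Leigh and Marlon Brando in "A Streetcar Named Desire" '
        "directed by Elia Kazan \n"
    )
    print(strip_leading_metadata(sample))
```

This only cleans the value on the client side. The stored and indexed field in Solr still contains the metadata, so it does not replace fixing the extraction settings.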