1> I assume that when you "see" this data you're looking at the stored
value, right? That is a verbatim copy of whatever you sent to the field.
My guess is a character-encoding mismatch between the source and
whatever you use to display it.

2> How are you extracting this data? I think there are Tika options
that can (and do) mush fields together.
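
One way to check what the extraction step itself produces, before anything
is stored, is to ask Solr Cell for the extracted text without indexing it.
The sketch below only builds such a request URL. It assumes the extracting
handler is registered at /update/extract and that the core is named
collection1 on localhost:8983; those names and the helper function are
illustrative guesses, not taken from your setup. extractOnly, extractFormat
and resource.name are Solr Cell request parameters.

```python
from urllib.parse import urlencode


def build_extract_only_url(base_url, core, resource_name):
    """Build a Solr Cell URL that returns Tika's output without indexing it.

    base_url and core are assumptions about the local install; adjust them.
    """
    params = {
        "extractOnly": "true",           # return the extraction, do not index
        "extractFormat": "text",         # plain text rather than XHTML
        "resource.name": resource_name,  # file name hint for type detection
        "wt": "json",
    }
    # urlencode escapes the space in the file name as '+'
    return f"{base_url}/{core}/update/extract?{urlencode(params)}"


# Hypothetical host and core; the file name is the one from the question.
url = build_extract_only_url(
    "http://localhost:8983/solr", "collection1", "MARLON BRANDO.rtf"
)
print(url)
```

POSTing the file to that URL (with curl or any HTTP client) should show
whether the stream_* lines are already present in what Tika hands back, or
whether they get added later by the field mapping.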

Best,
Erick



On Thu, Nov 24, 2016 at 7:54 AM, Furkan KAMACI <furkankam...@gmail.com> wrote:
> Hi,
>
> I'm testing Solr 4.9.1 and have indexed documents with it. The content field
> in the schema uses the text_general field type, unmodified from the original.
> I do not copy any fields into content. When I check the data, I see content
> values like:
>
>  " \n \nstream_source_info MARLON BRANDO.rtf   \nstream_content_type
> application/rtf   \nstream_size 13580   \nstream_name MARLON BRANDO.rtf
> \nContent-Type application/rtf   \nresourceName MARLON BRANDO.rtf   \n  \n
> \n  1. Vivien Leigh and Marlon Brando in \"A Streetcar Named Desire\"
> directed by Elia Kazan \n"
>
> My questions:
>
> 1) Is it usual to have those newline characters?
> 2) Is it usual to have file metadata at the beginning of the content (e.g.
> stream_source_info, stream_content_type), or is that related to the tool I
> use to post data to Solr?
>
> Kind Regards,
> Furkan KAMACI
