PS: \n characters are not shown in the browser, but they break how the highlighter works.
\n characters also count toward the fragsize.
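To make the fragsize point concrete, here is a deliberately naive model: cut the stored text into fixed windows of `fragsize` characters and compare the raw value with a whitespace-collapsed copy. Solr's highlighters do not literally work this way (they deal with analyzed tokens rather than raw character windows), so treat this only as an illustration that every newline and space uses up part of the character budget. The sample string is adapted from the snippet quoted later in this thread, with the non-ASCII character left out, and `naive_fragments` is a hypothetical helper.

```python
from typing import List


def naive_fragments(text: str, fragsize: int) -> List[str]:
    """Split text into consecutive windows of at most `fragsize` characters.

    Simplified model for illustration only, not Solr's fragmenter.
    """
    return [text[i:i + fragsize] for i in range(0, len(text), fragsize)]


raw = "CANADA 1 \n \n \n \n Place"
# Collapse every whitespace run (spaces and newlines) into a single space.
collapsed = " ".join(raw.split())

for label, text in (("raw", raw), ("collapsed", collapsed)):
    print(label, len(text), naive_fragments(text, 10))
```

With the same 10-character budget, the raw value needs one more fragment than the collapsed one purely because of the newline-and-space run.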
On Sat, Nov 26, 2016 at 9:47 PM, Furkan KAMACI wrote:
Hi Erick,
I resolved my metadata problem by configuring solrconfig.xml. However, even when
I post data with post.sh, I see content like:
CANADA �1 \n \n \n \n Place
I have newline characters shown as \n and some non-ASCII characters. As far as I
understand, it is usual to have such characters because t
Not sure. What have you tried?
For production situations or when you want to take total control of
the indexing process, I strongly recommend that you put the Tika
parsing on the _client_.
Here's a writeup on this topic:
https://lucidworks.com/blog/2012/02/14/indexing-with-solrj/
Best,
Erick
Hi Erick,
When I check the *Solr* documentation, I see that [1]:
*In addition to Tika's metadata, Solr adds the following metadata (defined
in ExtractingMetadataConstants):*
*"stream_name" - The name of the ContentStream as uploaded to Solr.
Depending on how the file is uploaded, this may or may
about PatternCaptureGroupFilterFactory. This isn't going to help. The
data you see when you return stored data is the value from _before_ any
analysis, so the PatternCaptureGroupFilterFactory won't be applied to it.
You could do this in a ScriptUpdateProcessorFactory. Or just don't worry
about it and have the real app deal with it.
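If you let the application deal with it (or clean the text on the client before sending it to Solr), the cleanup can be quite small. The sketch below is a hypothetical helper, not part of Solr, and it assumes that the text is already a decoded string, that collapsing all whitespace runs is acceptable for your data, and that NFKC normalization is wanted. Keep in mind that a U+FFFD character only shows that a decoding step already failed upstream, so stripping it hides the symptom rather than recovering the original character.

```python
import re
import unicodedata

# U+FFFD is what decoders emit for bytes they could not decode.
REPLACEMENT_CHAR = "\ufffd"
WHITESPACE_RUN = re.compile(r"\s+")


def clean_content(text: str) -> str:
    """Hypothetical client-side cleanup for extracted text."""
    # Fold compatibility forms, e.g. a non-breaking space becomes a plain space.
    text = unicodedata.normalize("NFKC", text)
    # Drop replacement characters left behind by an earlier decoding failure.
    text = text.replace(REPLACEMENT_CHAR, "")
    # Collapse runs of spaces, tabs and newlines into one space, then trim.
    return WHITESPACE_RUN.sub(" ", text).strip()


print(clean_content("CANADA \ufffd1 \n \n \n \n Place"))
```

Doing this on the client keeps the stored value and the highlighted fragments consistent, because the newline runs never reach the index.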
Hi Erick,
1) I am looking at the stored data via the Solr Admin UI. I send the query
and check what is in the content field.
2) I can debug the Tika settings if you think that having such metadata
fields combined into the content field is not the desired behaviour.
*PS:* Is there any solution to get rid of i
1> I'm assuming when you "see" this data you're looking at the stored
data, right? It's a verbatim copy of whatever you sent to the field.
I'm guessing it's a character-encoding mismatch between the source and
what you use to display.
2> How are you extracting this data? There are Tika options I t
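A quick way to test the encoding-mismatch guess: bytes written in one encoding and decoded as another either come out as the wrong characters or as U+FFFD, which browsers display as �. The input below is a made-up example (a © sign stored as Latin-1), not the actual document from this thread.

```python
# Hypothetical source text, stored the way a Latin-1 (ISO-8859-1) file would hold it.
original = "CANADA \u00a91"            # "CANADA ©1"
latin1_bytes = original.encode("latin-1")  # the © becomes the single byte 0xA9

# 0xA9 is not a valid UTF-8 start byte, so decoding as UTF-8 with
# errors="replace" substitutes U+FFFD REPLACEMENT CHARACTER.
wrong = latin1_bytes.decode("utf-8", errors="replace")

# Decoding with the encoding the bytes were actually written in round-trips.
right = latin1_bytes.decode("latin-1")

print(repr(wrong))
print(repr(right))
```

If the same bytes look correct once decoded with the source's real encoding, the problem lies in how the file is read or labelled on the way in, not in Solr's storage.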