On 9/18/2017 12:45 PM, Markus Jelsma wrote:
> But, can you then explain why Apache Nutch with SolrJ had this problem? It
> seems that by default SolrJ does use XML as transport format. We have always
> used SolrJ which I assumed would default to javabin, but we had this exact
> problem anyway, a
Ah, thanks!
-Original message-
> From:Chris Hostetter
> Sent: Monday 18th September 2017 23:11
> To: solr-user@lucene.apache.org
> Subject: RE: How to remove control characters in stored value at Solr side
>
>
> : But, can you then explain why Apache Nutc
: But, can you then explain why Apache Nutch with SolrJ had this problem?
: It seems that by default SolrJ does use XML as transport format. We have
: always used SolrJ which I assumed would default to javabin, but we had
: this exact problem anyway, and solved it by stripping non-character cod
Subject: RE: How to remove control characters in stored value at Solr side
>
>
> : You cannot do this in Solr; you cannot even send non-character code
> : points in the first place. For Apache Nutch we solved the problem by
>
> Strictly speaking: this is false. You *can* send co
: You cannot do this in Solr; you cannot even send non-character code
: points in the first place. For Apache Nutch we solved the problem by
Strictly speaking: this is false. You *can* send control characters to Solr
as field values -- assuming your transport format allows it.
Example: using j
looks as though the problem is in parsing some malformed XML, based on
what I'm seeing:
...
Caused by: com.ctc.wstx.exc.WstxUnexpectedCharException: Illegal character ((CTRL-CHAR, code 11))
... (char #11 is a vertical tab).
This should be fixed outside Solr, but if that is not practical, and yo
Thanks for the information. Here is the full stack trace. I thought about
handling it on the client side, but the client apps are not under my control
and I don't have access to them.
org.apache.solr.common.SolrException: Illegal character ((CTRL-CHAR, code 11))
at [row,col {unknown-source}]: [1,413]
at
>
>
>
> -Original message-
> > From:Arnold Bronley
> > Sent: Thursday 14th September 2017 19:46
> > To: solr-user@lucene.apache.org
> > Subject: How to remove control characters in stored value at Solr side
> >
> > I know I can apply PatternRepl
> Sent: Thursday 14th September 2017 19:46
> To: solr-user@lucene.apache.org
> Subject: How to remove control characters in stored value at Solr side
>
> I know I can apply PatternReplaceFilterFactory to remove control characters
> from the indexed value. However, is it possible to do simila
Sounds as though an update request processor will do that, and also
eliminate the need to use the PatternReplaceFilterFactory downstream.
Take a look at the documentation in
https://lucene.apache.org/solr/guide/6_6/update-request-processors.html.
I'm thinking that the RegexReplaceProcessorFactory
I know I can apply PatternReplaceFilterFactory to remove control characters
from the indexed value. However, is it possible to do something similar for
the stored value? Because some indexing requests include control characters,
Solr throws an Illegal Character exception.
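
For readers who can change the sending side, here is a minimal sketch of
stripping the offending characters from a field value before it goes into an
XML update request. It is not taken from Nutch, Solr, or SolrJ: the class name
ControlCharStripper and the method strip are hypothetical, and the regular
expression covers only the C0 control characters that XML 1.0 rejects
(everything below 0x20 except tab, line feed, and carriage return). It does
not handle other illegal code points such as U+FFFE or unpaired surrogates.

```java
import java.util.regex.Pattern;

// Hypothetical helper for illustration; not part of Solr, SolrJ, or Nutch.
public class ControlCharStripper {

    // C0 control characters that XML 1.0 does not allow:
    // 0x00-0x08, 0x0B (vertical tab), 0x0C (form feed), 0x0E-0x1F.
    // Tab (0x09), line feed (0x0A), and carriage return (0x0D) are kept.
    private static final Pattern ILLEGAL_XML_CONTROL =
            Pattern.compile("[\\x00-\\x08\\x0B\\x0C\\x0E-\\x1F]");

    // Returns the value with the illegal control characters removed.
    public static String strip(String value) {
        if (value == null) {
            return null;
        }
        return ILLEGAL_XML_CONTROL.matcher(value).replaceAll("");
    }

    public static void main(String[] args) {
        // (char) 11 is the vertical tab reported in the stack trace.
        String dirty = "line one" + (char) 11 + "line two";
        System.out.println(strip(dirty)); // prints "line oneline two"
    }
}
```

The same character class could in principle be used as the pattern for a
server-side regex replacement, but whether that runs early enough to help
depends on where the request fails, so treat it as an assumption to verify.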