Re: How to remove control characters in stored value at Solr side

2017-09-19 Thread Shawn Heisey
On 9/18/2017 12:45 PM, Markus Jelsma wrote: > But, can you then explain why Apache Nutch with SolrJ had this problem? It > seems that by default SolrJ does use XML as transport format. We have always > used SolrJ which i assumed would default to javabin, but we had this exact > problem anyway, a

RE: How to remove control characters in stored value at Solr side

2017-09-19 Thread Markus Jelsma
Ah, thanks! -Original message- > From:Chris Hostetter > Sent: Monday 18th September 2017 23:11 > To: solr-user@lucene.apache.org > Subject: RE: How to remove control characters in stored value at Solr side > > > : But, can you then explain why Apache Nutc

RE: How to remove control characters in stored value at Solr side

2017-09-18 Thread Chris Hostetter
: But, can you then explain why Apache Nutch with SolrJ had this problem? : It seems that by default SolrJ does use XML as transport format. We have : always used SolrJ which i assumed would default to javabin, but we had : this exact problem anyway, and solved it by stripping non-character cod

RE: How to remove control characters in stored value at Solr side

2017-09-18 Thread Markus Jelsma
Subject: RE: How to remove control characters in stored value at Solr side > > > : You can not do this in Solr, you cannot even send non-character code > : points in the first place. For Apache Nutch we solved the problem by > > Strictly speaking: this is false. You *can* send co

RE: How to remove control characters in stored value at Solr side

2017-09-18 Thread Chris Hostetter
: You can not do this in Solr, you cannot even send non-character code : points in the first place. For Apache Nutch we solved the problem by Strictly speaking: this is false. You *can* send control characters to Solr as field values -- assuming your transport format allows it. Example: using j

Re: How to remove control characters in stored value at Solr side

2017-09-14 Thread simon
Looks as though the problem is in parsing some malformed XML, based on what I'm seeing: ... Caused by: com.ctc.wstx.exc.WstxUnexpectedCharException: Illegal character ((CTRL-CHAR, code 11)) ... (char #11 is a vertical tab). This should be fixed outside Solr, but if that is not practical, and yo
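
For readers who do control the client, a minimal sketch of what "fixing it outside Solr" could look like: drop every code point that the XML 1.0 `Char` production does not allow before the value goes into an XML update request. The class and method names (`XmlCharSanitizer`, `strip`) are hypothetical and are not taken from Nutch, SolrJ, or this thread.

```java
/**
 * Hypothetical helper (not part of Solr, SolrJ, or Nutch): removes code points
 * that are not legal in an XML 1.0 document, such as vertical tab (code 11).
 */
final class XmlCharSanitizer {
    private XmlCharSanitizer() {}

    /** True if the code point matches the XML 1.0 "Char" production. */
    static boolean isLegalXml10(int cp) {
        return cp == 0x9 || cp == 0xA || cp == 0xD      // tab, line feed, carriage return
            || (cp >= 0x20 && cp <= 0xD7FF)
            || (cp >= 0xE000 && cp <= 0xFFFD)
            || (cp >= 0x10000 && cp <= 0x10FFFF);
    }

    /** Returns a copy of value with all illegal code points dropped. */
    static String strip(String value) {
        if (value == null) {
            return null;
        }
        StringBuilder out = new StringBuilder(value.length());
        // codePoints() keeps valid surrogate pairs together; unpaired surrogates
        // come through as values in 0xD800-0xDFFF and are filtered out here.
        value.codePoints()
             .filter(XmlCharSanitizer::isLegalXml10)
             .forEach(out::appendCodePoint);
        return out.toString();
    }
}
```

Calling `XmlCharSanitizer.strip(fieldValue)` on each field value before adding it to the document keeps tabs and line breaks and silently discards everything else outside the legal ranges.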

Re: How to remove control characters in stored value at Solr side

2017-09-14 Thread arnoldbronley
Thanks for information. Here is the full stack trace. I thought to handle it from client side but client apps are not under my control and I don't have access to them. org.apache.solr.common.SolrException: Illegal character ((CTRL-CHAR, code 11)) at [row,col {unknown-source}]: [1,413] at

Re: How to remove control characters in stored value at Solr side

2017-09-14 Thread simon
> > > > -Original message- > > From:Arnold Bronley > > Sent: Thursday 14th September 2017 19:46 > > To: solr-user@lucene.apache.org > > Subject: How to remove control characters in stored value at Solr side > > > > I know I can apply PatternRepl

RE: How to remove control characters in stored value at Solr side

2017-09-14 Thread Markus Jelsma
t: Thursday 14th September 2017 19:46 > To: solr-user@lucene.apache.org > Subject: How to remove control characters in stored value at Solr side > > I know I can apply PatternReplaceFilterFactory to remove control characters > from indexed value. However, is it possible to do simila

Re: How to remove control characters in stored value at Solr side

2017-09-14 Thread simon
Sounds as though an update request processor will do that, and also eliminate the need to use the PatternReplaceFilterFactory downstream. Take a look at the documentation in https://lucene.apache.org/solr/guide/6_6/update-request-processors.html. I'm thinking that the RegexReplaceProcessorFactory
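
A regex-based update processor needs a pattern and a replacement. As a rough sketch, the pattern below can be tried out in plain Java first, since such a processor would be expected to use Java regex syntax. The class name and the exact character ranges are assumptions for illustration, not something given in this thread. Note also that an update processor only sees the document after the request body has been parsed, so it would not help if the XML parser itself rejects the request.

```java
import java.util.regex.Pattern;

/** Hypothetical demo class for checking a control-character pattern. */
final class ControlCharRegexDemo {
    private ControlCharRegexDemo() {}

    // C0 control characters except tab (0x09), line feed (0x0A) and
    // carriage return (0x0D); 0x0B is the vertical tab from the stack trace.
    static final Pattern CONTROL_CHARS =
        Pattern.compile("[\\x00-\\x08\\x0B\\x0C\\x0E-\\x1F]");

    /** Replaces every match with the empty string. */
    static String clean(String value) {
        return CONTROL_CHARS.matcher(value).replaceAll("");
    }
}
```

The same character class, written without the Java string escaping (`[\x00-\x08\x0B\x0C\x0E-\x1F]`), is what would go into the processor's pattern setting, with an empty replacement.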

How to remove control characters in stored value at Solr side

2017-09-14 Thread Arnold Bronley
I know I can apply PatternReplaceFilterFactory to remove control characters from the indexed value. However, is it possible to do a similar thing for the stored value? Because some control characters are included in the indexing request, Solr throws an Illegal Character Exception.