Hello Lawrence,
Which type did you use in the solr schema for your fields?
Cheers,
Diego
On Tue, Aug 29, 2017 at 5:34 PM, Elitzer, Lawrence <
lelit...@lgsinnovations.com> wrote:
> Hello!
>
>
>
> It seems I can correctly import (with DIH) UTF-8 characters such as J but
Hello!
It seems I can correctly import (with DIH) UTF-8 characters such as J but I
am unable to search on the fields containing the UTF-8 data. I have tried
from the Solr admin backend to send just a J and even URL encode it in the q
parameter I am specifying. How would I go about searching
y 28th February 2017 17:27
> To: solr-user@lucene.apache.org
> Subject: Invalid UTF-8 character 0x at char #17373581, byte #17539047
>
> Hello everyone,
>
> We use Solr (with Adobe Coldfusion) to index circa 60,000 pdfs, however the
> daily refresh has been failing wit
Hello everyone,
We use Solr (with Adobe ColdFusion) to index circa 60,000 PDFs; however, the
daily refresh has been failing with this error "Invalid UTF-8 character
0x at char #17373581, byte #17539047" [truncated - full error
message is posted below]
-
- Can Solr be con
On Wed, Apr 22, 2015 at 4:17 PM, Yonik Seeley wrote:
> On Wed, Apr 22, 2015 at 11:00 AM, didier deshommes
> wrote:
> > curl "
> >
> http://localhost:8983/solr/gettingstarted/select?wt=json&indent=true&q=foundation
> "
> > -H "Content-type:application/json"
>
> You're telling Solr the body encodi
Hi, our developers solved the problem. We are using Solarium and we had to
configure Solarium to use selects with content-type:
application/x-www-form-urlencoded
Pavel
--
View this message in context:
http://lucene.472066.n3.nabble.com/Bad-contentType-for-search-handler-text-xml-charset-UTF-8
On Wed, Apr 22, 2015 at 11:00 AM, didier deshommes wrote:
> curl "
> http://localhost:8983/solr/gettingstarted/select?wt=json&indent=true&q=foundation";
> -H "Content-type:application/json"
You're telling Solr the body encoding is JSON, but then you don't send any body.
We could catch that error
application/xml.
>
> wunder
> Walter Underwood
> wun...@wunderwood.org
> http://observer.wunderwood.org/ (my blog)
>
>
> On Apr 22, 2015, at 3:01 AM, bengates wrote:
>
> > Looks like Solarium hardcodes a default header "Content-Type: text/xml;
> > charse
ult header "Content-Type: text/xml;
> charset=utf-8" if none provided.
> Removing it solves the problem.
>
> It seems that Solr 5.1 doesn't support this content-type.
>
>
>
> --
> View this message in context:
> http://lucene.472066.n3.nabble.com/Bad-con
Looks like Solarium hardcodes a default header "Content-Type: text/xml;
charset=utf-8" if none provided.
Removing it solves the problem.
It seems that Solr 5.1 doesn't support this content-type.
--
View this message in context:
http://lucene.472066.n3.nabble.com/Bad-contentType-for-search-handler-text-xml-charset-UTF-8-tp4200314p4201564.html
Sent from the Solr - User mailing list archive at Nabble.com.
Off the cuff, it sounds like you are making a POST request to the
SearchHandler (ie: /search or /query) and the Content-Type you are sending
is "text/xml; charset=UTF-8"
In the past SearchHandler might have ignored that Content-Type, but now
that structured queries can be sent as
--
> View this message in context:
> http://lucene.472066.n3.nabble.com/Bad-contentType-for-search-handler-text-xml-charset-UTF-8-tp4200314.html
> Sent from the Solr - User mailing list archive at Nabble.com.
: I am indexing Solr 4.9.0 using the /update request handler and am getting
: errors from Tika - Illegal IOException from
: org.apache.tika.parser.xml.DcXMLParser@74ce3bea which is caused by
: MalformedByteSequenceException: Invalid byte 1 of 1-byte UTF-8 sequence. I
FWIW: that error appears to
Hello!
I am indexing Solr 4.9.0 using the /update request handler and am getting
errors from Tika - Illegal IOException from
org.apache.tika.parser.xml.DcXMLParser@74ce3bea which is caused by
MalformedByteSequenceException: Invalid byte 1 of 1-byte UTF-8 sequence. I
believe that this is the
node kicks off an indexing and tries to replicate all the updates using
the UpdateHandler.
What we get instead is an error around a wrong UTF-8 encoding from the
leader trying to call the /update endpoint on the replica:
request:
http://10.40.0.25:9765/skus/update?update.chain=custom&_vers
Hi,
we are having problems with an installation of SolrCloud where a leader
node kicks off an indexing and tries to replicate all the updates using the
UpdateHandler.
What we get instead is an error around a wrong UTF-8 encoding from the
leader trying to call the /update endpoint on the replica
2013/8/6 Raymond Wiker
> Ok, let me rephrase that slightly: does your database extraction include
> BLOBs or CLOBs that are actually complete documents, that might be UTF-8
> encoded text?
>
> It definitely does, each entry I have in PostgreSQL has a field of type
"text
Ok, let me rephrase that slightly: does your database extraction include
BLOBs or CLOBs that are actually complete documents, that might be UTF-8
encoded text?
From the stack trace in your second post, it seems that the error occurs
while parsing an XML file uploaded via the UpdateRequestHand
No, the content has no XML tags included (hope I understood what you were
asking here).
Federico
2013/8/5 Raymond Wiker
> On Aug 5, 2013, at 20:12 , Federico Chiacchiaretta <
> federico.c...@gmail.com> wrote:
> > Hi Raymond,
> > I agree with you, 0xfffe is a special character, that is why I wa
On Aug 5, 2013, at 20:12 , Federico Chiacchiaretta
wrote:
> Hi Raymond,
> I agree with you, 0xfffe is a special character, that is why I was asking
> how it's handled in solr.
> In my document, 0xfffe does not appear at the beginning, it's in the
> content.
>
> Just an update about testing I'm d
The problem is that even though the Unicode code points \uFFFF and \uFFFE are
valid UTF-8 characters, they will not be parsed by standards-conforming XML
parsers. There is the Unicode replacement character \uFFFD that
can be used to replace such characters. While indexing docs, replace all
such
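A minimal sketch of that pre-indexing replacement, assuming Python 3 (the function name is made up for illustration):

```python
import re

# U+FFFE and U+FFFF are Unicode noncharacters: well-formed in UTF-8, but
# rejected by conforming XML parsers. Swap them for the replacement
# character U+FFFD before handing documents to Solr's XML update handler.
NONCHARACTERS = re.compile("[\ufffe\uffff]")

def replace_noncharacters(text):
    return NONCHARACTERS.sub("\ufffd", text)

cleaned = replace_noncharacters("abc\ufffedef")
print(cleaned)  # the noncharacter is now U+FFFD
```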
On Mon, Aug 5, 2013 at 3:03 PM, Chris Hostetter
wrote:
>
> : > 0xfffe is not a special character -- it is explicitly *not* a character in
> : > Unicode at all, it is set asside as "not a character." specifically so
> : > that the character 0xfeff can be used as a BOM, and if the BOM is read
> : >
d UTFs?
A: Absolutely not. Noncharacters do not cause a Unicode string
to be ill-formed in any UTF. This can be seen explicitly in the
table above, where every noncharacter code point has a well-
formed representation in UTF-32, in UTF-16, and in UTF
: > 0xfffe is not a special character -- it is explicitly *not* a character in
: > Unicode at all, it is set asside as "not a character." specifically so
: > that the character 0xfeff can be used as a BOM, and if the BOM is read
: > incorrectly, it will cause an error.
:
: XML doesn't allow contro
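The byte-order-mark point above can be checked in a couple of lines of Python 3:

```python
# U+FEFF serves as the byte-order mark. If a big-endian UTF-16 stream is
# read with little-endian byte order, the BOM comes out as the
# noncharacter U+FFFE -- which is exactly why U+FFFE is reserved.
bom_be = "\ufeff".encode("utf-16-be")   # bytes fe ff
misread = bom_be.decode("utf-16-le")    # wrong byte order
print(bom_be.hex(), hex(ord(misread)))  # feff 0xfffe
```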
On 8/5/2013 12:12 PM, Federico Chiacchiaretta wrote:
Hi Raymond,
I agree with you, 0xfffe is a special character, that is why I was asking
how it's handled in solr.
In my document, 0xfffe does not appear at the beginning, it's in the
content.
I believe that 0xfffe is not a valid UTF-8
On Mon, Aug 5, 2013 at 11:42 AM, Chris Hostetter
wrote:
>
> : I agree with you, 0xfffe is a special character, that is why I was asking
> : how it's handled in solr.
> : In my document, 0xfffe does not appear at the beginning, it's in the
> : content.
>
> Unless I'm misunderstanding something (an
the content" of your database, then
your database content (by definition) can not be UTF-8, because 0xfffe is
not a character in Unicode.
if you are able to index that content in a single node
Solr+DIH+JDBC+Postgres setup, then you are getting (un)lucky -- Postgres
isn't complaining that
wer.
> > From the docs you linked I found:
> > "This property is only relevant for server versions less than or equal to
> > 7.2".
> >
> > I'm using version 9.1, I gave it a try but unfortunately I had no luck.
> > Besides, I checked encoding sett
> From the docs you linked I found:
> "This property is only relevant for server versions less than or equal to
> 7.2".
>
> I'm using version 9.1, I gave it a try but unfortunately I had no luck.
> Besides, I checked encoding settings on DB and it's UTF-8.
Hi Shawn,
thanks for your answer.
From the docs you linked I found:
"This property is only relevant for server versions less than or equal to
7.2".
I'm using version 9.1, I gave it a try but unfortunately I had no luck.
Besides, I checked encoding settings on DB and it's U
83/solr/archive/:org.apache.solr.client.solrj.impl.HttpSolrServer$RemoteSolrException:
> Invalid
> UTF-8 character 0xfffe at char #416, byte #127)
It sounds like your database is not using the UTF-8 character set, but
the JDBC driver (or the driver-server combination) is not aware that the
character set is dif
; org.apache.solr.common.SolrException;
java.lang.RuntimeException: [was class java.io.CharConversionException]
Invalid UTF-8 character 0xfffe at char #6755, byte #6143)
at
com.ctc.wstx.util.ExceptionUtil.throwRuntimeException(ExceptionUtil.java:18)
at
com.ctc.wstx.sr.StreamScanner.throwLazyError
, but
it doesn't in 4.3.
I brought up the issue on the dev list. Allowing a user to change the
default character set would cause problems for SolrCloud or distributed
search, because the requests generated by the server are UTF-8.
The responder did say that he can imagine all the code for a sol
:
Invalid
UTF-8 character 0xfffe at char #416, byte #127)
at
org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:402)
at
org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:180)
at
org.apache.solr.update.SolrCmdDistributor$1.call
I brought up the issue on the dev list. Allowing a user to change the
default character set would cause problems for SolrCloud or distributed
search, because the requests generated by the server are UTF-8.
The responder did say that he can imagine all the code for a solution
that involves
ersion 3.5, but
> it doesn't in 4.3.
Version 3.5 didn't force UTF-8, which led to a TON of problems with
misconfigured containers, notably tomcat. SOLR-4265 (first available in
4.1.0) fixed this problem.
https://issues.apache.org/jira/browse/SOLR-4265
In your case, because you are actually
ge in context:
http://lucene.472066.n3.nabble.com/Solr-4-3-1-only-accepts-UTF-8-encoded-queries-tp4080587p4080706.html
Sent from the Solr - User mailing list archive at Nabble.com.
E9" and "cão" is encoded to "c%E3o".
> My URLencoding in tomcat is "iso-8859-1", but when i do a query like that to
> solr(?q="caf%E9") It returns the error {msg=URLDecoder: Invalid character
> encoding detected after position 2 of query string /
".
My URLencoding in tomcat is "iso-8859-1", but when i do a query like that to
solr(?q="caf%E9") It returns the error {msg=URLDecoder: Invalid character
encoding detected after position 2 of query string / form data (while
parsing as UTF-8),code=400}. It works perfectly in my
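The encoding mismatch above can be reproduced with Python 3's urllib (a sketch; Solr decodes query strings as UTF-8):

```python
from urllib.parse import quote

# "café" percent-encoded as UTF-8 vs. as ISO-8859-1. Solr expects the former;
# caf%E9 decodes to a lone 0xE9 byte, which is an invalid UTF-8 sequence --
# the same complaint the URLDecoder error above is making.
print(quote("café"))                          # caf%C3%A9
print(quote("café", encoding="iso-8859-1"))   # caf%E9
```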
e to index next documents coming from
nutch. Or even though I am new to SOLR, maybe I can write an update pre/post
processor plugin for the SOLR update job to ignore XML errors. Do we have a
solution for this problem?
I appreciate your help
class java.io.CharConversionException] Invalid UTF-8 character 0xff
>>> "Jack Krupansky" 18.10.2012 21:36 >>>
Have you verified that the data was indexed properly (UTF-8 encoding)?
Try a
raw HTTP request using the browser or curl and see how that field looks
in
the resulting XML.
-- Jack Krupansky
-Original Message-
From: Andreas Kahl
Sent:
Have you verified that the data was indexed properly (UTF-8 encoding)? Try a
raw HTTP request using the browser or curl and see how that field looks in
the resulting XML.
-- Jack Krupansky
-Original Message-
From: Andreas Kahl
Sent: Thursday, October 18, 2012 1:10 PM
To: j
Jack,
Thanks for the hint, but we have already set URIEncoding="UTF-8" on all
our tomcats, too.
Regards
Andreas
>>> "Jack Krupansky" 18.10.12 17.11 Uhr >>>
It may be that your container does not have UTF-8 enabled. For example,
with
Tomcat you
It may be that your container does not have UTF-8 enabled. For example, with
Tomcat you need something like:
Make sure your "Connector" element has URIEncoding="UTF-8" (for Tomcat.)
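Such a Connector definition in Tomcat's server.xml would look roughly like this (port, timeout, and redirect values are illustrative):

```xml
<!-- server.xml: decode request URIs as UTF-8 instead of the platform default -->
<Connector port="8080" protocol="HTTP/1.1"
           connectionTimeout="20000"
           redirectPort="8443"
           URIEncoding="UTF-8"/>
```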
-- Jack Krupansky
-Original Message-
From: Andreas Kahl
Sent: Thursday, Octob
asian alphabets, so we need UTF-8.
Now we have the problem that the string returned by
marcXml = results.get(0).getFirstValue("marcxml").toString();
is not valid UTF-8, so the resulting XML-Document is not well formed.
Here is what we do in Java:
<<
ModifiableSolrPar
On 9 October 2012 17:42, Patrick Oliver Glauner
wrote:
> Hello everybody
>
> Meanwhile, I checked this issue in detail: we use pdftotext to extract text
> from our PDFs (<http://cds.cern.ch/>). Some generated text files contain
> \u and \uD835.
>
> unicode(text,
Hello everybody
Meanwhile, I checked this issue in detail: we use pdftotext to extract text
from our PDFs (<http://cds.cern.ch/>). Some generated text files contain \u
and \uD835.
unicode(text, 'utf-8') does not throw any exception for these texts.
Subsequently, Solr thr
UTF-8
Python's unicode function takes an optional (keyword) "errors"
argument, telling it what to do when an invalid UTF8 byte sequence is
seen.
The default (errors='strict') is to throw the exceptions you're
seeing. But you can also pass errors='replace' or errors='ignore'.
See http://docs.python.org/howto/unicode.html for details ...
However, I agree with Robert: you should dig into why whatever process
you used to extract the full text from your binary documents is
producing invalid UTF-8 ... something is wrong with that process.
Mike McCan
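In modern Python 3 the same knob is the errors argument to bytes.decode (Python 2's unicode() behaves equivalently); a quick sketch:

```python
raw = b"caf\xe9"  # 0xE9 is Latin-1 'é'; on its own it is invalid UTF-8

# errors='strict' (the default) raises UnicodeDecodeError
try:
    raw.decode("utf-8")
except UnicodeDecodeError as exc:
    print("strict:", exc.reason)

# 'replace' substitutes U+FFFD; 'ignore' silently drops the bad byte
print(raw.decode("utf-8", errors="replace"))  # caf\ufffd
print(raw.decode("utf-8", errors="ignore"))   # caf
```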
quite frequent.
>
I don't really know python either: so I could be wrong here but are
you just taking these binary .PDF and .DOC files and treating them as
UTF-8 text and sending them to Solr?
If so, I don't think that will work very well. Maybe instead try
parsing these binary files w
elsma [markus.jel...@openindex.io]
Sent: Tuesday, September 25, 2012 7:24 PM
To: solr-user@lucene.apache.org; Patrick Oliver Glauner
Subject: RE: Indexing in Solr: invalid UTF-8
Hi - you need to get rid of all non-character code points.
http://unicode.org/cldr/utility/list-unicodeset.
Indexing in Solr: invalid UTF-8
>
> Hello
>
> We use Solr 3.1 and Jetty to index previously extracted fulltexts from PDFs,
> DOC etc. Our indexing script is written in Python 2.4 using solrpy:
>
> [...]
> text = remove_control_characters(text) # except \r, \
Hello
We use Solr 3.1 and Jetty to index previously extracted fulltexts from PDFs,
DOC etc. Our indexing script is written in Python 2.4 using solrpy:
[...]
text = remove_control_characters(text) # except \r, \t, \n
utext = unicode(text, 'utf-8')
SOLR_CONNECTION.add(id=recid, full
the format "UTF-8 without BOM"? Is there a way to get
out of this issue.
French character : étaient état
Thanks
Binoy
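For reference, the UTF-8 BOM is the three bytes ef bb bf; Python 3's utf-8-sig codec strips it on decode (a sketch):

```python
# A file saved as "UTF-8 with BOM" starts with the bytes ef bb bf.
data = "\ufeffétaient état".encode("utf-8")
print(data[:3].hex())            # efbbbf
# Plain utf-8 keeps the BOM as U+FEFF; utf-8-sig strips it.
print(data.decode("utf-8-sig"))  # étaient état
```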
--
View this message in context:
http://lucene.472066.n3.nabble.com/UTF-8-without-BOM-French-characters-issue-tp4004751.html
Sent from the Solr - User mailing list
It varies. Last I used Tomcat (some years ago) it defaulted to the system
default encoding and you had to use -Dfile.encoding... to get UTF-8.
Jetty currently defaults to UTF-8.
On Jul 17, 2012, at 11:12 PM, William Bell wrote:
> -Dfile.encoding=UTF-8... Is this usually recommended for S
My experience is that this property has made a whole lot of difference. At
least until Solr 3.1.
The servlet container has not been the only bit.
paul
On 18 July 2012 at 05:12, William Bell wrote:
> -Dfile.encoding=UTF-8... Is this usually recommended for SOLR indexes?
>
>
-Dfile.encoding=UTF-8... Is this usually recommended for SOLR indexes?
Or is the encoding usually just handled by the servlet container like Jetty?
--
Bill Bell
billnb...@gmail.com
cell 720-256-8076
if it needs improving.
Thanks,
Erik
On Apr 4, 2012, at 04:29 , henri wrote:
> I have finally solved my problem!!
>
> Did the following:
>
> added two lines in the /browse requestHandler
> velocity.properties
> text/html;charset=UTF-8
>
> Moved velocity
I have finally solved my problem!!
Did the following:
added two lines in the /browse requestHandler
velocity.properties
text/html;charset=UTF-8
Moved velocity.properties from solr/conf/velocity to solr/conf
Not being an expert, I am not 100% sure this is the "best" sol
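The two lines mentioned above presumably went into the /browse handler's defaults in solrconfig.xml; a sketch (the parameter names are assumptions, not confirmed by the message):

```xml
<requestHandler name="/browse" class="solr.SearchHandler">
  <lst name="defaults">
    <!-- point the VelocityResponseWriter at the properties file and force
         a UTF-8 HTML content type (names here are illustrative) -->
    <str name="v.properties">velocity.properties</str>
    <str name="v.contentType">text/html;charset=UTF-8</str>
  </lst>
</requestHandler>
```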
Paul,
velocity.properties are set.
One thing I am not 100% sure about is where this file should reside?
I have placed in in the example/solr/conf/velocity folder (where the .vm
files reside).
Cheers,
Henri
--
View this message in context:
http://lucene.472066.n3.nabble.com/UTF-8-encoding
Henri,
look velocity.properties. I have there:
> input.encoding = UTF-8
Do you also?
This is the vm files' encodings.
Of course also make sure you edit these files in UTF-8 (using jEdit made it
trustable to me).
paul
On 30 March 2012 at 08:49, henri.gour...@laposte.net
OK, I'll try to provide more details:
I am using solr-3.5.0
I am running the example provided in the package.
Some of the modifications I have done in the various velocity/*.vm files
have accents!
It is those accents that show up garbled when I look at the results.
The .vm files are utf-8 encoded
I doubt that the pre-installed Jetty server has problems with UTF-8, although
you haven't told us what version of Solr you're running on so it could be really
old.
And you also haven't told us why you think UTF-8 is a problem. How is this
manifesting itself? Failed searches?
Thanks for the tips, but unfortunately, no progress so far.
Reading through the Web, I guess that jetty has utf-8 problems!
I guess that I will have to switch from the embedded (and pre installed ->
easy) jetty server present in Solr in favor of Tomcat (for which I have to
rediscover
success or lack thereof, I'm interested and I am sure others are.
paul
On 29 March 2012 at 16:49, Bob Sandiford wrote:
> Hi, Henri.
>
> Make sure that the container in which you are running Solr is also set for
> UTF-8.
>
> For example, in Tomcat, in the serve
Hi, Henri.
Make sure that the container in which you are running Solr is also set for
UTF-8.
For example, in Tomcat, in the server.xml file, your Connector definitions
should include:
URIEncoding="UTF-8"
Bob Sandiford | Lead Software Engineer | SirsiDynix
P: 800.288.
I can't get utf-8 encoding to work!!
I have text/html;charset=UTF-8
in my request handler, and
input.encoding=UTF-8
output.encoding=UTF-8
in velocity.properties, in various locations (I may have the wrong ones! at
least in the folder where the .vm files reside)
What else should I be
: Subject: UTF-8 support during indexing content
Travis and all,
This is solved and was not directly a Solr issue. I'll note the solution here
in case anyone makes the same mistake. The documents are UTF-8 and the source
documents are converted via XSLT. They look good up to that point.
First off, based on some other recommenda
Are you sure the input document is in UTF-8? That looks like classic
ISO-8859-1-treated-as-UTF-8.
How did you confirm the document contains the right quote marks immediately
prior to uploading? If you just visually inspected it, then use whatever
tool you viewed it in to see what the character
Hello everyone,
I have a question that I imagine has been asked many times before, so I
apologize for the repeat.
I have a basic text field with the following text:
the word ”stemming” in quotes
Uploading the data yields no errors, however when it is indexed, the text looks
like this:
I finally managed to answer my own question. UTF-8 data in the body is ok,
but you need to specify charset=utf-8 in the Content-Type header in each
part, to tell the receiver (Solr) that it's not the default ISO-8859-1
Content-Disposition: form-data; name=literal.bptitle
Content-Type:
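The shape of a working part, sketched below (boundary and field values are illustrative; the key line is the per-part Content-Type carrying charset=utf-8):

```http
POST /solr/update/extract HTTP/1.1
Content-Type: multipart/form-data; boundary=XYZ

--XYZ
Content-Disposition: form-data; name=literal.bptitle
Content-Type: text/plain; charset=utf-8

Le café étrange
--XYZ
Content-Disposition: form-data; name=file; filename=doc.pdf
Content-Type: application/pdf

(binary PDF bytes)
--XYZ--
```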
I'm trying to post a PDF along with a whole bunch of metadata fields to the
ExtractingRequestHandler as multipart/form-data. It works fine except for
the utf-8 character handling. Here is what my post looks like (abridged):
POST /solr/update/extract HTTP/1.1
TE: deflate,gzip;
Thanks Chris. Yes, changing connector settings not just in solr but also in
all webapps that were sending queries into it solved the problem!
Appreciate the help.
R
On Tue, Sep 13, 2011 at 6:11 PM, Chris Hostetter
wrote:
>
> : Any idea why solr is unable to return the pound sign as-is?
> :
> :
: Any idea why solr is unable to return the pound sign as-is?
:
: I tried typing in £ 1 million in Solr admin GUI and got following response.
...
: £ 1 million
...
: Here is my Java Properties I got also from admin interface:
...
: catalina.home =
: /home/rbhagdev/SCCRepo
classworlds.conf = /usr/share/maven2/bin/m2.conf
sun.jnu.encoding = UTF-8
java.library.path =
/usr/lib/jvm/java-6-sun-1.6.0.26/jre/lib/amd64/server:/usr/lib/jvm/java-6-sun-1.6.0.26/jre/lib/amd64:/usr/lib/jvm/java-6-sun-1.6.0.26/jre/../lib/amd64:/usr/java/packages/lib/amd64:/usr/lib64:/lib64:/lib:/usr
SolrJ 3.1? Anything else on the
> >>> Nutch part i should have taken care off?
> >>>
> >>> Thanks!
> >>>
> >>>
> >>> Jun 27, 2011 10:24:28 AM org.apache.solr.core.SolrCore execute
> >>> INFO: [] webapp=/solr path=/updat
lang.RuntimeException: [was class
java.io.CharConversionException] Invalid UTF-8 character 0xffff at char
#1142033, byte #1155068) at
com.ctc.wstx.util.ExceptionUtil.throwRuntimeException(ExceptionUtil.java:18) at
com.ctc.wstx.sr.StreamScanner.throwLazyError(StreamScanner.java:731) at
com.ctc.wstx.sr.
ath=/update params={wt=javabin&version=2}
> > status=500 QTime=423 Jun 27, 2011 10:24:28 AM
> > org.apache.solr.common.SolrException log
> > SEVERE: java.lang.RuntimeException: [was class
> > java.io.CharConversionException] Invalid UTF-8 character 0xffff at char
> &
core.SolrCore execute
> INFO: [] webapp=/solr path=/update params={wt=javabin&version=2} status=500
> QTime=423 Jun 27, 2011 10:24:28 AM org.apache.solr.common.SolrException
> log
> SEVERE: java.lang.RuntimeException: [was class
> java.io.CharConversionException]
en
care off?
Thanks!
Jun 27, 2011 10:24:28 AM org.apache.solr.core.SolrCore execute
INFO: [] webapp=/solr path=/update params={wt=javabin&version=2} status=500
QTime=423
Jun 27, 2011 10:24:28 AM org.apache.solr.common.SolrException log
SEVERE: java.lang.RuntimeException: [was class java.io.Cha
O: [] webapp=/solr path=/update params={wt=javabin&version=2}
> >> status=500 QTime=423 Jun 27, 2011 10:24:28 AM
> >> org.apache.solr.common.SolrException log SEVERE:
> >> java.lang.RuntimeException: [was class java.io.CharConversionException]
> >> Invalid U
[] webapp=/solr path=/update params={wt=javabin&version=2} status=500
>> QTime=423
>> Jun 27, 2011 10:24:28 AM org.apache.solr.common.SolrException log
>> SEVERE: java.lang.RuntimeException: [was class
>> java.io.CharConversionException] Invalid UTF-8 character 0xffff at char
>> #114203
Hi,
It's the same error I mentioned here
http://lucene.472066.n3.nabble.com/strange-utf-8-problem-td3094473.html.
Also if you use solr 1.4.1 there is no problem like that.
--
View this message in context:
http://lucene.472066.n3.nabble.com/Solr-3-1-indexing-error-Invalid-UTF-8-character
atus=500
> QTime=423
> Jun 27, 2011 10:24:28 AM org.apache.solr.common.SolrException log
> SEVERE: java.lang.RuntimeException: [was class
> java.io.CharConversionException] Invalid UTF-8 character 0xffff at char
> #1142033, byte #1155068)
> at
> com.ctc.wstx.util.Excep
OK - re-reading your message it seems maybe that is what you were trying
to say too, Robert. FWIW I agree with you that XML is rigid, sometimes
for purely arbitrary reasons. But nobody has really helped Markus here
- unfortunately, there is no easy way out of this mess. What I do to
handle i
Actually - you are both wrong!
It is true that 0xffff is a valid UTF8 character, and not a valid UTF8
byte sequence.
But the parser is reporting (or trying to) that 0xffff is an invalid XML
character.
And Robert - if the wording offends you, you might want to send a note
to Tatu (http://ji
On 27.06.2011 14:48, Robert Muir wrote:
On Mon, Jun 27, 2011 at 8:47 AM, Bernd Fehling
wrote:
correct!!!
but what i said, is totally different than what you said.
you are still wrong.
http://www.unicode.org/faq//utf_bom.html
see Q: What is a UTF?
On Mon, Jun 27, 2011 at 8:47 AM, Bernd Fehling
wrote:
>
> correct!!!
>
but what i said, is totally different than what you said.
you are still wrong.
On 27.06.2011 14:35, Robert Muir wrote:
On Mon, Jun 27, 2011 at 8:30 AM, Bernd Fehling
wrote:
Unicode U+FFFF is UTF-8 byte sequence "ef bf bf", that is right.
But I was saying that UTF-8 0xffff (which is byte sequence "ff ff") is
illegal
On Mon, Jun 27, 2011 at 8:30 AM, Bernd Fehling
wrote:
> Unicode U+FFFF is UTF-8 byte sequence "ef bf bf", that is right.
>
> But I was saying that UTF-8 0xffff (which is byte sequence "ff ff") is
> illegal
> and that's what the java.io.CharConversionExcept
On 27.06.2011 14:02, Robert Muir wrote:
On Mon, Jun 27, 2011 at 7:11 AM, Bernd Fehling
wrote:
So there is no UTF-8 0xffff. It is illegal.
you are wrong: it is legally encoded as a three byte sequence: ef bf bf
Unicode U+FFFF is UTF-8 byte sequence "ef bf bf", that is rig
On Mon, Jun 27, 2011 at 7:11 AM, Bernd Fehling
wrote:
>
> So there is no UTF-8 0xffff. It is illegal.
>
you are wrong: it is legally encoded as a three byte sequence: ef bf bf
I suggest avoid illegal UTF-8 characters by pre-filtering your
contentstream before loading.
Unicode    UTF-8 (hex)
U+07FF     df bf
U+0800     e0 a0 80
So there is no UTF-8 0xffff. It is illegal.
Regards
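The two facts being argued over in this thread are easy to check in Python 3: the code point U+FFFF has a legal three-byte UTF-8 encoding, while the raw bytes ff ff are not valid UTF-8 at all (a quick sketch):

```python
# U+FFFF is a Unicode noncharacter, yet it has a well-formed UTF-8 encoding.
encoded = "\uffff".encode("utf-8")
print(encoded.hex())  # efbfbf

# The raw byte sequence ff ff, on the other hand, is not UTF-8.
try:
    b"\xff\xff".decode("utf-8")
except UnicodeDecodeError as exc:
    print("invalid UTF-8:", exc.reason)
```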
On 27.06.2011 12:40, Markus Jelsma wrote:
Hi,
I came across the indexing error below
[was class java.io.CharConversionException]
Invalid UTF-8 character 0xffff at char #1142033, byte #1155068)
at
com.ctc.wstx.util.ExceptionUtil.throwRuntimeException(ExceptionUtil.java:18)
at com.ctc.wstx.sr.StreamScanner.throwLazyError(StreamScanner.java:731)
-1.4.0.jar for solr 1.4.1
because of javabin errors.
here is problematic chars. "Sao Tom���nd Princip���STP"
SEVERE: java.lang.RuntimeException: [was class
java.io.CharConversionException] Invalid UTF-8 character 0xffff at char
#681112, byte #700315)
Hello group,
this message is a word of warning and a plea to wiki writers.
Reading the wiki and documentation in general, there seems to be an accepted
consensus that most of SOLR works in utf-8. In my opinion this is
absolutely good.
But this may be a remnant of earlier times. Several efforts
On Thu, Jan 27, 2011 at 3:51 AM, prasad deshpande
wrote:
> The size of docs can be huge, like suppose there are 800MB pdf file to index
> it I need to translate it in UTF-8 and then send this file to index.
PDF is binary AFAIK... you shouldn't need to do any charset
translation before
1 - 100 of 254 matches