RE: Using asterik(*) with unicode characters.

2017-06-29 Thread Preeti Bhat
Thanks Erick, it's working now as expected. Thanks and Regards, Preeti Bhat -Original Message- From: Erick Erickson [mailto:erickerick...@gmail.com] Sent: Wednesday, June 28, 2017 9:20 PM To: solr-user Subject: Re: Using asterik(*) with unicode characters. There's a long bl

Re: Using asterik(*) with unicode characters.

2017-06-28 Thread Erick Erickson
There's a long blog on wildcards here: https://lucidworks.com/2011/11/29/whats-with-lowercasing-wildcard-multiterm-queries-in-solr/ The gist is that when you are analyzing a token, if the analysis chain splits a token into more than one part then wildcards are impossible to get right. So any "Mult

Using asterik(*) with unicode characters.

2017-06-28 Thread Preeti Bhat
Hi All, I have a requirement where the user can give a Unicode or ASCII character as input but expects the same result. For example: MöllerGruppen AS vs MollerGruppen AS should give the same result. I am able to get this done using , but for some reason when I try to do MöllerGruppen* I am ge
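The symptom described in this thread can be illustrated outside Solr. The snippet below is only a sketch under assumptions: it does not use any Solr or Lucene API, and the class and method names (WildcardFoldingSketch, fold, rawPrefixMatch, foldedPrefixMatch) are hypothetical. It assumes the index-time analysis folds diacritics and lowercases, and shows why a wildcard prefix that skips the same folding may fail to match.

```java
import java.text.Normalizer;
import java.util.Locale;

// Hypothetical illustration only; this is not Solr's analysis chain.
final class WildcardFoldingSketch {
    private WildcardFoldingSketch() {}

    // Decompose accented letters (NFD), drop the combining marks,
    // then lowercase: "M\u00F6ller" becomes "moller".
    static String fold(String s) {
        String decomposed = Normalizer.normalize(s, Normalizer.Form.NFD);
        return decomposed.replaceAll("\\p{M}+", "").toLowerCase(Locale.ROOT);
    }

    // Prefix match where the user's wildcard prefix is used as typed.
    static boolean rawPrefixMatch(String indexedTerm, String userPrefix) {
        return indexedTerm.startsWith(userPrefix);
    }

    // Prefix match where the prefix goes through the same folding
    // that was assumed for the indexed term.
    static boolean foldedPrefixMatch(String indexedTerm, String userPrefix) {
        return indexedTerm.startsWith(fold(userPrefix));
    }
}
```

With an indexed term of fold("M\u00F6llerGruppen AS"), the raw prefix "M\u00F6llerGruppen" does not match, while the folded prefix does. Whether this is the actual cause in the poster's setup cannot be confirmed from the truncated message.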

Re: Solr and Unicode characters in strings

2013-01-22 Thread Jack Park
Thanks! On Tue, Jan 22, 2013 at 8:59 AM, Otis Gospodnetic wrote: > Hi, > > When you run your indexing app make sure you treat what you send to Solr as > UTF-8. > Use -Dfile.encoding=UTF8 -Dclient.encoding.override=UTF-8 to the Java > command line. > > Otis > -- > Solr & ElasticSearch Support > ht

Re: Solr and Unicode characters in strings

2013-01-22 Thread Otis Gospodnetic
Hi, When you run your indexing app make sure you treat what you send to Solr as UTF-8. Use -Dfile.encoding=UTF8 -Dclient.encoding.override=UTF-8 to the Java command line. Otis -- Solr & ElasticSearch Support http://sematext.com/ On Mon, Jan 21, 2013 at 3:06 PM, Jack Park wrote: > Here is a
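As a rough illustration of the advice above, an indexing client can name UTF-8 explicitly instead of relying on the JVM default charset. This is a minimal sketch, not code from the thread: the class Utf8RoundTripSketch and its methods are hypothetical, and ISO-8859-1 merely stands in for "some wrong charset" on the receiving side.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper; it only demonstrates explicit charset handling.
final class Utf8RoundTripSketch {
    private Utf8RoundTripSketch() {}

    // Encode with an explicit charset rather than the platform default.
    static byte[] encodeUtf8(String text) {
        return text.getBytes(StandardCharsets.UTF_8);
    }

    // Correct decoding: the same charset that produced the bytes.
    static String decodeUtf8(byte[] bytes) {
        return new String(bytes, StandardCharsets.UTF_8);
    }

    // Incorrect decoding: each UTF-8 byte becomes its own character,
    // so a single ellipsis (U+2026, three bytes) turns into three characters.
    static String decodeLatin1(byte[] bytes) {
        return new String(bytes, StandardCharsets.ISO_8859_1);
    }
}
```

Decoding the bytes of "thus \u2026@en" with decodeUtf8 returns the original string, while decodeLatin1 returns a longer, garbled one. The exact garbage depends on which wrong charset is involved, so this does not claim to reproduce the output reported in the thread.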

Solr and Unicode characters in strings

2013-01-21 Thread Jack Park
Here is a situation I now experience: What Solr has: economist and thus …@en What was sent: economist and thus …@en where those are just snippets from what I sent up -- the ellipsis wa

Re: Solr and Tomcat - problem with unicode characters

2012-08-28 Thread zehoss
n context: http://lucene.472066.n3.nabble.com/Solr-and-Tomcat-problem-with-unicode-characters-tp4003692p4003709.html Sent from the Solr - User mailing list archive at Nabble.com.

Re: Solr and Tomcat - problem with unicode characters

2012-08-28 Thread zehoss
} hits=10 status=0 QTime=35 in jetty logs. -- View this message in context: http://lucene.472066.n3.nabble.com/Solr-and-Tomcat-problem-with-unicode-characters-tp4003692p4003697.html

Re: Solr and Tomcat - problem with unicode characters

2012-08-28 Thread François Schiettecatte
hing on the > application site is ok. > And when I start Solr on Jetty I get correct results and in response I get > correct characters. > Only when I start Tomcat there are problems. > > There are no exceptions nor warnings in tomcat logs. > > Could anyone suggest me w

Solr and Tomcat - problem with unicode characters

2012-08-28 Thread zehoss
ceptions nor warnings in tomcat logs. Could anyone suggest me where should I search a problem? -- View this message in context: http://lucene.472066.n3.nabble.com/Solr-and-Tomcat-problem-with-unicode-characters-tp4003692.html

Re: Unicode characters that are not legal XML characters;

2008-12-23 Thread lucas song
I have written a class to deal with this problem. public class XmlCharFilter { public static String doFilter(String in) { StringBuffer out = new StringBuffer(); // Used to hold the output. char current; // Used to reference the current character. if (in == null || ("".equals(in)))

Re: Unicode characters that are not legal XML characters

2008-12-23 Thread Bryan Talbot
I believe you can use the following Unicode characters in XML documents: U+0009, U+000A, U+000D, [U+0020-U+D7FF], [U+E000-U+FFFD], and [U+10000-U+10FFFF] One of your documents contains a U0022 character which is an invalid space character for XML. http://www.unicode.org/unicode/reports
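The ranges listed above correspond to the Char production of XML 1.0, and a filter along those lines might look like the sketch below. It is not the XmlCharFilter class posted in this thread (that message is truncated here); the names XmlCharSanitizer, isLegalXmlChar and stripIllegal are hypothetical, and returning an empty string for null input is an assumption.

```java
// Hypothetical sketch: drops code points that XML 1.0 does not allow.
final class XmlCharSanitizer {
    private XmlCharSanitizer() {}

    // True if the code point falls in one of the ranges listed above.
    static boolean isLegalXmlChar(int cp) {
        return cp == 0x9 || cp == 0xA || cp == 0xD
            || (cp >= 0x20 && cp <= 0xD7FF)
            || (cp >= 0xE000 && cp <= 0xFFFD)
            || (cp >= 0x10000 && cp <= 0x10FFFF);
    }

    // Returns the input without illegal characters.
    // Assumption: null or empty input yields an empty string.
    static String stripIllegal(String in) {
        if (in == null || in.isEmpty()) {
            return "";
        }
        StringBuilder out = new StringBuilder(in.length());
        // Walk by code point so surrogate pairs stay together;
        // an unpaired surrogate is outside the legal ranges and is dropped.
        for (int i = 0; i < in.length(); ) {
            int cp = in.codePointAt(i);
            if (isLegalXmlChar(cp)) {
                out.appendCodePoint(cp);
            }
            i += Character.charCount(cp);
        }
        return out.toString();
    }
}
```

Calling XmlCharSanitizer.stripIllegal on each field value before building the XML update message would remove control characters such as the one reported in the Woodstox error, at the cost of silently discarding them.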

Re: Unicode characters that are not legal XML characters;

2008-12-23 Thread Jarek Zgoda
Message written on 2008-12-23 at 14:46 by rohit arora: When I give the post command to build my index on my (databases / XML) file it gives me an error like: com.ctc.wstx.exc.WstxUnexpectedCharException: Illegal character ((CTRL-CHAR, code 22)) at [row,col {unknown-

Unicode characters that are not legal XML characters

2008-12-23 Thread rohit arora
Hi, When I give the post command to build my index on my (databases / XML) file it gives me an error like: com.ctc.wstx.exc.WstxUnexpectedCharException: Illegal character ((CTRL-CHAR, code 22)) at [row,col {unknown-source}]: [1676,86] I found a built-in function in Perl to convert all m

Unicode characters that are not legal XML characters;

2008-12-23 Thread rohit arora
Hi, When I give the post command to build my index on my (databases / XML) file it gives me an error like: com.ctc.wstx.exc.WstxUnexpectedCharException: Illegal character ((CTRL-CHAR, code 22)) at [row,col {unknown-source}]: [1676,86] I found a built-in function in Perl to convert all my

RE: Unicode characters

2007-05-01 Thread HUYLEBROECK Jeremy RD-ILAB-SSF
Thanks a lot for the time you spent understanding my problem and checking for a solution in Neko! It helps a lot. -Original Message- From: Chris Hostetter [mailto:[EMAIL PROTECTED] Sent: Friday, April 27, 2007 4:02 PM To: solr-user@lucene.apache.org Subject: Re: Unicode characters

Re: Unicode characters

2007-04-27 Thread Chris Hostetter
: -fetch a web page : -decode entities and unicode characters(such as $#149; ) using Neko : library : -get a unicode String in Java : -Send it to SOLR through XML created by SAX, with the right encoding : (UTF-8) specified everywhere( writer, header etc...) : -it apparently arrives clean on the

Re: Unicode characters

2007-04-27 Thread Yonik Seeley
On 4/27/07, HUYLEBROECK Jeremy RD-ILAB-SSF -In the query output from SOLR (XML message), the character is not encoded as an entity (not &#149;) but the character itself is used (character 149=95 hexadecimal). That's fine, as they are equivalent representations, and that character is directly represe

Unicode characters

2007-04-27 Thread HUYLEBROECK Jeremy RD-ILAB-SSF
Hi, We experience some encoding probs with the unicode characters getting out of solr. Let me explain our flow: -fetch a web page -decode entities and unicode characters(such as $#149; ) using Neko library -get a unicode String in Java -Send it to SOLR through XML created by SAX, with the right