Thanks, Robert.  That's exactly what my problem was.  Things work fine now
that I make sure all my processing (index and query) uses UTF-8.  FYI,
it took me a while to discover that SolrJ by default uses a GET request for
queries, which uses ISO-8859-1.  I had to explicitly use a POST for queries
in SolrJ in order to get it to use UTF-8.
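
For anyone hitting the same thing, here is a minimal sketch of the mismatch
using only the JDK (Java 10 or later), not SolrJ or Solr itself.  The class
and method names are made up for illustration.  It assumes the client
percent-encodes the term as UTF-8 and the receiving side decodes the URL as
ISO-8859-1, which is what produces the extra character in front of the µ.

```java
import java.net.URLDecoder;
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class; it only imitates the encode/decode steps of a GET query.
public class QueryEncodingDemo {

    // Percent-encode a query term as UTF-8, as a client would when building a URL.
    static String encodeUtf8(String term) {
        return URLEncoder.encode(term, StandardCharsets.UTF_8);
    }

    // Decode the way a receiver does if it assumes ISO-8859-1 for the URL.
    static String decodeLatin1(String encoded) {
        return URLDecoder.decode(encoded, StandardCharsets.ISO_8859_1);
    }

    // Decode the way a receiver does if it is configured for UTF-8.
    static String decodeUtf8(String encoded) {
        return URLDecoder.decode(encoded, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // \u00B5 is the micro sign; the escape avoids source-file encoding issues.
        String term = "\u00B5Torrent";

        String encoded = encodeUtf8(term);
        System.out.println(encoded);            // %C2%B5Torrent

        // Each of the two UTF-8 bytes becomes its own character: U+00C2 then U+00B5.
        String wrong = decodeLatin1(encoded);
        System.out.println(wrong.length());     // 9, one more than the original 8

        // With matching charsets the term round-trips unchanged.
        String right = decodeUtf8(encoded);
        System.out.println(right.equals(term)); // true
    }
}
```

In SolrJ itself, I believe the switch to POST is made by passing
SolrRequest.METHOD.POST as the second argument to query(), assuming a SolrJ
version that has that overload.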

Bill

On Tue, Jul 28, 2009 at 5:27 PM, Robert Muir <rcm...@gmail.com> wrote:

> Bill, somewhere in the process I think you might be treating your
> UTF-8 text as ISO-8859-1.
>
> Your character: 00B5 (µ)
> Bits: 10110101
>
> UTF8-encoded: 11000010 10110101
>
> If you were to treat these bytes as ISO-8859-1 (e.g. when reading from a
> file, or with the wrong URL encoding) then it looks like:
> 0xC2 (Â) followed by 0xB5 (µ)
>
>
> On Tue, Jul 28, 2009 at 3:26 PM, Bill Au <bill.w...@gmail.com> wrote:
> > I am using SolrJ to index the word µTorrent.  After a commit I was not
> > able to query for it.  It turns out that the document in my Solr index
> > contains the word ÂµTorrent instead of µTorrent.  Does anyone have any
> > idea what's going on?
> >
> > Bill
> >
>
>
>
> --
> Robert Muir
> rcm...@gmail.com
>
