Bill, somewhere in the process I think you might be treating your
UTF-8 text as ISO-8859-1.

Your character: U+00B5 (µ)
Bits: 10110101

UTF-8 encoded: 11000010 10110101 (0xC2 0xB5)

If you were to treat these bytes as ISO-8859-1 (e.g. when reading from a
file with the wrong charset, or with the wrong URL encoding) then it looks like:
0xC2 (Â) followed by 0xB5 (µ)
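
Here is a minimal sketch in plain Java that reproduces the effect, in case
it helps you track down where it happens. It does not use SolrJ at all, and
the class and method names (MojibakeDemo, misdecode, repair) are just made
up for illustration. It only assumes that something in your pipeline decodes
UTF-8 bytes as ISO-8859-1.

```java
import java.nio.charset.StandardCharsets;

public class MojibakeDemo {
    // Hypothetical helper: encode as UTF-8, then (wrongly) decode the
    // same bytes as ISO-8859-1, which is the suspected mistake.
    static String misdecode(String s) {
        byte[] utf8 = s.getBytes(StandardCharsets.UTF_8);
        return new String(utf8, StandardCharsets.ISO_8859_1);
    }

    // Hypothetical helper: reverse the mistake. Only meaningful if the
    // string really was mis-decoded exactly once in this way.
    static String repair(String garbled) {
        byte[] raw = garbled.getBytes(StandardCharsets.ISO_8859_1);
        return new String(raw, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String original = "\u00B5Torrent";   // U+00B5 followed by "Torrent"
        String garbled = misdecode(original);

        // Print code points rather than glyphs, so the result does not
        // depend on the console's own encoding.
        System.out.printf("U+%04X U+%04X%n",
                (int) garbled.charAt(0), (int) garbled.charAt(1));
        System.out.println(repair(garbled).equals(original));
    }
}
```

The first line printed is "U+00C2 U+00B5", i.e. the one character has become
two. If your indexed value round-trips through repair() back to the original,
that would be a strong hint that this is what is going on.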


On Tue, Jul 28, 2009 at 3:26 PM, Bill Au<bill.w...@gmail.com> wrote:
> I am using SolrJ to index the word µTorrent.  After a commit I was not able
> to query for it.  It turns out that the document in my Solr index contains
> the word ÂµTorrent instead of µTorrent.  Any one has any idea what's going
> on???
>
> Bill
>



-- 
Robert Muir
rcm...@gmail.com
