Thanks, Robert. That's exactly what my problem was. Things work fine now that I've made sure all my processing (index and query) uses UTF-8. FYI, it took me a while to discover that SolrJ by default uses a GET request for queries, and the query string ended up being handled as ISO-8859-1. I had to explicitly use a POST for queries in SolrJ in order to get it to use UTF-8.

Bill

On Tue, Jul 28, 2009 at 5:27 PM, Robert Muir <rcm...@gmail.com> wrote:
> Bill, somewhere in the process I think you might be treating your
> UTF-8 text as ISO-8859-1.
>
> Your character: 00B5 (µ)
> Bits: 10110101
>
> UTF8-encoded: 11000010 10110101
>
> If you were to treat these bytes as ISO-8859-1 (i.e. reading from a
> file or wrong url encoding) then it looks like:
> 0xC2 (Â) followed by 0xB5 (µ)
>
>
> On Tue, Jul 28, 2009 at 3:26 PM, Bill Au<bill.w...@gmail.com> wrote:
> > I am using SolrJ to index the word µTorrent. After a commit I was not able
> > to query for it. It turns out that the document in my Solr index contains
> > the word ÂµTorrent instead of µTorrent. Any one has any idea what's going
> > on???
> >
> > Bill
>
> --
> Robert Muir
> rcm...@gmail.com
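
For anyone who wants to reproduce the mismatch described in this thread without a Solr instance, here is a minimal sketch using only the Java standard library. The class and method names (EncodingMismatchDemo, misdecode, misdecodeQueryParam, hex) are made up for illustration and are not part of SolrJ. It assumes the client encodes as UTF-8 and the receiving side decodes as ISO-8859-1, which is the situation Robert describes.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class; not part of SolrJ or Solr.
public class EncodingMismatchDemo {

    // Encode text as UTF-8, then (wrongly) decode the same bytes as ISO-8859-1.
    static String misdecode(String text) {
        byte[] utf8 = text.getBytes(StandardCharsets.UTF_8);
        return new String(utf8, StandardCharsets.ISO_8859_1);
    }

    // Percent-encode a query value as UTF-8 (as a client might for a GET URL),
    // then decode it the way a receiver assuming ISO-8859-1 would.
    static String misdecodeQueryParam(String value) {
        try {
            String encoded = URLEncoder.encode(value, "UTF-8");
            return URLDecoder.decode(encoded, "ISO-8859-1");
        } catch (UnsupportedEncodingException e) {
            // Both charsets are required on every Java platform.
            throw new IllegalStateException(e);
        }
    }

    // Render bytes as space-separated upper-case hex, e.g. "C2 B5".
    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String word = "\u00B5Torrent"; // U+00B5 MICRO SIGN followed by "Torrent"

        // UTF-8 turns U+00B5 into the two bytes C2 B5.
        System.out.println(hex("\u00B5".getBytes(StandardCharsets.UTF_8)));

        // Read as ISO-8859-1, those two bytes become U+00C2 and U+00B5.
        String garbled = misdecode(word);
        System.out.println(garbled.equals("\u00C2\u00B5Torrent"));

        // The same thing happens to a percent-encoded query parameter.
        System.out.println(misdecodeQueryParam(word).equals(garbled));
    }
}
```

The source uses \u escapes so that the result does not depend on the encoding of the .java file itself. Whether the printed characters display correctly still depends on the console encoding, which is why main compares strings instead of printing the garbled word.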