Thanks, Otis. I went ahead and added this section to the DIH wiki. I hope others can add to it too, though of course the list should stay short :-)
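
For anyone who lands on this thread later, here is a minimal Java sketch of the symptom described in my original message below. It is not DIH code: the class name EncodingDemo, the decode helper, and the sample string are made up for illustration. It assumes the XML file is UTF-8 on disk and that, without the encoding attribute, the bytes get decoded with a non-UTF-8 charset such as ISO-8859-1 (which is what I would expect when a reader falls back to the platform default).

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class EncodingDemo {
    // Decode raw file bytes with the given charset, the way a reader
    // configured with that encoding would. Hypothetical helper, not a DIH API.
    static String decode(byte[] raw, Charset charset) {
        return new String(raw, charset);
    }

    public static void main(String[] args) {
        // "caf\u00e9" is "cafe" with an acute accent on the e; the escape
        // avoids depending on the source file's own encoding.
        String original = "caf\u00e9";

        // What a UTF-8 XML file would contain on disk for this text.
        byte[] raw = original.getBytes(StandardCharsets.UTF_8);

        // Decoding with the matching charset round-trips the text.
        String matching = decode(raw, StandardCharsets.UTF_8);

        // Decoding with a mismatched charset (assumed here: ISO-8859-1)
        // turns the two-byte accented character into two wrong characters.
        String mismatched = decode(raw, StandardCharsets.ISO_8859_1);

        System.out.println("matching charset round-trips:   " + matching.equals(original));
        System.out.println("mismatched charset round-trips: " + mismatched.equals(original));
        System.out.println("lengths: " + matching.length() + " vs " + mismatched.length());
    }
}
```

The mismatched string is what would end up in the index, which is why setting encoding="UTF-8" on the dataSource made the difference for me.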
- Amit

On Sun, Aug 1, 2010 at 12:00 AM, Otis Gospodnetic <otis_gospodne...@yahoo.com> wrote:

> Hi Amit,
>
> Anyone can edit any Solr Wiki page - just create an account (I think the link to
> that is in the page footer) and edit.
>
> Otis
> ----
> Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
> Lucene ecosystem search :: http://search-lucene.com/
>
> ----- Original Message ----
> > From: Amit Nithian <anith...@gmail.com>
> > To: solr-user@lucene.apache.org
> > Sent: Sat, July 31, 2010 4:41:44 PM
> > Subject: DIH, UTF8 and default DIH encoding value
> >
> > All,
> >
> > I am not sure if this is overly obvious or not (it wasn't to me) but in
> > trying to index some international characters from XML files using the DIH,
> > I found that setting the encoding attribute on the dataSource element to
> > "UTF-8" fixed my problem.
> >
> > <dataSource type="FileDataSource" encoding="UTF-8"/>
> >
> > My question is why the default isn't UTF-8 or if there is a good reason, can
> > the DIH wiki be made more clear that this encoding attribute can affect the
> > indexing of international characters? If I can get access to edit this wiki
> > page, I can add a section to that effect.. perhaps under a troubleshooting
> > section?
> >
> > Thanks!
> > Amit