All,

I am not sure whether this is obvious or not (it wasn't to me), but while
trying to index some international characters from XML files using the
DataImportHandler (DIH), I found that setting the encoding attribute on the
dataSource element to "UTF-8" fixed my problem.

<dataSource type="FileDataSource" encoding="UTF-8"/>
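
For anyone else who hits this, here is a small stand-alone Java sketch of
what I believe is going on. It is not DIH code, and the class and method
names are made up for illustration. My understanding (an assumption on my
part) is that when no encoding is given, the file is decoded with the JVM's
platform default charset. If that default is something like ISO-8859-1, the
bytes of a UTF-8 file come out garbled:

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class, not part of Solr or the DIH.
public class EncodingDemo {
    // Decode raw bytes with an explicit charset, as a Reader opened with that charset would.
    static String decode(byte[] bytes, Charset charset) {
        return new String(bytes, charset);
    }

    public static void main(String[] args) {
        // "Zurich" with a u-umlaut, written as a Unicode escape so the
        // encoding of this source file itself doesn't matter.
        String original = "Z\u00fcrich";
        // Simulates the bytes stored in a UTF-8 encoded XML file.
        byte[] fileBytes = original.getBytes(StandardCharsets.UTF_8);

        System.out.println("Platform default charset: " + Charset.defaultCharset());
        // Wrong charset: the two UTF-8 bytes of the u-umlaut decode as two separate characters.
        System.out.println("Decoded as ISO-8859-1: " + decode(fileBytes, StandardCharsets.ISO_8859_1));
        // Matching charset: the text round-trips correctly.
        System.out.println("Decoded as UTF-8:      " + decode(fileBytes, StandardCharsets.UTF_8));
    }
}
```

Setting encoding="UTF-8" on the dataSource presumably corresponds to the
second case, which is why it fixed things for me regardless of the machine's
default.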

My question is: why isn't the default UTF-8? Or, if there is a good reason
for it, could the DIH wiki make it clearer that the encoding attribute
affects how international characters are indexed? If I can get edit access
to the wiki page, I can add a section to that effect, perhaps under a
troubleshooting heading.

Thanks!
Amit
