Hello,
First of all, thanks to Jacob Singh for his reply to my mail last week; I
completely forgot to respond. Multicore is perfect for my needs. I've got
Solr running now with my new schema partially implemented, and I've
started to test importing data with DIH. I've run into a number of
issues, though, and I hope someone here can help:
1. Posting UTF-8 data through the example post-script works and I get
the proper results back when I query using the admin page.
However, when I import data through the DataImportHandler from a MySQL
database (the database contains correct data: it's a copy of a
production db, and selecting through the client gives the correct
characters), "ó" comes out garbled in the index. I've tried several
combinations of arguments in my datasource URL
(useUnicode=true&characterEncoding=UTF-8), but they do not seem to
help. How do I get this to work correctly?
2. On the wiki page for DataImportHandler, the deletedPkQuery attribute
has no real description. Am I correct in assuming it should contain a
query which returns the ids of items which should be removed from
the index?
3. Another question concerning the DataImportHandler wiki page: I'm
not sure about the exact way the field tag works. From the first
data-config.xml example for the full-import I can infer that the
"column" attribute represents the column from the SQL query and
the "name" attribute represents the name of the field in the
schema the column should map to. However, further on in the
RegexTransformer section there are "column" attributes which do not
correspond to the SQL query result set, and it's the "sourceColName"
attribute which actually represents that data. I understand that the
value comes from the RegexTransformer, but why then is the "column"
attribute used instead of the "name" attribute? This has confused
me somewhat; any clarification would be greatly appreciated.
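To make the three questions concrete, here is a rough sketch of the kind of data-config.xml I mean. The database, table, column and field names are all made up for illustration, and the comments describe my own reading of the wiki, which may be wrong. Note that the "&" in the JDBC URL has to be written as "&amp;" inside the XML attribute.

```xml
<dataConfig>
  <!-- Question 1: encoding arguments on the JDBC URL (hypothetical host/db). -->
  <dataSource driver="com.mysql.jdbc.Driver"
              url="jdbc:mysql://localhost/mydb?useUnicode=true&amp;characterEncoding=UTF-8"
              user="solr" password="secret"/>
  <document>
    <!-- Question 2: my assumption is that deletedPkQuery returns the ids
         of the rows that should be removed from the index. -->
    <entity name="item" pk="id"
            transformer="RegexTransformer"
            query="SELECT id, title, full_name FROM item"
            deltaQuery="SELECT id FROM item WHERE last_modified &gt; '${dataimporter.last_index_time}'"
            deletedPkQuery="SELECT id FROM item WHERE deleted = 1">
      <!-- Question 3: "column" is a column of the result set,
           "name" is the schema field it maps to. -->
      <field column="id" name="id"/>
      <field column="title" name="title_t"/>
      <!-- Here "column" is not in the result set at all; the data comes
           from "sourceColName" via the regex. -->
      <field column="firstName" sourceColName="full_name" regex="(\w+) .*"/>
    </entity>
  </document>
</dataConfig>
```

In the last field tag, "firstName" does not exist in the query result, which is exactly the case I find confusing.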
Regards,
gwk