Hi, I'm using Solr's DataImportHandler with MySQL 5.5 to index the IMDb database. However, the data import takes a few minutes to process a single document, and there are over 3 million movies. At this rate it will take forever, yet I can select the rows in MySQL in no time. What am I doing wrong? My data-config.xml looks like this:

<entity name="movie" transformer="RegexTransformer" query="SELECT DISTINCT * FROM imdb.movie">
  <field name="id" column="id" />
  <entity name="movie_actor" transformer="RegexTransformer" child="true"
          query="SELECT DISTINCT * FROM imdb.movie_actor"
          cacheKey="movie_actor.parent" cacheLookup="movie.id"
          processor="SqlEntityProcessor" cacheImpl="SortedMapBackedCache">
    <field name="name" column="name" />
  </entity>
  <entity name="movie_actress" transformer="RegexTransformer" child="true"
          query="SELECT DISTINCT * FROM imdb.movie_actress"
          cacheKey="movie_actress.parent" cacheLookup="movie.id"
          processor="SqlEntityProcessor" cacheImpl="SortedMapBackedCache">
    <field name="name" column="name" />
  </entity>
</entity>

I created the following views in the database:

movie:

  SELECT `title`.`id` AS `id`
  FROM `title`

movie_actor:

  SELECT CONCAT('movie.', `title`.`id`, '.actor.', `cast_info`.`person_id`) AS `id`,
         `title`.`id` AS `parent`,
         `name`.`name` AS `name`
  FROM ((`title`
    JOIN `cast_info` ON ((`cast_info`.`movie_id` = `title`.`id`)))
    JOIN `name` ON ((`cast_info`.`person_id` = `name`.`id`)))
  WHERE (`cast_info`.`role_id` = 1)

movie_actress:

  SELECT CONCAT('movie.', `title`.`id`, '.actress.', `cast_info`.`person_id`) AS `id`,
         `title`.`id` AS `parent`,
         `name`.`name` AS `name`
  FROM ((`title`
    JOIN `cast_info` ON ((`cast_info`.`movie_id` = `title`.`id`)))
    JOIN `name` ON ((`cast_info`.`person_id` = `name`.`id`)))
  WHERE (`cast_info`.`role_id` = 2)

Thanks,
Yangrui