Hi, I am facing a performance issue in Solr when indexing a large amount of data. The stats from the import are below:

  <str name="Time Elapsed">8:57:17.334</str>
  <str name="Total Requests made to DataSource">42778</str>
  <str name="Total Rows Fetched">273725</str>
  <str name="Total Documents Processed">42775</str>
  <str name="Total Documents Skipped">0</str>

Indexing 273725 rows is taking almost 9 hours. Here is my data config file:

  <dataConfig>
    <dataSource driver="com.metamatrix.jdbc.MMDriver" url="jdbc:" />
    <document name="doc">
      <entity name="object"
              query="select objectuid as uid, objectid, objecttype, objectname, repositoryname, a.lastupdateddate from MetaModel.POC.Object a, MetaModel.POC.Repository b where a.repositoryid = b.repositoryid"
              transformer="RegexTransformer,DateFormatTransformer,TemplateTransformer">
        <field column="objectname" name="name"/>
        <field column="uid" name="uid"/>
        <field column="objectid" name="id"/>
        <field column="objecttype" name="type"/>
        <field column="repositoryname" name="repository"/>
        <entity name="property"
                query="select ObjectUID,ObjectPropertyName as name, ObjectPropertyValue as value from MetaModel.POC.ObjectProperty"
                processor="CachedSqlEntityProcessor"
                cacheKey="ObjectUID"
                cacheLookup="object.uid"
                transformer="RegexTransformer,DateFormatTransformer,TemplateTransformer">
          <field column="value" name="${property.name}"/>
        </entity>
        <entity name="relationship_entity"
                query="select OBJECT1uid,Object2name as rname,Object2type as rtype,relationshiptype as rship, b.RepositoryName as rrepname from MetaModel.POC.BinaryRelationShip a, MetaModel.POC.Repository b where a.Object2RepositoryId=b.repositoryId"
                processor="CachedSqlEntityProcessor"
                cacheKey="OBJECT1uid"
                cacheLookup="object.uid"
                transformer="RegexTransformer,DateFormatTransformer,TemplateTransformer">
          <field column="rship" name="relationship"/>
          <field column="rname" name="related_name" />
          <field column="rtype" name="related_type"/>
          <field column="rrepname" name="repositoryname"/>
        </entity>
      </entity>
    </document>
  </dataConfig>

Running the same SQL statements directly against the database takes:

  select objectuid as uid, objectid, objecttype, objectname, repositoryname, a.lastupdateddate from MetaModel.POC.Object a, MetaModel.POC.Repository b where a.repositoryid = b.repositoryid
  --> 3 minutes

  select ObjectUID,ObjectPropertyName as name, ObjectPropertyValue as value from MetaModel.POC.ObjectProperty
  --> 5 minutes

  select OBJECT1uid,Object2name as rname,Object2type as rtype,relationshiptype as rship, b.RepositoryName as rrepname from MetaModel.POC.BinaryRelationShip a, MetaModel.POC.Repository b where a.Object2RepositoryId=b.repositoryId
  --> 3 seconds

Since I am using CachedSqlEntityProcessor, I assume that Solr first issues the three select statements above and then matches the child rows to each object from the cache, based on cacheKey. So the total indexing time should ideally be roughly the sum of the three query times (about 8 minutes) plus some time for the cacheKey lookups. But in my case indexing takes hours.

Can someone please let me know if I am doing anything wrong that might cause this?

Thanks,
Barani
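
P.S. To make my assumption concrete, this is roughly the behaviour I expect from CachedSqlEntityProcessor. It is only a Python sketch of my mental model, not Solr's actual code; the function names and the sample rows are made up for illustration.

```python
from collections import defaultdict


def build_cache(child_rows, cache_key):
    """Group child rows by their cache_key value in a single pass.

    Assumption: the child query is executed once and all of its rows
    are held in memory, keyed by the cacheKey column.
    """
    cache = defaultdict(list)
    for row in child_rows:
        cache[row[cache_key]].append(row)
    return cache


def join_children(parent_rows, child_rows, cache_key, lookup_field):
    """Yield (parent, matching_children) pairs using in-memory lookups only."""
    cache = build_cache(child_rows, cache_key)
    for parent in parent_rows:
        # One dictionary lookup per parent row, no extra database query.
        yield parent, cache.get(parent[lookup_field], [])


# Made-up sample rows standing in for the "object" and "property" entities.
objects = [
    {"uid": 1, "objectname": "A"},
    {"uid": 2, "objectname": "B"},
]
properties = [
    {"ObjectUID": 1, "name": "color", "value": "red"},
    {"ObjectUID": 1, "name": "size", "value": "L"},
]

joined = list(join_children(objects, properties, "ObjectUID", "uid"))
for parent, children in joined:
    print(parent["objectname"], [c["name"] for c in children])
```

If the processor really works like this, the cost per object is one hash lookup per child entity, which is why I expected the total time to be dominated by the three queries themselves.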