Yes, that's right.
On Thu, Jun 20, 2013 at 8:16 PM, Constantin Wolber <constantin.wol...@medicalcolumbus.de> wrote:
> Hi,
>
> I may have been a little too fast with my response.
>
> After reading a bit more, I assume you meant running the full-import with the entity param set to the full-import root entity, and running the delta-import with the entity param set to the delta root entity. Is that correct?
>
> Regards
>
> Constantin
>
>
> -----Original Message-----
> From: Constantin Wolber [mailto:constantin.wol...@medicalcolumbus.de]
> Sent: Thursday, June 20, 2013 16:42
> To: solr-user@lucene.apache.org
> Subject: RE: DataImportHandler: Problems with delta-import and CachedSqlEntityProcessor
>
> Hi,
>
> and thanks for the answer. But I'm a little confused about what you are suggesting.
> I haven't really used the rootEntity attribute before, but from what I read in the documentation, as far as I can tell that would result in two documents, one for each root entity (maybe with the same id, which would probably result in only one document being stored).
>
> It would be great if you could sketch the setup with the entities I provided, because currently I have no idea how to do it.
>
> Regards
>
> Constantin
>
>
> -----Original Message-----
> From: Noble Paul നോബിള് नोब्ळ् [mailto:noble.p...@gmail.com]
> Sent: Thursday, June 20, 2013 15:42
> To: solr-user@lucene.apache.org
> Subject: Re: DataImportHandler: Problems with delta-import and CachedSqlEntityProcessor
>
> It is possible to create two separate root entities, one for full-import and another for delta-import. That way you can skip the cache for the delta-import.
>
>
>
> On Thu, Jun 20, 2013 at 1:50 PM, Constantin Wolber <constantin.wol...@medicalcolumbus.de> wrote:
>
> > Hi,
> >
> > I searched for a solution for quite some time but did not manage to find any real hints on how to fix it.
> >
> >
> > I'm using Solr 4.3.0 1477023 - simonw - 2013-04-29 15:10:12 running in a Tomcat 6 container.
> >
> > My data import setup is basically the following:
> >
> > Data-config.xml:
> >
> > <entity
> >     name="article"
> >     dataSource="ds1"
> >     query="SELECT * FROM article"
> >     deltaQuery="SELECT myownid FROM articleHistory WHERE modified_date > '${dih.last_index_time}'"
> >     deltaImportQuery="SELECT * FROM article WHERE myownid=${dih.delta.myownid}"
> >     pk="myownid">
> >   <field column="myownid" name="id"/>
> >
> >   <entity
> >       name="supplier"
> >       dataSource="ds2"
> >       query="SELECT * FROM supplier WHERE status=1"
> >       processor="CachedSqlEntityProcessor"
> >       cacheKey="SUPPLIER_ID"
> >       cacheLookup="article.ARTICLE_SUPPLIER_ID">
> >   </entity>
> >
> >   <entity
> >       name="attributes"
> >       dataSource="ds1"
> >       query="SELECT ARTICLE_ID,'Key:'+ATTRIBUTE_KEY+' Value:'+ATTRIBUTE_VALUE FROM attributes"
> >       cacheKey="ARTICLE_ID"
> >       cacheLookup="article.myownid"
> >       processor="CachedSqlEntityProcessor">
> >   </entity>
> > </entity>
> >
> >
> > OK, now for the problem:
> >
> > At first I tried everything without the cache, but the full-import took a very long time because the attributes query is pretty slow compared to the rest. As a result I got a processing speed of around 150 documents/s. After switching everything to the CachedSqlEntityProcessor, the full-import processed about 4000 documents/s.
> >
> > So the full-import runs quite fine. Now I wanted to use the delta-import. I expected its ramp-up time to be about the same as for the full-import, since the whole supplier and attributes tables have to be loaded into the cache in the first step. But looking at the log file, the weird thing is that Solr seems to refresh the cache for every single document it processes. So currently my delta-import is a lot slower than the full-import.
> > I even tried adding the deltaImportQuery parameter to the entity, but it doesn't change the behavior at all (of course, I know it is not supposed to change anything in the setup I run).
> >
> > The following solutions would be possible in my opinion:
> >
> > 1. Is there any way to tell the config to ignore the cache when running a delta-import? That alone would help, because we are talking about a maximum of 500 documents changed in 15 minutes, compared to over 5 million documents in total.
> > 2. Get Solr to not refresh the cache for every document.
> >
> > Best Regards
> >
> > Constantin Wolber
> >
> >
> >
>
>
> --
> -----------------------------------------------------
> Noble Paul
>

--
-----------------------------------------------------
Noble Paul
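Constantin asked for a sketch of the two-root-entity setup using his entities. The fragment below is one possible layout, written under stated assumptions and not a configuration posted in the thread. The entity names (`articleFull`, `articleDelta`, `supplierCached`, and so on) and the `/dataimport` handler path in the comments are hypothetical, and the delta child queries assume `SUPPLIER_ID` and `ARTICLE_ID` are numeric (so their values are not quoted in SQL). The full-import root keeps CachedSqlEntityProcessor for its children; the delta root uses plain per-row SQL lookups, so a delta run only queries the rows of the few changed articles instead of reloading whole tables into the cache.

```xml
<!-- Hypothetical layout: dataSource definitions for ds1/ds2 omitted, as in the original post. -->
<document>

  <!-- Root entity for full-import only; child entities are cached.
       Assumed request: /dataimport?command=full-import&entity=articleFull -->
  <entity
      name="articleFull"
      dataSource="ds1"
      query="SELECT * FROM article"
      pk="myownid">
    <field column="myownid" name="id"/>

    <entity
        name="supplierCached"
        dataSource="ds2"
        query="SELECT * FROM supplier WHERE status=1"
        processor="CachedSqlEntityProcessor"
        cacheKey="SUPPLIER_ID"
        cacheLookup="articleFull.ARTICLE_SUPPLIER_ID"/>

    <entity
        name="attributesCached"
        dataSource="ds1"
        query="SELECT ARTICLE_ID,'Key:'+ATTRIBUTE_KEY+' Value:'+ATTRIBUTE_VALUE FROM attributes"
        processor="CachedSqlEntityProcessor"
        cacheKey="ARTICLE_ID"
        cacheLookup="articleFull.myownid"/>
  </entity>

  <!-- Root entity for delta-import only; children run one query per changed article, no cache.
       Assumed request: /dataimport?command=delta-import&entity=articleDelta -->
  <entity
      name="articleDelta"
      dataSource="ds1"
      query="SELECT * FROM article"
      deltaQuery="SELECT myownid FROM articleHistory WHERE modified_date > '${dih.last_index_time}'"
      deltaImportQuery="SELECT * FROM article WHERE myownid=${dih.delta.myownid}"
      pk="myownid">
    <field column="myownid" name="id"/>

    <!-- Assumes SUPPLIER_ID is numeric, so the value is not quoted. -->
    <entity
        name="supplierDelta"
        dataSource="ds2"
        query="SELECT * FROM supplier WHERE status=1 AND SUPPLIER_ID=${articleDelta.ARTICLE_SUPPLIER_ID}"/>

    <!-- Assumes ARTICLE_ID is numeric, like myownid in the deltaImportQuery. -->
    <entity
        name="attributesDelta"
        dataSource="ds1"
        query="SELECT ARTICLE_ID,'Key:'+ATTRIBUTE_KEY+' Value:'+ATTRIBUTE_VALUE FROM attributes WHERE ARTICLE_ID=${articleDelta.myownid}"/>
  </entity>

</document>
```

Because each import is started with the matching `entity` request parameter, only one root entity runs per import, so each article still produces a single document with the same `id` in both paths. Running a full-import without the `entity` parameter would process both root entities, which is why the parameter should always be passed in this setup.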