I just configured with the caching and it works mighty fast now. Instead of unbelievable amount queries it queris only 4 times. CPU usage has moved from the db to the solr computer but only for a very short time.
Problem : I dont see the multi value fields (Inner Entities) anymore This is my configuration <entity name="PackageVersion" pk="PackageVersionId" query="select PackageVersion.Id PackageVersionId, .... from ...."> <entity name="PackageTag" pk="ResourceId" processor="CachedSqlEntityProcessor" where="ResourceId = '${PackageVersion.PackageId}'" query="SELECT [Text] PackageTag from [dbo].[Tag]"> </entity> <entity name="PackageVersionTag" pk="ResourceId" processor="CachedSqlEntityProcessor" where="ResourceId = PackageVersion.PackageVersionId" query="SELECT [Text] PackageVersionTag from [dbo].[Tag]"> </entity> <entity name="LibraryItem" pk="ResourceId" processor="CachedSqlEntityProcessor" where="Asset.[PackageVersionId] = PackageVersion.PackageVersionId" query="select CatalogVendorPartNum SKU, LibraryItems.[Description] SKUDescription FROM ... INNER JOIN ... ON Asset.Id = LibraryVendors.DesignProjectId INNER JOIN ... ON LibraryVendors.LibraryVendorId = LibraryItems.LibraryVendorId WHERE Asset.[AssetTypeId]=1"> </entity> </entity> Now, when I query http://localhost:8983/solr/vaultCache/select?q=*&indent=true it returns only the main entity attriburtes. Where are my inner entities attributes now? Thanks a lot. On Thu, Jun 27, 2013 at 10:15 AM, Gora Mohanty <g...@mimirtech.com> wrote: > On 27 June 2013 12:32, Mysurf Mail <stammail...@gmail.com> wrote: > > > > I have a relational database model > > This is the basics of my data-config.xml > > > > <entity name="MyMainEntity" pk="pID" query="select ... from > [dbo].[TableA] > > inner join TableB on ..."> > > <entity name="Entity1" pk="Id1" query="SELECT [Text] Tag from [Table2] > > where ResourceId = '${MyMainEntity.pId}'"></entity> > > <entity name="Entity1" pk="Id2" query="SELECT [Text] Tag > > from [Table2] where ResourceId2 = '${MyMainEntity.pId}'"></entity> > > <entity name="LibraryItem" pk="ResourceId" > > query="select SKU > > FROM [TableB] > > INNER JOIN ... > > ON ... > > INNER JOIN ... > > ON ... > > WHERE ... AND ...'"> > > </entity> > > </entity> > > > > Now, this takes a lot of time. > > 10000 rows in the first query and then each other inner entities are > > fetched later (around 10 rows each). > > > > If I use a db profiler I see a the three inner entities query running > over > > and over (3 select sentences than again 3 select sentences over and over) > > This is really not efficient. > > And the import can run over 40 hrs () > > Now, > > What are my options to run it faster . > > 1. Obviously there is an option to flat the tables to one big table - but > > that will create a lot of other side effects. > > I would really like to avoid that extra effort and run solr on my > > production relational tables. > > So far it works great out of the box and I am searching here if there > > is a configuration tweak. > > 2. If I will flat the rows that - does the schema.xml need to be change > > too? or the same fields that are multivalued will keep being multivalued. > > You have not shared your actual queries, so it is difficult > to tell, but my guess would be that it is the JOINs that > are the bottle-neck rather than the SELECTs. You should > start by: > 1. Profile queries from the database back-end to see > which are taking the most time, and try to simplify > them. > 2. Make sure that relevant database columns are indexed. > This can make a huge difference, though going overboard > in indexing all columns might be counter-productive. > 3. Use Solr DIH's CachedSqlEntityProcessor: > http://wiki.apache.org/solr/DataImportHandler#CachedSqlEntityProcessor > 4. Measure the time that Solr indexing takes: From your > description, you seem to be guessing at it. > > In general, you should not flatten the records in the > database as that is supposed to be relational data. > > Regards, > Gora >