Dear Kellen, Brent & Keith,

Fixes are now available for 2 cache-related bugs that unfortunately made 
their way into the 3.6.0 release.  They were addressed in these 2 JIRA issues, 
both of which have been committed to the 3.6 branch (as of today):
- https://issues.apache.org/jira/browse/SOLR-3430
- https://issues.apache.org/jira/browse/SOLR-3360
These problems also affected Trunk/4.x, with both fixes being committed to 
Trunk under SOLR-3430.

Should Solr 3.6.1 be released, these fixes will become generally available at 
that time.  They also will be part of the 4.0 release, which the Development 
Community hopes will be later this year.

In the meantime, I am hoping each of you can test these fixes with your 
installation.  The best way to do this is to get a fresh SVN checkout of the 
3.6 branch 
(http://svn.apache.org/repos/asf/lucene/dev/branches/lucene_solr_3_6/), switch 
to the "solr" directory, then run "ant dist".  I believe you need Ant 1.8 to 
build.
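
As a rough sketch of those steps, assuming "svn" and Ant are already on your 
PATH (the local directory name and the function name below are placeholders I 
picked, not anything the build requires):

```shell
# Branch URL as given above; WORKDIR is a hypothetical local directory name.
BRANCH_URL="http://svn.apache.org/repos/asf/lucene/dev/branches/lucene_solr_3_6/"
WORKDIR="lucene_solr_3_6"

# Check out the branch, switch to the "solr" directory, and run "ant dist".
# The subshell keeps the caller's working directory unchanged.
build_solr_dist() {
  svn checkout "$BRANCH_URL" "$WORKDIR" &&
  ( cd "$WORKDIR/solr" && ant dist )
}
```

Running "build_solr_dist" should then do the checkout and the build in one go.  
I would expect the resulting jars, including the DIH jar, to end up under the 
"solr/dist" directory, but please verify that on your own checkout.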

If you are unable to build yourself, I put an *unofficial* snapshot of the DIH 
jar here:
 
http://people.apache.org/~jdyer/unofficial/apache-solr-dataimporthandler-3.6.1-SNAPSHOT-r1335176.jar

Please let me know if this solves your problems with DIH Caching, giving you 
the functionality you had with 3.5 and prior.  Your feedback is greatly 
appreciated.

James Dyer
E-Commerce Systems
Ingram Content Group
(615) 213-4311


-----Original Message-----
From: not interesting [mailto:dye.kel...@gmail.com] 
Sent: Monday, May 07, 2012 9:43 AM
To: solr-user@lucene.apache.org
Subject: Nested CachedSqlEntityProcessor running for each entity row with Solr 
3.6?

I just upgraded from Solr 3.4 to Solr 3.6; I'm using the same
data-import.xml for both versions. The import functioned properly with
3.4.

I'm using a nested entity to fetch authors associated with each
document, and I'm using CachedSqlEntityProcessor to avoid hitting the
DB an unreasonable number of times. However, when indexing, Solr
indexes very slowly and appears to be fetching all authors in the DB
for each document. The index should be ~500 megs; I aborted the
indexing when it reached ~6gigs. If I comment out the nested author
entity below, Solr will index normally.

Am I missing something obvious or is this a bug?

<document name="documents">
    <entity name="document" dataSource="production"
     transformer="HTMLStripTransformer,TemplateTransformer,RegexTransformer"
     query="select id, ..., from document">
        <field column="id" name="id"/>
        <field column="uid" name="uid" template="DOC${document.id}"/>
        <!-- more fields .. -->
        <entity name="author" dataSource="production"
         query="select
                cast(da.document_id as text) as document_id,
                a.id, a.name, a.signature from document_author da
                left outer join author a on a.id = da.author_id"
         cacheKey="document_id"
         cacheLookup="document.id"
         processor="CachedSqlEntityProcessor">
             <field name="author_id" column="id" />
             <field name="author" column="name" />
             <field name="author_signature" column="signature" />
        </entity>
    </entity>
</document>

Also posted at SO if you prefer to answer there:
http://stackoverflow.com/questions/10482484/nested-cachedsqlentityprocessor-running-for-each-entity-row-with-solr-3-6

Kellen
