I don't want to dissuade you from trying, but I believe FileListEntityProcessor
has something special coded into it to allow for its unique usage, so I'm not
sure your approach is doable. I would imagine that fixing FLEP to keep only a
row at a time or a page at a time in memory wouldn't be terribly hard, but I
haven't looked into that either.
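For comparison, the workaround mentioned further down this thread (LineEntityProcessor over an externally generated list of files) reads its input line by line, so it should not need to hold the full list of file names in memory the way FLEP does. A minimal sketch of such a configuration, assuming a plain-text file with one relative path per line; the list path `D:/jed/listeFichiers.txt`, the data source name `fichiers` and the entity name `listeFichiers` are hypothetical, and the field mappings are left out:

```xml
<dataConfig>
  <!-- Hypothetical data source name; FileDataSource hands each entity a Reader -->
  <dataSource name="fichiers" type="FileDataSource" encoding="UTF-8"/>
  <document>
    <!-- Outer entity: reads the (hypothetical) list file one line at a time.
         rootEntity="false" so that the inner entity produces the documents. -->
    <entity name="listeFichiers"
            dataSource="fichiers"
            processor="LineEntityProcessor"
            url="D:/jed/listeFichiers.txt"
            rootEntity="false">
      <!-- LineEntityProcessor exposes each line in the rawLine column;
           each line is assumed to be a path relative to D:/jed/noticesBib/ -->
      <entity name="processorDocument"
              dataSource="fichiers"
              processor="XPathEntityProcessor"
              url="D:/jed/noticesBib/${listeFichiers.rawLine}"
              forEach="/record">
        <!-- field mappings go here -->
      </entity>
    </entity>
  </document>
</dataConfig>
```

The same nesting (outer entity yields a path, inner XPathEntityProcessor reads the file) is what the SQL-based approach below tries to do, with the outer entity swapped for SqlEntityProcessor.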

James Dyer
Ingram Content Group
(615) 213-4311


-----Original Message-----
From: Alexandre Rafalovitch [mailto:arafa...@gmail.com] 
Sent: Thursday, May 30, 2013 6:08 AM
To: solr-user@lucene.apache.org
Subject: Re: [DIH] Using SqlEntity to get a list of files and read files in XpathEntityProcessor

Did you declare that field name in the outer entity, and not just with
"select ... as" in the query?
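A minimal sketch of what declaring the field in the outer entity could look like, assuming the rest of the original configuration stays the same (the query is abbreviated here; use the original one). Three other details in it are assumptions worth checking against the original config rather than confirmed fixes: in DIH configs the entity attribute is spelled `dataSource` (camelCase); the `${...}` placeholder refers to the entity name (`requeteurNomsFichiersNotices`), not the data source name `accesPCN`; and the inner XPathEntityProcessor reads the file through a separate FileDataSource (named `fichiers` here, a hypothetical name) instead of the JDBC data source.

```xml
<dataConfig>
  <dataSource name="accesPCN" type="JdbcDataSource"
              driver="oracle.jdbc.driver.OracleDriver"
              url="jdbc:oracle:thin:@mymachine:myport:mydb"
              user="myuser" password="mypasswd" readOnly="true"/>
  <!-- Hypothetical second data source used only to read the XML files -->
  <dataSource name="fichiers" type="FileDataSource" encoding="UTF-8"/>
  <document>
    <entity name="requeteurNomsFichiersNotices"
            dataSource="accesPCN"
            processor="SqlEntityProcessor"
            query="select DISTINCT ... as CHEMINRELATIF from bnf.noticebib where ...">
      <!-- Declare the column explicitly in the outer entity;
           Oracle returns the alias in upper case -->
      <field column="CHEMINRELATIF"/>
      <!-- Placeholder uses the ENTITY name, not the data source name -->
      <entity name="processorDocument"
              dataSource="fichiers"
              processor="XPathEntityProcessor"
              url="D:/jed/noticesBib/${requeteurNomsFichiersNotices.CHEMINRELATIF}"
              xsl="xslt/mnb/IXM_MNb.xsl"
              forEach="/record">
        <!-- field mappings go here -->
      </entity>
    </entity>
  </document>
</dataConfig>
```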

Regards,
      Alex
On 30 May 2013 04:31, <jerome.dup...@bnf.fr> wrote:

>
> Hello,
>
> I want to index a huge list of XML files.
> _ Using FileListEntityProcessor causes an OutOfMemoryError (too many
> files...).
> _ I can do it using a LineEntityProcessor reading a list of files
> generated externally, but I would prefer to generate the list in Solr.
> _ So, to avoid maintaining a list of files, I'm trying to generate the list
> with an SQL query and pass the results to XPathEntityProcessor,
> which will read each file.
>
> The query (select DISTINCT...) generates this result:
> CHEMINRELATIF
> 3/0/000/30000001
>
> But the problem is that, with the following configuration, no request is
> made to the database, according to the status message returned by DIH:
>
>  "statusMessages":{
>     "Total Requests made to DataSource":"0",
>     "Total Rows Fetched":"0",
>     "Total Documents Processed":"0",
>     "Total Documents Skipped":"0",
>     "":"Indexing completed. Added/Updated: 0 documents. Deleted 0 documents.",
>     "Committed":"2013-05-30 10:23:30",
>     "Optimized":"2013-05-30 10:23:30",
>
> And the log:
> INFO 2013-05-30 10:23:29,924 http-8080-1 org.apache.solr.handler.dataimport.DataImporter  (121) - Loading DIH Configuration: mnb-data-config.xml
> INFO 2013-05-30 10:23:29,957 http-8080-1 org.apache.solr.handler.dataimport.DataImporter  (224) - Data Configuration loaded successfully
> INFO 2013-05-30 10:23:29,969 http-8080-1 org.apache.solr.handler.dataimport.DataImporter  (414) - Starting Full Import
> INFO 2013-05-30 10:23:30,009 http-8080-1 org.apache.solr.handler.dataimport.SimplePropertiesWriter  (219) - Read dataimportMNb.properties
> INFO 2013-05-30 10:23:30,045 http-8080-1 org.apache.solr.handler.dataimport.DocBuilder  (292) - Import completed successfully
> successfully
>
>
> Has anyone already done this kind of configuration, or is it just not
> possible?
>
> The config:
> <dataConfig>
>   <dataSource name="accesPCN" type="JdbcDataSource"
>               driver="oracle.jdbc.driver.OracleDriver"
>               url="jdbc:oracle:thin:@mymachine:myport:mydb" user="myuser"
>               password="mypasswd" readOnly="true"/>
>   <document>
>     <entity name="requeteurNomsFichiersNotices"
>             datasource="accesPCN"
>             processor="SqlEntityProcessor"
>             query="select DISTINCT...
>                 SUBSTR( to_char(noticebib.numnoticebib, '999999999'), 3, 1) || '/' ||
>                 SUBSTR( to_char(noticebib.numnoticebib, '999999999'), 4, 1) || '/' ||
>                 SUBSTR( to_char(noticebib.numnoticebib, '999999999'), 5, 3) || '/' ||
>                 to_char(noticebib.numnoticebib) || '.xml' as CHEMINRELATIF
>                 from bnf.noticebib
>                 where numnoticebib = '30000001'"
>             transformer="LogTransformer"
>             logTemplate="In entity requeteurNomsFichiersNotices" logLevel="debug">
>       <entity name="processorDocument"
>               processor="XPathEntityProcessor"
>               url="file:///D:/jed/noticesBib/${accesPCN.CHEMINRELATIF}"
>               xsl="xslt/mnb/IXM_MNb.xsl"
>               forEach="/record"
>               transformer="LogTransformer,fr.bnf.solr.BnfDateTransformer"
>               logTemplate="Notice fichier: ${accesPCN.CHEMINRELATIF}" logLevel="debug"
>               datasource="accesPCN">
> I'm trying to inde
> Kind regards,
> -----------------------------------------------
> Jérôme Dupont
> Bibliothèque Nationale de France
> Département des Systèmes d'Information
> Tour T3 - Quai François Mauriac
> 75706 Paris Cedex 13
> phone: 33 (0)1 53 79 45 40
> e-mail: jerome.dup...@bnf.fr
> -----------------------------------------------
