Hello,

I want to index a huge list of XML files.
- Using FileListEntityProcessor causes an OutOfMemoryException (too many files...).
- I can do it using a LineEntityProcessor reading a list of files generated externally, but I would prefer to generate the list in Solr.
- So, to avoid maintaining a list of files, I'm trying to generate the list with an SQL query and pass the results to XPathEntityProcessor, which will read each file.
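To make the intended file layout concrete, here is a minimal Python sketch of what the SUBSTR/to_char expression in the query of the configuration further down appears to compute for one record number. The function name `relative_path` is hypothetical. The sketch assumes that Oracle's `to_char(n, '999999999')` returns ten characters for a non-negative number (one blank for the sign, then nine right-aligned digit positions); it is an illustration, not part of the DIH setup.

```python
def relative_path(record_number: int) -> str:
    """Mimic the SQL path expression for one record number (assumed behaviour)."""
    if not 0 <= record_number <= 999_999_999:
        raise ValueError("expected a non-negative number of at most nine digits")

    # Assumption: to_char(n, '999999999') gives 10 characters,
    # one blank for the sign followed by nine right-aligned digit positions.
    padded = f"{record_number:>10d}"

    # Oracle SUBSTR is 1-based, Python slicing is 0-based:
    # SUBSTR(s, 3, 1) -> s[2:3], SUBSTR(s, 4, 1) -> s[3:4], SUBSTR(s, 5, 3) -> s[4:7]
    level1 = padded[2:3]
    level2 = padded[3:4]
    level3 = padded[4:7]

    # to_char(n) without a format mask is the plain decimal string.
    return f"{level1}/{level2}/{level3}/{record_number}.xml"


if __name__ == "__main__":
    print(relative_path(30000001))
```

Under those assumptions, record 30000001 maps to `3/0/000/30000001.xml`, which is the path the XPathEntityProcessor is expected to open below `D:/jed/noticesBib/`.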
The query select DISTINCT... generates this result:

CHEMINRELATIF
3/0/000/30000001

But the problem is that with the following configuration, no request to the database is made, according to the message returned by DIH:

"statusMessages":{
    "Total Requests made to DataSource":"0",
    "Total Rows Fetched":"0",
    "Total Documents Processed":"0",
    "Total Documents Skipped":"0",
    "":"Indexing completed. Added/Updated: 0 documents. Deleted 0 documents.",
    "Committed":"2013-05-30 10:23:30",
    "Optimized":"2013-05-30 10:23:30",

And the log:

INFO 2013-05-30 10:23:29,924 http-8080-1 org.apache.solr.handler.dataimport.DataImporter (121) - Loading DIH Configuration: mnb-data-config.xml
INFO 2013-05-30 10:23:29,957 http-8080-1 org.apache.solr.handler.dataimport.DataImporter (224) - Data Configuration loaded successfully
INFO 2013-05-30 10:23:29,969 http-8080-1 org.apache.solr.handler.dataimport.DataImporter (414) - Starting Full Import
INFO 2013-05-30 10:23:30,009 http-8080-1 org.apache.solr.handler.dataimport.SimplePropertiesWriter (219) - Read dataimportMNb.properties
INFO 2013-05-30 10:23:30,045 http-8080-1 org.apache.solr.handler.dataimport.DocBuilder (292) - Import completed successfully

Has anyone already done this kind of configuration, or is it just not possible?

The config:

<dataConfig>
  <dataSource name="accesPCN"
              type="JdbcDataSource"
              driver="oracle.jdbc.driver.OracleDriver"
              url="jdbc:oracle:thin:@mymachine:myport:mydb"
              user="myuser"
              password="mypasswd"
              readOnly="true"/>
  <document>
    <entity name="requeteurNomsFichiersNotices"
            datasource="accesPCN"
            processor="SqlEntityProcessor"
            query="select DISTINCT...
                   SUBSTR( to_char(noticebib.numnoticebib, '999999999'), 3, 1) || '/' ||
                   SUBSTR( to_char(noticebib.numnoticebib, '999999999'), 4, 1) || '/' ||
                   SUBSTR( to_char(noticebib.numnoticebib, '999999999'), 5, 3) || '/' ||
                   to_char(noticebib.numnoticebib) || '.xml' as CHEMINRELATIF
                   from bnf.noticebib
                   where numnoticebib = '30000001'"
            transformer="LogTransformer"
            logTemplate="In entity requeteurNomsFichiersNotices"
            logLevel="debug" >
      <entity name="processorDocument"
              processor="XPathEntityProcessor"
              url="file:///D:/jed/noticesBib/${accesPCN.CHEMINRELATIF}"
              xsl="xslt/mnb/IXM_MNb.xsl"
              forEach="/record"
              transformer="LogTransformer,fr.bnf.solr.BnfDateTransformer"
              logTemplate="Notice fichier: ${accesPCN.CHEMINRELATIF}"
              logLevel="debug"
              datasource="accesPCN" >

I'm trying to inde

Regards,
-----------------------------------------------
Jérôme Dupont
Bibliothèque Nationale de France
Département des Systèmes d'Information
Tour T3 - Quai François Mauriac
75706 Paris Cedex 13
phone: 33 (0)1 53 79 45 40
e-mail: jerome.dup...@bnf.fr
-----------------------------------------------