Hello,

I want to index a huge list of XML files.
_ Using FileListEntityProcessor causes an OutOfMemoryException (too many
files...)
_ I can do it with a LineEntityProcessor reading a list of files generated
externally, but I would prefer to generate the list in Solr
_ So, to avoid maintaining a list of files, I'm trying to generate the list
with an SQL query and pass its results to an XPathEntityProcessor,
which will read each file

The query select DISTINCT... generates this result:
CHEMINRELATIF
3/0/000/30000001
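
To make the path logic easier to check outside Oracle, here is a minimal Python sketch of what the SUBSTR expressions in the query (see the config further down) are meant to compute. It assumes that to_char(n, '999999999') returns a 10-character, right-aligned string (one sign position plus nine digit positions), and it appends '.xml' as the query does. The function name relative_path is hypothetical and only used for illustration.

```python
def relative_path(notice_id: int) -> str:
    """Mimic the SUBSTR/to_char expressions of the SQL query (assumed semantics)."""
    # Assumption: to_char(n, '999999999') gives a 10-character string,
    # right-aligned and padded with blanks (1 sign slot + 9 digit slots).
    padded = f"{notice_id:>10d}"
    first = padded[2]      # SUBSTR(..., 3, 1), SQL positions are 1-based
    second = padded[3]     # SUBSTR(..., 4, 1)
    third = padded[4:7]    # SUBSTR(..., 5, 3)
    return "/".join([first, second, third, f"{notice_id}.xml"])


if __name__ == "__main__":
    print(relative_path(30000001))  # 3/0/000/30000001.xml
```

For the notice number used in the where clause, this gives the same directory prefix (3/0/000) as the query result above.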

The problem is that, with the following configuration, no request to the DB
is made, according to the status message returned by DIH.

 "statusMessages":{
    "Total Requests made to DataSource":"0",
    "Total Rows Fetched":"0",
    "Total Documents Processed":"0",
    "Total Documents Skipped":"0",
    "":"Indexing completed. Added/Updated: 0 documents. Deleted 0 documents.",
    "Committed":"2013-05-30 10:23:30",
    "Optimized":"2013-05-30 10:23:30",

And the log:
INFO 2013-05-30 10:23:29,924 http-8080-1 org.apache.solr.handler.dataimport.DataImporter  (121) - Loading DIH Configuration: mnb-data-config.xml
INFO 2013-05-30 10:23:29,957 http-8080-1 org.apache.solr.handler.dataimport.DataImporter  (224) - Data Configuration loaded successfully
INFO 2013-05-30 10:23:29,969 http-8080-1 org.apache.solr.handler.dataimport.DataImporter  (414) - Starting Full Import
INFO 2013-05-30 10:23:30,009 http-8080-1 org.apache.solr.handler.dataimport.SimplePropertiesWriter  (219) - Read dataimportMNb.properties
INFO 2013-05-30 10:23:30,045 http-8080-1 org.apache.solr.handler.dataimport.DocBuilder  (292) - Import completed successfully


Has anyone already done this kind of configuration, or is it just not
possible?

The config:
<dataConfig>
    <dataSource name="accesPCN" type="JdbcDataSource"
                driver="oracle.jdbc.driver.OracleDriver"
                url="jdbc:oracle:thin:@mymachine:myport:mydb"
                user="myuser" password="mypasswd" readOnly="true"/>
    <document>
        <entity name="requeteurNomsFichiersNotices"
                datasource="accesPCN"
                processor="SqlEntityProcessor"
                query="select DISTINCT...
                    SUBSTR( to_char(noticebib.numnoticebib, '999999999'), 3, 1) || '/' ||
                    SUBSTR( to_char(noticebib.numnoticebib, '999999999'), 4, 1) || '/' ||
                    SUBSTR( to_char(noticebib.numnoticebib, '999999999'), 5, 3) || '/' ||
                    to_char(noticebib.numnoticebib) || '.xml' as CHEMINRELATIF
                    from bnf.noticebib
                    where numnoticebib = '30000001'"
                transformer="LogTransformer"
                logTemplate="In entity requeteurNomsFichiersNotices" logLevel="debug"
                >
            <entity name="processorDocument"
                    processor="XPathEntityProcessor"
                    url="file:///D:/jed/noticesBib/${accesPCN.CHEMINRELATIF}"
                    xsl="xslt/mnb/IXM_MNb.xsl"
                    forEach="/record"
                    transformer="LogTransformer,fr.bnf.solr.BnfDateTransformer"
                    logTemplate="Notice fichier: ${accesPCN.CHEMINRELATIF}" logLevel="debug"
                    datasource="accesPCN"
                    >
Regards,
-----------------------------------------------
Jérôme Dupont
Bibliothèque Nationale de France
Département des Systèmes d'Information
Tour T3 - Quai François Mauriac
75706 Paris Cedex 13
phone: 33 (0)1 53 79 45 40
e-mail: jerome.dup...@bnf.fr
-----------------------------------------------

