DIH: documents not indexed because of data loss in the XSL transformation.
Hello,

I'm indexing XML files with XPathEntityProcessor, and a few hundred documents out of 12 million are not processed. When I try to index just one of the failing ("KO") documents on its own, it is not indexed either, so it is not a matter of the sheer number of documents.

We tried to run the XSLT transformation externally, capture the transformed XML and index it in Solr: it worked, so the document itself seems OK. I looked at the document; it was big, so I commented out a part of it, and then it was indexed in Solr with the XSL transform.

So I downloaded the DIH code and debugged the execution of these lines, which launch the XSLT transformation, to see what was happening exactly:

  SimpleCharArrayReader caw = new SimpleCharArrayReader();
  xslTransformer.transform(new StreamSource(data), new StreamResult(caw));
  data = caw.getReader();

It appeared that the caw was missing data, so the xslTransformer did not work correctly. Digging further into the TransformerImpl code, I can see the content of my XML file in some buffer, but somewhere something goes wrong that I don't understand (it is getting very tricky for me). xslTransformer is an instance of com.sun.org.apache.xalan.internal.xsltc.trax.TransformerImpl.

Is there a way to change the XSLT transformer class, or is there a known size limitation in this transformer which can be increased?

I have tried with Solr 4.2 and then with Solr 4.6.

Thanks in advance,
Regards,

Jérôme Dupont
Bibliothèque Nationale de France
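PS: To clarify what I mean by "changing the XSLT transformer class": since the transformer comes from the standard JAXP factory lookup, I imagine (untested on my side, and assuming XPathEntityProcessor really obtains it through TransformerFactory.newInstance()) that another implementation could be forced with a system property, for example:

  // Untested sketch: force a non-XSLTC JAXP transformer implementation.
  // The alternative processor (here Xalan's interpretive factory; Saxon's
  // net.sf.saxon.TransformerFactoryImpl would be another option) must be
  // present on the webapp classpath.
  System.setProperty("javax.xml.transform.TransformerFactory",
          "org.apache.xalan.processor.TransformerFactoryImpl");

or equivalently -Djavax.xml.transform.TransformerFactory=org.apache.xalan.processor.TransformerFactoryImpl on the JVM command line. Is that the right direction, or is there a DIH-specific setting for this?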
Using facet.method=enum and fc in the same query.
Hello,

I have a Solr index (12M docs, 45 GB) with facets, and I'm trying to improve facet query performance.

1/ I tried to use docValues on the facet fields; it didn't work well.
2/ I tried facet.threads=-1 in my query, and it worked perfectly (from more than 15 s down to 2 s for the longest queries).
3/ I'm now trying facet.method=enum. It is supposed to improve performance for facet fields with few distinct values (type of document, things like that).

My problem is that I don't know whether there is a way to specify the enum method for some facets (3 to 5,000 distinct values) and the fc method for some others (up to 12M distinct values) in the same query. Is it possible with something like MyFacet.facet.method=enum?

Thanks in advance for the answer.

Jérôme Dupont
Bibliothèque Nationale de France
RE: RE: Using facet.method=enum and fc in the same query.
First of all, thanks very much for your answers, and Alan's too.

>> I have a Solr index (12M docs, 45 GB) with facets, and I'm trying to improve facet query performance.
>> 1/ I tried to use docValues on the facet fields; it didn't work well.
> That was surprising, as the normal result of switching to DocValues is positive. Can you elaborate on what you did and how it failed?

When I said it failed, I just meant it was a little bit slower.

>> 2/ I tried facet.threads=-1 in my queries, and it worked perfectly (from more than 15 s down to 2 s for the longest queries).
> That tells us that your primary problem is not IO. If your usage is normally single-threaded that can work, but it also means that you have a lot of CPU cores standing idle most of the time. How many fields are you using for faceting and how many of them are large (more unique values than the 5000 you mention)?

The "slow" request corresponds to our website search query, for our book catalog: the facets are things like type of document, author, title, subjects, location of the book, dates... This request currently carries 35 facets. Regarding unique values, for the "slow" query:
1 facet goes up to 4M unique values (authors),
1 facet has 250,000 unique values,
1 has 5,
1 has 6,700,
4 have between 300 and 1,000,
5 have between 100 and 160,
16 have fewer than 65.

>> 3/ I'm trying to use facet.method=enum. It is supposed to improve performance for facet fields with few distinct values (type of document, things like that).
> Having a mix of facet methods seems like a fine idea, although my personal experience is that enum gets slower than fc quite a bit earlier than the 5000 unique values mark. As Alan states, the call is f.myfacetfield.facet.method=enum (remember the 'facet.' part; see https://wiki.apache.org/solr/SimpleFacetParameters#Parameters for details).
> Or you could try Sparse Faceting (disclaimer: I am the author), which seems to fit your setup very well: http://tokee.github.io/lucene-solr/

Right now we use Solr 4.6 and we will deliver our release soon, so I'm afraid I won't have time to try it this time, but I can try it for the next release (next month, I think).

Thanks very much again,

Jérôme Dupont
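PS: Just to make sure I understood the per-field syntax correctly, the idea would be to add something like this to the request (typedoc and auteur are just example field names from our schema):

  f.typedoc.facet.method=enum
  f.auteur.facet.method=fc

with facet.method left at its default (fc) for the fields that are not overridden.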
[SOLR 4.4 or 4.2] indexing with dih and solrcloud
Hello,

I'm trying to index documents with the Data Import Handler and SolrCloud at the same time (huge collection, I need to parallelize indexing).

First I had a DIH configuration which works with standalone Solr (indexing every week, for two months now). I transformed my configuration to "cloudify" it with one shard to begin with (adding the config file and launching with the zkRun option). I see the cloud panels in the Solr admin interface (tree view, 1 shard connected and active...), so it seems to work.

When I index using DIH, it looks like it is working: the input XML files are read, but no documents are stored in the index, exactly as if I had set the commit argument to false.

This is the response of the DIH request:

{ "responseHeader":{
    "status":0,
    "QTime":32871},
  "initArgs":[
    "defaults",[
      "config","mnb-data-config.xml"]],
  "command":"full-import",
  "mode":"debug",
  "documents":[],
  "verbose-output":[
    "entity:noticebib",[
      "entity:processorDocument",[],
      ...
      "entity:processorDocument",[],
      null,"--- row #1-",
      "CHEMINRELATIF","3/7/000/37000143.xml",
      null,"-",
      ...
  "status":"idle",
  "importResponse":"",
  "statusMessages":{
    "Total Requests made to DataSource":"16",
    "Total Rows Fetched":"15",
    "Total Documents Skipped":"0",
    "Full Dump Started":"2013-08-29 12:08:48",
    "Total Documents Processed":"0",
    "Time taken":"0:0:32.684"},

In the logs (see below), I see the PRE_UPDATE FINISH message, and after it some debug messages about "Could not retrieve login configuration" coming from ZooKeeper.

So my question is: what can be wrong in my config?
_ something about synchronization in ZooKeeper (the "could not retrieve" message)?
_ a step missing in the Data Import Handler?
I don't see how to diagnose that point.

DEBUG 2013-08-29 12:09:21,411 http-8080-1 org.apache.solr.handler.dataimport.URLDataSource (92) - Accessing URL: file:/X:/3/7/000/37000190.xml
DEBUG 2013-08-29 12:09:21,520 http-8080-1 org.apache.solr.handler.dataimport.LogTransformer (58) - Notice fichier: 3/7/000/37000190.xml
DEBUG 2013-08-29 12:09:21,520 http-8080-1 fr.bnf.solr.BnfDateTransformer (696) - NN=37000190
INFO 2013-08-29 12:09:21,520 http-8080-1 org.apache.solr.handler.dataimport.DocBuilder (267) - Time taken = 0:0:32.684
DEBUG 2013-08-29 12:09:21,536 http-8080-1 org.apache.solr.update.processor.LogUpdateProcessor (178) - PRE_UPDATE FINISH {{params (optimize=true&indent=true&start=10&commit=true&verbose=true&entity=noticebib&command=full-import&debug=true&wt=json&rows=5),defaults (config=mnb-data-config.xml)}}
INFO 2013-08-29 12:09:21,536 http-8080-1 org.apache.solr.update.processor.LogUpdateProcessor (198) - [noticesBIB] webapp=/solr-0.4.0-pfd path=/dataimportMNb params={optimize=true&indent=true&start=10&commit=true&verbose=true&entity=noticebib&command=full-import&debug=true&wt=json&rows=5} {} 0 32871
DEBUG 2013-08-29 12:09:21,583 http-8080-1 org.apache.solr.servlet.SolrDispatchFilter (388) - Closing out SolrRequest: {{params (optimize=true&indent=true&start=10&commit=true&verbose=true&entity=noticebib&command=full-import&debug=true&wt=json&rows=5),defaults (config=mnb-data-config.xml)}}
DEBUG 2013-08-29 12:09:21,833 main-SendThread(127.0.0.1:9080) org.apache.zookeeper.client.ZooKeeperSaslClient (519) - Could not retrieve login configuration: java.lang.SecurityException: Impossible de trouver une configuration de connexion
DEBUG 2013-08-29 12:09:21,833 main-SendThread(127.0.0.1:9080) org.apache.zookeeper.client.ZooKeeperSaslClient (519) - Could not retrieve login configuration: java.lang.SecurityException: Impossible de trouver une configuration de connexion
DEBUG 2013-08-29 12:09:21,833 main-SendThread(127.0.0.1:9080) org.apache.zookeeper.client.ZooKeeperSaslClient (519) - Could not retrieve login configuration: java.lang.SecurityException: Impossible de trouver une configuration de connexion
DEBUG 2013-08-29 12:09:21,833 SyncThread:0 org.apache.zookeeper.server.FinalRequestProcessor (88) - Processing request:: sessionid:0x140c98bbe43 type:getData cxid:0x39d zxid:0xfffe txntype:unknown reqpath:/overseer_elect/leader
DEBUG 2013-08-29 12:09:21,833 SyncThread:0 org.apache.zookeeper.server.FinalRequestProcessor (160) - sessionid:0x140c98bbe43 type:getData cxid:0x39d zxid:0xfffe txntype:unknown reqpath:/overseer_elect/leader
DEBUG 2013-08-29 12:09:21,833 main-SendThread(127.0.0.1:9080) org.apache.zookeeper.client.ZooKeeperSaslClient (519) - Could not retrieve login configuration: java.lang.SecurityException: Impossible de trouver une configuration de connexion
DEBUG 2013-08-29 12:09:21,833 main-SendThread(127.0.0.1:9080) org.apache.zookeeper.client.ZooKeeperSaslClient (519) - Could not retrieve login configuration: java.lang.SecurityException: Impossible de trouver une configuration de connexion

PS: At the beginning I was on Solr 4.2.1, and I also tried with 4.4.0, but I have the same problem.

Regards
Re: Re: [SOLR 4.4 or 4.2] indexing with dih and solrcloud
Hello again,

Finally, I found the problem. It seems that:
_ The indexing request was done with an HTTP GET and not with a POST, because I was launching it from a bookmark in my browser. Launching the indexing of my documents from the admin interface made it work.
_ Another problem was that some documents are not indexed (in particular the first ones of the list) for some reason due to our configuration, so when I was trying with only the first ten documents, it could not work.

Now I will try with 2 shards...

Jérôme
SolrCloud and DIH: indexing runs on only one shard.
Hello again,

I'm still trying to index with SolrCloud and DIH. I can index, but it seems that the indexing is done on only one shard (my goal was to parallelize it to go faster).

This is my setup: I have 2 Tomcat instances, one with the embedded ZooKeeper started in Solr 4.4.0 and one shard (port 8080), the other with the second shard (port 9180). In my admin interface I see 2 shards, and each one is a leader.

When I launch the DIH, documents are indexed, but only shard1 is working:

http://localhost:8080/solr-0.4.0-pfd/noticesBIBcollection/dataimportMNb?command=full-import&entity=noticebib&optimize=true&indent=true&clean=true&commit=true&verbose=false&debug=false&wt=json&rows=1000

In my first shard, I see messages coming from my indexing process:

DEBUG 2013-09-03 11:48:57,801 Thread-12 org.apache.solr.handler.dataimport.URLDataSource (92) - Accessing URL: file:/X:/3/7/002/37002118.xml
DEBUG 2013-09-03 11:48:57,832 Thread-12 org.apache.solr.handler.dataimport.URLDataSource (92) - Accessing URL: file:/X:/3/7/002/37002120.xml
DEBUG 2013-09-03 11:48:57,966 Thread-12 org.apache.solr.handler.dataimport.LogTransformer (58) - Notice fichier: 3/7/002/37002120.xml
DEBUG 2013-09-03 11:48:57,966 Thread-12 fr.bnf.solr.BnfDateTransformer (696) - NN=37002120

In the second instance, I only have this kind of log, as if it were just receiving the updates forwarded from the other node:

INFO 2013-09-03 11:48:57,323 http-9180-7 org.apache.solr.update.processor.LogUpdateProcessor (198) - [noticesBIB] webapp=/solr-0.4.0-pfd path=/update params={distrib.from=http://172.20.48.237:8080/solr-0.4.0-pfd/noticesBIB/&update.distrib=TOLEADER&wt=javabin&version=2} {add=[37001748 (1445149264874307584), 37001757 (1445149264879550464), 37001764 (1445149264883744768), 37001786 (1445149264887939072), 37001817 (1445149264891084800), 37001819 (1445149264896327680), 37001837 (1445149264900521984), 37001861 (1445149264903667712), 37001869 (1445149264907862016), 37001963 (1445149264912056320)]} 0 41

I supposed there was a confusion between the core names and the collection name, and I tried to change the name of the collection, but it solved nothing. When I go to the DIH page, on shard1 I see the indexing in progress, and on shard2 "no information available".

Is there something special to do to distribute the indexing process? Should I run ZooKeeper on both instances (even if it's not mandatory)? ...

Regards,
Jérôme
Re: SolrCloud and DIH: indexing runs on only one shard.
It works! I've done what you said:
_ In my request to get the list of documents, I added a WHERE clause filtering the SELECT that fetches the documents to index:

  where noticebib.numnoticebib LIKE '%${dataimporter.request.suffixeNotice}'

_ And I called the DIH on each shard with the parameter suffixeNotice=1 or suffixeNotice=2.

Each shard indexed its part at the same time (more or less 1000 docs each). When I execute a select on the collection, I get more or less 2000 documents.

Now my goal is to merge the indexes, but that's another story.

Another possibility would have been to play with the rows and start parameters, but that supposes two things:
_ knowing the number of documents;
_ adding an ORDER BY clause to make sure the subsets of documents are disjoint (and even in that case I'm not completely sure, because the source database can change).

Thanks very much!!

Jérôme
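PS: To be concrete, the full-import is launched once per shard with a different suffix, roughly like this (URLs adapted from my earlier message and simplified; the suffixeNotice parameter name is my own and is read in the query through ${dataimporter.request.suffixeNotice}):

  http://localhost:8080/solr-0.4.0-pfd/noticesBIB/dataimportMNb?command=full-import&entity=noticebib&clean=true&commit=true&suffixeNotice=1
  http://localhost:9180/solr-0.4.0-pfd/noticesBIB/dataimportMNb?command=full-import&entity=noticebib&clean=true&commit=true&suffixeNotice=2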
[DIH] Logging skipped documents
Hello,

I have a question. I am indexing documents and a small part of them are skipped (I am in onError="skip" mode). I'm trying to get a list of them, in order to analyse what is wrong with these documents.

Is there a way to get the list of skipped documents, with some more information? (My onError="skip" is on an XPathEntityProcessor, so the name of the file being processed would be enough.)

Regards,

Jérôme Dupont
Bibliothèque Nationale de France
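PS: The workaround I'm considering in the meantime (a rough sketch, and I'm not sure it really catches the skipped files, since the transformer may not run on rows that fail to parse): log the name of each file with a LogTransformer on the entity, and compare that list with the documents that actually reach the index. Entity and column names below are just examples:

  <entity name="processorDocument"
          processor="XPathEntityProcessor"
          onError="skip"
          transformer="LogTransformer"
          logTemplate="Processing file: ${parentEntity.CHEMINRELATIF}"
          logLevel="info"
          ...>

A list produced directly by DIH would of course be nicer.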
error while indexing huge filesystem with data import handler and FileListEntityProcessor
Hello,

We are trying to use the Data Import Handler, in particular on a collection which contains many files (one XML file per document). Our configuration works for a small number of files, but the data import fails with an OutOfMemoryError when running it on 10M files (spread over several directories...).

This is the content of our config.xml:

When we try it on a directory which contains 10 subdirectories, each subdirectory containing 1000 subdirectories, each one containing 1000 XML files (10M files in total), the indexing process doesn't work anymore: we get a java.lang.OutOfMemoryError (even with 512 MB and 1 GB of memory).

ERROR 2013-05-24 15:26:25,733 http-9145-2 org.apache.solr.handler.dataimport.DataImporter (96) - Full Import failed:java.lang.RuntimeException: java.lang.RuntimeException: java.lang.ClassCastException: java.lang.OutOfMemoryError cannot be cast to java.lang.Exception
	at org.apache.solr.handler.dataimport.DocBuilder.execute(DocBuilder.java:266)
	at org.apache.solr.handler.dataimport.DataImporter.doFullImport(DataImporter.java:422)
	at org.apache.solr.handler.dataimport.DataImporter.runCmd(DataImporter.java:487)
	at org.apache.solr.handler.dataimport.DataImportHandler.handleRequestBody(DataImportHandler.java:179)
	at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:135)
	at org.apache.solr.core.SolrCore.execute(SolrCore.java:1817)

Monitoring the JVM with VisualVM, I've seen that most of the time is spent in the method FileListEntityProcessor.accept (called by getFolderFiles), so I assume the error occurs while building the list of files to be indexed. Indeed, the list of files to index is built by getFolderFiles, which is itself called at the first call to nextRow(); the indexing itself starts only after that.

org/apache/solr/handler/dataimport/FileListEntityProcessor.java
  private void getFolderFiles(File dir, final List<Map<String, Object>> fileDetails) {

I found the fileDetails variable, which contains the list of my XML files. It contains 611,345 entries (for approximately 500 MB of memory), and I have more or less 10M XML files. That is why I think it is not finished yet: to get the entire list, I guess I would need something between 5 and 10 GB for my process.

So I have several questions:
_ Is it possible to attach several FileListEntityProcessor entities to a single XPathEntityProcessor in the data-config.xml? That way I could do it in ten passes, one per first-level directory. (A rough sketch of what I have in mind is in the PS below.)
_ Is there a roadmap to optimize this method, for example by not building the list of all files up front, but every 1000 documents, for instance?
_ Or to store the file list in a temporary file, in order to save some memory?

Regards,

Jérôme Dupont
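PS: A rough sketch of what I have in mind for the first question (untested; the paths, the forEach value and the field definitions are just examples): one FileListEntityProcessor entity per first-level directory, each wrapping the same inner XPathEntityProcessor definition.

  <dataConfig>
    <dataSource type="FileDataSource"/>
    <document>
      <entity name="files0" processor="FileListEntityProcessor" rootEntity="false" dataSource="null"
              baseDir="/data/notices/0" fileName=".*\.xml" recursive="true">
        <entity name="doc0" processor="XPathEntityProcessor"
                url="${files0.fileAbsolutePath}" forEach="/record">
          <!-- field definitions -->
        </entity>
      </entity>
      <entity name="files1" processor="FileListEntityProcessor" rootEntity="false" dataSource="null"
              baseDir="/data/notices/1" fileName=".*\.xml" recursive="true">
        <!-- same inner entity repeated -->
      </entity>
      <!-- ... and so on for the 10 first-level directories -->
    </document>
  </dataConfig>

The drawback is that the inner entity has to be duplicated (or the import has to be run once per entity with the entity parameter), which is what I would like to avoid.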
Re: Re: error while indexing huge filesystem with data import handler and FileListEntityProcessor
The configuration works with LineEntityProcessor, with few documents (I haven't tested with many documents yet). For information, this is the config:

... field definitions ...

file:///D:/jed/noticesBib/listeNotices.txt contains the following lines:

jed/noticesBib/3/4/307/34307035.xml
jed/noticesBib/3/4/307/34307082.xml
jed/noticesBib/3/4/307/34307110.xml
jed/noticesBib/3/4/307/34307197.xml
jed/noticesBib/3/4/307/34307350.xml
jed/noticesBib/3/4/307/34307399.xml
...

(The file could have contained the full path from the start, but I wanted to test the concatenation of the file name.)

That works fine, thanks for the help!!

Next step: the same thing without using a file. (I'll write about it in another post.)

Regards,

Jérôme
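PS: For the record, the relevant part of the entity definition looks roughly like this (simplified sketch; the data source names, the forEach value and the field definitions are placeholders):

  <dataSource name="file" type="URLDataSource"/>
  <entity name="lines" processor="LineEntityProcessor" rootEntity="false"
          dataSource="file"
          url="file:///D:/jed/noticesBib/listeNotices.txt">
    <entity name="notice" processor="XPathEntityProcessor"
            dataSource="file"
            url="file:///D:/${lines.rawLine}"
            forEach="/record">
      <!-- field definitions -->
    </entity>
  </entity>

LineEntityProcessor puts each line of the list file into the rawLine column, which is then concatenated with the base path in the inner entity's url attribute.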
[DIH] Using SqlEntity to get a list of files and read files in XpathEntityProcessor
Hello,

I want to index a huge list of XML files.
_ Using FileListEntityProcessor causes an OutOfMemoryException (too many files...).
_ I can do it using a LineEntityProcessor reading a list of files generated externally, but I would prefer to generate the list inside Solr.
_ So, to avoid maintaining a list of files, I'm trying to generate the list with an SQL query and pass the results to an XPathEntityProcessor, which will read the files.

The query (select DISTINCT ...) generates this result:

CHEMINRELATIF
3/0/000/3001

But the problem is that with the following configuration, no request to the database is made, according to the message returned by DIH:

"statusMessages":{
  "Total Requests made to DataSource":"0",
  "Total Rows Fetched":"0",
  "Total Documents Processed":"0",
  "Total Documents Skipped":"0",
  "":"Indexing completed. Added/Updated: 0 documents. Deleted 0 documents.",
  "Committed":"2013-05-30 10:23:30",
  "Optimized":"2013-05-30 10:23:30",

And the log:

INFO 2013-05-30 10:23:29,924 http-8080-1 org.apache.solr.handler.dataimport.DataImporter (121) - Loading DIH Configuration: mnb-data-config.xml
INFO 2013-05-30 10:23:29,957 http-8080-1 org.apache.solr.handler.dataimport.DataImporter (224) - Data Configuration loaded successfully
INFO 2013-05-30 10:23:29,969 http-8080-1 org.apache.solr.handler.dataimport.DataImporter (414) - Starting Full Import
INFO 2013-05-30 10:23:30,009 http-8080-1 org.apache.solr.handler.dataimport.SimplePropertiesWriter (219) - Read dataimportMNb.properties
INFO 2013-05-30 10:23:30,045 http-8080-1 org.apache.solr.handler.dataimport.DocBuilder (292) - Import completed successfully

Has someone already done this kind of configuration, or is it just not possible?

The config:

Regards,

Jérôme Dupont
Bibliothèque Nationale de France
Re: Re: [DIH] Using SqlEntity to get a list of files and read files in XpathEntityProcessor
Hi,

Thanks for your answer, it made me move forward. The name of the entity was not right, not consistent with the schema. Now the first entity works fine: the query is sent to the database and returns the right result.

The problem is that the second entity, the XPathEntityProcessor one, does not read the file specified in its url attribute, but tries to execute it as an SQL query on my database. I tried to put a fake query (select 1 from dual), but it changes nothing. It is as if the XPathEntityProcessor entity behaved like an SqlEntityProcessor, using the url attribute instead of the query attribute.

I forgot to mention which version I use: Solr 4.2.1 (it can be changed, this is just the beginning of the development).

See below the config and the returned message.

The verbose output:

"verbose-output":[
  "entity:noticebib",[
    "query","select DISTINCT SUBSTR( to_char(noticebib.numnoticebib, '9'), 3, 1) || '/' ||SUBSTR( to_char(noticebib.numnoticebib, '9'), 4, 1) || '/' ||SUBSTR( to_char(noticebib.numnoticebib, '9'), 5, 3) || '/' ||to_char(noticebib.numnoticebib) || '.xml' as CHEMINRELATIF from bnf.noticebib where numnoticebib = '3001'",
    "time-taken","0:0:0.141",
    null,"--- row #1-",
    "CHEMINRELATIF","3/0/000/3001.xml",
    null,"-",
    "entity:processorDocument",[
      "document#1",[
        "query","file:///D:/jed/noticesbib/3/0/000/3001.xml",
        "EXCEPTION","org.apache.solr.handler.dataimport.DataImportHandlerException: Unable to execute query: file:///D:/jed/noticesbib/3/0/000/3001.xml Processing Document # 1\r\n\tat org.apache.solr.handler.dataimport.DataImportHandlerException.wrapAndThrow(DataImportHandlerException.java:71)\r\n\tat ... oracle.jdbc.driver.OracleStatementWrapper.execute(OracleStatementWrapper.java:1203)\r\n\tat org.apache.solr.handler.dataimport.JdbcDataSource$ResultSetIterator.<init>(JdbcDataSource.java:246)\r\n\t... 32 more\r\n",
        "time-taken","0:0:0.124",

This is the configuration:

Regards,

Jérôme Dupont
Bibliothèque Nationale de France
RE: [DIH] Using SqlEntity to get a list of files and read files in XpathEntityProcessor
Thanks very much, it works, with dataSource (capital S)!!!

Finally, I didn't even have to define a "CHEMINRELATIF" field in the configuration; it works without it.

This is the definitive working configuration:

Thanks again!

Jérôme Dupont
Bibliothèque Nationale de France
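PS: For the archives, a simplified sketch of the shape of the working configuration (connection details, the full SELECT and the field definitions are omitted, and the forEach value is just a placeholder; paths are as in my earlier messages):

  <dataConfig>
    <dataSource name="db" type="JdbcDataSource"
                driver="oracle.jdbc.OracleDriver" url="jdbc:oracle:thin:@..." user="..." password="..."/>
    <dataSource name="xml" type="URLDataSource"/>
    <document>
      <entity name="noticebib" dataSource="db" rootEntity="false"
              query="select ... as CHEMINRELATIF from bnf.noticebib ...">
        <entity name="processorDocument" processor="XPathEntityProcessor"
                dataSource="xml"
                url="file:///D:/jed/noticesbib/${noticebib.CHEMINRELATIF}"
                forEach="/record">
          <!-- field definitions -->
        </entity>
      </entity>
    </document>
  </dataConfig>

The point that fixed it was the dataSource attribute (capital S) on the inner entity, pointing to the URLDataSource instead of the JDBC one.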
Issue with dataimport xml validation with dtd and jetty: conflict of use for user.dir variable
Hello,

I use Solr and the data import handler to index XML files with a DTD. The DTD is referenced like this:

Previously we were using Solr 4 in a Tomcat container. During the import process, Solr tries to validate the XML file against the DTD. To let it find the DTD we were defining -Duser.dir=pathToDtd, and Solr could find the DTD and validation worked.

Now we are migrating to Solr 7 (with the embedded Jetty). When we start Solr with -a "-Duser.dir=pathToDtd", Solr does not start and returns an error: "Cannot find jetty main class". So I removed the -a "-Duser.dir=pathToDtd" option, and Solr starts. BUT now Solr can no longer open the XML files, because it does not find the DTD during the validation stage.

Is there a way to:
- activate an XML catalog file to indicate where the DTD is? (It seems this would be the better way, but I did not find how to do it.)
- disable DTD validation?

Regards,

Jérôme Dupont
Bibliothèque Nationale de France