DIH documents not indexed because of data loss in the XSL transformation.

2013-12-10 Thread jerome . dupont

Hello

I'm indexing XML files with XPathEntityProcessor, and a few hundred documents
out of 12 million are not processed.

When I tried to index just one of the failing (KO) documents on its own, it was
not indexed either, so it is not a matter of the sheer number of documents.

We tried to do the XSLT transformation externally, capture the transformed XML
and index it in Solr: that worked, so the document itself seems OK.
The document was big, so I commented out a part of it, and then it was indexed
in Solr with the XSL transform applied.


So I downloaded the DIH code and debugged the execution of these lines, which
run the XSL transformation, to see what was happening exactly:

  // DIH code that applies the configured XSLT and re-reads the result
  SimpleCharArrayReader caw = new SimpleCharArrayReader();
  xslTransformer.transform(new StreamSource(data), new StreamResult(caw));
  data = caw.getReader();

It appeared that the caw was missing data, so the xslTransformer did not work
correctly.
Digging further into the TransformerImpl code, I can see the content of my XML
file in some buffer, but somewhere something goes wrong that I don't understand
(it's getting very tricky for me).

xslTransformer is from class
com.sun.org.apache.xalan.internal.xsltc.trax.TransformerImpl

Is there a way to change the XSLT transformer class, or is there a known size
limitation in this transformer that can be increased?
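As an illustration of what I mean by changing the class (a sketch, not tested):
assuming DIH obtains its transformer through the standard
javax.xml.transform.TransformerFactory.newInstance() lookup, which the XSLTC
class name above suggests, and assuming an alternative implementation such as
Saxon is on the classpath, one could select it with a JVM option:

  # standard JAXP factory override; the class name here is Saxon-HE's
  JAVA_OPTS="$JAVA_OPTS -Djavax.xml.transform.TransformerFactory=net.sf.saxon.TransformerFactoryImpl"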

I have tried with Solr 4.2 and then with Solr 4.6.

Thanks in advance

Regards
Jérôme Dupont
Bibliothèque Nationale de France
Département des Systèmes d'Information
Tour T3 - Quai François Mauriac
75706 Paris Cedex 13
téléphone: 33 (0)1 53 79 45 40
e-mail: jerome.dup...@bnf.fr
---


Using facet enum and fc in the same query.

2014-09-22 Thread jerome . dupont
Hello, 

I have a Solr index (12 M docs, 45 GB) with facets, and I'm trying to
improve facet query performance.
1/ I tried to use docValues on the facet fields; it didn't work well.
2/ I tried facet.threads=-1 in my query, and it worked perfectly (from more
than 15 s down to 2 s for the longest queries).

3/ I'm trying to use facet.method=enum. It's supposed to improve the
performance for facet fields with few distinct values (type of document,
things like that).

My problem is that I don't know whether there is a way to specify the enum
method for some facets (3 to 5,000 distinct values) and the fc method for some
others (up to 12 M distinct values) within the same query.

Is it possible with something like MyFacet..facet.method=enum?

Thanks in advance for the answer.

---
Jérôme Dupont
Bibliothèque Nationale de France
Département des Systèmes d'Information
Tour T3 - Quai François Mauriac
75706 Paris Cedex 13
téléphone: 33 (0)1 53 79 45 40
e-mail: jerome.dup...@bnf.fr
---



RE: RE: Using facet enum and fc in the same query.

2014-09-23 Thread jerome . dupont
First, thanks very much for your answers, and for Alan's as well.

>> I have a Solr index (12 M docs, 45 GB) with facets, and I'm trying to
>> improve facet query performance.
>> 1/ I tried to use docValues on the facet fields; it didn't work well.

> That was surprising, as the normal result of switching to DocValues is
> positive. Can you elaborate on what you did and how it failed?

When I said it failed, I just meant it was a little bit slower.


>> 2/ I tried facet.threads=-1 in my queries, and it worked perfectly (from
>> more than 15 s down to 2 s for the longest queries).

> That tells us that your primary problem is not IO. If your usage is
> normally single-threaded that can work, but it also means that you have a
> lot of CPU cores standing idle most of the time. How many fields are you
> using for faceting and how many of them are large (more unique values than
> the 5000 you mention)?

The "slow" request corresponds to our website search query. It for our 
book catalog: some facets are for type of documents, author, title 
subjets, location of the book, dates...

In this request we have now 35 facets.
About unique value, for the "slow" query:
1 facet goes up to 4M unique values (authors),
1 facet has 250.000 uniques values
1 have 5
1 have 6700
4 have between 300 and 1000
5 have between 100 and 160
16 have less than 65


>> 3/ I'm trying to use facet.method=enum. It's supposed to improve the
>> performance for facet fields with few distinct values (type of document,
>> things like that).

> Having a mix of facet methods seems like a fine idea, although my personal
> experience is that enum gets slower than fc quite a bit earlier than the
> 5000 unique values mark. As Alan states, the call is
> f.myfacetfield.facet.method=enum (remember the 'facet.' part; see
> https://wiki.apache.org/solr/SimpleFacetParameters#Parameters for details).

> Or you could try Sparse Faceting (Disclaimer: I am the author), which seems
> to fit your setup very well: http://tokee.github.io/lucene-solr/


Right now we use Solr 4.6 and we will deliver our release soon, so I'm afraid
I won't have time to try it this time, but I can try for the next release
(next month, I think).
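In the meantime, if I understand the per-field syntax correctly, mixing the
two methods in our query would look something like this (the field names here
are made up for the example, not our real ones):

  .../select?q=*:*&rows=0&facet=true
      &facet.field=typedoc&f.typedoc.facet.method=enum
      &facet.field=auteur&f.auteur.facet.method=fc
      &facet.threads=-1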

Thanks very much again
Jérôme Dupont
jerome.dupont_at#bnf.fr

[SOLR 4.4 or 4.2] indexing with dih and solrcloud

2013-08-29 Thread jerome . dupont

Hello,

I'm trying to index documents with the Data Import Handler and SolrCloud at
the same time (huge collection, I need to parallelize the indexing).

First I had a DIH configuration which works with standalone Solr
(it has been indexing every week for two months).

I've transformed my configuration to "cloudify" it, with one shard at the
beginning (adding the config file + launching with the zkRun option).
I see my Solr admin interface with the cloud panels (tree view, 1 shard
connected and active...), so it seems to work.

When I index using DIH, it looks like it is working: the input XML files are
read, but no documents are stored in the index, exactly as if I had set the
commit argument to false.

This is the response to the DIH request:
{
  "responseHeader":{
"status":0,
"QTime":32871},
  "initArgs":[
"defaults",[
  "config","mnb-data-config.xml"]],
  "command":"full-import",
  "mode":"debug",
  "documents":[],
  "verbose-output":[
"entity:noticebib",[
  "entity:processorDocument",[],
...
  "entity:processorDocument",[],
  null,"--- row #1-",
  "CHEMINRELATIF","3/7/000/37000143.xml",
  null,"-",
...
"status":"idle",
  "importResponse":"",
  "statusMessages":{
"Total Requests made to DataSource":"16",
"Total Rows Fetched":"15",
"Total Documents Skipped":"0",
"Full Dump Started":"2013-08-29 12:08:48",
"Total Documents Processed":"0",
"Time taken":"0:0:32.684"},

In the logs (see below), I see the PRE_UPDATE FINISH message, and after it
some debug messages from ZooKeeper about "Could not retrieve login
configuration" ("Impossible de trouver une configuration de connexion", i.e.
unable to find a login configuration).

So my question: what can be wrong in my config?
_ something about synchronisation in ZooKeeper (the "could not retrieve"
message)?
_ a step missing in the Data Import Handler?
I don't see how to diagnose that point.

DEBUG 2013-08-29 12:09:21,411 http-8080-1
org.apache.solr.handler.dataimport.URLDataSource  (92) - Accessing URL:
file:/X:/3/7/000/37000190.xml
DEBUG 2013-08-29 12:09:21,520 http-8080-1
org.apache.solr.handler.dataimport.LogTransformer  (58) - Notice fichier:
3/7/000/37000190.xml
DEBUG 2013-08-29 12:09:21,520 http-8080-1 fr.bnf.solr.BnfDateTransformer
(696) - NN=37000190
INFO 2013-08-29 12:09:21,520 http-8080-1
org.apache.solr.handler.dataimport.DocBuilder  (267) - Time taken =
0:0:32.684
DEBUG 2013-08-29 12:09:21,536 http-8080-1
org.apache.solr.update.processor.LogUpdateProcessor  (178) - PRE_UPDATE
FINISH {{params
(optimize=true&indent=true&start=10&commit=true&verbose=true&entity=noticebib&command=full-import&debug=true&wt=json&rows=5),defaults
(config=mnb-data-config.xml)}}
INFO 2013-08-29 12:09:21,536 http-8080-1
org.apache.solr.update.processor.LogUpdateProcessor  (198) - [noticesBIB]
webapp=/solr-0.4.0-pfd path=/dataimportMNb params=
{optimize=true&indent=true&start=10&commit=true&verbose=true&entity=noticebib&command=full-import&debug=true&wt=json&rows=5}
 {} 0 32871
DEBUG 2013-08-29 12:09:21,583 http-8080-1
org.apache.solr.servlet.SolrDispatchFilter  (388) - Closing out
SolrRequest: {{params
(optimize=true&indent=true&start=10&commit=true&verbose=true&entity=noticebib&command=full-import&debug=true&wt=json&rows=5),defaults
(config=mnb-data-config.xml)}}
DEBUG 2013-08-29 12:09:21,833 main-SendThread(127.0.0.1:9080)
org.apache.zookeeper.client.ZooKeeperSaslClient  (519) - Could not retrieve
login configuration: java.lang.SecurityException: Impossible de trouver une
configuration de connexion
DEBUG 2013-08-29 12:09:21,833 main-SendThread(127.0.0.1:9080)
org.apache.zookeeper.client.ZooKeeperSaslClient  (519) - Could not retrieve
login configuration: java.lang.SecurityException: Impossible de trouver une
configuration de connexion
DEBUG 2013-08-29 12:09:21,833 main-SendThread(127.0.0.1:9080)
org.apache.zookeeper.client.ZooKeeperSaslClient  (519) - Could not retrieve
login configuration: java.lang.SecurityException: Impossible de trouver une
configuration de connexion
DEBUG 2013-08-29 12:09:21,833 SyncThread:0
org.apache.zookeeper.server.FinalRequestProcessor  (88) - Processing
request:: sessionid:0x140c98bbe43 type:getData cxid:0x39d
zxid:0xfffe txntype:unknown reqpath:/overseer_elect/leader
DEBUG 2013-08-29 12:09:21,833 SyncThread:0
org.apache.zookeeper.server.FinalRequestProcessor  (160) -
sessionid:0x140c98bbe43 type:getData cxid:0x39d zxid:0xfffe
txntype:unknown reqpath:/overseer_elect/leader
DEBUG 2013-08-29 12:09:21,833 main-SendThread(127.0.0.1:9080)
org.apache.zookeeper.client.ZooKeeperSaslClient  (519) - Could not retrieve
login configuration: java.lang.SecurityException: Impossible de trouver une
configuration de connexion
DEBUG 2013-08-29 12:09:21,833 main-SendThread(127.0.0.1:9080)
org.apache.zookeeper.client.ZooKeeperSaslClient  (519) - Could not retrieve
login configuration: java.lang.SecurityException: Impossible de trouver une
configuration de connexion


PS: At the beginning I was on Solr 4.2.1, and I also tried with 4.0.0, but I
have the same problem.


Re: Re: [SOLR 4.4 or 4.2] indexing with dih and solrcloud

2013-08-29 Thread jerome . dupont

Hello again

Finally, I found the problem. It seems that:
_ The indexing request was sent with an HTTP GET and not with a POST, because
I was launching it from a bookmark in my browser. Launching the indexing of my
documents from the admin interface made it work (a sketch of an equivalent
POST request is below).
_ Another problem was that some documents are not indexed (in particular the
first ones in the list) for some reason due to our configuration, so when I
was trying with only the first ten documents it could not work.
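For reference, an equivalent POST request would look something like this (a
sketch only: the host, core and handler names are taken from the logs earlier
in this thread, and the parameters are the usual ones and may need adjusting):

  curl -X POST "http://localhost:8080/solr-0.4.0-pfd/noticesBIB/dataimportMNb" \
       --data-urlencode "command=full-import" \
       --data-urlencode "entity=noticebib" \
       --data-urlencode "commit=true"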

Now I will try with 2 shards...

Jerome



solr cloud and DIH, indexation runs only on one shard.

2013-09-03 Thread jerome . dupont

Hello again,

I'm still trying to index with SolrCloud and DIH. I can index, but it seems
that indexing is done on only one shard (my goal was to parallelize it to go
faster).
This is my conf:
I have 2 Tomcat instances,
one with ZooKeeper embedded in Solr 4.4.0 and 1 shard (port 8080),
the other with the second shard (port 9180).
In my admin interface, I see 2 shards, and each one is a leader.


When I launch the DIH with the request below, documents are indexed, but only
shard1 is doing the work.
http://localhost:8080/solr-0.4.0-pfd/noticesBIBcollection/dataimportMNb?command=full-import&entity=noticebib&optimize=true&indent=true&clean=true&commit=true&verbose=false&debug=false&wt=json&rows=1000


In the first shard, I see messages coming from my indexing process:
DEBUG 2013-09-03 11:48:57,801 Thread-12
org.apache.solr.handler.dataimport.URLDataSource  (92) - Accessing URL:
file:/X:/3/7/002/37002118.xml
DEBUG 2013-09-03 11:48:57,832 Thread-12
org.apache.solr.handler.dataimport.URLDataSource  (92) - Accessing URL:
file:/X:/3/7/002/37002120.xml
DEBUG 2013-09-03 11:48:57,966 Thread-12
org.apache.solr.handler.dataimport.LogTransformer  (58) - Notice fichier:
3/7/002/37002120.xml
DEBUG 2013-09-03 11:48:57,966 Thread-12 fr.bnf.solr.BnfDateTransformer
(696) - NN=37002120

In the second instance, I just have this kind of log, as if it were only
receiving the updates forwarded from the first instance:
INFO 2013-09-03 11:48:57,323 http-9180-7
org.apache.solr.update.processor.LogUpdateProcessor  (198) - [noticesBIB]
webapp=/solr-0.4.0-pfd path=/update params=
{distrib.from=http://172.20.48.237:8080/solr-0.4.0-pfd/noticesBIB/&update.distrib=TOLEADER&wt=javabin&version=2}
 {add=[37001748 (1445149264874307584), 37001757 (1445149264879550464),
37001764 (1445149264883744768), 37001786 (1445149264887939072), 37001817
(1445149264891084800), 37001819 (1445149264896327680), 37001837
(1445149264900521984), 37001861 (1445149264903667712), 37001869
(1445149264907862016), 37001963 (1445149264912056320)]} 0 41

I supposed there was a confusion between core names and the collection name,
and I tried to change the name of the collection, but it solved nothing.
When I go to the DIH interfaces, on shard1 I see the indexing in progress, and
on shard2 "no information available".

Is there something special to do to distribute the indexing process?
Should I run ZooKeeper on both instances (even if it's not mandatory)?
...
Regards
Jerome




Re: solr cloud and DIH, indexation runs only on one shard.

2013-09-03 Thread jerome . dupont

It works!

I've done what you said (see the sketch below):
_ In my request that gets the list of documents, I added a where clause
filtering the select that fetches the documents to index:
where noticebib.numnoticebib LIKE '%${dataimporter.request.suffixeNotice}'
_ And I called my DIH on each shard with the parameter suffixeNotice=1 or
suffixeNotice=2.

Each shard indexed its part at the same time (more or less 1,000 documents
each).
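Concretely, the two calls look something like this (core and handler names as
earlier in this thread; the '...' stands for the other usual parameters):

  # instance on port 8080 indexes the documents matching suffixeNotice=1
  http://localhost:8080/solr-0.4.0-pfd/noticesBIB/dataimportMNb?command=full-import&suffixeNotice=1&...

  # instance on port 9180 indexes the documents matching suffixeNotice=2
  http://localhost:9180/solr-0.4.0-pfd/noticesBIB/dataimportMNb?command=full-import&suffixeNotice=2&...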

When I execute a select on the collection, I get more or less 2,000
documents.

Now my goal is to merge the indexes, but that's another story.

Another possibility would have been to play with the rows and start
parameters, but that supposes two things:
_ knowing the number of documents,
_ adding an order by clause to make sure the subsets of documents are disjoint
(and even in that case I'm not completely sure, because the source database
can change).

Thanks very much !!

Jerôme




[DIH] Logging skipped documents

2013-09-23 Thread jerome . dupont

Hello,

I have a question: I index documents and a small part of them are skipped (I
am in onError="skip" mode). I'm trying to get a list of them, in order to
analyse what's wrong with these documents.
Is there a way to get the list of skipped documents, plus some more
information? (My onError="skip" is on an XPathEntityProcessor; the name of the
file being processed would be enough.)
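For context, the entity in question looks roughly like this (a sketch with
illustrative names and paths, not our exact configuration): onError="skip"
silently drops the document, and a LogTransformer at least traces which file
is being read.

  <entity name="notice" processor="XPathEntityProcessor"
          onError="skip" forEach="/record"
          url="file:///X:/${filelist.CHEMINRELATIF}"
          transformer="LogTransformer"
          logTemplate="Notice fichier: ${filelist.CHEMINRELATIF}"
          logLevel="debug">
    <!-- field definitions with xpath attributes -->
  </entity>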


Regards,
---
Jérôme Dupont
Bibliothèque Nationale de France
Département des Systèmes d'Information
Tour T3 - Quai François Mauriac
75706 Paris Cedex 13
téléphone: 33 (0)1 53 79 45 40
e-mail: jerome.dup...@bnf.fr
---




error while indexing huge filesystem with data import handler and FileListEntityProcessor

2013-05-24 Thread jerome . dupont

Hello,


We are trying to use the Data Import Handler, in particular on a collection
which contains many files (one XML file per document).

Our configuration works for a small number of files, but the data import fails
with an OutOfMemoryError when running it on 10 M files (spread over several
directories...).

This is the content of our config.xml:
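(Roughly, it is a FileListEntityProcessor entity wrapping an
XPathEntityProcessor entity; a minimal sketch of that shape, with illustrative
paths, names and forEach root, not our exact values:)

  <dataConfig>
    <dataSource type="FileDataSource" encoding="UTF-8"/>
    <document>
      <!-- walks the directory tree and emits one row per XML file found -->
      <entity name="filelist" processor="FileListEntityProcessor"
              baseDir="D:/jed/noticesBib" fileName=".*\.xml"
              recursive="true" rootEntity="false" dataSource="null">
        <!-- reads each file and extracts the fields to index -->
        <entity name="notice" processor="XPathEntityProcessor"
                url="${filelist.fileAbsolutePath}" forEach="/record">
          <!-- field definitions with xpath attributes -->
        </entity>
      </entity>
    </document>
  </dataConfig>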






When we try it on a directory which contains 10 subdirectories, each
containing 1,000 subdirectories, each of which contains 1,000 XML files (so
10 M files), the indexing process no longer works.

We get a java.lang.OutOfMemoryError (even with 512 MB and 1 GB of heap):
ERROR 2013-05-24 15:26:25,733 http-9145-2
org.apache.solr.handler.dataimport.DataImporter  (96) - Full Import
failed:java.lang.RuntimeException: java.lang.RuntimeException:
java.lang.ClassCastException: java.lang.OutOfMemoryError cannot be cast to
java.lang.Exception
at org.apache.solr.handler.dataimport.DocBuilder.execute
(DocBuilder.java:266)
at org.apache.solr.handler.dataimport.DataImporter.doFullImport
(DataImporter.java:422)
at org.apache.solr.handler.dataimport.DataImporter.runCmd
(DataImporter.java:487)
at
org.apache.solr.handler.dataimport.DataImportHandler.handleRequestBody
(DataImportHandler.java:179)
at org.apache.solr.handler.RequestHandlerBase.handleRequest
(RequestHandlerBase.java:135)
at org.apache.solr.core.SolrCore.execute(SolrCore.java:1817)

Monitoring the JVM with VisualVM, I've seen that most of the time is spent in
the method FileListEntityProcessor.accept (called by getFolderFiles), so I
assume the error occurs while building the list of files to be indexed:
indeed, the list of files is built by this method, which is called by
getFolderFiles.

Basically, the list of files to index is built by getFolderFiles, itself
called on the first call to nextRow(). The indexing itself starts only after
that.
org/apache/solr/handler/dataimport/FileListEntityProcessor.java
  private void getFolderFiles(File dir, final List<Map<String, Object>> fileDetails) {

I found the variable fileDetails, which contains the list of my XML files. It
contains 611,345 entries (for approximately 500 MB of memory), and I have
roughly 10 M XML files, which is why I think it is not finished yet.
To get the entire list, I guess I would need something between 5 and 10 GB for
my process.

So I have several questions:
_ Is it possible to have several FileListEntityProcessor entities attached to
a single XPathEntityProcessor in the data-config.xml? That way I could do it
in ten passes, one per first-level directory.
_ Is there a roadmap to optimize this method, for example by not building the
complete file list up front, but every 1,000 documents, for instance?
_ Or to store the file list in a temporary file in order to save some memory?

Regards,
---
Jérôme Dupont
---


Re: Re: error while indexing huge filesystem with data import handler and FileListEntityProcessor

2013-05-29 Thread jerome . dupont


The configuration works with LineEntityProcessor, with a few documents (I
haven't tested with many documents yet).
For information, this is the config:








... field definitions

file:///D:/jed/noticesBib/listeNotices.txt contains the following lines:
jed/noticesBib/3/4/307/34307035.xml
jed/noticesBib/3/4/307/34307082.xml
jed/noticesBib/3/4/307/34307110.xml
jed/noticesBib/3/4/307/34307197.xml
jed/noticesBib/3/4/307/34307350.xml
jed/noticesBib/3/4/307/34307399.xml
...
(It could have contained the full path from the beginning, but I wanted to
test the concatenation of the filename; see the sketch below.)
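The concatenation itself happens in the inner entity's url attribute; a
minimal sketch of that shape (the entity names and the forEach root are
illustrative, not the exact ones):

  <dataConfig>
    <dataSource type="URLDataSource" encoding="UTF-8"/>
    <document>
      <!-- reads one relative path per line from the list file -->
      <entity name="lines" processor="LineEntityProcessor"
              url="file:///D:/jed/noticesBib/listeNotices.txt"
              rootEntity="false">
        <!-- each rawLine is appended to the base path to locate the XML file -->
        <entity name="notice" processor="XPathEntityProcessor"
                url="file:///D:/${lines.rawLine}" forEach="/record">
          <!-- field definitions with xpath attributes -->
        </entity>
      </entity>
    </document>
  </dataConfig>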

That works fine, thanks for the help!!

Next step, the same without using a file. (I'll write it in another post).

Regards,
Jérôme


[DIH] Using SqlEntity to get a list of files and read files in XpathEntityProcessor

2013-05-30 Thread jerome . dupont

Hello,

I want to index a huge list of XML files.
_ Using FileListEntityProcessor causes an OutOfMemoryError (too many
files...).
_ I can do it using a LineEntityProcessor reading a list of files generated
externally, but I would prefer to generate the list inside Solr.
_ So, to avoid maintaining a list of files, I'm trying to generate the list
with an SQL query and to hand the results to an XPathEntityProcessor, which
will read each file.

The query (select DISTINCT ...) generates this result:
CHEMINRELATIF
3/0/000/3001

But the problem is that, with the following configuration, no request to the
database is made, according to the message returned by DIH.

 "statusMessages":{
"Total Requests made to DataSource":"0",
"Total Rows Fetched":"0",
"Total Documents Processed":"0",
"Total Documents Skipped":"0",
"":"Indexing completed. Added/Updated: 0 documents. Deleted 0
documents.",
"Committed":"2013-05-30 10:23:30",
"Optimized":"2013-05-30 10:23:30",

And the log:
INFO 2013-05-30 10:23:29,924 http-8080-1
org.apache.solr.handler.dataimport.DataImporter  (121) - Loading DIH
Configuration: mnb-data-config.xml
INFO 2013-05-30 10:23:29,957 http-8080-1
org.apache.solr.handler.dataimport.DataImporter  (224) - Data Configuration
loaded successfully
INFO 2013-05-30 10:23:29,969 http-8080-1
org.apache.solr.handler.dataimport.DataImporter  (414) - Starting Full
Import
INFO 2013-05-30 10:23:30,009 http-8080-1
org.apache.solr.handler.dataimport.SimplePropertiesWriter  (219) - Read
dataimportMNb.properties
INFO 2013-05-30 10:23:30,045 http-8080-1
org.apache.solr.handler.dataimport.DocBuilder  (292) - Import completed
successfully


Has someone already done this kind of configuration, or is it just not
possible?

The config:





Regards,
---
Jérôme Dupont
Bibliothèque Nationale de France
Département des Systèmes d'Information
Tour T3 - Quai François Mauriac
75706 Paris Cedex 13
téléphone: 33 (0)1 53 79 45 40
e-mail: jerome.dup...@bnf.fr
---


Re: Re: [DIH] Using SqlEntity to get a list of files and read files in XpathEntityProcessor

2013-05-30 Thread jerome . dupont

Hi,

Thanks for your answer, it helped me move forward.
The name of the entity was not right, not consistent with the schema.
Now the first entity works fine: the query is sent to the database and
returns the correct result.
The problem is that the second entity, which is an XPathEntityProcessor
entity, doesn't read the file specified in its url attribute, but tries to
execute it as an SQL query against my database.

I tried to put a fake query (select 1 from dual) but it changed nothing.
It's as if the XPathEntityProcessor entity behaved like a SqlEntityProcessor,
using the url attribute instead of the query attribute.

I forgot to say which version I use: Solr 4.2.1 (it can be changed, it's
just the beginning of the development).
See below the config and the returned message:


The verbose output:

  "verbose-output":[
"entity:noticebib",[
  "query","select DISTINCT   SUBSTR( to_char(noticebib.numnoticebib,
'9'), 3, 1) || '/' ||SUBSTR( to_char(noticebib.numnoticebib,
'9'), 4, 1) || '/' ||SUBSTR( to_char(noticebib.numnoticebib,
'9'), 5, 3) || '/' ||to_char(noticebib.numnoticebib) || '.xml'
as CHEMINRELATIF   from bnf.noticebibwhere numnoticebib = '3001'",
  "time-taken","0:0:0.141",
  null,"--- row #1-",
  "CHEMINRELATIF","3/0/000/3001.xml",
  null,"-",
  "entity:processorDocument",[
"document#1",[
  "query","file:///D:/jed/noticesbib/3/0/000/3001.xml",

"EXCEPTION","org.apache.solr.handler.dataimport.DataImportHandlerException:
Unable to execute query: file:///D:/jed/noticesbib/3/0/000/3001.xml
Processing Document # 1\r\n\tat
org.apache.solr.handler.dataimport.DataImportHandlerException.wrapAndThrow
(DataImportHandlerException.java:71)\r\n\tat ...
oracle.jdbc.driver.OracleStatementWrapper.execute
(OracleStatementWrapper.java:1203)\r\n\tat
org.apache.solr.handler.dataimport.JdbcDataSource$ResultSetIterator.
(JdbcDataSource.java:246)\r\n\t... 32 more\r\n",
  "time-taken","0:0:0.124",


This is the configuration
















Regards,
---
Jérôme Dupont
Bibliothèque Nationale de France
Département des Systèmes d'Information
Tour T3 - Quai François Mauriac
75706 Paris Cedex 13
téléphone: 33 (0)1 53 79 45 40
e-mail: jerome.dup...@bnf.fr
---


RE: [DIH] Using SqlEntity to get a list of files and read files in XpathEntityProcessor

2013-05-31 Thread jerome . dupont

Thanks very much, it works with dataSource (capital S)!!!
In the end, I didn't have to define a "CHEMINRELATIF" field in the
configuration; it works without it.

This is the definitive working configuration:
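(Roughly, it has the following shape; in this sketch the JDBC URL, credentials
and forEach root are placeholders, the query is abbreviated, and the field
definitions are omitted:)

  <dataConfig>
    <dataSource name="db" type="JdbcDataSource"
                driver="oracle.jdbc.OracleDriver"
                url="jdbc:oracle:thin:@..." user="..." password="..."/>
    <dataSource name="files" type="URLDataSource" encoding="UTF-8"/>
    <document>
      <!-- outer entity: builds the relative path of each XML file from the DB -->
      <entity name="noticebib" dataSource="db" rootEntity="false"
              query="select DISTINCT ... as CHEMINRELATIF from bnf.noticebib ...">
        <!-- inner entity: note dataSource with a capital S; with the wrong
             case the url was sent to the JDBC source and executed as SQL -->
        <entity name="processorDocument" processor="XPathEntityProcessor"
                dataSource="files" forEach="/record"
                url="file:///D:/jed/noticesbib/${noticebib.CHEMINRELATIF}">
          <!-- field definitions with xpath attributes -->
        </entity>
      </entity>
    </document>
  </dataConfig>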












Thanks again!

---
Jérôme Dupont
Bibliothèque Nationale de France
Département des Systèmes d'Information
Tour T3 - Quai François Mauriac
75706 Paris Cedex 13
téléphone: 33 (0)1 53 79 45 40
e-mail: jerome.dup...@bnf.fr
---



Issue with dataimport xml validation with dtd and jetty: conflict of use for user.dir variable

2019-02-08 Thread jerome . dupont
Hello,

I use Solr and the data import handler to index XML files that have a DTD.
The DTD is referenced like this:


Previously we were using Solr 4 in a Tomcat container.
During the import process, Solr tries to validate the XML file against the
DTD.
To make the DTD findable we defined -Duser.dir=pathToDtd; Solr could find it
and validation worked.

Now we are migrating to Solr 7 (with embedded Jetty).
When we start Solr with -a "-Duser.dir=pathToDtd", Solr doesn't start and
returns an error: Cannot find jetty main class.

So I removed the -a "-Duser.dir=pathToDtd" option and Solr starts,
BUT
now Solr can no longer open the XML files, because it doesn't find the DTD
during the validation stage.
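One workaround I am considering (a sketch only, not tested here: it assumes
the standard SOLR_OPTS mechanism in solr.in.sh for passing extra JVM options,
and the path is a placeholder):

  # solr.in.sh (solr.in.cmd on Windows): pass the property without the -a flag
  SOLR_OPTS="$SOLR_OPTS -Duser.dir=/path/to/dtd/dir"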

Is there a way to:
- activate an XML catalog file to indicate where the DTD is? (It seems that
would be the better way, but I didn't find how to do it.)
- disable DTD validation?

Regards,
---
Jérôme Dupont
Bibliothèque Nationale de France
Département des Systèmes d'Information
Tour T3 - Quai François Mauriac
75706 Paris Cedex 13
téléphone: 33 (0)1 53 79 45 40
e-mail: jerome.dup...@bnf.fr
---
