I would seriously consider moving away from DIH to SolrJ if you want
to tweak things at this level, see:
https://lucidworks.com/2012/02/14/indexing-with-solrj/
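
With SolrJ, the fix-up can happen in client code before the value ever
reaches Solr. Below is a minimal sketch of that idea, not a definitive
implementation. The class name DirtyDateFixer and the method normalize are
hypothetical, and clamping a zero month or day to 1 is only an assumed
policy; dropping the field or routing the row elsewhere may suit your data
better.

```java
import java.time.DateTimeException;
import java.time.LocalDate;
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: repairs dates like '1966-00-00' on the client side.
final class DirtyDateFixer {

    // Matches yyyy-MM-dd, including impossible values such as month 00 or day 00.
    private static final Pattern YMD = Pattern.compile("(\\d{4})-(\\d{2})-(\\d{2})");

    private DirtyDateFixer() {}

    /**
     * Returns a UTC timestamp string in the form Solr date fields accept,
     * or an empty Optional when the value cannot be repaired.
     * Assumption: a zero month or day is clamped to 1.
     */
    static Optional<String> normalize(String raw) {
        if (raw == null) {
            return Optional.empty();
        }
        Matcher m = YMD.matcher(raw.trim());
        if (!m.matches()) {
            return Optional.empty();
        }
        int year = Integer.parseInt(m.group(1));
        int month = Math.max(1, Integer.parseInt(m.group(2)));
        int day = Math.max(1, Integer.parseInt(m.group(3)));
        try {
            // LocalDate.of rejects values that are still impossible, e.g. month 13.
            LocalDate date = LocalDate.of(year, month, day);
            return Optional.of(date + "T00:00:00Z");
        } catch (DateTimeException e) {
            return Optional.empty();
        }
    }
}
```

In the indexing loop this might be used as
DirtyDateFixer.normalize(rs.getString("day")).ifPresent(v -> doc.addField("day", v));
so that unrepairable values are simply left out of the document.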

Another alternative is to add a ScriptUpdateProcessor
(StatelessScriptUpdateProcessorFactory) to your update chain to
intercept these documents on their way to being indexed and fix up
the bad values.

ConcurrentUpdateSolrClient shouldn't "just quit"; I'd guess the cause is something in DIH.

Best,
Erick
On Thu, Sep 6, 2018 at 12:52 AM deniz <denizdurmu...@gmail.com> wrote:
>
> I am trying to write a wrapper for DIH, so I can leverage the field type
> guessing while importing the SQL data.
>
> The query is supposed to retrieve 400K+ documents. In the test data in the
> DB, there are dirty date fields, which contain values like '1966-00-00' or
> '1987-10-00'.
>
> I am running the code below:
>
>  public void dataimport(ConcurrentUpdateSolrClient updateClient, String importSql) {
>
>         try {
>
>             Connection conn = DriverManager.getConnection("connection string","user","pass");
>             Statement stmt = conn.createStatement(ResultSet.TYPE_FORWARD_ONLY, ResultSet.CONCUR_READ_ONLY);
>             // Integer.MIN_VALUE is the MySQL Connector/J hint for row-by-row streaming
>             stmt.setFetchSize(Integer.MIN_VALUE);
>             ResultSet rs = stmt.executeQuery(importSql);
>             ResultSetMetaData resultSetMetaData = rs.getMetaData();
>             List<SolrFieldObject> fields = new ArrayList<>();
>             // JDBC columns are 1-based, so the last column sits at getColumnCount()
>             for(int index=1; index <= resultSetMetaData.getColumnCount(); index++){
>                 fields.add(new SolrFieldObject(resultSetMetaData.getColumnLabel(index), resultSetMetaData.getColumnClassName(index)));
>             }
>             while(rs.next()){
>                 SolrInputDocument solrInputDocument = new SolrInputDocument();
>                 for(SolrFieldObject field : fields){
>                     try{
>                         Object dataObject = rs.getString(field.name());
>                         Optional.ofNullable(dataObject).ifPresent(
>                                 databaseInfo ->{
>                                 solrInputDocument.addField(field.name(), String.valueOf(databaseInfo));
>                                 }
>                         );
>                     }catch(Exception e){
>                         e.printStackTrace();
>                     }
>
>                 }
>                 try{
>                     UpdateRequest updateRequest = new UpdateRequest();
>                     updateRequest.setCommitWithin(10000);
>                     updateRequest.add(solrInputDocument);
>                     updateRequest.process(updateClient);
>                 }catch(Exception e){
>                     e.printStackTrace();
>                 }
>             }
>             stmt.close();
>             conn.close();
>         } catch (Exception e) {
>             e.printStackTrace();
>         }
>     }
>
> The code works fine, except that it randomly stops with log messages like
> 'Error adding field 'day'='1976-00-00' msg=Invalid Date
> String:'1976-00-00'' on random documents. Many other documents with invalid
> dates are logged as errors on the server side while the client keeps working
> and continues to push other documents, until it stops on a random document
> with the given error.
>
> Is there an error threshold that makes the concurrent update client stop
> after some time? Or are there other points I am missing while dealing with
> this kind of update?
>
>
>
> -----
> Smart, but doesn't work... If he worked, he'd do it...
