I am trying to write a wrapper for DIH so that I can leverage the field
type guessing while importing the SQL data.

The query is supposed to retrieve 400K+ documents. The test data in the DB
contains dirty date fields, with values like '1966-00-00' or '1987-10-00'.

I am running the code below:

    public void dataimport(ConcurrentUpdateSolrClient updateClient, String importSql) {

        try {

            Connection conn = DriverManager.getConnection("connection string", "user", "pass");
            Statement stmt = conn.createStatement(ResultSet.TYPE_FORWARD_ONLY,
                    ResultSet.CONCUR_READ_ONLY);
            // Fetch-size hint so the driver streams rows instead of buffering the
            // whole result set (Integer.MIN_VALUE is the MySQL Connector/J convention).
            stmt.setFetchSize(Integer.MIN_VALUE);
            ResultSet rs = stmt.executeQuery(importSql);
            ResultSetMetaData resultSetMetaData = rs.getMetaData();
            List<SolrFieldObject> fields = new ArrayList<>();
            // JDBC column indexes are 1-based, so the last column is getColumnCount().
            for (int index = 1; index <= resultSetMetaData.getColumnCount(); index++) {
                fields.add(new SolrFieldObject(resultSetMetaData.getColumnLabel(index),
                        resultSetMetaData.getColumnClassName(index)));
            }
            while (rs.next()) {
                SolrInputDocument solrInputDocument = new SolrInputDocument();
                // Copy every non-null column into the document as a string.
                for (SolrFieldObject field : fields) {
                    try {
                        Object dataObject = rs.getString(field.name());
                        Optional.ofNullable(dataObject).ifPresent(
                                databaseInfo -> {
                                    solrInputDocument.addField(field.name(),
                                            String.valueOf(databaseInfo));
                                }
                        );
                    } catch (Exception e) {
                        e.printStackTrace();
                    }

                }
                // Send one update request per document.
                try {
                    UpdateRequest updateRequest = new UpdateRequest();
                    updateRequest.setCommitWithin(10000);
                    try {
                        updateRequest.add(solrInputDocument);
                        updateRequest.process(updateClient);

                    } catch (Exception e) {
                        e.printStackTrace();
                    }
                } catch (Exception e) {
                    System.out.println("Inner -> " + e.getMessage());
                }
            }
            stmt.close();
            conn.close();
        } catch (Exception e) {
            e.printStackTrace();
        }
    }

The code works fine, except that it stops at random with log messages like
"Error adding field 'day'='1976-00-00' msg=Invalid Date
String:'1976-00-00'". Many other documents with invalid dates are logged as
errors on the server side while the client keeps working and continues to
push other documents, until it stops on some random document with the error
above.

Is there an error threshold that makes the concurrent update client stop
after some time? Or are there other points I am missing when dealing with
this kind of update?
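
One client-side option I could imagine is to check the date strings before
adding them to the document, so that values like '1966-00-00' never reach
Solr. The following is only a minimal sketch under my own assumptions: the
class name DirtyDateCheck and the method isValidDate are hypothetical, it
uses plain java.time rather than anything from SolrJ, and it assumes the
dates arrive as yyyy-MM-dd strings.

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;
import java.time.format.DateTimeParseException;
import java.time.format.ResolverStyle;

// Hypothetical helper, not part of SolrJ or DIH.
final class DirtyDateCheck {

    // "uuuu" is used instead of "yyyy" because STRICT resolution of
    // year-of-era would also require an era field.
    private static final DateTimeFormatter STRICT_DATE =
            DateTimeFormatter.ofPattern("uuuu-MM-dd")
                    .withResolverStyle(ResolverStyle.STRICT);

    private DirtyDateCheck() {
    }

    // Returns true only if value is a real calendar date in yyyy-MM-dd form.
    static boolean isValidDate(String value) {
        if (value == null) {
            return false;
        }
        try {
            LocalDate.parse(value, STRICT_DATE);
            return true;
        } catch (DateTimeParseException e) {
            // Month 00, day 00, February 30 and similar values end up here.
            return false;
        }
    }
}
```

In the import loop above, a date column could then be skipped (or logged)
when DirtyDateCheck.isValidDate(...) returns false, instead of being passed
to addField. Whether dropping the field is acceptable depends on the data,
so this is a workaround rather than an answer to the question about the
client stopping.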


