Hello, I was doing some more testing but I could not find a definitive reason for this behavior. The following is my transformer:

public Map<String, Object> transformRow(Map<String, Object> row, Context context) {
    List<Map<String, String>> fields = context.getAllEntityFields();

    for (Map<String, String> field : fields) {
        // Check if this field has blob="true" specified in the data-config.xml
        String blob = field.get("blob");
        if ("true".equals(blob)) {
            String columnName = field.get("column");

            // Get the field's value from the current row
            Blob data = (Blob) row.get(columnName);

            // Transform the blob and store back into the same column
            if (data != null) {
                row.put(columnName, process(data));
            } else {
                log.error("Blob is null.");
            }
        }
    }
    return row;
}

Note: The function "process" is the function that actually takes care of the whole transformation.

What I noticed is that the "row" variable only has the ID, probably due to this:

deltaQuery="select ID from TABLE1 where (LASTMODIFIED > to_date('${dataimporter.last_index_time}', 'yyyy-mm-dd HH24:MI:SS'))"

However, even if I change it to a "select * " statement, I get everything except the column that contains the blob (it is returned as null).

Something tells me that the data-config may be incorrect. I cannot explain how this works for full-imports and not delta-imports.

I hope that I explained this issue properly. I am really stuck on this. Any help would be highly appreciated.

------------------------------------------------------------------------------

ahammad wrote:
>
> I have a Solr core that retrieves data from an Oracle DB. The DB table has
> a few columns, one of which is a Blob that represents a PDF document. In
> order to retrieve the actual content of the PDF file, I wrote a Blob
> transformer that converts the Blob into the PDF file, and subsequently
> reads it using PDFBox. The blob is contained in a DB column called
> DOCUMENT, and the data goes into a Solr field called fileContent, which is
> required.
>
> This works fine when doing full imports, but it fails for delta imports. I
> debugged my transformer, and it appears that when it attempts to fetch the
> blob stored in the column, it gets nothing back (i.e. null). Because the
> data is essentially null, it cannot retrieve anything, and cannot store
> anything into Solr. As a result, the document does not get imported. I am
> not sure what the problem is, because this only occurs with delta imports.
>
> Here is my data-config file:
>
> <dataConfig>
>   <dataSource driver="oracle.jdbc.driver.OracleDriver" url="address"
>               user="user" password="pass"/>
>   <document name="table1">
>     <entity name="TABLE1" pk="ID"
>             query="select * from TABLE1"
>             deltaImportQuery="select * from TABLE1 where ID ='${dataimporter.delta.ID}'"
>             deltaQuery="select ID from TABLE1 where (LASTMODIFIED > to_date('${dataimporter.last_index_time}', 'yyyy-mm-dd HH24:MI:SS'))"
>             transformer="BlobTransformer">
>       <field column="ID" name="id" />
>       <field column="TITLE" name="title" />
>       <field column="FILENAME" name="filename" />
>       <field column="DOCUMENT" name="fileContent" blob="true"/>
>       <field column="LASTMODIFIED" name="lastModified" />
>     </entity>
>   </document>
> </dataConfig>
>
> Thanks.
>

-- 
View this message in context: http://lucene.472066.n3.nabble.com/Issue-with-delta-import-not-finding-data-in-a-column-tp788993p812511.html
Sent from the Solr - User mailing list archive at Nabble.com.
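
P.S. To narrow down what the transformer actually receives, a small diagnostic along the following lines could be called at the top of transformRow and its result logged. This is only a sketch: RowDiagnostics, describe, and columnState are hypothetical names of my own, not part of Solr or the DataImportHandler, and the code uses nothing beyond java.util. The idea is to tell apart three cases that all look like "null" in the transformer above: the column is missing from the row entirely, it is present with a null value, or it is present under a key that differs only in case.

```java
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper for debugging; not part of Solr or the DataImportHandler API.
final class RowDiagnostics {

    private RowDiagnostics() {}

    /** Maps each column name in the row to the simple class name of its value, or "null". */
    static Map<String, String> describe(Map<String, Object> row) {
        Map<String, String> types = new TreeMap<>();
        for (Map.Entry<String, Object> entry : row.entrySet()) {
            Object value = entry.getValue();
            types.put(entry.getKey(), value == null ? "null" : value.getClass().getSimpleName());
        }
        return types;
    }

    /** Reports whether a column is absent, present but null, present, or present under another case. */
    static String columnState(Map<String, Object> row, String column) {
        if (row.containsKey(column)) {
            return row.get(column) == null ? "PRESENT_NULL" : "PRESENT";
        }
        // No exact match: look for a key that differs only in letter case.
        for (String key : row.keySet()) {
            if (key.equalsIgnoreCase(column)) {
                return "PRESENT_DIFFERENT_CASE:" + key;
            }
        }
        return "ABSENT";
    }
}
```

For example, logging RowDiagnostics.describe(row) and RowDiagnostics.columnState(row, columnName) for each call would show whether the rows with a null blob are the ID-only rows or rows that otherwise look complete.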