I want to index PDF (and other rich) documents. I am using the DataImportHandler.
Here is how my schema.xml looks:

    .........
    .........
    <field name="title" type="text" indexed="true" stored="true" multiValued="false"/>
    <field name="description" type="text" indexed="true" stored="true" multiValued="false"/>
    <field name="date_published" type="string" indexed="false" stored="true" multiValued="false"/>
    <field name="link" type="string" indexed="true" stored="true" multiValued="false" required="false"/>
    <dynamicField name="attr_*" type="textgen" indexed="true" stored="true" multiValued="false"/>
    ........
    ........
    <uniqueKey>link</uniqueKey>

As you can see, I have set link as the unique key so that documents are not duplicated when they are indexed again.

The file paths are stored in a database, and I have set up the DataImportHandler to fetch the list of file paths and index each document. To test it I used the tutorial.pdf file that comes with the example docs in Solr. The problem, of course, is that this PDF document has no 'link' field. I am looking for a way to set the file path as link manually when indexing these documents. I tried the following data-config settings:

    <entity name="fileItems" rootEntity="false" dataSource="dbSource" query="select path from file_paths">
      <entity name="tika-test" processor="TikaEntityProcessor" url="${fileItems.path}" dataSource="fileSource">
        <field column="title" name="title" meta="true"/>
        <field column="Creation-Date" name="date_published" meta="true"/>
        <entity name="filePath" dataSource="dbSource" query="SELECT path FROM file_paths as link where path = '${fileItems.path}'">
          <field column="link" name="link"/>
        </entity>
      </entity>
    </entity>

Here I create a sub-entity that queries for the path and is meant to return the result in a column named 'link'.
But I still see this error:

    WARNING: Error creating document : SolrInputDocument[{date_published=date_published(1.0)={2011-06-23T12:47:45Z}, title=title(1.0)={Solr tutorial}}]
    org.apache.solr.common.SolrException: Document is missing mandatory uniqueKey field: link

Is there any way for me to create a field called link for the PDF documents?

This was asked before here:
http://lucene.472066.n3.nabble.com/Trouble-with-exception-Document-Null-missing-required-field-DocID-td1641048.html
but the solution provided there uses the ExtractingRequestHandler, and I want to do this through the DataImportHandler.
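
A possible explanation, offered as a sketch rather than a confirmed fix: in the sub-entity query above, "as link" follows the table name, so it aliases the table file_paths and the result column is still called path. The mapping for column="link" then finds no such column, and the document is built without the uniqueKey. Assuming that is the cause, moving the alias onto the column should make the sub-entity return a column named link. All entity, data source, table, and column names below are taken from the config in the question; only the SQL of the innermost entity is changed.

```xml
<entity name="fileItems" rootEntity="false" dataSource="dbSource"
        query="select path from file_paths">
  <entity name="tika-test" processor="TikaEntityProcessor"
          url="${fileItems.path}" dataSource="fileSource">
    <field column="title" name="title" meta="true"/>
    <field column="Creation-Date" name="date_published" meta="true"/>
    <!-- Assumption: the alias belongs on the column (path AS link),
         not on the table, so that the result set has a "link" column. -->
    <entity name="filePath" dataSource="dbSource"
            query="SELECT path AS link FROM file_paths WHERE path = '${fileItems.path}'">
      <field column="link" name="link"/>
    </entity>
  </entity>
</entity>
```

If the extra database query per file is unwanted, the DataImportHandler's TemplateTransformer may be worth trying instead: declaring transformer="TemplateTransformer" on the tika-test entity and adding a field with column="link" and template="${fileItems.path}" would reuse the path already fetched by the outer entity. That variant is untested here.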