Index pdf files.
Hi, I'm new to Solr. I want to index PDF files using the Data Import Handler. I'm using Solr 4.3.0. I followed the steps given in this post: http://lucene.472066.n3.nabble.com/indexing-with-DIH-and-with-problems-td3731129.html
However, I get the following error:
Full Import failed:java.lang.NoClassDefFoundError: org/apache/tika/parser/Parser
Please help! Thanks
-- View this message in context: http://lucene.472066.n3.nabble.com/Index-pdf-files-tp4074278.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Index pdf files.
Hi, thanks a lot. I did what you said. Now I'm getting the following error:
Full Import failed:java.lang.RuntimeException: java.lang.RuntimeException: java.util.regex.PatternSyntaxException: Dangling meta character '*' near index 0
-- View this message in context: http://lucene.472066.n3.nabble.com/Index-pdf-files-tp4074278p4074297.html
Re: Index pdf files.
I figured it out. It was a problem with the regular expression I used in data-config.xml.
-- View this message in context: http://lucene.472066.n3.nabble.com/Index-pdf-files-tp4074278p4074304.html
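The exact expression from data-config.xml is not shown in the thread. A common way to hit this particular exception is to write a shell-style wildcard such as `*.pdf` where a Java regular expression is expected, so the sketch below assumes that case. The class name `FileNamePatternDemo` and both sample patterns are illustrative only.

```java
import java.util.regex.Pattern;
import java.util.regex.PatternSyntaxException;

class FileNamePatternDemo {
    // Returns true if the expression compiles as a Java regular expression.
    static boolean compiles(String regex) {
        try {
            Pattern.compile(regex);
            return true;
        } catch (PatternSyntaxException e) {
            // "*" with nothing before it to repeat is a dangling meta character.
            return false;
        }
    }

    public static void main(String[] args) {
        String glob = "*.pdf";      // assumed faulty value: a wildcard, not a regex
        String regex = ".*\\.pdf";  // regex form: any characters, then a literal ".pdf"
        System.out.println(glob + " compiles: " + compiles(glob));
        System.out.println(regex + " compiles: " + compiles(regex));
        System.out.println("matches report.pdf: " + Pattern.matches(regex, "report.pdf"));
    }
}
```

In a regular expression `*` is a quantifier and needs something to repeat, which is why it cannot appear at index 0.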
Unique key error while indexing pdf files
Hi, I'm trying to index PDF files in Solr 4.3.0 using the Data Import Handler.
My request handler - data-config1.xml
My data-config1.xml -
Now when I try to index the files, I get the following error:
org.apache.solr.common.SolrException: Document is missing mandatory uniqueKey field: id
	at org.apache.solr.update.AddUpdateCommand.getIndexedId(AddUpdateCommand.java:88)
	at org.apache.solr.update.processor.DistributedUpdateProcessor.versionAdd(DistributedUpdateProcessor.java:517)
	at org.apache.solr.update.processor.DistributedUpdateProcessor.processAdd(DistributedUpdateProcessor.java:396)
	at org.apache.solr.update.processor.LogUpdateProcessor.processAdd(LogUpdateProcessorFactory.java:100)
	at org.apache.solr.handler.dataimport.SolrWriter.upload(SolrWriter.java:70)
	at org.apache.solr.handler.dataimport.DataImportHandler$1.upload(DataImportHandler.java:235)
	at org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:500)
	at org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:491)
	at org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:404)
	at org.apache.solr.handler.dataimport.DocBuilder.doFullDump(DocBuilder.java:319)
	at org.apache.solr.handler.dataimport.DocBuilder.execute(DocBuilder.java:227)
	at org.apache.solr.handler.dataimport.DataImporter.doFullImport(DataImporter.java:422)
	at org.apache.solr.handler.dataimport.DataImporter.runCmd(DataImporter.java:487)
	at org.apache.solr.handler.dataimport.DataImporter$1.run(DataImporter.java:468)
This problem is easy to solve when indexing a database, but I don't know how to provide the unique key for a file. How do I define the id field (unique key) of a PDF file? How do I solve this problem? Thanks in advance
-- View this message in context: http://lucene.472066.n3.nabble.com/Unique-key-error-while-indexing-pdf-files-tp4074314.html
Re: Unique key error while indexing pdf files
I'm new to Solr. I'm just trying to understand and explore the various features Solr offers and how they are implemented. I would be very grateful if you could solve my problem with any example of your choice. I just want to learn how I can index PDF documents using the Data Import Handler.
-- View this message in context: http://lucene.472066.n3.nabble.com/Unique-key-error-while-indexing-pdf-files-tp4074314p4074327.html
Re: Unique key error while indexing pdf files
Can you please suggest a way (with an example) of assigning this unique key to a PDF file?
-- View this message in context: http://lucene.472066.n3.nabble.com/Unique-key-error-while-indexing-pdf-files-tp4074314p4074588.html
Re: Unique key error while indexing pdf files
Okay. Can you please suggest a way (with an example) of assigning this unique key to a PDF file? Say, a unique number for each PDF file. How do I achieve this?
-- View this message in context: http://lucene.472066.n3.nabble.com/Unique-key-error-while-indexing-pdf-files-tp4074314p4074592.html
Re: Unique key error while indexing pdf files
Yes. The absolute path is unique.
-- View this message in context: http://lucene.472066.n3.nabble.com/Unique-key-error-while-indexing-pdf-files-tp4074314p4074620.html
Removal of unique key - Query Elevation Component
I want to index PDF files in Solr 4.3.0 using the Data Import Handler. I have done the following:
My request handler - data-config.xml
My data-config.xml
Now when I tried to index the documents, I got the following error:
org.apache.solr.common.SolrException: Document is missing mandatory uniqueKey field: id
Because I don't want any uniqueKey in my case, I disabled it as follows:
In solrconfig.xml I commented out - pick a fieldType to analyze queries string elevate.xml
In schema.xml I commented out id and added and in elevate.xml I made the following changes
When I do this, indexing takes place, but the indexed docs contain only author, author_s and id fields. The documents should contain author, text, title and id fields (as defined in my data-config.xml). Please help me out. Am I doing anything wrong? And where did this author_s field come from?
arora arc arora arc 4f65332d-49d9-497a-b88b-881da618f571
-- View this message in context: http://lucene.472066.n3.nabble.com/Removal-of-unique-key-Query-Elevation-Component-tp4074624.html
Re: Removal of unique key - Query Elevation Component
Thanks! The author_s issue has been resolved. Why are the other fields not getting indexed?
-- View this message in context: http://lucene.472066.n3.nabble.com/Removal-of-unique-key-Query-Elevation-Component-tp4074624p4074636.html
Re: Unique key error while indexing pdf files
Yes. The absolute path is unique. How do I implement it? Can you please explain?
-- View this message in context: http://lucene.472066.n3.nabble.com/Unique-key-error-while-indexing-pdf-files-tp4074314p4074638.html
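The idea of using the absolute path as the unique key can be sketched in plain Java. This is not DIH configuration: in DIH the path would normally come from the file-listing entity (FileListEntityProcessor exposes it as `fileAbsolutePath`) and be mapped to the `id` field in data-config.xml. The class `PathIds` and its method names are hypothetical.

```java
import java.io.File;
import java.nio.charset.StandardCharsets;
import java.util.UUID;

class PathIds {
    // Option 1: use the absolute path itself as the unique key.
    static String pathId(File file) {
        return file.getAbsolutePath();
    }

    // Option 2: derive a fixed-length key from the path.
    // A name-based UUID is deterministic: the same path always gives the same UUID.
    static String uuidId(File file) {
        byte[] bytes = file.getAbsolutePath().getBytes(StandardCharsets.UTF_8);
        return UUID.nameUUIDFromBytes(bytes).toString();
    }

    public static void main(String[] args) {
        File pdf = new File("docs", "HelloWorld.pdf"); // hypothetical file
        System.out.println(pathId(pdf));
        System.out.println(uuidId(pdf));
    }
}
```

Either value stays stable across re-imports as long as the file is not moved, so re-indexing a file replaces its earlier document instead of adding a duplicate.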
Extract file name (without extension) while indexing using Data Import Handler in Solr
I'm able to index pdf, doc, ppt, etc. files using the Data Import Handler in Solr 4.3.0. My data-config.xml looks like this -
However, in the fileName field I want to insert the bare file name without the extension. E.g. instead of 'HelloWorld.txt' I want only 'HelloWorld' to be inserted in the fileName field. How do I achieve this? Thanks in advance!
-- View this message in context: http://lucene.472066.n3.nabble.com/Extract-file-name-without-extension-while-indexing-using-Data-Import-Handler-in-Solr-tp4074991.html
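The string operation being asked for can be sketched as follows. This shows only the name handling in plain Java; wiring it into DIH would need a transformer on the entity (for example a regex-based one), which is not shown here. `FileNames.baseName` is a hypothetical helper.

```java
class FileNames {
    // Returns the file name without its last extension.
    static String baseName(String fileName) {
        int dot = fileName.lastIndexOf('.');
        // dot <= 0 means there is no extension, or the name starts with a dot
        // (such as ".bashrc"); in both cases the name is returned unchanged.
        return dot > 0 ? fileName.substring(0, dot) : fileName;
    }

    public static void main(String[] args) {
        System.out.println(baseName("HelloWorld.txt"));
    }
}
```

Only the last extension is removed, so a name with two extensions keeps the first one.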
Indexing database in Solr using Data Import Handler
I'm trying to index a MySQL database using the Data Import Handler in Solr. I have made two tables. The first table holds the metadata of a file.
create table filemetadata (
  id varchar(20) primary key,
  filename varchar(50),
  path varchar(200),
  size varchar(10),
  author varchar(50)
);
The second table contains the "favourite" info about a particular file in the above table.
create table filefav (
  fid varchar(20) primary key,
  id varchar(20),
  favouritedby varchar(300),
  favouritedtime varchar(10),
  FOREIGN KEY (id) REFERENCES filemetadata(id)
);
As you can see, "id" is a foreign key. To index this I have written the following data-config.xml -
Everything is working, but the "favouritedby1" field is not getting indexed, i.e. that field does not exist when I run the *:* query. Can you please help me out?
-- View this message in context: http://lucene.472066.n3.nabble.com/Indexing-database-in-Solr-using-Data-Import-Handler-tp4077180.html
Index mysql database using data import handler in solr
I want to index a MySQL database in Solr using the Data Import Handler. I have made two tables. The first table holds the metadata of a file.
create table filemetadata (
  id varchar(20) primary key,
  filename varchar(50),
  path varchar(200),
  size varchar(10),
  author varchar(50)
);
+----+----------+----------+------+--------+
| id | filename | path     | size | author |
+----+----------+----------+------+--------+
| 1  | abc.txt  | c:\files | 2kb  | eric   |
| 2  | xyz.docx | c:\files | 5kb  | john   |
| 3  | pqr.txt  | c:\files | 10kb | mike   |
+----+----------+----------+------+--------+
The second table contains the "favourite" info about a particular file in the above table.
create table filefav (
  fid varchar(20) primary key,
  id varchar(20),
  favouritedby varchar(300),
  favouritedtime varchar(10),
  FOREIGN KEY (id) REFERENCES filemetadata(id)
);
+-----+----+--------------+----------------+
| fid | id | favouritedby | favouritedtime |
+-----+----+--------------+----------------+
| 1   | 1  | ross         | 22:30          |
| 2   | 1  | josh         | 12:56          |
| 3   | 2  | johny        | 03:03          |
| 4   | 2  | sean         | 03:45          |
+-----+----+--------------+----------------+
Here "id" is a foreign key. The second table shows which person has marked which document as his/her favourite. E.g. the file abc.txt, represented by id = 1, has been marked as a favourite (see column favouritedby) by ross and josh.
I want to index the files as follows. Each document should have the following fields:
id - to be taken from the first table filemetadata
filename - to be taken from the first table filemetadata
path - to be taken from the first table filemetadata
size - to be taken from the first table filemetadata
author - to be taken from the first table filemetadata
favouritedby - this field should contain the names of all the people from the second table filefav (from the favouritedby column) who like that particular file.
E.g. after indexing, doc 1 should have
id = 1
filename = abc.txt
path = c:\files
size = 2kb
author = eric
favouritedby = ross, josh
How do I achieve this? I have written a data-config.xml (which is not giving the desired result) as follows
Can anyone explain how I can achieve this?
-- View this message in context: http://lucene.472066.n3.nabble.com/Index-mysql-database-using-data-import-handler-in-solr-tp4077205.html
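The desired result (one document per file, with every favouritedby name collected into a single multi-valued field) can be illustrated on the sample rows above. This is a plain Java sketch of the grouping logic only, not DIH code; in DIH the same effect is usually approached with a child entity keyed on the parent id and a multi-valued field in the schema. The class `FavouriteGrouping` and the row layout `{fid, id, favouritedby}` are assumptions made for the illustration.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class FavouriteGrouping {
    // Each row is {fid, id, favouritedby}, mirroring the filefav table.
    // Returns, for every file id, the list of people who favourited it.
    static Map<String, List<String>> groupByFileId(String[][] favRows) {
        Map<String, List<String>> byFileId = new LinkedHashMap<>();
        for (String[] row : favRows) {
            byFileId.computeIfAbsent(row[1], id -> new ArrayList<>()).add(row[2]);
        }
        return byFileId;
    }

    public static void main(String[] args) {
        String[][] filefav = {
            {"1", "1", "ross"},
            {"2", "1", "josh"},
            {"3", "2", "johny"},
            {"4", "2", "sean"},
        };
        // One entry per file id, each holding all of its favouritedby values.
        groupByFileId(filefav).forEach((id, names) ->
            System.out.println("id = " + id + ", favouritedby = " + names));
    }
}
```

File 3 has no rows in filefav, so it gets no entry here; in the index that document would simply lack the favouritedby field.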
Indexing Oracle Database in Solr using Data Import Handler
I'm trying to index an Oracle 10g XE database using Solr's Data Import Handler. My data-config.xml looks like this
My schema.xml looks like this -
Now when I try to index it, Solr is not able to read the columns of the table and therefore indexing fails. It says that the document is missing the unique key id, which, as you can see, is clearly present in the document. Also, when such an exception is thrown, the log normally shows which fields were picked up for the document. In this case, however, no fields are being read. But if I change my query, everything works perfectly. The modified data-config.xml -
Why is this happening? How do I solve it? How does giving an alias affect the indexing process? Thanks in advance
-- View this message in context: http://lucene.472066.n3.nabble.com/Indexing-Oracle-Database-in-Solr-using-Data-Import-Handler-tp4079649.html
Timestamp compatibility while performing delta import in solr
I'm new to Solr. I have successfully indexed an Oracle 10g XE database, and I'm trying to perform a delta import on it. The delta query requires comparing a last_modified column of the table with ${dih.last_index_time}. However, my application does not have such a column, and I cannot add one. Therefore I used 'scn_to_timestamp(ora_rowscn)' to supply the required timestamps. This query returns a value of type timestamp in the following format:
24-JUL-13 12.42.32.0 PM
and dih.last_index_time is in the format
2013-07-24 12:18:03
So I converted dih.last_index_time with to_timestamp('${dih.last_index_time}', '/MM/DD HH:MI:SS'). My data-config looks like this -
However, this is not working and I'm getting the following error:
Unable to execute query: SELECT * FROM PRODUCT WHERE PID= Processing Document # 1
Caused by: java.sql.SQLException: ORA-00936: missing expression
Please help me out!
-- View this message in context: http://lucene.472066.n3.nabble.com/Timestamp-compatibility-while-performing-delta-import-in-solr-tp4079982.html
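As a side illustration of the dih.last_index_time layout quoted above (`2013-07-24 12:18:03`), the sketch below parses that string in Java with a pattern that matches it field by field. It says nothing about how Oracle's to_timestamp treats its format mask; the class `LastIndexTime` and the pattern constant are assumptions made for the example.

```java
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;

class LastIndexTime {
    // Four-digit year, dashes as separators, 24-hour clock: matches "2013-07-24 12:18:03".
    static final DateTimeFormatter DIH_FORMAT =
        DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ss");

    // Throws DateTimeParseException if the text does not follow DIH_FORMAT.
    static LocalDateTime parse(String text) {
        return LocalDateTime.parse(text, DIH_FORMAT);
    }

    public static void main(String[] args) {
        LocalDateTime t = parse("2013-07-24 12:18:03");
        System.out.println(t.getYear() + " / " + t.getMonthValue() + " / " + t.getDayOfMonth());
    }
}
```

In Java the separators and field widths in the pattern have to agree with the text, so a value written with slashes is rejected by this formatter.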
Auto Indexing in Solr
Hi, I'm using Solr 4's Data Import Handler to index an Oracle 10g XE database. I'm using full imports as well as delta imports, and I want these processes to run automatically (e.g. the imports could run on a timer, or be triggered as soon as any data in the database is modified). I searched online and saw people mention cron and scripts, but I'm not able to figure out how to implement it. Can you please provide a tutorial-like explanation? Thanks in advance
-- View this message in context: http://lucene.472066.n3.nabble.com/Auto-Indexing-in-Solr-tp4080233.html
Re: Auto Indexing in Solr
I have to execute this command for a full import:
http://localhost:8983/solr/dataimport?command=full-import
Can you explain how I can use a Java timer to fire this HTTP request?
-- View this message in context: http://lucene.472066.n3.nabble.com/Auto-Indexing-in-Solr-tp4080233p4080278.html
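One possible shape for this, using only the JDK (`java.util.Timer` and `HttpURLConnection`), is sketched below. The base URL and the `command` parameter come from the request quoted above; the class `ImportScheduler`, its method names, the timeouts and the period are assumptions, and there is no retry logic or check of the import status.

```java
import java.io.IOException;
import java.net.HttpURLConnection;
import java.net.URI;
import java.util.Timer;
import java.util.TimerTask;

class ImportScheduler {
    // Builds the handler URL, e.g. http://localhost:8983/solr/dataimport?command=full-import
    static String importUrl(String solrBase, String command) {
        return solrBase + "/dataimport?command=" + command;
    }

    // Sends one GET request and returns the HTTP status code.
    static int trigger(String url) throws IOException {
        HttpURLConnection conn = (HttpURLConnection) URI.create(url).toURL().openConnection();
        conn.setRequestMethod("GET");
        conn.setConnectTimeout(5_000);   // assumed timeouts, adjust as needed
        conn.setReadTimeout(30_000);
        try {
            return conn.getResponseCode();
        } finally {
            conn.disconnect();
        }
    }

    // Fires the request immediately and then once every periodMillis.
    static Timer start(String solrBase, String command, long periodMillis) {
        final String url = importUrl(solrBase, command);
        Timer timer = new Timer("solr-dih-import");
        timer.scheduleAtFixedRate(new TimerTask() {
            @Override
            public void run() {
                try {
                    System.out.println(url + " -> HTTP " + trigger(url));
                } catch (IOException e) {
                    // Log and keep the schedule running; the next tick tries again.
                    System.err.println("Import request failed: " + e.getMessage());
                }
            }
        }, 0L, periodMillis);
        return timer;
    }
}
```

A call such as `ImportScheduler.start("http://localhost:8983/solr", "delta-import", 15 * 60 * 1000L)` from an application's main method would request a delta import every 15 minutes. The timer thread is not a daemon, so it keeps the JVM alive until `cancel()` is called on the returned Timer.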