What is the difference between SolrCell based Tika and Tika in Nutch?

2015-03-21 Thread zhangxin0804
Hi All, I am new to Solr. I have a question as follows: Is there any difference between extracting metadata using Tika in Nutch and extracting metadata using SolrCell-based Tika? I used both approaches to extract metadata from PDF files and PNG files, but the results are almost the same. Can anyone tell me abou

Re: SOLR indexing strategy

2015-03-21 Thread varun sharma
It's more of a financial message, where for each customer there are various fields that specify various aspects of the transaction. On Friday, 20 March 2015 8:09 PM, Priceputu Cristian wrote: Why would you need 1000 fields? C On Fri, Mar 20, 2015 at 1:12 PM, varun sharma wrote

Re: SOLR indexing strategy

2015-03-21 Thread varun sharma
1. All fields should be retrievable and are populated for each row, maybe with default values for some. 2. Out of 1000 fields, 10-15 need to be indexed. In our current proprietary solution, the index as well as the data files (compressed) reside together on SAN storage, and based on date range d

Re: SOLR indexing strategy

2015-03-21 Thread Jack Krupansky
Don't you have a number of "types" of transactions, where some fields may be common to all transactions, but with plenty of fields that are not common to all transactions? The point is that if the number of fields that need to be populated for each document type is relatively low, it becomes much m

Re: What is the difference between SolrCell based Tika and Tika in Nutch?

2015-03-21 Thread Erick Erickson
Well, they could be different versions of Tika, I don't know. You can tell this from the respective jars in the two projects. But more importantly, _how_ the fields from Nutch-based Tika map into Solr fields and how they're mapped in SolrCell may be different, but this would be because your configur

Re: What is the difference between SolrCell based Tika and Tika in Nutch?

2015-03-21 Thread Furkan KAMACI
Hi, Which versions of Solr and Nutch do you use? Nutch and Solr support Tika 1.7 in their recent versions. Kind Regards, Furkan KAMACI On Sat, Mar 21, 2015 at 6:46 PM, Erick Erickson wrote: > Well, they could be different versions of Tika, don't know. You can > tell this from the respective j

Need help using DIH with FileListEntityProcessor with XPathEntityProcessor

2015-03-21 Thread Martin Wunderlich
Hi all, I am trying to create a data import handler (DIH) to import XML files. The source XML should be transformed using XSLT into the standard Solr import format. I have tested the XSLT and successfully imported data using the Java-based simple import tool. However, when I try to import the

Re: Need help using DIH with FileListEntityProcessor with XPathEntityProcessor

2015-03-21 Thread Alexandre Rafalovitch
What do you mean by using DIH and XSLT together? DIH uses a basic XPath parser, but not full XSLT. So, it's not very clear what the question actually means. How did you configure it all? Regards, Alex. Solr Analyzers, Tokenizers, Filters, URPs and even a newsletter: http://www.solr-start.co