Solr 4? Do you mean the Solr 'trunk' source or the Solr 1.4.1 release? The 1.4.1 release does not have the TikaEntityProcessor, only the /extract code.
The Solr 3.x branch and the trunk have the TikaEP. I use the 3.x branch and, well, the TikaEP has a few problems but can be hacked around.

Whatever version of Tika is in the Solr release, it will only work with that Tika.

Lance

On Sun, Nov 28, 2010 at 10:33 PM, Darx Oman <darxo...@gmail.com> wrote:
> thanx Alexey
> I downloaded Solr 4 and implemented the TikaEntityProcessor, it worked fine
> with Tika 0.6.
> didn't work with Tika 0.7 nor Tika 0.8 SNAPSHOT
>
> On Sat, Nov 27, 2010 at 4:05 AM, Alexey Serba <ase...@gmail.com> wrote:
>
>> > 1- How to combine data from DIH and content extracted from file system
>> > document into one document in the index?
>> http://wiki.apache.org/solr/TikaEntityProcessor
>> You can have one sql entity that retrieves metadata from database and
>> another nested entity that parses binary file into additional fields
>> in the document.
>>
>> > 2- Should I move the per-user permissions into a separate index? What
>> > technique to implement?
>> I would start with keeping permissions in the same index as the actual
>> content.
>>
>> On Tue, Nov 23, 2010 at 11:35 AM, Darx Oman <darxo...@gmail.com> wrote:
>> > Hi guys
>> >
>> > I'm kind of new to solr and I'm wondering how to configure solr to best
>> > fulfills my requirements.
>> >
>> > Requirements are as follow:
>> >
>> > I have 2 data sources: database and file system documents. Every document in
>> > the file system has related information stored in the database. Both the
>> > file content and the related database fields must be indexed. Along with
>> > the DB data is per-user permissions for every document. I'm using DIH for
>> > the DB and Tika for the file System. The documents contents nearly never
>> > change, while the DB data especially the permissions changes very
>> > frequently. Total number of documents roughly around 2M and each document is
>> > about 500KB.
>> >
>> > 1- How to combine data from DIH and content extracted from file system
>> > document into one document in the index?
>> >
>> > 2- Should I move the per-user permissions into a separate index? What
>> > technique to implement?

--
Lance Norskog
goks...@gmail.com
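
For readers of the archive, Alexey's suggestion quoted above (one SQL entity for the database metadata, with a nested entity that parses the file) might look roughly like the DIH config below. This is only a sketch under assumptions: the table name, column names, JDBC driver and URL, credentials, and target field names are all hypothetical and not taken from this thread, and the exact attributes accepted can vary with the Solr and Tika versions in use. The config is wrapped in a small Python script purely so its structure can be checked; the helper `nested_entities` is likewise made up for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical DIH data-config.xml. The table, columns, JDBC settings and
# Solr field names are assumptions for illustration, not from the thread.
DATA_CONFIG = """\
<dataConfig>
  <dataSource name="db" type="JdbcDataSource"
              driver="org.postgresql.Driver"
              url="jdbc:postgresql://localhost/docs"
              user="solr" password="secret"/>
  <dataSource name="bin" type="BinFileDataSource"/>
  <document>
    <!-- Outer entity: one row of metadata per document from the database -->
    <entity name="doc" dataSource="db"
            query="SELECT id, title, file_path FROM documents">
      <field column="id" name="id"/>
      <field column="title" name="title"/>
      <!-- Nested entity: Tika parses the file named by the current row -->
      <entity name="file" processor="TikaEntityProcessor" dataSource="bin"
              url="${doc.file_path}" format="text">
        <field column="text" name="content"/>
      </entity>
    </entity>
  </document>
</dataConfig>
"""


def nested_entities(config_xml):
    """Return (parent, child, processor) for each entity nested directly in another."""
    root = ET.fromstring(config_xml)
    pairs = []
    for parent in root.iter("entity"):
        for child in parent.findall("entity"):
            pairs.append((parent.get("name"), child.get("name"), child.get("processor")))
    return pairs


if __name__ == "__main__":
    # Confirms the config is well-formed and shows the nesting.
    print(nested_entities(DATA_CONFIG))
```

Because both entities sit under one document, the database fields and the extracted text would end up in the same Solr document. Whether `${doc.file_path}` resolves as written may depend on how the JDBC driver reports column-name case.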