Thanks, Alexey. I downloaded Solr 4 and implemented the TikaEntityProcessor. It worked fine with Tika 0.6, but it didn't work with Tika 0.7 or the Tika 0.8 SNAPSHOT.
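
For anyone following the thread, here is a rough sketch of the nested-entity setup described in the quoted reply below: an outer SQL entity pulls the metadata from the database, and an inner TikaEntityProcessor entity parses the file each row points to. It is only an illustration under assumptions. The JDBC driver, connection URL, credentials, table name (documents), column names (id, title, file_path) and Solr field names (id, title, content) are all placeholders, and the exact case of the column reference in ${meta.file_path} depends on the database driver.

```xml
<dataConfig>
  <!-- Placeholder JDBC settings; replace with your own driver, URL and credentials -->
  <dataSource name="db" type="JdbcDataSource"
              driver="org.postgresql.Driver"
              url="jdbc:postgresql://localhost/docs"
              user="solr" password="changeme"/>
  <!-- Binary data source that the Tika entity reads file content from -->
  <dataSource name="bin" type="BinFileDataSource"/>

  <document>
    <!-- Outer entity: one row per document, carrying the DB metadata (hypothetical table and columns) -->
    <entity name="meta" dataSource="db"
            query="SELECT id, title, file_path FROM documents">
      <field column="id" name="id"/>
      <field column="title" name="title"/>

      <!-- Nested entity: parses the file referenced by the current row and adds its text to the same Solr document -->
      <entity name="file" processor="TikaEntityProcessor"
              dataSource="bin" url="${meta.file_path}" format="text">
        <field column="text" name="content"/>
      </entity>
    </entity>
  </document>
</dataConfig>
```

Because the Tika entity is nested inside the SQL entity, the extracted text and the database fields end up in one indexed document per row.
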
On Sat, Nov 27, 2010 at 4:05 AM, Alexey Serba <ase...@gmail.com> wrote:

> > 1- How to combine data from DIH and content extracted from file system
> > document into one document in the index?
> http://wiki.apache.org/solr/TikaEntityProcessor
> You can have one sql entity that retrieves metadata from database and
> another nested entity that parses binary file into additional fields
> in the document.
>
> > 2- Should I move the per-user permissions into a separate index? What
> > technique to implement?
> I would start with keeping permissions in the same index as the actual
> content.
>
>
> On Tue, Nov 23, 2010 at 11:35 AM, Darx Oman <darxo...@gmail.com> wrote:
> > Hi guys
> >
> > I'm kind of new to solr and I'm wondering how to configure solr to best
> > fulfills my requirements.
> >
> > Requirements are as follow:
> >
> > I have 2 data sources: database and file system documents. Every document in
> > the file system has related information stored in the database. Both the
> > file content and the related database fields must be indexed. Along with
> > the DB data is per-user permissions for every document. I'm using DIH for
> > the DB and Tika for the file System. The documents contents nearly never
> > change, while the DB data especially the permissions changes very
> > frequently. Total number of documents roughly around 2M and each document is
> > about 500KB.
> >
> > 1- How to combine data from DIH and content extracted from file system
> > document into one document in the index?
> >
> > 2- Should I move the per-user permissions into a separate index? What
> > technique to implement?