Christian,
This is interesting. I have always thought that Solr shouldn't
be in the business of parsing; that's the responsibility of the Solr client.
But what Peter suggested, adding a parsing capability to Solr
as a request handler, does make sense.
One thing that I noticed this approach ca
I can't find the documentation, but I believe Apache's maximum URL length is 8192,
so I would assume a lot of other apps like Tomcat and Jetty would be
similar. I haven't run into any problems yet.
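If request parameters travel on the query string, a client could check the encoded URL length before sending and switch to an HTTP POST once it gets too long. Below is a minimal sketch using only the Python standard library. The 8192 limit is the unverified estimate from the message above (the real limit depends on how Apache, Tomcat or Jetty is configured), and the Solr URL in the usage lines is a hypothetical local install.

```python
import sys
from urllib.parse import urlencode

# Assumed request-line ceiling, taken from the unverified "8192" estimate;
# the real value depends on the web server / servlet container config.
ASSUMED_MAX_URL = 8192


def build_get_url(base_url, params):
    """Return the full GET URL with the parameters URL-encoded."""
    return base_url + "?" + urlencode(params)


def needs_post(base_url, params, limit=ASSUMED_MAX_URL):
    """True when the encoded GET URL would be longer than the assumed limit."""
    return len(build_get_url(base_url, params)) > limit


if __name__ == "__main__":
    solr = "http://localhost:8983/solr/select"  # hypothetical local install
    print(needs_post(solr, {"q": "title:report"}))  # prints False
    print(needs_post(solr, {"q": "x" * 9000}))      # prints True
    sys.exit(0)
```

A POST body carries the same parameters without the request-line limit, which is why it is a common fallback for very long queries.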
Maybe shoot Eric an email and see if he would be interested in
adapting the code to take XML as well so that you
On 8/21/07, Vish D. <[EMAIL PROTECTED]> wrote:
>
> On 8/21/07, Peter Manis <[EMAIL PROTECTED]> wrote:
> >
> > I am a little confused about how you have things set up. So these metadata
> > files contain certain information, and there may or may not be a PDF,
> > XLS, or DOC file associated with them?
On 8/21/07, Peter Manis <[EMAIL PROTECTED]> wrote:
>
> I am a little confused about how you have things set up. So these metadata
> files contain certain information, and there may or may not be a PDF,
> XLS, or DOC file associated with them?
Yes, you have it right.
If that is the case, if it were me, I w
I am a little confused about how you have things set up. So these metadata
files contain certain information, and there may or may not be a PDF,
XLS, or DOC file associated with them?
If that is the case, if it were me, I would write something to parse
the metadata files, and if there is a binary file asso
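The approach Peter starts to describe (parse the metadata files, then deal with any associated binary file) could look like this on the client side. This is a sketch under several assumptions: each metadata file has already been parsed into a Python dict, text extraction from the PDF/XLS/DOC file happens elsewhere with whatever parser you choose, and the field names (including the default `text` body field) are hypothetical and must match your own schema.

```python
import sys
import xml.etree.ElementTree as ET


def build_add_xml(metadata, full_text=None, text_field="text"):
    """Build a Solr <add><doc> update message from one metadata record.

    metadata   -- field name -> value, or a list of values for multi-valued fields
    full_text  -- text already extracted from the associated binary file, if any
    text_field -- hypothetical schema field that holds the extracted body text
    """
    add = ET.Element("add")
    doc = ET.SubElement(add, "doc")
    for name, value in metadata.items():
        values = value if isinstance(value, (list, tuple)) else [value]
        for v in values:
            field = ET.SubElement(doc, "field", name=name)
            field.text = str(v)  # ElementTree escapes &, < and > for us
    # Only add the body field when a binary file was found and parsed.
    if full_text:
        field = ET.SubElement(doc, "field", name=text_field)
        field.text = full_text
    return ET.tostring(add, encoding="unicode")


if __name__ == "__main__":
    record = {"id": "doc-42", "title": "Quarterly report", "author": ["A", "B"]}
    print(build_add_xml(record, full_text="text pulled from the PDF"))
    sys.exit(0)
```

The resulting string is the body you would POST to Solr's XML update handler (typically `/solr/update`), followed by a `<commit/>` message once a batch is done.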
Pete,
Thanks for the great explanation.
Thinking through my process, I am not sure how to use it:
I have a bunch of docs that pretty much contain a lot of metadata, some of
which include full-text files (.pdf, .ppt, etc.). I currently use these docs
to index/update into Solr. The next step no
Installing the patch requires downloading the latest Solr via
Subversion and applying the patch to the source. Eric has updated his
patch against various Subversion revisions. To make sure it will
compile, I suggest checking out the revision he lists.
As for using the features of this patch, this is
There seems to be some code out for Tika now (not packaged/announced yet,
but...). Could someone please take a look at it and see if that could fit
in? I am eagerly waiting for a reply back from tika-dev, but no luck yet.
http://svn.apache.org/repos/asf/incubator/tika/trunk/src/main/java/org/apach
Christian,
Eric Pugh implemented this functionality for a project we were
doing and has released the code on JIRA. We have had very good results
with it. If I can be of any help using it beyond the Java code itself,
let me know. The last revision I used with it was 552853, so if the
build
Hi Solr Users,
I have set up a Solr server with a custom schema.
I have already updated the index with some content from
XML files.
Now I am trying to add the contents of a folder to the index.
The folder consists of various document types
(PDF, DOC, XLS, ...).
Is there a how-to anywhere on how I can parse the
documents,
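A first step toward what Christian asks is walking the folder and sorting the files by type, so that each one can be handed to a suitable parser (Eric Pugh's patch and the Tika code are the candidates discussed in this thread). The following is a minimal standard-library sketch of that step only. The extension-to-type table is a hypothetical example, and the actual parsing and indexing are not shown.

```python
import os
import sys
from collections import defaultdict

# Hypothetical mapping from file extension to document type; extend it with
# whatever formats your chosen parser supports.
DOC_TYPES = {".pdf": "pdf", ".doc": "word", ".xls": "excel", ".ppt": "powerpoint"}


def collect_documents(folder):
    """Walk a folder tree and group file paths by document type.

    Files with an unknown extension are grouped under "unsupported" so they
    can be logged instead of being silently skipped.
    """
    groups = defaultdict(list)
    for dirpath, _dirnames, filenames in os.walk(folder):
        for filename in sorted(filenames):
            ext = os.path.splitext(filename)[1].lower()  # ".DOC" -> ".doc"
            doc_type = DOC_TYPES.get(ext, "unsupported")
            groups[doc_type].append(os.path.join(dirpath, filename))
    return dict(groups)


if __name__ == "__main__":
    root = sys.argv[1] if len(sys.argv) > 1 else "."
    for doc_type, paths in sorted(collect_documents(root).items()):
        print(doc_type, len(paths))
```

Grouping first makes it easy to see how many files fall outside the formats your parser handles before you start sending anything to Solr.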