Re: Indexing Doc, PDF, ... from filesystem (Newbie Question)

2007-08-21 Thread Peter Manis
ll so that you can just include the file location in the XML. - Pete On 8/21/07, Vish D. <[EMAIL PROTECTED]> wrote: > On 8/21/07, Peter Manis <[EMAIL PROTECTED]> wrote: > > > > I am a little confused about how you have things set up, so these metadata > > files contain cert

Re: Indexing Doc, PDF, ... from filesystem (Newbie Question)

2007-08-21 Thread Peter Manis
What do you think? I am not a true Java developer, so not sure if I could > do it myself, but only hope that someone else on the project could ;-)... > > Rao > > On 8/21/07, Peter Manis <[EMAIL PROTECTED]> wrote: > > > > Installing the patch requires downloading the l

Re: Indexing Doc, PDF, ... from filesystem (Newbie Question)

2007-08-21 Thread Peter Manis
> seen too many duplicated efforts, even in Apache projects alone, and this is > one step closer to fixing it (other than Tika, which isn't 'complete' yet). > Are there any plans to release this patch with the Solr dist? Or any > instructions on using/installing the patch its

Re: Indexing Doc, PDF, ... from filesystem (Newbie Question)

2007-08-21 Thread Peter Manis
Christian, Eric Pugh implemented this functionality for a project we were doing and has released the code on JIRA. We have had very good results with it. If I can be of any help using it beyond the Java code itself, let me know. The last revision I used with it was 552853, so if the build

Re: Indexing large documents

2007-08-20 Thread Peter Manis
TED]> wrote: > Well, I am using the Java textmining library to extract text from documents, > then I do a post to solr > I do not have an error log, I only have *.request.log files in the logs > directory > > Thanks > > On 8/20/07, Peter Manis <[EMAIL PROTECTED]> w
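Posting extracted text to Solr, as described in the quoted message, amounts to sending an XML `<add>` message to the update handler. Below is a minimal Python sketch, assuming a stock example install reachable at `http://localhost:8983/solr/update` and a schema with `id` and `text` fields (both assumptions; adjust them to your own schema). Text extraction itself is out of scope here; the `text` argument stands in for whatever the extraction library returns.

```python
import urllib.request
import xml.etree.ElementTree as ET


def build_add_xml(doc_id, text, extra_fields=None):
    """Build a Solr XML update message (<add><doc>...</doc></add>) for one document.

    The field names "id" and "text" are assumptions and must match schema.xml.
    ElementTree escapes characters such as < and & automatically.
    """
    add = ET.Element("add")
    doc = ET.SubElement(add, "doc")
    fields = {"id": doc_id, "text": text}
    if extra_fields:
        fields.update(extra_fields)
    for name, value in fields.items():
        field = ET.SubElement(doc, "field", name=name)
        field.text = str(value)
    return ET.tostring(add, encoding="unicode")


def post_to_solr(body, url="http://localhost:8983/solr/update"):
    """POST an XML update message to Solr; the default URL is an assumption."""
    req = urllib.request.Request(
        url,
        data=body.encode("utf-8"),
        headers={"Content-Type": "text/xml; charset=utf-8"},
    )
    with urllib.request.urlopen(req) as resp:
        return resp.read().decode("utf-8")
```

After a batch of adds, post `<commit/>` the same way so the new documents become searchable.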

Re: Indexing large documents

2007-08-20 Thread Peter Manis
Fouad, I would check the error log or console for any possible errors first. They may not show up; it really depends on how you are processing the Word document (custom Solr, feeding the text to it, etc.). We are using a custom version of Solr with PDF, DOC, XLS, etc. text extraction and I have suc

Re: Re[4]: Start up script for solr?

2007-08-19 Thread Peter Manis
Sorry about that, I left out the second dash when I added it to the blog. Glad it is working now. On 8/19/07, Jack L <[EMAIL PROTECTED]> wrote: > Actually it's --stop. Thanks! > > > Interesting, it worked fine on the server. Try moving the -stop at > > the end of the line to just before the -jar. >

Re: Re[2]: Start up script for solr?

2007-08-19 Thread Peter Manis
Interesting, it worked fine on the server. Try moving the -stop at the end of the line to just before the -jar. - Pete On 8/19/07, Jack L <[EMAIL PROTECTED]> wrote: > Hello Peter, > > Many thanks! > > solr.start works fine but I'm getting an error with solr.stop and solr is not > being stopped:

Re: Start up script for solr?

2007-08-19 Thread Peter Manis
I forgot to mention that this is for a RHEL box, but it can easily be adapted. It works like the standard RHEL init scripts (/etc/init.d/solr start, /etc/init.d/solr stop, /etc/init.d/solr restart), or you can just run the solr.start and solr.stop scripts individually. On 8/19/07, Peter Manis <[EM
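The start/stop/restart behaviour described above boils down to a small dispatcher over the two scripts. The script in the thread is a RHEL shell init script (see the linked blog post); the Python below is only an illustrative sketch of the same dispatch logic, and the paths in `SCRIPTS` are hypothetical.

```python
import subprocess
import sys

# Hypothetical locations of the two helper scripts; adjust to your install.
SCRIPTS = {
    "start": ["/usr/local/solr/solr.start"],
    "stop": ["/usr/local/solr/solr.stop"],
}


def plan(action):
    """Return the ordered list of scripts to run for an init-style action."""
    if action == "restart":
        # restart = stop, then start
        return SCRIPTS["stop"] + SCRIPTS["start"]
    if action in SCRIPTS:
        return list(SCRIPTS[action])
    raise ValueError(f"usage: solr {{start|stop|restart}} (got {action!r})")


def main(argv):
    """Run the scripts for argv[1]; return a shell-style exit status."""
    if len(argv) != 2:
        print("usage: solr {start|stop|restart}", file=sys.stderr)
        return 2
    try:
        steps = plan(argv[1])
    except ValueError as err:
        print(err, file=sys.stderr)
        return 2
    for script in steps:
        # check=True stops the sequence if a script exits non-zero
        subprocess.run([script], check=True)
    return 0
```

An entry point would call `sys.exit(main(sys.argv))`; keeping `plan()` separate makes the restart ordering easy to check without touching a running server.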

Re: Start up script for solr?

2007-08-19 Thread Peter Manis
I blogged about it last month, here ya go. http://www.digital39.com/programming/solr-chkconfig-and-startstop-scripts/2007/07/304/ - Pete On 8/19/07, Jack L <[EMAIL PROTECTED]> wrote: > Hi, > > Sorry that this is not strictly a solr specific question - > > I wonder if anyone has a script to star

Re: most popular/most commonly accessed records

2007-07-06 Thread Peter Manis
Maybe add a snippet of code to the video information page that increments a counter in a database (sqlite, mysql, etc.) whenever the page is accessed from search results. You can then update Solr every so often (daily, hourly, twice a day, etc.) and include the hits. This would the
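The counter-then-batch-update idea above can be sketched with Python's built-in `sqlite3`. Everything here is illustrative: the `hits` table, the `hits` Solr field, and the `from_search` flag (however the page detects that it was reached from search results) are hypothetical names. One caveat: Solr of this era has no partial updates, so re-posting a document replaces it entirely. A real job would merge the hit count into the full record before posting; this sketch only shows the counter and the counts to send.

```python
import sqlite3
import xml.etree.ElementTree as ET


def open_db(path=":memory:"):
    """Open (or create) the hypothetical hit-counter database."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS hits ("
        "doc_id TEXT PRIMARY KEY, count INTEGER NOT NULL)"
    )
    return conn


def record_hit(conn, doc_id, from_search):
    """Increment the counter, but only for views that came from search results."""
    if not from_search:
        return
    conn.execute("INSERT OR IGNORE INTO hits (doc_id, count) VALUES (?, 0)", (doc_id,))
    conn.execute("UPDATE hits SET count = count + 1 WHERE doc_id = ?", (doc_id,))
    conn.commit()


def hits_update_xml(conn):
    """Build one <add> message carrying the current counts (run daily, hourly, etc.)."""
    add = ET.Element("add")
    for doc_id, count in conn.execute("SELECT doc_id, count FROM hits ORDER BY doc_id"):
        doc = ET.SubElement(add, "doc")
        ET.SubElement(doc, "field", name="id").text = doc_id
        ET.SubElement(doc, "field", name="hits").text = str(count)
    return ET.tostring(add, encoding="unicode")
```

Once a `hits` field is indexed, results could be ranked by popularity with a sort along the lines of `sort=hits desc`.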

Re: Indexing HTML and other doc types

2007-07-05 Thread Peter Manis
I guess I misread your original question. I believe Nutch would be the choice for crawling; however, I do not know about its abilities for indexing other document types. If you needed to index multiple document types such as PDF, DOC, etc., and Nutch does not provide functionality to do so you woul

Re: Indexing HTML and other doc types

2007-07-04 Thread Peter Manis
ll be fixed in a few new revisions. Peter Manis On 7/3/07, Teruhiko Kurosaka <[EMAIL PROTECTED]> wrote: Solr looks very good for indexing and searching structured data. But I noticed there is no tool in the Solr distribution with which documents of other doc types can be indexed. Are