: What's a GA release?
http://en.wikipedia.org/wiki/Software_release_life_cycle#General_availability
-Hoss
--
http://lucenerevolution.org/ ... October 7-8, Boston
http://bit.ly/stump-hoss ... Stump The Chump!
> On 9/23/2010 6:52 AM, mehdi.es...@gmail.com
> wrote:
> >> Hi,
> >> I have exactly the same problem as the one you
> submitted in this link
> http://lucene.472066.n3.nabble.com/Data-Import-Handler-Rich-Format-Documents-td905478.html
> and I would like to ask
y name the
parser for the file format you want to use.
https://issues.apache.org/jira/browse/SOLR-2116
Tod wrote:
On 9/23/2010 6:52 AM, mehdi.es...@gmail.com wrote:
Hi,
I have exactly the same problem as the one you submitted in this
link
http://lucene.472066.n3.nabble.com/Data-Import-Ha
On 9/23/2010 6:52 AM, mehdi.es...@gmail.com wrote:
Hi,
I have exactly the same problem as the one you submitted in this link
http://lucene.472066.n3.nabble.com/Data-Import-Handler-Rich-Format-Documents-td905478.html
and I would like to ask you if you got a solution for that.
I started to
On 6/28/2010 8:28 AM, Alexey Serba wrote:
Ok, I'm trying to integrate the TikaEntityProcessor as suggested. I'm using
Solr Version: 1.4.0 and getting the following error:
java.lang.ClassNotFoundException: Unable to load BinURLDataSource or
org.apache.solr.handler.dataimport.BinURLDataSource
It
> Ok, I'm trying to integrate the TikaEntityProcessor as suggested. I'm using
> Solr Version: 1.4.0 and getting the following error:
>
> java.lang.ClassNotFoundException: Unable to load BinURLDataSource or
> org.apache.solr.handler.dataimport.BinURLDataSource
It seems that DIH-Tika integration is
On 6/18/2010 2:42 PM, Chris Hostetter wrote:
: > I don't think DIH can do that, but who knows, let's see what others say.
: Looks like the ExtractingRequestHandler uses Tika as well. I might just use
: this but I'm wondering if there will be a large performance difference between
: using it to
You are right. It seems TikaEntityProcessor is exactly the tool you
need in this case.
Alex
On Sat, Jun 19, 2010 at 2:59 AM, Chris Hostetter
wrote:
> : I think you can use existing ExtractingRequestHandler to do the job,
> : i.e. add child entity to your DIH metadata
>
> why would you do this in
: I think you can use existing ExtractingRequestHandler to do the job,
: i.e. add child entity to your DIH metadata
why would you do this instead of using the TikaEntityProcessor as i
already suggested in my earlier mail?
-Hoss
I think you can use existing ExtractingRequestHandler to do the job,
i.e. add child entity to your DIH metadata
http://localhost:8983/solr/update/extract?extractOnly=true&wt=xml&indent=on&stream.url=${metadata.url}";
dataSource="solr">
That's not a working example, just basic
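
A minimal sketch of the request URL that the snippet above points at, for anyone who wants to try the extractOnly call by hand before wiring it into DIH. The query parameters (extractOnly, wt, indent, stream.url) are taken from the message; the helper name build_extract_url, the default base URL, and the example document URL are assumptions made for illustration. It also assumes remote streaming is enabled on the Solr side, which stream.url needs.

```python
from urllib.parse import urlencode


def build_extract_url(document_url, solr_base="http://localhost:8983/solr"):
    """Build an extractOnly request URL for Solr's /update/extract handler.

    build_extract_url and solr_base are hypothetical names for this sketch;
    the query parameters are the ones quoted in the message above.
    """
    params = [
        ("extractOnly", "true"),       # return the extracted text, do not index it
        ("wt", "xml"),                 # XML response, as in the original URL
        ("indent", "on"),
        ("stream.url", document_url),  # document that Solr should fetch and parse
    ]
    # urlencode percent-encodes the document URL so its own ':' and '/'
    # characters cannot be confused with the outer query string.
    return solr_base + "/update/extract?" + urlencode(params)


if __name__ == "__main__":
    # Hypothetical document URL standing in for ${metadata.url}.
    print(build_extract_url("http://example.com/docs/report.pdf"))
```

The sketch percent-encodes the document URL; the message above substitutes ${metadata.url} into the query string as-is, so the two may differ when the stored URLs contain characters such as '&' or spaces.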
On Fri, Jun 18, 2010 at 2:42 PM, Chris Hostetter
wrote:
> I'm confused ... You're using DIH, and some of your fields are URLs to
> documents that you want to parse with Tika?
>
> Why would you need a custom Transformer?
Yeah, I can definitely vouch that DIH can handle this without
additional codi
: > I don't think DIH can do that, but who knows, let's see what others say.
: Looks like the ExtractingRequestHandler uses Tika as well. I might just use
: this but I'm wondering if there will be a large performance difference between
: using it to batch content in over rolling my own Transform
On 6/18/2010 11:24 AM, Otis Gospodnetic wrote:
Tod,
I don't think DIH can do that, but who knows, let's see what others say.
Yes, Nutch uses TIKA, too.
Otis
Looks like the ExtractingRequestHandler uses Tika as well. I might just
use this but I'm wondering if there will be a large performan
> To: solr-user@lucene.apache.org
> Sent: Fri, June 18, 2010 10:20:34 AM
> Subject: Re: Data Import Handler Rich Format Documents
>
> On 6/18/2010 9:12 AM, Otis Gospodnetic wrote:
> Tod,
>
> You
> didn't mention Tika, which makes me think you are not aware of it...
> You
lazy and trying
to see if a method of doing this has been incorporated into the latest
Solr release so I can avoid coding for it.
- Original Message
From: Tod
To: solr-user@lucene.apache.org
Sent: Fri, June 18, 2010 8:51:02 AM
Subject: Data Import Handler Rich Format Document
.org
> Sent: Fri, June 18, 2010 8:51:02 AM
> Subject: Data Import Handler Rich Format Documents
>
> I have a database containing Metadata from a content management system.
> Part of that data includes a URL pointing to the actual published document
> which
> can be an HTML fi
I have a database containing Metadata from a content management system.
Part of that data includes a URL pointing to the actual published
document which can be an HTML file or a PDF, MS Word/Excel/PowerPoint, etc.
I'm already indexing the Metadata and that provides a lot of value. The
custom