date:20070207

crawler feed?

2007-02-07 Thread rubdabadub

Hi: Are there any relatively stand-alone crawlers that are suitable/customizable for Solr? Has anyone done any trials? I have seen some discussion about the Cocoon crawler. Was that successful? Regards

Re: crawler feed?

2007-02-07 Thread Thorsten Scherler

On Wed, 2007-02-07 at 11:09 +0100, rubdabadub wrote:
> Hi:
>
> Are there relatively stand-alone crawlers that are
> suitable/customizable for Solr? Has anyone done any trials? I have
> seen some discussion about the Cocoon crawler. Was that successful?
http://wiki.apache.org/solr/SolrForrest I am

Re: crawler feed?

2007-02-07 Thread rubdabadub

Thorsten: Thank you very much for the update.
On 2/7/07, Thorsten Scherler <[EMAIL PROTECTED]> wrote:
> On Wed, 2007-02-07 at 11:09 +0100, rubdabadub wrote:
> > Hi:
> >
> > Are there relatively stand-alone crawlers that are
> > suitable/customizable for Solr? Has anyone done any trials? I have
> > seen som

Re: Debugging Solr memory usage/heap problems

2007-02-07 Thread Otis Gospodnetic

To help find leaks I had good luck with jmap and even jhat in Java 1.6.
Otis
- Original Message
From: Graham Stead <[EMAIL PROTECTED]>
To: solr-user@lucene.apache.org
Sent: Wednesday, February 7, 2007 12:49:47 AM
Subject: RE: Debugging Solr memory usage/heap problems
Thanks, Chris. I w

Re: Analyzers and Tokenizers?

2007-02-07 Thread Bill Au

FYI, I have added that link to the Wiki. Bill On 2/6/07, rubdabadub <[EMAIL PROTECTED]> wrote: Thanks Thorsten! On 2/6/07, Thorsten Scherler <[EMAIL PROTECTED]> wrote: > On Tue, 2007-02-06 at 17:27 +0100, rubdabadub wrote: > > Hi: > > > > Are there more filters/tokenizers than the ones ment

Re: crawler feed?

2007-02-07 Thread Sami Siren

rubdabadub wrote:
> Hi:
>
> Are there relatively stand-alone crawlers that are
> suitable/customizable for Solr? Has anyone done any trials? I have
> seen some discussion about the Cocoon crawler. Was that successful?
There's also an integration path available for Nutch[1] that I plan to integrate aft

facet optimizing

2007-02-07 Thread Gunther, Andrew

Any suggestions on how to optimize the loading of facets? My index is roughly 35,000 and I am asking Solr to return six facet fields on every query. On large result sets with facet params set to false, searching is zippy, but when set to true, and facet fields designated, it takes some time to
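For readers following the thread, a request of the kind described here can be sketched as below. This is a minimal illustration, not Andrew's actual query: the six field names and the search term are hypothetical, while `facet` and `facet.field` are the standard Solr faceting parameters.

```python
from urllib.parse import urlencode

# Hypothetical field names; the original post does not list them.
FACET_FIELDS = ["subject", "author", "format", "language", "era", "region"]


def build_facet_query(q, facet_fields, rows=10):
    """Return a Solr query string that requests one facet per field."""
    params = [("q", q), ("rows", rows), ("facet", "true")]
    # facet.field is repeated once for each field being faceted on.
    params += [("facet.field", name) for name in facet_fields]
    return urlencode(params)


# "history" is an arbitrary example search term.
query_string = build_facet_query("history", FACET_FIELDS)
print(query_string)
```

Each `facet.field` adds work on every query, so dropping a field from the list is the quickest way to see which facet is expensive.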

Re: facet optimizing

2007-02-07 Thread Erik Hatcher

How many unique values do you have for those 6 fields? And are those fields multiValued or not? Single valued facets are much faster (though not realistic in my domain). Lots of values per field do not good facets make. Erik On Feb 7, 2007, at 11:10 AM, Gunther, Andrew wrote:

Re: crawler feed?

2007-02-07 Thread rubdabadub

This is really interesting. You mean to say I could give the patch a try now, i.e. the patch in the blog post :-) I am looking forward to it. I hope it will be standalone, i.e. you don't need "the whole Nutch" to get a standalone crawler working. I am not sure if this is how you planned. Regards

RE: facet optimizing

2007-02-07 Thread Gunther, Andrew

Yes, almost all terms are multi-valued, which I can't avoid. Since the data is coming from a library catalogue I am translating a subject field to make a subject facet. That facet alone is the biggest, hovering near 39k. If I remove this facet.field things return faster. So am I to assume that this p

cache warming optimization

2007-02-07 Thread Erik Hatcher

I'm interested in improving my existing custom cache warming by being selective about what updates rather than rebuilding completely. How can I tell what documents were updated/added/deleted from the old cache to the new IndexSearcher? Thanks, Erik

Re: crawler feed?

2007-02-07 Thread rubdabadub

Hi: Just want to say that my tiny experiment with Sami's Solr/Nutch integration worked :-!) Super thanks for the pointer. Which leads me to write the following.. It would be great if I could use this in my current project. This way I can eliminate my current python based aggregator/crawler whic

Re: cache warming optimization

2007-02-07 Thread Walter Underwood

On 2/7/07 10:04 AM, "Erik Hatcher" <[EMAIL PROTECTED]> wrote:
> I'm interested in improving my existing custom cache warming by being
> selective about what updates rather than rebuilding completely.
>
> How can I tell what documents were updated/added/deleted from the old
> cache to the new Inde

Re: facet optimizing

2007-02-07 Thread Andrew Nagy

Gunther, Andrew wrote: Yes most all terms are multi-valued which I can't avoid. Since the data is coming from a library catalogue I am translating a subject field to make a subject facet. That facet alone is the biggest, hovering near 39k. If I remove this facet.field things return faster. So a

Re: facet optimizing

2007-02-07 Thread Mike Klaas

On 2/7/07, Gunther, Andrew <[EMAIL PROTECTED]> wrote: Yes most all terms are multi-valued which I can't avoid. Since the data is coming from a library catalogue I am translating a subject field to make a subject facet. That facet alone is the biggest, hovering near 39k. If I remove this facet.f

Re: cache warming optimization

2007-02-07 Thread Chris Hostetter

: I'm interested in improving my existing custom cache warming by being
: selective about what updates rather than rebuilding completely.
:
: How can I tell what documents were updated/added/deleted from the old
: cache to the new IndexSearcher?
Cache warming in Solr is based mainly around the i

Re: facet optimizing

2007-02-07 Thread Chris Hostetter

: Andrew, I haven't yet found a successful way to implement the SOLR
: faceting for library catalog data. I developed my own system, so for
Just to clarify: the "out of the box" faceting support Solr has at the moment is very deliberately referred to as "SimpleFacets" ... it's intended to solve S

Re: cache warming optimization

2007-02-07 Thread karl wettin

On 7 Feb 2007, at 19:04, Erik Hatcher wrote:
> I'm interested in improving my existing custom cache warming by being selective about what updates rather than rebuilding completely.
I know it is not Solr, but I've made great progress on my cache that updates affected results only, on insert and d

Re: facet optimizing

2007-02-07 Thread Yonik Seeley

On 2/7/07, Gunther, Andrew <[EMAIL PROTECTED]> wrote:
> Any suggestions on how to optimize the loading of facets? My index is roughly 35,000
35,000 documents? That's not that big.
> and I am asking Solr to return six facet fields on every query. On large result sets with facet params set to

Re: facet optimizing

2007-02-07 Thread Erik Hatcher

On Feb 7, 2007, at 4:42 PM, Yonik Seeley wrote: Solr relies on the filter cache for faceting, and if it's not big enough you're going to get a near 0% hit rate. Check the statistics page and make sure there aren't any evictions after you do a query with facets. If there are, make the cache lar
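Yonik's point about evictions can be illustrated with a small simulation. It is only a sketch under stated assumptions: it models the filter cache as a plain LRU map, assumes every facet value is looked up once per query, and uses a hypothetical cache size of 512 against the roughly 39k subject values mentioned earlier in the thread. None of this is Solr code.

```python
from collections import OrderedDict


def simulate_hit_rate(cache_size, unique_values, queries):
    """Simulate an LRU cache asked for every facet value on each query.

    Returns (hit_rate, evictions).
    """
    cache = OrderedDict()
    hits = lookups = evictions = 0
    for _ in range(queries):
        for value in range(unique_values):
            lookups += 1
            if value in cache:
                hits += 1
                cache.move_to_end(value)  # mark as most recently used
            else:
                cache[value] = True
                if len(cache) > cache_size:
                    cache.popitem(last=False)  # evict least recently used
                    evictions += 1
    return hits / lookups, evictions


# Hypothetical sizes: a cache smaller than the value count, then a larger one.
too_small = simulate_hit_rate(cache_size=512, unique_values=39000, queries=3)
big_enough = simulate_hit_rate(cache_size=40000, unique_values=39000, queries=3)
print(too_small, big_enough)
```

In this model the undersized cache never scores a hit, because each pass over the values pushes out every entry before it is needed again, while the larger cache has no evictions and hits on every lookup after the first query.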

Re: crawler feed?

2007-02-07 Thread Thorsten Scherler

On Wed, 2007-02-07 at 18:03 +0200, Sami Siren wrote:
> rubdabadub wrote:
> > Hi:
> >
> > Are there relatively stand-alone crawlers that are
> > suitable/customizable for Solr? Has anyone done any trials? I have
> > seen some discussion about the Cocoon crawler. Was that successful?
>
> There's also

Re: facet optimizing

2007-02-07 Thread Ryan McKinley

Are there any simple automatic tests we can run to see what fields would support fast faceting? Is it just that the cache size needs to be bigger than the number of distinct values for a field? If so, it would be nice to add an /admin page that lists each field, the distinct value count and a gre

Re: facet optimizing

2007-02-07 Thread Chris Hostetter

: Is it just that the cache size needs to be bigger than the number of
: distinct values for a field?
Basically yes, but the cache is going to be used for all filters -- not just those for a single facet (so your cache might be big enough that faceting on fieldA or fieldB is fine, but if you face
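The sizing rule discussed here can be written down as a quick check. This is a rough sketch, assuming one cached filter per distinct facet value and ignoring any other filters that share the cache; the field names come from the message, but the counts and the cache size are hypothetical.

```python
def cache_fits(cache_size, distinct_values_by_field):
    """Return (fits, needed): needed is the filter count across all faceted fields."""
    needed = sum(distinct_values_by_field.values())
    return needed <= cache_size, needed


# Hypothetical numbers: each field fits alone, but not both together.
only_a = cache_fits(512, {"fieldA": 300})
only_b = cache_fits(512, {"fieldB": 400})
both = cache_fits(512, {"fieldA": 300, "fieldB": 400})
print(only_a, only_b, both)
```

With these numbers either field alone fits in the cache, while faceting on both needs 700 entries and does not.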

RE: facet optimizing

2007-02-07 Thread Binkley, Peter

In the library subject heading context, I wonder if a layered approach would bring performance into the acceptable range. Since Library of Congress Subject Headings break into standard parts, you could have first-tier facets representing the main heading, second-tier facets with the main heading an
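The layered idea can be sketched in a few lines. This is an illustration only, not Peter's implementation: it assumes headings are split into parts on "--", counts first-tier headings for a result set, and only counts second-tier headings under first-tier headings that pass a hypothetical `min_count` threshold. The sample headings other than "Body, Human" and "Body, Human--Social aspects" are made up.

```python
from collections import Counter


def tiered_counts(headings, min_count=2):
    """Count first-tier headings, then second-tier ones under frequent first tiers."""
    first = Counter(h.split("--", 1)[0] for h in headings)
    # Only descend into first-tier headings that occur often enough.
    wanted = {main for main, count in first.items() if count >= min_count}
    second = Counter()
    for heading in headings:
        parts = heading.split("--")
        if len(parts) > 1 and parts[0] in wanted:
            second["--".join(parts[:2])] += 1
    return first, second


sample = [
    "Body, Human",
    "Body, Human--Social aspects",
    "Body, Human--Social aspects--History",
    "Cookery--France",
]
first_tier, second_tier = tiered_counts(sample)
print(first_tier, second_tier)
```

Here "Cookery" appears only once, so its second-tier heading is never counted. Skipping those lookups is where the saving would come from.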

RE: facet optimizing

2007-02-07 Thread Chris Hostetter

: headings from a given result set, you'd first test all the first-tier
: facets like "Body, Human", then where warranted test the associated
: second-tier facets like "Body, Human--Social aspects.". If the
: first-tier facets represent a small enough subset of the set of subject
: headings as a w

Re: facet optimizing

2007-02-07 Thread rubdabadub

Hi: when you start talking about really large data sets, with an extremely large volume of unique field values for fields you want to facet on, then "generic" solutions stop being very feasible, and you have to start looking at solutions more tailored to your dataset. At CNET, when dealing with

Re: facet optimizing

2007-02-07 Thread Yonik Seeley

On 2/7/07, Binkley, Peter <[EMAIL PROTECTED]> wrote: In the library subject heading context, I wonder if a layered approach would bring performance into the acceptable range. Since Library of Congress Subject Headings break into standard parts, you could have first-tier facets representing the ma

Re: facet optimizing

2007-02-07 Thread Erik Hatcher

Yonik - I like the way you think Yeah! It's turtles (err, trees) all the way down. Erik /me Pulling the Algorithms book off my shelf so I can vaguely follow along. On Feb 7, 2007, at 8:22 PM, Yonik Seeley wrote: On 2/7/07, Binkley, Peter <[EMAIL PROTECTED]> wrote: In the

Re: facet optimizing

2007-02-07 Thread Yonik Seeley

On 2/7/07, Erik Hatcher <[EMAIL PROTECTED]> wrote:
> Yonik - I like the way you think Yeah! It's turtles (err, trees) all the way down.
Heh... I'm still thinking/brainstorming about it... it only helps if you can effectively prune though. Each node in the tree could also keep the max d

29 matches
