Re: How to get a stack trace

2009-07-31 Thread Otis Gospodnetic
Nicolae, You may be able to figure things out from the heap dump. You'll need to start the JVM like this, for example: java -XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=/tmp/heap ... Otis -- Sematext is hiring -- http://sematext.com/about/jobs.html?mls Lucene, Solr, Nutch, Katta, Hado
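Here is a minimal sketch of the launch command Otis describes, built as an argument list. The two -XX flags come from his message. The trailing `-jar start.jar` is an assumption: it stands in for Solr's bundled Jetty example and should be replaced with however your container is actually started.

```python
import shlex

def solr_launch_args(heap_dump_dir="/tmp/heap", main=("-jar", "start.jar")):
    """Return a java argv that writes a heap dump if the JVM runs out of memory.

    The -XX flags are the ones from the message above. `main` defaults to
    "-jar start.jar" (an assumption: Solr's bundled Jetty example).
    """
    return [
        "java",
        "-XX:+HeapDumpOnOutOfMemoryError",    # write an .hprof file on OutOfMemoryError
        f"-XX:HeapDumpPath={heap_dump_dir}",  # directory or file path for the dump
        *main,
    ]


if __name__ == "__main__":
    # Print a shell-ready command; pass the list to subprocess.run() to start the JVM.
    print(shlex.join(solr_launch_args()))
```

Once the JVM dies with an OutOfMemoryError, the dump written under the configured path can be opened in a heap analyzer to see what was filling memory.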

Re: 99.9% uptime requirement

2009-07-31 Thread Otis Gospodnetic
Robi, Solr is indeed very stable. However, it can crash and I've seen it crash. Or rather, I should say I've seen the JVM that runs Solr crash. For instance, if you have a servlet container with a number of webapps, one of which is Solr, and one of which has a memory leak, I believe all weba

Re: dealing with duplicates

2009-07-31 Thread Otis Gospodnetic
Joe, Maybe we can take a step back first. Would it be better if your index was cleaner and didn't have flagged duplicates in the first place? If so, have you tried using http://wiki.apache.org/solr/Deduplication ? Otis
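The Deduplication page Otis links to describes Solr computing a signature for each incoming document from selected fields. To make the idea concrete, here is a client-side sketch. It is not Solr's implementation: it hashes a few normalized fields with MD5 and groups documents that share a signature. The field names description, tags and meta are borrowed from Joe's original post, and the case/whitespace normalization is an assumption.

```python
import hashlib
from collections import defaultdict

def signature(doc, fields=("description", "tags", "meta")):
    """Hash the chosen field values into a hex digest (exact-match signature)."""
    h = hashlib.md5()
    for name in fields:
        # Normalize case and whitespace so trivial differences don't split clusters.
        normalized = " ".join(str(doc.get(name, "")).lower().split())
        h.update(name.encode("utf-8") + b"\x00")
        h.update(normalized.encode("utf-8") + b"\x00")
    return h.hexdigest()

def group_duplicates(docs, fields=("description", "tags", "meta")):
    """Return {signature: [doc ids]} for signatures shared by two or more docs."""
    groups = defaultdict(list)
    for doc in docs:
        groups[signature(doc, fields)].append(doc["id"])
    return {sig: ids for sig, ids in groups.items() if len(ids) > 1}
```

Computing the signature before indexing, rather than flagging duplicates afterwards, is what would keep the index clean in the first place.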

Re: Questions on FieldValueCache

2009-07-31 Thread Otis Gospodnetic
Stephen, Yes, *:* will work, or at least it did last time I tried it a few months ago. This should quickly warm up your OS disk cache. Yes, if searcher warming takes too long, you may need to commit less frequently to avoid searcher overlap. Otis
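A sketch of a warming request that pairs *:* with faceting on the multi-valued fields, so their facet data (the FieldValueCache entries discussed in this thread) is built before users ask for it. The host, port, path and field names are placeholders. The same parameters could also be placed in a newSearcher/firstSearcher warming listener in solrconfig.xml.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit

def warming_query(facet_fields, base="http://localhost:8983/solr/select"):
    """Build a match-all request that facets on each given field.

    rows=0 skips fetching stored documents; the facet.field parameters
    are what trigger building the per-field facet data.
    """
    params = [("q", "*:*"), ("rows", "0"), ("facet", "true")]
    params += [("facet.field", name) for name in facet_fields]
    return base + "?" + urlencode(params)


if __name__ == "__main__":
    url = warming_query(["category", "tags"])  # hypothetical field names
    print(url)
    # Decoded view of the same parameters, for readability.
    print(parse_qsl(urlsplit(url).query))
```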

Solr on Google App Engine

2009-07-31 Thread Allahbaksh Asadullah
Hi All, Is there any plan to have Solr supported on Google App Engine? I saw a patch for SolrJ submitted by Noble Paul. I think it would be good if we could support Solr on App Engine. Warm Regards, Allahbaksh

Re: Problem After Modifying CoreContainer

2009-07-31 Thread Noble Paul നോബിള്‍ नोब्ळ्
did you take a look at https://issues.apache.org/jira/browse/SOLR-1293 which already handles this On Sat, Aug 1, 2009 at 2:40 AM, danben wrote: > > And, re-examining the URL, this is clearly my fault for improper use of > SolrJ.  Please ignore. > > > danben wrote: >> >> Hi, >> >> I'm developing a

Re: More like *these*? (recommendation system)

2009-07-31 Thread Grant Ingersoll
You might also look at Mahout, and specifically Taste: http://lucene.apache.org/mahout/taste.html . Of course, it is a far different approach from MLT. -Grant On Jul 31, 2009, at 8:08 AM, Andrew Ingram wrote: Hi all, I'm trying various methods of building a user-specific product recommendati

Re: Questions on FieldValueCache

2009-07-31 Thread Stephen Duncan Jr
On Fri, Jul 31, 2009 at 5:23 PM, Yonik Seeley wrote: > > Ok, so that was the curiosity question. More critical: > > > > When we first ask for facets for multi-valued fields, it can take up to > 25 > > seconds to get the response, although after that it's very fast (1.5 > seconds > > or less even

Re: Questions on FieldValueCache

2009-07-31 Thread Yonik Seeley
On Fri, Jul 31, 2009 at 5:06 PM, Stephen Duncan Jr wrote: > I have a couple more questions on the FieldValueCache.  I see that the > number of items in the cache is basically the number of multi-valued fields > facets have been requested for.  What does each entry in the cache actually > contain?  

Re: Problem After Modifying CoreContainer

2009-07-31 Thread danben
And, re-examining the URL, this is clearly my fault for improper use of SolrJ. Please ignore. danben wrote: > > Hi, > > I'm developing an application that requires a large number of cores, and > since lazy loading / LRU caching won't be available until 1.5, I decided > to modify CoreContainer

Questions on FieldValueCache

2009-07-31 Thread Stephen Duncan Jr
I have a couple more questions on the FieldValueCache. I see that the number of items in the cache is basically the number of multi-valued fields facets have been requested for. What does each entry in the cache actually contain? How does its size grow as the number of total documents increases

dealing with duplicates

2009-07-31 Thread Joe Calderon
Hello all, I have a collection of a few million documents, with many duplicates. They have been clustered with a simple algorithm. I have a field called 'duplicate', which is 0 or 1, and fields called 'description', 'tags' and 'meta'. Documents are clustered on different criteria and

Problem After Modifying CoreContainer

2009-07-31 Thread danben
Hi, I'm developing an application that requires a large number of cores, and since lazy loading / LRU caching won't be available until 1.5, I decided to modify CoreContainer to hold me over. Another requirement is that multiple Solr instances can access the same cores (on NAS, for instance), so

Re: Limiting facets for huge data - setting indexed=false in schema.xml

2009-07-31 Thread Yao Ge
Having a large number of fields is not the same as having a large number of facets. Facets are something you would display to users as an aid for query refinement or navigation. There is no way for a user to use 3700 facets at the same time. So it is more a question of how to determine what facets t

Re: More like *these*? (recommendation system)

2009-07-31 Thread Avlesh Singh
> > So if I search for "id:1 OR id:2 OR id:3", I want the MLT result to be a > single list of items, rather than 3 lists. > I did not understand this. Isn't the "q" parameter in MLT handler supposed to serve the same objective. "/mlt?q=(id:1 OR id:2 OR id:3)&mlt.fl=mlt-field&mlt.mintf=1" just works
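A sketch that builds the request Avlesh quotes for any list of seed ids. The host, port and field name are placeholders. Whether the handler combines all matching seeds or works from a single match (the MoreLikeThis handler has an mlt.match.offset parameter for choosing which result it uses) is worth checking against your Solr version before relying on it.

```python
from urllib.parse import urlencode

def mlt_for_ids(ids, mlt_field="mlt-field", base="http://localhost:8983/solr/mlt"):
    """Build a MoreLikeThis request whose q matches several seed documents."""
    # Produces "(id:1 OR id:2 OR id:3)" style queries.
    q = "(" + " OR ".join(f"id:{i}" for i in ids) + ")"
    params = {"q": q, "mlt.fl": mlt_field, "mlt.mintf": "1"}
    return base + "?" + urlencode(params)


if __name__ == "__main__":
    print(mlt_for_ids([1, 2, 3]))
```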

Re: Truncated XML responses from CoreAdminHandler

2009-07-31 Thread James Brady
Hi Mark, You're right - a custom request handler sounds like the right option. I've created a handler as you suggested, but I'm having problems on Solr startup (my class is LiveCoresHandler): Jul 31, 2009 5:20:39 PM org.apache.solr.common.SolrException log SEVERE: java.lang.ClassCastException: Liv

99.9% uptime requirement

2009-07-31 Thread Robert Petersen
Hi all, My solr project powers almost all the pages in our site and so needs to be up period. My question is what can I do to ensure that happens? Does solr ever crash, assuming reasonable load conditions and no extreme index sizes? I saw some comments about running solr under daemontools in ord

Re: Recreating SOLR index after a schema change - without having to re-post the data

2009-07-31 Thread Bill Au
The CSVLoader is very fast, but it doesn't support document or field boosting at index time. If you don't need that, you can also write the input data for Solr to file(s) to be loaded by the CSVLoader. Just reload whenever you change the schema. You will need to regenerate data if you add/remove f
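A minimal sketch of keeping documents as a CSV file that can be reloaded after each schema change. It assumes the header row holds Solr field names that exist in your schema.xml and that every field is single-valued (multi-valued fields would need extra handling, not shown). Since the CSV loader has no index-time boosts, none are written. In the stock example config the CSV handler is mapped at /update/csv, but check your own solrconfig.xml.

```python
import csv
import io

def docs_to_csv(docs, fields):
    """Serialize documents to CSV with a header row of Solr field names."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(fields)
    for doc in docs:
        # Missing fields become empty cells; csv handles quoting of commas/quotes.
        writer.writerow([doc.get(f, "") for f in fields])
    return buf.getvalue()


if __name__ == "__main__":
    print(docs_to_csv([{"id": "1", "name": "Widget, large"}], ["id", "name"]))
```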

Re: Solr/Lucene performance differences on Mac OS X running Tiger vs. Leopard ?

2009-07-31 Thread Mark Bennett
Grant said: > I thought Apple promoted Leopard as being faster than Tiger... I won't comment on what Apple thinks, but yes, my understanding was that each version of the OS was getting faster, and given what they showed about more thorough 64-bit support in Snow Leopard I'd expect the trend to

Re: Faceting in more like this

2009-07-31 Thread Bill Au
Use the mlt handler and then add the facet parameters after that: /solr/mlt?q=title:A&mlt.fl=author&facet=true&facet.field=topic Bill On Fri, Jul 31, 2009 at 11:11 AM, Jérôme Etévé wrote: > Hi all, > > Is there a way to enable faceting when using a more like this handler? > I'd like to have f
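A sketch that assembles the kind of request Bill describes: the MoreLikeThis parameters first, then the usual facet parameters. The field names title, author and topic come from his example; the relative /solr/mlt path assumes the handler is registered there.

```python
from urllib.parse import urlencode

def mlt_with_facets(query, similarity_fields, facet_fields, base="/solr/mlt"):
    """MoreLikeThis request with facet parameters appended after the mlt.* ones."""
    params = [("q", query), ("mlt.fl", ",".join(similarity_fields))]
    params.append(("facet", "true"))  # note: facet=true, with an equals sign
    params += [("facet.field", name) for name in facet_fields]
    return base + "?" + urlencode(params)


if __name__ == "__main__":
    print(mlt_with_facets("title:A", ["author"], ["topic"]))
```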

Re: mergeFactor / indexing speed

2009-07-31 Thread Chantal Ackermann
Hi again! Thanks for the answer, Grant. > It could very well be the case that you aren't seeing any merges with > only 20K docs. Ultimately, if you really want to, you can look in > your data.dir and count the files. If you have indexed a lot and have > an MF of 100 and haven't done an optimiz
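Grant's suggestion of counting files in data.dir can be scripted so runs with different mergeFactor settings are easy to compare. This sketch assumes the Lucene index sits in the index subdirectory of data.dir, as in the stock Solr example; the file names in the usage test are only illustrative.

```python
from collections import Counter
from pathlib import Path

def summarize_names(names):
    """Return (total, Counter by extension); '' collects names without one."""
    names = list(names)
    return len(names), Counter(Path(n).suffix for n in names)

def index_file_summary(index_dir):
    """Summarize the files directly inside an index directory."""
    return summarize_names(p.name for p in Path(index_dir).iterdir() if p.is_file())


if __name__ == "__main__":
    import sys
    # e.g. python count_index_files.py /path/to/solr/data/index
    total, by_ext = index_file_summary(sys.argv[1] if len(sys.argv) > 1 else ".")
    print(total, dict(by_ext))
```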

Faceting in more like this

2009-07-31 Thread Jérôme Etévé
Hi all, Is there a way to enable faceting when using a more like this handler? I'd like to have facets from my similar documents. Cheers ! J. -- Jerome Eteve. Chat with me live at http://www.eteve.net jer...@eteve.net

Re: Limiting facets for huge data - setting indexed=false in schema.xml

2009-07-31 Thread Rahul R
We are using 1.3.0. Thanks for the suggestion. Will see if I can try one of the nightly builds. On Fri, Jul 31, 2009 at 7:49 PM, Erik Hatcher wrote: > What version of Solr? Try a nightly build if you're at Solr 1.3 or > earlier and you'll be amazed at the difference. > >Erik > > > On Ju

Re: Limiting facets for huge data - setting indexed=false in schema.xml

2009-07-31 Thread Erik Hatcher
What version of Solr? Try a nightly build if you're at Solr 1.3 or earlier and you'll be amazed at the difference. Erik On Jul 31, 2009, at 10:00 AM, Rahul R wrote: In a production environment, having the caches enabled makes a lot of sense. And most definitely we will be enabling

Re: mergeFactor / indexing speed

2009-07-31 Thread Grant Ingersoll
On Jul 31, 2009, at 8:04 AM, Chantal Ackermann wrote: Dear all, I want to find out which settings give the best full index performance for my setup. Therefore, I have been running a small index (less than 20k documents) with a mergeFactor of 10 and 100. In both cases, indexing took about

Re: Problem with retrieving field from database using DIH

2009-07-31 Thread Noble Paul നോബിള്‍ नोब्ळ्
All that you need to do is paste the contents of your data-config.xml and hit the button. The details show up in the right-hand pane. I would recommend using a recent nightly so that the line numbers make sense to us. On Fri, Jul 31, 2009 at 6:50 PM, ahammad wrote: > > I looked at the DIH

Re: Limiting facets for huge data - setting indexed=false in schema.xml

2009-07-31 Thread Rahul R
In a production environment, having the caches enabled makes a lot of sense. And most definitely we will be enabling them. However, the primary idea of this exercise is to verify if limiting the number of facets will actually improve the performance. An update on this. I did verify and looks like

Re: Solr/Lucene performance differences on Mac OS X running Tiger vs. Leopard ?

2009-07-31 Thread Grant Ingersoll
I thought Apple promoted Leopard as being faster than Tiger, so that would be my guess. Also, are they the same versions of 1.5? Are you exercising them in the same way? (same queries, docs, etc.?) On Jul 30, 2009, at 5:10 PM, Mark Bennett wrote: As far as our NOC guys know the machines a

Re: Recreating SOLR index after a schema change - without having to re-post the data

2009-07-31 Thread Edwin Stauthamer
Simple but effective ;-) On Fri, Jul 31, 2009 at 3:23 PM, Erik Hatcher wrote: > There certainly could be some intermediate storage of documents prior to > indexing, but as far as the Lucene index goes it is inherently a one-way > process. Solr could facilitate this pretty easily... with an updat

Re: More like *these*? (recommendation system)

2009-07-31 Thread Edwin Stauthamer
You don't have to create a new "handler" for this... just do some preprocessing on the resultset that comes back on your first "id:1 OR id:2 OR id:3" query. So - post your query - get the relevant text-nodes from the resultset (XSL-processing is great for that). - Combine the text - Send that text
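A sketch of Edwin's steps using Python's ElementTree in place of XSL: pull one text field out of every doc in a Solr XML response, join the values, and hand the combined text to the MLT handler. Passing the text as stream.body relies on the MoreLikeThis handler accepting a content stream; whether stream.body is allowed depends on your solrconfig.xml request-parser settings. The URL and field name are placeholders.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

def combined_text(solr_xml, field):
    """Concatenate the values of `field` from every <doc> in a Solr XML response."""
    root = ET.fromstring(solr_xml)
    parts = []
    for doc in root.iter("doc"):
        # Single-valued fields look like <str name="f">; multi-valued like <arr name="f">.
        for el in doc.findall(f"*[@name='{field}']"):
            text = " ".join(t.strip() for t in el.itertext() if t.strip())
            if text:
                parts.append(text)
    return " ".join(parts)

def mlt_from_text(text, mlt_field, base="http://localhost:8983/solr/mlt"):
    """Build an MLT request that sends the combined text as a content stream."""
    return base + "?" + urlencode({"stream.body": text, "mlt.fl": mlt_field})
```

For large result sets the combined text can get long; POSTing it rather than putting it in the URL would avoid URL-length limits.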

Re: Recreating SOLR index after a schema change - without having to re-post the data

2009-07-31 Thread Erik Hatcher
There certainly could be some intermediate storage of documents prior to indexing, but as far as the Lucene index goes it is inherently a one-way process. Solr could facilitate this pretty easily... with an update processor that wrote the documents coming in to some other storage (one opti

Re: Problem with retrieving field from database using DIH

2009-07-31 Thread ahammad
I looked at the DIH debug page, but to be honest I'm not sure how to use it well and get something out of it. I am using a Solr 1.4 nightly from March. Cheers Noble Paul നോബിള്‍ नोब्ळ्-2 wrote: > > you can try going to the DIH debug page. BTW which version of DIH are you > using? > > On Fri,

Re: Recreating SOLR index after a schema change - without having to re-post the data

2009-07-31 Thread Chantal Ackermann
Hi Edwin, what prevents you from storing the data (possibly formatted in SOLR xml input format) yourself on some disk? Cheers, Chantal Edwin Stauthamer schrieb: That is a shame. I have much experience with Autonomy IDOL and the possibility of quickly reindexing the content without making a cal
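A minimal sketch of Chantal's suggestion: serialize documents into Solr's XML update format (an add element wrapping doc elements of field elements) so a copy can be kept on disk and re-posted to the update handler after a schema change. Lists become repeated field elements for multi-valued fields; the document contents in the test are made up.

```python
import xml.etree.ElementTree as ET

def to_solr_add_xml(docs):
    """Return an <add> message for the given documents (dicts of field -> value)."""
    add = ET.Element("add")
    for doc in docs:
        d = ET.SubElement(add, "doc")
        for name, value in doc.items():
            # Multi-valued fields are written as repeated <field> elements.
            values = value if isinstance(value, (list, tuple)) else [value]
            for v in values:
                ET.SubElement(d, "field", name=name).text = str(v)
    return ET.tostring(add, encoding="unicode")


if __name__ == "__main__":
    with open("docs-backup.xml", "w", encoding="utf-8") as fh:  # hypothetical file name
        fh.write(to_solr_add_xml([{"id": "1", "tags": ["a", "b"]}]))
```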

Re: Recreating SOLR index after a schema change - without having to re-post the data

2009-07-31 Thread Shalin Shekhar Mangar
On Fri, Jul 31, 2009 at 6:29 PM, Erik Hatcher wrote: > > On Jul 31, 2009, at 7:01 AM, Vannia Rajan wrote: > > On Fri, Jul 31, 2009 at 3:22 PM, Erik Hatcher > >wrote: >> >> You'll have to reindex your documents from scratch. Such is the nature >>> of >>> changing the schema of an index. It's al

Re: Problem with retrieving field from database using DIH

2009-07-31 Thread Noble Paul നോബിള്‍ नोब्ळ्
you can try going to the DIH debug page. BTW which version of DIH are you using? On Fri, Jul 31, 2009 at 6:31 PM, ahammad wrote: > > Hello, > > I tried it using the debug and verbose parameters in the address bar. This > is what appears in the logs: > > INFO: Starting Full Import > Jul 31, 2009 8:

Re: Recreating SOLR index after a schema change - without having to re-post the data

2009-07-31 Thread Edwin Stauthamer
That is a shame. I have much experience with Autonomy IDOL and the possibility of quickly reindexing the content without making a call to the original source is great. Just Export, update the config, and import (=reindex) to see if, for instance the performance is better or just to transport the in

Re: Limiting facets for huge data - setting indexed=false in schema.xml

2009-07-31 Thread Erik Hatcher
On Jul 31, 2009, at 7:17 AM, Rahul R wrote: Erik, I understand that caching is going to improve performance. In fact we did a PSR run with caches enabled and we got awesome results. But these wouldn't be really representative because the PSR scripts will be doing the same searches again an

Re: Problem with retrieving field from database using DIH

2009-07-31 Thread ahammad
Hello, I tried it using the debug and verbose parameters in the address bar. This is what appears in the logs: INFO: Starting Full Import Jul 31, 2009 8:54:40 AM org.apache.solr.handler.dataimport.SolrWriter readIndexerProperties INFO: Read dataimport.properties Jul 31, 2009 8:54:40 AM org.apach

Re: Recreating SOLR index after a schema change - without having to re-post the data

2009-07-31 Thread Erik Hatcher
On Jul 31, 2009, at 7:01 AM, Vannia Rajan wrote: On Fri, Jul 31, 2009 at 3:22 PM, Erik Hatcher >wrote: You'll have to reindex your documents from scratch. Such is the nature of changing the schema of an index. It's always a great idea (in fact, I'd say mandatory) to have a full reindex

More like *these*? (recommendation system)

2009-07-31 Thread Andrew Ingram
Hi all, I'm trying various methods of building a user-specific product recommendation system and one idea is to use solr's MLT functionality. For each customer I have a list of items they've bought, and I want to find similar items that are new to the site. The problem is that MLT operates on eac

mergeFactor / indexing speed

2009-07-31 Thread Chantal Ackermann
Dear all, I want to find out which settings give the best full index performance for my setup. Therefore, I have been running a small index (less than 20k documents) with a mergeFactor of 10 and 100. In both cases, indexing took about 11.5 min: mergeFactor: 10 0:11:46.792 mergeFactor: 100 /ad

Re: update some index documents after indexing process is done with DIH

2009-07-31 Thread Marc Sturlese
: If you make your EventListener implement SolrCoreAware you can get : hold of the core on inform. Use that to get hold of the : SolrIndexWriter Implementing SolrCoreAware, I can get hold of the core and easily get hold of a SolrIndexSearcher and so a reader. But I can't see the way to get hold of

Re: Limiting facets for huge data - setting indexed=false in schema.xml

2009-07-31 Thread Rahul R
Erik, I understand that caching is going to improve performance. In fact we did a PSR run with caches enabled and we got awesome results. But these wouldn't be really representative because the PSR scripts will be doing the same searches again and again. These would be cached and there would be virt

Re: Recreating SOLR index after a schema change - without having to re-post the data

2009-07-31 Thread Vannia Rajan
On Fri, Jul 31, 2009 at 3:22 PM, Erik Hatcher wrote: > You'll have to reindex your documents from scratch. Such is the nature of > changing the schema of an index. It's always a great idea (in fact, I'd say > mandatory) to have a full reindex process handy. > > Thank you for your response. Yes,

Re: Recreating SOLR index after a schema change - without having to re-post the data

2009-07-31 Thread Vannia Rajan
On Fri, Jul 31, 2009 at 3:17 PM, Tim Sell wrote: > Are you using solr as a data store? > No, data comes from somewhere else, solr is just for indexing giving back query results. > > It is not possible via solr to change existing documents in a solr > index. It would be a nice feature though. >

Re: Limiting facets for huge data - setting indexed=false in schema.xml

2009-07-31 Thread Erik Hatcher
On Jul 31, 2009, at 2:35 AM, Rahul R wrote: Hello, We are trying to get Solr to work for a really huge parts database. Details of the database - 55 million parts - Totally 3700 properties (facets). But each record will not have value for all properties. - Most of these facets are defined

Re: Recreating SOLR index after a schema change - without having to re-post the data

2009-07-31 Thread Erik Hatcher
You'll have to reindex your documents from scratch. Such is the nature of changing the schema of an index. It's always a great idea (in fact, I'd say mandatory) to have a full reindex process handy. Erik On Jul 31, 2009, at 2:37 AM, Vannia Rajan wrote: Hi, We are using solr-se

Re: Recreating SOLR index after a schema change - without having to re-post the data

2009-07-31 Thread Tim Sell
That really is the only way, it would be far easier if you were importing from another source. Are you using solr as a data store? It is not possible via solr to change existing documents in a solr index. It would be a nice feature though. ~Tim. 2009/7/31 Vannia Rajan : > Hi, > >  We are using s

Re: Using DIH for parallel indexing

2009-07-31 Thread Avlesh Singh
Thanks Noble and Shalin. Cheers Avlesh On Fri, Jul 31, 2009 at 1:23 PM, Shalin Shekhar Mangar < shalinman...@gmail.com> wrote: > On Fri, Jul 31, 2009 at 11:53 AM, Avlesh Singh wrote: > > > Thanks for the revert Noble. A few questions are still open: > > > > 1. Can I pass parameters to DIH and

Re: Using DIH for parallel indexing

2009-07-31 Thread Shalin Shekhar Mangar
On Fri, Jul 31, 2009 at 11:53 AM, Avlesh Singh wrote: > Thanks for the revert Noble. A few questions are still open: > > 1. Can I pass parameters to DIH and be able to use them inside the > "query" attribute of an entity inside the data-config file? > Yes. Use ${dataimporter.request.X} or ${
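A sketch of Shalin's answer: any extra request parameter sent to the DataImportHandler can be read in data-config.xml as ${dataimporter.request.X}. The parameter name shard, the SQL in the comment and the URL are hypothetical. command=full-import is DIH's standard command. Because full-import cleans the index by default, clean=false is set so parallel imports into the same index do not delete each other's documents.

```python
from urllib.parse import urlencode

# Hypothetical data-config.xml entity reading a request parameter named "shard":
#   <entity name="item"
#           query="select * from item where mod(id, 4) = ${dataimporter.request.shard}"/>

def dih_import_url(base, **request_params):
    """Build a DataImportHandler full-import URL carrying extra request parameters."""
    # clean=false: don't wipe the index before this import starts.
    params = {"command": "full-import", "clean": "false"}
    params.update(request_params)
    return base + "?" + urlencode(params)


if __name__ == "__main__":
    for shard in range(4):
        print(dih_import_url("http://localhost:8983/solr/dataimport", shard=shard))
```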

Re: Problem with retrieving field from database using DIH

2009-07-31 Thread Noble Paul നോബിള്‍ नोब्ळ्
On Fri, Jul 31, 2009 at 1:43 AM, ahammad wrote: > > Hello all, > > I've been having this issue for a while now. I am indexing a Sybase > database. Everything is fantastic, except that there is 1 column that I can > never get back. I don't have direct database access via Sybase client, but I > was a