Sorting
I need to sort the results of one query two different ways. Should I run the search once:

    s.getDocListAndSet(query, restrictions, sort, req.getStart(), req.getLimit(), flags);

and then run the same search again with a different sort value? Or is there a method available to just re-sort the DocSet (sortDocSet looks like what I want, but it's protected)? Or maybe it doesn't matter, because caching will handle the repeated search anyway?

Thanks
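In case it helps, here is a minimal sketch of the two-searches approach, reusing the SolrIndexSearcher call shown above. The second Sort object and the "price" field are made up for illustration; with both requests hitting the queryResultCache, the repeated search should be cheap.

    // Types: org.apache.solr.search.DocListAndSet, org.apache.lucene.search.Sort/SortField
    // Run the same query/filter twice, varying only the Sort (hedged sketch).
    DocListAndSet bySort1 = s.getDocListAndSet(query, restrictions, sort,
                                               req.getStart(), req.getLimit(), flags);
    // Hypothetical second sort on a "price" field.
    Sort sort2 = new Sort(new SortField("price", SortField.INT));
    DocListAndSet bySort2 = s.getDocListAndSet(query, restrictions, sort2,
                                               req.getStart(), req.getLimit(), flags);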
Re: Initial import problems
I'm having slow performance with my Solr index and I'm not sure what to do; I need some suggestions on what to try. I have updated all of my records in the last couple of days. I'm not sure how much performance degraded because of that, but each search now takes about 3 seconds. My cache statistics don't look good either.

Also, I'm not sure whether I was supposed to do a couple of things:
- I did an optimize of the index through Luke with compound format, and then noticed that useCompoundFile is set to false in the solrconfig file.
- I changed one of the fields in the schema from text_ws to string.
- I added a field (type="text" indexed="false" stored="true").

My schema and solrconfig are the same as the example except that I have a few more fields. My PC runs Windows XP and has 2 GB of RAM. Below are some stats from the Solr admin stats page. Thanks!

caching : true
numDocs : 1185814
maxDoc : 2070472
readerImpl : MultiReader

name: filterCache
class: org.apache.solr.search.LRUCache
version: 1.0
description: LRU Cache(maxSize=512, initialSize=512, autowarmCount=256, [EMAIL PROTECTED])
stats:
  lookups : 658446
  hits : 30
  hitratio : 0.00
  inserts : 658420
  evictions : 657908
  size : 512
  cumulative_lookups : 658446
  cumulative_hits : 30
  cumulative_hitratio : 0.00
  cumulative_inserts : 658420
  cumulative_evictions : 657908

name: queryResultCache
class: org.apache.solr.search.LRUCache
version: 1.0
description: LRU Cache(maxSize=512, initialSize=512, autowarmCount=256, [EMAIL PROTECTED])
stats:
  lookups : 88
  hits : 83
  hitratio : 0.94
  inserts : 6
  evictions : 0
  size : 5
  cumulative_lookups : 88
  cumulative_hits : 83
  cumulative_hitratio : 0.94
  cumulative_inserts : 6
  cumulative_evictions : 0

name: documentCache
class: org.apache.solr.search.LRUCache
version: 1.0
description: LRU Cache(maxSize=512, initialSize=512)
stats:
  lookups : 780
  hits : 738
  hitratio : 0.94
  inserts : 42
  evictions : 0
  size : 42
  cumulative_lookups : 780
  cumulative_hits : 738
  cumulative_hitratio : 0.94
  cumulative_inserts : 42
  cumulative_evictions : 0
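For reference, a stored-but-not-indexed text field like the one described above would be declared in schema.xml roughly as follows; the field name "longDescription" is a made-up placeholder, only the attributes come from the message:

    <field name="longDescription" type="text" indexed="false" stored="true"/>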
Performance issue.
Sorry, I put the wrong subject on my previous message. I also wanted to mention that my CPU jumps to almost 100% on each query.
Re: Performance issue.
> There's nothing wrong with CPU jumping to 100% each query, that just
> means you aren't IO bound :-)

What do you mean not IO bound?

> > - I did an optimize index through Luke with compound format and noticed
> > in the solrconfig file that useCompoundFile is set to false.
>
> Don't do this unless you really know what you are doing... Luke is
> probably using a different version of Lucene than Solr, and it could
> be dangerous.

Do you think I should reindex everything?

> - if you are using filters, any larger than 3000 will be double the
> size (maxDoc bits)

What do you mean larger than 3000? 3000 what and how do I tell?

> Can you give some examples of what your queries look like?

I will get this and send it.

Thanks, Yonik
Re: Performance issue.
I reindexed and optimized, and it helped. However, each query now averages about 1 second (down from 3-4 seconds). The bottleneck now is the getFacetTermEnumCounts function. If I take that call out, the query time is not measurable and the filterCache is being used. With getFacetTermEnumCounts in, the filter cache after three queries is shown below, with the hit ratio at 0 and everything being evicted. This call is for the brand/manufacturer facet, so I'm sure it is going through many thousands of queries.

I'm thinking about pre-processing the brand/manu data to get a small set of top brands per category and just querying those no matter what the other facets are set to (with certain filters, no brands will be shown). But if I still want to call getFacetTermEnumCounts for ALL brands, why is it not using the cache?

lookups : 32849
hits : 0
hitratio : 0.00
inserts : 32850
evictions : 32338
size : 512
cumulative_lookups : 32849
cumulative_hits : 0
cumulative_hitratio : 0.00
cumulative_inserts : 32850
cumulative_evictions : 32338

Thanks,
Mike

----- Original Message -----
From: "Yonik Seeley" <[EMAIL PROTECTED]>
To:
Sent: Tuesday, December 05, 2006 8:46 PM
Subject: Re: Performance issue.

On 12/5/06, Gmail Account <[EMAIL PROTECTED]> wrote:
>> There's nothing wrong with CPU jumping to 100% each query, that just
>> means you aren't IO bound :-)
> What do you mean not IO bound?

There is always going to be a bottleneck somewhere. In very large
indices, the bottleneck may be waiting for IO (waiting for data to be
read from the disk). If you are on a single-processor system and you
aren't waiting for data to be read from the disk or the network, then
the request will be using close to 100% CPU, which is actually a good
thing.

The bad thing is how long the query takes, not the fact that it's CPU bound.

>>> - I did an optimize index through Luke with compound format and noticed
>>> in the solrconfig file that useCompoundFile is set to false.
>>
>> Don't do this unless you really know what you are doing... Luke is
>> probably using a different version of Lucene than Solr, and it could
>> be dangerous.
> Do you think I should reindex everything?

That would be the safest thing to do.

>> - if you are using filters, any larger than 3000 will be double the
>> size (maxDoc bits)
> What do you mean larger than 3000? 3000 what and how do I tell?

From solrconfig.xml:

The key is that the memory consumed by a HashDocSet is independent of
maxDoc (the maximum internal Lucene docid), but a BitSet-based set has
maxDoc bits in it. Thus, an unoptimized index with more deleted
documents causes a higher maxDoc and higher memory usage for any
BitSet-based filters.

-Yonik
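The solrconfig.xml snippet Yonik refers to did not survive in the archived message; for reference, the HashDocSet entry in the stock example config looks roughly like the following (the values shown are the usual example defaults, so treat this as an approximation rather than the poster's actual settings):

    <!-- doc sets with fewer than maxSize entries are stored as a HashDocSet,
         whose memory use is independent of maxDoc; larger sets fall back to
         a bitset of maxDoc bits -->
    <HashDocSet maxSize="3000" loadFactor="0.75"/>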
Re: Performance issue.
It is currently a string type. Here is everything that has to do with manu in my schema... Should it have been multi-valued? Do you see anything wrong with this?

multiValued="true"/>

Thanks...

----- Original Message -----
From: "Yonik Seeley" <[EMAIL PROTECTED]>
To:
Sent: Wednesday, December 06, 2006 9:55 PM
Subject: Re: Performance issue.

It is using the cache, but the number of items is larger than the size
of the cache. If you want to continue to use the filter method, then you
need to increase the size of the filterCache to something larger than
the number of unique values you are filtering on. I don't know if you
will have enough memory to take this approach or not.

The second option is to make brand/manu a non-multi-valued string type.
When you do that, Solr will use a different method to calculate the
facet counts (it will use the FieldCache rather than filters). You would
need to reindex to try this approach.

-Yonik

On 12/6/06, Gmail Account <[EMAIL PROTECTED]> wrote:
> I reindexed and optimized, and it helped. However, each query now
> averages about 1 second (down from 3-4 seconds). The bottleneck now is
> the getFacetTermEnumCounts function. If I take that call out, the query
> time is not measurable and the filterCache is being used. With
> getFacetTermEnumCounts in, the filter cache after three queries is
> below, with the hit ratio at 0 and everything being evicted. This call
> is for the brand/manufacturer facet, so I'm sure it is going through
> many thousands of queries.
>
> I'm thinking about pre-processing the brand/manu data to get a small
> set of top brands per category and just querying those no matter what
> the other facets are set to (with certain filters, no brands will be
> shown). But if I still want to call getFacetTermEnumCounts for ALL
> brands, why is it not using the cache?
>
> lookups : 32849
> hits : 0
> hitratio : 0.00
> inserts : 32850
> evictions : 32338
> size : 512
> cumulative_lookups : 32849
> cumulative_hits : 0
> cumulative_hitratio : 0.00
> cumulative_inserts : 32850
> cumulative_evictions : 32338
>
> Thanks,
> Mike
>
> ----- Original Message -----
> From: "Yonik Seeley" <[EMAIL PROTECTED]>
> To:
> Sent: Tuesday, December 05, 2006 8:46 PM
> Subject: Re: Performance issue.
>
>> On 12/5/06, Gmail Account <[EMAIL PROTECTED]> wrote:
>>>> There's nothing wrong with CPU jumping to 100% each query, that just
>>>> means you aren't IO bound :-)
>>> What do you mean not IO bound?
>>
>> There is always going to be a bottleneck somewhere. In very large
>> indices, the bottleneck may be waiting for IO (waiting for data to be
>> read from the disk). If you are on a single-processor system and you
>> aren't waiting for data to be read from the disk or the network, then
>> the request will be using close to 100% CPU, which is actually a good
>> thing.
>>
>> The bad thing is how long the query takes, not the fact that it's CPU
>> bound.
>>
>>>>> - I did an optimize index through Luke with compound format and
>>>>> noticed in the solrconfig file that useCompoundFile is set to false.
>>>>
>>>> Don't do this unless you really know what you are doing... Luke is
>>>> probably using a different version of Lucene than Solr, and it could
>>>> be dangerous.
>>> Do you think I should reindex everything?
>>
>> That would be the safest thing to do.
>>
>>>> - if you are using filters, any larger than 3000 will be double the
>>>> size (maxDoc bits)
>>> What do you mean larger than 3000? 3000 what and how do I tell?
>>
>> From solrconfig.xml:
>>
>> The key is that the memory consumed by a HashDocSet is independent of
>> maxDoc (the maximum internal Lucene docid), but a BitSet-based set has
>> maxDoc bits in it. Thus, an unoptimized index with more deleted
>> documents causes a higher maxDoc and higher memory usage for any
>> BitSet-based filters.
>>
>> -Yonik
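To make Yonik's two options concrete, here is an illustrative sketch. The poster's actual schema lines were stripped from the archived message above, so the field definition, field name, and cache sizes below are assumptions rather than his real config:

    <!-- Option 1 (solrconfig.xml): keep faceting via filters, but make the
         filterCache larger than the number of unique manu values; the size
         here is a guess -->
    <filterCache class="solr.LRUCache" size="16384" initialSize="4096" autowarmCount="4096"/>

    <!-- Option 2 (schema.xml, requires reindexing): declare manu as a
         single-valued string field so facet counts can use the FieldCache
         instead of filters -->
    <field name="manu" type="string" indexed="true" stored="true" multiValued="false"/>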
Tagging
I know I've seen this topic before... Is there a guideline on the best way to implement tagging in Solr? For example, keeping track of which user tagged which item, and faceting based on tags?

Thanks,
Mike
Re: convert custom facets to Solr facets...
This would be great! I can't help with the solution, but I am very interested in using it if one of you guys can figure it out. I can't wait to see if this works out.

Mike

----- Original Message -----
From: "Erik Hatcher" <[EMAIL PROTECTED]>
To:
Sent: Tuesday, February 06, 2007 4:51 AM
Subject: Re: convert custom facets to Solr facets...

Yonik - this is great! Thanks for codifying the use cases and providing a
possible implementation. I'll tinker with this more when I can.

Erik

On Feb 4, 2007, at 2:13 PM, Yonik Seeley wrote:

> I was confusing myself too much without nailing down more concrete
> examples, so I took a shot at coming up with user tagging use cases and
> a way to implement them with a flat schema. The use cases may be biased
> toward a flat schema since that's what I had in mind... so feel free to
> add more, or change the use case names or descriptions to make more
> sense.
>
> http://wiki.apache.org/solr/UserTagDesign
>
> -Yonik
Re: Tagging
I use Solr for searching and facets and love it; the performance is awesome. However, I am about to add tagging to my application, and I'm having a hard time deciding whether I should just keep my tags in a database for now until a better Solr solution is worked out... Does anyone know what technology some of the larger sites use for tagging? A database (MySQL, SQL Server) with denormalized cache tables everywhere, something similar to Solr/Lucene, or something else?

Thanks,
Mike

----- Original Message -----
From: "Mekin Maheshwari" <[EMAIL PROTECTED]>
To:
Sent: Thursday, February 22, 2007 7:39 AM
Subject: Re: Tagging

> For a more general solution, I'm thinking a separate lucene index might
> be ideal.
> -Yonik

I don't know if this will work for others, but below is what we do. Also, if there are things I can improve, do let me know.

All tag inserts go to a small DB table, and I reindex the docs that these tags belong to in a backup index that I keep, swapping the new index in from time to time. I don't do this on the production index, as optimizing the index takes a long time. A hack I need is that when looking up tags, I also look in this small table. For me exact matches suffice, hence a DB table works; it may not work for others. I understand that searches on a tag don't work until it gets into the index.

The solution can obviously be made much smarter. Basically, use a queue from which the indexUpdater can pick up documents to reindex and update them when search volumes are low. I am sure a small Lucene index could be used as the queue, and while searching, both indices would be looked at.

Btw, we are still using Lucene for our search; we hope to move to Solr soon.

-mekin
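A minimal sketch of the approach Mekin describes, as I understand it: tag writes go straight to a small DB table, and the affected doc id is queued for reindexing into the backup index later. All names here (the doc_tags table, TagStore, reindexQueue) are made up for illustration, not taken from the original post:

    import java.sql.Connection;
    import java.sql.PreparedStatement;
    import java.sql.SQLException;
    import java.util.Queue;
    import java.util.concurrent.ConcurrentLinkedQueue;

    public class TagStore {
        // Doc ids waiting to be reindexed into the backup index
        // (hypothetical queue, drained when search volume is low).
        private final Queue<Long> reindexQueue = new ConcurrentLinkedQueue<Long>();

        // Record a tag in the small DB table, then queue the doc for reindexing.
        public void addTag(Connection conn, long docId, String user, String tag)
                throws SQLException {
            PreparedStatement ps = conn.prepareStatement(
                "INSERT INTO doc_tags (doc_id, username, tag) VALUES (?, ?, ?)");
            try {
                ps.setLong(1, docId);
                ps.setString(2, user);
                ps.setString(3, tag);
                ps.executeUpdate();
            } finally {
                ps.close();
            }
            reindexQueue.add(docId);
        }

        // Exact-match tag lookup also checks the small table, so fresh tags are
        // visible before the backup index has been rebuilt and swapped in.
        public boolean hasTag(Connection conn, long docId, String tag)
                throws SQLException {
            PreparedStatement ps = conn.prepareStatement(
                "SELECT 1 FROM doc_tags WHERE doc_id = ? AND tag = ?");
            try {
                ps.setLong(1, docId);
                ps.setString(2, tag);
                return ps.executeQuery().next();
            } finally {
                ps.close();
            }
        }
    }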