Re: Solr Faceting

2012-07-07 Thread Darren Govoni
I don't think it comes at any added cost for solr to return that facet so you can filter it out in your business logic. On Sat, 2012-07-07 at 15:18 +0530, Shanu Jha wrote: > Hi, > > > I am generating facet for a field which has one of the value "NA" and I > want solr should not create facet
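As a rough illustration of filtering the unwanted value in business logic, here is a minimal Python sketch. It assumes the facet arrives in Solr's default flat JSON layout (value, count, value, count, ...). The field name "status" and the sample counts are hypothetical, not taken from the thread.

```python
def drop_facet_values(facet_list, unwanted):
    """Return (value, count) pairs from a flat Solr-style facet list,
    skipping any value found in `unwanted`."""
    # The flat layout alternates value, count, value, count, ...
    pairs = zip(facet_list[0::2], facet_list[1::2])
    return [(value, count) for value, count in pairs if value not in unwanted]


# Hypothetical fragment of a Solr JSON response (field name is an assumption).
response = {
    "facet_counts": {
        "facet_fields": {
            "status": ["active", 12, "NA", 7, "closed", 3],
        }
    }
}

filtered = drop_facet_values(
    response["facet_counts"]["facet_fields"]["status"], {"NA"}
)
print(filtered)
```

The filtering happens entirely on the client, so the Solr query itself stays unchanged.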

Re: Nrt and caching

2012-07-07 Thread Jason Rutherglen
Hi Amit, If the caches were per-segment, then NRT would be optimal in Solr. Currently the caches are stored per-multiple-segments, meaning after each 'soft' commit, the cache(s) will be purged. On Fri, Jul 6, 2012 at 9:45 PM, Amit Nithian wrote: > Sorry I'm a bit new to the nrt stuff in solr b

Re: Nrt and caching

2012-07-07 Thread Yonik Seeley
On Sat, Jul 7, 2012 at 9:59 AM, Jason Rutherglen wrote: > Currently the caches are stored per-multiple-segments, meaning after each > 'soft' commit, the cache(s) will be purged. Depends which caches. Some caches are per-segment, and some caches are top level. It's also a trade-off... for some th

Re: Nrt and caching

2012-07-07 Thread Jason Rutherglen
The field caches are per-segment, which are used for sorting and basic [slower] facets. The result set, document, filter, and multi-value facet caches are [in Solr] per-multi-segment. Of these, the document, filter, and multi-value facet caches could be converted to be [performant] per-segment, a

Grouping and Averages

2012-07-07 Thread Jeremy Branham
I’m sorry – I sent this email before I was confirmed in the group, so I don’t know if anyone sent a reply =\ Hello - I’m not sure if this is an appropriate use for Solr, but I want to stay away from a typical DB store for high availability reasons. I

Re: Grouping and Averages

2012-07-07 Thread Jack Krupansky
You can always check the Lucene/Solr archives: http://mail-archives.apache.org/mod_mbox/lucene-solr-user/ Your message is here: http://mail-archives.apache.org/mod_mbox/lucene-solr-user/201207.mbox/%3CBAY170-DS274C673A7C82D716E7E000BAED0%40phx.gbl%3E It does not yet appear to have any responses

Re: Nrt and caching

2012-07-07 Thread Andy
So if I want to use multi-value facets with NRT, I'd need to convert the cache to per-segment? How do I do that? Thanks. From: Jason Rutherglen To: solr-user@lucene.apache.org Sent: Saturday, July 7, 2012 11:32 AM Subject: Re: Nrt and caching The field caches

Re: Grouping and Averages

2012-07-07 Thread Jeremy Branham
Thanks Jack! Jeremy Branham Software Engineer http://LinkedIn.com/in/JeremyBranham http://jeremybranham.wordpress.com/ http://Zeroth.biz -Original Message- From: Jack Krupansky Sent: Saturday, July 07, 2012 11:16 AM To: solr-user@lucene.apache.org Subject: Re: Grouping and Averages

Re: Nrt and caching

2012-07-07 Thread Amit Nithian
Thanks for the responses. I guess my specific question is: if I had something that depended on the mapping between Lucene document IDs and some object primary key, so that I could pull in external data from another data source without a constant reindex, how would this be affected by soft and hard

MoreLikeThis and mlt.count

2012-07-07 Thread Bruno Mannina
Dear Solr users, I have a field named "fid" defined as: required="true" termVectors="true"/> This "fid" can have a value like: a0001 b57855 3254 etc... (length <20 digits) I would like to get *all* the docs that the result returns. Currently mlt.count defaults to 5, but I don't want to set it

Re: Nrt and caching

2012-07-07 Thread Jason Rutherglen
Andy, You'd need to hack on the Solr code, specifically the SimpleFacets class. Solr uses UnInvertedField to build an in memory doc -> terms mapping, which would need to be cached per-segment. Then you'd need to aggregate the resultant per-segment counts. There is another open source library tha
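The per-segment idea described above can be sketched in a few lines of Python. This is a toy model under stated assumptions, not Solr code: the class and function names are hypothetical, plain dictionaries stand in for segments, and a dict copy stands in for the costly uninverting step. It caches a doc -> terms mapping per segment, counts over the matching documents in each segment, and sums the per-segment counts.

```python
from collections import Counter


class SegmentTermsCache:
    """Caches one doc -> terms mapping per segment (hypothetical helper)."""

    def __init__(self):
        self._by_segment = {}
        self.builds = 0  # how many times a mapping had to be built

    def get(self, segment_id, load_terms):
        # Build the mapping only the first time a segment is seen.
        if segment_id not in self._by_segment:
            self._by_segment[segment_id] = load_terms()
            self.builds += 1
        return self._by_segment[segment_id]


def facet_counts(cache, segments, matches):
    """Sum per-segment term counts over the matching docs of each segment."""
    total = Counter()
    for segment_id, docs in segments.items():
        doc_terms = cache.get(segment_id, lambda d=docs: dict(d))
        for doc_id in matches.get(segment_id, ()):
            total.update(doc_terms.get(doc_id, ()))
    return total


# Two existing segments; doc ids are local to each segment.
segments = {
    "seg0": {0: ["red", "blue"], 1: ["red"]},
    "seg1": {0: ["green"]},
}
matches = {"seg0": [0, 1], "seg1": [0]}

cache = SegmentTermsCache()
first = facet_counts(cache, segments, matches)

# Simulate a soft commit that adds one small segment.
segments["seg2"] = {0: ["red"]}
matches["seg2"] = [0]
second = facet_counts(cache, segments, matches)

print(first, second, cache.builds)
```

In this model only the new segment's mapping is built after the simulated commit; the mappings for the older segments are reused rather than purged.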

Max Memory That Solr on Tomcat can utilize

2012-07-07 Thread Rohit
Hi, I just wanted to know how much memory Tomcat running on a Windows Enterprise RC2 server can effectively utilize. Is there any limit? Regards, Rohit

Re: Grouping and Averages

2012-07-07 Thread Walter Underwood
It sounds like you need a database for analytics, not a search engine. Solr cannot do aggregates like that. It can select and group, but to calculate averages you'll need to fetch all the results over the network and calculate them yourself. wunder On Jul 7, 2012, at 9:05 AM, Jeremy Branham wr
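The client-side calculation mentioned above might look roughly like this in Python, assuming the matching documents have already been fetched as dictionaries. The field names "category" and "price" are hypothetical placeholders, not fields from the original question.

```python
from collections import defaultdict


def average_by_group(docs, group_field, value_field):
    """Compute the mean of `value_field` per distinct `group_field` value."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    for doc in docs:
        # Skip documents that lack either field (missing fields are allowed).
        if group_field not in doc or value_field not in doc:
            continue
        sums[doc[group_field]] += doc[value_field]
        counts[doc[group_field]] += 1
    return {group: sums[group] / counts[group] for group in sums}


# Hypothetical documents as they might come back from a search.
docs = [
    {"category": "a", "price": 10.0},
    {"category": "a", "price": 20.0},
    {"category": "b", "price": 5.0},
    {"category": "c"},  # no price, ignored
]

averages = average_by_group(docs, "category", "price")
print(averages)
```

The cost of this approach is the network transfer: every matching document has to reach the client before the averages can be computed.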

Re: Grouping and Averages

2012-07-07 Thread Jason Rutherglen
Average should be doable in Solr, maybe not today, not sure. Median is the challenge :) Try Hive. On Sat, Jul 7, 2012 at 3:34 PM, Walter Underwood wrote: > It sounds like you need a database for analytics, not a search engine. > > Solr cannot do aggregates like that. It can select and group, bu

Re: Grouping and Averages

2012-07-07 Thread Jeremy Branham
Thanks for the replies. I may be able to simplify my requirements. In my application, the number of documents per group indicates popularity. If I could sort the groups descending by the document count, then using the stats component + filter I could query each group to get the average value for a field

Getting only one result by family?

2012-07-07 Thread Bruno Mannina
Dear Solr users, I have a field named "FID" for Family-ID: required="true" termVectors="true"/> My uniqueKey is the field "PN" and I have several other fields (text-en, string, general text, etc...). When I do a request on my index, like: title:airplane I get several docs but some docs are

Re: Grouping and Averages

2012-07-07 Thread Jason Rutherglen
I don't think aggregations in the Solr group by are completed yet. There's a Lucene or Solr issue implementing group by count that could be adapted to implement average for example. On Sat, Jul 7, 2012 at 4:37 PM, Jeremy Branham wrote: > Thanks for the replies. > I may be able to simplify my req

Re: Grouping and Averages

2012-07-07 Thread Jeremy Branham
Thanks. At this time, it looks like it may be best to use a DB as a backing store, then schedule a task to store pre-aggregated data and other documents in Solr. Jeremy Branham Software Engineer http://LinkedIn.com/in/JeremyBranham http://jeremybranham.wordpress.com/ http://Zeroth.biz

Re: Grouping and Averages

2012-07-07 Thread Walter Underwood
That could work well. Think of the Solr index as a big, flat view on your data. Index the fields you search on and store the fields you retrieve. Missing fields are OK. Fields can be multi-valued, which is non-relational but handy. If you are in MySQL, check out GROUP_CONCAT for a way to think

Re: Nrt and caching

2012-07-07 Thread Andy
Jason, If I just use stock Solr 4.0 without modifying the source code, does that mean multi-value faceting will be very slow when I'm constantly inserting/updating documents? Which open source library are you referring to? Will Solr adopt this per-segment approach any time soon? Thanks

Re: Nrt and caching

2012-07-07 Thread Jason Rutherglen
Multi-value faceting is fast for queries; however, because it's cached per-multi-segment, each soft commit will flush the cache, and it will be reloaded on the first query. As the index grows, the cache becomes expensive to build and consumes more RAM. I am not aware of any Jira issues open with

Indexing Wikipedia

2012-07-07 Thread kiran kumar
Hi, In our office we have Wikipedia set up for the intranet, and I want to index it. I have recently learned that all the wiki pages are stored in a database and that the schema more or less follows the MediaWiki standard. I am also considering whether to use xmldumper to dump all the wiki pages

Re: Use of Solr as primary store for search engine

2012-07-07 Thread William Bell
For the search results we actually put the small amount of data in the core. Once someone clicks the results and we need to go to the item to display the detailed results, we create another core with a stored XML string field and an ID. The ID is indexable, and the string field is only stored. So

solr facet fields doesn't honor fq

2012-07-07 Thread Chamnap Chhorn
Hi all, I have a question related to solr 3.5 on field facet. Here is my query: http://localhost:8081/solr_new/select?tie=0.1&q.alt=*:*&q=bank&qf=nameaddress&fq= *portal_uuid:+A4E7890F-A188-4663-89EB-176D94DF6774*&defType=dismax&* facet=true*&facet.field=*location_uuid*&facet.field=*sub_category_