Re: Solr Faceting

2012-07-07 Thread Darren Govoni
I don't think it costs Solr anything extra to return that facet, so you
can simply filter it out in your business logic.
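
A minimal sketch of that client-side filtering in Python, assuming the facet counts arrive in the flat `[value, count, value, count, ...]` list layout of Solr's JSON response. The function name and the sample data are made up for illustration.

```python
def drop_facet_values(flat_counts, unwanted=("NA",)):
    """Turn a flat [value, count, ...] list into (value, count) pairs,
    skipping any value listed in `unwanted`."""
    pairs = zip(flat_counts[0::2], flat_counts[1::2])
    return [(value, count) for value, count in pairs if value not in unwanted]


# Hypothetical facet list for one field, as it might appear under
# facet_counts -> facet_fields in a JSON response.
raw = ["books", 12, "NA", 40, "music", 7]
print(drop_facet_values(raw))  # [('books', 12), ('music', 7)]
```

The counts of the remaining values are untouched, so nothing else in the response needs to change.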

On Sat, 2012-07-07 at 15:18 +0530, Shanu Jha wrote:

> Hi,
> 
> 
> I am generating facets for a field in which one of the values is "NA", and I
> want Solr not to create a facet for (i.e. to ignore) this "NA" value. Is there
> any way to do that in Solr?
> 
> Thanks




Re: Nrt and caching

2012-07-07 Thread Jason Rutherglen
Hi Amit,

If the caches were per-segment, then NRT would be optimal in Solr.

Currently the caches are stored per-multiple-segments, meaning after each
'soft' commit, the cache(s) will be purged.

On Fri, Jul 6, 2012 at 9:45 PM, Amit Nithian wrote:

> Sorry, I'm a bit new to the NRT stuff in Solr, but I'm trying to understand
> the implications of frequent commits, cache rebuilding, and autowarming.
> What are the best practices surrounding NRT searching, caches, and query
> performance?
>
> Thanks!
> Amit
>


Re: Nrt and caching

2012-07-07 Thread Yonik Seeley
On Sat, Jul 7, 2012 at 9:59 AM, Jason Rutherglen wrote:
> Currently the caches are stored per-multiple-segments, meaning after each
> 'soft' commit, the cache(s) will be purged.

Depends which caches.  Some caches are per-segment, and some caches
are top level.
It's also a trade-off... for some things, per-segment data structures
would indeed turn around quicker on a reopen, but every query would be
slower for it.

-Yonik
http://lucidimagination.com


Re: Nrt and caching

2012-07-07 Thread Jason Rutherglen
The field caches are per-segment, which are used for sorting and basic
[slower] facets.  The result set, document, filter, and multi-value facet
caches are [in Solr] per-multi-segment.

Of these, the document, filter, and multi-value facet caches could be
converted to be [performant] per-segment, as with some other Apache
licensed Lucene based search engines.



Grouping and Averages

2012-07-07 Thread Jeremy Branham
I’m sorry – I sent this email before I was confirmed in the group, so I don’t 
know if anyone sent a reply =\

__

Hello -
I’m not sure if this is an appropriate use for Solr, but I want to stay away
from a typical DB store for high-availability reasons.

I am storing documents that may share a common value in a field we’ll call
“category”.
Another field holds an integer we’ll call “rating”.

I would like to group the documents on the “category” field and display the
average “rating” per group.

The stats component lets me get the average rating, but when I collapse the
results into groups it gives me the average for the entire collection, rather
than for the specific group.

Am I going about this wrong?
Is it possible to get the desired outcome with a single query?

I’d appreciate any insight!
Thank you,



Jeremy Branham
Software Engineer
http://LinkedIn.com/in/JeremyBranham
http://jeremybranham.wordpress.com/
http://Zeroth.biz


Re: Grouping and Averages

2012-07-07 Thread Jack Krupansky

You can always check the Lucene/Solr archives:

http://mail-archives.apache.org/mod_mbox/lucene-solr-user/

Your message is here:
http://mail-archives.apache.org/mod_mbox/lucene-solr-user/201207.mbox/%3CBAY170-DS274C673A7C82D716E7E000BAED0%40phx.gbl%3E

It does not yet appear to have any responses.

-- Jack Krupansky




Re: Nrt and caching

2012-07-07 Thread Andy
So if I want to use multi-valued faceting with NRT, I'd need to convert the
cache to per-segment? How do I do that?

Thanks.




Re: Grouping and Averages

2012-07-07 Thread Jeremy Branham

Thanks Jack!



Jeremy Branham
Software Engineer
http://LinkedIn.com/in/JeremyBranham
http://jeremybranham.wordpress.com/
http://Zeroth.biz





Re: Nrt and caching

2012-07-07 Thread Amit Nithian
Thanks for the responses. I guess my specific question is this: if I had
something that depended on the mapping between Lucene document IDs and some
object primary key, so I could pull in external data from another data source
without a constant reindex, how would it be affected by soft and hard
commits? I'd prefer not to have to rebuild this mapping from scratch on each
soft or even hard commit if possible, since those seem to happen frequently.

Also, can you explain why and how per-segment caches are used, and how a
client of the Lucene layer gets access to them or knows about them? I always
thought segments were an implementation detail, since they get merged on
optimize etc., so wouldn't that affect clients depending on segment-level
stuff? Or what am I missing?

Thanks again!
Amit


MoreLikeThis and mlt.count

2012-07-07 Thread Bruno Mannina

Dear Solr users,

I have a field named "fid" defined as:
required="true" termVectors="true"/>


This "fid" can have a value like:
a0001
b57855
3254
etc...
(length <20 digits)

I would like to get *all* the docs that the MLT result returns. Currently
mlt.count defaults to 5, but I don't want to set it to 200 in my URL just to
be sure of getting all the results in the same XML.

Is there a way to set mlt.count so that it always returns *all* MLT documents?

I read http://wiki.apache.org/solr/MoreLikeThis without finding a solution.



Sincerely,
Bruno
Solr 3.6
Ubuntu


Re: Nrt and caching

2012-07-07 Thread Jason Rutherglen
Andy,

You'd need to hack on the Solr code, specifically the SimpleFacets class.
Solr uses UnInvertedField to build an in-memory doc -> terms mapping, which
would need to be cached per-segment.  Then you'd need to aggregate the
resulting per-segment counts.

There is another open source library that takes the same basic faceting
approach, but per-segment, and could, loosely speaking, be faster; however, it
is built for Lucene 3.x at the moment.
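
The idea of caching the doc -> terms mapping per segment and then aggregating per-segment counts can be sketched as a toy model in Python. This is not Solr code and does not model SimpleFacets or UnInvertedField; the class, the function, and the dictionary-based "segments" are all hypothetical stand-ins.

```python
from collections import Counter


class PerSegmentFacetCache:
    """Toy cache holding one doc -> terms mapping per segment, built lazily."""

    def __init__(self):
        self._by_segment = {}
        self.builds = 0  # number of times a mapping had to be built

    def mapping_for(self, segment_id, segment_docs):
        if segment_id not in self._by_segment:
            # Stand-in for the expensive un-inverting step.
            self._by_segment[segment_id] = {
                doc_id: tuple(terms) for doc_id, terms in segment_docs.items()
            }
            self.builds += 1
        return self._by_segment[segment_id]


def facet_counts(cache, segments, matching_docs):
    """Count terms per segment, then aggregate the per-segment counts."""
    total = Counter()
    for segment_id, segment_docs in segments.items():
        mapping = cache.mapping_for(segment_id, segment_docs)
        segment_counts = Counter()
        for doc_id, terms in mapping.items():
            if doc_id in matching_docs:
                segment_counts.update(terms)
        total.update(segment_counts)
    return total


segments = {"seg0": {1: ["red", "blue"], 2: ["red"]}}
cache = PerSegmentFacetCache()
print(facet_counts(cache, segments, {1, 2}))

# A later commit adds a new segment; only that segment's mapping is built.
segments["seg1"] = {3: ["blue"]}
print(facet_counts(cache, segments, {1, 2, 3}))
print(cache.builds)  # 2
```

In this model the mapping for "seg0" survives the second commit, which is the property a per-segment cache is meant to provide.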



Max Memory That Solr on Tomcat can utilize

2012-07-07 Thread Rohit
Hi,

 

I just wanted to know how much memory Tomcat running on a Windows Enterprise
RC2 server can effectively utilize.  Is there any limit on this?

 

Regards,

Rohit

 



Re: Grouping and Averages

2012-07-07 Thread Walter Underwood
It sounds like you need a database for analytics, not a search engine.

Solr cannot do aggregates like that. It can select and group, but to calculate 
averages you'll need to fetch all the results over the network and calculate 
them yourself.
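
A minimal sketch of that client-side calculation in Python, assuming the documents have already been fetched as dictionaries that carry the "category" and "rating" fields from the original question. The function name and sample data are illustrative only.

```python
from collections import defaultdict


def average_by_group(docs, group_field="category", value_field="rating"):
    """Return {group value: average of value_field} for the fetched docs."""
    totals = defaultdict(lambda: [0, 0])  # group -> [sum, count]
    for doc in docs:
        # Documents missing either field are skipped.
        if group_field in doc and value_field in doc:
            entry = totals[doc[group_field]]
            entry[0] += doc[value_field]
            entry[1] += 1
    return {group: s / n for group, (s, n) in totals.items()}


docs = [
    {"category": "a", "rating": 4},
    {"category": "a", "rating": 2},
    {"category": "b", "rating": 5},
]
print(average_by_group(docs))  # {'a': 3.0, 'b': 5.0}
```

Only the two fields involved need to be retrieved, which keeps the network transfer smaller than fetching whole documents.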

wunder






Re: Grouping and Averages

2012-07-07 Thread Jason Rutherglen
Average should be doable in Solr, maybe not today, not sure.  Median is the
challenge :)  Try Hive.



Re: Grouping and Averages

2012-07-07 Thread Jeremy Branham

Thanks for the replies.
I may be able to simplify my requirements.

In my application, the number of documents per group indicates popularity.
If I could sort the groups in descending order of document count, then using
the stats component plus a filter I could query each group to get the average
value for a field.

However, I don't see how to sort the groups by document count.
I thought maybe a pseudo-field with a function query would return a
document element, but my tests failed.
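
One possible way to approximate this two-step plan is field faceting for the per-category document counts, followed by one stats request per category. The Python sketch below only builds the query strings. It assumes the fields are named "category" and "rating" as in the earlier description, and the function names are made up. The quoting of the filter value is deliberately naive and does not escape special characters.

```python
from urllib.parse import parse_qs, urlencode


def groups_by_count_params(group_field="category"):
    """Query string asking for facet counts on group_field, ordered by count."""
    return urlencode({
        "q": "*:*",
        "rows": 0,               # only the facet counts are needed
        "facet": "true",
        "facet.field": group_field,
        "facet.sort": "count",   # most populated values first
        "wt": "json",
    })


def group_stats_params(group_value, group_field="category", stats_field="rating"):
    """Query string asking for stats on stats_field within one group."""
    return urlencode({
        "q": "*:*",
        "rows": 0,
        "fq": f'{group_field}:"{group_value}"',  # naive quoting, no escaping
        "stats": "true",
        "stats.field": stats_field,
        "wt": "json",
    })


print(groups_by_count_params())
print(parse_qs(group_stats_params("books"))["fq"])  # ['category:"books"']
```

This costs one request per category, so it is probably only reasonable when the number of categories of interest is small.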


It's a bit of a challenge to switch my thought process from SQL to Solr.

Jeremy Branham
Software Engineer
http://LinkedIn.com/in/JeremyBranham
http://jeremybranham.wordpress.com/
http://Zeroth.biz









Getting only one result by family?

2012-07-07 Thread Bruno Mannina

Dear Solr users,

I have a field named "FID" for Family-ID:
required="true" termVectors="true"/>


My uniqueKey is the field "PN", and I have several other fields
(text-en, string, general text, etc.).


When I do a request on my index, like:
title:airplane

I get several docs, but some of them are members of the same family (their
FID values are equal).

Example:
Doc1
fid=A0123
Doc2
fid=B777
Doc3
fid=C008
...
Doc175  <= same family Doc1
fid=A0123
...

Is it possible to get only docs with different FIDs?
I don't want to see Doc175 in my XML result.
That way, if I set "rows=20", I will get 20 docs from 20 different
families.
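
One candidate for this is Solr's result grouping (field collapsing), which is available in the 3.x line. The Python sketch below only builds the request parameters. It assumes the family field is indexed and single-valued, and the function name and defaults are illustrative. With group.limit=1 and group.main=true, each family should contribute a single document to a flat result list.

```python
from urllib.parse import parse_qs, urlencode


def one_per_family_params(query, family_field="FID", rows=20):
    """Query string that groups results on family_field, one doc per group."""
    return urlencode({
        "q": query,
        "rows": rows,            # number of groups, i.e. families, returned
        "group": "true",
        "group.field": family_field,
        "group.limit": 1,        # keep only the top document of each family
        "group.main": "true",    # flatten the groups into a normal doc list
    })


params = one_per_family_params("title:airplane")
print(params)
print(parse_qs(params)["group.field"])  # ['FID']
```

The field name passed as `family_field` must match the name declared in the schema, including its case.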


Thanks for your help,
Bruno
Solr3.6
Ubuntu


Re: Grouping and Averages

2012-07-07 Thread Jason Rutherglen
I don't think aggregations in the Solr group by are completed yet.  There's
a Lucene or Solr issue implementing group by count that could be adapted to
implement average for example.



Re: Grouping and Averages

2012-07-07 Thread Jeremy Branham

Thanks.
At this time, it looks like it may be best to use a DB as the backing store,
and then schedule a task to store pre-aggregated data and other documents in
Solr.



Jeremy Branham
Software Engineer
http://LinkedIn.com/in/JeremyBranham
http://jeremybranham.wordpress.com/
http://Zeroth.biz












Re: Grouping and Averages

2012-07-07 Thread Walter Underwood
That could work well.

Think of the Solr index as a big, flat view on your data. Index the fields you 
search on and store the fields you retrieve. Missing fields are OK.

Fields can be multi-valued, which is non-relational but handy. If you are in 
MySQL, check out GROUP_CONCAT for a way to think about mapping relational to 
multi-valued fields.
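
As a rough illustration of that mapping, the Python sketch below takes a row in which a GROUP_CONCAT column holds comma-separated values and turns it into a document with a multi-valued field. The table, column, and field names are hypothetical, and the SQL in the comment is only an example of the kind of query that would produce such a row.

```python
def split_group_concat(value, separator=","):
    """Split a GROUP_CONCAT result into a list; NULL or empty gives []."""
    if not value:
        return []
    return [part for part in value.split(separator) if part]


# A row as it might come back from a query such as:
#   SELECT b.id, b.title, GROUP_CONCAT(t.name) AS tags
#   FROM book b LEFT JOIN tag t ON t.book_id = b.id
#   GROUP BY b.id
row = {"id": 42, "title": "Search basics", "tags": "search,lucene,facets"}

doc = {
    "id": row["id"],
    "title": row["title"],
    "tags": split_group_concat(row["tags"]),  # becomes a multi-valued field
}
print(doc["tags"])  # ['search', 'lucene', 'facets']
```

If the values themselves can contain commas, a different separator would have to be chosen on the SQL side and passed as `separator`.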

wunder
Walter Underwood
Search Guy, Chegg


--
Walter Underwood
wun...@wunderwood.org





Re: Nrt and caching

2012-07-07 Thread Andy
Jason,

If I just use stock Solr 4.0 without modifying the source code, does that mean 
multi-value faceting will be very slow when I'm constantly inserting/updating 
documents? 

Which open source library are you referring to? Will Solr adopt this 
per-segment approach any time soon?

Thanks




Re: Nrt and caching

2012-07-07 Thread Jason Rutherglen
Multi-value faceting is fast for queries; however, because it's cached
per-multi-segment, each soft commit will flush the cache, and it will be
rebuilt on the first query.  As the index grows, the cache becomes expensive
to build and consumes a lot of RAM.
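
To give a feel for why the rebuild gets expensive, here is a deliberately crude cost model in Python. It assumes every soft commit adds exactly one new segment, ignores merges and deletes, and measures cost simply as the number of documents processed. None of this is measured from Solr; the function is purely illustrative.

```python
def rebuild_cost(docs_per_commit):
    """Compare documents processed by a top-level cache rebuild versus a
    per-segment cache, over a series of soft commits."""
    top_level = 0
    per_segment = 0
    total_docs = 0
    for new_docs in docs_per_commit:
        total_docs += new_docs
        top_level += total_docs   # whole index is processed again
        per_segment += new_docs   # only the new segment is processed
    return top_level, per_segment


# One large initial load followed by three small NRT updates.
print(rebuild_cost([1000, 10, 10, 10]))  # (4060, 1030)
```

Under these assumptions the top-level cost grows with index size on every commit, while the per-segment cost tracks only the newly added documents.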

I am not aware of any Jira issues open with activity regarding adding this
feature to Solr.

On Sat, Jul 7, 2012 at 8:32 PM, Andy  wrote:

> Jason,
>
> If I just use stock Solr 4.0 without modifying the source code, does that
> mean multi-value faceting will be very slow when I'm constantly
> inserting/updating documents?
>
> Which open source library are you referring to? Will Solr adopt this
> per-segment approach any time soon?
>
> Thanks
>
>
> 
>  From: Jason Rutherglen 
> To: solr-user@lucene.apache.org
> Sent: Saturday, July 7, 2012 2:05 PM
> Subject: Re: Nrt and caching
>
> Andy,
>
> You'd need to hack on the Solr code, specifically the SimpleFacets class.
> Solr uses UnInvertedField to build an in memory doc -> terms mapping, which
> would need to be cached per-segment.  Then you'd need to aggregate the
> resultant per-segment counts.
>
> There is another open source library that has taken the same basic faceting
> approach (it is per-segment), and could be colloquially faster, however it
> is built for Lucene 3.x at the moment.
>
> On Sat, Jul 7, 2012 at 12:21 PM, Andy  wrote:
>
> > So if I want to use multi-value facets with NRT, I'd need to convert the
> > cache to per-segment? How do I do that?
> >
> > Thanks.
> >
> >
> > 
> >  From: Jason Rutherglen 
> > To: solr-user@lucene.apache.org
> > Sent: Saturday, July 7, 2012 11:32 AM
> > Subject: Re: Nrt and caching
> >
> > The field caches are per-segment, which are used for sorting and basic
> > [slower] facets.  The result set, document, filter, and multi-value facet
> > caches are [in Solr] per-multi-segment.
> >
> > Of these, the document, filter, and multi-value facet caches could be
> > converted to be [performant] per-segment, as with some other Apache
> > licensed Lucene based search engines.
> >
> > On Sat, Jul 7, 2012 at 10:42 AM, Yonik Seeley <yo...@lucidimagination.com> wrote:
> >
> > > On Sat, Jul 7, 2012 at 9:59 AM, Jason Rutherglen
> > >  wrote:
> > > > Currently the caches are stored per-multiple-segments, meaning after each
> > > > 'soft' commit, the cache(s) will be purged.
> > >
> > > Depends which caches.  Some caches are per-segment, and some caches
> > > are top level.
> > > It's also a trade-off... for some things, per-segment data structures
> > > would indeed turn around quicker on a reopen, but every query would be
> > > slower for it.
> > >
> > > -Yonik
> > > http://lucidimagination.com
> > >
> >
>
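
A rough illustration of the per-segment idea discussed in this thread:
cache the doc -> terms mapping once per segment, then sum the per-segment
counts at query time. This is a toy model in plain Python, not Solr code.
The class name, method names, and data layout are all made up for the
example, and the real UnInvertedField structure is far more compact.

```python
from collections import Counter


class PerSegmentFacetCache:
    """Toy model: one cached doc -> terms mapping per immutable segment."""

    def __init__(self):
        self._cache = {}  # segment id -> list of term lists, one per doc
        self.builds = 0   # number of times a segment mapping was built

    def _uninvert(self, segment_id, segment_docs):
        # Stand-in for the expensive un-inversion step; runs once per segment.
        if segment_id not in self._cache:
            self._cache[segment_id] = [list(terms) for terms in segment_docs]
            self.builds += 1
        return self._cache[segment_id]

    def facet_counts(self, segments, matches):
        """segments: {segment_id: [terms of doc 0, terms of doc 1, ...]}
        matches:  {segment_id: set of matching segment-local doc ids}"""
        total = Counter()
        for segment_id, docs in segments.items():
            mapping = self._uninvert(segment_id, docs)
            for doc_id in matches.get(segment_id, ()):
                total.update(mapping[doc_id])  # aggregate per-segment counts
        # Forget segments that no longer exist (for example, merged away).
        for stale in set(self._cache) - set(segments):
            del self._cache[stale]
        return dict(total)
```

In this model, a soft commit that adds one small segment only triggers a
build for that new segment; the mappings for existing segments are reused.
The cost Yonik mentions shows up in the aggregation loop, which every query
has to run across all segments.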


Indexing Wikipedia

2012-07-07 Thread kiran kumar
Hi,
In our office we have a Wikipedia set up on the intranet, and I want to
index it. From what I have read recently, all the wiki pages are stored in
a database whose schema follows the standard MediaWiki layout. I am also
considering using xmldumper to dump all the wiki pages to XML and indexing
from there.
Has anybody done something like this? If so, which way is more efficient
and easier to implement?
To me the DB schema looks quite complicated. Can somebody please help me
understand which implementation is better?
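
For the XML route, a minimal sketch of reading a MediaWiki XML export and
turning each page into a flat document that could then be posted to Solr.
It assumes the usual export layout (a <page> element containing <title>,
<id>, and <revision>/<text>); the function names and the output field names
are invented for the example, and the namespace prefix is stripped because
the export version differs between MediaWiki releases.

```python
import xml.etree.ElementTree as ET


def local(tag):
    """Strip the '{namespace}' prefix ElementTree puts on tag names."""
    return tag.rsplit("}", 1)[-1]


def iter_wiki_pages(source):
    """Yield {'title', 'id', 'text'} dicts from a MediaWiki XML export stream."""
    doc = {}
    for _event, elem in ET.iterparse(source, events=("end",)):
        name = local(elem.tag)
        if name == "title":
            doc["title"] = elem.text or ""
        elif name == "id" and "id" not in doc:
            # The first <id> inside a page is the page id; later ones belong
            # to the revision and the contributor.
            doc["id"] = elem.text
        elif name == "text":
            doc["text"] = elem.text or ""
        elif name == "page":
            yield doc
            doc = {}
            elem.clear()  # release the parsed page to keep memory flat
```

Because the dump is streamed page by page, this should cope with large
wikis without loading the whole file, and it avoids having to understand
the database schema at all.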

Thanks,
Kiran Bushireddy.


Re: Use of Solr as primary store for search engine

2012-07-07 Thread William Bell
For the search results, we put only a small amount of data in the search
core.

Once someone clicks a result and we need to go to the item to
display the detailed results, we use another core with a stored XML
string field and an ID. The ID is indexed, and the string field is
only stored.

So we have:

productsearch core
product core

This is in production and has been working fantastically for the last 4 months.

Our index is about 3M records.
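
A sketch of how a client might talk to the two cores: one query against
productsearch for the result list, then a lookup by ID against product for
the detail page. The base URL and the field names (name, price, xml) are
assumptions made for the example; only the two core names come from the
description above.

```python
from urllib.parse import urlencode

BASE = "http://localhost:8983/solr"  # assumed host and port


def search_url(text, rows=10):
    """Result-list query against the small 'productsearch' core."""
    params = {"q": text, "fl": "id,name,price", "rows": rows}  # hypothetical fields
    return f"{BASE}/productsearch/select?{urlencode(params)}"


def detail_url(product_id):
    """Detail lookup by ID against the 'product' core, returning the stored XML."""
    params = {"q": f'id:"{product_id}"', "fl": "id,xml", "rows": 1}  # hypothetical fields
    return f"{BASE}/product/select?{urlencode(params)}"
```

The search core stays small because it holds only what the list view
needs, while the detail core is never searched by anything but ID.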


On Thu, Jul 5, 2012 at 4:44 AM, Sohail Aboobaker  wrote:
> In many e-commerce sites, most of the data that we display (except images),
> especially in grids and lists, is minimal. We were inclined to use Solr as
> a data store only for displaying the information in grids. We stopped only
> due to the non-availability of joins in Solr 3.5. Since our data (like any
> other relational store) is split across multiple tables, we needed to
> de-normalize to use Solr as a store. We decided against it because that
> would mean potentially heavy updates to indexes whenever related data is
> updated. With Solr 4.0, we might have decided differently and implemented
> the grids using joins within Solr.
>
> We are too new to Solr to have any insights into it.
>
> Regards,
> Sohail



-- 
Bill Bell
billnb...@gmail.com
cell 720-256-8076


solr facet fields doesn't honor fq

2012-07-07 Thread Chamnap Chhorn
Hi all,

I have a question related to solr 3.5 on field facet. Here is my query:

http://localhost:8081/solr_new/select?tie=0.1&q.alt=*:*&q=bank&qf=nameaddress&fq=portal_uuid:+A4E7890F-A188-4663-89EB-176D94DF6774&defType=dismax&facet=true&facet.field=location_uuid&facet.field=sub_category_uuids

What I get back in the field facets is:
1. Some location_uuids that are in the current portal_uuid (facet count > 0)
2. Some location_uuids that are not in the current portal_uuid at all (facet count = 0)

It seems that Solr doesn't honor the fq at all when returning field facets.
I need to add one more parameter, "facet.mincount=1", so that the
location_uuid facets in (2) are not returned.

I think Solr does faceting on all location_uuid values. It should scope the
faceting to the current portal_uuid. Any ideas?
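
For reference, a small sketch of the request with facet.mincount=1 added,
plus a client-side fallback that drops zero-count entries. Note that
facet.mincount defaults to 0, so a field facet lists every term in the
field and reports a count of 0 for terms with no documents in the filtered
set; the zero counts in (2) are consistent with the fq being applied. The
function names are made up for the example, and the second helper assumes
the default flat [term, count, term, count, ...] layout of facet_fields in
the JSON response.

```python
from urllib.parse import urlencode


def facet_query_url(base, portal_uuid):
    """Build the select URL with facet.mincount=1 so empty buckets are omitted."""
    params = [
        ("q", "bank"),
        ("defType", "dismax"),
        ("fq", f"portal_uuid:{portal_uuid}"),
        ("facet", "true"),
        ("facet.field", "location_uuid"),
        ("facet.field", "sub_category_uuids"),
        ("facet.mincount", "1"),
    ]
    return f"{base}/select?{urlencode(params)}"


def nonzero_facets(flat):
    """Turn [term, count, term, count, ...] into a dict without zero counts."""
    pairs = zip(flat[0::2], flat[1::2])
    return {term: count for term, count in pairs if count > 0}
```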

-- 
Chhorn Chamnap
http://chamnap.github.com/