pivot range faceting

2013-10-19 Thread Toby Lazar
Is it possible to get pivot info on a range-faceted query?  For example, if
I want to query the number of orders placed in January, February, etc., I
know I can use a simple range search.  If I want to get the number of
orders by category, I can do that easily by faceting on category.  I'm
wondering if I can get the number of all orders by month, and also broken
down by category.  Is that possible in a single query?

Thanks,

Toby


Re: pivot range faceting

2013-10-20 Thread Toby Lazar
Thanks for confirming my fears.  I saw some presentations where I thought
this feature was used, but perhaps it was done by performing multiple range
queries.

Is there any chance copyField could copy a function of a field instead of
the field itself?  Or must this be done by the application?

Thank you again for your help.

Toby

***
  Toby Lazar
  Capital Technology Group
  Email: tla...@capitaltg.com
  Mobile: 646-469-5865
***


On Sun, Oct 20, 2013 at 2:39 PM, Upayavira  wrote:

>
>
> On Sun, Oct 20, 2013, at 04:04 AM, Toby Lazar wrote:
> > Is it possible to get pivot info on a range-faceted query?  For example,
> > if
> > I want to query the number of orders placed in January, February, etc., I
> > know I can use a simple range search.  If I want to get the number of
> > orders by category, I can do that easily by faceting on category.  I'm
> > wondering if I can get the number of all orders by month, and also broken
> > down by category.  Is that possible in a single query?
>
> You can't yet include a range facet within a pivot. The way to achieve
> this is to store a version of your date field rounded to the nearest
> month, then you will be able to use that field in a pivot facet.
>
> Obviously, this requires index time effort, which is less than ideal.
>
> I guess this is a feature just waiting for someone to implement it.
>
> Upayavira
>
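
Upayavira's index-time workaround, storing a copy of the date rounded down to
its month, can be sketched as follows (a minimal illustration; the
"order_month" field name and the client code around it are hypothetical):

```python
from datetime import datetime

def month_bucket(ts: datetime) -> str:
    """Round a timestamp down to the start of its month and format it
    the way Solr date fields expect (UTC, ISO-8601 with trailing Z)."""
    start = ts.replace(day=1, hour=0, minute=0, second=0, microsecond=0)
    return start.strftime("%Y-%m-%dT%H:%M:%SZ")

# The value you would index into a hypothetical "order_month" field:
print(month_bucket(datetime(2013, 10, 19, 14, 30)))  # 2013-10-01T00:00:00Z
```

With such a field in place, a pivot like facet.pivot=order_month,category
would give per-month counts broken down by category.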


Re: How to get similarity score between 0 and 1 not relative score

2013-10-31 Thread Toby Lazar
I think you are looking for something like this, though you can omit the fq
section:


http://localhost:8983/solr/collection/select?abc=text:bob&q={!func}scale(product(query($abc),1),0,1)&fq={!frange l=0.9}$q

Also, I don't understand all the fuss about normalized scores.  In the
linked example, I can see an interest in searching for "apple banana",
"zzz yyy xxx qqq kkk ttt rrr 111", etc., and wanting only close matches as
of that point in time.  Would this be a good use for this approach?  I
understand that the results can change if the documents in the index change.

Thanks,

Toby
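
For context, scale() performs a min-max rescaling of the raw scores across
the whole result set.  A rough sketch of that arithmetic (a model, not
Solr's implementation) shows why the top document always lands on 1:

```python
def scale(scores, lo, hi):
    """Min-max rescale raw scores into [lo, hi], mimicking the arithmetic
    of Solr's scale() function query across a result set."""
    mn, mx = min(scores), max(scores)
    if mx == mn:
        return [lo] * len(scores)  # degenerate case: all scores equal
    return [lo + (s - mn) * (hi - lo) / (mx - mn) for s in scores]

# The best raw score maps to 1.0 regardless of how good it really is:
print(scale([2.0, 5.0, 8.0], 0, 1))  # [0.0, 0.5, 1.0]
```

This is the caveat behind the divide-by-max discussion in the thread: the
scaling is relative to the current result set, not an absolute match quality.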



On Thu, Oct 31, 2013 at 12:56 AM, Anshum Gupta wrote:

> Hi Susheel,
>
> Have a look at this:
> http://wiki.apache.org/lucene-java/ScoresAsPercentages
>
> You may really want to reconsider doing that.
>
>
>
>
> On Thu, Oct 31, 2013 at 9:41 AM, sushil sharma wrote:
>
> > Hi,
> >
> > We have a requirement where the user would like to see a score (between
> > 0 and 1) that tells how close the input search string is to the result
> > string.
> > So if the input was very close but not an exact match, the score could
> > be .90, etc.
> >
> > I do understand that we can get the score from Solr and divide by the
> > highest score, but that will always show 1 even if the match was not exact.
> >
> > Regards,
> > Susheel
>
>
>
>
> --
>
> Anshum Gupta
> http://www.anshumgupta.net
>


Re: Facet field query on subset of documents

2013-12-20 Thread Toby Lazar
Luis (or anyone else),

Did you ever find a solution for this problem?  If not, is querying twice
the way to go?  I'm looking to do the same with no luck yet.

Thanks,

Toby



On Thu, Nov 21, 2013 at 5:44 PM, Luis Lebolo  wrote:

> Hi Erick,
>
> Thanks for the reply and sorry, my fault, wasn't clear enough. I was
> wondering if there was a way to remove terms that would always be zero
> (because the term came from a document that didn't match the filter query).
>
> Here's an example. I have a bunch of documents with fields 'manufacturer'
> and 'location'. If I set my filter query to "manufacturer = Sony" and all
> Sony documents had a value of 'Florida' for location, then I want 'Florida'
> NOT to show up in my facet field results. Instead, it shows up with a count
> of zero (and it'll always be zero because of my filter query).
>
> Using mincount = 1 doesn't solve my problem because I don't want it to hide
> zeroes that came from documents that actually pass my filter query.
>
> Does that make more sense?
>
>
> On Thu, Nov 21, 2013 at 4:36 PM, Erick Erickson wrote:
>
> > That's what faceting does. The facets are only tabulated
> > for documents that satisfy the query, including all of
> > the filter queries and any other criteria.
> >
> > Otherwise, facet counts would be the same no matter
> > what the query was.
> >
> > Or I'm completely misunderstanding your question...
> >
> > Best,
> > Erick
> >
> >
> > On Thu, Nov 21, 2013 at 4:22 PM, Luis Lebolo 
> > wrote:
> >
> > > Hi All,
> > >
> > > Is it possible to perform a facet field query on a subset of documents
> > (the
> > > subset being defined via a filter query for instance)?
> > >
> > > I understand that facet pivoting might work, but it would require that
> > the
> > > subset be defined by some field hierarchy, e.g. manufacturer -> price
> > (then
> > > only look at the results for the manufacturer I'm interested in).
> > >
> > > What if I wanted to define a more complex subset (where the name starts
> > > with A but ends with Z and some other field is greater than 5 and yet
> > > another field is not 'x', etc.)?
> > >
> > > Ideally I would then define a "facet field constraining query" to
> include
> > > only terms from documents that pass this query.
> > >
> > > Thanks,
> > > Luis
> > >
> >
>
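
The behavior Erick and Luis are circling can be modeled in a few lines:
counts are tabulated only over the filtered subset, but the candidate terms
for the facet field come from the whole index, which is where the zero-count
entries come from.  (Hypothetical data; this is a model of the behavior, not
Solr's code.)

```python
from collections import Counter

docs = [
    {"manufacturer": "Sony", "location": "Florida"},
    {"manufacturer": "Acme", "location": "Nevada"},
    {"manufacturer": "Sony", "location": "Florida"},
]

# Counts come only from documents matching the filter query...
subset = [d for d in docs if d["manufacturer"] == "Sony"]
counts = Counter(d["location"] for d in subset)

# ...but the candidate terms come from the index-wide term dictionary, so
# terms absent from the subset still show up, with a count of zero:
terms = sorted({d["location"] for d in docs})
facet = {t: counts.get(t, 0) for t in terms}
print(facet)  # {'Florida': 2, 'Nevada': 0}
```

Setting facet.mincount=1 drops the zero entries wholesale, though as Luis
notes it cannot distinguish where a zero came from.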


Re: Replicating Between Solr Clouds

2014-03-05 Thread Toby Lazar
Unless Solr is your system of record, aren't you already replicating your 
source data across the WAN?  If so, could you load Solr in colo B from your 
colo B data source?  You may be duplicating some indexing work, but at least 
your colo B Solr would be more closely in sync with your colo B data.

Toby
Sent via BlackBerry by AT&T

-Original Message-
From: Tim Potter 
Date: Wed, 5 Mar 2014 02:51:21 
To: solr-user@lucene.apache.org
Reply-To: solr-user@lucene.apache.org
Subject: RE: Replicating Between Solr Clouds

Unfortunately, there is no out-of-the-box solution for this at the moment. 

In the past, I solved this using a couple of different approaches, which 
weren't all that elegant but served the purpose and were simple enough to allow 
the ops folks to set up monitors and alerts if things didn't work.

1) use DIH's Solr entity processor to pull data from one Solr to another, see: 
http://wiki.apache.org/solr/DataImportHandler#SolrEntityProcessor

This only works if you store all fields, which in my use case was OK because I 
also did lots of partial document updates, which likewise required me to store 
all fields.
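
Approach 1 boils down to a data-import config along these lines (a sketch;
the source URL, query, and batch size are placeholders for your own setup):

```xml
<dataConfig>
  <document>
    <!-- Pull stored documents from the source Solr over HTTP -->
    <entity name="sourceSolr"
            processor="SolrEntityProcessor"
            url="http://source-host:8983/solr/collection1"
            query="*:*"
            rows="500"/>
  </document>
</dataConfig>
```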

2) use the replication handler's snapshot support to create snapshots on a 
regular basis and then move the files over the network

This one works but required the use of read and write aliases and two 
collections on the remote (slave) data center so that I could rebuild my write 
collection from the snapshots and then update the aliases to point the reads at 
the updated collection. Work on an automated backup/restore solution is 
planned, see https://issues.apache.org/jira/browse/SOLR-5750, but if you need 
something sooner, you can write a backup driver using SolrJ that uses 
CloudSolrServer to get the address of all the shard leaders, initiate the 
backup command on each leader, poll the replication details handler for 
snapshot completion on each shard, and then ship the files across the network. 
Obviously, this isn't a solution for NRT multi-homing ;-)
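
The poll-for-completion step of such a backup driver can be sketched as
follows; fetch_details stands in for the HTTP call to each shard leader's
replication handler, and the response dict shape shown is a simplified
assumption, not the handler's exact output:

```python
import time

def wait_for_snapshot(fetch_details, attempts=30, delay=1.0):
    """Poll one shard leader until its snapshot completes. fetch_details
    stands in for an HTTP GET against the leader's replication details
    handler; the response dict shape here is an assumption."""
    for _ in range(attempts):
        if fetch_details().get("backup", {}).get("status") == "success":
            return True
        time.sleep(delay)
    return False

# Faked handler responses standing in for the real endpoint:
responses = iter([{"backup": {"status": "inprogress"}},
                  {"backup": {"status": "success"}}])
print(wait_for_snapshot(lambda: next(responses), delay=0.0))  # True
```

The real driver would run this loop per shard leader (addresses obtained via
CloudSolrServer, as described above) before shipping the snapshot files.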

Lastly, these aren't the only ways to go about this, just wanted to share some 
high-level details about what has worked.

Timothy Potter
Sr. Software Engineer, LucidWorks
www.lucidworks.com


From: perdurabo 
Sent: Tuesday, March 04, 2014 1:04 PM
To: solr-user@lucene.apache.org
Subject: Replicating Between Solr Clouds

We are looking to set up a highly available failover site across a WAN for our
SolrCloud instance.  The main production instance is at colo center A and
consists of a 3-node ZooKeeper ensemble managing configs for a 4-node
SolrCloud running Solr 4.6.1.  We only have one collection among the 4 cores
and there are two shards in the collection, one master node and one replica
node for each shard.  Our search and indexing services address the Solr
cloud through a load balancer VIP, not a compound API call.

Anyway, the Solr wiki explains fairly well how to replicate single node Solr
collections, but I do not see an obvious way for replicating a SolrCloud's
indices over a WAN to another SolrCloud.  I need for a SolrCloud in another
data center to be able to replicate both shards of the collection in the
other data center over a WAN.  It needs to be able to replicate from a load
balancer VIP, not a single named server of the SolrCloud, which round robins
across all four nodes/2 shards for high availability.

I've searched high and low for a white paper or some discussion of how to do
this and haven't found anything.  Any ideas?

Thanks in advance.



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Replicating-Between-Solr-Clouds-tp4121196.html
Sent from the Solr - User mailing list archive at Nabble.com.

Re: to reduce indexing time

2014-03-05 Thread Toby Lazar
I believe SolrJ uses XML under the covers.  If so, I don't think you would
improve performance by switching to SolrJ, since the client would convert
it to XML before sending it on the wire.

Toby



On Wed, Mar 5, 2014 at 3:25 PM, Ahmet Arslan  wrote:

> Hi,
>
> One thing to consider: I think SolrNet uses XML updates, so there is XML
> parsing overhead with it.
> Switching to SolrJ or CSV could bring an additional gain.
>
> http://wiki.apache.org/lucene-java/ImproveIndexingSpeed
>
> Ahmet
>
>
> On Wednesday, March 5, 2014 10:13 PM, sweety 
> wrote:
> I will surely read about JVM Garbage collection. Thanks a lot, all of you.
>
> But, is the time required for my indexing good enough? I don't know about
> the
> ideal timings.
> I think that my indexing is taking more time.
>
>
>
>
>


Re: to reduce indexing time

2014-03-05 Thread Toby Lazar
Thanks Ahmet for the correction.  I used wireshark to capture an
UpdateRequest to solr and saw this XML:

123blah

and figured that javabin was only used for responses.  Does wt apply to how
solrj sends requests to solr?  Could this HTTP content be in javabin format?

Toby


On Wed, Mar 5, 2014 at 4:34 PM, Ahmet Arslan  wrote:

> Hi Toby,
>
> SolrJ uses javabin by default.
>
> Ahmet
>
>
> On Wednesday, March 5, 2014 11:31 PM, Toby Lazar 
> wrote:
> I believe SolrJ uses XML under the covers.  If so, I don't think you would
> improve performance by switching to SolrJ, since the client would convert
> it to XML before sending it on the wire.
>
> Toby
>
>
>
>
> On Wed, Mar 5, 2014 at 3:25 PM, Ahmet Arslan  wrote:
>
> > Hi,
> >
> > One thing to consider: I think SolrNet uses XML updates, so there is XML
> > parsing overhead with it.
> > Switching to SolrJ or CSV could bring an additional gain.
> >
> > http://wiki.apache.org/lucene-java/ImproveIndexingSpeed
> >
> > Ahmet
> >
> >
> > On Wednesday, March 5, 2014 10:13 PM, sweety 
> > wrote:
> > I will surely read about JVM Garbage collection. Thanks a lot, all of
> you.
> >
> > But, is the time required for my indexing good enough? I don't know about
> > the
> > ideal timings.
> > I think that my indexing is taking more time.
> >
> >
> >
> >
> >
>
>


Re: to reduce indexing time

2014-03-05 Thread Toby Lazar
OK, I was using HttpSolrServer since I haven't yet migrated to
CloudSolrServer.  I added the line:

   solrServer.setRequestWriter(new BinaryRequestWriter());

after creating the server object and now see the difference through
wireshark.  Is it fair to assume that this usage is multi-thread safe?

Thank you Shawn and Ahmet,

Toby



On Wed, Mar 5, 2014 at 4:46 PM, Shawn Heisey  wrote:

> On 3/5/2014 2:31 PM, Toby Lazar wrote:
>
>> I believe SolrJ uses XML under the covers.  If so, I don't think you would
>> improve performance by switching to SolrJ, since the client would convert
>> it to XML before sending it on the wire.
>>
>
> Until recently, SolrJ always used XML by default for requests and javabin
> for responses.  That is moving to javabin for both.  This is already the
> case in the newest versions for CloudSolrServer.  HttpSolrServer is still
> using the XML RequestWriter by default, but you can change this very easily
> to BinaryRequestWriter.  If you plan to use SolrJ, it's a change I would
> highly recommend.
>
> Thanks,
> Shawn
>
>