Re: feedback on Solr 4.x LotsOfCores feature

2013-10-19 Thread Erick Erickson
For my quick-and-dirty test I rebooted my machine completely and still
saw core discovery running at 1K cores/sec. So this still puzzles me
greatly. The time to do this should be approximated by the time it takes
to just walk your tree, find all the core.properties files and read
them. Is it possible to just write a tiny Java program to do that? Or
rip off the core discovery code and use it in a small stand-alone
program? Because your numbers are quite a bit at odds with what I've
seen. Although now that I think about it, the code has gone through some
revisions since then, but I don't think they should have affected this...
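
Something along these lines would do it (untested sketch; it assumes
discovery is dominated by finding and reading the core.properties files
under whatever directory you pass in):

import java.io.IOException;
import java.io.InputStream;
import java.nio.file.FileVisitResult;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.SimpleFileVisitor;
import java.nio.file.attribute.BasicFileAttributes;
import java.util.Properties;

public class CoreDiscoveryTimer {
  public static void main(String[] args) throws IOException {
    Path root = Paths.get(args[0]); // your Solr home
    final int[] count = {0};
    long start = System.nanoTime();
    Files.walkFileTree(root, new SimpleFileVisitor<Path>() {
      @Override
      public FileVisitResult visitFile(Path file, BasicFileAttributes attrs)
          throws IOException {
        if ("core.properties".equals(file.getFileName().toString())) {
          Properties props = new Properties();
          try (InputStream in = Files.newInputStream(file)) {
            props.load(in); // the same per-core work discovery has to do
          }
          count[0]++;
        }
        return FileVisitResult.CONTINUE;
      }
    });
    long ms = (System.nanoTime() - start) / 1000000;
    System.out.println("Read " + count[0] + " core.properties files in " + ms + " ms");
  }
}

If that also comes out around 16x slower than what I see, the bottleneck
is your disk, not the discovery code.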

Best
Erick


On Fri, Oct 18, 2013 at 2:59 PM, Soyez Olivier
wrote:

> 15K cores takes around 4 minutes: no network drive, just a spinning disk.
> But, one important thing: to simulate a cold start (i.e., a useless Linux
> buffer cache), I emptied the buffer cache with the following command:
> sync && echo 3 > /proc/sys/vm/drop_caches
> Then I started Solr and got the result above.
>
>
> On 11/10/2013 13:06, Erick Erickson wrote:
>
>
> bq: sharing the underlying solrconfig object: the configset introduced
> in JIRA SOLR-4478 seems to be the solution for non-SolrCloud mode
>
> SOLR-4478 will NOT share the underlying config objects, it simply
> shares the underlying directory. Each core will, at least as presently
> envisioned, simply read the files that exist there and create its
> own solrconfig object. Schema objects may be shared, but not config
> objects. It may turn out to be relatively easy to do in the configset
> situation, but last time I looked at sharing the underlying config
> object it was too fraught with problems.
>
> bq: 15K cores takes around 4 minutes
>
> I find this very odd. On my laptop, with a spinning disk, I think I was
> seeing 1K cores discovered/sec. You're seeing roughly 16x slower, and I
> have no idea what's going on here. If this is just reading the files,
> there must be horrible disk contention for it to be that slow. Are you
> on some kind of networked drive?
>
> bq: doing that in the background and blocking on the request until core
> discovery is complete would not work for us (due to the worst case)
> What other choices are there? Either you have to do it up front or
> with some kind of blocking. Hmmm, I suppose you could keep some kind
> of custom store (DB? File? ZooKeeper?) that holds the last known
> layout. You'd still have some kind of worst-case situation where the
> core you were trying to load wasn't in your persistent store and
> you'd _still_ have to wait for the discovery process to complete.
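>
> Purely to illustrate the idea (nothing like this exists in Solr as far
> as I know; all the names here are made up):
>
> import java.io.*;
> import java.util.Properties;
>
> public class CoreLayoutCache {
>   // After one full discovery, save coreName -> instanceDir somewhere cheap.
>   static void save(Properties layout, File cacheFile) throws IOException {
>     try (OutputStream out = new FileOutputStream(cacheFile)) {
>       layout.store(out, "last known core layout");
>     }
>   }
>
>   // On startup, read the cached layout instead of walking the whole tree.
>   // A core that's missing from the cache still forces the slow full scan.
>   static Properties load(File cacheFile) throws IOException {
>     Properties layout = new Properties();
>     try (InputStream in = new FileInputStream(cacheFile)) {
>       layout.load(in);
>     }
>     return layout;
>   }
> }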
>
> bq: we will use the cores Auto option to createLoad or onlyLoad the
> core on the fly
> Interesting. I can see how this could all work without any core
> discovery, but it does require a very specific setup.
>
> On Thu, Oct 10, 2013 at 11:42 AM, Soyez Olivier
>  wrote:
> > The corresponding patch for Solr 4.2.1 LotsOfCores can be found in
> SOLR-5316, including the new Cores options:
> > - "numBuckets": creates a subdirectory based on a hash of the core name
> % numBuckets in the core dataDir (see the sketch after this list)
> > - "Auto", with 3 different values:
> >   1) false: the default behaviour
> >   2) createLoad: create the core if it does not exist, and load it on
> the fly on the first incoming request (update, select)
> >   3) onlyLoad: load the core on the fly on the first incoming request
> (update, select), if it exists on disk
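> >
> > Roughly, the numBuckets idea (an illustrative sketch only, not the
> > actual patch code):
> >
> > import java.io.File;
> >
> > public class BucketedDataDir {
> >   // Spread core data dirs over numBuckets subdirectories so that no
> >   // single directory has to hold tens of thousands of entries.
> >   static File coreDataDir(File solrHome, String coreName, int numBuckets) {
> >     int bucket = (coreName.hashCode() & 0x7fffffff) % numBuckets;
> >     return new File(solrHome, bucket + File.separator + coreName);
> >   }
> > }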
> >
> > Concerning:
> > - sharing the underlying solrconfig object: the configset introduced in
> JIRA SOLR-4478 seems to be the solution for non-SolrCloud mode.
> > We need to test it for our use case. If another solution exists, please
> tell me. We are very interested in such functionality and in contributing,
> if we can.
> >
> > - the possibility of LotsOfCores in SolrCloud: we don't know in detail
> how SolrCloud works.
> > But one possible limit is the maximum number of entries that can be
> added to a ZooKeeper node.
> > Maybe a solution would be just a kind of hashing in the ZooKeeper tree.
> >
> > - the time to discover cores in Solr 4.4: with a spinning disk under
> Linux, all cores having transient="true" and loadOnStartup="false", and the
> Linux buffer cache emptied before starting Solr:
> > 15K cores takes around 4 minutes. It's linear in the number of cores, so
> for 50K cores it's more than 13 minutes. In fact, it corresponds to the time
> needed to read all the core.properties files.
> > Doing that in the background and blocking on the request until core
> discovery is complete would not work for us (due to the worst case).
> > So we will just disable core discovery, because we don't need to know
> all the cores from the start. We will start Solr without any core entries
> in solr.xml, and use the cores Auto option to createLoad or onlyLoad the
> core on the fly, based on the existence of the core on disk (the absolute
> path calculated from the core name).
> >
> > Thanks for your interest,
> >
> > Olivier

Re: querying nested entity fields

2013-10-19 Thread Erick Erickson
You'd have to flatten your data. Perhaps create a single
field and index the concatenated category and product values,
values like
a_product1, a_product2, b_product12, b_product23

Then your searches are easy, something like the sketch below...
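
For instance (untested sketch; the field name "cat_prod" and the rest of
the setup are just for illustration):

import java.util.Arrays;
import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.impl.HttpSolrServer;
import org.apache.solr.client.solrj.response.QueryResponse;
import org.apache.solr.common.SolrInputDocument;

public class FlattenedTagSearch {
  public static void main(String[] args) throws Exception {
    HttpSolrServer server =
        new HttpSolrServer("http://localhost:8983/solr/collection1");

    // Index the concatenated category_product values into one
    // multivalued string field.
    SolrInputDocument doc = new SolrInputDocument();
    doc.addField("id", "1");
    doc.addField("cat_prod", Arrays.asList(
        "a_product1", "a_product2", "b_product12", "b_product23"));
    server.add(doc);
    server.commit();

    // "category A, product1" collapses to a single term query.
    QueryResponse rsp = server.query(new SolrQuery("cat_prod:a_product1"));
    System.out.println("hits: " + rsp.getResults().getNumFound());
  }
}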

Best,
Erick


On Fri, Oct 18, 2013 at 4:16 PM, sathish_ix wrote:

> Hi,
>
> Can someone help me figure out whether the query below is possible?
>
> Schema:
>
> <tags>
>   <tag>
>     <category>A</category>
>     <product>product1</product>
>     <product>product2</product>
>   </tag>
>   <tag>
>     <category>B</category>
>     <product>product12</product>
>     <product>product23</product>
>   </tag>
> </tags>
>
> Is it possible to query like this: q=tag.category:A AND
> tag.category.product=product1 ?
>
> --
> View this message in context:
> http://lucene.472066.n3.nabble.com/querying-nested-entity-fields-tp4096382.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>


Re: Seeking New Moderators for solr-user@lucene

2013-10-19 Thread Furkan KAMACI
Hi Chris;

I volunteer, and as you know I have really wanted to be a moderator for
a long time :)

Thanks;
Furkan KAMACI


2013/10/19 Alexandre Rafalovitch 

> I'll be happy to moderate. I do it for some other lists already.
>
> Regards,
> Alex
>


Re: SOLRJ replace document

2013-10-19 Thread Brent Ryan
So I found out the issue here...  It was related to what you guys said
regarding the Map object in my document. The problem is that I had data
being serialized from DB -> .NET -> JSON, and some of the fields in .NET
were == System.DBNull.Value instead of null. This caused the JSON
serializer to write out an object (i.e., a Map), so when those fields got
deserialized into the SolrInputDocument it contained the Map objects, as
you indicated.
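
In case it helps anyone else, here's a minimal sketch of the gotcha in
SolrJ terms (the "title" field and the values are made up):

import java.util.HashMap;
import java.util.Map;
import org.apache.solr.common.SolrInputDocument;

public class MapValueGotcha {
  public static void main(String[] args) {
    // What we were accidentally sending: a Map as a field value. Solr
    // treats Map values as atomic-update operations, not plain values.
    SolrInputDocument bad = new SolrInputDocument();
    bad.addField("solr_id", "123");
    Map<String, Object> artifact = new HashMap<String, Object>();
    bad.addField("title", artifact); // artifact of serializing System.DBNull.Value

    // What a plain replace should send: a simple value, or no field at all.
    SolrInputDocument good = new SolrInputDocument();
    good.addField("solr_id", "123");
    good.addField("title", "some title");
  }
}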

Thanks for the help! Much appreciated!


On Sat, Oct 19, 2013 at 12:58 AM, Jack Krupansky wrote:

> By all means please do file a support request with DataStax, either as an
> official support ticket or as a question on StackOverflow.
>
> But, I do think the previous answer of avoiding the use of a Map object in
> your document is likely to be the solution.
>
>
> -- Jack Krupansky
>
> -----Original Message----- From: Brent Ryan
> Sent: Friday, October 18, 2013 10:21 PM
> To: solr-user@lucene.apache.org
> Subject: Re: SOLRJ replace document
>
>
> So I think the issue might be related to the tech stack we're using,
> which is Solr within DataStax Enterprise, which doesn't support atomic
> updates. But I think it must have some sort of bug around this, because
> it doesn't appear to work correctly for this use case when using
> SolrJ...  Anyway, I've contacted support, so let's see what they say.
>
>
> On Fri, Oct 18, 2013 at 5:51 PM, Shawn Heisey  wrote:
>
>  On 10/18/2013 3:36 PM, Brent Ryan wrote:
>>
>>  My schema is pretty simple and has a string field called solr_id as my
>>> unique key.  Once I get back to my computer I'll send some more details.
>>>
>>>
>> If you are trying to use a Map object as the value of a field, that is
>> probably why it is interpreting your add request as an atomic update.  If
>> this is the case, and you're doing it because you have a multivalued
>> field,
>> you can use a List object rather than a Map.
>>
>> If this doesn't sound like what's going on, can you share your code, or a
>> simplification of the SolrJ parts of it?
>>
>> Thanks,
>> Shawn
>>


pivot range faceting

2013-10-19 Thread Toby Lazar
Is it possible to get pivot info on a range-faceted query?  For example, if
I want to query the number of orders placed in January, February, etc., I
know I can use a simple range search.  If I want to get the number of
orders by category, I can do that easily by faceting on category.  I'm
wondering if I can get the number of all orders by month, and also broken
down by category (see the sketch below).  Is that possible in a single query?
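
For reference, here's what I mean by the two separate pieces, in SolrJ
(the field names "order_date" and "category" are just from my schema):

import org.apache.solr.client.solrj.SolrQuery;

public class SeparateFacets {
  public static void main(String[] args) {
    // Orders per month: a date range facet.
    SolrQuery byMonth = new SolrQuery("*:*");
    byMonth.setFacet(true);
    byMonth.add("facet.range", "order_date");
    byMonth.add("facet.range.start", "2013-01-01T00:00:00Z");
    byMonth.add("facet.range.end", "2014-01-01T00:00:00Z");
    byMonth.add("facet.range.gap", "+1MONTH");

    // Orders per category: a plain field facet.
    SolrQuery byCategory = new SolrQuery("*:*");
    byCategory.setFacet(true);
    byCategory.addFacetField("category");

    // What I'd like is one query whose per-month counts are also
    // broken down by category.
  }
}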

Thanks,

Toby