RE: Jetty servlet container in production environment

2015-07-15 Thread Adrian Liew
Thanks Upayavira for sharing. I am looking to deploy Solr in a Windows 64-bit 
Server environment. Some people do say Jetty works optimally in a Linux-based 
environment. Having said that, I believe Solr will have improved its stability 
within a Windows environment.

I agree with you on the advice. I shall just leave it with the Jetty servlet container. Thanks.

Best regards,

Adrian Liew | Consultant Application Developer
Avanade Malaysia Sdn. Bhd. | Consulting Services
Direct: +(603) 2382 5668
Mobile: +6010-2288030


-Original Message-
From: Upayavira [mailto:u...@odoko.co.uk] 
Sent: Wednesday, July 15, 2015 2:57 PM
To: solr-user@lucene.apache.org
Subject: Re: Jetty servlet container in production environment

Use Jetty. Or rather, just use bin/solr or bin\solr.cmd to interact with Solr.

In the past, Solr shipped as a "war" which could be deployed in any servlet 
container. Since 5.0, it is to be considered a self-contained application that 
just happens to use Jetty underneath.

If you used something other than the inbuilt Jetty, you might end up with 
issues later on down the line when developers decide to make an optimisation or 
improvement that isn't compatible with the Servlet spec.
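
For reference, a minimal sketch of the lifecycle commands meant here (the
port number is an assumption):

# start Solr on port 8983 and check that it came up
bin/solr start -p 8983
bin/solr status

# stop the instance started above
bin/solr stop -p 8983

On Windows the same subcommands are available through bin\solr.cmd.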

Upayavira

On Wed, Jul 15, 2015, at 07:43 AM, Adrian Liew wrote:
> Hi all,
> 
> I would like to ask your opinion on whether it is recommended to use the
> default Jetty servlet container as a service to run Solr in a multi-server
> production environment. I have heard that some places recommend using Tomcat
> as a servlet container. Is anyone able to share some thoughts about this?
> Limitations, advantages or disadvantages of using the Jetty servlet
> container in a production environment?
> 
> Regards,
> Adrian


Re: Jetty servlet container in production environment

2015-07-15 Thread Vincenzo D'Amore
Hi Adrian,

since version 5.0 Solr has shipped with Jetty. But I think a more
interesting question is whether the default Jetty configuration
can be used "as is" in a production environment.





-- 
Vincenzo D'Amore
email: v.dam...@gmail.com
skype: free.dev
mobile: +39 349 8513251


Re: Solr 5 options

2015-07-15 Thread Charlie Hull

On 14/07/2015 17:04, Erick Erickson wrote:

Well, Shawn, I for one am in your corner.

Schemaless is great for getting things running, but it's
not an AI. And it can get into trouble guessing. Say
it guesses that a field should be an int because the first value
it sees is 123, but it's really a part number. Then when
a part number 123-456 comes through, the doc will fail
to index with an "illegal number format" error.


This is my issue with 'schemaless' - it makes far too many assumptions 
about data types. Elasticsearch suffers from this as well: 
https://orchestrate.io/blog/2014/09/30/improved-elasticsearch-indexing/
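
As a quick way to reproduce the guess-then-fail behaviour Erick describes,
here is a sketch against a schemaless core (host, port and the core name
"gettingstarted" are assumptions):

# first document: the unknown field "part" is guessed to be numeric
curl 'http://localhost:8983/solr/gettingstarted/update/json/docs?commit=true' \
  -H 'Content-Type: application/json' -d '{"id":"1","part":123}'

# second document: "123-456" no longer parses as a number, so the update
# is rejected with a number-format error
curl 'http://localhost:8983/solr/gettingstarted/update/json/docs?commit=true' \
  -H 'Content-Type: application/json' -d '{"id":"2","part":"123-456"}'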


Charlie


bq: Also, does the fact that I intend to use a data import handler to run feeds
from large numbers of oracle schemas have any impact on the above?

Yes. You have to map the DB schemas into Solr
somehow. Schemaless will try to guess, but as above it doesn't
have any real understanding of the data. Dynamic fields are certainly
a viable option, though you'll be assigning columns to fields for each
schema variant.

Best,
Erick

On Tue, Jul 14, 2015 at 6:15 AM, Shawn Heisey  wrote:

On 7/14/2015 4:44 AM, spleenboy wrote:

Many thanks to those who helped me on my last post: I'm almost there.
So here is the doc I need to index:
{
   "doc":
   {
 "id":"2",
 "cus_name_s":"Paul Brown",
 "cus_email_t":["paul.br...@here.net"],
 "com_id_i":201,
 "com_name_s":"Berenices",
 "url_s":"domain.net/integration/"}}

I only need to be able to search on email.
My plan was to use classic, as I was going to run this on a single node.
I am happy to use dynamic fields to define the structure of the doc, so I
don't think I need a schema.xml: I think this is classic/schemaless (?)
I am still a little confused between schemaless and managed schema.
Do I implement this using the right combination of parameters in my bin/solr
create_core command?
Also, does the fact that I intend to use a data import handler to run feeds
from large numbers of oracle schemas have any impact on the above?


The "schemaless" mode isn't really schemaless ... it just means that
Solr will automatically guess what fieldType to use for a field that has
never been seen before, and then modify the schema to include that field
with the guessed fieldType.  It's sort of like the managed schema,
except it's managed automatically instead of by the admin.

I personally would not want Solr to guess on the schema, I would want to
explicitly define Solr's behavior ... but not everyone does things the
same way that I do.

Thanks,
Shawn




--
Charlie Hull
Flax - Open Source Enterprise Search

tel/fax: +44 (0)8700 118334
mobile:  +44 (0)7767 825828
web: www.flax.co.uk


To the experts: howto force opening a new searcher?

2015-07-15 Thread Bernd Fehling
I'm doing some testing on long-running huge indexes.
Therefore I need a "clean" state after some days of running.
My idea was to open a new searcher with a commit command:

INFO  - org.apache.solr.update.DirectUpdateHandler2;
start 
commit{,optimize=false,openSearcher=true,waitSearcher=true,expungeDeletes=false,softCommit=false,prepareCommit=false}
INFO  - org.apache.solr.update.DirectUpdateHandler2; No uncommitted changes. 
Skipping IW.commit.
INFO  - org.apache.solr.core.SolrCore; SolrIndexSearcher has not changed - not 
re-opening: org.apache.solr.search.SolrIndexSearcher
INFO  - org.apache.solr.update.DirectUpdateHandler2; end_commit_flush

But the result is that the DirectUpdateHandler2 is skipping the commit.

Any other ideas on how to force opening a new searcher without optimizing or
loading anything?

Best regards
Bernd


Re: dataDir config

2015-07-15 Thread Don Bosco Durai
I also feel that having dataDir configurable makes enterprise deployments
easier. Generally, software is installed on the root disk, e.g. /opt/solr,
and if the data folder is within it, the root drive will need to be expanded
as the Solr index grows, needs to be optimized, etc. Having the data folder
configurable gives an easy option to store the indexes on another drive and
manage them independently of the OS drive. SLAs can also be more predictable
with a dedicated hard drive or SSD...
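
As a minimal sketch of that setup (all paths and the core name are
assumptions):

# keep the Solr home (configs, cores, indexes) off the install drive
bin/solr start -s /data/solr/home

# or point a single core's index at a dedicated drive through its
# core.properties:
#   name=mycore
#   dataDir=/mnt/ssd/solr/mycore/data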

Thanks

Bosco




On 7/15/15, 1:35 AM, "Shawn Heisey"  wrote:

>On 7/14/2015 4:05 PM, Steven White wrote:
>> Thank you Erick and Shawn.
>>
>> I needed to separate the data from the Solr application so that Solr
>>can be
>> uninstalled / reinstalled / upgraded without impact on the data or the
>> configuration of the core.  I did some more research and found it here:
>> 
>>https://cwiki.apache.org/confluence/display/solr/Solr+Start+Script+Reference
>>  The "-s" parameter will let me tell Solr where my index should be
>> created.  This works great and I will use it unless someone tells me "no
>> way, here is why".
>
>Placing the solr home (which is what the -s option controls) outside of
>your Solr installation directory is a good idea, for exactly the reasons
>you state.  That's a step that I recommend for almost all users.
>Changing the dataDir is a more advanced config option, one that has a
>few pitfalls for inexperienced users, so it's not a good idea to mess
>with it unless you completely understand it.
>
>I think you're probably in good shape.  Good luck with your setup!
>
>Thanks,
>Shawn
>




Re: To the experts: howto force opening a new searcher?

2015-07-15 Thread Andrea Gazzarini
What do you mean by a "clean" state? A searcher is a view over a given
index (let's say) "state"... if the state didn't change, why do you want
another (identical) view?



Re: To the experts: howto force opening a new searcher?

2015-07-15 Thread Andrea Gazzarini
On top of that, sorry, I didn't answer your question because I don't know
if that is possible.

Best,
Andrea


Does update field feature work in a schema with dynamic fields?

2015-07-15 Thread Martínez López, Alfonso
Hi,

I'm using Solr 4.10.3, and I'm trying to update a doc field using atomic update 
(http://wiki.apache.org/solr/Atomic_Updates).

My schema.xml is like this:

<fields>
  <field name="id" type="..." required="true" />
  <field name="name" type="..." />
  <field name="src_desc" type="..." multiValued="false" />
  <dynamicField name="dinamic_*" type="..." stored="true" multiValued="false" />
</fields>

<copyField source="src_desc" dest="dinamic_desc" />

I add a document with this command:

curl http://<host>:<port>/solr/default/update?commit=true -H 
"Content-Type: text/xml" --data-binary '<add><doc><field name="id">1</field><field name="name">paco</field><field name="src_desc">friend of mine</field></doc></add>'



And later I update the field 'name' with this command:

curl http://<host>:<port>/solr/default/update?commit=true -H 
"Content-Type: text/xml" --data-binary '<add><doc><field name="id">1</field><field name="name" update="set">paquico</field></doc></add>'



As I do so, the doc I retrieve from Solr is:

<doc>
 <str name="id">1</str>
 <str name="name">paquico</str>
 <str name="src_desc">friend of mine</str>
 <arr name="dinamic_desc">
  <str>friend of mine</str>
  <str>friend of mine</str>
 </arr>
 <long name="_version_">1506750859550130176</long>
 <float name="score">1.0</float>
</doc>



So I get a non-multivalued field (dinamic_desc) with multiple values :(








Re: Get content in response from ExtractingRequestHandler

2015-07-15 Thread trung.ht
HI Erick,

Thanks for pointing out the main problem of my system.

Trung.

On Fri, Jul 10, 2015 at 11:47 PM, Erick Erickson 
wrote:

> In a word, no. If you don't store the data it is completely gone
> with no chance of retrieval.
>
> There are a couple of things to think about though
>
> 1> The original doc must exist somewhere. Store some kind
> of URI in Solr that you can use to retrieve the original doc
> on demand.
>
> 2> Go ahead and store the data. Disk space is cheap, and the
> stored data goes in special files (*.fdt) that have very little impact
> on either search speed or memory requirements. And the memory
> requirements can be controlled somewhat with the documentCache
> assuming you don't have gigantic docs.
>
> This kind of sidesteps the question of re-extracting the document
> in Solr on demand and returning the text (which I think is what
> you're asking). I would definitely avoid doing this even if I knew how.
> The problem here is that you're making Solr do quite intensive
> work (Tika extraction) while at the same time serving queries,
> which has negative performance implications. If it turns out that you
> have to do this, consider running Tika in the app layer and
> doing the extraction on demand there. It's not very hard, see:
> https://lucidworks.com/blog/indexing-with-solrj/
> and ignore the db bits.
>
> Best,
> Erick
>
> On Thu, Jul 9, 2015 at 7:53 PM, trung.ht  wrote:
> > Hi everyone,
> >
> > I use Solr to index and search in office files (docx, pptx, ...). To
> > reduce the size of the Solr index, I do not store the content of the file
> > in Solr; however, now my customer wants to preview the content of the file.
> >
> > I have read the documentation of ExtractingRequestHandler, but it seems
> > that to return content in the response from Solr, the only option is to
> > set extractOnly=true, but in that case, Solr would not index the file.
> >
> > My question is: is there any way for Solr to extract the content with
> > Tika, index the content (without storing it) and then give me the content
> > in the response?
> >
> > Thanks in advance, and sorry if my explanation is confusing.
> >
> > Trung.
>
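
For reference, Erick's second suggestion (just store the extracted text) can
be tried with a plain extract call; a sketch, where the host, core name and
file are assumptions:

# index a file through the extracting handler; which field the extracted
# text lands in (and whether it is stored) depends on the core's
# solrconfig.xml and schema
curl 'http://localhost:8983/solr/mycore/update/extract?literal.id=doc1&commit=true' \
  -F 'myfile=@report.docx'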


Re: To the experts: howto force opening a new searcher?

2015-07-15 Thread Alessandro Benedetti
Triggering a commit implies the new searcher will be opened in a soft
commit scenario.
With a hard commit, you can decide whether or not to open the new searcher.
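
As a concrete illustration, both behaviours can be requested explicitly on
the update handler (a sketch; host and core name are assumptions):

# hard commit, asking for a new searcher to be opened
curl 'http://localhost:8983/solr/mycore/update?commit=true&openSearcher=true'

# hard commit, leaving the current searcher in place
curl 'http://localhost:8983/solr/mycore/update?commit=true&openSearcher=false'

# soft commit, which always makes new documents visible
curl 'http://localhost:8983/solr/mycore/update?softCommit=true'

As Bernd's log shows, though, an empty commit with no pending changes may be
skipped entirely.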

But this is probably an X/Y problem.

Can you describe your real problem rather than the way you were trying
to solve it?

Cheers




-- 
--

Benedetti Alessandro
Visiting card - http://about.me/alessandro_benedetti
Blog - http://alexbenedetti.blogspot.co.uk

"Tyger, tyger burning bright
In the forests of the night,
What immortal hand or eye
Could frame thy fearful symmetry?"

William Blake - Songs of Experience -1794 England


Re: Does update field feature work in a schema with dynamic fields?

2015-07-15 Thread Alessandro Benedetti
This is kinda weird and looks a lot like a bug.
Let me try to reproduce it locally!
I'll let you know soon!

Cheers




-- 
--

Benedetti Alessandro
Visiting card - http://about.me/alessandro_benedetti
Blog - http://alexbenedetti.blogspot.co.uk

"Tyger, tyger burning bright
In the forests of the night,
What immortal hand or eye
Could frame thy fearful symmetry?"

William Blake - Songs of Experience -1794 England


Re: Does update field feature work in a schema with dynamic fields?

2015-07-15 Thread Alessandro Benedetti
Just tried on Solr 5.1, and I get the proper behaviour.

Actually, where is the value for dinamic_desc coming from?

I cannot see it in the updates, and it is not in my index.
Are you sure you have not forgotten any detail?

Cheers




-- 
--

Benedetti Alessandro
Visiting card - http://about.me/alessandro_benedetti
Blog - http://alexbenedetti.blogspot.co.uk

"Tyger, tyger burning bright
In the forests of the night,
What immortal hand or eye
Could frame thy fearful symmetry?"

William Blake - Songs of Experience -1794 England


RE: Does update field feature work in a schema with dynamic fields?

2015-07-15 Thread Martínez López, Alfonso
Hi, thanks for your help!

The value for the 'dinamic_desc' field comes from the 'src_desc' field. I copy
the value with:

<copyField source="src_desc" dest="dinamic_desc" />

It seems that when I update a different field (field 'name') via atomic
update, the copyField directive copies the value again from 'src_desc' to
'dinamic_desc', instead of updating the value, as if 'dinamic_desc' or
'dinamic_*' were multivalued.

Cheers.


Re: Does update field feature work in a schema with dynamic fields?

2015-07-15 Thread Alessandro Benedetti
Ohhh!
I didn't read it completely, so I missed the copy field.
OK, now. This is the explanation:
Copy fields are added at indexing time, when the document arrives at the
RunUpdateRequest processor.
If I remember well, at this point, before we start indexing, the content
of the source field is added to the copy field as a value.

The first time you indexed your document, the first copy was added.

What you didn't know is that atomic update actually works in this
way:

1) It gets the current doc from the index (the stored fields),
2) it does the update, and
3) then it sends the document through the indexing processing chain *again*,
so the value is copied a second time.

This produces the duplicate value.
I can go deeper, but I think this is the cause.

Cheers



Re: To the experts: howto force opening a new searcher?

2015-07-15 Thread Bernd Fehling
Whatever you name the problem, I just wanted to open a new searcher
after several days of heavy load/searching on one of my slaves,
to do some testing with empty field/document/filter caches.

Sure, I could first add, then delete a document and do a commit.
Or maybe only do a fake update of a document with a commit (if that works).
But I don't want any changes to the index, just to start a new searcher
and close the old one. This is my problem. I don't see any X/Y here.

Regards
Bernd




RE: To the experts: howto force opening a new searcher?

2015-07-15 Thread Markus Jelsma
Well yes, a simple empty commit won't do the trick; the searcher is not going 
to reopen on recent versions. Reloading the core will.
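
A sketch of that reload call via the CoreAdmin API (host and core name are
assumptions):

# reloads the core, opening a fresh searcher and discarding the old caches
curl 'http://localhost:8983/solr/admin/cores?action=RELOAD&core=mycore'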
 


Re: Why I get a hit on %, &, but not on !, @, #, $, ^, *

2015-07-15 Thread Steven White
Thank you all for helping on this topic.  I'm going to play with this and
might come back with more questions.

Steve

On Tue, Jul 14, 2015 at 1:57 PM, Erick Erickson 
wrote:

> Steve:
>
> Simplest solution:
> remove WordDelimiterFilterFactory.
> Use something like PatternReplaceCharFilterFactory or
> PatternReplaceFilterFactory to selectively remove the characters you
> don't care about and leave in the ones you do care about.
>
> You might also want to do this kind of thing in a copyField and search
> one or the other selectively as desired, or perhaps boost or...
>
> NOTE: one side effect of WDFF is that punctuation is removed, so you
> have to consider what you want to do with periods at the end of a
> sentence, apostrophes and the like.
>
> Best,
> Erick
>
> On Tue, Jul 14, 2015 at 10:08 AM, Steven White 
> wrote:
> > Thanks Jack.
> >
> > Can you provide me with a concrete example of how to:
> >
> > 1) Be able to search and find "$10" (without quotes).  This will get me
> > started on how to add all other variations for !, @, etc. and be able to
> > search on them.  In this case, a search for "$10" will give me a hit on
> > text of "$10", but not "10" and a search on "10" will give me a hit on
> "10"
> > but not "$10".
> >
> > 2) Prevent a hit on "10%" (without quotes).  This will get me started on
> > how to prevent a hit on %, &, etc.  In this case, a search for "%" or
> "10%"
> > will give me 0 hits, but a search on "10" will give me a hit on "10" or
> > "10%".
> >
> > Do you see where I'm going with this?  Are both of those configurations
> > possible?  This will let me customize Solr to meet customer needs.
> >
> > Thanks.
> >
> > Steve
> >
> > On Mon, Jul 13, 2015 at 11:12 PM, Jack Krupansky <
> jack.krupan...@gmail.com>
> > wrote:
> >
> >> Oops... that's the "types" attribute.
> >>
> >> -- Jack Krupansky
> >>
> >> On Mon, Jul 13, 2015 at 11:11 PM, Jack Krupansky <
> jack.krupan...@gmail.com
> >> >
> >> wrote:
> >>
> >> > The word delimiter filter is removing special characters. You can add a
> >> > file containing a list of the special characters that you wish to treat
> >> > as alpha, using the "type" parameter.
> >> >
> >> > -- Jack Krupansky
> >> >
> >> > On Mon, Jul 13, 2015 at 6:43 PM, Steven White 
> >> > wrote:
> >> >
> >> >> Hi Everyone,
> >> >>
> >> >> I think the subject line said it all.  Here is the schema I'm using:
> >> >>
> >> >> <fieldType name="..." class="solr.TextField" positionIncrementGap="100"
> >> >>  autoGeneratePhraseQueries="true">
> >> >>   <analyzer>
> >> >>     <tokenizer class="solr.WhitespaceTokenizerFactory"/>
> >> >>     <filter class="solr.StopFilterFactory" words="lang/stopwords_en.txt"/>
> >> >>     <filter class="solr.WordDelimiterFilterFactory"
> >> >>      generateNumberParts="1" catenateWords="1" catenateNumbers="1"
> >> >>      catenateAll="1" splitOnCaseChange="0" splitOnNumerics="1"
> >> >>      stemEnglishPossessive="1" preserveOriginal="1"/>
> >> >>     <filter class="..." protected="protwords.txt"/>
> >> >>     <filter class="..."/>
> >> >>   </analyzer>
> >> >> </fieldType>
> >> >>
> >> >> I'm guessing this is due to how solr.WhitespaceTokenizerFactory works,
> >> >> and the characters that it is not indexing are removed because they are
> >> >> considered "white-spaces"?  If so, how can I include %, &, etc. in this
> >> >> non-indexed list?  I would rather see all of these not indexed vs. some
> >> >> indexed and some not, causing confusion for my users.
> >> >>
> >> >> Thanks
> >> >>
> >> >> Steve
> >> >>
> >> >
> >> >
> >>
>
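
For anyone following along, the field analysis API is a handy way to verify
what the chain actually does to such values (a sketch; host, core name and
the field type name "text_en" are assumptions):

# show how the analyzer for field type text_en tokenizes "$10"
# ($ is URL-encoded as %24)
curl 'http://localhost:8983/solr/mycore/analysis/field?analysis.fieldtype=text_en&analysis.fieldvalue=%2410'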


Which default Boolean operator to set, AND or OR?

2015-07-15 Thread Steven White
Hi Everyone,

Out of the box, Solr (Lucene?) is set to use OR as the default Boolean
operator.  Can someone tell me the advantages / disadvantages of using OR
or AND as the default?

I'm leaning toward AND as the default because the more words a user types,
the narrower the result set should be.
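
For what it's worth, the default can also be overridden per request, which
makes it easy to compare both behaviours side by side (a sketch; host and
core name are assumptions):

# the same query, once narrowed with AND, once widened with OR
curl 'http://localhost:8983/solr/mycore/select?q=apache+solr&q.op=AND'
curl 'http://localhost:8983/solr/mycore/select?q=apache+solr&q.op=OR'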

Thanks

Steve


Re: Does update field feature work in a schema with dynamic fields?

2015-07-15 Thread Alessandro Benedetti
Going through the code in the RunUpdateRequestProcessor, at one point we
call:

…

Document luceneDocument = cmd.getLuceneDocument();
// SolrCore.verbose("updateDocument",updateTerm,luceneDocument,writer);
writer.updateDocument(updateTerm, luceneDocument);

..


Inside that method we call:

public Document getLuceneDocument() {
  return DocumentBuilder.toDocument(getSolrInputDocument(), req.getSchema());
}


Then, exploring toDocument, we find what we need:

org/apache/solr/update/DocumentBuilder.java:114

And looking in there, we realise it is a bug:

…

if (!destinationField.multiValued() && destHasValues) {
  throw new SolrException(SolrException.ErrorCode.BAD_REQUEST,
      "ERROR: " + getID(doc, schema) + "multiple values encountered for non multiValued copy field " +
      destinationField.getName() + ": " + v);
}

...


The check actually looks at fields that already have values, without
expecting the copy field to have been populated already.

Hence the exception.

I will create an issue for that and provide a patch as soon as I have
time (or anyone can provide the patch).


Cheers







Re: To the experts: howto force opening a new searcher?

2015-07-15 Thread Bernd Fehling
Hi Markus,

Excellent, reloading the core did it.

Best regards
Bernd




Re: Solr 5 options

2015-07-15 Thread spleenboy
OK, so effectively use the core product as it was in Solr 4, running a
schema.xml file to control doc structures and validation. In Solr 5, does
anyone have a clear link or some pointers as to the options for bin/solr
create_core to boot up the instance I need?
Thanks for all the help.
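
For context, the minimal form I have found so far (the core name and
configset path are assumptions):

# create a core from a directory containing conf/schema.xml and
# conf/solrconfig.xml
bin/solr create_core -c mycore -d /path/to/my_configset

# list the remaining options
bin/solr create_core -help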



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Solr-5-options-tp4217236p4217459.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: To the experts: howto force opening a new searcher?

2015-07-15 Thread Alessandro Benedetti
2015-07-15 12:44 GMT+01:00 Markus Jelsma :

> Well yes, a simple empty commit won't do the trick; the searcher is not
> going to reopen on recent versions. Reloading the core will.
>
Mmm, Markus, let's assume we trigger a soft commit, even an empty one: if
openSearcher is true, isn't a new searcher going to be forced?

>From the DirectUpdateHandler2
…

if (cmd.openSearcher) {
  core.getSearcher(true, false, waitSearcher);

…

It seems that :

forceNew if true, force the open of a new index searcher
regardless if there is already one open.

Am I wrong? Am I missing anything?



>
> -Original message-
> > From:Bernd Fehling 
> > Sent: Wednesday 15th July 2015 13:42
> > To: solr-user@lucene.apache.org
> > Subject: Re: To the experts: howto force opening a new searcher?
> >
> > Whatever you name the problem, I just wanted to open a new searcher
> > after several days of heavy load/searching on one of my slaves,
> > to do some testing with empty field/document/filter caches.
>
Aren't you warming your caches on commits? Do you always discard all the old
caches without warming them?






-- 
--

Benedetti Alessandro
Visiting card - http://about.me/alessandro_benedetti
Blog - http://alexbenedetti.blogspot.co.uk

"Tyger, tyger burning bright
In the forests of the night,
What immortal hand or eye
Could frame thy fearful symmetry?"

William Blake - Songs of Experience -1794 England


RE: Does update field feature work in a schema with dynamic fields?

2015-07-15 Thread Martínez López, Alfonso
Ok, thanks very much.

It is when I try a second atomic update that I get the exception you
mentioned, "multiple values encountered for non multiValued copy field". The
first time there is no exception, but the non-multivalued field gets indexed
with 2 values.

Cheers.


From: Alessandro Benedetti [benedetti.ale...@gmail.com]
Sent: Wednesday, July 15, 2015 2:28 PM
To: solr-user@lucene.apache.org
Subject: Re: Does update field feature work in a schema with dynamic fields?

Going through the code in the RunUpdateRequestProcessor we call at one
point :

…

Document luceneDocument = cmd.getLuceneDocument();
// SolrCore.verbose("updateDocument",updateTerm,luceneDocument,writer);
writer.updateDocument(updateTerm, luceneDocument);

..


Inside that method we call :

public Document getLuceneDocument() {
  return DocumentBuilder.toDocument(getSolrInputDocument(), req.getSchema());
}


Then exploring the toDocument we find what we need :

org/apache/solr/update/DocumentBuilder.java:114

And looking into there we realise it is a bug :

…

if (!destinationField.multiValued() && destHasValues) {
  throw new SolrException(SolrException.ErrorCode.BAD_REQUEST,
  "ERROR: "+getID(doc, schema)+"multiple values encountered
for non multiValued copy field " +
  destinationField.getName() + ": " + v);
}

...


Because the check is actually checking in already used fields, not
expecting  the copy field to be already valorised.

Hence the exception coming.

I will create an issue for that and provide a patch as soon as I have
time ( or anyone can provide the patch).


Cheers






2015-07-15 12:27 GMT+01:00 Alessandro Benedetti 
:

> Ohhh!
> I didn't read it completely, so i missed the copy field.
> Ok now.
> This is the explanation :
> Copy fields are added at indexing time, when the document arrived to the
> RunUpdateRequest processor.
> If i remember well at this point , before we start the indexing the
> content of source field is added to the copy field as a value.
>
> The first time you indexed your document , the first copy was added.
>
> What you didn't know is the fact that actually atomic update works in this
> way :
>
> 1) I get the current Doc from the index ( the stored fields),
> 2) I do the update, and
> 3) then I send the document to the indexing processing chain *again*.
> So the value is copied a second time.
>
> This will produce the duplicate value.
> I can go in deep, but I think this is the cause.
>
> Cheers
>
>
> 2015-07-15 12:10 GMT+01:00 Martínez López, Alfonso :
>
>> Hi, thanks for your help!
>>
>> Value for 'dinamic_desc' field come from 'src_desc' field. I copy the
>> value with:
>>
>> 
>>
>> Seems like when I update a different field (field 'name') via atomic
>> update, the copyField directive copies the value again from 'src_desc' to
>> 'desc_field', instead of updating the value, like if 'desc_field' or
>> 'desc_*' where multivalued.
>>
>> Cheers.
>> 
>> From: Alessandro Benedetti [benedetti.ale...@gmail.com]
>> Sent: Wednesday, July 15, 2015 12:56 PM
>> To: solr-user@lucene.apache.org
>> Subject: Re: Does update field feature work in a schema with dynamic
>> fields?
>>
>> Just tried, on Solr 5.1 and I get the proper behaviour.
>>
>> Actually where is the value for the dinamic_desc coming from ?
>>
>> I can not see it in the updates and actually it is not in my index.
>> Are you sure you have not forgotten any detail ?
>>
>> Cheers
>>
>> 2015-07-15 11:48 GMT+01:00 Alessandro Benedetti <
>> benedetti.ale...@gmail.com>
>> :
>>
>> > This is kinda weird and looks a lot like a bug.
>> > Let me try to reproduce it locally!
>> > I let you know soon !
>> >
>> > Cheers
>> >
>> > 2015-07-15 10:01 GMT+01:00 Martínez López, Alfonso :
>> >
>> >> Hi,
>> >>
>> >> i'm using Solr 4.10.3, and i'm trying update a doc field using atomic
>> >> update (http://wiki.apache.org/solr/Atomic_Updates).
>> >>
>> >> My schema.xml is like this:
>> >>
>> >> 
>> >> > >> required="true" />
>> >> 
>> >> 
>> >> > >> 
>> >> > >> multiValued="false" />
>> >> > stored="true"
>> >> multiValued="false" />
>> >> 
>> >> 
>> >>
>> >>
>> >> I add a document with this command:
>> >>
>> >>
>> >>
> >> curl http://<host>:<port>/solr/default/update?commit=true -H
> >> "Content-Type: text/xml" --data-binary '<add><doc><field
> >> name="id">1</field><field name="name">paco</field><field name="src_desc"
> >> >friend of mine</field></doc></add>'
>> >>
>> >>
>> >>
>> >> And later I update the field 'name' with this command:
>> >>
>> >>
>> >>
> >> curl http://<host>:<port>/solr/default/update?commit=true -H
> >> "Content-Type: text/xml" --data-binary '<add><doc><field name="id">1</field><field
> >> name="name" update="set">paquico</field></doc></add>'
>> >>
>> >>
>> >>
>> >> As I do so the doc i retrive from Solr is:
>> >>
>> >>
>> >>
>> >> 
>> >>  
>> >>   1
>> >>   paquico
>> >>   friend of mine
>> >>   
>> >>friend of mine
>> >>friend of mine
>> >>   
>> >>   1506750859550130176
>> >>   1.0
>> >>  
>> >> 
>> >>
>> >>
>> >>
>> >> So I get a non multivalued field (dinamic_desc) with multiple values :(
>

RE: SOLR nrt read writes

2015-07-15 Thread Reitzel, Charles
And, to answer your other question, yes, you can turn off auto-warming. If 
your instance is dedicated to this client task, it may serve no purpose or be 
actually counter-productive.

In the past, I worked on a Solr-based application that committed frequently 
under application control (vs. auto commit) and we turned off all auto-warming 
and most of the caching.

There is scant documentation in the new Solr reference (cwiki.apache.org), but 
the old docs cover this well and appear current enough: 
https://wiki.apache.org/solr/SolrCaching
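
For illustration, a minimal solrconfig.xml sketch of that kind of setup (the
sizes are placeholders; autowarmCount="0" turns warming off, and removing a
cache element disables that cache entirely):

<filterCache class="solr.FastLRUCache" size="512" initialSize="512" autowarmCount="0"/>
<queryResultCache class="solr.LRUCache" size="512" initialSize="512" autowarmCount="0"/>
<documentCache class="solr.LRUCache" size="512" initialSize="512" autowarmCount="0"/>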

Just a thought: would <useColdSearcher>true</useColdSearcher> be helpful here?

Also, since you have just inserted the documents, it sounds like you probably 
could search by ID ...

-Original Message-
From: Shawn Heisey [mailto:apa...@elyograg.org] 
Sent: Tuesday, July 14, 2015 6:04 PM
To: solr-user@lucene.apache.org
Subject: Re: SOLR nrt read writes

On 7/14/2015 12:19 PM, Bhawna Asnani wrote:
> I have a use case where we have to write data into solr and 
> immediately read it back.
> The read is not get by Id but a search call.
>
> I am doing a softCommit after every such write which needs to be 
> visible immediately.
> However sometimes the changes are not visible immediately.
>
> We have a solr cloud but I have also tried sending reads, writes and 
> commits to cloud leader only and still there is some latency.
>
> Has anybody tried to use solr this way?

Don't ignore what Erick has said just because you're getting this reply from 
someone else.  That advice is correct.  My intent here is to provide more 
detail.

Since you are not doing a retrieval by id (uniqueKey field), you can't use the 
Realtime Get handler.  That handler would get the latest version of a doc, 
whether it has been committed or not.  The transaction logs (configured with 
updateLog in solrconfig.xml) are used to retrieve uncommitted information.  Can 
you change your retrieval so it's by id rather than a search query?  If you 
can, that might solve this for you.

Normally, if you do a commit operation with openSearcher=true and 
waitSearcher=true, control of the program will not be returned until that 
commit is completely done ... but as Erick said, if you are doing a LOT of 
commits very quickly, you're probably going to exceed maxWarmingSearchers, and 
in that scenario, you cannot rely on using the commit operation as a blocker 
for your retrieval attempt.

In order to have any hope of getting what you want with your current methods, 
your commit frequency must be low enough that each commit has time to finish 
before the next one begins.  I personally would not do commits more often than 
once a minute.  Commits on my larger index shards are known to take up to ten 
seconds when the index is quiet, and even more if the index is busy.  There are 
ways to make commits happen faster, but it often involves disabling features 
that you might want to leave enabled.

Thanks,
Shawn




Re: To the experts: howto force opening a new searcher?

2015-07-15 Thread Bernd Fehling


Am 15.07.2015 um 14:47 schrieb Alessandro Benedetti:
...
>>> What ever you name a problem, I just wanted to open a new searcher
>>> after several days of heavy load/searching on one of my slaves
>>> to do some testing with empty field-/document-/filter-caches.
>>
> Aren't you warming your caches on commits ? You always discard all the old
> caches without warming them ?
> 
> 

Only in this single very special case, yes!

In normal operation mode there will be some warming of caches,
but never a forced opening of a new searcher or anything unusual.
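
For reference, warming on commit is configured with a newSearcher listener in
solrconfig.xml. A minimal sketch, with a placeholder warming query:

<listener event="newSearcher" class="solr.QuerySenderListener">
  <arr name="queries">
    <lst><str name="q">some static warming query</str></lst>
  </arr>
</listener>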



RE: To the experts: howto force opening a new searcher?

2015-07-15 Thread Markus Jelsma
See SOLR-5783.
 
 
-Original message-
> From:Alessandro Benedetti 
> Sent: Wednesday 15th July 2015 14:48
> To: solr-user@lucene.apache.org
> Subject: Re: To the experts: howto force opening a new searcher?
> 
> 2015-07-15 12:44 GMT+01:00 Markus Jelsma :
> 
> > Well yes, a simple empty commit won't do the trick, the searcher is not
> > going to reload on recent versions. Reloading the core will.
> >
> mmm  Markus, let's assume we trigger a soft commit, even empty, if open
> searcher is equal true, it is not going to be forced ?
> 
> From the DirectUpdateHandler2
> …
> 
> if (cmd.openSearcher) {
>   core.getSearcher(true, false, waitSearcher);
> 
> …
> 
> It seems that :
> 
> forceNew if true, force the open of a new index searcher
> regardless if there is already one open.
> 
> Am i wrong ? Am I missing anything ?
> 
> 
> 
> >
> > -Original message-
> > > From:Bernd Fehling 
> > > Sent: Wednesday 15th July 2015 13:42
> > > To: solr-user@lucene.apache.org
> > > Subject: Re: To the experts: howto force opening a new searcher?
> > >
> > > What ever you name a problem, I just wanted to open a new searcher
> > > after several days of heavy load/searching on one of my slaves
> > > to do some testing with empty field-/document-/filter-caches.
> >
> Aren't you warming your caches on commits ? You always discard all the old
> caches without warming them ?
> 
> 
> 
> > >
> > > Sure, I could first add, then delete a document and do a commit.
> > > Or may be only do a fake update of a document with a commit (if this
> > works).
> > > But I don't want any changes on the index, just start a new searcher
> > > and close the old one. This is my problem. I don't see any X/Y here.
> > >
> > > Regards
> > > Bernd
> > >
> > >
> > > Am 15.07.2015 um 12:46 schrieb Alessandro Benedetti:
> > > > Triggering a commit , implies the new Searcher to be opened in a soft
> > > > commit scenario.
> > > > With an hard commit, you can decide if opening or not the new searcher.
> > > >
> > > > But this is probably a X/Y problem.
> > > >
> > > > Can you describe better your real problem and not the way you were
> > trying
> > > > to solve it ?
> > > >
> > > > Cheers
> > > >
> > > > 2015-07-15 9:57 GMT+01:00 Andrea Gazzarini :
> > > >
> > > >> On top of that sorry, I didn't answer to your question because I
> > don't know
> > > >> if that is possible
> > > >>
> > > >> Best,
> > > >> Andrea
> > > >> On 15 Jul 2015 02:51, "Andrea Gazzarini" 
> > wrote:
> > > >>
> > > >>> What do you mean with "clean" state? A searcher is a view over a
> > given
> > > >>> index (let's say) "state"...if the state didn't change why do you
> > want
> > > >>> another (identical) view?
> > > >>>
> > > >>> On 15 Jul 2015 02:30, "Bernd Fehling" <
> > bernd.fehl...@uni-bielefeld.de>
> > > >>> wrote:
> > > 
> > >  I'm doing some testing on long running huge indexes.
> > >  Therefore I need a "clean" state after some days running.
> > >  My idea was to open a new searcher with commit command:
> > > 
> > >  INFO  - org.apache.solr.update.DirectUpdateHandler2;
> > >  start
> > > >>>
> > > >>
> > commit{,optimize=false,openSearcher=true,waitSearcher=true,expungeDeletes=false,softCommit=false,prepareCommit=false}
> > >  INFO  - org.apache.solr.update.DirectUpdateHandler2; No uncommitted
> > > >>> changes. Skipping IW.commit.
> > >  INFO  - org.apache.solr.core.SolrCore; SolrIndexSearcher has not
> > > >> changed
> > > >>> - not re-opening: org.apache.solr.search.SolrIndexSearcher
> > >  INFO  - org.apache.solr.update.DirectUpdateHandler2;
> > end_commit_flush
> > > 
> > >  But the result is that the DirectUpdateHandler2  is skipping the
> > > >> commit.
> > > 
> > >  Any other ideas how to force opening a new searcher without
> > optimizing
> > > >>> or loading anything?
> > > 
> > >  Best regards
> > >  Bernd
> > > >>>
> > > >>
> > >
> >
> 
> 
> 
> -- 
> --
> 
> Benedetti Alessandro
> Visiting card - http://about.me/alessandro_benedetti
> Blog - http://alexbenedetti.blogspot.co.uk
> 
> "Tyger, tyger burning bright
> In the forests of the night,
> What immortal hand or eye
> Could frame thy fearful symmetry?"
> 
> William Blake - Songs of Experience -1794 England
> 


Re: SOLR nrt read writes

2015-07-15 Thread Daniel Collins
Just to re-iterate Charles' response with an example, we have a system
which needs to be as Near RT as we can make it.  So we have application
level commitWithin set to 250ms.  Yes, we have to turn off a lot of caching,
auto-warming, etc, but it was necessary to make the index as real time as
we needed it to be.  Now we have the benefit of being able to throw a lot
of hardware, RAM and SSDs at this in order to get any kind of sane search
latency.
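
As a concrete sketch of such an update (the collection name and document are
placeholders; commitWithin is in milliseconds):

curl "http://localhost:8983/solr/mycoll/update?commitWithin=250" \
  -H "Content-Type: text/xml" \
  --data-binary '<add><doc><field name="id">1</field></doc></add>'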

We have the luxury of being able to afford that, but it comes with other
problems because we have an index that is changing so fast (replicating to
other nodes in the cloud becomes tricky, peer sync fails most of the time,
etc.)

What is your use case that requires this level of real-time access?

On 15 July 2015 at 13:59, Reitzel, Charles 
wrote:

> And, to answer your other question, yes, you can turn off auto-warming.
> If your instance is dedicated to this client task, it may serve no purpose
> or be actually counter-productive.
>
> In the past, I worked on a Solr-based application that committed
> frequently under application control (vs. auto commit) and we turned off
> all auto-warming and most of the caching.
>
> There is scant documentation in the new Solr reference (cwiki.apache.org),
> but the old docs cover this well and appear current enough:
> https://wiki.apache.org/solr/SolrCaching
>
> Just a thought: would <useColdSearcher>true</useColdSearcher> be helpful
> here?
>
> Also, since you have just inserted the documents, it sounds like you
> probably could search by ID ...
>
> -Original Message-
> From: Shawn Heisey [mailto:apa...@elyograg.org]
> Sent: Tuesday, July 14, 2015 6:04 PM
> To: solr-user@lucene.apache.org
> Subject: Re: SOLR nrt read writes
>
> On 7/14/2015 12:19 PM, Bhawna Asnani wrote:
> > I have a use case where we have to write data into solr and
> > immediately read it back.
> > The read is not get by Id but a search call.
> >
> > I am doing a softCommit after every such write which needs to be
> > visible immediately.
> > However sometimes the changes are not visible immediately.
> >
> > We have a solr cloud but I have also tried sending reads, writes and
> > commits to cloud leader only and still there is some latency.
> >
> > Has anybody tried to use solr this way?
>
> Don't ignore what Erick has said just because you're getting this reply
> from someone else.  That advice is correct.  My intent here is to provide
> more detail.
>
> Since you are not doing a retrieval by id (uniqueKey field), you can't use
> the Realtime Get handler.  That handler would get the latest version of a
> doc, whether it has been committed or not.  The transaction logs
> (configured with updateLog in solrconfig.xml) are used to retrieve
> uncommitted information.  Can you change your retrieval so it's by id
> rather than a search query?  If you can, that might solve this for you.
>
> Normally, if you do a commit operation with openSearcher=true and
> waitSearcher=true, control of the program will not be returned until that
> commit is completely done ... but as Erick said, if you are doing a LOT of
> commits very quickly, you're probably going to exceed maxWarmingSearchers,
> and in that scenario, you cannot rely on using the commit operation as a
> blocker for your retrieval attempt.
>
> In order to have any hope of getting what you want with your current
> methods, your commit frequency must be low enough that each commit has time
> to finish before the next one begins.  I personally would not do commits
> more often than once a minute.  Commits on my larger index shards are known
> to take up to ten seconds when the index is quiet, and even more if the
> index is busy.  There are ways to make commits happen faster, but it often
> involves disabling features that you might want to leave enabled.
>
> Thanks,
> Shawn
>
>


Re: Which default Boolean operator to set, AND or OR?

2015-07-15 Thread Jack Krupansky
It is simply precision (AND) vs. recall (OR) - the former tries to limit
the total result count, while the latter tries to focus on relevancy of the
top results even if the total result count is higher.

Recall is good for discovery and browsing, where you sort of know what you
generally want, but not exactly with any great precision.

Recall will include results that almost meet the query terms, but maybe
some are missing.

Precision will guarantee and insist that all query terms are present.

One great example for recall is a plagiarism query - enter all the terms
for a passage and then find documents that most closely approximate the
passage without being necessarily exact matches. IOW, the plagiarizer
changes a word here and there.
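
To make the trade-off concrete, here is a minimal sketch using the q.op
parameter (host, collection name and terms are placeholders, not from the
original message):

# OR: documents matching either term are returned (favors recall)
curl "http://localhost:8983/solr/mycoll/select?q=solr+tutorial&q.op=OR"

# AND: only documents containing both terms match (favors precision)
curl "http://localhost:8983/solr/mycoll/select?q=solr+tutorial&q.op=AND"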

-- Jack Krupansky

On Wed, Jul 15, 2015 at 8:16 AM, Steven White  wrote:

> Hi Everyone,
>
> Out-of-the box, Solr (Lucene?) is set to use OR as the default Boolean
> operator.  Can someone tell me the advantages / disadvantages of using OR
> or AND as the default?
>
> I'm leaning toward AND as the default because the more words a user types,
> the narrower the result set should be.
>
> Thanks
>
> Steve
>


Re: Does update field feature work in a schema with dynamic fields?

2015-07-15 Thread Shawn Heisey
On 7/15/2015 3:01 AM, Martínez López, Alfonso wrote:
> 
> 
> 
> 
>  
>  multiValued="false" />
>  multiValued="false" />
> 
> 



> And later I update the field 'name' with this command:
>
> curl http://<host>:<port>/solr/default/update?commit=true -H
> "Content-Type: text/xml" --data-binary '<add><doc><field name="id">1</field><field name="name" update="set">paquico</field></doc></add>'
>
> As I do so the doc i retrive from Solr is:
>
> 
>  
>   1
>   paquico
>   friend of mine
>   
>friend of mine
>friend of mine
>   
>   1506750859550130176
>   1.0
>  
> 

The problem here is that the copyField destination is stored, so you get
the original value of the destination field plus another copy from src_desc.

If you look carefully at the "caveats and limitations" for Atomic
Updates, you will see that all copyField destinations must be unstored
for proper operation:

https://wiki.apache.org/solr/Atomic_Updates#Caveats_and_Limitations
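
In practice, for the schema in this thread, the destination side would need to
look something like this (the field type here is a placeholder):

<dynamicField name="desc_*" type="string" indexed="true" stored="false" multiValued="false" />
<copyField source="src_desc" dest="dinamic_desc" />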

It appears that this information was *NOT* in the Solr Reference Guide,
so I updated the reference guide to include it.

https://cwiki.apache.org/confluence/display/solr/Updating+Parts+of+Documents

Here's a question for those with more expertise than me:  If a copyField
destination were stored, but not multiValued, and an atomic update was
attempted, would the update fail entirely?  I suspect it would, and I'd
like to make the ref guide info as accurate as I can.

Thanks,
Shawn



Re: Which default Boolean operator to set, AND or OR?

2015-07-15 Thread Walter Underwood
The AND default has one big problem. If the user misspells a single word, they 
get no results. About 10% of queries are misspelled, so that means a lot more 
failures.

wunder
Walter Underwood
wun...@wunderwood.org
http://observer.wunderwood.org/  (my blog)


On Jul 15, 2015, at 7:21 AM, Jack Krupansky  wrote:

> It is simply precision (AND) vs. recall (OR) - the former tries to limit
> the total result count, while the latter tries to focus on relevancy of the
> top results even if the total result count is higher.
> 
> Recall is good for discovery and browsing, where you sort of know what you
> generally want, but not exactly with any great precision.
> 
> Recall will include results that almost meet the query terms, but maybe
> some are missing.
> 
> Precision will guarantee and insist that all query terms are present.
> 
> One great example for recall is a plagiarism query - enter all the terms
> for a passage and then find documents that most closely approximate the
> passage without being necessarily exact matches. IOW, the plagiarizer
> changes a word here and there.
> 
> -- Jack Krupansky
> 
> On Wed, Jul 15, 2015 at 8:16 AM, Steven White  wrote:
> 
>> Hi Everyone,
>> 
>> Out-of-the box, Solr (Lucene?) is set to use OR as the default Boolean
>> operator.  Can someone tell me the advantages / disadvantages of using OR
>> or AND as the default?
>> 
>> I'm leaning toward AND as the default because the more words a user types,
>> the narrower the result set should be.
>> 
>> Thanks
>> 
>> Steve
>> 



Re: Does update field feature work in a schema with dynamic fields?

2015-07-15 Thread Alessandro Benedetti
Hey Shawn, I was debugging a little bit, and this is the problem:

When adding a field from the Solr document to the Lucene one, this check is
carried out, even if the field was previously added to the Lucene document by
the execution of the copyField instruction:

org/apache/solr/update/DocumentBuilder.java:89

// Make sure it has the correct number
if( sfield!=null && !sfield.multiValued() && field.getValueCount() > 1 ) {
  throw new SolrException( SolrException.ErrorCode.BAD_REQUEST,
  "ERROR: "+getID(doc, schema)+"multiple values encountered for
non multiValued field " +
sfield.getName() + ": " +field.getValue() );
}


This will actually check the Solr document's field.getValueCount(),
ignoring the fact that the field was already added to the Lucene
document to be indexed.

So the field will be added again.

I understand that it doesn't make a lot of sense to store a copy field
(nothing will change from the source field's stored content).

But I guess we should try to avoid these anomalies.

The simple solution would be to not execute any copyField
instruction for an updated document.

Unfortunately we don't have any signal in the DocumentBuilder at that
point that would let us know whether the document is a new one or an
updated one.

What do you think?

Would it be easy to make the DocumentBuilder aware of whether it is an
add or an update, and react accordingly?

If not, we need to think of something else.


Cheers





2015-07-15 15:25 GMT+01:00 Shawn Heisey :

> On 7/15/2015 3:01 AM, Martínez López, Alfonso wrote:
> > 
> >  required="true" />
> > 
> > 
> >  > 
> >  multiValued="false" />
> >  stored="true" multiValued="false" />
> > 
> > 
>
> 
>
> > And later I update the field 'name' with this command:
> >
> > curl http://<host>:<port>/solr/default/update?commit=true -H
> "Content-Type: text/xml" --data-binary '<add><doc><field name="id">1</field><field name="name" update="set">paquico</field></doc></add>'
> >
> > As I do so the doc i retrive from Solr is:
> >
> > 
> >  
> >   1
> >   paquico
> >   friend of mine
> >   
> >friend of mine
> >friend of mine
> >   
> >   1506750859550130176
> >   1.0
> >  
> > 
>
> The problem here is that the copyField destination is stored, so you get
> the original value of the destination field plus another copy from
> src_desc.
>
> If you look carefully at the "caveats and limitations" for Atomic
> Updates, you will see that all copyField destinations must be unstored
> for proper operation:
>
> https://wiki.apache.org/solr/Atomic_Updates#Caveats_and_Limitations
>
> It appears that this information was *NOT* in the Solr Reference Guide,
> so I updated the reference guide to include it.
>
>
> https://cwiki.apache.org/confluence/display/solr/Updating+Parts+of+Documents
>
> Here's a question for those with more expertise than me:  If a copyField
> destination were stored, but not multiValued, and an atomic update was
> attempted, would the update fail entirely?  I suspect it would, and I'd
> like to make the ref guide info as accurate as I can.
>
> Thanks,
> Shawn
>
>


-- 
--

Benedetti Alessandro
Visiting card - http://about.me/alessandro_benedetti
Blog - http://alexbenedetti.blogspot.co.uk

"Tyger, tyger burning bright
In the forests of the night,
What immortal hand or eye
Could frame thy fearful symmetry?"

William Blake - Songs of Experience -1794 England


Re: SOLR nrt read writes

2015-07-15 Thread Bhawna Asnani
We are building an admin for our inventory. Using Solr's faceting,
searching and stats functionality, it provides different ways an admin can
look at the inventory.
The admin can also do some updates on the items, and they need to see the
updates almost in real time.

Our public-facing website is already built using Solr, so we already have
the API in place to work with Solr.
We were hoping we could put up a Solr instance just for admin (low traffic and
low latency) and build the functionality there.

Thanks for your suggestions.

On Wed, Jul 15, 2015 at 9:37 AM, Daniel Collins 
wrote:

> Just to re-iterate Charles' response with an example, we have a system
> which needs to be as Near RT as we can make it.  So we have application
> level commitWithin set to 250ms.  Yes, we have to turn off a lot of caching,
> auto-warming, etc, but it was necessary to make the index as real time as
> we needed it to be.  Now we have the benefit of being able to throw a lot
> of hardware, RAM and SSDs at this in order to get any kind of sane search
> latency.
>
> We have the luxury of being able to afford that, but it comes with other
> problems because we have an index that is changing so fast (replicating to
> other nodes in the cloud becomes tricky, peer sync fails most of the time,
> etc.)
>
> What is your use case that requires this level of real-time access?
>
> On 15 July 2015 at 13:59, Reitzel, Charles 
> wrote:
>
> > And, to answer your other question, yes, you can turn off auto-warming.
> > If your instance is dedicated to this client task, it may serve no
> purpose
> > or be actually counter-productive.
> >
> > In the past, I worked on a Solr-based application that committed
> > frequently under application control (vs. auto commit) and we turned off
> > all auto-warming and most of the caching.
> >
> > There is scant documentation in the new Solr reference (cwiki.apache.org
> ),
> > but the old docs cover this well and appear current enough:
> > https://wiki.apache.org/solr/SolrCaching
> >
> > Just a thought: would <useColdSearcher>true</useColdSearcher> be helpful
> > here?
> >
> > Also, since you have just inserted the documents, it sounds like you
> > probably could search by ID ...
> >
> > -Original Message-
> > From: Shawn Heisey [mailto:apa...@elyograg.org]
> > Sent: Tuesday, July 14, 2015 6:04 PM
> > To: solr-user@lucene.apache.org
> > Subject: Re: SOLR nrt read writes
> >
> > On 7/14/2015 12:19 PM, Bhawna Asnani wrote:
> > > I have a use case where we have to write data into solr and
> > > immediately read it back.
> > > The read is not get by Id but a search call.
> > >
> > > I am doing a softCommit after every such write which needs to be
> > > visible immediately.
> > > However sometimes the changes are not visible immediately.
> > >
> > > We have a solr cloud but I have also tried sending reads, writes and
> > > commits to cloud leader only and still there is some latency.
> > >
> > > Has anybody tried to use solr this way?
> >
> > Don't ignore what Erick has said just because you're getting this reply
> > from someone else.  That advice is correct.  My intent here is to provide
> > more detail.
> >
> > Since you are not doing a retrieval by id (uniqueKey field), you can't
> use
> > the Realtime Get handler.  That handler would get the latest version of a
> > doc, whether it has been committed or not.  The transaction logs
> > (configured with updateLog in solrconfig.xml) are used to retrieve
> > uncommitted information.  Can you change your retrieval so it's by id
> > rather than a search query?  If you can, that might solve this for you.
> >
> > Normally, if you do a commit operation with openSearcher=true and
> > waitSearcher=true, control of the program will not be returned until that
> > commit is completely done ... but as Erick said, if you are doing a LOT
> of
> > commits very quickly, you're probably going to exceed
> maxWarmingSearchers,
> > and in that scenario, you cannot rely on using the commit operation as a
> > blocker for your retrieval attempt.
> >
> > In order to have any hope of getting what you want with your current
> > methods, your commit frequency must be low enough that each commit has
> time
> > to finish before the next one begins.  I personally would not do commits
> > more often than once a minute.  Commits on my larger index shards are
> known
> > to take up to ten seconds when the index is quiet, and even more if the
> > index is busy.  There are ways to make commits happen faster, but it
> often
> > involves disabling features that you might want to leave enabled.
> >
> > Thanks,
> > Shawn
> >
> >
>


Re: Querying Nested documents

2015-07-15 Thread Mikhail Khludnev
1. I can't follow your explanation.

2. childFilter=(image_uri_s:somevalue) OR (-image_uri_s:*)
is not correct: it lacks quotes, and it's pointless (selecting some term and
then negating all terms gives nothing). Thus, the only workable syntax is
childFilter="other_field:somevalue -image_uri_s:*"

3.  I can only guess that you are asking about something like:
http://localhost:8983/solr/demo/select?q={!parent which='type:parent'}image_uri_s:somevalue&fl=*,[child parentFilter=type:parent childFilter=-type:parent]&indent=true


On Tue, Jul 14, 2015 at 11:56 PM, Ramesh Nuthalapati <
ramesh.nuthalap...@gmail.com> wrote:

> Yes you are right.
>
> So the query you are saying should be like below .. or did I misunderstood
> it
>
> http://localhost:8983/solr/demo/select?q= {!parent
> which='type:parent'}&fl=*,[child parentFilter=type:parent
> childFilter=(image_uri_s:somevalue) OR (-image_uri_s:*)]&indent=true
>
> If so, I am getting an error with parsing field name.
>
>
>
> --
> View this message in context:
> http://lucene.472066.n3.nabble.com/Querying-Nested-documents-tp4217169p4217348.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>



-- 
Sincerely yours
Mikhail Khludnev
Principal Engineer,
Grid Dynamics





RE: Does update field feature work in a schema with dynamic fields?

2015-07-15 Thread Martínez López , Alfonso
Hi,
in some cases it can be necessary to have the copy field stored. My Solr 
instance is used by some legacy applications that need to retrieve fields by 
specific field names. That's why I need to maintain 2 copies of the same 
field: one with the old name and another with the new name (which is provided by 
other applications).

In my case I could work this out at a higher level, but it would be very helpful 
if it could be solved in the schema.xml.

From: Alessandro Benedetti [benedetti.ale...@gmail.com]
Sent: Wednesday, July 15, 2015 4:39 PM
To: solr-user@lucene.apache.org
Subject: Re: Does update field feature work in a schema with dynamic fields?

Hey Shawn, I was debugging a little bit, and this is the problem:

When adding a field from the Solr document to the Lucene one, this check is
carried out, even if the field was previously added to the Lucene document by
the execution of the copyField instruction:

org/apache/solr/update/DocumentBuilder.java:89

// Make sure it has the correct number
if( sfield!=null && !sfield.multiValued() && field.getValueCount() > 1 ) {
  throw new SolrException( SolrException.ErrorCode.BAD_REQUEST,
  "ERROR: "+getID(doc, schema)+"multiple values encountered for
non multiValued field " +
sfield.getName() + ": " +field.getValue() );
}


This will actually check the Solr document's field.getValueCount(),
ignoring the fact that the field was already added to the Lucene
document to be indexed.

So the field will be added again.

I understand that it doesn't make a lot of sense to store a copy field
(nothing will change from the source field's stored content).

But I guess we should try to avoid these anomalies.

The simple solution would be to not execute any copyField
instruction for an updated document.

Unfortunately we don't have any signal in the DocumentBuilder at that
point that would let us know whether the document is a new one or an
updated one.

What do you think?

Would it be easy to make the DocumentBuilder aware of whether it is an
add or an update, and react accordingly?

If not, we need to think of something else.


Cheers





2015-07-15 15:25 GMT+01:00 Shawn Heisey :

> On 7/15/2015 3:01 AM, Martínez López, Alfonso wrote:
> > 
> >  required="true" />
> > 
> > 
> >  > 
> >  multiValued="false" />
> >  stored="true" multiValued="false" />
> > 
> > 
>
> 
>
> > And later I update the field 'name' with this command:
> >
> > curl http://<host>:<port>/solr/default/update?commit=true -H
> "Content-Type: text/xml" --data-binary '<add><doc><field name="id">1</field><field name="name" update="set">paquico</field></doc></add>'
> >
> > As I do so the doc i retrive from Solr is:
> >
> > 
> >  
> >   1
> >   paquico
> >   friend of mine
> >   
> >friend of mine
> >friend of mine
> >   
> >   1506750859550130176
> >   1.0
> >  
> > 
>
> The problem here is that the copyField destination is stored, so you get
> the original value of the destination field plus another copy from
> src_desc.
>
> If you look carefully at the "caveats and limitations" for Atomic
> Updates, you will see that all copyField destinations must be unstored
> for proper operation:
>
> https://wiki.apache.org/solr/Atomic_Updates#Caveats_and_Limitations
>
> It appears that this information was *NOT* in the Solr Reference Guide,
> so I updated the reference guide to include it.
>
>
> https://cwiki.apache.org/confluence/display/solr/Updating+Parts+of+Documents
>
> Here's a question for those with more expertise than me:  If a copyField
> destination were stored, but not multiValued, and an atomic update was
> attempted, would the update fail entirely?  I suspect it would, and I'd
> like to make the ref guide info as accurate as I can.
>
> Thanks,
> Shawn
>
>


--
--

Benedetti Alessandro
Visiting card - http://about.me/alessandro_benedetti
Blog - http://alexbenedetti.blogspot.co.uk

"Tyger, tyger burning bright
In the forests of the night,
What immortal hand or eye
Could frame thy fearful symmetry?"

William Blake - Songs of Experience -1794 England


RE: Which default Boolean operator to set, AND or OR?

2015-07-15 Thread Reitzel, Charles
A common approach to this problem is to include the spellcheck component and, 
if there are corrections, include a "Did you mean ..." link in the results page.
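
As a sketch, assuming a request handler wired to the spellcheck component
(the /spell handler name and collection are placeholders):

curl "http://localhost:8983/solr/mycoll/spell?q=misspeled+querry&spellcheck=true&spellcheck.collate=true"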

-Original Message-
From: Walter Underwood [mailto:wun...@wunderwood.org] 
Sent: Wednesday, July 15, 2015 10:36 AM
To: solr-user@lucene.apache.org
Subject: Re: Which default Boolean operator to set, AND or OR?

The AND default has one big problem. If the user misspells a single word, they 
get no results. About 10% of queries are misspelled, so that means a lot more 
failures.

wunder
Walter Underwood
wun...@wunderwood.org
http://observer.wunderwood.org/  (my blog)


On Jul 15, 2015, at 7:21 AM, Jack Krupansky  wrote:

> It is simply precision (AND) vs. recall (OR) - the former tries to 
> limit the total result count, while the latter tries to focus on 
> relevancy of the top results even if the total result count is higher.
> 
> Recall is good for discovery and browsing, where you sort of know what 
> you generally want, but not exactly with any great precision.
> 
> Recall will include results that almost meet the query terms, but 
> maybe some are missing.
> 
> Precision will guarantee and insist that all query terms are present.
> 
> One great example for recall is a plagiarism query - enter all the 
> terms for a passage and then find documents that most closely 
> approximate the passage without being necessarily exact matches. IOW, 
> the plagiarizer changes a word here and there.
> 
> -- Jack Krupansky
> 
> On Wed, Jul 15, 2015 at 8:16 AM, Steven White  wrote:
> 
>> Hi Everyone,
>> 
>> Out-of-the box, Solr (Lucene?) is set to use OR as the default 
>> Boolean operator.  Can someone tell me the advantages / disadvantages 
>> of using OR or AND as the default?
>> 
>> I'm leaning toward AND as the default because the more words a user 
>> types, the narrower the result set should be.
>> 
>> Thanks
>> 
>> Steve
>> 





Re: SOLR nrt read writes

2015-07-15 Thread Erick Erickson
bq: The admin can also do some updates on the items and they need to see the
updates almost real time.

Why not give the admin control over commits and default the other commits to
something reasonable? So make your defaults, say, 15 seconds (or 30 seconds
or longer). If the admin really needs the search to be absolutely up to
date, they can hit the "commit" button. With perhaps a little tool tip that
"the index is up to date as of  seconds ago,
press this button
to see absolutely all changes in real time".
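
As a sketch, the two pieces would be a 15-second soft commit default in
solrconfig.xml plus the explicit commit that the button issues (the collection
name is a placeholder):

<autoSoftCommit>
  <maxTime>15000</maxTime>
</autoSoftCommit>

curl "http://localhost:8983/solr/mycoll/update?commit=true"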

That will quickly train the admins to use that button as necessary
when they really
_do_ need absolutely up-to-date data. My prediction: they'll issue these quite
rarely. 9 times out of 10, this kind of requirement is based on faulty
assumptions
and/or not understanding the work flow. That said, it may be totally a
requirement.
But at least ask the question.

Best,
Erick

On Wed, Jul 15, 2015 at 7:57 AM, Bhawna Asnani  wrote:
> We are building an admin for our inventory. Using solr's faceting,
> searching and stats functionality it provides different ways an admin can
> look at the inventory.
> The admin can also do some updates on the items and they need to see the
> updates almost real time.
>
> Our public facing website is already built using solr so we already have
> the api in place to work with solr.
> We were hoping we can put a solr instance just for admin (low traffic and
> low latency) and build the functionality.
>
> Thanks for your suggesstions.
>
> On Wed, Jul 15, 2015 at 9:37 AM, Daniel Collins 
> wrote:
>
>> Just to re-iterate Charles' response with an example, we have a system
>> which needs to be as Near RT as we can make it.  So we have application
>> level commitWithin set to 250ms.  Yes, we have to turn off a lot of caching,
>> auto-warming, etc, but it was necessary to make the index as real time as
>> we needed it to be.  Now we have the benefit of being able to throw a lot
>> of hardware, RAM and SSDs at this in order to get any kind of sane search
>> latency.
>>
>> We have the luxury of being able to afford that, but it comes with other
>> problems because we have an index that is changing so fast (replicating to
>> other nodes in the cloud becomes tricky, peer sync fails most of the time,
>> etc.)
>>
>> What is your use case that requires this level of real-time access?
>>
>> On 15 July 2015 at 13:59, Reitzel, Charles 
>> wrote:
>>
>> > And, to answer your other question, yes, you can turn off auto-warming.
>> > If your instance is dedicated to this client task, it may serve no
>> purpose
>> > or be actually counter-productive.
>> >
>> > In the past, I worked on a Solr-based application that committed
>> > frequently under application control (vs. auto commit) and we turned off
>> > all auto-warming and most of the caching.
>> >
>> > There is scant documentation in the new Solr reference (cwiki.apache.org
>> ),
>> > but the old docs cover this well and appear current enough:
>> > https://wiki.apache.org/solr/SolrCaching
>> >
>> > Just a thought: would <useColdSearcher>true</useColdSearcher> be helpful
>> > here?
>> >
>> > Also, since you have just inserted the documents, it sounds like you
>> > probably could search by ID ...
>> >
>> > -Original Message-
>> > From: Shawn Heisey [mailto:apa...@elyograg.org]
>> > Sent: Tuesday, July 14, 2015 6:04 PM
>> > To: solr-user@lucene.apache.org
>> > Subject: Re: SOLR nrt read writes
>> >
>> > On 7/14/2015 12:19 PM, Bhawna Asnani wrote:
>> > > I have a use case where we have to write data into solr and
>> > > immediately read it back.
>> > > The read is not get by Id but a search call.
>> > >
>> > > I am doing a softCommit after every such write which needs to be
>> > > visible immediately.
>> > > However sometimes the changes are not visible immediately.
>> > >
>> > > We have a solr cloud but I have also tried sending reads, writes and
>> > > commits to cloud leader only and still there is some latency.
>> > >
>> > > Has anybody tried to use solr this way?
>> >
>> > Don't ignore what Erick has said just because you're getting this reply
>> > from someone else.  That advice is correct.  My intent here is to provide
>> > more detail.
>> >
>> > Since you are not doing a retrieval by id (uniqueKey field), you can't
>> use
>> > the Realtime Get handler.  That handler would get the latest version of a
>> > doc, whether it has been committed or not.  The transaction logs
>> > (configured with updateLog in solrconfig.xml) are used to retrieve
>> > uncommitted information.  Can you change your retrieval so it's by id
>> > rather than a search query?  If you can, that might solve this for you.
>> >
>> > Normally, if you do a commit operation with openSearcher=true and
>> > waitSearcher=true, control of the program will not be returned until that
>> > commit is completely done ... but as Erick said, if you are doing a LOT
>> of
>> > commits very quickly, you're probably going to exceed
>> maxWarmingSearchers,
>> > and in that scenario, you cannot rely on using the commit operation as a
>> > blocker for your retrieval attempt.

Re: Does update field feature work in a schema with dynamic fields?

2015-07-15 Thread Shawn Heisey
On 7/15/2015 8:55 AM, Martínez López, Alfonso wrote:
> in some cases it can be necessary to have the copy field stored. My Solr 
> instance is used by some legacy applications that need to retrive fields by 
> some especific field names. That's why i need to mantain 2 copies of the same 
> field: one with the old name and other for the new name (that is provided by 
> others applications).
>
> In my case I could work this out at a higher lever but it would be very 
> helpful it can be solve in the schema.xml.

This is understandable ... but unless a new feature is created, stored
copyField destinations are incompatible with Atomic Updates.

If we do build a new feature, we'd need to figure out exactly how Solr
should behave by default, and possibly whether there should be a
configuration option to choose old or new behavior.

It might be a good idea to have DistributedUpdateProcessor (or possibly
the AtomicUpdateDocumentMerger that it uses) skip the import of any
field that's a copyField destination by default, unless a configuration
is present that restores the old behavior.  The information about which
fields are copyField destinations *is* available to those classes, so it
would probably be a very easy change.

There's not really a STRONG need for that new feature ... if you store
both the source and destination fields, then you are storing exactly the
same information twice, which is wasteful and increases resource
requirements.  Most applications that use Solr have at least surface
knowledge of the schema, which should make it possible for the
application writer to just pull the information from the source field. 
It seems that your application may not have that knowledge, though.

Thanks,
Shawn



RE: copying data from one collection to another collection (solr cloud 521)

2015-07-15 Thread Reitzel, Charles
Since they want to explicitly search within a given "version" of the data, this 
seems like a textbook application for collection aliases.

You could have N public collection names: current_stuff, previous_stuff_1, 
previous_stuff_2, ...   At any given time, these will be aliased to reference 
the "actual" collection names:
current_stuff -> stuff_20150712, 
previous_stuff_1 -> stuff_20150705, 
previous_stuff_2 -> stuff_20150628,
...

Every weekend, you create a new collection and index everything current into 
it.  Once done, reset all the aliases to point to the newest N collections and 
drop the oldest:
current_stuff -> stuff_20150719
previous_stuff_1 -> stuff_20150712,
previous_stuff_2 -> stuff_20150705,
...

Collections API: Create or modify an Alias for a Collection
https://cwiki.apache.org/confluence/display/solr/Collections+API#CollectionsAPI-api4
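
For example, the weekly rotation could be driven by calls like these (host and
collection names are placeholders; re-issuing CREATEALIAS with an existing name
simply repoints the alias):

curl "http://localhost:8983/solr/admin/collections?action=CREATEALIAS&name=current_stuff&collections=stuff_20150719"
curl "http://localhost:8983/solr/admin/collections?action=CREATEALIAS&name=previous_stuff_1&collections=stuff_20150712"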

Thus, you can keep the IDs the same and use them to compare to previous 
versions of any given document.   Useful, if only for debugging purposes.

Curious if there are opportunities for optimization here.  For example, would 
it be faster to make a file system copy of the most recent collection and load 
only changed documents (assuming the delta is available from the source system)?

-Original Message-
From: Erick Erickson [mailto:erickerick...@gmail.com] 
Sent: Monday, July 13, 2015 11:55 PM
To: solr-user@lucene.apache.org
Subject: Re: copying data from one collection to another collection (solr cloud 
521)

bq: does offline

No. I'm talking about "collection aliasing". You can create an entirely new 
collection, index to it however  you want then switch to using that new 
collection.

bq: Any updates to EXISTING document in the LIVE collection should NOT be 
replicated to the previous week(s) snapshot(s)

then give it a new ID maybe?

Best,
Erick

On Mon, Jul 13, 2015 at 3:21 PM, Raja Pothuganti  
wrote:
> Thank you Erick
>>Actually, my question is why do it this way at all? Why not index 
>>directly to your "live" nodes? This is what SolrCloud is built for.
>>You an use "implicit" routing to create shards say, for each week and 
>>age out the ones that are "too old" as well.
>
>
> Any updates to EXISTING document in the LIVE collection should NOT be 
> replicated to the previous week(s) snapshot(s). Think of the 
> snapshot(s) as an archive of sort and searchable independent of LIVE. 
> We're aiming to support at most 2 archives of data in the past.
>
>
>>Another option would be to use "collection aliasing" to keep an 
>>offline index up to date then switch over when necessary.
>
> Does offline indexing refers to this link 
> https://github.com/cloudera/search/tree/0d47ff79d6ccc0129ffadcb50f9fe0
> b271f
> 102aa/search-mr
>
>
> Thanks
> Raja
>
>
>
> On 7/13/15, 3:14 PM, "Erick Erickson"  wrote:
>
>>Actually, my question is why do it this way at all? Why not index 
>>directly to your "live" nodes? This is what SolrCloud is built for.
>>
>>There's the new backup/restore functionality that's still a work in 
>>progress, see: https://issues.apache.org/jira/browse/SOLR-5750
>>
>>You an use "implicit" routing to create shards say, for each week and 
>>age out the ones that are "too old" as well.
>>
>>Another option would be to use "collection aliasing" to keep an 
>>offline index up to date then switch over when necessary.
>>
>>I'd really like to know this isn't an XY problem though, what's the 
>>high-level problem you're trying to solve?
>>
>>Best,
>>Erick
>>
>>On Mon, Jul 13, 2015 at 12:49 PM, Raja Pothuganti 
>> wrote:
>>>
>>> Hi,
>>> We are setting up a new SolrCloud environment with 5.2.1 on Ubuntu 
>>>boxes. We currently ingest data into a large collection, call it LIVE.
>>>After the full ingest is done we then trigger a delta delta ingestion 
>>>every 15 minutes to get the documents & data that have changed into 
>>>this LIVE instance.
>>>
>>> In Solr 4.X using a Master / Slave setup we had slaves that would 
>>>periodically (weekly, or monthly) refresh their data from the Master 
>>>rather than every 15 minutes. We're now trying to figure out how to 
>>>get this same type of setup using SolrCloud.
>>>
>>> Question(s):
>>> - Is there a way to copy data from one SolrCloud collection into 
>>>another quickly and easily?
>>> - Is there a way to programmatically control when a replica receives 
>>>it's data or possibly move it to another collection (without losing
>>>data) that updates on a  different interval? It ideally would be 
>>>another collection name, call it Week1 ... Week52 ... to avoid a 
>>>replica in the same collection serving old data.
>>>
>>> One option we thought of was to create a backup and then restore 
>>>that into a new clean cloud. This has a lot of moving parts and isn't 
>>>nearly as neat as the Master / Slave controlled replication setup. It 
>>>also has the side effect of potentially taking a very long time to 
>>>backup and restore instead of just

Re: Does update field feature work in a schema with dynamic fields?

2015-07-15 Thread Erick Erickson
Alfonso:

Haven't worked with this myself, but could "field aliasing" handle your use-case
_without_ the need for a copyField at all?
See: https://issues.apache.org/jira/browse/SOLR-1205

Again I need to emphasize that I HAVE NOT worked with this so it may be
a really bad suggestion. Or it may not apply. Or.

Alessandro:

You wrote: "The simple solution should be to not execute any copy field
instruction in an updatedDocument."

I don't think that would work as then legitimate copyFields wouldn't
be reflected.
Say you had a copyField from A to B. Now you atomically update A. That update
should be reflected in B...

Best,
Erick

On Wed, Jul 15, 2015 at 8:19 AM, Shawn Heisey  wrote:
> On 7/15/2015 8:55 AM, Martínez López, Alfonso wrote:
>> in some cases it can be necessary to have the copy field stored. My Solr 
>> instance is used by some legacy applications that need to retrive fields by 
>> some especific field names. That's why i need to mantain 2 copies of the 
>> same field: one with the old name and other for the new name (that is 
>> provided by others applications).
>>
>> In my case I could work this out at a higher lever but it would be very 
>> helpful it can be solve in the schema.xml.
>
> This is understandable ... but unless a new feature is created, stored
> copyField destinations are incompatible with Atomic Updates.
>
> If we do build a new feature, we'd need to figure out exactly how Solr
> should behave by default, and possibly whether there should be a
> configuration option to choose old or new behavior.
>
> It might be a good idea to have DistributedUpdateProcessor (or possibly
> the AtomicUpdateDocumentMerger that it uses) skip the import of any
> field that's a copyField destination by default, unless a configuration
> is present that restores the old behavior.  The information about which
> fields are copyField destinations *is* available to those classes, so it
> would probably be a very easy change.
>
> There's not really a STRONG need for that new feature ... if you store
> both the source and destination fields, then you are storing exactly the
> same information twice, which is wasteful and increases resource
> requirements.  Most applications that use Solr have at least surface
> knowledge of the schema, which should make it possible for the
> application writer to just pull the information from the source field.
> It seems that your application may not have that knowledge, though.
>
> Thanks,
> Shawn
>


RE: MapReduceIndexerTool

2015-07-15 Thread Reitzel, Charles
The OP asked about MapReduceIndexerTool.   My understanding is that this is 
actually somewhat slower than the standard indexing path and is recommended 
only if the site is already invested in the Hadoop infrastructure.  E.g. input 
files are already distributed on the Hadoop/Search cluster via HDFS.

See also:
https://cwiki.apache.org/confluence/display/solr/Running+Solr+on+HDFS
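
For context, pointing Solr at HDFS is a solrconfig.xml change along these lines
(the HDFS path is a placeholder):

<directoryFactory name="DirectoryFactory" class="solr.HdfsDirectoryFactory">
  <str name="solr.hdfs.home">hdfs://namenode:8020/solr</str>
  <bool name="solr.hdfs.blockcache.enabled">true</bool>
</directoryFactory>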

Note, there is no coordination of replication between the HDFS and Solr 
systems.  Thus, if you configure Solr replication N > 1 for each shard, and the 
HDFS replication factor is M > 1, then you get N * M copies of all your index 
data (e.g., N = 2 Solr replicas with HDFS replication M = 3 means 6 copies of 
each index).  That can add up fast ...

There is work underway to harmonize/mitigate Solr and HDFS replication:
Ability to set the replication factor for index files created by 
HDFSDirectoryFactory
https://issues.apache.org/jira/browse/SOLR-6305

To get a feel for the overall condition of MR/Solr integration, I looked at 
JIRA issues related to HDFS and Hadoop.   It appears to be an area with some 
decent bug fixes.  There are some larger feature issues as well, but it isn't 
clear how much momentum these have.   Can anyone (developers, current users) 
comment on the state of Hadoop integration?

-

Currently open JIRA issues for Solr containing "HDFS" or "Hadoop":
https://issues.apache.org/jira/browse/SOLR-5069?jql=project%20%3D%20SOLR%20AND%20status%20%3D%20OPEN%20AND%20%28text%20~%20%22HDFS%22%20OR%20text%20~%20%22Hadoop%22%29%20ORDER%20BY%20priority%20DESC%2C%20updated%20DESC%2C%20created%20ASC

Recently closed issues containing "HDFS" or "Hadoop":
https://issues.apache.org/jira/browse/SOLR-7458?jql=project%20%3D%20SOLR%20AND%20status%20!%3D%20OPEN%20AND%20%28text%20~%20%22HDFS%22%20OR%20text%20~%20%22Hadoop%22%29%20ORDER%20BY%20updated%20DESC%2C%20priority%20DESC%2C%20created%20ASC


-Original Message-
From: Reitzel, Charles [mailto:charles.reit...@tiaa-cref.org] 
Sent: Wednesday, July 15, 2015 11:24 AM
To: solr-user@lucene.apache.org
Subject: RE: copying data from one collection to another collection (solr cloud 
521)

Since they want to explicitly search within a given "version" of the data, this 
seems like a textbook application for collection aliases.

You could have N public collection names: current_stuff, previous_stuff_1, 
previous_stuff_2, ...   At any given time, these will be aliased to reference 
the "actual" collection names:
current_stuff -> stuff_20150712, 
previous_stuff_1 -> stuff_20150705, 
previous_stuff_2 -> stuff_20150628,
...

Every weekend, you create a new collection and index everything current into 
it.  Once done, reset all the aliases to point to the newest N collections and 
drop the oldest:
current_stuff -> stuff_20150719
previous_stuff_1 -> stuff_20150712,
previous_stuff_2 -> stuff_20150705,
...

Collections API: Create or modify an Alias for a Collection
https://cwiki.apache.org/confluence/display/solr/Collections+API#CollectionsAPI-api4

Thus, you can keep the IDs the same and use them to compare to previous 
versions of any given document.   Useful, if only for debugging purposes.

Curious if there are opportunities for optimization here.  For example, would 
it be faster to make a file system copy of the most recent collection and load 
only changed documents (assuming the delta is available from the source system)?

-Original Message-
From: Erick Erickson [mailto:erickerick...@gmail.com] 
Sent: Monday, July 13, 2015 11:55 PM
To: solr-user@lucene.apache.org
Subject: Re: copying data from one collection to another collection (solr cloud 
521)

bq: does offline

No. I'm talking about "collection aliasing". You can create an entirely new 
collection, index to it however  you want then switch to using that new 
collection.

bq: Any updates to EXISTING document in the LIVE collection should NOT be 
replicated to the previous week(s) snapshot(s)

then give it a new ID maybe?

Best,
Erick

On Mon, Jul 13, 2015 at 3:21 PM, Raja Pothuganti  
wrote:
> Thank you Erick
>>Actually, my question is why do it this way at all? Why not index 
>>directly to your "live" nodes? This is what SolrCloud is built for.
>>You an use "implicit" routing to create shards say, for each week and 
>>age out the ones that are "too old" as well.
>
>
> Any updates to EXISTING document in the LIVE collection should NOT be 
> replicated to the previous week(s) snapshot(s). Think of the 
> snapshot(s) as an archive of sort and searchable independent of LIVE. 
> We're aiming to support at most 2 archives of data in the past.
>
>
>>Another option would be to use "collection aliasing" to keep an 
>>offline index up to date then switch over when necessary.
>
> Does offline indexing refers to this link 
> https://github.com/cloudera/search/tree/0d47ff79d6ccc0129ffadcb50f9fe0
> b271f
> 102aa/search-mr
>
>
> Thanks
> Raja
>
>
>
> On 7/13/15, 3:14 PM, "Erick Erickson"  wrote:
>
>>Actually, my question is why do 

Re: Sorting documents by child documents

2015-07-15 Thread Mikhail Khludnev
If you inlined the query rather than referencing the thread, it would be
easier to understand the problem.
Once again, what doesn't meet your expectation: the order of returned parents,
or the order of children attached to a parent doc?

On Wed, Jul 15, 2015 at 1:56 AM, DorZion  wrote:

> I can sort the parent documents with the ScoreMode function, you can take a
> look here:
>
>
> http://lucene.472066.n3.nabble.com/Sorting-documents-by-nested-child-docs-with-FunctionQueries-tp4209940.html
> <
> http://lucene.472066.n3.nabble.com/Sorting-documents-by-nested-child-docs-with-FunctionQueries-tp4209940.html
> >
>
> I just want to sort the documents with the field, instead of FunctionQuery.
> The sort mode takes the best score of the children and the parents are
> sorted by that score.
>
> Here is what will happen if I use the sort I want:
>
> *Original Query:*
> Document 1
>  Id: 5
>  Children 1
>  Id:51
>  Title : "B"
>  Children 2
>  Id:52
>  Title : "M"
>
> Document 2
>  Id: 6
>  Children 1
>  Id:61
>  Title : "Y"
>  Children 2
>  Id:62
>  Title : "Z"
>
> Document 3
>  Id: 6
>  Children 1
>  Id:61
>  Title : "C"
>  Children 2
>  Id:62
>  Title : "A"
>
> *Sorted Query (By title field):*
> Document 3
>  Id: 6
>  Children 1
>  Id:61
>  Title : "C"
>  Children 2
>  Id:62
>  Title : "A"
>
> Document 1
>  Id: 5
>  Children 1
>  Id:51
>  Title : "B"
>  Children 2
>  Id:52
>  Title : "M"
>
>
> Document 2
>  Id: 6
>  Children 1
>  Id:61
>  Title : "Y"
>  Children 2
>  Id:62
>  Title : "Z"
>
>
> As you can see, the documents are sorted by "title". The document with the
> child that have the lowest "value" of the title field, will be the first in
> the result.
>
> Thanks,
>
> Dor
>
>
>
> Alessandro Benedetti wrote
> > I would like to get a deep understanding of your problem…
> > How do you want to sort a parent document by a normal field of children
> ??
> >
> > Example:
> >
> > Document 1
> >  Id: 5
> >  Children 1
> >  Id:51
> >  Title : "A"
> >  Children 2
> >  Id:52
> >  Title : "Z"
> >
> > Document 2
> >  Id: 6
> >  Children 1
> >  Id:61
> >  Title : "C"
> >  Children 2
> >  Id:62
> >  Title : "B"
> >
> > How can you sort the parent based on children fields ?
> > You can sort a parent based on a value calculated out of children fields
> > (after you calculate a unique value out of them: Max? Sum? Concat? etc.).
> >
> > Can you explain better your problem ?
> >
> > Cheers
> >
> >
> > 2015-07-08 7:17 GMT+01:00 DorZion <Dorzion@...>:
> >
> >> Hey,
> >>
> >> I'm using Solr 4.10.2 and I have child documents in every parent
> >> document.
> >>
> >> Previously, I used FunctionQuery to sort the documents:
> >>
> >>
> >>
> http://lucene.472066.n3.nabble.com/Sorting-documents-by-nested-child-docs-with-FunctionQueries-tp4209940.html
> >> <
> >>
> http://lucene.472066.n3.nabble.com/Sorting-documents-by-nested-child-docs-with-FunctionQueries-tp4209940.html
> >> >
> >>
> >> Now, I want to sort the documents by their child documents with normal
> >> fields.
> >>
> >> It doesn't work when I use the "sort" parameter.
> >>
> >> Thanks in advance,
> >>
> >> Dor
> >>
> >>
> >>
> >> --
> >> View this message in context:
> >>
> http://lucene.472066.n3.nabble.com/Sorting-documents-by-child-documents-tp4216263.html
> >> Sent from the Solr - User mailing list archive at Nabble.com.
> >>
> >
> >
> >
> > --
> > --
> >
> > Benedetti Alessandro
> > Visiting card : http://about.me/alessandro_benedetti
> >
> > "Tyger, tyger burning bright
> > In the forests of the night,
> > What immortal hand or eye
> > Could frame thy fearful symmetry?"
> >
> > William Blake - Songs of Experience -1794 England
>
>
>
>
>
> --
> View this message in context:
> http://lucene.472066.n3.nabble.com/Sorting-documents-by-child-documents-tp4216263p4217400.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>



-- 
Sincerely yours
Mikhail Khludnev
Principal Engineer,
Grid Dynamics





Re: Solr 5 options

2015-07-15 Thread Erick Erickson
If you're running in cloud mode, move to using collections with
the configs kept in Zookeeper.

Assuming you're not, you can use the create_core stuff, I'm
not sure what's unclear about it, did you try
bin/solr create_core -help? If that's not clear please make some
suggestions for making it more so.

But you don't even have to do that. Just put the core config
information you want somewhere under solr_home. I.e.
you'll have something like
solr_home/core1
solr_home/core2

In each of those core dirs you'll have a conf dir and a file "core.properties".

Start Solr and the cores should just be there. The "core.properties"
file can be empty; it's the marker that "core discovery" uses to treat
the directory as a Solr core, see:
https://cwiki.apache.org/confluence/display/solr/Solr+Cores+and+solr.xml
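
For example (a minimal sketch; core names are made up):

solr_home/core1/core.properties      <- can be empty
solr_home/core1/conf/solrconfig.xml
solr_home/core1/conf/schema.xml
solr_home/core2/core.properties
solr_home/core2/conf/...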

Best,
Erick

On Wed, Jul 15, 2015 at 5:44 AM, spleenboy  wrote:
> OK, so effectively use the core product as it was in Solr 4, running a
> schema.xml file to control doc structures and validation. In Solr 5, does
> anyone have a clear link or some pointers as to the options for bin/solr
> create_core to boot up the instance I need?
> Thanks for all the help.
>
>
>
> --
> View this message in context: 
> http://lucene.472066.n3.nabble.com/Solr-5-options-tp4217236p4217459.html
> Sent from the Solr - User mailing list archive at Nabble.com.


IndexSearcher.search(query, collect)

2015-07-15 Thread Chetan Vora
Hi all

I asked a related question before but couldn't get any response (see
SolrQueryRequest in SolrCloud vs Standalone Solr), asking it differently
here.

Is there a way to invoke

IndexSearcher.search(Query, Collector) over a SolrCloud collection so that
it invokes the search/collect implicitly on individual shards of the
collection? If not, how does one do this explicitly?

I have a use case that was implemented using a custom request handler in
standalone Solr and we're trying to move to SolrCloud. It is necessary for
us to understand how to do the above so we can use SolrCloud functionality.

Thanks and would *really really* appreciate ANY help.

Regards
CV


Re: Does update field feature work in a schema with dynamic fields?

2015-07-15 Thread Alessandro Benedetti
Sorry Erick, I completely agree with you, I didn't specify in detail what
I was thinking:

"copy fields must not be executed if the updated field is not a source
field (i.e. not the source in a copyField pair)"

Furthermore, I agree with you again: a copyField should be there to give a
different analysis to the new field; simply changing the name would be a
matter of field analysis, with nothing required at indexing time!
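
To make the scenario concrete, an atomic update looks like this (using
Erick's A/B copyField example below; collection and field names are just
placeholders):

curl 'http://localhost:8983/solr/collection1/update?commit=true' \
  -H 'Content-Type: application/json' \
  -d '[{"id":"1", "A":{"set":"new value"}}]'

Under the hood, Solr rebuilds the whole document from its stored fields
and re-runs the copyField directives, which is why a stored destination
field like B can end up with duplicate values after an atomic update.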

Cheers


2015-07-15 16:26 GMT+01:00 Erick Erickson :

> Alfonso:
>
> Haven't worked with this myself, but could "field aliasing" handle your
> use-case
> _without_ the need for a copyField at all?
> See: https://issues.apache.org/jira/browse/SOLR-1205
>
> Again I need to emphasize that I HAVE NOT worked with this so it may be
> a really bad suggestion. Or it may not apply. Or.
>
> Alessandro:
>
> You wrote: "The simple solution should be to not execute any copy field
> instruction in an updatedDocument."
>
> I don't think that would work as then legitimate copyFields wouldn't
> be reflected.
> Say you had a copyField from A to B. Now you atomically update A. That
> update
> should be reflected in B...
>
> Best,
> Erick
>
> On Wed, Jul 15, 2015 at 8:19 AM, Shawn Heisey  wrote:
> > On 7/15/2015 8:55 AM, Martínez López, Alfonso wrote:
> >> in some cases it can be necessary to have the copy field stored. My
> Solr instance is used by some legacy applications that need to retrive
> fields by some especific field names. That's why i need to mantain 2 copies
> of the same field: one with the old name and other for the new name (that
> is provided by others applications).
> >>
> >> In my case I could work this out at a higher lever but it would be very
> helpful it can be solve in the schema.xml.
> >
> > This is understandable ... but unless a new feature is created, stored
> > copyField destinations are incompatible with Atomic Updates.
> >
> > If we do build a new feature, we'd need to figure out exactly how Solr
> > should behave by default, and possibly whether there should be a
> > configuration option to choose old or new behavior.
> >
> > It might be a good idea to have DistributedUpdateProcessor (or possibly
> > the AtomicUpdateDocumentMerger that it uses) skip the import of any
> > field that's a copyField destination by default, unless a configuration
> > is present that restores the old behavior.  The information about which
> > fields are copyField destinations *is* available to those classes, so it
> > would probably be a very easy change.
> >
> > There's not really a STRONG need for that new feature ... if you store
> > both the source and destination fields, then you are storing exactly the
> > same information twice, which is wasteful and increases resource
> > requirements.  Most applications that use Solr have at least surface
> > knowledge of the schema, which should make it possible for the
> > application writer to just pull the information from the source field.
> > It seems that your application may not have that knowledge, though.
> >
> > Thanks,
> > Shawn
> >
>



-- 
--

Benedetti Alessandro
Visiting card - http://about.me/alessandro_benedetti
Blog - http://alexbenedetti.blogspot.co.uk

"Tyger, tyger burning bright
In the forests of the night,
What immortal hand or eye
Could frame thy fearful symmetry?"

William Blake - Songs of Experience -1794 England


Re: Querying Nested documents

2015-07-15 Thread Alessandro Benedetti
2015-07-15 16:01 GMT+01:00 Mikhail Khludnev :

> 1. I can't get your explanation.
>
> 2. childFilter=(image_uri_s:somevalue) OR (-image_uri_s:*)
> is not correct, lacks of quotes , and pointless (selecting some term, and
> negating all terms gives nothing).


Syntax aside, we are talking about a union of sets, not an intersection.
Why should this query give nothing?
It should return the union of all the children with "somevalue" in
image_uri_s and the set with no value at all in that field.


> Thus, considerable syntax can be only
> childFilter="other_field:somevalue -image_uri_s:*"
>

I have to check, but probably you can answer me directly: is it not
possible to express disjunctions there?


>
> 3.  I can only guess that you are asking about something like:
> http://localhost:8983/solr/demo/select?q={!parent
> which='type:parent'}image_uri_s:somevalue&fl=*,[child
> parentFilter=type:parent
> childFilter=-type:parent]&indent=true
>
>
> On Tue, Jul 14, 2015 at 11:56 PM, Ramesh Nuthalapati <
> ramesh.nuthalap...@gmail.com> wrote:
>
> > Yes you are right.
> >
> > So the query you are saying should be like below .. or did I
> misunderstood
> > it
> >
> > http://localhost:8983/solr/demo/select?q= {!parent
> > which='type:parent'}&fl=*,[child parentFilter=type:parent
> > childFilter=(image_uri_s:somevalue) OR (-image_uri_s:*)]&indent=true
> >
> > If so, I am getting an error with parsing field name.
> >
> >
> >
> > --
> > View this message in context:
> >
> http://lucene.472066.n3.nabble.com/Querying-Nested-documents-tp4217169p4217348.html
> > Sent from the Solr - User mailing list archive at Nabble.com.
> >
>
>
>
> --
> Sincerely yours
> Mikhail Khludnev
> Principal Engineer,
> Grid Dynamics
>
> 
> 
>



-- 
--

Benedetti Alessandro
Visiting card - http://about.me/alessandro_benedetti
Blog - http://alexbenedetti.blogspot.co.uk

"Tyger, tyger burning bright
In the forests of the night,
What immortal hand or eye
Could frame thy fearful symmetry?"

William Blake - Songs of Experience -1794 England


Re: MapReduceIndexerTool

2015-07-15 Thread Erick Erickson
Charles:

bq:  My understanding is that this is actually somewhat slower than
the standard indexing path...

Yes and no. If you just use a single thread, you're right, it'll be
slower since it has to copy a bunch of stuff around. Then at the end,
the --go-live step copies the built index to Solr and runs a
MERGEINDEXES on it. That copying can take some time. Not to mention
that a number of the intermediate steps do a MERGEINDEXES several
times to gather lots of sub-shards together, so you can end up copying
your index several times for each shard.

However, in a situation where you need to index a zillion documents
into, say, 5 nodes but
have 200 nodes available in your cluster, the extra copying time is
way more than offset by
being able to farm out the indexing across those 200 nodes. MRIT actually uses
EmbeddedSolrServer under the covers so you get a lot of parallelism.
Or in a situation where the amount of data is massive, copying it
somewhere the standard indexing path can find it may, in fact, be
prohibitive. Or situations where the ETL pipeline is a bottleneck that
can be farmed out over a zillion commodity nodes. So It Depends (tm).
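
For reference, a typical MRIT invocation looks something like this (jar
name, ZK address and paths are illustrative and vary by distribution):

hadoop jar search-mr-*-job.jar org.apache.solr.hadoop.MapReduceIndexerTool \
  --zk-host zk1:2181/solr \
  --collection collection1 \
  --morphline-file morphline.conf \
  --output-dir hdfs://nameservice/tmp/outdir \
  --go-live \
  hdfs://nameservice/indir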

bq:  ... is recommended only if the site is already invested in the
Hadoop infrastructure

That's mostly my feeling too. Hadoop adds its own complexity, although
there are some really
cool tools out there to help. I'm just not in favor of adding
complexity unless there's a
compelling use-case. M/R indexing by itself can be enough inducement
to move to Hadoop
for some situations though.

Best,
Erick


On Wed, Jul 15, 2015 at 8:28 AM, Reitzel, Charles
 wrote:
> The OP asked about MapReduceIndexerTool.   My understanding is that this is 
> actually somewhat slower than the standard indexing path and is recommended 
> only if the site is already invested in the Hadoop infrastructure.  E.g. 
> input files are already distributed on the Hadoop/Search cluster via HDFS.
>
> See also:
> https://cwiki.apache.org/confluence/display/solr/Running+Solr+on+HDFS
>
> Note, there is no coordination of replication between the HDFS and Solr
> systems.  Thus, if you configure Solr replication N > 1 for each shard, and
> the HDFS replication factor is M > 1, then you get N * M copies of all your 
> index data.   That can add up fast ...
>
> There is work underway to harmonize/mitigate Solr and HDFS replication:
> Ability to set the replication factor for index files created by 
> HDFSDirectoryFactory
> https://issues.apache.org/jira/browse/SOLR-6305
>
> To get a feel for the overall condition of MR/Solr integration, I looked at 
> JIRA issues related to HDFS and Hadoop.   It appears to be an area with some 
> decent bug fixes.  There are some larger feature issues as well, but it isn't 
> clear how much momentum these have.   Can anyone (developers, current users) 
> comment on the state of Hadoop integration?
>
> -
>
> Currently open JIRA issues for Solr containing "HDFS" or "Hadoop":
> https://issues.apache.org/jira/browse/SOLR-5069?jql=project%20%3D%20SOLR%20AND%20status%20%3D%20OPEN%20AND%20%28text%20~%20%22HDFS%22%20OR%20text%20~%20%22Hadoop%22%29%20ORDER%20BY%20priority%20DESC%2C%20updated%20DESC%2C%20created%20ASC
>
> Recently closed issues containing "HDFS" or "Hadoop":
> https://issues.apache.org/jira/browse/SOLR-7458?jql=project%20%3D%20SOLR%20AND%20status%20!%3D%20OPEN%20AND%20%28text%20~%20%22HDFS%22%20OR%20text%20~%20%22Hadoop%22%29%20ORDER%20BY%20updated%20DESC%2C%20priority%20DESC%2C%20created%20ASC
>
>
> -Original Message-
> From: Reitzel, Charles [mailto:charles.reit...@tiaa-cref.org]
> Sent: Wednesday, July 15, 2015 11:24 AM
> To: solr-user@lucene.apache.org
> Subject: RE: copying data from one collection to another collection (solr 
> cloud 521)
>
> Since they want explicitly search within a given "version" of the data, this 
> seems like a textbook application for collection aliases.
>
> You could have N public collection names: current_stuff, previous_stuff_1, 
> previous_stuff_2, ...   At any given time, these will be aliased to reference 
> the "actual" collection names:
> current_stuff -> stuff_20150712,
> previous_stuff_1 -> stuff_20150705,
> previous_stuff_2 -> stuff_20150628,
> ...
>
> Every weekend, you create a new collection and index everything current into 
> it.  Once done, reset all the aliases to point to the newest N collections 
> and dropping the oldest:
> current_stuff -> stuff_20150719
> previous_stuff_1 -> stuff_20150712,
> previous_stuff_2 -> stuff_20150705,
> ...
>
> Collections API: Create or modify an Alias for a Collection
> https://cwiki.apache.org/confluence/display/solr/Collections+API#CollectionsAPI-api4
>
> Thus, you can keep the IDs the same and use them to compare to previous 
> versions of any given document.   Useful, if only for debugging purposes.
>
> Curious if there are opportunities for optimization here.  For example, would 
> it be 

Re: copying data from one collection to another collection (solr cloud 521)

2015-07-15 Thread Raja Pothuganti
Hi Charles,
Thank you for the response. We will be using aliasing. I am looking into
ways to avoid ingestion into each of the collections, as you mentioned:
"For example, would it be faster to make a file system copy of the most
recent collection ..."
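
For reference, repointing one alias is a single Collections API call,
e.g. (host and names here are illustrative):

http://localhost:8983/solr/admin/collections?action=CREATEALIAS&name=current_stuff&collections=stuff_20150719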

MapReduceIndexerTool is not an option at this point.


One option is to back up each shard from the current_stuff collection at
the end of the week to a particular location (say directory /opt/data/)
and then:
1) empty/delete existing documents in the previous_stuff_1 collection
2) restore each corresponding shard from /opt/data/ to the
previous_stuff_1 collection using backup & restore as suggested in
https://cwiki.apache.org/confluence/display/solr/Making+and+Restoring+Backups+of+SolrCores
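
Concretely, I am thinking of per-core calls along these lines (core
names, host and paths are just placeholders):

http://host:8983/solr/stuff_shard1_replica1/replication?command=backup&location=/opt/data&name=week29
http://host:8983/solr/prev_shard1_replica1/replication?command=restore&location=/opt/data&name=week29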


Trying to find out if there are any better ways than the above option.

Thanks
Raja




On 7/15/15, 10:23 AM, "Reitzel, Charles" 
wrote:

>Since they want explicitly search within a given "version" of the data,
>this seems like a textbook application for collection aliases.
>
>You could have N public collection names: current_stuff,
>previous_stuff_1, previous_stuff_2, ...   At any given time, these will
>be aliased to reference the "actual" collection names:
>   current_stuff -> stuff_20150712,
>   previous_stuff_1 -> stuff_20150705,
>   previous_stuff_2 -> stuff_20150628,
>   ...
>
>Every weekend, you create a new collection and index everything current
>into it.  Once done, reset all the aliases to point to the newest N
>collections and dropping the oldest:
>   current_stuff -> stuff_20150719
>   previous_stuff_1 -> stuff_20150712,
>   previous_stuff_2 -> stuff_20150705,
>   ...
>
>Collections API: Create or modify an Alias for a Collection
>https://cwiki.apache.org/confluence/display/solr/Collections+API#CollectionsAPI-api4
>
>Thus, you can keep the IDs the same and use them to compare to previous
>versions of any given document.   Useful, if only for debugging purposes.
>
>Curious if there are opportunities for optimization here.  For example,
>would it be faster to make a file system copy of the most recent
>collection and load only changed documents (assuming the delta is
>available from the source system)?
>
>-Original Message-
>From: Erick Erickson [mailto:erickerick...@gmail.com]
>Sent: Monday, July 13, 2015 11:55 PM
>To: solr-user@lucene.apache.org
>Subject: Re: copying data from one collection to another collection (solr
>cloud 521)
>
>bq: does offline
>
>No. I'm talking about "collection aliasing". You can create an entirely
>new collection, index to it however  you want then switch to using that
>new collection.
>
>bq: Any updates to EXISTING document in the LIVE collection should NOT be
>replicated to the previous week(s) snapshot(s)
>
>then give it a new ID maybe?
>
>Best,
>Erick
>
>On Mon, Jul 13, 2015 at 3:21 PM, Raja Pothuganti
> wrote:
>> Thank you Erick
>>>Actually, my question is why do it this way at all? Why not index
>>>directly to your "live" nodes? This is what SolrCloud is built for.
>>>You an use "implicit" routing to create shards say, for each week and
>>>age out the ones that are "too old" as well.
>>
>>
>> Any updates to EXISTING document in the LIVE collection should NOT be
>> replicated to the previous week(s) snapshot(s). Think of the
>> snapshot(s) as an archive of sort and searchable independent of LIVE.
>> We're aiming to support at most 2 archives of data in the past.
>>
>>
>>>Another option would be to use "collection aliasing" to keep an
>>>offline index up to date then switch over when necessary.
>>
>> Does offline indexing refer to this link:
>> https://github.com/cloudera/search/tree/0d47ff79d6ccc0129ffadcb50f9fe0b271f102aa/search-mr
>>
>>
>> Thanks
>> Raja
>>
>>
>>
>> On 7/13/15, 3:14 PM, "Erick Erickson"  wrote:
>>
>>>Actually, my question is why do it this way at all? Why not index
>>>directly to your "live" nodes? This is what SolrCloud is built for.
>>>
>>>There's the new backup/restore functionality that's still a work in
>>>progress, see: https://issues.apache.org/jira/browse/SOLR-5750
>>>
>>>You an use "implicit" routing to create shards say, for each week and
>>>age out the ones that are "too old" as well.
>>>
>>>Another option would be to use "collection aliasing" to keep an
>>>offline index up to date then switch over when necessary.
>>>
>>>I'd really like to know this isn't an XY problem though, what's the
>>>high-level problem you're trying to solve?
>>>
>>>Best,
>>>Erick
>>>
>>>On Mon, Jul 13, 2015 at 12:49 PM, Raja Pothuganti
>>> wrote:

 Hi,
 We are setting up a new SolrCloud environment with 5.2.1 on Ubuntu
boxes. We currently ingest data into a large collection, call it LIVE.
After the full ingest is done we then trigger a delta delta ingestion
every 15 minutes to get the documents & data that have changed into
this LIVE instance.

 In Solr 4.X using a Master / Slave setup we had slaves that would
periodically (weekly, or monthly) refresh th

Re: Which default Boolean operator to set, AND or OR?

2015-07-15 Thread Steven White
Thank you all.  Looks like OR is a better choice vs. AND.

Charles: I don't understand what you mean by the "spellcheck component".
Do you mean OR works best with spell checker?

Steve

On Wed, Jul 15, 2015 at 11:07 AM, Reitzel, Charles <
charles.reit...@tiaa-cref.org> wrote:

> A common approach to this problem is to include the spellcheck component
> and, if there are corrections, include a "Did you mean ..." link in the
> results page.
>
> -Original Message-
> From: Walter Underwood [mailto:wun...@wunderwood.org]
> Sent: Wednesday, July 15, 2015 10:36 AM
> To: solr-user@lucene.apache.org
> Subject: Re: Which default Boolean operator to set, AND or OR?
>
> The AND default has one big problem. If the user misspells a single word,
> they get no results. About 10% of queries are misspelled, so that means a
> lot more failures.
>
> wunder
> Walter Underwood
> wun...@wunderwood.org
> http://observer.wunderwood.org/  (my blog)
>
>
> On Jul 15, 2015, at 7:21 AM, Jack Krupansky 
> wrote:
>
> > It is simply precision (AND) vs. recall (OR) - the former tries to
> > limit the total result count, while the latter tries to focus on
> > relevancy of the top results even if the total result count is higher.
> >
> > Recall is good for discovery and browsing, where you sort of know what
> > you generally want, but not exactly with any great precision.
> >
> > Recall will include results that almost meet the query terms, but
> > maybe some are missing.
> >
> > Precision will guarantee and insist that all query terms are present.
> >
> > One great example for recall is a plagiarism query - enter all the
> > terms for a passage and then find documents that most closely
> > approximate the passage without being necessarily exact matches. IOW,
> > the plagiarizer changes a word here and there.
> >
> > -- Jack Krupansky
> >
> > On Wed, Jul 15, 2015 at 8:16 AM, Steven White 
> wrote:
> >
> >> Hi Everyone,
> >>
> >> Out-of-the box, Solr (Lucene?) is set to use OR as the default
> >> Boolean operator.  Can someone tell me the advantages / disadvantages
> >> of using OR or AND as the default?
> >>
> >> I'm leaning toward AND as the default because the more words a user
> >> types, the narrower the result set should be.
> >>
> >> Thanks
> >>
> >> Steve
> >>
>
>
> *
> This e-mail may contain confidential or privileged information.
> If you are not the intended recipient, please notify the sender
> immediately and then delete it.
>
> TIAA-CREF
> *
>
>


Re: IndexSearcher.search(query, collect)

2015-07-15 Thread Erick Erickson
bq: Is there a way to invoke IndexSearcher.search(Query, Collector)

Problem is that this question doesn't make a lot of sense to me.
IndexSearcher is, by definition, local to a single Lucene
instance. Distributed requests are a whole different beast. If you're going
to try to use custom request handlers in a distributed environment
(SolrCloud), you need to abstract up a level. Here are some places to
start:

https://cwiki.apache.org/confluence/display/solr/Distributed+Search+with+Index+Sharding
http://wiki.apache.org/solr/WritingDistributedSearchComponents

The thing to be aware of is that the "usual" way of writing this
involves two passes. Say you want to return the top 10 docs and have 5 shards.
The first pass sends the request to one replica of each shard. Each returns
its top 10 docs, but only the doc ID and score (or sort criteria). Then the
aggregator (whichever node received the original requests) sorts those 50 docs
into the true top N and sends a second request to each of the shards hosting one
of those docs for the contents of the doc.

Now, you can probably bypass a lot of that if you're happy with
returning the top-N lists from all the shards; this two-pass mechanism
was put in place to handle, say, a 100-shard system where you wouldn't
want to transmit all the top N from every shard.

HTH,
Erick


On Wed, Jul 15, 2015 at 8:46 AM, Chetan Vora  wrote:
> Hi all
>
> I asked a related question before but couldn't get any response (see
> SolrQueryRequest in SolrCloud vs Standalone Solr), asking it differently
> here.
>
> Is there a way to invoke
>
> IndexSearcher.search(Query, Collector) over a SolrCloud collection so that
> in invokes the search/collect implicitly on individual shards of the
> collection? If not, how does one do this explicitly?
>
> I have a usecase that was implemented using custom request handler in
> standalone Solr and we're trying to move to SolrCloud. It is necessary for
> us to understand how to do the above so we can use SolrCloud functionality.
>
> Thanks and would *really really* appreciate ANY help.
>
> Regards
> CV


Re: Which default Boolean operator to set, AND or OR?

2015-07-15 Thread Steven White
By the way, using OR as the default, other than returning more results as
more words are entered, the ranking and performance of the search remains
the same right?

Steve

On Wed, Jul 15, 2015 at 12:12 PM, Steven White  wrote:

> Thank you all.  Looks like OR is a better choice vs. AND.
>
> Charles: I don't understand what you mean by the "spellcheck component".
> Do you mean OR works best with spell checker?
>
> Steve
>
> On Wed, Jul 15, 2015 at 11:07 AM, Reitzel, Charles <
> charles.reit...@tiaa-cref.org> wrote:
>
>> A common approach to this problem is to include the spellcheck component
>> and, if there are corrections, include a "Did you mean ..." link in the
>> results page.
>>
>> -Original Message-
>> From: Walter Underwood [mailto:wun...@wunderwood.org]
>> Sent: Wednesday, July 15, 2015 10:36 AM
>> To: solr-user@lucene.apache.org
>> Subject: Re: Which default Boolean operator to set, AND or OR?
>>
>> The AND default has one big problem. If the user misspells a single word,
>> they get no results. About 10% of queries are misspelled, so that means a
>> lot more failures.
>>
>> wunder
>> Walter Underwood
>> wun...@wunderwood.org
>> http://observer.wunderwood.org/  (my blog)
>>
>>
>> On Jul 15, 2015, at 7:21 AM, Jack Krupansky 
>> wrote:
>>
>> > It is simply precision (AND) vs. recall (OR) - the former tries to
>> > limit the total result count, while the latter tries to focus on
>> > relevancy of the top results even if the total result count is higher.
>> >
>> > Recall is good for discovery and browsing, where you sort of know what
>> > you generally want, but not exactly with any great precision.
>> >
>> > Recall will include results that almost meet the query terms, but
>> > maybe some are missing.
>> >
>> > Precision will guarantee and insist that all query terms are present.
>> >
>> > One great example for recall is a plagiarism query - enter all the
>> > terms for a passage and then find documents that most closely
>> > approximate the passage without being necessarily exact matches. IOW,
>> > the plagiarizer changes a word here and there.
>> >
>> > -- Jack Krupansky
>> >
>> > On Wed, Jul 15, 2015 at 8:16 AM, Steven White 
>> wrote:
>> >
>> >> Hi Everyone,
>> >>
>> >> Out-of-the box, Solr (Lucene?) is set to use OR as the default
>> >> Boolean operator.  Can someone tell me the advantages / disadvantages
>> >> of using OR or AND as the default?
>> >>
>> >> I'm leaning toward AND as the default because the more words a user
>> >> types, the narrower the result set should be.
>> >>
>> >> Thanks
>> >>
>> >> Steve
>> >>
>>
>>
>> *
>> This e-mail may contain confidential or privileged information.
>> If you are not the intended recipient, please notify the sender
>> immediately and then delete it.
>>
>> TIAA-CREF
>> *
>>
>>
>


Re: Which default Boolean operator to set, AND or OR?

2015-07-15 Thread Erick Erickson
This is really an apples/oranges comparison. They're essentially different
queries, and scores aren't comparable across different queries.

If you're asking "if doc 1 and doc 2 are returned by defaulting to AND or OR,
are they in the same position relative to each other?" then I'm pretty sure the
answer is "you can't count on it". You'll match on different fields depending on
what the default is, and with boosting you just don't know.
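
To make that concrete with edismax: defaulting to OR, "apples oranges"
is treated roughly as (apples oranges) with mm=0%, while defaulting to
AND it is (+apples +oranges), i.e. mm=100%. Different clauses can match,
so the result sets and scores aren't comparable.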

Best,
Erick

On Wed, Jul 15, 2015 at 9:14 AM, Steven White  wrote:
> By the way, using OR as the default, other than returning more results as
> more words are entered, the ranking and performance of the search remains
> the same right?
>
> Steve
>
> On Wed, Jul 15, 2015 at 12:12 PM, Steven White  wrote:
>
>> Thank you all.  Looks like OR is a better choice vs. AND.
>>
>> Charles: I don't understand what you mean by the "spellcheck component".
>> Do you mean OR works best with spell checker?
>>
>> Steve
>>
>> On Wed, Jul 15, 2015 at 11:07 AM, Reitzel, Charles <
>> charles.reit...@tiaa-cref.org> wrote:
>>
>>> A common approach to this problem is to include the spellcheck component
>>> and, if there are corrections, include a "Did you mean ..." link in the
>>> results page.
>>>
>>> -Original Message-
>>> From: Walter Underwood [mailto:wun...@wunderwood.org]
>>> Sent: Wednesday, July 15, 2015 10:36 AM
>>> To: solr-user@lucene.apache.org
>>> Subject: Re: Which default Boolean operator to set, AND or OR?
>>>
>>> The AND default has one big problem. If the user misspells a single word,
>>> they get no results. About 10% of queries are misspelled, so that means a
>>> lot more failures.
>>>
>>> wunder
>>> Walter Underwood
>>> wun...@wunderwood.org
>>> http://observer.wunderwood.org/  (my blog)
>>>
>>>
>>> On Jul 15, 2015, at 7:21 AM, Jack Krupansky 
>>> wrote:
>>>
>>> > It is simply precision (AND) vs. recall (OR) - the former tries to
>>> > limit the total result count, while the latter tries to focus on
>>> > relevancy of the top results even if the total result count is higher.
>>> >
>>> > Recall is good for discovery and browsing, where you sort of know what
>>> > you generally want, but not exactly with any great precision.
>>> >
>>> > Recall will include results that almost meet the query terms, but
>>> > maybe some are missing.
>>> >
>>> > Precision will guarantee and insist that all query terms are present.
>>> >
>>> > One great example for recall is a plagiarism query - enter all the
>>> > terms for a passage and then find documents that most closely
>>> > approximate the passage without being necessarily exact matches. IOW,
>>> > the plagiarizer changes a word here and there.
>>> >
>>> > -- Jack Krupansky
>>> >
>>> > On Wed, Jul 15, 2015 at 8:16 AM, Steven White 
>>> wrote:
>>> >
>>> >> Hi Everyone,
>>> >>
>>> >> Out-of-the box, Solr (Lucene?) is set to use OR as the default
>>> >> Boolean operator.  Can someone tell me the advantages / disadvantages
>>> >> of using OR or AND as the default?
>>> >>
>>> >> I'm leaning toward AND as the default because the more words a user
>>> >> types, the narrower the result set should be.
>>> >>
>>> >> Thanks
>>> >>
>>> >> Steve
>>> >>
>>>
>>>
>>> *
>>> This e-mail may contain confidential or privileged information.
>>> If you are not the intended recipient, please notify the sender
>>> immediately and then delete it.
>>>
>>> TIAA-CREF
>>> *
>>>
>>>
>>


Re: IndexSearcher.search(query, collect)

2015-07-15 Thread Chetan Vora
Erick

Thanks for your response and for the pointers! This will be a good starting
point; I will go through these.

The good news is that in our use case we don't really care about the two
passes. In fact, our results are ConstantScore, so we only need to
aggregate (i.e. sum) the results from each shard.

Regards
Chetan



On Wed, Jul 15, 2015 at 12:14 PM, Erick Erickson 
wrote:

> bq: Is there a way to invoke IndexSearcher.search(Query, Collector)
>
> Problem is that this question doesn't make a lot of sense to me.
> IndexSearcher is, by definition, local to a single Lucene
> instance. Distributed requests are a whole different beast. If you're going
> to try to use custom request handlers in a distributed environment
> (SolrCloud), you need to abstract up a level, see:
> Here are some places to start:
>
>
> https://cwiki.apache.org/confluence/display/solr/Distributed+Search+with+Index+Sharding
> http://wiki.apache.org/solr/WritingDistributedSearchComponents
>
> The thing to be aware of is that the "usual" way of writing this
> involves two passes. Say you want to return the top 10 docs and have 5
> shards.
> The first pass sends the request to one replica of each shard. Each returns
> its top 10 docs, but only the doc ID and score (or sort criteria). Then the
> aggregator (whichever node received the original requests) sorts those 50
> docs
> into the true top N and sends a second request to each of the shards
> hosting one
> of those docs for the contents of the doc.
>
> Now, you can probably bypass a lot of that if you're happy with
> returning the topN
> lists from all the shards, this two-pass mechanism was put in place to
> handle, say,
> a 100 shard system where you wouldn't want to transmit all the top  N from
> every
> shard.
>
> HTH,
> Erick
>
>
> On Wed, Jul 15, 2015 at 8:46 AM, Chetan Vora  wrote:
> > Hi all
> >
> > I asked a related question before but couldn't get any response (see
> > SolrQueryRequest in SolrCloud vs Standalone Solr), asking it differently
> > here.
> >
> > Is there a way to invoke
> >
> > IndexSearcher.search(Query, Collector) over a SolrCloud collection so
> that
> > in invokes the search/collect implicitly on individual shards of the
> > collection? If not, how does one do this explicitly?
> >
> > I have a usecase that was implemented using custom request handler in
> > standalone Solr and we're trying to move to SolrCloud. It is necessary
> for
> > us to understand how to do the above so we can use SolrCloud
> functionality.
> >
> > Thanks and would *really really* appreciate ANY help.
> >
> > Regards
> > CV
>


Re: Which default Boolean operator to set, AND or OR?

2015-07-15 Thread Steven White
Hi Erick,

I understand there are variables that will impact ranking. However, if I
leave my edismax settings as they are and simply switch from AND to OR as
the default Boolean operator, then if a user types "apples oranges"
(without quotes), will the ranking be the same as when I had AND? Will the
performance be the same as when I had AND as the default?

Thanks

Steve

On Wed, Jul 15, 2015 at 12:26 PM, Erick Erickson 
wrote:

> This is really an apples/oranges comparison. They're essentially different
> queries, and scores aren't comparable across different queries.
>
> If you're asking "if doc 1 and doc 2 are returned by defaulting to AND or
> OR,
> are they in the same position relative to each other?" then I'm pretty
> sure the
> answer is "you can't count on it". You'll match on different fields
> depending on
> what the default is, and with boosting you just don't know.
>
> Best,
> Erick
>
> On Wed, Jul 15, 2015 at 9:14 AM, Steven White 
> wrote:
> > By the way, using OR as the default, other than returning more results as
> > more words are entered, the ranking and performance of the search remains
> > the same right?
> >
> > Steve
> >
> > On Wed, Jul 15, 2015 at 12:12 PM, Steven White 
> wrote:
> >
> >> Thank you all.  Looks like OR is a better choice vs. AND.
> >>
> >> Charles: I don't understand what you mean by the "spellcheck component".
> >> Do you mean OR works best with spell checker?
> >>
> >> Steve
> >>
> >> On Wed, Jul 15, 2015 at 11:07 AM, Reitzel, Charles <
> >> charles.reit...@tiaa-cref.org> wrote:
> >>
> >>> A common approach to this problem is to include the spellcheck
> component
> >>> and, if there are corrections, include a "Did you mean ..." link in the
> >>> results page.
> >>>
> >>> -Original Message-
> >>> From: Walter Underwood [mailto:wun...@wunderwood.org]
> >>> Sent: Wednesday, July 15, 2015 10:36 AM
> >>> To: solr-user@lucene.apache.org
> >>> Subject: Re: Which default Boolean operator to set, AND or OR?
> >>>
> >>> The AND default has one big problem. If the user misspells a single
> word,
> >>> they get no results. About 10% of queries are misspelled, so that
> means a
> >>> lot more failures.
> >>>
> >>> wunder
> >>> Walter Underwood
> >>> wun...@wunderwood.org
> >>> http://observer.wunderwood.org/  (my blog)
> >>>
> >>>
> >>> On Jul 15, 2015, at 7:21 AM, Jack Krupansky 
> >>> wrote:
> >>>
> >>> > It is simply precision (AND) vs. recall (OR) - the former tries to
> >>> > limit the total result count, while the latter tries to focus on
> >>> > relevancy of the top results even if the total result count is
> higher.
> >>> >
> >>> > Recall is good for discovery and browsing, where you sort of know
> what
> >>> > you generally want, but not exactly with any great precision.
> >>> >
> >>> > Recall will include results that almost meet the query terms, but
> >>> > maybe some are missing.
> >>> >
> >>> > Precision will guarantee and insist that all query terms are present.
> >>> >
> >>> > One great example for recall is a plagiarism query - enter all the
> >>> > terms for a passage and then find documents that most closely
> >>> > approximate the passage without being necessarily exact matches. IOW,
> >>> > the plagiarizer changes a word here and there.
> >>> >
> >>> > -- Jack Krupansky
> >>> >
> >>> > On Wed, Jul 15, 2015 at 8:16 AM, Steven White 
> >>> wrote:
> >>> >
> >>> >> Hi Everyone,
> >>> >>
> >>> >> Out-of-the box, Solr (Lucene?) is set to use OR as the default
> >>> >> Boolean operator.  Can someone tell me the advantages /
> disadvantages
> >>> >> of using OR or AND as the default?
> >>> >>
> >>> >> I'm leaning toward AND as the default because the more words a user
> >>> >> types, the narrower the result set should be.
> >>> >>
> >>> >> Thanks
> >>> >>
> >>> >> Steve
> >>> >>
> >>>
> >>>
> >>>
> *
> >>> This e-mail may contain confidential or privileged information.
> >>> If you are not the intended recipient, please notify the sender
> >>> immediately and then delete it.
> >>>
> >>> TIAA-CREF
> >>>
> *
> >>>
> >>>
> >>
>


Re: Querying Nested documents

2015-07-15 Thread Mikhail Khludnev
OK, I checked with my data:

color:orlean  => "numFound": 1,
-color:[* TO *] => "numFound": 602096 (it used to return 0 until 'pure
negational' (sic) queries were delivered)
color:orlean -color:[* TO *] => "numFound": 0,
color:orlean (*:* -color:[* TO *])  => "numFound": 602097,

fyi
https://lucidworks.com/blog/why-not-and-or-and-not/
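
So, applying the same trick to the child transformer, the disjunction
should presumably be written as:

childFilter="image_uri_s:somevalue (*:* -image_uri_s:[* TO *])"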


On Wed, Jul 15, 2015 at 10:55 AM, Alessandro Benedetti <
benedetti.ale...@gmail.com> wrote:

> 2015-07-15 16:01 GMT+01:00 Mikhail Khludnev :
>
> > 1. I can't get your explanation.
> >
> > 2. childFilter=(image_uri_s:somevalue) OR (-image_uri_s:*)
> > is not correct, lacks of quotes , and pointless (selecting some term, and
> > negating all terms gives nothing).
>
>
> Not considering the syntax,
> We are talking about union of sets, not intersection.
> Why this query should give nothing ?
> Should return the union of all the children with "some value" in image_uri
> and the set with no value at all in that field .
>
>
> > Thus, considerable syntax can be only
> > childFilter="other_field:somevalue -image_uri_s:*"
> >
>
> I have to check, but probably you can answer me directly, is it not
> possible to express disjunctions there ?
>
>
> >
> > 3.  I can only guess that you are asking about something like:
> > http://localhost:8983/solr/demo/select?q={!parent
> > which='type:parent'}image_uri_s:somevalue&fl=*,[child
> > parentFilter=type:parent
> > childFilter=-type:parent]&indent=true
> >
> >
> > On Tue, Jul 14, 2015 at 11:56 PM, Ramesh Nuthalapati <
> > ramesh.nuthalap...@gmail.com> wrote:
> >
> > > Yes you are right.
> > >
> > > So the query you are saying should be like below .. or did I
> > misunderstood
> > > it
> > >
> > > http://localhost:8983/solr/demo/select?q= {!parent
> > > which='type:parent'}&fl=*,[child parentFilter=type:parent
> > > childFilter=(image_uri_s:somevalue) OR (-image_uri_s:*)]&indent=true
> > >
> > > If so, I am getting an error with parsing field name.
> > >
> > >
> > >
> > > --
> > > View this message in context:
> > >
> >
> http://lucene.472066.n3.nabble.com/Querying-Nested-documents-tp4217169p4217348.html
> > > Sent from the Solr - User mailing list archive at Nabble.com.
> > >
> >
> >
> >
> > --
> > Sincerely yours
> > Mikhail Khludnev
> > Principal Engineer,
> > Grid Dynamics
> >
> > 
> > 
> >
>
>
>
> --
> --
>
> Benedetti Alessandro
> Visiting card - http://about.me/alessandro_benedetti
> Blog - http://alexbenedetti.blogspot.co.uk
>
> "Tyger, tyger burning bright
> In the forests of the night,
> What immortal hand or eye
> Could frame thy fearful symmetry?"
>
> William Blake - Songs of Experience -1794 England
>



-- 
Sincerely yours
Mikhail Khludnev
Principal Engineer,
Grid Dynamics





Re: IndexSearcher.search(query, collect)

2015-07-15 Thread Mikhail Khludnev
On Wed, Jul 15, 2015 at 10:46 AM, Chetan Vora  wrote:

> Hi all
>
> I asked a related question before but couldn't get any response (see
> SolrQueryRequest in SolrCloud vs Standalone Solr), asking it differently
> here.
>
> Is there a way to invoke
>
> IndexSearcher.search(Query, Collector) over a SolrCloud collection so that
> in invokes the search/collect implicitly on individual shards of the
> collection? If not, how does one do this explicitly?
>
> I have a usecase that was implemented using custom request handler in
> standalone Solr and we're trying to move to SolrCloud.


In your custom request handler, do you add any new "nodes" into the
response, or do you just modify the standard response structure?

It is necessary for
> us to understand how to do the above so we can use SolrCloud functionality.
>
> Thanks and would *really really* appreciate ANY help.
>
> Regards
> CV
>



-- 
Sincerely yours
Mikhail Khludnev
Principal Engineer,
Grid Dynamics





Re: Querying Nested documents

2015-07-15 Thread Ramesh Nuthalapati
Mikhail - 

This worked great.

http://localhost:8983/solr/demo/select?q={!parent 
which='type:parent'}image_uri_s:somevalue&fl=*,[child 
parentFilter=type:parent 
childFilter=-type:parent]&indent=true 

Thank you.



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Querying-Nested-documents-tp4217169p4217534.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: IndexSearcher.search(query, collect)

2015-07-15 Thread Chetan Vora
Mikhail

We do add new nodes with our custom results in some cases... just
curious: does that preclude us from doing what we're trying to do above?
FWIW, we can avoid the custom nodes if we had to.

Chetan

On Wed, Jul 15, 2015 at 12:39 PM, Mikhail Khludnev <
mkhlud...@griddynamics.com> wrote:

>
>
> On Wed, Jul 15, 2015 at 10:46 AM, Chetan Vora 
> wrote:
>
>> Hi all
>>
>> I asked a related question before but couldn't get any response (see
>> SolrQueryRequest in SolrCloud vs Standalone Solr), asking it differently
>> here.
>>
>> Is there a way to invoke
>>
>> IndexSearcher.search(Query, Collector) over a SolrCloud collection so that
>> in invokes the search/collect implicitly on individual shards of the
>> collection? If not, how does one do this explicitly?
>>
>> I have a usecase that was implemented using custom request handler in
>> standalone Solr and we're trying to move to SolrCloud.
>
>
> In your  custom request handler do you add any new "nodes" into response?
> or you just modifies the standard response structure?
>
> It is necessary for
>> us to understand how to do the above so we can use SolrCloud
>> functionality.
>>
>> Thanks and would *really really* appreciate ANY help.
>>
>> Regards
>> CV
>>
>
>
>
> --
> Sincerely yours
> Mikhail Khludnev
> Principal Engineer,
> Grid Dynamics
>
> 
> 
>


Re: Which default Boolean operator to set, AND or OR?

2015-07-15 Thread Alessandro Benedetti
Talking about performance, you should take a look at the difference in
performance between:

- disjunction of K sorted arrays (O(n*k*log(k)) in Lucene, where *k* is
the number of disjunction clauses and *n* the average posting list size;
just learned today from an expert Lucene committer)

- conjunction of K sorted arrays (not 100% sure about the complexity, I
should check the algorithm concretely, but I suspect there is no
difference, or not much of one; I would be glad if someone here could
point to resources or share knowledge)

Basically, when dealing with unions or intersections of sorted arrays,
the algorithms that solve the two problems are quite comparable in terms
of performance.

I would say that the performance difference is irrelevant, but I would
be happy for someone to contradict me.

Cheers



2015-07-15 17:34 GMT+01:00 Steven White :

> Hi Erick,
>
> I understand there are variables that will impact ranking.  However, if I
> leave my edismax setting as is and simply switch from AND to OR as the
> default Boolean, now if a user types "apples oranges" (without quotes) will
> the ranking be the same as when I had AND?  Will the performance be the
> same as when I had AND as the default?
>
> Thanks
>
> Steve
>
> On Wed, Jul 15, 2015 at 12:26 PM, Erick Erickson 
> wrote:
>
> > This is really an apples/oranges comparison. They're essentially
> different
> > queries, and scores aren't comparable across different queries.
> >
> > If you're asking "if doc 1 and doc 2 are returned by defaulting to AND or
> > OR,
> > are they in the same position relative to each other?" then I'm pretty
> > sure the
> > answer is "you can't count on it". You'll match on different fields
> > depending on
> > what the default is, and with boosting you just don't know.
> >
> > Best,
> > Erick
> >
> > On Wed, Jul 15, 2015 at 9:14 AM, Steven White 
> > wrote:
> > > By the way, using OR as the default, other than returning more results
> as
> > > more words are entered, the ranking and performance of the search
> remains
> > > the same right?
> > >
> > > Steve
> > >
> > > On Wed, Jul 15, 2015 at 12:12 PM, Steven White 
> > wrote:
> > >
> > >> Thank you all.  Looks like OR is a better choice vs. AND.
> > >>
> > >> Charles: I don't understand what you mean by the "spellcheck
> component".
> > >> Do you mean OR works best with spell checker?
> > >>
> > >> Steve
> > >>
> > >> On Wed, Jul 15, 2015 at 11:07 AM, Reitzel, Charles <
> > >> charles.reit...@tiaa-cref.org> wrote:
> > >>
> > >>> A common approach to this problem is to include the spellcheck
> > component
> > >>> and, if there are corrections, include a "Did you mean ..." link in
> the
> > >>> results page.
> > >>>
> > >>> -Original Message-
> > >>> From: Walter Underwood [mailto:wun...@wunderwood.org]
> > >>> Sent: Wednesday, July 15, 2015 10:36 AM
> > >>> To: solr-user@lucene.apache.org
> > >>> Subject: Re: Which default Boolean operator to set, AND or OR?
> > >>>
> > >>> The AND default has one big problem. If the user misspells a single
> > word,
> > >>> they get no results. About 10% of queries are misspelled, so that
> > means a
> > >>> lot more failures.
> > >>>
> > >>> wunder
> > >>> Walter Underwood
> > >>> wun...@wunderwood.org
> > >>> http://observer.wunderwood.org/  (my blog)
> > >>>
> > >>>
> > >>> On Jul 15, 2015, at 7:21 AM, Jack Krupansky <
> jack.krupan...@gmail.com>
> > >>> wrote:
> > >>>
> > >>> > It is simply precision (AND) vs. recall (OR) - the former tries to
> > >>> > limit the total result count, while the latter tries to focus on
> > >>> > relevancy of the top results even if the total result count is
> > higher.
> > >>> >
> > >>> > Recall is good for discovery and browsing, where you sort of know
> > what
> > >>> > you generally want, but not exactly with any great precision.
> > >>> >
> > >>> > Recall will include results that almost meet the query terms, but
> > >>> > maybe some are missing.
> > >>> >
> > >>> > Precision will guarantee and insist that all query terms are
> present.
> > >>> >
> > >>> > One great example for recall is a plagiarism query - enter all the
> > >>> > terms for a passage and then find documents that most closely
> > >>> > approximate the passage without being necessarily exact matches.
> IOW,
> > >>> > the plagiarizer changes a word here and there.
> > >>> >
> > >>> > -- Jack Krupansky
> > >>> >
> > >>> > On Wed, Jul 15, 2015 at 8:16 AM, Steven White <
> swhite4...@gmail.com>
> > >>> wrote:
> > >>> >
> > >>> >> Hi Everyone,
> > >>> >>
> > >>> >> Out-of-the box, Solr (Lucene?) is set to use OR as the default
> > >>> >> Boolean operator.  Can someone tell me the advantages /
> > disadvantages
> > >>> >> of using OR or AND as the default?
> > >>> >>
> > >>> >> I'm leaning toward AND as the default because the more words a
> user
> > >>> >> types, the narrower the result set should be.
> > >>> >>
> > >>> >> Thanks
> > >>> >>
> > >>> >> Steve
> > >>> >>
> > >>>
> > >>>
> > >>>
> > **

Re: Which default Boolean operator to set, AND or OR?

2015-07-15 Thread Erick Erickson
bq: now if a user types "apples oranges" (without quotes) will
the ranking be the same as when I had AND?

You haven't defined "same". But at root I think this is a red herring:
you haven't stated why you care. They're different queries, so I think
the question is really which is more or less satisfactory for the
use-case.


Best,
Erick

On Wed, Jul 15, 2015 at 10:06 AM, Alessandro Benedetti
 wrote:
> Talking about performances you should take a look to the difference in
> performance between :
>
>
> - disjunction of K sorted arrays ( n*k*log(k)) in Lucene - where *k* are
> the disjunction clauses and *n* the average posting list size (just learned
> today from an expert lucene committer))
>
> - conjunction of K sorted arrays - not 100 % sure about the complexity, i
> should check concretely the algorithm, but i suggest there is no
> difference, or not so much difference ( I would be glad someone here to
> show the resources, or knowledge) .
>
> basically when dealing with union or intersection of sorted arrays, the
> algorithm that solve the two problems are quite comparable in term of
> performances.
>
> I would say that the performance difference is irrelevant but i would like
> someone to contradict me .
>
> Cheers
>
>
>
> 2015-07-15 17:34 GMT+01:00 Steven White :
>
>> Hi Erick,
>>
>> I understand there are variables that will impact ranking.  However, if I
>> leave my edismax setting as is and simply switch from AND to OR as the
>> default Boolean, now if a user types "apples oranges" (without quotes) will
>> the ranking be the same as when I had AND?  Will the performance be the
>> same as when I had AND as the default?
>>
>> Thanks
>>
>> Steve
>>
>> On Wed, Jul 15, 2015 at 12:26 PM, Erick Erickson 
>> wrote:
>>
>> > This is really an apples/oranges comparison. They're essentially
>> different
>> > queries, and scores aren't comparable across different queries.
>> >
>> > If you're asking "if doc 1 and doc 2 are returned by defaulting to AND or
>> > OR,
>> > are they in the same position relative to each other?" then I'm pretty
>> > sure the
>> > answer is "you can't count on it". You'll match on different fields
>> > depending on
>> > what the default is, and with boosting you just don't know.
>> >
>> > Best,
>> > Erick
>> >
>> > On Wed, Jul 15, 2015 at 9:14 AM, Steven White 
>> > wrote:
>> > > By the way, using OR as the default, other than returning more results
>> as
>> > > more words are entered, the ranking and performance of the search
>> remains
>> > > the same right?
>> > >
>> > > Steve
>> > >
>> > > On Wed, Jul 15, 2015 at 12:12 PM, Steven White 
>> > wrote:
>> > >
>> > >> Thank you all.  Looks like OR is a better choice vs. AND.
>> > >>
>> > >> Charles: I don't understand what you mean by the "spellcheck
>> component".
>> > >> Do you mean OR works best with spell checker?
>> > >>
>> > >> Steve
>> > >>
>> > >> On Wed, Jul 15, 2015 at 11:07 AM, Reitzel, Charles <
>> > >> charles.reit...@tiaa-cref.org> wrote:
>> > >>
>> > >>> A common approach to this problem is to include the spellcheck
>> > component
>> > >>> and, if there are corrections, include a "Did you mean ..." link in
>> the
>> > >>> results page.
>> > >>>
>> > >>> -Original Message-
>> > >>> From: Walter Underwood [mailto:wun...@wunderwood.org]
>> > >>> Sent: Wednesday, July 15, 2015 10:36 AM
>> > >>> To: solr-user@lucene.apache.org
>> > >>> Subject: Re: Which default Boolean operator to set, AND or OR?
>> > >>>
>> > >>> The AND default has one big problem. If the user misspells a single
>> > word,
>> > >>> they get no results. About 10% of queries are misspelled, so that
>> > means a
>> > >>> lot more failures.
>> > >>>
>> > >>> wunder
>> > >>> Walter Underwood
>> > >>> wun...@wunderwood.org
>> > >>> http://observer.wunderwood.org/  (my blog)
>> > >>>
>> > >>>
>> > >>> On Jul 15, 2015, at 7:21 AM, Jack Krupansky <
>> jack.krupan...@gmail.com>
>> > >>> wrote:
>> > >>>
>> > >>> > It is simply precision (AND) vs. recall (OR) - the former tries to
>> > >>> > limit the total result count, while the latter tries to focus on
>> > >>> > relevancy of the top results even if the total result count is
>> > higher.
>> > >>> >
>> > >>> > Recall is good for discovery and browsing, where you sort of know
>> > what
>> > >>> > you generally want, but not exactly with any great precision.
>> > >>> >
>> > >>> > Recall will include results that almost meet the query terms, but
>> > >>> > maybe some are missing.
>> > >>> >
>> > >>> > Precision will guarantee and insist that all query terms are
>> present.
>> > >>> >
>> > >>> > One great example for recall is a plagiarism query - enter all the
>> > >>> > terms for a passage and then find documents that most closely
>> > >>> > approximate the passage without being necessarily exact matches.
>> IOW,
>> > >>> > the plagiarizer changes a word here and there.
>> > >>> >
>> > >>> > -- Jack Krupansky
>> > >>> >
>> > >>> > On Wed, Jul 15, 2015 at 8:16 AM, Steven White <
>> swhite4...@gmail.c

Re: IndexSearcher.search(query, collect)

2015-07-15 Thread Erick Erickson
bq:  does that preclude us from doing what we're trying to do above?

Not at all. You just have to process each response and combine them
perhaps.

In this case, you might be able to get away with just specifying the
shards parameter to the query and having the app layer deal with
the responses. At least that's what I'd start with, keeping in mind that
more complex processing may be necessary eventually.
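
e.g. something like this (hosts and core names are illustrative):

http://host1:8983/solr/collection1/select?q=foo&shards=host1:8983/solr/collection1,host2:8983/solr/collection1

or hit each core with &distrib=false and sum the per-shard responses in
your app.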

Best,
Erick

On Wed, Jul 15, 2015 at 10:00 AM, Chetan Vora  wrote:
> Mikhail
>
> We do add new nodes with our custom results in some cases... just curious-
>  does that preclude us from doing what we're trying to do above? FWIW, we
> can avoid the custom nodes if we had to.
>
> Chetan
>
> On Wed, Jul 15, 2015 at 12:39 PM, Mikhail Khludnev <
> mkhlud...@griddynamics.com> wrote:
>
>>
>>
>> On Wed, Jul 15, 2015 at 10:46 AM, Chetan Vora 
>> wrote:
>>
>>> Hi all
>>>
>>> I asked a related question before but couldn't get any response (see
>>> SolrQueryRequest in SolrCloud vs Standalone Solr), asking it differently
>>> here.
>>>
>>> Is there a way to invoke
>>>
>>> IndexSearcher.search(Query, Collector) over a SolrCloud collection so that
>>> in invokes the search/collect implicitly on individual shards of the
>>> collection? If not, how does one do this explicitly?
>>>
>>> I have a usecase that was implemented using custom request handler in
>>> standalone Solr and we're trying to move to SolrCloud.
>>
>>
>> In your  custom request handler do you add any new "nodes" into response?
>> or you just modifies the standard response structure?
>>
>> It is necessary for
>>> us to understand how to do the above so we can use SolrCloud
>>> functionality.
>>>
>>> Thanks and would *really really* appreciate ANY help.
>>>
>>> Regards
>>> CV
>>>
>>
>>
>>
>> --
>> Sincerely yours
>> Mikhail Khludnev
>> Principal Engineer,
>> Grid Dynamics
>>
>> 
>> 
>>


Re: Possible memory leak? Help!

2015-07-15 Thread Timothy Potter
What are your cache sizes? Max doc?

Also, what GC settings are you using? 6GB isn't all that much for a
memory-intensive app like Solr, esp. given the number of facet fields
you have. Lastly, are you using docvalues for your facet fields? That
should help reduce the amount of heap needed to compute facets.
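
For reference, that's a per-field schema change plus a reindex, e.g.
(field name here is just an example):

<field name="author_facet" type="string" indexed="true" stored="false" docValues="true"/>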

On Tue, Jul 14, 2015 at 2:33 PM, Yael Gurevich  wrote:
> Hi,
>
> We're running Solr 4.10.1 on Linux using Tomcat. Distributed environment,
> 40 virtual servers with high resources. Concurrent queries that are quite
> complex (may be hundreds of terms), NRT indexing and a few hundreds of
> facet fields which might have many (hundreds of thousands) distinct values.
>
> We've configured a 6GB JVM heap, and after quite a bit of work, it seems to
> be pretty well configured GC parameter-wise (we're using CMS and ParNew).
>
> The following problem occurs -
> Once every couple of hours, we suddenly start getting
> "concurrent-mode-failure" on one or more servers, the memory starts
> climbing up further and further and "concurrent-mode-failure" continues.
> Naturally, during this time, SOLR is unresponsive and the queries are
> timed-out. Eventually it might pass (GC will succeed), after 5-10 minutes.
> Sometimes this phenomenon can occur for a great deal of time, one server
> goes up and then another and so forth.
>
> Memory dumps point to ConcurrentLRUCache (used in filterCache and
> fieldValueCache). Mathematically speaking, the sizes I see in the dumps do
> not make sense. The configured sizes shouldn't take up more than a few
> hundreds of MBs.
>
> Any ideas? Anyone seen this kind of problem?


DIH Not Indexing Two Documents

2015-07-15 Thread Paden
Hello, 

I've ran into quite the snag and I'm wondering if anyone can help me out
here. So the situation. 
I am using the DataImportHandler to pull from a database and a Linux file
system. The database has the metadata. The file system the document text. I
thought it had indexed all the files I had in the file system just fine.
HOWEVER, when I was trying to filter out bad documents I realized there were
two documents that existed in the file system that the DIH was not indexing.
Well, I guess I shouldn't say that. When I run a faceted query with the
Authors facet enabled and the query as *:* to get all the results, it only
comes out with 279 of the 281 documents indexed. And at the bottom, when I
look at the authors of those two documents, it comes out as

   "Author #278",
0,
  "Author #279",
0

They have real names; those are just filler. So this got me thinking:

Debug idea number 1: the documents do not exist in the file system, or the
links are bad, and it's pulling the metadata information but not the document
text. But no: the links are right, they both work. The files exist and the
links are good.

Debug idea number 2: it isn't pulling the text, so I can't search it. Okay,
so I run a debug session with the DataImportHandler.

The debug run only indexes the first ten documents, which I assume is the
default, so let me know if I'm wrong.

And the strangest part: when I run a query on the debug import session, I
can search for my documents, and it includes them in the faceted search.
There are only 8 documents to search through (it throws out 2 because they
only exist as hard copies), and I can find them:

   "Author #278",
1,
  "Author #279",
1

What is going on? Cause I am so very confused. 



--
View this message in context: 
http://lucene.472066.n3.nabble.com/DIH-Not-Indexing-Two-Documents-tp4217546.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: DIH Not Indexing Two Documents

2015-07-15 Thread Paden
Those should be authors 280 and 281. Sorry.



--
View this message in context: 
http://lucene.472066.n3.nabble.com/DIH-Not-Indexing-Two-Documents-tp4217546p4217547.html
Sent from the Solr - User mailing list archive at Nabble.com.


Rerank queries and grouping

2015-07-15 Thread Diego Ceccarelli
Hi Everyone,

I need to use a RankQuery within a grouping [1].
I did some experiments with RerankQuery [2]  and solr 4.10.2 and it seems
that
if you group on a field, the reranking query is completely ignored
(on the cloud, and on a single instance).
I would expect to see the results in each group reranked using the
RerankQuery.

I had a look at the grouping code and documentation and,
if I correctly understood, the grouping works in two steps:

1) first the top groups are retrieved
2) top documents for each group in the top groups are retrieved.

I thought that the collector generated by a RankQuery could be injected
in 2), i.e., for each group set a rerank collector... but I'm not sure if
this solution
is feasible since the collectors are set in Lucene
(AbstractSecondPassGroupingCollector)
and a RankQuery is defined in Solr...

Any suggestion?

Thanks,
Diego

[1] https://cwiki.apache.org/confluence/display/solr/Result+Grouping
[2] https://cwiki.apache.org/confluence/display/solr/Query+Re-Ranking


Disable transaction log with SOLRCloud without replicas

2015-07-15 Thread SolrUser1543
From here:
https://cwiki.apache.org/confluence/display/solr/Read+and+Write+Side+Fault+Tolerance
we can learn that the transaction log is needed when replicas are used in
SolrCloud.

Do I need it if I am not using replicas?
Could it be disabled to improve performance?

What negative effects might there be in this case?





 



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Disable-transaction-log-with-SOLRCloud-without-replicas-tp4217554.html
Sent from the Solr - User mailing list archive at Nabble.com.


how to ignore unavailable shards

2015-07-15 Thread SolrUser1543
I have a handler configured in solrconfig.xml with shards.tolerant=true, which
means unavailable shards are ignored when returning results.

Sometimes shards are not really down, but are doing GC or a heavy commit.

Is it possible to ignore those as well, and how? I would prefer to get a
partial result instead of a timeout error.

I am using Solr 4.10 with many shards and intensive indexing.
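
For reference, this is roughly how such a handler is configured (a minimal
sketch; the handler name is illustrative):

  <requestHandler name="/select" class="solr.SearchHandler">
    <lst name="defaults">
      <!-- return partial results instead of failing when a shard is down -->
      <str name="shards.tolerant">true</str>
    </lst>
  </requestHandler>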



--
View this message in context: 
http://lucene.472066.n3.nabble.com/how-to-ignore-unavailable-shards-tp4217556.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: IndexSearcher.search(query, collect)

2015-07-15 Thread Mikhail Khludnev
On Wed, Jul 15, 2015 at 12:00 PM, Chetan Vora  wrote:

> Mikhail
>
> We do add new nodes with our custom results in some cases... just curious-
>  does that preclude us from doing what we're trying to do above? FWIW, we
> can avoid the custom nodes if we had to.
>
If your custom component doesn't modify the standard response structure, the
default components' logic does everything for you in SolrCloud (at least, you
need to remember the shards.df trick, if it is still necessary). Otherwise, you
need to implement the merging of shard results yourself, e.g. as in
https://github.com/apache/lucene-solr/blob/trunk/solr/core/src/java/org/apache/solr/handler/component/FacetComponent.java#L680



>
> Chetan
>
> On Wed, Jul 15, 2015 at 12:39 PM, Mikhail Khludnev <
> mkhlud...@griddynamics.com> wrote:
>
>>
>>
>> On Wed, Jul 15, 2015 at 10:46 AM, Chetan Vora 
>> wrote:
>>
>>> Hi all
>>>
>>> I asked a related question before but couldn't get any response (see
>>> SolrQueryRequest in SolrCloud vs Standalone Solr), asking it differently
>>> here.
>>>
>>> Is there a way to invoke
>>>
>>> IndexSearcher.search(Query, Collector) over a SolrCloud collection so
>>> that
>>> it invokes the search/collect implicitly on individual shards of the
>>> collection? If not, how does one do this explicitly?
>>>
>>> I have a usecase that was implemented using custom request handler in
>>> standalone Solr and we're trying to move to SolrCloud.
>>
>>
>> In your custom request handler, do you add any new "nodes" into the
>> response, or do you just modify the standard response structure?
>>
>> It is necessary for
>>> us to understand how to do the above so we can use SolrCloud
>>> functionality.
>>>
>>> Thanks and would *really really* appreciate ANY help.
>>>
>>> Regards
>>> CV
>>>
>>
>>
>>
>> --
>> Sincerely yours
>> Mikhail Khludnev
>> Principal Engineer,
>> Grid Dynamics
>>
>> 
>> 
>>
>
>


-- 
Sincerely yours
Mikhail Khludnev
Principal Engineer,
Grid Dynamics





Re: Disable transaction log with SOLRCloud without replicas

2015-07-15 Thread Erick Erickson
bq: Do I need it if I am not using replicas

Yes. The other function of transaction logs is
to recover documents indexed to segments
that haven't been closed in the event of
abnormal termination (i.e. somebody pulls
the plug).

Here's some info you might find useful:
https://lucidworks.com/blog/understanding-transaction-logs-softcommit-and-commit-in-sorlcloud/

If you don't care about that capability you
can shut tlogs off, though.
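
For reference, the transaction log is enabled by the <updateLog> element in
solrconfig.xml; assuming the stock configuration, disabling it is just a
matter of removing or commenting out that element:

  <updateHandler class="solr.DirectUpdateHandler2">
    <!-- remove or comment out this block to disable the tlog -->
    <updateLog>
      <str name="dir">${solr.ulog.dir:}</str>
    </updateLog>
  </updateHandler>

Keep in mind that features such as real-time get and SolrCloud peer sync
depend on the update log.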

Best,
Erick

On Wed, Jul 15, 2015 at 11:42 AM, SolrUser1543  wrote:
> From here:
> https://cwiki.apache.org/confluence/display/solr/Read+and+Write+Side+Fault+Tolerance
> we can learn that the transaction log is needed when replicas are used in
> SolrCloud.
>
> Do I need it if I am not using replicas?
> Could it be disabled to improve performance?
>
> What negative effects might there be in this case?
>
>
>
>
>
>
>
>
>
> --
> View this message in context: 
> http://lucene.472066.n3.nabble.com/Disable-transaction-log-with-SOLRCloud-without-replicas-tp4217554.html
> Sent from the Solr - User mailing list archive at Nabble.com.


Re: Migrating from solr cores to collections

2015-07-15 Thread tedsolr
After playing with SolrCloud I answered my own question: multiple collections
can live on the same node. Following the how-to in the solr-ref-guide was
confusing me.



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Migrating-from-solr-cores-to-collections-tp4217346p4217558.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: DIH Not Indexing Two Documents

2015-07-15 Thread Erick Erickson
My first guess is that somehow these two documents have
the same uniqueKey as some other documents, so later
docs are replacing earlier docs. Although not conclusive,
looking at the admin page for the cores in question may
show numDocs=278 and maxDoc=280 or some such, in
which case that would be what's happening. This is not
conclusive though, since segment merging may bring these
two numbers back to equality even after documents have been replaced.
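
For reference, besides the cores admin page, the two numbers can be checked
with the Luke request handler, e.g. (the core name is illustrative):

  http://localhost:8983/solr/collection1/admin/luke?numTerms=0

The response reports both numDocs and maxDoc; a gap between them means some
documents were deleted or replaced and not yet merged away.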

Best,
Erick

On Wed, Jul 15, 2015 at 11:19 AM, Paden  wrote:
> That should be author 280 and 281. Sorry
>
>
>
> --
> View this message in context: 
> http://lucene.472066.n3.nabble.com/DIH-Not-Indexing-Two-Documents-tp4217546p4217547.html
> Sent from the Solr - User mailing list archive at Nabble.com.


Re: Disable transaction log with SOLRCloud without replicas

2015-07-15 Thread Shawn Heisey
On 7/15/2015 12:42 PM, SolrUser1543 wrote:
> From here:
> https://cwiki.apache.org/confluence/display/solr/Read+and+Write+Side+Fault+Tolerance
> we can learn that the transaction log is needed when replicas are used in
> SolrCloud.
>
> Do I need it if I am not using replicas?
> Could it be disabled to improve performance?
>
> What negative effects might there be in this case?

Have you benchmarked performance with and without the log to see how
much faster it actually is?  I'm sure it's faster, but unless your
documents are particularly large, I doubt it's a LOT faster, and you may
lower your reliability.

Unless it's causing you problems, it's a good idea to have that feature
enabled.  You can keep transaction log size under control (with no
change in searcher functionality) by simply configuring autoCommit with
a short maxTime (five minutes or less) and openSearcher=false.
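
For illustration, such an autoCommit block in solrconfig.xml might look like
this (maxTime is in milliseconds; five minutes shown):

  <autoCommit>
    <maxTime>300000</maxTime>
    <openSearcher>false</openSearcher>
  </autoCommit>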

If you want the log disabled, you might want to also consider changing
your directoryFactory in solrconfig.xml from the default
NRTCachingDirectoryFactory to MMapDirectoryFactory, so you can be
absolutely sure that all commits are flushed to disk and not cached in RAM.
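
A sketch of that change, using the usual property-override pattern:

  <directoryFactory name="DirectoryFactory"
                    class="${solr.directoryFactory:solr.MMapDirectoryFactory}"/>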

Thanks,
Shawn



Re: DIH Not Indexing Two Documents

2015-07-15 Thread Paden
You were 100 percent right. I went back and checked the metadata looking for
multiple instances of the same file path. Both of the files had an extra set
of metadata with the same filepath. Thank you very much. 



--
View this message in context: 
http://lucene.472066.n3.nabble.com/DIH-Not-Indexing-Two-Documents-tp4217546p4217569.html
Sent from the Solr - User mailing list archive at Nabble.com.


RE: copying data from one collection to another collection (solr cloud 521)

2015-07-15 Thread Reitzel, Charles
Sorry in advance if I am beating a dead horse here ...

Here is an article by Mark Miller that gives some background and examples:
http://blog.cloudera.com/blog/2013/10/collection-aliasing-near-real-time-search-for-really-big-data/

In particular, see the section entitled "Update Alias".
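
For reference, an alias is created (or repointed) with a single Collections
API call, e.g. (the host and collection names follow the weekly example
quoted below):

  http://localhost:8983/solr/admin/collections?action=CREATEALIAS&name=current_stuff&collections=stuff_20150719

Calling CREATEALIAS again with the same name simply moves the alias, so the
weekly swap is atomic from a client's point of view.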

-Original Message-
From: Raja Pothuganti [mailto:rpothuga...@competitrack.com] 
Sent: Wednesday, July 15, 2015 12:07 PM
To: solr-user@lucene.apache.org
Subject: Re: copying data from one collection to another collection (solr cloud 
521)

Hi Charles,
Thank you for the response. We will be using aliasing, and are looking into
ways to avoid ingestion into each of the collections, as you mentioned: "For
example, would it be faster to make a file system copy of the most recent
collection ..."

MapReduceIndexerTool is not an option at this point.


One option is to back up each shard of the current_stuff collection at the end
of the week to a particular location (say, directory /opt/data/) and then:
1) empty/delete the existing documents in the previous_stuff_1 collection
2) restore each corresponding shard from /opt/data/ to the previous_stuff_1
collection using backup & restore, as suggested in
https://cwiki.apache.org/confluence/display/solr/Making+and+Restoring+Backups+of+SolrCores
(a sketch of the backup call is shown below)
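
For illustration, a per-core backup can be triggered through the replication
handler, e.g. (the core name and snapshot name are illustrative):

  http://localhost:8983/solr/current_stuff_shard1_replica1/replication?command=backup&location=/opt/data&name=weekly

The corresponding restore side is described on the wiki page referenced
above.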


Trying to find if there are any better ways than above option.

Thanks
Raja




On 7/15/15, 10:23 AM, "Reitzel, Charles" 
wrote:

>Since they want to explicitly search within a given "version" of the data,
>this seems like a textbook application for collection aliases.
>
>You could have N public collection names: current_stuff,
>previous_stuff_1, previous_stuff_2, ...   At any given time, these will
>be aliased to reference the "actual" collection names:
>   current_stuff -> stuff_20150712,
>   previous_stuff_1 -> stuff_20150705,
>   previous_stuff_2 -> stuff_20150628,
>   ...
>
>Every weekend, you create a new collection and index everything current 
>into it.  Once done, reset all the aliases to point to the newest N 
>collections and drop the oldest:
>   current_stuff -> stuff_20150719
>   previous_stuff_1 -> stuff_20150712,
>   previous_stuff_2 -> stuff_20150705,
>   ...
>
>Collections API: Create or modify an Alias for a Collection 
>https://cwiki.apache.org/confluence/display/solr/Collections+API#CollectionsAPI-api4
>
>Thus, you can keep the IDs the same and use them to compare to previous
>versions of any given document.   Useful, if only for debugging purposes.
>
>Curious if there are opportunities for optimization here.  For example, 
>would it be faster to make a file system copy of the most recent 
>collection and load only changed documents (assuming the delta is 
>available from the source system)?
>
>-Original Message-
>From: Erick Erickson [mailto:erickerick...@gmail.com]
>Sent: Monday, July 13, 2015 11:55 PM
>To: solr-user@lucene.apache.org
>Subject: Re: copying data from one collection to another collection 
>(solr cloud 521)
>
>bq: does offline
>
>No. I'm talking about "collection aliasing". You can create an entirely 
>new collection, index to it however  you want then switch to using that 
>new collection.
>
>bq: Any updates to EXISTING document in the LIVE collection should NOT 
>be replicated to the previous week(s) snapshot(s)
>
>then give it a new ID maybe?
>
>Best,
>Erick
>
>On Mon, Jul 13, 2015 at 3:21 PM, Raja Pothuganti 
> wrote:
>> Thank you Erick
>>>Actually, my question is why do it this way at all? Why not index 
>>>directly to your "live" nodes? This is what SolrCloud is built for.
>>>You an use "implicit" routing to create shards say, for each week and 
>>>age out the ones that are "too old" as well.
>>
>>
>> Any updates to EXISTING document in the LIVE collection should NOT be 
>> replicated to the previous week(s) snapshot(s). Think of the
>> snapshot(s) as an archive of sort and searchable independent of LIVE.
>> We're aiming to support at most 2 archives of data in the past.
>>
>>
>>>Another option would be to use "collection aliasing" to keep an 
>>>offline index up to date then switch over when necessary.
>>
>> Does offline indexing refers to this link
>> https://github.com/cloudera/search/tree/0d47ff79d6ccc0129ffadcb50f9fe0b271f102aa/search-mr
>>
>>
>> Thanks
>> Raja
>>
>>
>>
>> On 7/13/15, 3:14 PM, "Erick Erickson"  wrote:
>>
>>>Actually, my question is why do it this way at all? Why not index 
>>>directly to your "live" nodes? This is what SolrCloud is built for.
>>>
>>>There's the new backup/restore functionality that's still a work in 
>>>progress, see: https://issues.apache.org/jira/browse/SOLR-5750
>>>
>>>You can use "implicit" routing to create shards say, for each week and
>>>age out the ones that are "too old" as well.
>>>
>>>Another option would be to use "collection aliasing" to keep an 
>>>offline index up to date then switch over when necessary.
>>>
>>>I'd really like to know this isn't an XY problem though, what's the 
>>>high-level problem

Re: Querying Nested documents

2015-07-15 Thread Alessandro Benedetti
Thanks Mikhail, the post is really useful!
I will study it in detail.

A slight change in the syntax changes the parsed query.
Anyway, I just tried the q=(image_uri_s:somevalue) OR (-image_uri_s:*)
query approach again.

And actually it is working as expected:

q=(name:nome) OR (-name:*) ( give me all the documents containing a
specific name OR documents not containing the name at all )

response":{"numFound":3,"start":0,"docs":[
  {
"id":"999",
"name":"nome",
"_version_":150680568384128},
  {
"id":"99912",
"_version_":1506804469258518528},
  {
"id":"9992",
"_version_":1506805787028094976}]

in my simple example dataset.
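
For reference, on versions where pure-negative clauses are not rewritten
automatically, the same intent is usually expressed with an explicit *:*
guard, as the article above explains:

  q=(name:nome) OR (*:* -name:*)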

Anyway, I'm happy the user solved the problem!

Cheers



2015-07-15 17:38 GMT+01:00 Mikhail Khludnev :

> ok. I checked with with my data
>
> color:orlean  => "numFound": 1,
> -color:[* TO *] => "numFound": 602096 (it used to return 0 until 'pure
> negational' (sic) queries were delivered)
> color:orlean -color:[* TO *] => "numFound": 0,
> color:orlean (*:* -color:[* TO *])  => "numFound": 602097,
>
> fyi
> https://lucidworks.com/blog/why-not-and-or-and-not/
>
>
> On Wed, Jul 15, 2015 at 10:55 AM, Alessandro Benedetti <
> benedetti.ale...@gmail.com> wrote:
>
> > 2015-07-15 16:01 GMT+01:00 Mikhail Khludnev  >:
> >
> > > 1. I can't get your explanation.
> > >
> > > 2. childFilter=(image_uri_s:somevalue) OR (-image_uri_s:*)
> > > is not correct, lacks of quotes , and pointless (selecting some term,
> and
> > > negating all terms gives nothing).
> >
> >
> > Not considering the syntax,
> > We are talking about union of sets, not intersection.
> > Why this query should give nothing ?
> > Should return the union of all the children with "some value" in
> image_uri
> > and the set with no value at all in that field .
> >
> >
> > > Thus, considerable syntax can be only
> > > childFilter="other_field:somevalue -image_uri_s:*"
> > >
> >
> > I have to check, but probably you can answer me directly, is it not
> > possible to express disjunctions there ?
> >
> >
> > >
> > > 3.  I can only guess that you are asking about something like:
> > > http://localhost:8983/solr/demo/select?q={!parent
> > > which='type:parent'}image_uri_s:somevalue&fl=*,[child
> > > parentFilter=type:parent
> > > childFilter=-type:parent]&indent=true
> > >
> > >
> > > On Tue, Jul 14, 2015 at 11:56 PM, Ramesh Nuthalapati <
> > > ramesh.nuthalap...@gmail.com> wrote:
> > >
> > > > Yes you are right.
> > > >
> > > > So the query you are saying should be like below .. or did I
> > > misunderstood
> > > > it
> > > >
> > > > http://localhost:8983/solr/demo/select?q= {!parent
> > > > which='type:parent'}&fl=*,[child parentFilter=type:parent
> > > > childFilter=(image_uri_s:somevalue) OR (-image_uri_s:*)]&indent=true
> > > >
> > > > If so, I am getting an error with parsing field name.
> > > >
> > > >
> > > >
> > > > --
> > > > View this message in context:
> > > >
> > >
> >
> http://lucene.472066.n3.nabble.com/Querying-Nested-documents-tp4217169p4217348.html
> > > > Sent from the Solr - User mailing list archive at Nabble.com.
> > > >
> > >
> > >
> > >
> > > --
> > > Sincerely yours
> > > Mikhail Khludnev
> > > Principal Engineer,
> > > Grid Dynamics
> > >
> > > 
> > > 
> > >
> >
> >
> >
> > --
> > --
> >
> > Benedetti Alessandro
> > Visiting card - http://about.me/alessandro_benedetti
> > Blog - http://alexbenedetti.blogspot.co.uk
> >
> > "Tyger, tyger burning bright
> > In the forests of the night,
> > What immortal hand or eye
> > Could frame thy fearful symmetry?"
> >
> > William Blake - Songs of Experience -1794 England
> >
>
>
>
> --
> Sincerely yours
> Mikhail Khludnev
> Principal Engineer,
> Grid Dynamics
>
> 
> 
>



-- 
--

Benedetti Alessandro
Visiting card - http://about.me/alessandro_benedetti
Blog - http://alexbenedetti.blogspot.co.uk

"Tyger, tyger burning bright
In the forests of the night,
What immortal hand or eye
Could frame thy fearful symmetry?"

William Blake - Songs of Experience -1794 England


Re: Rerank queries and grouping

2015-07-15 Thread Joel Bernstein
As you've seen, RankQueries currently have no effect on grouping
queries.

A RankQuery can be combined with Collapse and Expand though. You may want
to review Collapse and Expand and see if it meets your use case.
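
For illustration, a minimal sketch of combining the two (the field names and
rerank parameters are hypothetical):

  q=some+query
  &rq={!rerank reRankQuery=$rqq reRankDocs=200 reRankWeight=3}
  &rqq=other+query
  &fq={!collapse field=group_field}
  &expand=true&expand.rows=5

The collapse filter keeps the head document of each group, the rerank query
reorders that collapsed result set, and the expand component then returns
the group members under each head.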

Joel Bernstein
http://joelsolr.blogspot.com/

On Wed, Jul 15, 2015 at 2:36 PM, Diego Ceccarelli <
diego.ceccare...@gmail.com> wrote:

> Hi Everyone,
>
> I need to use a RankQuery within a grouping [1].
> I did some experiments with RerankQuery [2]  and solr 4.10.2 and it seems
> that
> if you group on a field, the reranking query is completely ignored
> (on the cloud, and on a single instance).
> I would expect to see the results in each group reranked using the
> RerankQuery.
>
> I had a look at the grouping code and documentation and,
> if I correctly understood, the grouping works in two steps:
>
> 1) first the top groups are retrieved
> 2) top documents for each group in the top groups are retrieved.
>
> I thought that the collector generated by a RankQuery could be injected
> in 2), i.e., for each group set a rerank collector... but I'm not sure if
> this solution
> is feasible since the collectors are set in Lucene
> (AbstractSecondPassGroupingCollector)
> and a RankQuery is defined in Solr...
>
> Any suggestion?
>
> Thanks,
> Diego
>
> [1] https://cwiki.apache.org/confluence/display/solr/Result+Grouping
> [2] https://cwiki.apache.org/confluence/display/solr/Query+Re-Ranking
>


Request for Wiki edit rights

2015-07-15 Thread Dikshant Shahi
Hi,

Can you please provide me the privilege to edit Wiki pages.

My Wiki username is Dikshant.

Thanks,
Dikshant


Re: Request for Wiki edit rights

2015-07-15 Thread Erick Erickson
I added you to the Solr Wiki, if you need Lucene Wiki access let us know.

Erick

On Wed, Jul 15, 2015 at 7:59 PM, Dikshant Shahi  wrote:
> Hi,
>
> Can you please provide me the privilege to edit Wiki pages.
>
> My Wiki username is Dikshant.
>
> Thanks,
> Dikshant


Re: Request for Wiki edit rights

2015-07-15 Thread Dikshant Shahi
Thanks Erick! This is good for now.

On Thu, Jul 16, 2015 at 9:54 AM, Erick Erickson 
wrote:

> I added you to the Solr Wiki, if you need Lucene Wiki access let us know.
>
> Erick
>
> On Wed, Jul 15, 2015 at 7:59 PM, Dikshant Shahi 
> wrote:
> > Hi,
> >
> > Can you please provide me the privilege to edit Wiki pages.
> >
> > My Wiki username is Dikshant.
> >
> > Thanks,
> > Dikshant
>