RE: Interesting search question! How to match documents based on the least number of fields that match all query terms?

2014-01-23 Thread Franck Brisbart
Hi Daniel,

You can also consider using negative boosts.
A true negative boost can't be expressed directly in Solr, but you can get
the same effect by boosting the docs which don't match the metadata.

This might do what you want :
-metadata1:(term1 AND ... AND termN)^2
-metadata2:(term1 AND ... AND termN)^2
.
-metadataN:(term1 AND ... AND termN)^2
allMetadatas:(term1 AND ... AND termN)^0.5
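
For example, with edismax boost queries, something like this (the *:* trick
is the usual way to emulate a negative boost in Solr; the boost values here
are only a sketch):

bq=(*:* -metadata1:(term1 AND ... AND termN))^2
...
bq=(*:* -metadataN:(term1 AND ... AND termN))^2
q=allMetadatas:(term1 AND ... AND termN)

Each bq lifts the documents that do NOT match all terms in that metadata
field, which has the same effect as negatively boosting the ones that do.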


Franck Brisbart



On Wednesday, 22 January 2014 at 19:38 +, Petersen, Robert wrote:
> Hi Daniel,
> 
> How about trying something like this (you'll have to play with the boosts to
> tune it): search all the fields with all the terms using edismax with the
> minimum-should-match parameter, but require all terms to match in the
> allMetadatas field.
> https://wiki.apache.org/solr/ExtendedDisMax#mm_.28Minimum_.27Should.27_Match.29
> 
> Lucene query syntax below to give you the general idea, but this query would 
> require all terms to be in one of the metadata fields to get the boost.
> 
> metadata1:(term1 AND ... AND termN)^2
> metadata2:(term1 AND ... AND termN)^2
> .
> metadataN:(term1 AND ... AND termN)^2
> allMetadatas:(term1 AND ... AND termN)^0.5
> 
> That should do approximately what you want,
> Robi
> 
> -Original Message-
> From: Daniel Shane [mailto:sha...@lexum.com] 
> Sent: Tuesday, January 21, 2014 8:42 AM
> To: solr-user@lucene.apache.org
> Subject: Interesting search question! How to match documents based on the 
> least number of fields that match all query terms?
> 
> I have an interesting solr/lucene question and it's quite possible that some
> new features in solr might make this much easier than what I am about to try.
> If anyone has a clever idea on how to do this search, please let me know!
> 
> Basically, let's say that I have an index in which each document has a
> content field and several metadata fields.
> 
> Document Fields:
> 
> content
> metadata1
> metadata2
> .
> metadataN
> allMetadatas (all the terms indexed in metadata1...N are concatenated in this 
> field) 
> 
> Assuming that I am searching for documents that contain a certain number of
> terms (term1 to termN) in their metadata fields, I would like to build a
> search query that will return documents that satisfy these requirements:
> 
> a) All search terms must be present in a metadata field. This is quite easy, 
> we can simply search in the field allMetadatas and that will work fine.
> 
> b) Now for the hard part: we prefer documents in which the terms were found
> in the *least number of different fields*. So if one document contains all
> the search terms spread across 10 different fields, and another contains all
> the search terms in only 8 fields, we would like the latter to sort first.
> 
> My first idea was to index the terms in allMetadatas using payloads. Each
> indexed term would also carry the specific metadataN field from which it
> originates. Then I can write a scorer that scores based on these payloads.
> 
> However, if there is a way to do this without payloads I'm all ears!
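> 
> For reference, a rough sketch of the indexing side I have in mind, using
> the stock delimited-payload filter (untested, details still to be worked
> out):
> 
> <fieldType name="payloads" class="solr.TextField">
>   <analyzer>
>     <tokenizer class="solr.WhitespaceTokenizerFactory"/>
>     <filter class="solr.DelimitedPayloadTokenFilterFactory"
>             delimiter="|" encoder="identity"/>
>   </analyzer>
> </fieldType>
> 
> Each token in allMetadatas would then be fed as term|sourceField (e.g.
> "term1|metadata3"), so a custom scorer could count the distinct source
> fields per matching document.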
> 




Solr in non-persistent mode

2014-01-23 Thread Per Steffensen

Hi

In Solr 4.0.0 I used to be able to run with persistent=false (in 
solr.xml). I can see 
(https://cwiki.apache.org/confluence/display/solr/Format+of+solr.xml) 
that persistent is no longer supported in solr.xml. Does this mean that 
you cannot run in non-persistent mode any longer, or does it mean that I 
have to configure it somewhere else?


Thanks!

Regards, Per Steffensen


How to use Solr for two different projects on one server

2014-01-23 Thread Stavros Delisavas
Dear Solr-Experts,

I am using Solr for my current web-application on my server successfully.
Now I would like to use it in my second web-application that is hosted
on the same server. Is it possible in any way to create two independent
instances/databases in Solr? I know that I could create another set of
fields with alternate field names, but I would prefer to keep my field
naming independent across all my projects.

Also, I would like to have one state of my development version and one
state of my production version on my server, so that I can run tests
against my development state without interfering with my production version.
What is the best practice to achieve this, or how can this be done in
general?

I have searched Google but could not get any useful results because I
don't even know what terms to search for with Solr.
A minimal example would be most helpful.

Thanks a lot!

Stavros


Re: How to use Solr for two different projects on one server

2014-01-23 Thread Gora Mohanty
On 23 January 2014 14:06, Stavros Delisavas  wrote:
> Dear Solr-Experts,
>
> I am using Solr for my current web-application on my server successfully.
> Now I would like to use it in my second web-application that is hosted
> on the same server. Is it possible in any way to create two independent
> instances/databases in Solr? I know that I could create another set of
> fields with alternate field names, but I would prefer to keep my field
> naming independent across all my projects.
[...]

Use two cores: http://wiki.apache.org/solr/CoreAdmin
These are isolated from each other, and should serve your purpose.
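
For instance, a minimal legacy-style solr.xml declaring two cores could look
like this (the core names and paths are illustrative):

<solr persistent="true">
  <cores adminPath="/admin/cores">
    <core name="app1" instanceDir="app1" />
    <core name="app2" instanceDir="app2" />
  </cores>
</solr>

Each instanceDir then holds its own conf/schema.xml and conf/solrconfig.xml,
so the field names of the two applications never collide.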

Regards,
Gora


Re: How to use Solr for two different projects on one server

2014-01-23 Thread Alexandre Rafalovitch
If you are not worried about them stepping on each other's toes
(performance, disk space, etc), just create multiple collections.
There are examples of that in the standard distribution (e.g. the badly
named example/multicore).

Regards,
  Alex.
Personal website: http://www.outerthoughts.com/
LinkedIn: http://www.linkedin.com/in/alexandrerafalovitch
- Time is the quality of nature that keeps events from happening all
at once. Lately, it doesn't seem to be working.  (Anonymous  - via GTD
book)


On Thu, Jan 23, 2014 at 3:36 PM, Stavros Delisavas  wrote:
> Dear Solr-Experts,
>
> I am using Solr for my current web-application on my server successfully.
> Now I would like to use it in my second web-application that is hosted
> on the same server. Is it possible in any way to create two independent
> instances/databases in Solr? I know that I could create another set of
> fields with alternate field names, but I would prefer to keep my field
> naming independent across all my projects.
>
> Also I would like to be able to have one state of my development version
> and one state of my production version on my server so that I can do
> tests on my development-state without interference on my production-version.
> What is the best-practice to achieve this or how can this be done in
> general?
>
> I have searched Google but could not get any useful results because I
> don't even know what terms to search for with Solr.
> A minimal example would be most helpful.
>
> Thanks a lot!
>
> Stavros


Re: How to use Solr for two different projects on one server

2014-01-23 Thread Furkan KAMACI
Hi;

Firstly you should read here and learn the terminology of Solr:
http://wiki.apache.org/solr/SolrTerminology

Thanks;
Furkan KAMACI


2014/1/23 Alexandre Rafalovitch 

> If you are not worried about them stepping on each other's toes
> (performance, disk space, etc), just create multiple collections.
> There are examples of that in standard distribution (e.g. badly named
> example/multicore).
>
> Regards,
>   Alex.
> Personal website: http://www.outerthoughts.com/
> LinkedIn: http://www.linkedin.com/in/alexandrerafalovitch
> - Time is the quality of nature that keeps events from happening all
> at once. Lately, it doesn't seem to be working.  (Anonymous  - via GTD
> book)
>
>
> On Thu, Jan 23, 2014 at 3:36 PM, Stavros Delisavas 
> wrote:
> > Dear Solr-Experts,
> >
> > I am using Solr for my current web-application on my server successfully.
> > Now I would like to use it in my second web-application that is hosted
> > on the same server. Is it possible in any way to create two independent
> > instances/databases in Solr? I know that I could create another set of
> > fields with alternate field names, but I would prefer to keep my field
> > naming independent across all my projects.
> >
> > Also I would like to be able to have one state of my development version
> > and one state of my production version on my server so that I can do
> > tests on my development-state without interference on my
> production-version.
> > What is the best-practice to achieve this or how can this be done in
> > general?
> >
> > I have searched Google but could not get any useful results because I
> > don't even know what terms to search for with Solr.
> > A minimal example would be most helpful.
> >
> > Thanks a lot!
> >
> > Stavros
>


Re: How to use Solr for two different projects on one server

2014-01-23 Thread Stavros Delisavas
Thanks for the fast responses. Looks like exactly what I was looking for!




On 23.01.2014 09:46, Furkan KAMACI wrote:
> Hi;
>
> Firstly you should read here and learn the terminology of Solr:
> http://wiki.apache.org/solr/SolrTerminology
>
> Thanks;
> Furkan KAMACI
>
>
> 2014/1/23 Alexandre Rafalovitch 
>
>> If you are not worried about them stepping on each other's toes
>> (performance, disk space, etc), just create multiple collections.
>> There are examples of that in standard distribution (e.g. badly named
>> example/multicore).
>>
>> Regards,
>>   Alex.
>> Personal website: http://www.outerthoughts.com/
>> LinkedIn: http://www.linkedin.com/in/alexandrerafalovitch
>> - Time is the quality of nature that keeps events from happening all
>> at once. Lately, it doesn't seem to be working.  (Anonymous  - via GTD
>> book)
>>
>>
>> On Thu, Jan 23, 2014 at 3:36 PM, Stavros Delisavas 
>> wrote:
>>> Dear Solr-Experts,
>>>
>>> I am using Solr for my current web-application on my server successfully.
>>> Now I would like to use it in my second web-application that is hosted
>>> on the same server. Is it possible in any way to create two independent
>>> instances/databases in Solr? I know that I could create another set of
>>> fields with alternate field names, but I would prefer to keep my field
>>> naming independent across all my projects.
>>>
>>> Also I would like to be able to have one state of my development version
>>> and one state of my production version on my server so that I can do
>>> tests on my development-state without interference on my
>> production-version.
>>> What is the best-practice to achieve this or how can this be done in
>>> general?
>>>
>>> I have searched Google but could not get any useful results because I
>>> don't even know what terms to search for with Solr.
>>> A minimal example would be most helpful.
>>>
>>> Thanks a lot!
>>>
>>> Stavros



Re: How to use Solr for two different projects on one server

2014-01-23 Thread Alexandre Rafalovitch
Which is why it is curious that you did not find it. Looking back at
it now, do you have a suggestion of what could be improved to ensure
people find this easier in the future?

Regards,
   Alex.
Personal website: http://www.outerthoughts.com/
LinkedIn: http://www.linkedin.com/in/alexandrerafalovitch
- Time is the quality of nature that keeps events from happening all
at once. Lately, it doesn't seem to be working.  (Anonymous  - via GTD
book)


On Thu, Jan 23, 2014 at 3:49 PM, Stavros Delisavas  wrote:
> Thanks for the fast responses. Looks like exactly what I was looking for!
>
>
>
>
> On 23.01.2014 09:46, Furkan KAMACI wrote:
>> Hi;
>>
>> Firstly you should read here and learn the terminology of Solr:
>> http://wiki.apache.org/solr/SolrTerminology
>>
>> Thanks;
>> Furkan KAMACI
>>
>>
>> 2014/1/23 Alexandre Rafalovitch 
>>
>>> If you are not worried about them stepping on each other's toes
>>> (performance, disk space, etc), just create multiple collections.
>>> There are examples of that in standard distribution (e.g. badly named
>>> example/multicore).
>>>
>>> Regards,
>>>   Alex.
>>> Personal website: http://www.outerthoughts.com/
>>> LinkedIn: http://www.linkedin.com/in/alexandrerafalovitch
>>> - Time is the quality of nature that keeps events from happening all
>>> at once. Lately, it doesn't seem to be working.  (Anonymous  - via GTD
>>> book)
>>>
>>>
>>> On Thu, Jan 23, 2014 at 3:36 PM, Stavros Delisavas 
>>> wrote:
 Dear Solr-Experts,

 I am using Solr for my current web-application on my server successfully.
 Now I would like to use it in my second web-application that is hosted
 on the same server. Is it possible in any way to create two independent
 instances/databases in Solr? I know that I could create another set of
 fields with alternate field names, but I would prefer to keep my field
 naming independent across all my projects.

 Also I would like to be able to have one state of my development version
 and one state of my production version on my server so that I can do
 tests on my development-state without interference on my
>>> production-version.
 What is the best-practice to achieve this or how can this be done in
 general?

 I have searched Google but could not get any useful results because I
 don't even know what terms to search for with Solr.
 A minimal example would be most helpful.

 Thanks a lot!

 Stavros
>


Re: Solr middle-ware?

2014-01-23 Thread Furkan KAMACI
Hi;

I've written a Search API in front of my SolrCloud. When a user sends a
query it goes to my Search API (which uses SolrJ). The query is validated,
fixed, and filled with some default parameters that a user cannot change,
and after that the query goes to the SolrCloud.

It allows me to expose my index via JAX-RS, JAX-WS and the OpenSearch
formats (Atom, RSS, XHTML). Besides handling the security issues, I allow a
user to write queries like this within my API:

*title: title to search*
*url: url to search*

It is not a hard task to implement something like that. There is just one
thing to consider: the API should run fast, otherwise it becomes a bottleneck
in front of the SolrCloud. On the other hand, there are some security layers
that are provided by hardware in my architecture.
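
A minimal sketch of what such a layer can look like with SolrJ (the class,
the collection name and the default parameters here are illustrative, not my
actual code):

import java.net.MalformedURLException;

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.SolrServerException;
import org.apache.solr.client.solrj.impl.CloudSolrServer;
import org.apache.solr.client.solrj.response.QueryResponse;

public class SearchApi {
    private final CloudSolrServer solr;

    public SearchApi(String zkHost) throws MalformedURLException {
        solr = new CloudSolrServer(zkHost);
        solr.setDefaultCollection("collection1"); // assumed collection name
    }

    public QueryResponse search(String userQuery) throws SolrServerException {
        // Validate/escape userQuery here, before it ever reaches Solr.
        SolrQuery q = new SolrQuery(userQuery);
        // Defaults the user can not change:
        q.set("defType", "edismax");
        q.setRows(10);
        q.setFields("id", "title", "url");
        return solr.query(q);
    }
}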

Thanks;
Furkan KAMACI


2014/1/23 

> I've been thinking of using nodejs as a thin layer between the client and
> solr servers. It seems pretty handy for adding features like throttling,
> load balancing and basic authentication. -lianyi
>
> On Wed, Jan 22, 2014 at 7:36 PM, Alexandre Rafalovitch  >
> wrote:
>
> > I thought about Go, but that does not give the advantages of spanning
> > client and server like Dart and Node/Javascript. Which is why Dart
> > felt a bit more interesting, especially with tree-shaking of unused
> > code.
> > But then, neither language has enough adoption to be an answer to my
> > original question right now (existing middleware for new people to
> > pick). So, that's a more theoretical part of the discussion.
> > Regards,
> >Alex.
> > Personal website: http://www.outerthoughts.com/
> > LinkedIn: http://www.linkedin.com/in/alexandrerafalovitch
> > - Time is the quality of nature that keeps events from happening all
> > at once. Lately, it doesn't seem to be working.  (Anonymous  - via GTD
> > book)
> > On Thu, Jan 23, 2014 at 4:29 AM, Jorge Luis Betancourt González
> >  wrote:
> >> I would love to see some proxy-like application implemented in go
> (partly for my desire of having time to check out go).
> >>
> >> - Original Message -
> >> From: "Shawn Heisey" 
> >> To: solr-user@lucene.apache.org
> >> Sent: Wednesday, January 22, 2014 10:38:34 AM
> >> Subject: Re: Solr middle-ware?
> >>
> >> On 1/22/2014 12:25 AM, Raymond Wiker wrote:
> >>> Speaking for myself, I avoid using "client apis" like SolrNet, SolrJ
> and
> >>> FAST DSAPI for the simple reason that I feel that the abstractions they
> >>> offer are so thin that I may just as well talk directly to the HTTP
> >>> interface. Doing that also lets me build web applications that maintain
> >>> their own state, which makes for more responsive and more robust
> >>> applications (although I'm sure there will be differing opinions on
> this).
> >>
> >> If you have the programming skill, this is absolutely a great way to go.
> >>  It does require a lot of knowledge and expertise, though.
> >>
> >> If you want to hammer out a quick program and be reasonably sure it's
> >> right, a client API handles a lot of the hard stuff for you.  When
> >> something changes in a new version of Solr that breaks a client API,
> >> just upgrading the client API is often enough to make the same code work
> >> again.
> >>
> >> I love SolrJ.  It's part of Solr itself, used internally for SolrCloud,
> >> and probably replication too.  It's thoroughly tested with the Solr test
> >> suite, and if used correctly, it's pretty much guaranteed to be
> >> compatible with the same version of Solr.  In most cases, it will work
> >> with other versions too.
> >>
> >> Thanks,
> >> Shawn
> >>
> >>
> 
> >> III International Winter School at the UCI, February 17 to 28, 2014.
> See www.uci.cu
>


Re: How to use Solr for two different projects on one server

2014-01-23 Thread Stavros Delisavas
I didn't know that the "core" term is associated with this use case. I
expected it to be some technical feature that allows running more Solr
instances for better multithreaded CPU usage, for example activating
two Solr cores when two CPU cores are available on the server.

So in general, I have the feeling that the term "core" is somewhat
confusing for Solr beginners like me.



On 23.01.2014 09:54, Alexandre Rafalovitch wrote:
> Which is why it is curious that you did not find it. Looking back at
> it now, do you have a suggestion of what could be improved to ensure
> people find this easier in the future?
>
> Regards,
>Alex.
> Personal website: http://www.outerthoughts.com/
> LinkedIn: http://www.linkedin.com/in/alexandrerafalovitch
> - Time is the quality of nature that keeps events from happening all
> at once. Lately, it doesn't seem to be working.  (Anonymous  - via GTD
> book)
>
>
> On Thu, Jan 23, 2014 at 3:49 PM, Stavros Delisavas  
> wrote:
>> Thanks for the fast responses. Looks like exactly what I was looking for!
>>
>>
>>
>>
>> On 23.01.2014 09:46, Furkan KAMACI wrote:
>>> Hi;
>>>
>>> Firstly you should read here and learn the terminology of Solr:
>>> http://wiki.apache.org/solr/SolrTerminology
>>>
>>> Thanks;
>>> Furkan KAMACI
>>>
>>>
>>> 2014/1/23 Alexandre Rafalovitch 
>>>
 If you are not worried about them stepping on each other's toes
 (performance, disk space, etc), just create multiple collections.
 There are examples of that in standard distribution (e.g. badly named
 example/multicore).

 Regards,
   Alex.
 Personal website: http://www.outerthoughts.com/
 LinkedIn: http://www.linkedin.com/in/alexandrerafalovitch
 - Time is the quality of nature that keeps events from happening all
 at once. Lately, it doesn't seem to be working.  (Anonymous  - via GTD
 book)


 On Thu, Jan 23, 2014 at 3:36 PM, Stavros Delisavas 
 wrote:
> Dear Solr-Experts,
>
> I am using Solr for my current web-application on my server successfully.
> Now I would like to use it in my second web-application that is hosted
> on the same server. Is it possible in any way to create two independent
> instances/databases in Solr? I know that I could create another set of
> fields with alternated field names, but I would prefer to be independent
> on my field naming for all my projects.
>
> Also I would like to be able to have one state of my development version
> and one state of my production version on my server so that I can do
> tests on my development-state without interference on my
 production-version.
> What is the best-practice to achieve this or how can this be done in
> general?
>
> I have searched google but could not get any usefull results because I
> don't even know what terms to search for with solr.
> A minimal-example would be most helpfull.
>
> Thanks a lot!
>
> Stavros



Re: Solr/Lucene Faceted Search Too Many Unique Values?

2014-01-23 Thread Yago Riveiro
In my case I need to know the number of unique visitors and the number of
visits in a period of time.

I need to render the data in a table with pagination. To know the number of
unique elements (to calculate the total number of pages), the only way I
found was to return facets=-1.
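
For reference, the kind of request I mean (the field name and date range
here are just an example):

q=*:*&rows=0&fq=timestamp:[2014-01-01T00:00:00Z TO 2014-01-23T00:00:00Z]&facet=true&facet.field=visitor_id&facet.limit=-1

With facet.limit=-1 Solr returns every unique term of the field, so counting
the returned facet values gives the total needed for the pagination.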




/yago





—
/Yago Riveiro

On Thu, Jan 23, 2014 at 1:39 AM, Erick Erickson 
wrote:

> A legitimate question that only you can answer is
> "what's the value of faceting on fields with so many unique values?"
> Consider the ridiculous case of faceting on the <uniqueKey> field. There's
> almost exactly zero value in faceting on it, since all counts will be 1.
> By analogy, with millions of tag values, will there ever be more than a very
> small count for any facet value? And will showing those be useful to the
> user?
> They may be, and Yago has a use-case where the answer is "yes". Before
> trying to make Solr perform in this instance, though, I'd review the use-case
> to see if it makes sense.
> Erick
> On Wed, Jan 22, 2014 at 5:09 PM, Yago Riveiro  wrote:
>> You will need to use DocValues if you want to use facets with this amount of 
>> terms and not blow the heap.
>>
>> I have facets with ~39M unique terms; the response time is about 10 to 40
>> seconds, which in my case is not a problem.
>>
>> --
>> Yago Riveiro
>> Sent with Sparrow (http://www.sparrowmailapp.com/?sig)
>>
>>
>> On Wednesday, January 22, 2014 at 10:59 PM, Bing Hua wrote:
>>
>>> Hi,
>>>
>>> I am going to evaluate some Lucene/Solr capabilities for handling faceted
>>> queries, in particular with a single facet field that contains a large
>>> number (say up to 1 million) of distinct values. Does anyone have
>>> experience with how Lucene performs in this scenario?
>>>
>>> e.g.
>>> Doc1 has tags A B C D 
>>> Doc2 has tags B C D E 
>>> etc etc millions of docs and there can be millions of distinct tag values.
>>>
>>> Thanks
>>>
>>>
>>>
>>> --
>>> View this message in context: 
>>> http://lucene.472066.n3.nabble.com/Solr-Lucene-Faceted-Search-Too-Many-Unique-Values-tp4112860.html
>>> Sent from the Solr - User mailing list archive at Nabble.com 
>>> (http://Nabble.com).
>>>
>>>
>>
>>

Re: How to use Solr for two different projects on one server

2014-01-23 Thread Alexandre Rafalovitch
You are right on that one. Collection is the new term, which is why the
basic example is "collection1". Core is the physical representation, and it
gets a bit confusing at that level with shards and all that.
The documentation is in a transition.

Regards,
   Alex.
Personal website: http://www.outerthoughts.com/
LinkedIn: http://www.linkedin.com/in/alexandrerafalovitch
- Time is the quality of nature that keeps events from happening all
at once. Lately, it doesn't seem to be working.  (Anonymous  - via GTD
book)


On Thu, Jan 23, 2014 at 4:10 PM, Stavros Delisavas  wrote:
> I didn't know that the "core"-term is associated with this use case. I
> expected it to be some technical feature that allows to run more
> solr-instances for better multithread-cpu-usage. For example to activate
> two solr-cores when two cpu-cores are available on the server.
>
> So in general, I have the feeling that the term "core" is somewhat
> confusing for solr-beginners like me.
>
>
>
> On 23.01.2014 09:54, Alexandre Rafalovitch wrote:
>> Which is why it is curious that you did not find it. Looking back at
>> it now, do you have a suggestion of what could be improved to ensure
>> people find this easier in the future?
>>
>> Regards,
>>Alex.
>> Personal website: http://www.outerthoughts.com/
>> LinkedIn: http://www.linkedin.com/in/alexandrerafalovitch
>> - Time is the quality of nature that keeps events from happening all
>> at once. Lately, it doesn't seem to be working.  (Anonymous  - via GTD
>> book)
>>
>>
>> On Thu, Jan 23, 2014 at 3:49 PM, Stavros Delisavas  
>> wrote:
>>> Thanks for the fast responses. Looks like exactly what I was looking for!
>>>
>>>
>>>
>>>
 On 23.01.2014 09:46, Furkan KAMACI wrote:
 Hi;

 Firstly you should read here and learn the terminology of Solr:
 http://wiki.apache.org/solr/SolrTerminology

 Thanks;
 Furkan KAMACI


 2014/1/23 Alexandre Rafalovitch 

> If you are not worried about them stepping on each other's toes
> (performance, disk space, etc), just create multiple collections.
> There are examples of that in standard distribution (e.g. badly named
> example/multicore).
>
> Regards,
>   Alex.
> Personal website: http://www.outerthoughts.com/
> LinkedIn: http://www.linkedin.com/in/alexandrerafalovitch
> - Time is the quality of nature that keeps events from happening all
> at once. Lately, it doesn't seem to be working.  (Anonymous  - via GTD
> book)
>
>
> On Thu, Jan 23, 2014 at 3:36 PM, Stavros Delisavas 
> wrote:
>> Dear Solr-Experts,
>>
>> I am using Solr for my current web-application on my server successfully.
>> Now I would like to use it in my second web-application that is hosted
>> on the same server. Is it possible in any way to create two independent
>> instances/databases in Solr? I know that I could create another set of
>> fields with alternate field names, but I would prefer to keep my field
>> naming independent across all my projects.
>>
>> Also I would like to be able to have one state of my development version
>> and one state of my production version on my server so that I can do
>> tests on my development-state without interference on my
> production-version.
>> What is the best-practice to achieve this or how can this be done in
>> general?
>>
>> I have searched Google but could not get any useful results because I
>> don't even know what terms to search for with Solr.
>> A minimal example would be most helpful.
>>
>> Thanks a lot!
>>
>> Stavros
>


Re: How to use Solr for two different projects on one server

2014-01-23 Thread Toke Eskildsen
On Thu, 2014-01-23 at 09:36 +0100, Stavros Delisavas wrote:
> I am using Solr for my current web-application on my server successfully.
> Also I would like to be able to have one state of my development version
> and one state of my production version on my server so that I can do
> tests on my development-state without interference on my production-version.
> What is the best-practice to achieve this or how can this be done in
> general?

I highly recommend keeping development on a different machine than
production. Solr (and any other heavy application, really) taxes all the
different resources on the system - CPU, memory & IO. A run-amok
development Solr can easily influence a production Solr on the same
machine.

Running on the same machine is also prone to human errors: accidental
shutdown of the production server, sending test data into prod, etc.


If you really need to do it, run development under a different user than
prod and see if it is possible to block access to the ports used by prod
for the development user.
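
As a sketch of the port blocking on Linux, assuming prod listens on port
8080 and development runs as the user "solrdev" (both of those are made-up
values, adjust to your setup):

iptables -A OUTPUT -p tcp --dport 8080 -m owner --uid-owner solrdev -j REJECT

This rejects every connection the development user tries to open towards
the prod port, while prod itself is unaffected.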

- Toke Eskildsen, State and University Library, Denmark




Re: How to use Solr for two different projects on one server

2014-01-23 Thread Stavros Delisavas
So far, I successfully managed to create a core from my existing
configuration by opening this URL in my browser:

http://localhost:8080/solr/admin/cores?action=CREATE&name=glPrototypeCore&instanceDir=/etc/solr

New status from http://localhost:8080/solr/admin/cores?action=STATUS is:

<response>
  <lst name="responseHeader">
    <int name="status">0</int>
    <int name="QTime">4</int>
  </lst>
  <lst name="status">
    <lst name="">
      <str name="name"></str>
      <str name="instanceDir">/usr/share/solr/./</str>
      <str name="dataDir">/var/lib/solr/data/</str>
      <date name="startTime">2014-01-23T08:42:39.087Z</date>
      <long name="uptime">3056197</long>
      <lst name="index">
        <int name="numDocs">4401029</int>
        <int name="maxDoc">4401029</int>
        <long name="version">1370010628806</long>
        <int name="segmentCount">12</int>
        <bool name="current">true</bool>
        <bool name="hasDeletions">false</bool>
        <str name="directory">org.apache.lucene.store.MMapDirectory:org.apache.lucene.store.MMapDirectory@/var/lib/solr/data/index
lockFactory=org.apache.lucene.store.NativeFSLockFactory@77c58801</str>
        <date name="lastModified">2013-10-29T14:17:22Z</date>
      </lst>
    </lst>
    <lst name="glPrototypeCore">
      <str name="name">glPrototypeCore</str>
      <str name="instanceDir">/etc/solr/</str>
      <str name="dataDir">/var/lib/solr/data/</str>
      <date name="startTime">2014-01-23T09:29:30.019Z</date>
      <long name="uptime">245267</long>
      <lst name="index">
        <int name="numDocs">4401029</int>
        <int name="maxDoc">4401029</int>
        <long name="version">1370010628806</long>
        <int name="segmentCount">12</int>
        <bool name="current">true</bool>
        <bool name="hasDeletions">false</bool>
        <str name="directory">org.apache.lucene.store.MMapDirectory:org.apache.lucene.store.MMapDirectory@/var/lib/solr/data/index
lockFactory=org.apache.lucene.store.NativeFSLockFactory@5ad83862</str>
        <date name="lastModified">2013-10-29T14:17:22Z</date>
      </lst>
    </lst>
  </lst>
</response>

From my understanding I now have an unnamed core and a core named
"glPrototypeCore" which uses the same configuration.

I copied the files data-config.xml and schema.xml into a new directory
"/etc/solr/glinstance" and tried to create another core, but this always
throws an error 400. I even tried adding the schema and config parameters
with full paths, but this did not make any difference. Also, I don't
understand what the "dataDir" parameter is for. I could not find any data
directories in /etc/solr/, but the creation of the first core worked anyway.

Can someone help? Is there any better place for my new
instance-directory and what files do I really need?
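
For reference, the kind of CREATE call I have been trying (the paths here
are my guesses at what a second instance directory should look like):

http://localhost:8080/solr/admin/cores?action=CREATE&name=glinstance&instanceDir=/etc/solr/glinstance&config=solrconfig.xml&schema=schema.xml&dataDir=/var/lib/solr/glinstance/data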





On 23.01.2014 10:10, Stavros Delisavas wrote:
> I didn't know that the "core"-term is associated with this use case. I
> expected it to be some technical feature that allows to run more
> solr-instances for better multithread-cpu-usage. For example to activate
> two solr-cores when two cpu-cores are available on the server.
>
> So in general, I have the feeling that the term "core" is somewhat
> confusing for solr-beginners like me.
>
>
>
> On 23.01.2014 09:54, Alexandre Rafalovitch wrote:
>> Which is why it is curious that you did not find it. Looking back at
>> it now, do you have a suggestion of what could be improved to ensure
>> people find this easier in the future?
>>
>> Regards,
>>Alex.
>> Personal website: http://www.outerthoughts.com/
>> LinkedIn: http://www.linkedin.com/in/alexandrerafalovitch
>> - Time is the quality of nature that keeps events from happening all
>> at once. Lately, it doesn't seem to be working.  (Anonymous  - via GTD
>> book)
>>
>>
>> On Thu, Jan 23, 2014 at 3:49 PM, Stavros Delisavas  
>> wrote:
>>> Thanks for the fast responses. Looks like exactly what I was looking for!
>>>
>>>
>>>
>>>
 On 23.01.2014 09:46, Furkan KAMACI wrote:
 Hi;

 Firstly you should read here and learn the terminology of Solr:
 http://wiki.apache.org/solr/SolrTerminology

 Thanks;
 Furkan KAMACI


 2014/1/23 Alexandre Rafalovitch 

> If you are not worried about them stepping on each other's toes
> (performance, disk space, etc), just create multiple collections.
> There are examples of that in standard distribution (e.g. badly named
> example/multicore).
>
> Regards,
>   Alex.
> Personal website: http://www.outerthoughts.com/
> LinkedIn: http://www.linkedin.com/in/alexandrerafalovitch
> - Time is the quality of nature that keeps events from happening all
> at once. Lately, it doesn't seem to be working.  (Anonymous  - via GTD
> book)
>
>
> On Thu, Jan 23, 2014 at 3:36 PM, Stavros Delisavas 
> wrote:
>> Dear Solr-Experts,
>>
>> I am using Solr for my current web-application on my server successfully.
>> Now I would like to use it in my second web-application that is hosted
>> on the same server. Is it possible in any way to create two independent
>> instances/databases in Solr? I know that I could create another set of
>> fields with alternate field names, but I would prefer to keep my field
>> naming independent across all my projects.
>>
>> Also I would like to be able to have one state of my development version
>> and one state of my production version on my server so that I can do
>> tests on my development-state without interference on my
> production-version.
>> What is the best-practice to achieve this or how can this be done in
>> general?
>>
>> I have searched Google but could not get any useful results because I
>> don't even know what terms to search for with Solr.
>> A minimal example would be most helpful.
>>
>> Thanks a lot!
>>
>> Stavros



Re: Possible regression for Solr 4.6.0 - commitWithin does not work with replicas

2014-01-23 Thread Varun Thacker
Hi Elodie,

Thanks for pointing it out. I have created a Jira for this (
https://issues.apache.org/jira/browse/SOLR-5658 )

You can track its progress there.



On Wed, Dec 11, 2013 at 3:11 PM, Elodie Sannier wrote:

> Hello,
>
> I am using SolrCloud 4.6.0 with two shards, two replicas per shard, and
> with two collections.
>
> collection fr_blue:
> - shard1 -> server-01 (replica1), server-01 (replica2)
> - shard2 -> server-02 (replica1), server-02 (replica2)
>
> collection fr_green:
> - shard1 -> server-01 (replica1), server-01 (replica2)
> - shard2 -> server-02 (replica1), server-02 (replica2)
>
> I add documents using the SolrJ CloudSolrServer and the commitWithin
> feature:
> int commitWithinMs = 3;
> SolrServer server = new CloudSolrServer(zkHost);
> server.add(doc, commitWithinMs);
>
> When I query an instance, for 5 indexed documents, the numFound value
> changes on each call, randomly 0, 1, 4 or 5.
> When I query the instances with distrib=false, I have:
> - leader shard1: numFound=1
> - leader shard2: numFound=4
> - replica shard1: numFound=0
> - replica shard2: numFound=0
>
> The documents are not committed in the replicas, even after waiting more
> than 30 seconds.
>
> If I force a commit using http://server-01:8080/solr/update/?commit=true,
> the documents are committed in the replicas and numFound=5.
> I suppose that the leader forwards the documents to the replica, but they
> are not committed.
>
> Is this a new bug with the commitWithin feature in distributed mode?
>
> This problem does not occur with the version 4.5.1.
>
> Elodie Sannier
>
>
> Kelkoo SAS
> Société par Actions Simplifiée (simplified joint-stock company)
> Share capital: € 4,168,964.30
> Registered office: 8, rue du Sentier 75002 Paris
> 425 093 069 RCS Paris
>
> This message and its attachments are confidential and intended exclusively
> for their addressees. If you are not the intended recipient of this
> message, please destroy it and notify the sender.
>



-- 


Regards,
Varun Thacker
http://www.vthacker.in/


RE: AIOOBException on trunk since 21st or 22nd build

2014-01-23 Thread Markus Jelsma
Yeah, I can now also reproduce the problem with a build of the 20th! Again
the same nodes, leader and replica. The problem seems to be in the data we're
sending to Solr. I'll check it out and file an issue.
Cheers

-Original message-
> From:Mark Miller 
> Sent: Wednesday 22nd January 2014 18:56
> To: solr-user 
> Subject: Re: AIOOBException on trunk since 21st or 22nd build
> 
> Looking at the list of changes on the 21st and 22nd, I don’t see a smoking 
> gun.
> 
> - Mark  
> 
> 
> 
> On Jan 22, 2014, 11:13:26 AM, Markus Jelsma  
> wrote: Hi - this likely belongs to an existing open issue. We're seeing the
> stuff below on a build of the 22nd. Until just now we used builds of the 20th
> and didn't have the issue. Is this a bug, or did some data format in
> ZooKeeper change? So far only two cores of the same shard throw the
> error; all other nodes in the cluster are clean.
> 
> 2014-01-22 15:32:48,826 ERROR [solr.core.SolrCore] - [http-8080-exec-5] - : 
> java.lang.ArrayIndexOutOfBoundsException: 1
> at 
> org.apache.solr.common.cloud.CompositeIdRouter$KeyParser.getHash(CompositeIdRouter.java:291)
> at 
> org.apache.solr.common.cloud.CompositeIdRouter.sliceHash(CompositeIdRouter.java:58)
> at 
> org.apache.solr.common.cloud.HashBasedRouter.getTargetSlice(HashBasedRouter.java:33)
> at 
> org.apache.solr.update.processor.DistributedUpdateProcessor.setupRequest(DistributedUpdateProcessor.java:218)
> at 
> org.apache.solr.update.processor.DistributedUpdateProcessor.processDelete(DistributedUpdateProcessor.java:961)
> at 
> org.apache.solr.update.processor.UpdateRequestProcessor.processDelete(UpdateRequestProcessor.java:55)
> at org.apache.solr.handler.loader.XMLLoader.processDelete(XMLLoader.java:347)
> at org.apache.solr.handler.loader.XMLLoader.processUpdate(XMLLoader.java:278)
> at org.apache.solr.handler.loader.XMLLoader.load(XMLLoader.java:174)
> at 
> org.apache.solr.handler.UpdateRequestHandler$1.load(UpdateRequestHandler.java:92)
> at 
> org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:74)
> at 
> org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:135)
> at org.apache.solr.core.SolrCore.execute(SolrCore.java:1915)
> at 
> org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:785)
> at 
> org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:418)
> at 
> org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:203)
> at 
> org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:235)
> at 
> org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206)
> at 
> org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:233)
> at 
> org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:191)
> at 
> org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:127)
> at 
> org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:102)
> at 
> org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:109)
> at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:293)
> at 
> org.apache.coyote.http11.Http11NioProcessor.process(Http11NioProcessor.java:889)
> at 
> org.apache.coyote.http11.Http11NioProtocol$Http11ConnectionHandler.process(Http11NioProtocol.java:744)
> at 
> org.apache.tomcat.util.net.NioEndpoint$SocketProcessor.run(NioEndpoint.java:2282)
> at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
> at java.lang.Thread.run(Thread.java:724)
> 


Re: How to use Solr for two different projects on one server

2014-01-23 Thread Alexandre Rafalovitch
You need a config-dir level schema.xml and solrconfig.xml. For multiple
collections, you also need a top-level solr.xml. And unless the config
files have a lot of references to other files, you need nothing else.

For examples, check the example directory in the distribution. Or have
a look at the examples from my book:
https://github.com/arafalov/solr-indexing-book/tree/master/published .
This shows a solr.xml that points at a lot of collections. The first,
nearly minimal collection is collection1, but you can then explore the
others for various degrees of complexity.
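
As a sketch, a two-collection layout can be as small as this (the names are
illustrative):

solr.xml
app1/conf/schema.xml
app1/conf/solrconfig.xml
app2/conf/schema.xml
app2/conf/solrconfig.xml

with the top-level solr.xml listing app1 and app2 as the collections.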

Regards,
   Alex.
Personal website: http://www.outerthoughts.com/
LinkedIn: http://www.linkedin.com/in/alexandrerafalovitch
- Time is the quality of nature that keeps events from happening all
at once. Lately, it doesn't seem to be working.  (Anonymous  - via GTD
book)


On Thu, Jan 23, 2014 at 4:55 PM, Stavros Delisavas  wrote:
> So far, I successfully managed to create a core from my existing
> configuration by opening this URL in my browser:
>
> http://localhost:8080/solr/admin/cores?action=CREATE&name=glPrototypeCore&instanceDir=/etc/solr
>
> New status from http://localhost:8080/solr/admin/cores?action=STATUS is:
>
> <response>
>   <lst name="responseHeader">
>     <int name="status">0</int>
>     <int name="QTime">4</int>
>   </lst>
>   <lst name="status">
>     <lst name="">
>       <str name="name"></str>
>       <str name="instanceDir">/usr/share/solr/./</str>
>       <str name="dataDir">/var/lib/solr/data/</str>
>       <date name="startTime">2014-01-23T08:42:39.087Z</date>
>       <long name="uptime">3056197</long>
>       <lst name="index">
>         <int name="numDocs">4401029</int>
>         <int name="maxDoc">4401029</int>
>         <long name="version">1370010628806</long>
>         <int name="segmentCount">12</int>
>         <bool name="current">true</bool>
>         <bool name="hasDeletions">false</bool>
>         <str name="directory">org.apache.lucene.store.MMapDirectory:org.apache.lucene.store.MMapDirectory@/var/lib/solr/data/index
> lockFactory=org.apache.lucene.store.NativeFSLockFactory@77c58801</str>
>         <date name="lastModified">2013-10-29T14:17:22Z</date>
>       </lst>
>     </lst>
>     <lst name="glPrototypeCore">
>       <str name="name">glPrototypeCore</str>
>       <str name="instanceDir">/etc/solr/</str>
>       <str name="dataDir">/var/lib/solr/data/</str>
>       <date name="startTime">2014-01-23T09:29:30.019Z</date>
>       <long name="uptime">245267</long>
>       <lst name="index">
>         <int name="numDocs">4401029</int>
>         <int name="maxDoc">4401029</int>
>         <long name="version">1370010628806</long>
>         <int name="segmentCount">12</int>
>         <bool name="current">true</bool>
>         <bool name="hasDeletions">false</bool>
>         <str name="directory">org.apache.lucene.store.MMapDirectory:org.apache.lucene.store.MMapDirectory@/var/lib/solr/data/index
> lockFactory=org.apache.lucene.store.NativeFSLockFactory@5ad83862</str>
>         <date name="lastModified">2013-10-29T14:17:22Z</date>
>       </lst>
>     </lst>
>   </lst>
> </response>
>
>
> From my understanding I now have an unnamed core and a core named
> "glPrototypeCore" which uses the same configuration.
>
> I copied the files data-config.xml, schema.xml into a new directory
> "/etc/solr/glinstance" and tried to create another core but this always
> throws me error 400. I even tried by adding the schema- and
> config-parameters with full path, but this did not lead to any
> difference. Also I don't understand what the "dataDir"-parameter is for.
> I could not find any data-directories in /etc/solr/ but the creation of
> the first core worked anyway.
>
> Can someone help? Is there any better place for my new
> instance-directory and what files do I really need?
>
>
>
>
>
>> On 23.01.2014 10:10, Stavros Delisavas wrote:
>> I didn't know that the "core"-term is associated with this use case. I
>> expected it to be some technical feature that allows to run more
>> solr-instances for better multithread-cpu-usage. For example to activate
>> two solr-cores when two cpu-cores are available on the server.
>>
>> So in general, I have the feeling that the term "core" is somewhat
>> confusing for solr-beginners like me.
>>
>>
>>
>>> On 23.01.2014 09:54, Alexandre Rafalovitch wrote:
>>> Which is why it is curious that you did not find it. Looking back at
>>> it now, do you have a suggestion of what could be improved to ensure
>>> people find this easier in the future?
>>>
>>> Regards,
>>>Alex.
>>> Personal website: http://www.outerthoughts.com/
>>> LinkedIn: http://www.linkedin.com/in/alexandrerafalovitch
>>> - Time is the quality of nature that keeps events from happening all
>>> at once. Lately, it doesn't seem to be working.  (Anonymous  - via GTD
>>> book)
>>>
>>>
>>> On Thu, Jan 23, 2014 at 3:49 PM, Stavros Delisavas  
>>> wrote:
 Thanks for the fast responses. Looks like exactly what I was looking for!




 On 23.01.2014 09:46, Furkan KAMACI wrote:
> Hi;
>
> Firstly you should read here and learn the terminology of Solr:
> http://wiki.apache.org/solr/SolrTerminology
>
> Thanks;
> Furkan KAMACI
>
>
> 2014/1/23 Alexandre Rafalovitch 
>
>> If you are not worried about them stepping on each other's toes
>> (performance, disk space, etc), just create multiple collections.
>> There are examples of that in standard distribution (e.g. badly named
>> example/multicore).
>>
>> Regards,
>>   Alex.
>> Personal website: http://www.outerthoughts.com/
>> LinkedIn: http://www.linkedin.com/in/alexandrerafalovitch
>> - Time is the quality of nature that keeps events from happening all
>> at once. Lately, it doesn't seem to be working.  (Anonymous  - via GTD
>> book)
>>
>>
>> On Thu, Jan 23, 2014 at 3:36 PM, Stavros Delisavas 
>> wrote:
>>> Dear Solr-Experts,
>>>
>>> I am using Solr for my current web-application on my server 
>>> successfully.
>>> Now I would like to use it in my second web-application that is hosted
>>> on the same server. Is it possible in a

Re: How to use Solr for two different projects on one server

2014-01-23 Thread Stavros Delisavas
Thanks a lot,
those are great examples. I managed to get my cores working. What I
noticed so far is that the first (auto-created) core is symlinking files
to /etc/solr/...  or to /var/lib/solr/...

I am now not sure where my self-made collections should be. Shall I
create folders in /usr/share/solr/ and symlink to my
files in /etc/solr, or can I have hard copies in my collection folders?
Is /usr/share/solr/ a good place for my collection folders at all?



On 23.01.2014 12:16, Alexandre Rafalovitch wrote:
> You need config-dir level schema.xml, and solrconfig.xml. For multiple
> collections, you also need a top-level solr.xml. And unless the config
> files have a lot of references to other files, you need nothing else.
>
> For examples, check the example directory in the distribution. Or have
> a look at examples from my book:
> https://github.com/arafalov/solr-indexing-book/tree/master/published .
> This shows the solr.xml that points at a lot of collections. The first
> nearly minimal collection is collection1, but you can then explore
> others for various degree of complexity.
>
> Regards,
>Alex.
> Personal website: http://www.outerthoughts.com/
> LinkedIn: http://www.linkedin.com/in/alexandrerafalovitch
> - Time is the quality of nature that keeps events from happening all
> at once. Lately, it doesn't seem to be working.  (Anonymous  - via GTD
> book)
>
>
> On Thu, Jan 23, 2014 at 4:55 PM, Stavros Delisavas  
> wrote:
>> So far, I successfully managed to create a core from my existing
>> configuration by opening this URL in my browser:
>>
>> http://localhost:8080/solr/admin/cores?action=CREATE&name=glPrototypeCore&instanceDir=/etc/solr
>>
>> New status from http://localhost:8080/solr/admin/cores?action=STATUS is:
>>
>> <response>
>>   <lst name="responseHeader">
>>     <int name="status">0</int>
>>     <int name="QTime">4</int>
>>   </lst>
>>   <lst name="status">
>>     <lst name="">
>>       <str name="name"></str>
>>       <str name="instanceDir">/usr/share/solr/./</str>
>>       <str name="dataDir">/var/lib/solr/data/</str>
>>       <date name="startTime">2014-01-23T08:42:39.087Z</date>
>>       <long name="uptime">3056197</long>
>>       <lst name="index">
>>         <int name="numDocs">4401029</int>
>>         <int name="maxDoc">4401029</int>
>>         <long name="version">1370010628806</long>
>>         <int name="segmentCount">12</int>
>>         <bool name="current">true</bool>
>>         <bool name="hasDeletions">false</bool>
>>         <str name="directory">org.apache.lucene.store.MMapDirectory:org.apache.lucene.store.MMapDirectory@/var/lib/solr/data/index
>> lockFactory=org.apache.lucene.store.NativeFSLockFactory@77c58801</str>
>>         <date name="lastModified">2013-10-29T14:17:22Z</date>
>>       </lst>
>>     </lst>
>>     <lst name="glPrototypeCore">
>>       <str name="name">glPrototypeCore</str>
>>       <str name="instanceDir">/etc/solr/</str>
>>       <str name="dataDir">/var/lib/solr/data/</str>
>>       <date name="startTime">2014-01-23T09:29:30.019Z</date>
>>       <long name="uptime">245267</long>
>>       <lst name="index">
>>         <int name="numDocs">4401029</int>
>>         <int name="maxDoc">4401029</int>
>>         <long name="version">1370010628806</long>
>>         <int name="segmentCount">12</int>
>>         <bool name="current">true</bool>
>>         <bool name="hasDeletions">false</bool>
>>         <str name="directory">org.apache.lucene.store.MMapDirectory:org.apache.lucene.store.MMapDirectory@/var/lib/solr/data/index
>> lockFactory=org.apache.lucene.store.NativeFSLockFactory@5ad83862</str>
>>         <date name="lastModified">2013-10-29T14:17:22Z</date>
>>       </lst>
>>     </lst>
>>   </lst>
>> </response>
>>
>>
>> From my understanding I now have an unnamed core and a core named
>> "glPrototypeCore" which uses the same configuration.
>>
>> I copied the files data-config.xml, schema.xml into a new directory
>> "/etc/solr/glinstance" and tried to create another core but this always
>> throws me error 400. I even tried by adding the schema- and
>> config-parameters with full path, but this did not lead to any
>> difference. Also I don't understand what the "dataDir"-parameter is for.
>> I could not find any data-directories in /etc/solr/ but the creation of
>> the first core worked anyway.
>>
>> Can someone help? Is there any better place for my new
>> instance-directory and what files do I really need?
>>
>>
>>
>>
>>
>> On 23.01.2014 10:10, Stavros Delisavas wrote:
>>> I didn't know that the "core"-term is associated with this use case. I
>>> expected it to be some technical feature that allows to run more
>>> solr-instances for better multithread-cpu-usage. For example to activate
>>> two solr-cores when two cpu-cores are available on the server.
>>>
>>> So in general, I have the feeling that the term "core" is somewhat
>>> confusing for solr-beginners like me.
>>>
>>>
>>>
>>> On 23.01.2014 09:54, Alexandre Rafalovitch wrote:
 Which is why it is curious that you did not find it. Looking back at
 it now, do you have a suggestion of what could be improved to ensure
 people find this easier in the future?

 Regards,
Alex.
 Personal website: http://www.outerthoughts.com/
 LinkedIn: http://www.linkedin.com/in/alexandrerafalovitch
 - Time is the quality of nature that keeps events from happening all
 at once. Lately, it doesn't seem to be working.  (Anonymous  - via GTD
 book)


 On Thu, Jan 23, 2014 at 3:49 PM, Stavros Delisavas  
 wrote:
> Thanks for the fast responses. Looks like exactly what I was looking for!
>
>
>
>
> On 23.01.2014 09:46, Furkan KAMACI wrote:
>> Hi;
>>
>> Firstly you should read here and learn the terminology of Solr:
>> http://wiki.apache.org/solr/SolrTerminology
>>
>> Thanks;
>> Furkan KAMACI
>>
>>
>> 2014/1/23 Alexandre Rafalovitch 
>>
>>> If you are not worried about them stepping on each other's toes
>>> (performance, disk space, etc), just create multiple collections.
>>> There are examples of that in standard distribution (e.g. badly named
>>> example/

how to write an efficient query with a subquery to restrict the search space?

2014-01-23 Thread svante karlsson
I have a Solr db containing 1 billion records that I'm trying to use in a
NoSQL fashion.

What I want to do is find the best matches using all search terms, but
restrict the search space to the most unique terms.

In this example I know that val2 and val4 are rare terms and val1 and val3
are more common. In my real scenario I'll have 20 fields that I want to
include or exclude in the inner query depending on the uniqueness of the
requested value.


My first approach was:
q=field1:val1 OR field2:val2 OR field3:val3 OR field4:val4 AND (field2:val2
OR field4:val4)&rows=100&fl=*

but what I think I get (since AND binds tighter than OR) is
field4:val4 AND (field2:val2 OR field4:val4), and this result is then
OR'ed with the rest.

If I write
q=(field1:val1 OR field2:val2 OR field3:val3 OR field4:val4) AND
(field2:val2 OR field4:val4)&rows=100&fl=*

then what I think I get is two sub-queries that are evaluated separately and
then joined - performance-wise this is bad.

What's the best way to write these types of queries?
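
One variant I am considering is a filter query, which as far as I understand
restricts the candidate set without touching the scoring of the main query:

q=field1:val1 OR field2:val2 OR field3:val3 OR field4:val4&fq=field2:val2 OR field4:val4&rows=100&fl=*

Is that the idiomatic form for this?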


Are there any performance issues when running it on several SolrCloud nodes
vs a single instance, or should it scale?



/svante


Re: How to use Solr for two different projects on one server

2014-01-23 Thread Alexandre Rafalovitch
You are not doing this on a downloaded distribution, are you? You are
using a Bitnami stack or something; that's why you are not seeing the
examples folder, etc.

I recommend stepping back: use a downloaded distribution and do your
learning and setup using that. Then go and see where your production
stack put the various bits of Solr. Otherwise, you are doing two (15?)
things at once.

Regards,
   Alex.
P.s. If you like the examples, the book actually explains what they
do. You could be a quarter of the way to mastery in less than 24 hours...
Personal website: http://www.outerthoughts.com/
LinkedIn: http://www.linkedin.com/in/alexandrerafalovitch
- Time is the quality of nature that keeps events from happening all
at once. Lately, it doesn't seem to be working.  (Anonymous  - via GTD
book)


On Thu, Jan 23, 2014 at 6:38 PM, Stavros Delisavas  wrote:
> Thanks a lot,
> those are great examples. I managed to get my cores working. What I
> noticed so far is that the first (auto-created) core is symlinking files
> to /etc/solr/...  or to /var/lib/solr/...
>
> I now am not sure where my self made-collections should be. Shall I
> create folders in /usr/share/solr/ and symlink to my
> files in /etc/solr or can I have hard-copies in my collection-folders?
> Is /usr/share/solr/ a good place for my collection-folders at all?
>
>
>
> On 23.01.2014 12:16, Alexandre Rafalovitch wrote:
>> You need config-dir level schema.xml, and solrconfig.xml. For multiple
>> collections, you also need a top-level solr.xml. And unless the config
>> files have a lot of references to other files, you need nothing else.
>>
>> For examples, check the example directory in the distribution. Or have
>> a look at examples from my book:
>> https://github.com/arafalov/solr-indexing-book/tree/master/published .
>> This shows the solr.xml that points at a lot of collections. The first
>> nearly minimal collection is collection1, but you can then explore
>> others for various degree of complexity.
>>
>> Regards,
>>Alex.
>> Personal website: http://www.outerthoughts.com/
>> LinkedIn: http://www.linkedin.com/in/alexandrerafalovitch
>> - Time is the quality of nature that keeps events from happening all
>> at once. Lately, it doesn't seem to be working.  (Anonymous  - via GTD
>> book)
>>
>>
>> On Thu, Jan 23, 2014 at 4:55 PM, Stavros Delisavas  
>> wrote:
>>> So far, I successfully managed to create a core from my existing
>>> configuration by opening this URL in my browser:
>>>
>>> http://localhost:8080/solr/admin/cores?action=CREATE&name=glPrototypeCore&instanceDir=/etc/solr
>>>
>>> New status from http://localhost:8080/solr/admin/cores?action=STATUS is:
>>>
>>> <response>
>>>   <lst name="responseHeader">
>>>     <int name="status">0</int>
>>>     <int name="QTime">4</int>
>>>   </lst>
>>>   <lst name="status">
>>>     <lst name="">
>>>       <str name="name"></str>
>>>       <str name="instanceDir">/usr/share/solr/./</str>
>>>       <str name="dataDir">/var/lib/solr/data/</str>
>>>       <date name="startTime">2014-01-23T08:42:39.087Z</date>
>>>       <long name="uptime">3056197</long>
>>>       <lst name="index">
>>>         <int name="numDocs">4401029</int>
>>>         <int name="maxDoc">4401029</int>
>>>         <long name="version">1370010628806</long>
>>>         <int name="segmentCount">12</int>
>>>         <bool name="current">true</bool>
>>>         <bool name="hasDeletions">false</bool>
>>>         <str name="directory">org.apache.lucene.store.MMapDirectory:org.apache.lucene.store.MMapDirectory@/var/lib/solr/data/index
>>> lockFactory=org.apache.lucene.store.NativeFSLockFactory@77c58801</str>
>>>         <date name="lastModified">2013-10-29T14:17:22Z</date>
>>>       </lst>
>>>     </lst>
>>>     <lst name="glPrototypeCore">
>>>       <str name="name">glPrototypeCore</str>
>>>       <str name="instanceDir">/etc/solr/</str>
>>>       <str name="dataDir">/var/lib/solr/data/</str>
>>>       <date name="startTime">2014-01-23T09:29:30.019Z</date>
>>>       <long name="uptime">245267</long>
>>>       <lst name="index">
>>>         <int name="numDocs">4401029</int>
>>>         <int name="maxDoc">4401029</int>
>>>         <long name="version">1370010628806</long>
>>>         <int name="segmentCount">12</int>
>>>         <bool name="current">true</bool>
>>>         <bool name="hasDeletions">false</bool>
>>>         <str name="directory">org.apache.lucene.store.MMapDirectory:org.apache.lucene.store.MMapDirectory@/var/lib/solr/data/index
>>> lockFactory=org.apache.lucene.store.NativeFSLockFactory@5ad83862</str>
>>>         <date name="lastModified">2013-10-29T14:17:22Z</date>
>>>       </lst>
>>>     </lst>
>>>   </lst>
>>> </response>
>>>
>>>
>>> From my understanding I now have an unnamed core and a core named
>>> "glPrototypeCore" which uses the same configuration.
>>>
>>> I copied the files data-config.xml, schema.xml into a new directory
>>> "/etc/solr/glinstance" and tried to create another core but this always
>>> throws me error 400. I even tried by adding the schema- and
>>> config-parameters with full path, but this did not lead to any
>>> difference. Also I don't understand what the "dataDir"-parameter is for.
>>> I could not find any data-directories in /etc/solr/ but the creation of
>>> the first core worked anyway.
>>>
>>> Can someone help? Is there any better place for my new
>>> instance-directory and what files do I really need?
>>>
>>>
>>>
>>>
>>>
 On 23.01.2014 10:10, Stavros Delisavas wrote:
 I didn't know that the "core"-term is associated with this use case. I
 expected it to be some technical feature that allows to run more
 solr-instances for better multithread-cpu-usage. For example to activate
 two solr-cores when two cpu-cores are available on the server.

 So in general, I have the feeling that the term "core" is somewhat
 confusing for solr-beginners like me.



 On 23.01.2014 09:54, Alexandre Rafalovitch wrote:
> Which is why it is curious that you did not find it. Looking back at
> it now, do you have a suggestion of what could be improved to ensure
> people find this easier in the future?
>
> Regards,
>Alex.
> Personal webs

java.lang.NoClassDefFoundError: org/carrot2/core/ControllerFactory

2014-01-23 Thread saurish

Hi,
I am new to Solr and successfully did a basic search. Now I am trying to do
classification of the search results using the Carrot2 support which comes
with Solr 4.5.1. Would appreciate it if someone could tell me what I am
missing... maybe it's a trivial issue?

I am getting the error *java.lang.NoClassDefFoundError:
org/carrot2/core/ControllerFactory*. I know this error might be because the
carrot2 classes are not getting loaded. But as you can see below, the jars
in the "../contrib/clustering/lib" directory are being loaded, and still I
am getting the error. What might be the reason?

I am working with Solr 4.5.1 on Tomcat 7.0.47.
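
For reference, the clustering bits in my solrconfig.xml look roughly like
the stock example configuration (paths abbreviated, not verbatim my file):

<lib dir="../contrib/clustering/lib/" regex=".*\.jar" />
<lib dir="../dist/" regex="solr-clustering-\d.*\.jar" />

<searchComponent name="clustering" class="solr.clustering.ClusteringComponent">
  <lst name="engine">
    <str name="name">default</str>
    <str name="carrot.algorithm">org.carrot2.clustering.lingo.LingoClusteringAlgorithm</str>
  </lst>
</searchComponent>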




INFO  - 2014-01-23 16:02:50.865; org.apache.solr.core.CorePropertiesLocator;
Looking for core definitions underneath D:\Work\x\solr
INFO  - 2014-01-23 16:02:51.288; org.apache.solr.core.CorePropertiesLocator;
Found core collection1 in D:\Work\x\solr\collection1\
INFO  - 2014-01-23 16:02:51.911; org.apache.solr.core.CorePropertiesLocator;
Found 1 core definitions
INFO  - 2014-01-23 16:02:51.916; org.apache.solr.core.CoreContainer;
Creating SolrCore 'collection1' using instanceDir:
D:\Work\x\solr\collection1
INFO  - 2014-01-23 16:02:51.918; org.apache.solr.core.SolrResourceLoader;
new SolrResourceLoader for directory: 'D:\Work\x\solr\collection1\'

* Look at the libraries being loaded *

INFO  - 2014-01-23 16:02:52.408; org.apache.solr.core.SolrConfig; Adding
specified lib dirs to ClassLoader
INFO  - 2014-01-23 16:02:52.482; org.apache.solr.core.SolrResourceLoader; 

INFO  - 2014-01-23 16:02:52.634; org.apache.solr.core.SolrResourceLoader;
Adding
'file:/D:/Work/x/solr/contrib/clustering/lib/attributes-binder-1.2.0.jar'
to classloader
INFO  - 2014-01-23 16:02:52.637; org.apache.solr.core.SolrResourceLoader;
Adding
'file:/D:/Work/x/solr/contrib/clustering/lib/carrot2-mini-3.8.0.jar' to
classloader
INFO  - 2014-01-23 16:02:52.639; org.apache.solr.core.SolrResourceLoader;
Adding 'file:/D:/Work/x/solr/contrib/clustering/lib/hppc-0.5.2.jar' to
classloader
INFO  - 2014-01-23 16:02:52.642; org.apache.solr.core.SolrResourceLoader;
Adding
'file:/D:/Work/x/solr/contrib/clustering/lib/jackson-core-asl-1.7.4.jar'
to classloader
INFO  - 2014-01-23 16:02:52.644; org.apache.solr.core.SolrResourceLoader;
Adding
'file:/D:/Work/x/solr/contrib/clustering/lib/jackson-mapper-asl-1.7.4.jar'
to classloader
INFO  - 2014-01-23 16:02:52.645; org.apache.solr.core.SolrResourceLoader;
Adding
'file:/D:/Work/x/solr/contrib/clustering/lib/mahout-collections-1.0.jar'
to classloader
INFO  - 2014-01-23 16:02:52.649; org.apache.solr.core.SolrResourceLoader;
Adding 'file:/D:/Work/x/solr/contrib/clustering/lib/mahout-math-0.6.jar'
to classloader
INFO  - 2014-01-23 16:02:52.653; org.apache.solr.core.SolrResourceLoader;
Adding 'file:/D:/Work/x/solr/contrib/clustering/lib/simple-xml-2.7.jar'
to classloader
INFO  - 2014-01-23 16:02:52.660; org.apache.solr.core.SolrResourceLoader;
Adding 'file:/D:/Work/x/solr/dist/solr-clustering-4.5.1.jar' to
classloader  
INFO  - 2014-01-23 16:02:52.664; org.apache.solr.core.SolrResourceLoader;
Adding 'file:/D:/Work/x/solr/contrib/langid/lib/jsonic-1.2.7.jar' to
classloader
INFO  - 2014-01-23 16:02:52.665; org.apache.solr.core.SolrResourceLoader; 
INFO  - 2014-01-23 16:02:58.237; org.apache.solr.core.SolrConfig; Loaded
SolrConfig: solrconfig.xml
INFO  - 2014-01-23 16:02:58.430; org.apache.solr.schema.IndexSchema; Reading
Solr Schema from schema.xml
INFO  - 2014-01-23 16:02:58.762; org.apache.solr.schema.IndexSchema;
[collection1] Schema name=nutch
INFO  - 2014-01-23 16:03:02.138; org.apache.solr.schema.IndexSchema; default
search field in schema is text
INFO  - 2014-01-23 16:03:02.141; org.apache.solr.schema.IndexSchema; query
parser default operator is OR
INFO  - 2014-01-23 16:03:02.145; org.apache.solr.schema.IndexSchema; unique
key field: url
INFO  - 2014-01-23 16:03:03.765; org.apache.solr.core.SolrCore;
solr.NRTCachingDirectoryFactory
INFO  - 2014-01-23 16:03:03.797; org.apache.solr.core.SolrCore;
[collection1] Opening new SolrCore at D:\Work\x\solr\collection1\,
dataDir=D:/Work/x/solr/data\
INFO  - 2014-01-23 16:03:03.936; org.apache.solr.core.JmxMonitoredMap; JMX
monitoring is enabled. Adding Solr mbeans to JMX Server:
com.sun.jmx.mbeanserver.JmxMBeanServer@2a5ab9
INFO  - 2014-01-23 16:03:04.461; org.apache.solr.core.SolrCore;
[collection1] Added SolrEventListener for newSearcher:
org.apache.solr.core.QuerySenderListener{queries=[]}
INFO  - 2014-01-23 16:03:04.464; org.apache.solr.core.SolrCore;
[collection1] Added SolrEventListener for firstSearcher:
org.apache.solr.core.QuerySenderListener{queries=[{q=static firstSearcher
warming in solrconfig.xml}]}
INFO  - 2014-01-23 16:03:04.699;
org.apache.solr.core.CachingDirectoryFactory; return new directory for
D:\Work\x\solr\data
INFO  - 2014-01-23 16:03:04.700; org.apache.solr.core.SolrCore; New index
directory detected: old=null new=D:/Work/x/solr/data\in

Re: How to use Solr for two different projects on one server

2014-01-23 Thread Stavros Delisavas
I installed solr via apt-get and followed the online tutorials that I
found to adjust the existing schema.xml and created dataconfig.xml the
way I needed them.

Was this the wrong approach? I don't know what the Bitnami stack is.




On 23.01.2014 12:50, Alexandre Rafalovitch wrote:
> You are not doing this on a download distribution, do you? You are
> using Bitnami stack or something. That's why you are not seeing the
> examples folder, etc.
>
> I recommend step back, use downloaded distribution and do your
> learning and setup using that. Then, go and see where your production
> stack put various bits of Solr. Otherwise, you are doing two (15?)
> things at once.
>
> Regards,
>Alex.
> P.s. If you like the examples, the book actually explains what they
> do. You could be quarter way to mastery in less than 24 hours...
> Personal website: http://www.outerthoughts.com/
> LinkedIn: http://www.linkedin.com/in/alexandrerafalovitch
> - Time is the quality of nature that keeps events from happening all
> at once. Lately, it doesn't seem to be working.  (Anonymous  - via GTD
> book)
>
>
> On Thu, Jan 23, 2014 at 6:38 PM, Stavros Delisavas  
> wrote:
>> Thanks a lot,
>> those are great examples. I managed to get my cores working. What I
>> noticed so far is that the first (auto-created) core is symlinking files
>> to /etc/solr/...  or to /var/lib/solr/...
>>
>> I am now not sure where my self-made collections should be. Shall I
>> create folders in /usr/share/solr/ and symlink to my
>> files in /etc/solr, or can I have hard copies in my collection folders?
>> Is /usr/share/solr/ a good place for my collection folders at all?
>>
>>
>>
>> On 23.01.2014 12:16, Alexandre Rafalovitch wrote:
>>> You need config-dir level schema.xml, and solrconfig.xml. For multiple
>>> collections, you also need a top-level solr.xml. And unless the config
>>> files have a lot of references to other files, you need nothing else.
>>>
>>> For examples, check the example directory in the distribution. Or have
>>> a look at examples from my book:
>>> https://github.com/arafalov/solr-indexing-book/tree/master/published .
>>> This shows the solr.xml that points at a lot of collections. The first
>>> nearly minimal collection is collection1, but you can then explore
>>> others for various degree of complexity.
>>>
>>> Regards,
>>>Alex.
>>> Personal website: http://www.outerthoughts.com/
>>> LinkedIn: http://www.linkedin.com/in/alexandrerafalovitch
>>> - Time is the quality of nature that keeps events from happening all
>>> at once. Lately, it doesn't seem to be working.  (Anonymous  - via GTD
>>> book)
>>>
>>>
>>> On Thu, Jan 23, 2014 at 4:55 PM, Stavros Delisavas  
>>> wrote:
 So far, I successfully managed to create a core from my existing
 configuration by opening this URL in my browser:

 http://localhost:8080/solr/admin/cores?action=CREATE&name=glPrototypeCore&instanceDir=/etc/solr

 New status from http://localhost:8080/solr/admin/cores?action=STATUS is:

<?xml version="1.0" encoding="UTF-8"?>
<response>
<lst name="responseHeader">
 <int name="status">0</int>
 <int name="QTime">4</int>
</lst>
<lst name="status">
 <lst name="">
  <str name="name"/>
  <str name="instanceDir">/usr/share/solr/./</str>
  <str name="dataDir">/var/lib/solr/data/</str>
  <date name="startTime">2014-01-23T08:42:39.087Z</date>
  <long name="uptime">3056197</long>
  <lst name="index">
   <int name="numDocs">4401029</int>
   <int name="maxDoc">4401029</int>
   <long name="version">1370010628806</long>
   <int name="segmentCount">12</int>
   <bool name="current">true</bool>
   <bool name="hasDeletions">false</bool>
   <str name="directory">org.apache.lucene.store.MMapDirectory:org.apache.lucene.store.MMapDirectory@/var/lib/solr/data/index
   lockFactory=org.apache.lucene.store.NativeFSLockFactory@77c58801</str>
   <date name="lastModified">2013-10-29T14:17:22Z</date>
  </lst>
 </lst>
 <lst name="glPrototypeCore">
  <str name="name">glPrototypeCore</str>
  <str name="instanceDir">/etc/solr/</str>
  <str name="dataDir">/var/lib/solr/data/</str>
  <date name="startTime">2014-01-23T09:29:30.019Z</date>
  <long name="uptime">245267</long>
  <lst name="index">
   <int name="numDocs">4401029</int>
   <int name="maxDoc">4401029</int>
   <long name="version">1370010628806</long>
   <int name="segmentCount">12</int>
   <bool name="current">true</bool>
   <bool name="hasDeletions">false</bool>
   <str name="directory">org.apache.lucene.store.MMapDirectory:org.apache.lucene.store.MMapDirectory@/var/lib/solr/data/index
   lockFactory=org.apache.lucene.store.NativeFSLockFactory@5ad83862</str>
   <date name="lastModified">2013-10-29T14:17:22Z</date>
  </lst>
 </lst>
</lst>
</response>


 From my understanding I now have an unnamed core and a core named
 "glPrototypeCore" which uses the same configuration.

 I copied the files data-config.xml, schema.xml into a new directory
 "/etc/solr/glinstance" and tried to create another core but this always
 throws me error 400. I even tried by adding the schema- and
 config-parameters with full path, but this did not lead to any
 difference. Also I don't understand what the "dataDir"-parameter is for.
 I could not find any data-directories in /etc/solr/ but the creation of
 the first core worked anyway.

 Can someone help? Is there any better place for my new
 instance-directory and what files do I really need?





On 23.01.2014 10:10, Stavros Delisavas wrote:
> I didn't know that the "core"-term is associated with this use case. I
> expected it to be some technical feature that allows to run more
> solr-instances for better multithread-cpu-usage. For example to activate
> two solr-cores when two cpu-cores are available on the server.

Re: how to write an efficient query with a subquery to restrict the search space?

2014-01-23 Thread Raymond Wiker
Maybe you could move (field2:val2 or field4:val4) into a filter? E.g,

q=(field1:val1 OR field2:val2 OR field3:val3 OR
field4:val4)&fq=(field2:val2 OR field4:val4)

If I have this correctly, the fq part should be evaluated first, and may
even be found in the filter cache.
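
For instance, the full request might look like this (a sketch using the
placeholder fields from the original question):

http://localhost:8983/solr/collection1/select
    ?q=field1:val1 OR field2:val2 OR field3:val3 OR field4:val4
    &fq=field2:val2 OR field4:val4
    &rows=100&fl=*

Each distinct fq clause is cached separately in the filterCache, so repeated
queries that share the same rare-term filter can reuse the cached result.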



On Thu, Jan 23, 2014 at 12:42 PM, svante karlsson  wrote:

> I have a solr db containing 1 billion records that I'm trying to use in a
> NoSQL fashion.
>
> What I want to do is find the best matches using all search terms but
> restrict the search space to the most unique terms
>
> In this example I know that val2 and val4 are rare terms and val1 and val3
> are more common. In my real scenario I'll have 20 fields that I want to
> include or exclude in the inner query depending on the uniqueness of the
> requested value.
>
>
> my first approach was:
> q=field1:val1 OR field2:val2 OR field3:val3 OR field4:val4 AND (field2:val2
> OR field4:val4)&rows=100&fl=*
>
> but what I think I get is
> .  field4:val4 AND (field2:val2 OR field4:val4)   this result is then
> OR'ed with the rest
>
> if I write
> q=(field1:val1 OR field2:val2 OR field3:val3 OR field4:val4) AND
> (field2:val2 OR field4:val4)&rows=100&fl=*
>
> then what I think I get is two sub-queries that is evaluated separately and
> then joined - performance wise this is bad.
>
> Whats the best way to write these types of queries?
>
>
> Are there any performance issues when running it on several solrcloud nodes
> vs a single instance or should it scale?
>
>
>
> /svante
>


Re: How to use Solr for two different projects on one server

2014-01-23 Thread Alexandre Rafalovitch
Just download Solr stack from the download page and practice on that.
That has all the startup scripts and relative paths set up.

Regards,
   Alex.
Personal website: http://www.outerthoughts.com/
LinkedIn: http://www.linkedin.com/in/alexandrerafalovitch
- Time is the quality of nature that keeps events from happening all
at once. Lately, it doesn't seem to be working.  (Anonymous  - via GTD
book)


On Thu, Jan 23, 2014 at 7:00 PM, Stavros Delisavas  wrote:
> I installed solr via apt-get and followed the online tutorials that I
> found to adjust the existing schema.xml and created dataconfig.xml the
> way I needed them.
>
> Was this the wrong approach? I don't know what the Bitnami stack is.
>
>
>
>
> On 23.01.2014 12:50, Alexandre Rafalovitch wrote:
>> You are not doing this on a download distribution, do you? You are
>> using Bitnami stack or something. That's why you are not seeing the
>> examples folder, etc.
>>
>> I recommend step back, use downloaded distribution and do your
>> learning and setup using that. Then, go and see where your production
>> stack put various bits of Solr. Otherwise, you are doing two (15?)
>> things at once.
>>
>> Regards,
>>Alex.
>> P.s. If you like the examples, the book actually explains what they
>> do. You could be quarter way to mastery in less than 24 hours...
>> Personal website: http://www.outerthoughts.com/
>> LinkedIn: http://www.linkedin.com/in/alexandrerafalovitch
>> - Time is the quality of nature that keeps events from happening all
>> at once. Lately, it doesn't seem to be working.  (Anonymous  - via GTD
>> book)
>>
>>
>> On Thu, Jan 23, 2014 at 6:38 PM, Stavros Delisavas  
>> wrote:
>>> Thanks a lot,
>>> those are great examples. I managed to get my cores working. What I
>>> noticed so far is that the first (auto-created) core is symlinking files
>>> to /etc/solr/...  or to /var/lib/solr/...
>>>
>>> I am now not sure where my self-made collections should be. Shall I
>>> create folders in /usr/share/solr/ and symlink to my
>>> files in /etc/solr, or can I have hard copies in my collection folders?
>>> Is /usr/share/solr/ a good place for my collection folders at all?
>>>
>>>
>>>
>>> On 23.01.2014 12:16, Alexandre Rafalovitch wrote:
 You need config-dir level schema.xml, and solrconfig.xml. For multiple
 collections, you also need a top-level solr.xml. And unless the config
 files have a lot of references to other files, you need nothing else.

 For examples, check the example directory in the distribution. Or have
 a look at examples from my book:
 https://github.com/arafalov/solr-indexing-book/tree/master/published .
 This shows the solr.xml that points at a lot of collections. The first
 nearly minimal collection is collection1, but you can then explore
 others for various degree of complexity.

 Regards,
Alex.
 Personal website: http://www.outerthoughts.com/
 LinkedIn: http://www.linkedin.com/in/alexandrerafalovitch
 - Time is the quality of nature that keeps events from happening all
 at once. Lately, it doesn't seem to be working.  (Anonymous  - via GTD
 book)


 On Thu, Jan 23, 2014 at 4:55 PM, Stavros Delisavas  
 wrote:
> So far, I successfully managed to create a core from my existing
> configuration by opening this URL in my browser:
>
> http://localhost:8080/solr/admin/cores?action=CREATE&name=glPrototypeCore&instanceDir=/etc/solr
>
> New status from http://localhost:8080/solr/admin/cores?action=STATUS is:
>
> <?xml version="1.0" encoding="UTF-8"?>
> <response>
> <lst name="responseHeader">
>  <int name="status">0</int>
>  <int name="QTime">4</int>
> </lst>
> <lst name="status">
>  <lst name="">
>   <str name="name"/>
>   <str name="instanceDir">/usr/share/solr/./</str>
>   <str name="dataDir">/var/lib/solr/data/</str>
>   <date name="startTime">2014-01-23T08:42:39.087Z</date>
>   <long name="uptime">3056197</long>
>   <lst name="index">
>    <int name="numDocs">4401029</int>
>    <int name="maxDoc">4401029</int>
>    <long name="version">1370010628806</long>
>    <int name="segmentCount">12</int>
>    <bool name="current">true</bool>
>    <bool name="hasDeletions">false</bool>
>    <str name="directory">org.apache.lucene.store.MMapDirectory:org.apache.lucene.store.MMapDirectory@/var/lib/solr/data/index
>    lockFactory=org.apache.lucene.store.NativeFSLockFactory@77c58801</str>
>    <date name="lastModified">2013-10-29T14:17:22Z</date>
>   </lst>
>  </lst>
>  <lst name="glPrototypeCore">
>   <str name="name">glPrototypeCore</str>
>   <str name="instanceDir">/etc/solr/</str>
>   <str name="dataDir">/var/lib/solr/data/</str>
>   <date name="startTime">2014-01-23T09:29:30.019Z</date>
>   <long name="uptime">245267</long>
>   <lst name="index">
>    <int name="numDocs">4401029</int>
>    <int name="maxDoc">4401029</int>
>    <long name="version">1370010628806</long>
>    <int name="segmentCount">12</int>
>    <bool name="current">true</bool>
>    <bool name="hasDeletions">false</bool>
>    <str name="directory">org.apache.lucene.store.MMapDirectory:org.apache.lucene.store.MMapDirectory@/var/lib/solr/data/index
>    lockFactory=org.apache.lucene.store.NativeFSLockFactory@5ad83862</str>
>    <date name="lastModified">2013-10-29T14:17:22Z</date>
>   </lst>
>  </lst>
> </lst>
> </response>
>
>
> From my understanding I now have an unnamed core and a core named
> "glPrototypeCore" which uses the same configuration.
>
> I copied the files data-config.xml, schema.xml into a new directory
> "/etc/solr/glinstance" and tried to create another core but this always
> throws me error 400. I even tried by adding the schema- and
> config-parameters with full path, but this did not lead to any
> difference. Also I don't understand what the "dataDir"-parameter is for.
> I could

support for and Why remove dismax handler remove in solr 4.3

2014-01-23 Thread Viresh Modi
I checked solrconfig.xml in Solr 4.3 and Solr 1.4.
In both I checked the following:

*Solr 1.4::*


*Solr 4.3::*



So how do I handle the dismax query type (qt) in Solr 4.3?
In Solr 1.4.1 we used qt=dismax,
but in Solr 4.3 there is no such configuration.


So the two versions give different results.
-- 

Regards,
Viresh Modi


RE: AIOOBException on trunk since 21st or 22nd build

2014-01-23 Thread Markus Jelsma
Ignore or throw proper error message for bad delete containing bad composite ID
https://issues.apache.org/jira/browse/SOLR-5659

 
 
-Original message-
> From:Markus Jelsma 
> Sent: Thursday 23rd January 2014 12:16
> To: solr-user@lucene.apache.org
> Subject: RE: AIOOBException on trunk since 21st or 22nd build
> 
> Yeah, I can now also reproduce the problem with a build of the 20th! Again 
> the same leader and replica nodes. The problem seems to be in the data we're 
> sending to Solr. I'll check it out and file an issue.
> Cheers
> 
> -Original message-
> > From:Mark Miller 
> > Sent: Wednesday 22nd January 2014 18:56
> > To: solr-user 
> > Subject: Re: AIOOBException on trunk since 21st or 22nd build
> > 
> > Looking at the list of changes on the 21st and 22nd, I don’t see a smoking 
> > gun.
> > 
> > - Mark  
> > 
> > 
> > 
> > On Jan 22, 2014, 11:13:26 AM, Markus Jelsma  
> > wrote: Hi - this likely belongs to an existing open issue. We're seeing the 
> > stuff below on a build of the 22nd. Until just now we used builds of the 
> > 20th and didn't have the issue. This is either a bug or did some data 
> > format in Zookeeper change? Until now only two cores of the same shard 
> > throw the error; all other nodes in the cluster are clean.
> > 
> > 2014-01-22 15:32:48,826 ERROR [solr.core.SolrCore] - [http-8080-exec-5] - : 
> > java.lang.ArrayIndexOutOfBoundsException: 1
> > at 
> > org.apache.solr.common.cloud.CompositeIdRouter$KeyParser.getHash(CompositeIdRouter.java:291)
> > at 
> > org.apache.solr.common.cloud.CompositeIdRouter.sliceHash(CompositeIdRouter.java:58)
> > at 
> > org.apache.solr.common.cloud.HashBasedRouter.getTargetSlice(HashBasedRouter.java:33)
> > at 
> > org.apache.solr.update.processor.DistributedUpdateProcessor.setupRequest(DistributedUpdateProcessor.java:218)
> > at 
> > org.apache.solr.update.processor.DistributedUpdateProcessor.processDelete(DistributedUpdateProcessor.java:961)
> > at 
> > org.apache.solr.update.processor.UpdateRequestProcessor.processDelete(UpdateRequestProcessor.java:55)
> > at 
> > org.apache.solr.handler.loader.XMLLoader.processDelete(XMLLoader.java:347)
> > at 
> > org.apache.solr.handler.loader.XMLLoader.processUpdate(XMLLoader.java:278)
> > at org.apache.solr.handler.loader.XMLLoader.load(XMLLoader.java:174)
> > at 
> > org.apache.solr.handler.UpdateRequestHandler$1.load(UpdateRequestHandler.java:92)
> > at 
> > org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:74)
> > at 
> > org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:135)
> > at org.apache.solr.core.SolrCore.execute(SolrCore.java:1915)
> > at 
> > org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:785)
> > at 
> > org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:418)
> > at 
> > org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:203)
> > at 
> > org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:235)
> > at 
> > org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206)
> > at 
> > org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:233)
> > at 
> > org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:191)
> > at 
> > org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:127)
> > at 
> > org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:102)
> > at 
> > org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:109)
> > at 
> > org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:293)
> > at 
> > org.apache.coyote.http11.Http11NioProcessor.process(Http11NioProcessor.java:889)
> > at 
> > org.apache.coyote.http11.Http11NioProtocol$Http11ConnectionHandler.process(Http11NioProtocol.java:744)
> > at 
> > org.apache.tomcat.util.net.NioEndpoint$SocketProcessor.run(NioEndpoint.java:2282)
> > at 
> > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
> > at 
> > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
> > at java.lang.Thread.run(Thread.java:724)
> > 
> 


Re: Storing MYSQL DATETIME field in solr as String

2014-01-23 Thread manju16832003
Hi Tariq,
I'm glad that helped you :-).

Thanks





Re: Storing MYSQL DATETIME field in solr as String

2014-01-23 Thread tariq
Hello manju,

Thank you! It's really helpful for me.





Re: Highlighting not working

2014-01-23 Thread Ahmet Arslan
Hi Fatima,

Did you re-index after that change? You need to re-index your documents.

Ahmet
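
(For reference, a field set up for highlighting with term vectors would look
something like the sketch below; the field name and type are illustrative,
not necessarily Fatima's exact schema. Changing these attributes only takes
effect for documents indexed after the change, hence the re-index.)

<field name="content" type="text_general" indexed="true" stored="true"
       termVectors="true" termPositions="true" termOffsets="true"/>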



On Thursday, January 23, 2014 7:31 AM, Fatima Issawi  wrote:
Hi,

I have stored=true for my "content" field, but I get an error saying there is a 
mismatch of settings on that field (I think) because of the "term*=true"  
settings.

Thanks again,
Fatima




> -Original Message-
> From: Ahmet Arslan [mailto:iori...@yahoo.com]
> Sent: Wednesday, January 22, 2014 5:02 PM
> To: solr-user@lucene.apache.org
> Subject: Re: Highlighting not working
> 
> Hi Fatima,
> 
> To enable higlighting (both standard and fastvector) you need to make
> stored="true".
> 
> Term vectors may speed up standard highlighter. Plus they are mandatory
> for FastVectorHighligher.
> 
> https://cwiki.apache.org/confluence/display/solr/Field+Properties+by+Use+
> Case
> 
> Ahmet
> 
> 
> 
> 
> 
> On Wednesday, January 22, 2014 10:44 AM, Fatima Issawi
>  wrote:
> Also my highlighting defaults...
> 
>   
>      
> 
>        
       <str name="hl">on</str>
       <str name="hl.fl">content documentname</str>
       <str name="hl.encoder">html</str>
       <str name="hl.simple.pre">&lt;b&gt;</str>
       <str name="hl.simple.post">&lt;/b&gt;</str>
       <str name="f.documentname.hl.fragsize">0</str>
       <str name="f.documentname.hl.alternateField">documentname</str>
       <str name="f.content.hl.snippets">3</str>
       <str name="f.content.hl.fragsize">200</str>
       <str name="f.content.hl.alternateField">content</str>
       <str name="f.content.hl.maxAlternateFieldLength">750</str>
> 
> 
> > -Original Message-
> > From: Fatima Issawi [mailto:issa...@qu.edu.qa]
> > Sent: Wednesday, January 22, 2014 11:34 AM
> > To: solr-user@lucene.apache.org
> > Subject: Highlighting not working
> >
> > Hello,
> >
> > I'm trying to highlight content that is returned from a Solr query,
> > but I can't seem to get it working.
> >
> > I would like to highlight the "documentname" and the "pagetext" or
> > "content" results, but when I run the search I don't get anything
> > returned. I thought that the "content" field is supposed to be used for
> hightlighting?
> > And that [termVectors="true" termPositions="true" termOffsets="true"]
> > needs to be added to the fields that need to be highlighted? Is there
> > something else I'm missing?
> >
> >
> > Here is my schema:
> >
> >     > required="true" multiValued="false" />
> >     > omitNorms="true"/>
> >     > stored="true" termVectors="true"  termPositions="true"
> > termOffsets="true"/>
> >    
> >   
> >     >stored="true"/>
> >    
> >     >stored="true"/>/>
> >     > termVectors="true" termPositions="true" termOffsets="true"/>
> >
> >     > multiValued="true" termVectors="true" termPositions="true"
> > termOffsets="true"/>
> >
> >     > multiValued="true"/>
> >
> >    
> >    
> >    
> >    
> >    
> >    
> >    
> >
> >
> > Thanks,
> > Fatima


Re: support for and Why remove dismax handler remove in solr 4.3

2014-01-23 Thread Ahmet Arslan
Hi Viresh,

defType=dismax should do the trick. By the way, the example solrconfig.xml has an 
example of edismax query parser usage.



On Thursday, January 23, 2014 2:34 PM, Viresh Modi  
wrote:
I checked solrconfig.xml in Solr 4.3 and Solr 1.4.
In both I checked the following:

*Solr 1.4::*


*Solr 4.3::*



So how do I handle the dismax query type (qt) in Solr 4.3?
In Solr 1.4.1 we used qt=dismax,
but in Solr 4.3 there is no such configuration.


So the two versions give different results.
-- 

Regards,
Viresh Modi



Re: Solr/Lucene Faceted Search Too Many Unique Values?

2014-01-23 Thread Toke Eskildsen
On Wed, 2014-01-22 at 23:59 +0100, Bing Hua wrote:
> I am going to evaluate some Lucene/Solr capabilities on handling faceted
> queries, in particular, with a single facet field that contains large number
> (say up to 1 million) of distinct values. Does anyone have some experience
> on how lucene performs in this scenario?

We facet on Author (11.5M unique values) and Subject (3.8M unique
values) on our 12M documents. Each individual document typically has a
low amount of authors and subjects. Two indexes of about 50GB each, 3GB
heap, 5GB RAM free for disk cache, SSD, 4 core Intel Xeon L5420@2.50GHz.

Response time is around 1-200 ms for most queries, some queries taking
1-2 seconds and 1-2% of queries taking 3-10 seconds.

We use a home-grown faceting system under Lucene, but previous tests
shows performance and memory requirements to be quite similar to Solr
faceting, as they use the same algorithm (assuming facet.method=fc).
I do not know how our performance is compared to Lucene faceting.


The dreaded "Too Many Unique Values" is not a performance problem, but a
hard limit on the number of unique values imposed by Solr fc-faceting.
16M, as far as I remember. I do not know if Lucene faceting has the same
limit.
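
(For reference, a single-field facet request of the kind discussed here might
look like the following; the collection and field names are illustrative:)

http://localhost:8983/solr/collection1/select
    ?q=*:*&rows=0
    &facet=true
    &facet.field=author
    &facet.limit=20
    &facet.method=fc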

- Toke Eskildsen, State and University Library, Denmark




Re: Searching and scoring with block join

2014-01-23 Thread Mikhail Khludnev
> Yes, that's correct.
>
> I also already tried the query you brought as example, but I have problems
> with the scoring.
> I'm using edismax as defType, but I'm not quite sure how to use it with a
> {!parent } query.
>

nesting query parsers is shown at
http://blog.griddynamics.com/2013/12/grandchildren-and-siblings-with-block.html

try to start from the following:
title:Test _query_:"{!parent which=is_parent:true}{!dismax
qf=content_de}Test"
keep in mind local params referencing, e.g. {!... v=$nest}&nest=... (spelled out just below)
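
A concrete sketch of that indirection, using the field names from this thread
(the parameter name childq is illustrative):

q=title:Test _query_:"{!parent which=is_parent:true v=$childq}"
&childq={!dismax qf=content_de}Test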


> For example, if I do this query, the score is always 0
> {!parent which=is_parent:true}+content_de:Test


> The blog says: ToParentBlockJoinQuery supports a few modes of score
> calculations. {!parent} parser has None mode hardcoded.
> So, can I change the hardcoded mode somehow? I didn't find any further
> documentation about the parameters of {!parent}.
>
there is no such param in
https://github.com/apache/lucene-solr/blob/trunk/solr/core/src/java/org/apache/solr/search/join/BlockJoinParentQParser.java#L67
Raise a feature request issue; at the least, don't hesitate to contribute.


> If I'm doing this request, the score seems only be calculated by the
> results found in "title".
> title:Test _query_:"{!parent which=is_parent:true}+content_de:Test"
>
> Sorry if I ask stupid questions but I just have started to work with solr
> and some techniques are not very familiar.
>
> Thanks
> -Gesh
>
>


-- 
Sincerely yours
Mikhail Khludnev
Principal Engineer,
Grid Dynamics


 


Solr solr.JSONResponseWriter not escaping backslash '\' characters

2014-01-23 Thread stevenNabble
Hello,

I am finding that if any fields in a document returned by a Solr query
(*wt=json* to get a JSON response) contain backslash *'\'* characters, they
are not being escaped (to make then valid JSON).

e.g. Solr returns this: 'A quoted value *\"XXX\"*, plus these are
backslashes *\r\n* which should be escaped but aren't :-('

Any ideas? I shouldn't need to escape these values before submitting to the
Solr index but I can't see any other way at the moment...

Regards
Steven





Re: Optimizing index on Slave

2014-01-23 Thread Michael Della Bitta
I'm not really aware enough of the Solr/Lucene internals to tell you
whether that's possible or not.

One thing occurred to me: What happens if you take optimize out of the
replication triggers in the replication handler?

<str name="replicateAfter">optimize</str>
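
(For context, that element normally sits in a master-side replication config
roughly like the sketch below; this is illustrative, not Salman's actual
solrconfig.xml:)

<requestHandler name="/replication" class="solr.ReplicationHandler">
  <lst name="master">
    <str name="replicateAfter">commit</str>
    <str name="replicateAfter">optimize</str>
    <str name="confFiles">schema.xml,stopwords.txt</str>
  </lst>
</requestHandler>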


Michael Della Bitta

Applications Developer

o: +1 646 532 3062

appinions inc.

“The Science of Influence Marketing”

18 East 41st Street

New York, NY 10017

t: @appinions  | g+:
plus.google.com/appinions
w: appinions.com 


On Thu, Jan 23, 2014 at 12:36 AM, Salman Akram <
salman.ak...@northbaysolutions.net> wrote:

> Unfortunately we can't do sharding right now.
>
> If we optimize on master and slave separately, the file names and sizes are
> the same. I think it's just the version number that is different. Maybe if
> there were a way to copy the master version to the slave, that would resolve this issue?
>


Re: Possible regression for Solr 4.6.0 - commitWithin does not work with replicas

2014-01-23 Thread Shawn Heisey
On 12/11/2013 2:41 AM, Elodie Sannier wrote:
> collection fr_blue:
> - shard1 -> server-01 (replica1), server-01 (replica2)
> - shard2 -> server-02 (replica1), server-02 (replica2)
> 
> collection fr_green:
> - shard1 -> server-01 (replica1), server-01 (replica2)
> - shard2 -> server-02 (replica1), server-02 (replica2)

I'm pretty sure this won't affect the issue you've mentioned, but it's
worth pointing out.

If this is really how you've arranged your shard replicas, your system
cannot survive a failure, because you've got both replicas for each
shard on the same server.  If that server dies, half of each collection
will be gone.

Thanks,
Shawn



Re: java.lang.NoClassDefFoundError: org/carrot2/core/ControllerFactory

2014-01-23 Thread Shawn Heisey
On 1/23/2014 4:57 AM, saurish wrote:
> I am new to solr and successfully did a basic search. Now i am trying to do
> classification of the search results using carrrot's support which comes
> with solr 4.5.1. Would appreciate if someone tells me what is that i am
> missing...may be a trivial issue??!!!
> 
> I am getting the below error..*java.lang.NoClassDefFoundError:
> org/carrot2/core/ControllerFactory*. I know this error might be because of
> carrot2 classes not getting loaded. But if you look below the jars in the
> "../contrib/clustering/lib" directory are being loaded. but still i am
> getting the error. what might be the reason?
> 
> I am working with Solr 4.5.1 on tomcat 7.0.47.

Have you defined the sharedLib setting in your solr.xml file to point at
these jars?  If you have, you'll need to remove that.

I've summarized the issues with jar loading and sharedLib on the
12/Nov/2013 comment for this issue:

https://issues.apache.org/jira/browse/SOLR-4852

My recommendation for anyone that needs extra jars with Solr:  Remove
all <lib> directives from solrconfig.xml.  Copy all jars that you
actually require to ${solr.solr.home}/lib.  Because you're on 4.5.1,
remove the sharedLib setting from solr.xml.  If you were on 4.2.1 or
earlier, you'd need the sharedLib setting, with lib as the value.
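
(A sketch of that layout; the Solr home path below is illustrative:)

# collect every jar Solr needs in one place and let Solr pick them up
mkdir -p /opt/solr/home/lib
cp solr/contrib/clustering/lib/*.jar /opt/solr/home/lib/
cp solr/dist/solr-clustering-4.5.1.jar /opt/solr/home/lib/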

Thanks,
Shawn



Re: support for and Why remove dismax handler remove in solr 4.3

2014-01-23 Thread Shawn Heisey
On 1/23/2014 5:33 AM, Viresh Modi wrote:
> i checked solrconfig.xml in solr 4.3 and solr 1.4
> In both i have checked
> 
> *Solr 1.4::*
> 
> 
> *Solr 4.3::*
> 
> 
> 
> so how to handle dismax query type(qt) in solr 4.3
> in solr 1.4.1 we have used qt=dismax
> but solr 4.3 there is no such configuration.

Ahmet's reply is good information.  Here's a little more.

Only the dismax *handler* has been removed from Solr.  The dismax query
parser is alive and well.  There is also the edismax query parser, which
is probably what you should be using.  You can set defType to dismax or
edismax in the defaults section of any handler, or you can send the
parameter with the query.

A rather important change was also made to handler names.  It is highly
recommended that you name them starting with a forward slash.  For
example, if you make a handler named "/foo" then you can simply use a
URL like this:

http://server:port/solr/corename/foo?q=test

This is why the example's main handler is now named "/select" instead of
"standard" as it was in older versions.  The SolrJ library now includes
a method for setting the request handler:

http://lucene.apache.org/solr/4_6_0/solr-solrj/org/apache/solr/client/solrj/SolrQuery.html#setRequestHandler%28java.lang.String%29
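
(A minimal SolrJ sketch of both points; the host, core, handler and qf values
below are placeholders, not a known setup:)

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.impl.HttpSolrServer;
import org.apache.solr.client.solrj.response.QueryResponse;

public class EdismaxExample {
    public static void main(String[] args) throws Exception {
        // point at a single core; "server:port" and "corename" are placeholders
        HttpSolrServer solr = new HttpSolrServer("http://server:port/solr/corename");
        SolrQuery query = new SolrQuery("test");
        query.set("defType", "edismax");  // choose the edismax query parser
        query.set("qf", "title text");    // fields to search (illustrative)
        query.setRequestHandler("/foo");  // send the request to the "/foo" handler
        QueryResponse rsp = solr.query(query);
        System.out.println("hits: " + rsp.getResults().getNumFound());
        solr.shutdown();
    }
}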

Switching handlers with the "qt" parameter is now deprecated.  It is
still supported if you set handleSelect="true" on the requestDispatcher
config element, but it's likely this capability will disappear in Solr 5.0.

https://cwiki.apache.org/confluence/display/solr/RequestDispatcher+in+SolrConfig

Thanks,
Shawn



Re: Solr solr.JSONResponseWriter not escaping backslash '\' characters

2014-01-23 Thread Chris Hostetter

: I am finding that if any fields in a document returned by a Solr query
: (*wt=json* to get a JSON response) contain backslash *'\'* characters, they
: are not being escaped (to make then valid JSON).

you're going to have to give us more concrete specifics on how you are 
indexing your data, and how you are looking at the response, because i 
can't reproduce anything close to what you are describing (see below)

https://wiki.apache.org/solr/UsingMailingLists




hossman@frisbee:~$ cat tmp/tmp.xml 
<add>
  <doc>
    <field name="id">HOSS</field>
    <field name="name">quote: (") backslash: (\) backslash-quote: (\") 
newline: (
) backslash-n: (\n)</field>
  </doc>
</add>


hossman@frisbee:~$ curl 
'http://localhost:8983/solr/collection1/update?commit=true' --data-binary 
@tmp/tmp.xml -H 'Content-Type: application/xml'
<?xml version="1.0" encoding="UTF-8"?>
<response>
<lst name="responseHeader"><int name="status">0</int><int name="QTime">678</int></lst>
</response>



hossman@frisbee:~$ curl 
'http://localhost:8983/solr/collection1/select?q=id:HOSS&indent=true&omitHeader=true&wt=xml'
<?xml version="1.0" encoding="UTF-8"?>
<response>

<result name="response" numFound="1" start="0">
  <doc>
    <str name="id">HOSS</str>
    <str name="name">quote: (") backslash: (\) backslash-quote: (\") 
newline: (
) backslash-n: (\n)</str>
    <long name="_version_">1458038035233898496</long></doc>
</result>
</response>


hossman@frisbee:~$ curl 
'http://localhost:8983/solr/collection1/select?q=id:HOSS&indent=true&omitHeader=true&wt=json'
{
  "response":{"numFound":1,"start":0,"docs":[
  {
"id":"HOSS",
"name":"quote: (\") backslash: (\\) backslash-quote: (\\\") 
newline: (\n) backslash-n: (\\n)",
"_version_":1458038035233898496}]
  }}


hossman@frisbee:~$ cat tmp/tmp.json
[
 {"id" : "HOSS", 
  "name" : "quote: (\") backslash: (\\) backslash-quote: (\\\") newline: 
(\n) backslash-n: (\\n)"}
]


hossman@frisbee:~$ curl 
'http://localhost:8983/solr/collection1/update?commit=true' --data-binary 
@tmp/tmp.json -H 'Content-Type: application/json'
{"responseHeader":{"status":0,"QTime":605}}


hossman@frisbee:~$ curl 
'http://localhost:8983/solr/collection1/select?q=id:HOSS&indent=true&omitHeader=true&wt=xml'
<?xml version="1.0" encoding="UTF-8"?>
<response>

<result name="response" numFound="1" start="0">
  <doc>
    <str name="id">HOSS</str>
    <str name="name">quote: (") backslash: (\) backslash-quote: (\") 
newline: (
) backslash-n: (\n)</str>
    <long name="_version_">1458038130437259264</long></doc>
</result>
</response>


hossman@frisbee:~$ curl 
'http://localhost:8983/solr/collection1/select?q=id:HOSS&indent=true&omitHeader=true&wt=json'
{
  "response":{"numFound":1,"start":0,"docs":[
  {
"id":"HOSS",
"name":"quote: (\") backslash: (\\) backslash-quote: (\\\") 
newline: (\n) backslash-n: (\\n)",
"_version_":1458038130437259264}]
  }}




-Hoss
http://www.lucidworks.com/


Re: Interesting search question! How to match documents based on the least number of fields that match all query terms?

2014-01-23 Thread Daniel Shane
Thanks Franck, Mikhail & Robert for your input!

I'm looking into your ideas, and running a few test queries to see how it works 
out. I have a feeling that it is more tricky than it sounds. For example, let's 
say I have 3 docs in my index:

Doc1:

m1: a b c d
m2: a b c
m3: a b
m4: a
mAll: a b c d / a b c / a b / a

Doc 2:

m1: a b c 
m2: b c d
m3: 
m4:
mAll: a b c / b c d

Doc 3:

m1: a 
m2: b
m3: c
m4: d
mAll: a / b / c / d

If the search terms are a b c d, then all 3 docs will match, since each of the 
search terms is in the metadata fields. However, the sorting should give this order:

doc1 (1 field matches all terms)
doc2 (2 fields match all terms)
doc3 (4 fields match all terms)

I'll try out your ideas and let you know how it works out!

Daniel Shane



- Original Message -
From: "Franck Brisbart" 
To: solr-user@lucene.apache.org
Sent: Thursday, January 23, 2014 3:12:36 AM
Subject: RE: Interesting search question! How to match documents based on the 
least number of fields that match all query terms?

Hi Daniel,

you can also consider using negative boosts.
This can't be done with solr, but docs which don't match the metadata
can be boosted.

This might do what you want :
-metadata1:(term1 AND ... AND termN)^2
-metadata2:(term1 AND ... AND termN)^2
.
-metadataN:(term1 AND ... AND termN)^2
allMetadatas :(term1 AND ... AND termN)^0.5


Franck Brisbart



On Wednesday, 22 January 2014 at 19:38 +, Petersen, Robert wrote:
> Hi Daniel,
> 
> How about trying something like this (you'll have to play with the boosts to 
> tune this), search all the fields with all the terms using edismax and use 
> the minimum should match parameter, but require all terms to match in the 
> allMetadata field.
> https://wiki.apache.org/solr/ExtendedDisMax#mm_.28Minimum_.27Should.27_Match.29
> 
> Lucene query syntax below to give you the general idea, but this query would 
> require all terms to be in one of the metadata fields to get the boost.
> 
> metadata1:(term1 AND ... AND termN)^2
> metadata2:(term1 AND ... AND termN)^2
> .
> metadataN:(term1 AND ... AND termN)^2
> allMetadatas :(term1 AND ... AND termN)^0.5
> 
> That should do approximately what you want,
> Robi
> 
> -Original Message-
> From: Daniel Shane [mailto:sha...@lexum.com] 
> Sent: Tuesday, January 21, 2014 8:42 AM
> To: solr-user@lucene.apache.org
> Subject: Interesting search question! How to match documents based on the 
> least number of fields that match all query terms?
> 
> I have an interesting solr/lucene question and its quite possible that some 
> new features in solr might make this much easier that what I am about to try. 
> If anyone has a clever idea on how to do this search, please let me know!
> 
> Basically, lets state that I have an index in which each documents has a 
> content and several metadata fields.
> 
> Document Fields:
> 
> content
> metadata1
> metadata2
> .
> metadataN
> allMetadatas (all the terms indexed in metadata1...N are concatenated in this 
> field) 
> 
> Assuming that I am searching for documents that contains a certain number of 
> terms (term1 to termN) in their metadata fields, I would like to build a 
> search query that will return document that satisfy these requirement:
> 
> a) All search terms must be present in a metadata field. This is quite easy, 
> we can simply search in the field allMetadatas and that will work fine.
> 
> b) Now for the hard part, we prefer document in which we found the metadatas 
> in the *least number of different fields*. So if one document contains all 
> the search terms in 10 different fields, but another document contains all 
> search terms but in only 8 fields, we would like those to sort first. 
> 
> My first idea was to index terms in the allMetadatas using payloads. Each 
> indexed term would also have the specific metadataN field from which they 
> originate. Then I can write a scorer to score based on these payloads. 
> 
> However, if there is a way to do this without payloads I'm all ears!
> 



SolrCloud 4.6.0: OutOfMemoryError on Shard Split

2014-01-23 Thread Will Butler
We have a 125GB shard that we are attempting to split, but each time we try to 
do so, we eventually run out of memory (java.lang.OutOfMemoryError: GC overhead 
limit exceeded). We have attempted it with the following heap sizes on the 
shard leader: 4GB, 6GB, 12GB, and 24GB. Even if it does eventually work with 
more heap, should I have to increase the heap size at all to do a split? Has 
anyone successfully done a split with a shard of this size using SolrCloud 
4.6.0?

Thanks,

Will
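
(For reference, the split being attempted is driven through the Collections
API along these lines; the host, collection and shard names are illustrative:)

http://localhost:8983/solr/admin/collections?action=SPLITSHARD&collection=collection1&shard=shard1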

Berlin Buzzwords 2014: CfP is open

2014-01-23 Thread Isabel Drost-Fromm
I'm super happy to announce that the call for submissions for Berlin
Buzzwords 2014 is open. For those who don't know the conference - in
my "absolutely objective opinion" the event is the most exciting
conference on storing, processing and searching large amounts of
digital data for engineers.

The 5th edition of Berlin Buzzwords will take place on May 25-28,
2014 at Kulturbrauerei Berlin.

Berlin Buzzwords is looking for speakers who submit talks on the
following topics:

* Information Retrieval / Search i.e. Lucene, Solr, katta, ElasticSearch or
comparable solutions

* NoSQL and SQL i.e. CouchDB, MongoDB, Jackrabbit, Hbase and others

* Large Data Processing i.e. Hadoop itself, MapReduce, Cascading, Pig,
Spark and friends

Closely related topics not explicitly listed above are welcome as well.

The Call for Submissions will be open until February 9! Be part of
Berlin Buzzwords and submit your session idea. Please register here:
.

Looking forward to lots of interesting proposals - and looking forward to
meeting all of you in Berlin later this year (did I mention that Berlin
rocks in summer?)


Isabel

PS: As always, any help with spreading the word is highly welcome.

PS2: One final hint - even though speakers of course get a complimentary
conference pass make sure to still check out our ticket page in
particular if you'd like to bring your children to the conference - we
do provide child day care on a donation basis but need your registration
for capacity planning: http://berlinbuzzwords.de/tickets



Re: SolrCloud 4.6.0: OutOfMemoryError on Shard Split

2014-01-23 Thread Shalin Shekhar Mangar
This is a known issue. Solr 4.7 will bring some relief.

See https://issues.apache.org/jira/browse/SOLR-5214


On Thu, Jan 23, 2014 at 10:10 PM, Will Butler  wrote:
> We have a 125GB shard that we are attempting to split, but each time we try 
> to do so, we eventually run out of memory (java.lang.OutOfMemoryError: GC 
> overhead limit exceeded). We have attempted it with the following heap sizes 
> on the shard leader: 4GB, 6GB, 12GB, and 24GB. Even if it does eventually 
> work with more heap, should I have to increase the heap size at all to do a 
> split? Has anyone successfully done a split with a shard of this size using 
> SolrCloud 4.6.0?
>
> Thanks,
>
> Will



-- 
Regards,
Shalin Shekhar Mangar.


Re: Solr in non-persistent mode

2014-01-23 Thread Mark Miller
Yeah, I think we removed support in the new solr.xml format. It should still 
work with the old format.  

If you have a good use case for it, I don’t know that we couldn’t add it back 
with the new format.

- Mark  



On Jan 23, 2014, 3:26:05 AM, Per Steffensen  wrote: Hi

In Solr 4.0.0 I used to be able to run with persistent=false (in
solr.xml). I can see
(https://cwiki.apache.org/confluence/display/solr/Format+of+solr.xml)
that persistent is no longer supported in solr.xml. Does this mean that
you cannot run in non-persistent mode any longer, or does it mean that I
have to configure it somewhere else?

Thanks!

Regards, Per Steffensen
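
For reference, a minimal sketch of the legacy ("old style") solr.xml format
Mark refers to, which is where the persistent attribute lives (the core name
and instanceDir are illustrative):

<?xml version="1.0" encoding="UTF-8" ?>
<solr persistent="false">
  <cores adminPath="/admin/cores">
    <core name="collection1" instanceDir="collection1" />
  </cores>
</solr>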


Re: Solr solr.JSONResponseWriter not escaping backslash '\' characters

2014-01-23 Thread stevenNabble
Hi Chris,

thanks for the fast response. I'll try to be more specific about the
problem I am having.

# cat tmp.xml
<add>
<doc>
<field name="id">9553522</field>
<field name="comments">quote: (") backslash: (\)
backslash-quote: (\")
newline: (
) backslash-n: (\n)</field>
</doc>
</add>



# curl 'http://localhost:8983/solr/collection1/update?commit=true'
--data-binary @tmp.xml -H 'Content-Type: application/xml'
<?xml version="1.0" encoding="UTF-8"?>
<response>
<lst name="responseHeader"><int name="status">0</int><int name="QTime">134</int></lst>
</response>


# curl '
http://localhost:8983/solr/collection1/select?q=id:9553522&indent=true&omitHeader=true&wt=xml
'




  
9553522
quote: (") backslash: (\)
backslash-quote: (\")
newline: (
) backslash-n: (\n)

  quote: (") backslash: (\)
backslash-quote: (\")
newline: (
) backslash-n: (\n)


  quote: (") backslash: (\)
backslash-quote: (\")
newline: (
) backslash-n: (\n)


  quote: (") backslash: (\)
backslash-quote: (\")
newline: (
) backslash-n: (\n)

1458042122530717696





# curl '
http://localhost:8983/solr/collection1/select?q=id:9553522&indent=true&omitHeader=true&wt=json&fl=id,comments,_version_
'
{
  "response":{"numFound":1,"start":0,"docs":[
  {
"id":"9553522",
"comments":"quote: (\") backslash: (\\) \nbackslash-quote: (\\\")
\nnewline: ( \n) backslash-n: (\\n)",
"_version_":1458042122530717696}]
  }}



So my setup gives the same responses as yours.

The problem I have is that if I try to parse this response in *PHP* using
*json_decode()* I get a syntax error because of the '*\n*'s that are in
the response. I could escape them before doing the *json_decode()* or at the
point of submitting to the index, but this seems wrong...

I am probably doing something silly and a good nights sleep will reveal
what I am doing wrong ;-)

Thanks
Steven



On 23 January 2014 16:15, Chris Hostetter-3 [via Lucene] <
ml-node+s472066n4113017...@n3.nabble.com> wrote:

>
> : I am finding that if any fields in a document returned by a Solr query
> : (*wt=json* to get a JSON response) contain backslash *'\'* characters,
> they
> : are not being escaped (to make then valid JSON).
>
> you're going to have to give us more concrete specifics on how you are
> indexing your data, and how you are looking at the response, because i
> can't reproduce anything close to what you are describing (see below)
>
> https://wiki.apache.org/solr/UsingMailingLists
>
>
>
>
> hossman@frisbee:~$ cat tmp/tmp.xml
> <add>
>   <doc>
>     <field name="id">HOSS</field>
>     <field name="name">quote: (") backslash: (\) backslash-quote: (\")
> newline: (
> ) backslash-n: (\n)</field>
>   </doc>
> </add>
>
>
> hossman@frisbee:~$ curl '
> http://localhost:8983/solr/collection1/update?commit=true' --data-binary
> @tmp/tmp.xml -H 'Content-Type: application/xml'
> <?xml version="1.0" encoding="UTF-8"?>
> <response>
> <lst name="responseHeader"><int name="status">0</int><int name="QTime">678</int></lst>
> </response>
>
>
>
> hossman@frisbee:~$ curl '
> http://localhost:8983/solr/collection1/select?q=id:HOSS&indent=true&omitHeader=true&wt=xml'
> <?xml version="1.0" encoding="UTF-8"?>
> <response>
>
> <result name="response" numFound="1" start="0">
>   <doc>
>     <str name="id">HOSS</str>
>     <str name="name">quote: (") backslash: (\) backslash-quote: (\")
> newline: (
> ) backslash-n: (\n)</str>
>     <long name="_version_">1458038035233898496</long></doc>
> </result>
> </response>
>
>
> hossman@frisbee:~$ curl '
> http://localhost:8983/solr/collection1/select?q=id:HOSS&indent=true&omitHeader=true&wt=json'
> {
>   "response":{"numFound":1,"start":0,"docs":[
>   {
> "id":"HOSS",
> "name":"quote: (\") backslash: (\\) backslash-quote: (\\\")
> newline: (\n) backslash-n: (\\n)",
> "_version_":1458038035233898496}]
>   }}
>
>
> hossman@frisbee:~$ cat tmp/tmp.json
> [
>  {"id" : "HOSS",
>   "name" : "quote: (\") backslash: (\\) backslash-quote: (\\\") newline:
> (\n) backslash-n: (\\n)"}
> ]
>
>
> hossman@frisbee:~$ curl '
> http://localhost:8983/solr/collection1/update?commit=true' --data-binary
> @tmp/tmp.json -H 'Content-Type: application/json'
> {"responseHeader":{"status":0,"QTime":605}}
>
>
> hossman@frisbee:~$ curl '
> http://localhost:8983/solr/collection1/select?q=id:HOSS&indent=true&omitHeader=true&wt=xml'
> <?xml version="1.0" encoding="UTF-8"?>
> <response>
>
> <result name="response" numFound="1" start="0">
>   <doc>
>     <str name="id">HOSS</str>
>     <str name="name">quote: (") backslash: (\) backslash-quote: (\")
> newline: (
> ) backslash-n: (\n)</str>
>     <long name="_version_">1458038130437259264</long></doc>
> </result>
> </response>
>
>
> hossman@frisbee:~$ curl '
> http://localhost:8983/solr/collection1/select?q=id:HOSS&indent=true&omitHeader=true&wt=json'
> {
>   "response":{"numFound":1,"start":0,"docs":[
>   {
> "id":"HOSS",
> "name":"quote: (\") backslash: (\\) backslash-quote: (\\\")
> newline: (\n) backslash-n: (\\n)",
> "_version_":1458038130437259264}]
>   }}
>
>
>
>
> -Hoss
> http://www.lucidworks.com/
>
>

Re: Solr solr.JSONResponseWriter not escaping backslash '\' characters

2014-01-23 Thread Mark Miller
Try changing your solrconfig.xml. Look for the following:

  <queryResponseWriter name="json" class="solr.JSONResponseWriter">
    <str name="content-type">text/plain; charset=UTF-8</str>
  </queryResponseWriter>

See if you have any luck changing that to application/json. The driving reason 
for this text/plain default is those newlines allow the browser to display a 
more formatted response to the user.

Trappy default for standard use, unfortunately.

- Mark

On Jan 23, 2014, at 9:20 AM, stevenNabble  wrote:

> Hello,
> 
> I am finding that if any fields in a document returned by a Solr query
> (*wt=json* to get a JSON response) contain backslash *'\'* characters, they
> are not being escaped (to make then valid JSON).
> 
> e.g. Solr returns this: 'A quoted value *\"XXX\"*, plus these are
> backslashes *\r\n* which should be escaped but aren't :-('
> 
> Any ideas? I shouldn't need to escape these values before submitting to the
> Solr index but I can't see any other way at the moment...
> 
> Regards
> Steven
> 
> 
> 



Re: core.properties and solr.xml

2014-01-23 Thread Steven Bower
For us, we don't fully rely on the cloud/collections API for creating and
deploying instances, etc. We control this via an external mechanism, so this
would allow me to have instances figure out what they should be based on an
external system. We do this now but have to drop core.properties files all
over; I'd like to not have to do that. It's more a desire for cleanliness of
my filesystem than anything else, because this is all automated at this
point.


On Wed, Jan 15, 2014 at 1:49 PM, Mark Miller  wrote:

> What’s the benefit? So you can avoid having a simple core properties file?
> I’d rather see more value than that prompt exposing something like this to
> the user. It’s a can of worms that I personally have not seen a lot of
> value in yet.
>
> Whether we mark it experimental or not, this adds a burden, and I’m still
> wondering if the gains are worth it.
>
> - Mark
>
> On Jan 15, 2014, at 12:04 PM, Alan Woodward  wrote:
>
> > This is true.  But if we slap big "warning: experimental" messages all
> over it, then users can't complain too much about backwards-compat breaks.
>  My intention when pulling all this stuff into the CoresLocator interface
> was to allow other implementations to be tested out, and other suggestions
> have already come up from time to time on the list.  It seems a shame to
> *not* allow this to be opened up for advanced users.
> >
> > Alan Woodward
> > www.flax.co.uk
> >
> >
> > On 15 Jan 2014, at 16:24, Mark Miller wrote:
> >
> >> I think these API’s are pretty new and deep to want to support them for
> users at this point. It constrains refactoring and can complicates things
> down the line, especially with SolrCloud. This same discussion has come up
> in JIRA issues before. At best, I think all the recent refactoring in this
> area needs to bake.
> >>
> >> - Mark
> >>
> >> On Jan 15, 2014, at 11:01 AM, Alan Woodward  wrote:
> >>
> >>> I think solr.xml is the correct place for it, and you can then set up
> substitution variables to allow it to be set by environment variables, etc.
>  But let's discuss on the JIRA ticket.
> >>>
> >>> Alan Woodward
> >>> www.flax.co.uk
> >>>
> >>>
> >>> On 15 Jan 2014, at 15:39, Steven Bower wrote:
> >>>
>  I will open up a JIRA... I'm more concerned over the core locator
> stuff vs
>  the solr.xml.. Should the specification of the core locator go into
> the
>  solr.xml or via some other method?
> 
>  steve
> 
> 
>  On Tue, Jan 14, 2014 at 5:06 PM, Alan Woodward 
> wrote:
> 
> > Hi Steve,
> >
> > I think this is a great idea.  Currently the implementation of
> > CoresLocator is picked depending on the type of solr.xml you have
> (new- vs
> > old-style), but it should be easy enough to extend the new-style
> logic to
> > optionally look up and instantiate a plugin implementation.
> >
> > Core loading and new core creation is all done through the CL now,
> so as
> > long as the plugin implemented all methods, it shouldn't break the
> > Collections API either.
> >
> > Do you want to open a JIRA?
> >
> > Alan Woodward
> > www.flax.co.uk
> >
> >
> > On 14 Jan 2014, at 19:20, Erick Erickson wrote:
> >
> >> The work done as part of "new style" solr.xml, particularly by
> >> romsegeek should make this a lot easier. But no, there's no formal
> >> support for such a thing.
> >>
> >> There's also a desire to make ZK "the one source of truth" in Solr
> 5,
> >> although that effort is in early stages.
> >>
> >> Which is a long way of saying that I think this would be a good
> thing
> >> to add. Currently there's no formal way to specify one though. We'd
> >> have to give some thought as to what abstract methods are required.
> >> The current "old style" and "new style" classes . There's also the
> >> chicken-and-egg question; how does one specify the new class? This
> >> seems like something that would be in a (very small) solr.xml or
> >> specified as a sysprop. And knowing where to load the class from
> could
> >> be "interesting".
> >>
> >> A pluggable SolrConfig I think is a stickier wicket, it hasn't been
> >> broken out into nice interfaces like coreslocator has been. And it's
> >> used all over the place, passed in and recorded in constructors etc,
> >> as well as being possibly unique for each core. There's been some
> talk
> >> of sharing a single config object, and there's also talk about using
> >> "config sets" that might address some of those concerns, but neither
> >> one has gotten very far in 4x land.
> >>
> >> FWIW,
> >> Erick
> >>
> >> On Tue, Jan 14, 2014 at 1:41 PM, Steven Bower <
> smb-apa...@alcyon.net>
> > wrote:
> >>> Are there any plans/tickets to allow for pluggable SolrConf and
> >>> CoreLocator? In my use case my solr.xml is totally static, i have a
> >>> separate dataDir and my core.properties

Re: Solr Cloud Bulk Indexing Questions

2014-01-23 Thread Software Dev
Thanks for the suggestions. After reading that document I feel even more
confused though, because I always thought that hard commits should be less
frequent than soft commits.

Is there any way to configure autoCommit, softCommit values on a per
request basis? The majority of the time we have small flow of updates
coming in and we would like to see them in ASAP. However we occasionally
need to do some bulk indexing (once a week or less) and the need to see
those updates right away isn't as critical.

I would say 95% of the time we are in "Index-Light Query-Light/Heavy" mode
and the other 5% is "Index-Heavy Query-Light/Heavy" mode.

Thanks
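
For reference, the settings under discussion live in solrconfig.xml. A sketch
with the intervals suggested in this thread (10-minute hard commits that do
not open a searcher, 5-second soft commits; the exact values are illustrative):

<autoCommit>
  <maxTime>600000</maxTime>
  <openSearcher>false</openSearcher>
</autoCommit>
<autoSoftCommit>
  <maxTime>5000</maxTime>
</autoSoftCommit>

For per-request control, individual updates can carry a commitWithin, which
asks Solr to make that update visible within the given window regardless of
the autoCommit settings. A minimal SolrJ sketch (the ZooKeeper address and
collection name are illustrative):

import org.apache.solr.client.solrj.impl.CloudSolrServer;
import org.apache.solr.common.SolrInputDocument;

public class CommitWithinExample {
    public static void main(String[] args) throws Exception {
        CloudSolrServer solr = new CloudSolrServer("zkhost:2181");
        solr.setDefaultCollection("collection1");
        SolrInputDocument doc = new SolrInputDocument();
        doc.addField("id", "1");
        solr.add(doc, 5000); // ask for visibility within 5 seconds
        solr.shutdown();
    }
}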


On Wed, Jan 22, 2014 at 5:33 PM, Erick Erickson wrote:

> When you're doing hard commits, is it with openSeacher = true or
> false? It should probably be false...
>
> Here's a rundown of the soft/hard commit consequences:
>
>
> http://searchhub.org/2013/08/23/understanding-transaction-logs-softcommit-and-commit-in-sorlcloud/
>
> I suspect (but, of course, can't prove) that you're over-committing
> and hitting segment
> merges without meaning to...
>
> FWIW,
> Erick
>
> On Wed, Jan 22, 2014 at 1:46 PM, Software Dev 
> wrote:
> > A suggestion would be to hard commit much less often, ie every 10
> > minutes, and see if there is a change.
> >
> > - Will try this
> >
> > How much system RAM ? JVM Heap ? Enough space in RAM for system disk
> cache ?
> >
> > - We have 18G of ram 12 dedicated to Solr but as of right now the total
> > index size is only 5GB
> >
> > Ah, and what about network IO ? Could that be a limiting factor ?
> >
> > - What is the size of your documents ? A few KB, MB, ... ?
> >
> > Under 1MB
> >
> > - Again, total index size is only 5GB so I dont know if this would be a
> > problem
> >
> >
> >
> >
> >
> >
> > On Wed, Jan 22, 2014 at 12:26 AM, Andre Bois-Crettez
> > wrote:
> >
> >> 1 node having more load should be the leader (because of the extra work
> >> of receiving and distributing updates, but my experiences show only a
> >> bit more CPU usage, and no difference in disk IO).
> >>
> >> A suggestion would be to hard commit much less often, ie every 10
> >> minutes, and see if there is a change.
> >> How much system RAM ? JVM Heap ? Enough space in RAM for system disk
> cache
> >> ?
> >> What is the size of your documents ? A few KB, MB, ... ?
> >> Ah, and what about network IO ? Could that be a limiting factor ?
> >>
> >>
> >> André
> >>
> >>
> >> On 2014-01-21 23:40, Software Dev wrote:
> >>
> >>> Any other suggestions?
> >>>
> >>>
> >>> On Mon, Jan 20, 2014 at 2:49 PM, Software Dev <
> static.void@gmail.com>
> >>> wrote:
> >>>
> >>>  4.6.0
> 
> 
>  On Mon, Jan 20, 2014 at 2:47 PM, Mark Miller   >wrote:
> 
>   What version are you running?
> >
> > - Mark
> >
> > On Jan 20, 2014, at 5:43 PM, Software Dev  >
> > wrote:
> >
> >  We also noticed that disk IO shoots up to 100% on 1 of the nodes. Do
> >> all
> >> updates get sent to one machine or something?
> >>
> >>
> >> On Mon, Jan 20, 2014 at 2:42 PM, Software Dev <
> >>
> > static.void@gmail.com>wrote:
> >
> >> We commit have a soft commit every 5 seconds and hard commit every
> 30.
> >>>
> >> As
> >
> >> far as docs/second it would guess around 200/sec which doesn't seem
> >>>
> >> that
> >
> >> high.
> >>>
> >>>
> >>> On Mon, Jan 20, 2014 at 2:26 PM, Erick Erickson <
> >>>
> >> erickerick...@gmail.com>wrote:
> >
> >> Questions: How often do you commit your updates? What is your
>  indexing rate in docs/second?
> 
>  In a SolrCloud setup, you should be using a CloudSolrServer. If
> the
>  server is having trouble keeping up with updates, switching to
> CUSS
>  probably wouldn't help.
> 
>  So I suspect there's something not optimal about your setup that's
>  the culprit.
> 
>  Best,
>  Erick
> 
>  On Mon, Jan 20, 2014 at 4:00 PM, Software Dev <
> 
> >>> static.void@gmail.com>
> >
> >> wrote:
> 
> > We are testing our shiny new Solr Cloud architecture but we are
> > experiencing some issues when doing bulk indexing.
> >
> > We have 5 solr cloud machines running and 3 indexing machines
> >
>  (separate
> >
> >> from the cloud servers). The indexing machines pull off ids from a
> >
>  queue
> >
> >> then they index and ship over a document via a CloudSolrServer. It
> >
>  appears
> 
> > that the indexers are too fast because the load (particularly
> disk
> >
>  io)
> >
> >> on
> 
> > the solr cloud machines spikes through the roof making the entire
> >
>  cluster
> 
> > unusable. It's kind of odd because the total index size is not
> even
> >>>

Re: Solr Cloud Bulk Indexing Questions

2014-01-23 Thread Software Dev
Also, any suggestions on debugging? What should I look for and how? Thanks


On Thu, Jan 23, 2014 at 10:01 AM, Software Dev wrote:

> Thanks for the suggestions. After reading that document I feel even more
> confused though, because I always thought that hard commits should be less
> frequent than soft commits.
>
> Is there any way to configure autoCommit, softCommit values on a per
> request basis? The majority of the time we have small flow of updates
> coming in and we would like to see them in ASAP. However we occasionally
> need to do some bulk indexing (once a week or less) and the need to see
> those updates right away isn't as critical.
>
> I would say 95% of the time we are in "Index-Light Query-Light/Heavy" mode
> and the other 5% is "Index-Heavy Query-Light/Heavy" mode.
>
> Thanks
>
>
> On Wed, Jan 22, 2014 at 5:33 PM, Erick Erickson 
> wrote:
>
>> When you're doing hard commits, is it with openSeacher = true or
>> false? It should probably be false...
>>
>> Here's a rundown of the soft/hard commit consequences:
>>
>>
>> http://searchhub.org/2013/08/23/understanding-transaction-logs-softcommit-and-commit-in-sorlcloud/
>>
>> I suspect (but, of course, can't prove) that you're over-committing
>> and hitting segment
>> merges without meaning to...
>>
>> FWIW,
>> Erick
>>
>> On Wed, Jan 22, 2014 at 1:46 PM, Software Dev 
>> wrote:
>> > A suggestion would be to hard commit much less often, ie every 10
>> > minutes, and see if there is a change.
>> >
>> > - Will try this
>> >
>> > How much system RAM ? JVM Heap ? Enough space in RAM for system disk
>> > cache ?
>> >
>> > - We have 18G of RAM, 12 dedicated to Solr, but as of right now the total
>> > index size is only 5GB
>> >
>> > Ah, and what about network IO ? Could that be a limiting factor ?
>> >
>> > - What is the size of your documents ? A few KB, MB, ... ?
>> >
>> > Under 1MB
>> >
>> > - Again, total index size is only 5GB so I don't know if this would be a
>> > problem
>> >
>> >
>> >
>> >
>> >
>> >
>> > On Wed, Jan 22, 2014 at 12:26 AM, Andre Bois-Crettez wrote:
>> >
>> >> 1 node having more load should be the leader (because of the extra work
>> >> of receiving and distributing updates, but my experiences show only a
>> >> bit more CPU usage, and no difference in disk IO).
>> >>
>> >> A suggestion would be to hard commit much less often, ie every 10
>> >> minutes, and see if there is a change.
>> >> How much system RAM ? JVM Heap ? Enough space in RAM for system disk
>> >> cache ?
>> >> What is the size of your documents ? A few KB, MB, ... ?
>> >> Ah, and what about network IO ? Could that be a limiting factor ?
>> >>
>> >>
>> >> André
>> >>
>> >>
>> >> On 2014-01-21 23:40, Software Dev wrote:
>> >>
>> >>> Any other suggestions?
>> >>>
>> >>>
>> >>> On Mon, Jan 20, 2014 at 2:49 PM, Software Dev <static.void@gmail.com> wrote:
>> >>>> 4.6.0
>> >>>>
>> >>>> On Mon, Jan 20, 2014 at 2:47 PM, Mark Miller wrote:
>> >>>>> What version are you running?
>> >>>>>
>> >>>>> - Mark
>> >>>>>
>> >>>>> On Jan 20, 2014, at 5:43 PM, Software Dev <static.void@gmail.com> wrote:
>> >>>>>> We also noticed that disk IO shoots up to 100% on 1 of the nodes. Do
>> >>>>>> all updates get sent to one machine or something?
>> >>>>>>
>> >>>>>> On Mon, Jan 20, 2014 at 2:42 PM, Software Dev <static.void@gmail.com> wrote:
>> >>>>>>> We have a soft commit every 5 seconds and hard commit every 30. As
>> >>>>>>> far as docs/second, I would guess around 200/sec, which doesn't seem
>> >>>>>>> that high.
>> >>>>>>>
>> >>>>>>> On Mon, Jan 20, 2014 at 2:26 PM, Erick Erickson <erickerick...@gmail.com> wrote:
>> >>>>>>>> Questions: How often do you commit your updates? What is your
>> >>>>>>>> indexing rate in docs/second?
>> >>>>>>>>
>> >>>>>>>> In a SolrCloud setup, you should be using a CloudSolrServer. If the
>> >>>>>>>> server is having trouble keeping up with updates, switching to CUSS
>> >>>>>>>> probably wouldn't help.
>> >>>>>>>>
>> >>>>>>>> So I suspect there's something not optimal about your setup that's
>> >>>>>>>> the culprit.
>> >>>>>>>>
>> >>>>>>>> Best,
>> >>>>>>>> Erick
>> >>>>>>>>
>> >>>>>>>> On Mon, Jan 20, 2014 at 4:00 PM, Software Dev <static.void@gmail.com> wrote:
>> >>>>>>>>> We are testing our shiny new Solr Cloud architecture but we are
>> >>>>>>>>> experiencing some issues when doing bulk indexing.
>> >>>>>>>>>
>> >>>>>>>>> We have 5 solr cloud machines running and 3 indexing machines
>> >>>>>>>>> (separate from the cloud servers). The indexing machines pull off
>> >>>>>>>>> ids from a queue then they index and ship over a document via a
>> >>>>>>>>> CloudSolrServer. It appears

Re: Solr solr.JSONResponseWriter not escaping backslash '\' characters

2014-01-23 Thread Chris Hostetter

: The problem I have is if I try to parse this response in *php* using
: *json_decode()* I get a syntax error because of the '*\n*'s that are in
: the response. I could escape them before doing the *json_decode()* or at the
: point of submitting to the index, but this seems wrong...

I don't really know anything about PHP, but I managed to muddle my way 
through both of the little experiments below and couldn't reproduce any 
error from json_decode when the response contains "\n" (ie: the two-byte 
sequence representing an escaped newline character) inside of a JSON 
string, but I do get the expected error if a literal, one-byte, newline 
character is in the string (something that Solr doesn't do).

Are you sure that when you fetch the data from Solr you aren't pre-parsing 
it in some way that's evaluating the "\n" and converting it to a real 
newline?

: I am probably doing something silly and a good night's sleep will reveal
: what I am doing wrong ;-)

Good luck.

### Experiment #1, locally created strings, one bogus JSON

hossman@frisbee:~$ php -a
Interactive shell

php > $valid = '{"id": "newline: (\n)"}';
php > $bogus = "{\"id\": \"newline: (\n)\"}";
php > var_dump($valid);
string(23) "{"id": "newline: (\n)"}"
php > var_dump($bogus);
string(22) "{"id": "newline: (
)"}"
php > var_dump(json_decode($valid));
object(stdClass)#1 (1) {
  ["id"]=>
  string(12) "newline: (
)"
}
php > var_dump(json_decode($bogus));
NULL
php > var_dump(json_last_error());
int(4)


### Experiment #2, fetching json data from Solr...

hossman@frisbee:~$ curl 'http://localhost:8983/solr/collection1/select?q=id:HOSS&wt=json&indent=true&omitHeader=true'
{
  "response":{"numFound":1,"start":0,"docs":[
  {
"id":"HOSS",
"name":"quote: (\") backslash: (\\) backslash-quote: (\\\") newline: 
(\n) backslash-n: (\\n)",
"_version_":1458038130437259264}]
  }}
hossman@frisbee:~$ php -a
Interactive shell

php > $data = file_get_contents('http://localhost:8983/solr/collection1/select?q=id:HOSS&wt=json&indent=true&omitHeader=true');
php > var_dump($data);
string(227) "{
  "response":{"numFound":1,"start":0,"docs":[
  {
"id":"HOSS",
"name":"quote: (\") backslash: (\\) backslash-quote: (\\\") newline: 
(\n) backslash-n: (\\n)",
"_version_":1458038130437259264}]
  }}
"
php > var_dump(json_decode($data));
object(stdClass)#1 (1) {
  ["response"]=>
  object(stdClass)#2 (3) {
["numFound"]=>
int(1)
["start"]=>
int(0)
["docs"]=>
array(1) {
  [0]=>
  object(stdClass)#3 (3) {
["id"]=>
string(4) "HOSS"
["name"]=>
string(78) "quote: (") backslash: (\) backslash-quote: (\") newline: (
) backslash-n: (\n)"
["_version_"]=>
int(1458038130437259264)
  }
}
  }
}



-Hoss
http://www.lucidworks.com/


Re: Solr Cloud Bulk Indexing Questions

2014-01-23 Thread Shawn Heisey

On 1/23/2014 11:01 AM, Software Dev wrote:

Is there any way to configure autoCommit, softCommit values on a per
request basis? The majority of the time we have small flow of updates
coming in and we would like to see them in ASAP. However we occasionally
need to do some bulk indexing (once a week or less) and the need to see
those updates right away isn't as critical.

I would say 95% of the time we are in "Index-Light Query-Light/Heavy" mode
and the other 5% is "Index-Heavy Query-Light/Heavy" mode.


One thing missing on that searchhub page is the commitWithin parameter.  
This is a parameter that will ensure that any documents added by that 
update request will be committed within the number of milliseconds 
given.  This is particularly useful for bursty updates, because if all 
your updates are done before the commitWithin time expires, a single 
commit will get all of them, not just the first one.


http://wiki.apache.org/solr/CommitWithin

Since Solr 4.0, commitWithin will result in a soft commit. With 4.2 and 
later, it can optionally be changed to a hard commit.


https://issues.apache.org/jira/browse/SOLR-4370
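
For example, commitWithin can ride along on an update request either as a 
request parameter or as an attribute of the add command. A sketch against 
the stock example collection; the URL, core name, and field value are 
illustrative:

curl 'http://localhost:8983/solr/collection1/update?commitWithin=10000' \
  -H 'Content-Type: application/xml' \
  --data-binary '<add><doc><field name="id">doc1</field></doc></add>'

The same thing expressed in the XML message itself:

<add commitWithin="10000">
  <doc>
    <field name="id">doc1</field>
  </doc>
</add>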

If you're using SolrCloud with a distributed index, some versions may 
not work as expected when using commitWithin:


https://issues.apache.org/jira/browse/SOLR-5658

Thanks,
Shawn



Re: Solr Cloud Bulk Indexing Questions

2014-01-23 Thread Otis Gospodnetic
Hi,

Have you tried maxWriteMBPerSec?

http://search-lucene.com/?q=maxWriteMBPerSec&fc_project=Solr

Otis
--
Performance Monitoring * Log Analytics * Search Analytics
Solr & Elasticsearch Support * http://sematext.com/
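
(For reference, the write rate limiting is configured through directory 
factory parameters in solrconfig.xml. A sketch only: the parameter names 
below are my recollection of what CachingDirectoryFactory, which 
NRTCachingDirectoryFactory extends, accepts, so verify them against your 
Solr version before relying on them:)

<directoryFactory name="DirectoryFactory" class="solr.NRTCachingDirectoryFactory">
  <!-- assumed parameter names; values are MB/sec caps per IO context -->
  <double name="maxWriteMBPerSecDefault">40</double>
  <double name="maxWriteMBPerSecFlush">30</double>
  <double name="maxWriteMBPerSecMerge">20</double>
</directoryFactory>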


On Mon, Jan 20, 2014 at 4:00 PM, Software Dev wrote:

> We are testing our shiny new Solr Cloud architecture but we are
> experiencing some issues when doing bulk indexing.
>
> We have 5 solr cloud machines running and 3 indexing machines (separate
> from the cloud servers). The indexing machines pull off ids from a queue
> then they index and ship over a document via a CloudSolrServer. It appears
> that the indexers are too fast because the load (particularly disk io) on
> the solr cloud machines spikes through the roof, making the entire cluster
> unusable. It's kind of odd because the total index size is not even
> large... i.e., < 10GB. Are there any optimizations/enhancements I could try to
> help alleviate these problems?
>
> I should note that for the above collection we only have 1 shard that's
> replicated across all machines, so all machines have the full index.
>
> Would we benefit from switching to a ConcurrentUpdateSolrServer where all
> updates get sent to 1 machine and 1 machine only? We could then remove this
> machine from the cluster that handles user requests.
>
> Thanks for any input.
>


Re: Solr Cloud Bulk Indexing Questions

2014-01-23 Thread Software Dev
Does maxWriteMBPerSec apply to NRTCachingDirectoryFactory? I only
see maxMergeSizeMB and maxCachedMB as configuration values.


On Thu, Jan 23, 2014 at 11:05 AM, Otis Gospodnetic <
otis.gospodne...@gmail.com> wrote:

> Hi,
>
> Have you tried maxWriteMBPerSec?
>
> http://search-lucene.com/?q=maxWriteMBPerSec&fc_project=Solr
>
> Otis
> --
> Performance Monitoring * Log Analytics * Search Analytics
> Solr & Elasticsearch Support * http://sematext.com/
>
>
> On Mon, Jan 20, 2014 at 4:00 PM, Software Dev wrote:
>
> > We are testing our shiny new Solr Cloud architecture but we are
> > experiencing some issues when doing bulk indexing.
> >
> > We have 5 solr cloud machines running and 3 indexing machines (separate
> > from the cloud servers). The indexing machines pull off ids from a queue
> > then they index and ship over a document via a CloudSolrServer. It appears
> > that the indexers are too fast because the load (particularly disk io) on
> > the solr cloud machines spikes through the roof, making the entire cluster
> > unusable. It's kind of odd because the total index size is not even
> > large... i.e., < 10GB. Are there any optimizations/enhancements I could try to
> > help alleviate these problems?
> >
> > I should note that for the above collection we only have 1 shard that's
> > replicated across all machines, so all machines have the full index.
> >
> > Would we benefit from switching to a ConcurrentUpdateSolrServer where all
> > updates get sent to 1 machine and 1 machine only? We could then remove this
> > machine from the cluster that handles user requests.
> >
> > Thanks for any input.
> >
>


SOLR 4.4 - Slave always replicates full index

2014-01-23 Thread sureshrk19
Hi,

I have configured single-core master and slave nodes on 2 different machines.
The replication configuration is fine and it is working, but what I observed
is that on every change to the master index, a full replication is triggered on
the slave.
I was expecting only incremental replication of the changes.

*Master config:*


  
<requestHandler name="/replication" class="solr.ReplicationHandler">
  <lst name="master">
    <str name="replicateAfter">startup</str>
    <str name="replicateAfter">commit</str>
    <str name="confFiles">schema.xml,stopwords.txt,elevate.xml</str>
    <str name="commitReserveDuration">00:00:20</str>
  </lst>
  <int name="maxNumberOfBackups">1</int>
</requestHandler>


*Slave config:*


  
<requestHandler name="/replication" class="solr.ReplicationHandler">
  <lst name="slave">
    <str name="masterUrl">http://<master_host>:<port>/solr/core0/replication</str>
    <str name="pollInterval">00:00:20</str>
  </lst>
</requestHandler>



What I observed is that the index directory name has a timestamp appended,
i.e., <dataDir>/index.<timestamp>/, on the slave instance.

I have seen a similar issue on an older version of Solr that was fixed in 4.2
(per the description), so I am not sure if this is related.

https://issues.apache.org/jira/browse/SOLR-4471
http://lucene.472066.n3.nabble.com/Slaves-always-replicate-entire-index-amp-Index-versions-td4041256.html#a4041808
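
One way to see what the slave actually decides to fetch is the replication
handler's status commands (host and port are placeholders, as in the configs
above):

curl 'http://<slave_host>:<port>/solr/core0/replication?command=details'
curl 'http://<master_host>:<port>/solr/core0/replication?command=indexversion'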


Any pointers would be highly appreciated.

Thanks,
Suresh





Re: solrcloud shards backup/restoration

2014-01-23 Thread Allan Mascarenhas
Any update on this?

I am also stuck with the same problem. I want to install a snapshot of the
master Solr server in my local environment, but I couldn't. :(

I've spent almost 2 days trying to figure out a way. Please help!!
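
(For what it's worth: a snapshot taken by the replication handler is a plain
Lucene index directory, so one common manual route is to copy it into the
target core's index directory while the target Solr is stopped. A sketch;
the paths and the timestamp are illustrative:)

# on the master: trigger a snapshot, written as snapshot.<timestamp> under the data dir
curl 'http://<master_host>:<port>/solr/core0/replication?command=backup'

# on the local machine: stop Solr, swap in the snapshot, then restart
cp -r /path/from/master/data/snapshot.20140123120000/* /local/solr/core0/data/index/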


