Document field as an input for subquery params which contains whitespaces

2017-02-11 Thread Johannes
Hello.

I would like to query data depending on a value in a document, but it only
works when the value contains no whitespace.

q=fqdn:"cn=user 1,ou=users"&fl=fqdn,orgnav:[subquery]&orgnav.q={!dismax
qf=org_fqdn_hierarchy v=$row.org_fqdn_hierarchy}&orgnav.fq=type:org

The first document looks like
{
 fqdn:"cn=user 1,ou=users",
 org_fqdn_hierarchy:"ou=sales,cn=dep 1"
}

The field org_fqdn_hierarchy is tokenized with
PathHierarchyTokenizerFactory, to retrieve all hierarchically belonging
documents.


I tried it with the following commands, but the orgnav-subquery stays
empty :(

q=fqdn:"cn=user 1,ou=users"&fl=fqdn,orgnav:[subquery]&orgnav.q={!dismax
qf=org_fqdn_hierachy v='$row.org_fqdn_hierachy'}&orgnav.fq=type:org

q=fqdn:"cn=user 1,ou=users"&fl=fqdn,orgnav:[subquery]&orgnav.q={!dismax
qf=org_fqdn_hierachy v="$row.org_fqdn_hierachy"}&orgnav.fq=type:org



Is there a trick or parameter to escape it?

Best regards
Johannes


Re: Document field as an input for subquery params which contains whitespaces

2017-02-11 Thread Johannes
SOLVED. The whitespace was not the problem after all. I wanted to see the
content of org_fqdn_hierarchy in the response, so I changed the field
definition from stored="false" to stored="true", and now my subquery returns
the desired results.
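
For reference, the change boils down to something like this in schema.xml (the
field name is the one from this thread; the type and other attributes are
placeholders):

<field name="org_fqdn_hierarchy" type="text_path" indexed="true" stored="true"/>

The [subquery] transformer takes $row.org_fqdn_hierarchy from the values
returned with the parent document, so the field has to be stored for the
subquery to receive a value.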

Best regards
Johannes

On 11.02.2017 at 11:18, Johannes wrote:
> Hello.
> 
> I would like to query data depending on a value in a document. But it
> works only, when it contains no whitespaces.
> 
> q=fqdn:"cn=user 1,ou=users"&fl=fqdn,orgnav:[subquery]&orgnav.q={!dismax
> qf=org_fqdn_hierarchy v=$row.org_fqdn_hierarchy}&orgnav.fq=type:org
> 
> The first document looks like
> {
>  fqdn:"cn=user 1,ou=users",
>  org_fqdn_hierarchy:"ou=sales,cn=dep 1"
> }
> 
> The field org_fqdn_hierarchy is tokenized with
> PathHierarchyTokenizerFactory, to retrieve all hierarchically belonging
> documents.
> 
> 
> I tried it with the following commands, but the orgnav-subquery stays
> empty :(
> 
> q=fqdn:"cn=user 1,ou=users"&fl=fqdn,orgnav:[subquery]&orgnav.q={!dismax
> qf=org_fqdn_hierachy v='$row.org_fqdn_hierachy'}&orgnav.fq=type:org
> 
> q=fqdn:"cn=user 1,ou=users"&fl=fqdn,orgnav:[subquery]&orgnav.q={!dismax
> qf=org_fqdn_hierachy v="$row.org_fqdn_hierachy"}&orgnav.fq=type:org
> 
> 
> 
> Is there a trick or parameter to escape it?
> 
> Best regards
> Johannes
> 


Permutations of entries in a multivalued field

2015-12-16 Thread Johannes Riedl

Hello all,

we are facing the following problem: we use a multivalued string field 
that contains entries of the kind A/B/C/, where A,B,C are terms.
We are now looking for a simple way to also find all permutations of 
A/B/C, e.g. B/A/C. As a workaround we added a new field that contains 
all entries alphabetically sorted and guarantee the sorting on the user 
side. However, since this is limited in some ways: is there a simple 
way to either index the data such that only A/B/C and all its permutations 
are found (using e.g. type=text is not an option, since a term could 
occur in a different entry of the multivalued field), or to trigger an 
alphabetical sorting of incoming queries?


Thanks a lot for your feedback, best regards

Johannes



Re: Permutations of entries in a multivalued field

2015-12-21 Thread Johannes Riedl

Thanks a lot for these useful hints.

Best,

Johannes

On 18.12.2015 20:59, Allison, Timothy B. wrote:

Duh, didn't realize you could set inOrder in Solr.  Y, that's the better 
solution.

-Original Message-
From: Erick Erickson [mailto:erickerick...@gmail.com]
Sent: Friday, December 18, 2015 2:27 PM
To: solr-user 
Subject: Re: Permutations of entries in a multivalued field

The other thing to check is the ComplexPhraseQueryParser, see:
https://cwiki.apache.org/confluence/display/solr/Other+Parsers#OtherParsers-ComplexPhraseQueryParser

It uses the Span queries to build up the query...
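
For example, something along these lines (a sketch, not from this thread; it
assumes a Solr version whose complexphrase parser supports the inOrder local
parameter and an analyzer that splits A/B/C into the tokens A, B and C):

q={!complexphrase inOrder=false}myfield:"A B C"

With inOrder=false the phrase is executed as an unordered SpanNearQuery, so
B/A/C matches as well.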

Best,
Erick

On Fri, Dec 18, 2015 at 11:23 AM, Allison, Timothy B.
 wrote:

Hi Johannes,
   I suspect that Scott's answer would be more efficient than the following, 
and I may be misunderstanding the problem!

  This type of search is supported at the Lucene level by a SpanNearQuery with 
inOrder set to false.

  So, how do you get a SpanQuery in Solr?  You might want to look at the 
SurroundQueryParser, and I have an alternate (LUCENE-5205/SOLR-5410) here: 
https://github.com/tballison/lucene-addons.

  If you do find an appropriate parser, make sure that your position increment gap 
is > 0 on your text field definition, and then you'd never incorrectly get a 
hit across field entries of:

[0] A B
[1] C

Best,
Tim

On Wed, Dec 16, 2015 at 8:38 AM, Johannes Riedl < 
johannes.ri...@uni-tuebingen.de> wrote:


Hello all,

we are facing the following problem: we use a multivalued string
field that contains entries of the kind A/B/C/, where A,B,C are terms.
We are now looking for a simple way to also find all permutations of
A/B/C, so e.g. B/A/C. As a workaround we added a new field that
contains all entries alphabetically sorted and guarantee sorting on the user 
side.
However - since this is limited in some ways - is there a simple way
to either index in a way such that solely A/B/C and all permutations
are found (using e.g. type=text is not an option since a term could
occur in a different entry of the multivalued field) or trigger an
alphabetical sorting of incoming queries.

Thanks a lot for your feedback, best regards

Johannes




--
Scott Stults | Founder & Solutions Architect | OpenSource Connections,
LLC
| 434.409.2780
http://www.opensourceconnections.com




optimize cache-hit-ratio of filter- and query-result-cache

2015-11-30 Thread Johannes Siegert

Hi,

some of my Solr indices have a low cache hit ratio.

1. Does sorting the parts of a single filter query have an impact on the 
filter-cache and query-result-cache hit ratio?
1.1 Example: fq=field1:(2 or 3 or 1) vs. fq=field1:(1 or 2 or 3), if 
1, 2, 3 are randomly sorted
2. Does sorting the parts of the query have an impact on the 
query-result-cache hit ratio?
2.1 Example: "q=abc&fq=field1:abc&sort=field1 
desc&fq=field2:xyz&sort=field2 asc" vs. 
"q=abc&fq=field1:abc&fq=field2:xyz&sort=field1 desc&sort=field2 asc", if 
the query parts are randomly sorted
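
Independent of the answer, one safe way to improve reuse is to normalize the
requests on the client side so that logically identical queries are always
sent with identically ordered clauses and parameters, e.g. (values made up):

fq=field1:(1 OR 2 OR 3)&fq=field2:xyz&sort=field1 desc,field2 asc

Identical request strings parse to identical queries and can therefore be
answered from the same filter-cache and query-result-cache entries.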


Thanks!

Johannes



Re: optimize cache-hit-ratio of filter- and query-result-cache

2015-12-01 Thread Johannes Siegert
Thanks. The statements on 
http://wiki.apache.org/solr/SolrCaching#showItems are not explicit 
enough to answer my question.




sort by given order

2015-03-12 Thread Johannes Siegert

Hi,

I want to sort my documents by a given order. The order is defined by a 
list of ids.


My current solution is:

list of ids: 15, 5, 1, 10, 3

query:
q=*:*
&fq=(id:((15) OR (5) OR (1) OR (10) OR (3)))
&sort=query($idqsort) desc,id asc
&idqsort=id:((15^5) OR (5^4) OR (1^3) OR (10^2) OR (3^1))
&start=0&rows=5


Do you know another solution to sort by a list of ids?

Thanks!

Johannes


Custom Query Implementation?

2015-04-28 Thread Johannes Ruscheinski
Hi,

I am entirely new to the world of SOLR programming and I have the
following questions:

In addition to our regular searches we need to implement a specialised
form of range search and ranking.  What I mean by this is that users can
search for one or more numeric ranges like "17:85,205:303" etc.  (These
are range-begin/range-end pairs.)  A small percentage of our records,
maybe less than 10% will have similar ranges, again, one or more, stored
in a SOLR field.  We need to apply a custom scoring function and filter
the matches, too.  (Not all ranges match and scores will typically
differ greatly.)   Where are all the places where we have to insert
code?  Also, any tips on how to develop and debug this?  I am using the
Linux command-line and Emacs.  I am linking against SOLR by using "javac
-cp solr-core-4.2.1.jar:. my_code.java".  It is probably not relevant
but, I might mention it anyway: We are using SOLR as a part of VuFind.

I'd be grateful for any suggestions.

Thank you!

--Johannes

-- 
Dr. Johannes Ruscheinski
Universitätsbibliothek Tübingen - IT-Abteilung -
Wilhelmstr. 32, 72074 Tübingen

Tel: +49 7071 29-72820
FAX: +49 7071 29-5069
Email: johannes.ruschein...@uni-tuebingen.de




Custom Scoring Question

2015-04-29 Thread Johannes Ruscheinski
Hi,

I am entirely new to the world of SOLR programming and I have the following 
questions:

In addition to our regular searches we need to implement a specialised form of 
range search and ranking. We have implemented a CustomScoreQuery and a 
CustomScoreProvider.  I now have a few questions:

1) Where and how do we let SOLR know that it should use this? (I presume that 
will be some XML config file.)
2) How do we "tag" our special queries to switch to the custom implementation.

Furthermore, only a small subset of our data will have the database field 
relevant to this type of query set.  A problem that I can see is that we want 
SOLR to prefilter, or suppress, any records that have no data in this field 
and, if the field is non-empty, to call a function provided by us to let it 
know whether to include said record in the result set or not.

Also, any tips on how to develop and debug this?  I am using the Linux 
command-line and Emacs.  I am linking against SOLR by using "javac -cp 
solr-core-4.2.1.jar:. my_code.java".  It is probably not relevant but, I might 
mention it anyway: We are using SOLR as a part of VuFind.

I'd be grateful for any suggestions.

--Johannes

-- 
Dr. Johannes Ruscheinski
Universitätsbibliothek Tübingen - IT-Abteilung -
Wilhelmstr. 32, 72074 Tübingen

Tel: +49 7071 29-72820
FAX: +49 7071 29-5069
Email: johannes.ruschein...@uni-tuebingen.de




Re: Antwort: Custom Scoring Question

2015-04-29 Thread Johannes Ruscheinski
Hi Stephan,

On 29/04/15 14:37, Stephan Schubert wrote:
> Hi Johannes,
>
> did you have a look on Solr edismax and function queries? 
> https://cwiki.apache.org/confluence/display/solr/Function+Queries
Just read it.
>
> If I got you right, for the case you just want to ignore fields which have 
> not a value set on a specific field you can filter them out with a filter

Yes, that is a part of our problem.
>  
> query.
>
> Example: 
>
> fieldname: mycustomfield
>
> filterquery to ignore docs with mycustomfield not set: +mycustomfield:*

That seems really useful to us and solves one part of our problem,
thanks.  We still need to figure out how to invoke the custom scorer
that we wrote in Java.  Also, we would like the search to invoke another
custom function that filters out results that are not relevant to a
given query.
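
One common way to wire this up (a sketch, assuming the CustomScoreQuery gets
wrapped in a custom QParserPlugin; the class and parser names are made up) is
to register the plugin in solrconfig.xml and select it per request via local
params:

<queryParser name="rangescore" class="com.example.RangeScoreQParserPlugin"/>

q={!rangescore ranges=17:85,205:303}some query terms

The {!rangescore ...} prefix is what "tags" a query to use the custom
implementation instead of the default parser.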

--Johannes
>
> Regards
>
> Stephan
>
>
>
> From:    Johannes Ruscheinski 
> To:      solr-user@lucene.apache.org, 
> Cc:      Oliver Obenland 
> Date:    29.04.2015 14:10
> Subject: Custom Scoring Question
>
>
>
> Hi,
>
> I am entirely new to the world of SOLR programming and I have the 
> following questions:
>
> In addition to our regular searches we need to implement a specialised 
> form of range search and ranking. We have implemented a CustomScoreQuery 
> and a CustomScoreProvider.  I now have a few questions:
>
> 1) Where and how do we let SOLR know that it should use this? (I presume 
> that will be some XML config file.)
> 2) How do we "tag" our special queries to switch to the custom 
> implementation.
>
> Furthermore, only a small subset of our data will have the database field 
> relevant to this type of query set.  A problem that I can see is that we 
> want SOLR to prefilter, or suppress, any records that have no data in this 
> field and, if the field is non-empty, to call a function provided by us to 
> let it know whether to include said record in the result set or not.
>
> Also, any tips on how to develop and debug this?  I am using the Linux 
> command-line and Emacs.  I am linking against SOLR by using "javac -cp 
> solr-core-4.2.1.jar:. my_code.java".  It is probably not relevant but, I 
> might mention it anyway: We are using SOLR as a part of VuFind.
>
> I'd be greatful for any suggestions.
>
> --Johannes
>

-- 
Dr. Johannes Ruscheinski
Universitätsbibliothek Tübingen - IT-Abteilung -
Wilhelmstr. 32, 72074 Tübingen

Tel: +49 7071 29-72820
FAX: +49 7071 29-5069
Email: johannes.ruschein...@uni-tuebingen.de




Limit Results By Score?

2015-05-05 Thread Johannes Ruscheinski
Hi,

We have implemented a custom scoring function and also need to limit the
results by score.  How could we go about that?  Alternatively, can we
suppress the results early using some kind of custom filter?
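
One commonly suggested trick (a sketch; the threshold 0.5 is made up, and this
assumes the custom score is the score of the main query q) is a function-range
filter over the query's score:

fq={!frange l=0.5}query($q)

Documents whose score for $q falls below the lower bound l never make it into
the result set, which also covers the "suppress early via a filter" variant.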

--Johannes

-- 
Dr. Johannes Ruscheinski
Universitätsbibliothek Tübingen - IT-Abteilung -
Wilhelmstr. 32, 72074 Tübingen

Tel: +49 7071 29-72820
FAX: +49 7071 29-5069
Email: johannes.ruschein...@uni-tuebingen.de




Multilingual Solr

2016-06-05 Thread Riedl, Johannes
Hi all,

we are currently looking for a solution for switching between different 
languages in the query results while keeping the possibility to perform a search 
in several languages in parallel.  The overall aim would be a constant field 
name and an additional Solr parameter "lang=XX_YY" that allows returning the 
results in the chosen language while searches are applied to all languages. 
Setting up several cores to obtain a generic field name is not an option. Does 
anyone know of a clean way to achieve this, particularly routing content 
indexed to a generic field (e.g. title) to a "background field" (e.g. title_en, 
title_fr) etc. on the fly and retrieving it from there depending on the language 
chosen?

Background: So far, we have investigated the multi-language field approach 
offered by Trey Grainger in the code examples for "Solr in Action" 
(https://github.com/treygrainger/solr-in-action.git, chapter 14): an extension 
to the ordinary textField that allows using a generic field name, where the 
language is encoded at the beginning of the field content and appropriate index 
and query analyzers are associated with dummy fields in schema.xml. If there is 
a way to store data in these dummy fields, and additionally the lang parameter 
is added, we might be done.

Thanks a lot, best regards

Johannes


Re: Multilingual Solr

2016-06-06 Thread Johannes Riedl

Hi Alessandro, hi Alexandre,

Thanks a lot for your reply and your considerations and hints. We use a 
web front end that comes bundled with Solr. It currently uses a single 
core approach. We would like to stick to the original setup as closely 
as possible to avoid administrative overhead and to not prevent the 
possible use of several cores in a different context in the future. This 
is the reason why we would like to hide the language fields completely 
from the front end apart from specifying an additional language 
parameter. Language detection on indexing is currently not an issue for 
us, as we get the input in a standardized format and thus can determine 
the language beforehand.


https://github.com/treygrainger/solr-in-action/blob/master/example-docs/ch14/cores/multi-language-field/conf/schema.xml 
shows an example of how the multiText field type makes use of 
language-specific field types to specify the analyzers that are being used. The 
core issue for us (pun intended ;-)) is to find out whether it is 
possible to extend this approach to only return the selected 
language(s), i.e. to transparently add something like nested documents.


Best regards

Johannes


On 06.06.2016 10:10, Alessandro Benedetti wrote:

Hi Johannes,
nothing out of the box unfortunately, but it could be a nice idea and
contribution.
If having a multi-core setup is not an option (out of curiosity, can I
ask why?), you could proceed in this way:

1) You define in the schema N field variations per field you are interested
in, where N is the number of languages you support.
Taking the text field as an example, you define:
text field not indexed, only stored
text_en indexed
text_fr indexed
text_it indexed ...

2) At indexing time you can develop a custom updateRequestProcessor that
will identify the language (Solr's internal libraries offer support for
that) and address the correct text field to index the content.
If you also want to index translations, you need to rely on some third
party libraries to do that.

3) At query time you can address all the fields you want in parallel, with
the edismax query parser for example.

4) For rendering the results, it is not entirely clear to me whether you want to:

a) translate the document content into the language you want; you could
develop a custom DocTransformer that takes the language as input and
translates, but I don't see that much benefit in that.

b) return only the documents that originally were in that language. This
case is easy: you add an fq at query time to filter only the documents of the
language you want (at indexing time you identify the language).

c) return the original content of the document; this is quite easy. You can
store the generic "text" field, and always return that.
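
Put together, a minimal sketch of that layout and of a query against it (field
names, types and the lang field are illustrative, not an existing
configuration):

<field name="text"    type="text_general" indexed="false" stored="true"/>
<field name="text_en" type="text_en"      indexed="true"  stored="false"/>
<field name="text_fr" type="text_fr"      indexed="true"  stored="false"/>
<field name="lang"    type="string"       indexed="true"  stored="true"/>

q=search terms&defType=edismax&qf=text_en text_fr&fq=lang:en&fl=text,lang

Here fq=lang:en plays the role of the lang=XX_YY parameter from the original
question, and fl returns the stored generic field.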

Let us know for further discussion,

Cheers

On Sun, Jun 5, 2016 at 9:57 PM, Riedl, Johannes <
johannes.ri...@uni-tuebingen.de> wrote:


Hi all,

we are currently in search of a solution for switching between different
languages in the query results and keeping the possibility to perform a
search in several languages in parallel.  The overall aim would be a
constant field name and a an additional Solr parameter "lang=XX_YY" that
allows to return the results in the chosen language while searches are
applied to all languages. Setting up several cores to obtain a generic
field name is not an option. Does anyone know of a clean way to achieve
this, particularly routing content indexed to a generic field (e.g. title)
to a "background field" (e.g. title_en, title_fr) etc on the fly and
retrieving it from there depending on the language chosen.

Background: So far, we have investigated the multi-language field approach
offered by Trey Grainger in the code examples for "Solr in Action" (
https://github.com/treygrainger/solr-in-action.git, chapter 14), an
extension to the ordinary textField that allows to use a generic field name
and the language is encoded at the beginning of the field content and
appropriate index and query analyzers associated to dummy fields in
schema.xml. If there is a way to store data in these dummy fields and
additionally the lang parameter is added we might be done.

Thanks a lot, best regards

Johannes








Sharding vs single index vs separate collection

2017-06-08 Thread Johannes Knaus
Hi,
I have a SolrCloud setup with document routing (implicit routing with a router 
field). As the index is about documents with a publication date, I routed 
according to the publication year, since in my case most of the search queries 
will have a year specified.


Now, what would be the best strategy, in terms of performance (i.e. a huge 
amount of queries to be processed), for search queries without any year 
specified? 

1 - Is it enough to define that these queries should go over all routes (i.e. 
route=year1, year2, ..., yearN)?

2 - Would it be better to add a separate node with a separate index that is not 
routed (but maybe sharded/split)? If so, how should I deal with such a 
separate index? Is it possible to add it to my existing SolrCloud? Would it go 
into a separate collection?
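
For what it's worth, a sketch of the two request shapes being compared (the
shard names are assumptions based on the per-year routing described above):

q=some terms&shards=year2014,year2015   (restricted to the listed shards)
q=some terms                            (no restriction)

With the implicit router, a query that specifies no shards is simply fanned
out to all shards of the collection, so option 1 needs no extra setup.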

Thanks for your advice.

Johannes 

SolrCloud indexing -- 2 collections, 2 indexes, sharing the same nodes possible?

2017-08-30 Thread Johannes Knaus
I have a working SolrCloud-Setup with 38 nodes with a collection spanning over 
these nodes with 2 shards per node and replication factor 2 and a router field.

Now I got some new data for indexing which has the same structure and size as 
my existing index in the described collection.
However, although it has the same structure, the new data to be indexed should 
not be mixed with the old data.

Do I have to create another 38 new nodes and a new collection to index the new 
data, or is there a better / more efficient way in which I could use the existing nodes?
Is it possible that the 2 collections could share the 38 nodes without the 
indexes being mixed?

Thanks for your help.

Johannes


Re: SolrCloud indexing -- 2 collections, 2 indexes, sharing the same nodes possible?

2017-08-30 Thread Johannes Knaus
Thank you, Susheel, for the quick response.

So, that means that when I create a new collection, its shards will be newly 
created on each node, right?
Thus, if I have two collections with 
numShards=38, 
maxShardsPerNode=2 and 
replicationFactor=2 
on my 38 nodes, then this would result in each node "hosting" 4 shards (two 
from each collection).

If this is correct, I have two follow up questions:

1) As regards naming of the shards: Is using the same naming for the shards 
o.k. in this constellation? I.e. does it create trouble to have e.g. 
"Shard001", "Shard002", etc. in collection1 and "Shard001", "Shard002", etc. as 
well in collection2?

2) Performance: In my current single collection setup, I have 2 shards per 
node. After creating the second collection, there will be 4 shards per node. Do 
I have to edit the RAM per node value (raise the -m parameter when starting the 
node)? In my case, I am quite sure that the collections will never be queried 
simultaneously. So will the "running but idle" collection slow me down?

Johannes

-----Original Message-----
From: Susheel Kumar [mailto:susheel2...@gmail.com] 
Sent: Wednesday, 30 August 2017 17:36
To: solr-user@lucene.apache.org
Subject: Re: SolrCloud indexing -- 2 collections, 2 indexes, sharing the same 
nodes possible?

Yes, absolutely.  You can create as many collections as you need (like you 
would create a table in the relational world).
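
A sketch of such a CREATE call (collection, configset, field and shard names
are placeholders; the numbers mirror the setup described in this thread):

/admin/collections?action=CREATE&name=collection2&router.name=implicit&router.field=myRouteField&shards=shard1,shard2,...,shard38&maxShardsPerNode=2&replicationFactor=2&collection.configName=myconf

Both collections then place their own cores on the same 38 nodes while keeping
completely separate indexes.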

On Wed, Aug 30, 2017 at 10:13 AM, Johannes Knaus  wrote:

> I have a working SolrCloud-Setup with 38 nodes with a collection 
> spanning over these nodes with 2 shards per node and replication 
> factor 2 and a router field.
>
> Now I got some new data for indexing which has the same structure and 
> size as my existing index in the described collection.
> However, although it has the same structure the new data to be indexed 
> should not be mixed with the old data.
>
> Do I have create another 38 new nodes and a new collection and index 
> the new data or is there a better / more efficient way I could use the 
> existing nodes?
> Is it possible that the 2 collections could share the 38 nodes without 
> the indexes being mixed?
>
> Thanks for your help.
>
> Johannes
>


What does the replication factor parameter in collections api do?

2017-04-12 Thread Johannes Knaus
Hi,

I am still quite new to Solr. I have the following setup:
A SolrCloud setup with 
38 nodes, 
maxShardsPerNode=2, 
implicit routing with routing field, 
and replication factor=2.

Now, I want to add replicas. This works fine by first increasing the 
maxShardsPerNode to a higher number and then adding replicas.
So far, so good. I can confirm changes of the maxShardsPerNode parameter and 
added replicas in the Admin UI.
However, the Solr Admin UI is still showing me a replication factor of 2.
I am a little confused about what the replicationFactor parameter actually does 
in my case:

1) What does that mean? Does Solr make use of all replicas I have or only of 
two?
2) Do I need to increase the replication factor value as well to really have 
more replicas available and usable? If so, do I need to restart/reload the 
collection, upload new configs to ZooKeeper, or anything like that?
3) Or is replicationfactor just a parameter that is needed for the first start 
of SolrCloud and can be ignored afterwards?

Thank you very much for your help,
All the best,
Johannes



AW: What does the replication factor parameter in collections api do?

2017-04-13 Thread Johannes Knaus
Ok. Thank you for your quick reply. 
Though I still feel a little uneasy. Why is it possible then to alter 
replicationFactor via MODIFYCOLLECTION in the collections API? What would be 
the use case for this parameter at all then?


-----Original Message-----
From: Erick Erickson [mailto:erickerick...@gmail.com] 
Sent: Wednesday, 12 April 2017 19:36
To: solr-user
Subject: Re: What does the replication factor parameter in collections api do?

Really <3>. replicationFactor is used to set up your collection initially; you 
have to be able to change your topology afterwards, so it's ignored thereafter.

Once your replica is added, it's automatically made use of by the collection.
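
As an illustration, the sequence described above is just two Collections API
calls (collection, shard and node values are placeholders):

/admin/collections?action=MODIFYCOLLECTION&collection=mycollection&maxShardsPerNode=4
/admin/collections?action=ADDREPLICA&collection=mycollection&shard=shard1&node=host1:8983_solr

The replicationFactor shown in the Admin UI is not updated by this, which is
exactly the behaviour observed in this thread.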

On Wed, Apr 12, 2017 at 9:30 AM, Johannes Knaus  wrote:
> Hi,
>
> I am still quite new to Solr. I have the following setup:
> A SolrCloud setup with
> 38 nodes,
> maxShardsPerNode=2,
> implicit routing with routing field,
> and replication factor=2.
>
> Now, I want to add replica. This works fine by first increasing the 
> maxShardsPerNode to a higher number and then add replicas.
> So far, so good. I can confirm changes of the maxShardsPerNode parameter and 
> added replicas in the Admin UI.
> However, the Solr Admin UI still is showing me a replication factor of 2.
> I am a little confused about what the replicationfactor parameter actually 
> does in my case:
>
> 1) What does that mean? Does Solr make use of all replicas I have or only of 
> two?
> 2) Do I need to increase the replication factor value as well to really have 
> more replicas available and usable? If this is true, do I need to 
> restart/reload the collection newly upload configs to Zookeeper or anything 
> alike?
> 3) Or is replicationfactor just a parameter that is needed for the first 
> start of SolrCloud and can be ignored afterwards?
>
> Thank you very much for your help,
> All the best,
> Johannes
>


Re: AW: What does the replication factor parameter in collections api do?

2017-04-16 Thread Johannes Knaus
Thank you all very much for your answers. That definitely explains it.
All the best,
Johannes

> On 13.04.2017 at 17:03, Erick Erickson wrote:
> 
> bq: Why is it possible then to alter replicationFactor via
> MODIFYCOLLECTION in the collections API
> 
> Because MODIFYCOLLECTION just changes properties in the collection
> definition generically and replicationFactor just happens to be one.
> IOW there's no overarching reason.
> 
> It would be extra work to dis-allow that one case and possibly
> introduce errors without changing any functionality so nobody was
> willing to put in the effort.
> 
> Best,
> Erick
> 
>> On Thu, Apr 13, 2017 at 5:48 AM, Shawn Heisey  wrote:
>>> On 4/13/2017 3:22 AM, Johannes Knaus wrote:
>>> Ok. Thank you for your quick reply. Though I still feel a little
>>> uneasy. Why is it possible then to alter replicationFactor via
>>> MODIFYCOLLECTION in the collections API? What would be the use case
>>> for this parameter at all then?
>> 
>> If you use a very specific storage method for your indexes -- HDFS --
>> then replicationFactor has meaning beyond initial collection creation,
>> in conjunction with the "autoAddReplicas" feature.
>> 
>> https://cwiki.apache.org/confluence/display/solr/Running+Solr+on+HDFS#RunningSolronHDFS-AutomaticallyAddReplicasinSolrCloud
>> 
>> If you are NOT utilizing the very specific HDFS storage engine, then
>> everything you were told applies.  With standard storage mechanisms,
>> replicationFactor has zero meaning after initial collection creation,
>> and changing the value will have no effect.
>> 
>> Thanks,
>> Shawn
>> 


changed query behavior

2014-04-14 Thread Johannes Siegert

Hi,

I have updated my solr instance from 4.5.1 to 4.7.1.

Now my Solr query is failing some tests.

Query: q=*:*&fq=(title:((T&E)))&debug=true

Before the update:

rawquerystring: *:*
querystring: *:*
parsedquery: MatchAllDocsQuery(*:*)
parsedquery_toString: *:*
QParser: LuceneQParser
filter_queries: (title:((T&E)))
parsed_filter_queries: +title:t&e +title:t +title:e

...

After the update:

rawquerystring: *:*
querystring: *:*
parsedquery: MatchAllDocsQuery(*:*)
parsedquery_toString: *:*
QParser: LuceneQParser
filter_queries: (title:((T&E)))
parsed_filter_queries: +((title:t&e title:t)/no_coord) +title:e

...

Before the update the query delivered only one result. Now the query delivers 
three results.


Do you have any idea why the parsed_filter_queries is "+((title:t&e 
title:t)/no_coord) +title:e" instead of "+title:t&e +title:t +title:e"?


"title"-field definition:

<fieldType name="..." class="solr.TextField" positionIncrementGap="100" omitNorms="true">
  <analyzer type="index">
    <charFilter class="solr.MappingCharFilterFactory" mapping="mapping.txt"/>
    <tokenizer class="..."/>
    <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1"
            catenateWords="1" catenateNumbers="1" catenateAll="1" splitOnCaseChange="1"
            splitOnNumerics="1" preserveOriginal="1" stemEnglishPossessive="0"/>
    <!-- ... -->
  </analyzer>
  <analyzer type="query">
    <charFilter class="solr.MappingCharFilterFactory" mapping="mapping.txt"/>
    <tokenizer class="..."/>
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="false"/>
    <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1"
            catenateWords="0" catenateNumbers="0" catenateAll="0" splitOnCaseChange="0"
            splitOnNumerics="0" preserveOriginal="1"/>
    <!-- ... -->
  </analyzer>
</fieldType>


The default query operator is AND.

Thanks!

Johannes




Bug within the solr query parser (version 4.7.1)

2014-04-15 Thread Johannes Siegert

Hi,

I have updated my Solr instance from 4.5.1 to 4.7.1. Now the parsed 
query no longer seems to be correct.


Query: q=*:*&fq=title:T&E&debug=true

Before the update the parsed filter query is "+title:t&e +title:t 
+title:e". After the update the parsed filter query is "+((title:t&e 
title:t)/no_coord) +title:e". It seems like a bug within the query parser.


I have also validated the parsed filter query with the analysis 
component. The result was "+title:t&e +title:t +title:e".


The behavior is the same for all special characters that split words into two 
parts.


I use the following WordDelimiterFilter on query side:

<filter class="solr.WordDelimiterFilterFactory" generateWordParts="1"
        generateNumberParts="1" catenateWords="0" catenateNumbers="0"
        catenateAll="0" splitOnCaseChange="0" splitOnNumerics="0"
        preserveOriginal="1"/>


Thanks.

Johannes


Additional information:

Debug before the update:

rawquerystring: *:*
querystring: *:*
parsedquery: MatchAllDocsQuery(*:*)
parsedquery_toString: *:*
QParser: LuceneQParser
filter_queries: (title:((T&E)))
parsed_filter_queries: +title:t&e +title:t +title:e

...

Debug after the update:

rawquerystring: *:*
querystring: *:*
parsedquery: MatchAllDocsQuery(*:*)
parsedquery_toString: *:*
QParser: LuceneQParser
filter_queries: (title:((T&E)))
parsed_filter_queries: +((title:t&e title:t)/no_coord) +title:e

...

"title"-field definition:

<fieldType name="..." class="solr.TextField" positionIncrementGap="100" omitNorms="true">
  <analyzer type="index">
    <charFilter class="solr.MappingCharFilterFactory" mapping="mapping.txt"/>
    <tokenizer class="..."/>
    <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1"
            catenateWords="1" catenateNumbers="1" catenateAll="1" splitOnCaseChange="1"
            splitOnNumerics="1" preserveOriginal="1" stemEnglishPossessive="0"/>
    <!-- ... -->
  </analyzer>
  <analyzer type="query">
    <charFilter class="solr.MappingCharFilterFactory" mapping="mapping.txt"/>
    <tokenizer class="..."/>
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="false"/>
    <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1"
            catenateWords="0" catenateNumbers="0" catenateAll="0" splitOnCaseChange="0"
            splitOnNumerics="0" preserveOriginal="1"/>
    <!-- ... -->
  </analyzer>
</fieldType>



FilteredQuery with TermsFilter swallowing results after upgrade to solr 4.8

2014-05-14 Thread Johannes Schlumberger
Hi,
I am in the process of upgrading an extension I made to QueryComponent from
solr 3.4 to solr 4.8.
I am wrapping the query into a filteredquery together with a termsfilter
that encapsulates a lot of Terms for up to two fields (potentially 10s of
thousands, only two in my simple test case).
My extension worked fine in solr 34 and I have used it for years. After
upgrading to solr 4.8 and compiling the extension against the new source
(Termsfilter API changed a little bit in how you pass in the terms), I am
no longer getting any records back when running a query.
The same query not involving the filter returns the expected results. A
semantically equivalent query using a lot of OR clauses in the fq query
parameter works fine, but is about 10 times slower, so I would really like
to get the TermsFilter to work.
I printed out the Query in solr 34 and in solr 48 and they differ
(Unfortunately I do not know how to read these lines.):
solr 34: filtered(+(bedbathbeyond:portlandia | title_pst:portlandia^1.5 |
license_plate:portlandia | title_tst:portlandia^2.0 |
description_pst:portlandia^0.8 | description_tst:portlandia |
phone_number:portlandia |
reference_number:portlandia))->org.apache.lucene.search.TermsFilter@66c21442

solr 48: filtered(+(+(reference_number:portlandia |
title_tst:portlandia^2.0 | license_plate:portlandia |
phone_number:portlandia | bedbathbeyond:portlandia |
title_pst:portlandia^1.5 | description_tst:portlandia |
description_pst:portlandia^0.8)))->property_group_id:984678480
property_id:984678954

(The query info for the latter line was: INFO: [property_test] webapp=/solr
path=/select
params={fl=*+score&start=0&q=portlandia&qf=title_pst^1.5+title_tst^2+description_pst^0.8+description_tst+bedbathbeyond+phone_number+reference_number+license_plate&properties=984678954&wt=ruby&groups=984678480&fq=type:Property&fq=visibility_s:visible&rows=25&defType=edismax}
hits=0 status=0 QTime=5 )

I attached a copy of my source code, and marked the changes I made to the
code of QueryComponent with comments - maybe there is something obviously
wrong.
Any help or pointers are appreciated, also please let me know if I should
rather write to the dev list than the users list.
thanks,
Johannes


default query operator ignored by edismax query parser

2014-06-25 Thread Johannes Siegert

Hi,

I have defined the following edismax query parser:

name="defaults">100%name="defType">edismax0.01name="ps">100*:*name="q.op">ANDfield1^2.0 field2name="rows">10*




My search query looks like:

q=(word1 word2) OR (word3 word4)

Since I specified AND as default query operator, the query should match 
documents by ((word1 AND word2) OR (word3 AND word4)) but the query 
matches documents by ((word1 OR word2) OR (word3 OR word4)).


Could anyone explain the behaviour?

Thanks!

Johannes

P.S. The query q=(word1 word2) matches all documents by (word1 AND word2)


Re: default query operator ignored by edismax query parser

2014-06-25 Thread Johannes Siegert

Thanks Shawn!

In this case I will use operators everywhere.
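
For the query from the original mail, "operators everywhere" means spelling
the inner clauses out explicitly (a sketch of the workaround):

q=(word1 AND word2) OR (word3 AND word4)

so the parser no longer has to fall back on q.op for the implicit operators.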

Johannes


On 25.06.2014 15:09, Shawn Heisey wrote:

On 6/25/2014 1:05 AM, Johannes Siegert wrote:

I have defined the following edismax query parser:

mm=100%, defType=edismax, tie=0.01, ps=100, q.alt=*:*, q.op=AND,
qf=field1^2.0 field2, rows=10, fl=*



My search query looks like:

q=(word1 word2) OR (word3 word4)

Since I specified AND as default query operator, the query should match
documents by ((word1 AND word2) OR (word3 AND word4)) but the query
matches documents by ((word1 OR word2) OR (word3 OR word4)).

Could anyone explain the behaviour?

I believe that you are running into this bug:

https://issues.apache.org/jira/browse/SOLR-2649

It's a very old bug, coming up on three years.  The workaround is to not
use boolean operators at all, or to use operators EVERYWHERE so that
your intent is explicitly described.  It is not much of a workaround,
but it does work.

Thanks,
Shawn



wrong docFreq while executing query based on uniqueKey-field

2014-07-22 Thread Johannes Siegert

Hi.

My Solr index (version 4.7.2) has an id field:

<field name="id" ... />
...
<uniqueKey>id</uniqueKey>

The index will be updated once per hour.

I use the following query to retrieve some documents:

"q=id:2^2 id:1^1"

I would expect that document(2) should always be before document(1). But 
after many index updates, document(1) is before document(2).


With debug=true I could see the problem. Document(1) has a 
docFreq=2, while document(2) has a docFreq=1.


How can the docFreq of the uniqueKey field be higher than 1? Could 
anyone explain this behavior to me?


Thanks!

Johannes



NGramTokenizer influence to length normalization?

2014-08-08 Thread Johannes Siegert

Hi,

does the NGramTokenizer have an influence on length normalization?

Thanks.

Johannes


high memory usage with small data set

2014-01-29 Thread Johannes Siegert

Hi,

we are using Apache Solr Cloud within a production environment. When the 
maximum heap space is reached, Solr access times slow down for a short period 
because the garbage collector is working.


We use the following configuration:

- Apache Tomcat as webserver to run the Solr web application
- 13 indices with about 150 entries (300 MB)
- 5 server with one replication per index (5 GB max heap-space)
- All indices have the following caches
   - maximum document-cache-size is 4096 entries, all other indices 
have between 64 and 1536 entries
   - maximum query-cache-size is 1024 entries, all other indices have 
between 64 and 768
   - maximum filter-cache-size is 1536 entries, all other indices have 
between 64 and 1024

- the directory-factory-implementation is NRTCachingDirectoryFactory
- the index is updated once per hour (no auto commit)
- ca. 5000 requests per hour per server
- large filter-queries (up to 15000 bytes and 1500 boolean operations)
- many facet-queries (30%)

Behaviour:

Started with 512 MB heap space. Over several days the heap-space usage grew 
until the 5 GB were reached. At this moment the described problem 
occurs. From this time on the heap-space usage is between 50 and 90 
percent. No OutOfMemoryException occurs.


Questions:


1. Why does Solr use 5 GB ram, with this small amount of data?
2. Which impact does the large filter-queries have in relation to ram usage?

Thanks!

Johannes Siegert


Re: high memory usage with small data set

2014-02-05 Thread Johannes Siegert

Hi Erick,

thanks for your reply.

What exactly do you mean by "Do your used entries in your caches 
increase in parallel?"?


I update the indices every hour and commit the changes. So a new 
searcher with empty or autowarmed caches should be created and the old 
one should be removed.


Johannes

On 30.01.2014 15:08, Erick Erickson wrote:

Do your used entries in your caches increase in parallel? This would be the case
if you aren't updating your index and would explain it. BTW, take a look at your
cache statistics (from the admin page) and look at the cache hit ratios. If they
are very small (and my guess is that with 1,500 boolean operations, you aren't
getting significant re-use) then you're just wasting space, try the cache=false
option.
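
For reference, the cache=false option is set per filter with local params,
e.g. (the filter body is just a placeholder):

fq={!cache=false}field1:(... 1500 boolean clauses ...)

The filter is still applied, but no filterCache entry is created for it.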

Also, how are you measuring memory? It's sometimes confusing that virtual
memory can be include, see:
http://blog.thetaphi.de/2012/07/use-lucenes-mmapdirectory-on-64bit.html

Best,
Erick

On Wed, Jan 29, 2014 at 7:49 AM, Johannes Siegert
 wrote:

Hi,

we are using Apache Solr Cloud within a production environment. If the
maximum heap-space is reached the Solr access time slows down, because of
the working garbage collector for a small amount of time.

We use the following configuration:

- Apache Tomcat as webserver to run the Solr web application
- 13 indices with about 150 entries (300 MB)
- 5 server with one replication per index (5 GB max heap-space)
- All indices have the following caches
- maximum document-cache-size is 4096 entries, all other indices have
between 64 and 1536 entries
- maximum query-cache-size is 1024 entries, all other indices have
between 64 and 768
- maximum filter-cache-size is 1536 entries, all other i ndices have
between 64 and 1024
- the directory-factory-implementation is NRTCachingDirectoryFactory
- the index is updated once per hour (no auto commit)
- ca. 5000 requests per hour per server
- large filter-queries (up to 15000 bytes and 1500 boolean operations)
- many facet-queries (30%)

Behaviour:

Started with 512 MB heap space. Over several days the heap-space grow up,
until the 5 GB was reached. At this moment the described problem occurs.
 From this time on the heap-space-useage is between 50 and 90 percent. No
OutOfMemoryException occurs.

Questions:


1. Why does Solr use 5 GB ram, with this small amount of data?
2. Which impact does the large filter-queries have in relation to ram usage?

Thanks!

Johannes Siegert



solr-query with NOT and OR operator

2014-02-11 Thread Johannes Siegert

Hi,

my solr-request contains the following filter-query:

fq=((-(field1:value1)))+OR+(field2:value2).

I expect Solr to deliver documents matching ((-(field1:value1))) and 
documents matching (field2:value2).


But Solr delivers only documents that are the result of (field2:value2). 
I do receive several documents if I request only ((-(field1:value1))).


Thanks!

Johannes


Re: solr-query with NOT and OR operator

2014-02-11 Thread Johannes Siegert

Hi Jack,

thanks!

fq=((*:* -(field1:value1)))+OR+(field2:value2).

This is the solution.

Johannes

On 11.02.2014 17:22, Jack Krupansky wrote:
With so many parentheses in there, I wonder what you are really trying 
to do. Try expressing your query in simple English first so that we 
can understand your goal.


But generally, a purely negative nested query must have a *:* term to 
apply the exclusion against:


fq=((*:* -(field1:value1)))+OR+(field2:value2).

-- Jack Krupansky

-Original Message- From: Johannes Siegert
Sent: Tuesday, February 11, 2014 10:57 AM
To: solr-user@lucene.apache.org
Subject: solr-query with NOT and OR operator

Hi,

my solr-request contains the following filter-query:

fq=((-(field1:value1)))+OR+(field2:value2).

I expect solr deliver documents matching to ((-(field1:value1))) and
documents matching to (field2:value2).

But solr deliver only documents, that are the result of (field2:value2).
I receive several documents, if I request only for ((-(field1:value1))).

Thanks!

Johannes


--
Johannes Siegert
Softwareentwickler

Telefon:  0351 - 418 894 -73
Fax:  0351 - 418 894 -99
E-Mail:   johannes.sieg...@marktjagd.de
Xing: https://www.xing.com/profile/Johannes_Siegert2

Webseite: http://www.marktjagd.de
Blog: http://blog.marktjagd.de
Facebook: http://www.facebook.com/marktjagd
Twitter:  http://twitter.com/Marktjagd
__

Marktjagd GmbH | Schützenplatz 14 | D - 01067 Dresden

Geschäftsführung: Jan Großmann
Sitz Dresden | Amtsgericht Dresden | HRB 28678



Replication of a corrupt master index

2014-12-02 Thread Charra, Johannes

Hi,

If I have a master/slave setup and the master index gets corrupted, will the 
slaves realize they should not replicate from the master anymore, since the 
master does not have a newer index version?

I'm using Solr version 4.2.1.

Regards,
Johannes




AW: Replication of a corrupt master index

2014-12-02 Thread Charra, Johannes
Thanks for your response, Erick. 

Do you think it is possible to corrupt an index merely with HTTP requests? I've 
been using the aforementioned m/s setup for years now and have never seen a 
master failure.

I'm trying to think of scenarios where this setup (1 master, 4 slaves) might 
have a total outage. The master runs on a h/a cluster.

Regards,
Johannes

-----Original Message-----
From: Erick Erickson [mailto:erickerick...@gmail.com] 
Sent: Tuesday, 2 December 2014 15:54
To: solr-user@lucene.apache.org
Subject: Re: Replication of a corrupt master index

No. The master is the master and will always stay the master unless you change 
it. This is one of the reasons I really like to keep the original source around 
in case I ever have this problem.

Best,
Erick

On Tue, Dec 2, 2014 at 2:34 AM, Charra, Johannes 
 wrote:
>
> Hi,
>
> If I have a master/slave setup and the master index gets corrupted, will the 
> slaves realize they should not replicate from the master anymore, since the 
> master does not have a newer index version?
>
> I'm using Solr version 4.2.1.
>
> Regards,
> Johannes
>
>


looking for working example defType=term

2013-08-12 Thread Johannes Elsinghorst
Hi,
can anyone provide a working example (solrconfig.xml, schema.xml) using the 
TermQParserPlugin? I always get a NullPointerException on startup:
8920 [searcherExecutor-4-thread-1] ERROR org.apache.solr.core.SolrCore - 
java.lang.NullPointerException
   at 
org.apache.solr.search.TermQParserPlugin$1.parse(TermQParserPlugin.java:55)
   at org.apache.solr.search.QParser.getQuery(QParser.java:142)
   at 
org.apache.solr.handler.component.QueryComponent.prepare(QueryComponent.java:142)
   at 
org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:187)
   at 
org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:135)
   at org.apache.solr.core.SolrCore.execute(SolrCore.java:1904)
   at 
org.apache.solr.core.QuerySenderListener.newSearcher(QuerySenderListener.java:64)
   at org.apache.solr.core.SolrCore$5.call(SolrCore.java:1693)
   at java.util.concurrent.FutureTask$Sync.innerRun(Unknown Source)
   at java.util.concurrent.FutureTask.run(Unknown Source)
   at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(Unknown Source)
   at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
   at java.lang.Thread.run(Unknown Source)

solrconfig.xml (request handler defaults):
  echoParams = explicit
  defType    = term
  rows       = 10
  df         = id

Thanks,
Johannes




FW: looking for working example defType=term

2013-08-12 Thread Johannes Elsinghorst
Well, I couldn't get it to work, but maybe that's because I'm not a Solr expert. 
What I'm trying to do is:
I have an index with only one indexed field. This field is an id, so I don't 
want the standard query parser to try to break it up into tokens. On the client 
side I use SolrJ like this:
SolrQuery solrQuery = new SolrQuery().setQuery(""); QueryResponse 
queryResponse = getSolrServer().query(solrQuery);

I'd like to configure the TermQParserPlugin on the server side to minimize my 
queries.
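
In other words, the request would end up looking something like this (a sketch
based on the reply quoted below; the id value is made up):

q=*:*&fq={!term f=id}42

or, from SolrJ, solrQuery.setQuery("*:*") plus
solrQuery.addFilterQuery("{!term f=id}42").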

Johannes
-Original Message-
From: Erik Hatcher [mailto:erik.hatc...@gmail.com]
Sent: Montag, 12. August 2013 17:10
To: Johannes Elsinghorst
Subject: Re: looking for working example defType=term

How are you using the term query parser?   The term query parser requires a 
field to be specified.

I use it this way:

   q=*:*&fq={!term f=category}electronics

The "term" query parser would never make sense as a defType query parser, I 
don't think (you have to set the field through local params).

Erik


On Aug 12, 2013, at 11:01 , Johannes Elsinghorst wrote:

> Hi,
> can anyone provide a working example (solrconfig.xml,schema.xml) using the 
> TermQParserPlugin? I always get a Nullpointer-Exception on startup:
> 8920 [searcherExecutor-4-thread-1] ERROR org.apache.solr.core.SolrCore  û 
> java.lang.NullPointerException
>   at 
> org.apache.solr.search.TermQParserPlugin$1.parse(TermQParserPlugin.java:55)
>   at org.apache.solr.search.QParser.getQuery(QParser.java:142)
>   at 
> org.apache.solr.handler.component.QueryComponent.prepare(QueryComponent.java:142)
>   at 
> org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:187)
>   at 
> org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:135)
>   at org.apache.solr.core.SolrCore.execute(SolrCore.java:1904)
>   at 
> org.apache.solr.core.QuerySenderListener.newSearcher(QuerySenderListener.java:64)
>   at org.apache.solr.core.SolrCore$5.call(SolrCore.java:1693)
>   at java.util.concurrent.FutureTask$Sync.innerRun(Unknown Source)
>   at java.util.concurrent.FutureTask.run(Unknown Source)
>   at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(Unknown 
> Source)
>   at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
>   at java.lang.Thread.run(Unknown Source)
> 
> solarconfig.xml:
>  
>explicit
>   term
>   10
>   id
> 
> 
> Thanks,
> Johannes
> 
> 





Re: Multi CPU Cores

2011-10-15 Thread Johannes Goll
Did you try to submit multiple search requests in parallel? The Apache ab tool 
is a great tool to simulate simultaneous load (using -n and -c).
Johannes

On Oct 15, 2011, at 7:32 PM, Rob Brown  wrote:

> Hi,
> 
> I'm running Solr on a machine with 16 CPU cores, yet watching "top"
> shows that java is only apparently using 1 and maxing it out.
> 
> Is there anything that can be done to take advantage of more CPU cores?
> 
> Solr 3.4 under Tomcat
> 
> [root@solr01 ~]# java -version
> java version "1.6.0_20"
> OpenJDK Runtime Environment (IcedTea6 1.9.8)
> (rhel-1.22.1.9.8.el5_6-x86_64)
> OpenJDK 64-Bit Server VM (build 19.0-b09, mixed mode)
> 
> 
> top - 14:36:18 up 22 days, 21:54,  4 users,  load average: 1.89, 1.24,
> 1.08
> Tasks: 317 total,   1 running, 315 sleeping,   0 stopped,   1 zombie
> Cpu0  :  0.0%us,  0.0%sy,  0.0%ni, 99.6%id,  0.4%wa,  0.0%hi,  0.0%si,
> 0.0%st
> Cpu1  :  0.0%us,  0.0%sy,  0.0%ni,100.0%id,  0.0%wa,  0.0%hi,  0.0%si,
> 0.0%st
> Cpu2  :  0.0%us,  0.0%sy,  0.0%ni,100.0%id,  0.0%wa,  0.0%hi,  0.0%si,
> 0.0%st
> Cpu3  :  0.0%us,  0.0%sy,  0.0%ni,100.0%id,  0.0%wa,  0.0%hi,  0.0%si,
> 0.0%st
> Cpu4  :  0.0%us,  0.0%sy,  0.0%ni,100.0%id,  0.0%wa,  0.0%hi,  0.0%si,
> 0.0%st
> Cpu5  :  0.0%us,  0.0%sy,  0.0%ni,100.0%id,  0.0%wa,  0.0%hi,  0.0%si,
> 0.0%st
> Cpu6  : 99.6%us,  0.4%sy,  0.0%ni,  0.0%id,  0.0%wa,  0.0%hi,  0.0%si,
> 0.0%st
> Cpu7  :  0.0%us,  0.0%sy,  0.0%ni,100.0%id,  0.0%wa,  0.0%hi,  0.0%si,
> 0.0%st
> Cpu8  :  0.0%us,  0.0%sy,  0.0%ni,100.0%id,  0.0%wa,  0.0%hi,  0.0%si,
> 0.0%st
> Cpu9  :  0.0%us,  0.0%sy,  0.0%ni,100.0%id,  0.0%wa,  0.0%hi,  0.0%si,
> 0.0%st
> Cpu10 :  0.0%us,  0.0%sy,  0.0%ni,100.0%id,  0.0%wa,  0.0%hi,  0.0%si,
> 0.0%st
> Cpu11 :  0.0%us,  0.0%sy,  0.0%ni,100.0%id,  0.0%wa,  0.0%hi,  0.0%si,
> 0.0%st
> Cpu12 :  0.0%us,  0.0%sy,  0.0%ni,100.0%id,  0.0%wa,  0.0%hi,  0.0%si,
> 0.0%st
> Cpu13 :  0.7%us,  0.0%sy,  0.0%ni, 99.3%id,  0.0%wa,  0.0%hi,  0.0%si,
> 0.0%st
> Cpu14 :  0.0%us,  0.0%sy,  0.0%ni,100.0%id,  0.0%wa,  0.0%hi,  0.0%si,
> 0.0%st
> Cpu15 :  0.0%us,  0.0%sy,  0.0%ni,100.0%id,  0.0%wa,  0.0%hi,  0.0%si,
> 0.0%st
> Mem:  132088928k total, 23760584k used, 108328344k free,   318228k
> buffers
> Swap: 25920868k total,0k used, 25920868k free, 18371128k cached
> 
>  PID USER  PR  NI  VIRT  RES  SHR S %CPU %MEMTIME+
> COMMAND   
>   
>
> 4466 tomcat20   0 31.2g 4.0g 171m S 101.0  3.2   2909:38
> java  
>   
>   
> 6495 root  15   0 42416 3892 1740 S  0.4  0.0   9:34.71
> openvpn   
>   
>
> 11456 root  16   0 12892 1312  836 R  0.4  0.0   0:00.08
> top   
>   
>
>1 root  15   0 10368  632  536 S  0.0  0.0   0:04.69
> init 
> 
> 
> 


Re: Multi CPU Cores

2011-10-16 Thread Johannes Goll
Try using -XX:+UseParallelGC as a VM option. 

Johannes

On Oct 16, 2011, at 7:51 AM, Ken Krugler  wrote:

> 
> On Oct 16, 2011, at 1:44pm, Rob Brown wrote:
> 
>> Looks like I checked the load during a quiet period, ab -n 1 -c 1000
>> saw a decent 40% load on each core.
>> 
>> Still a little confused as to why 1 core stays at 100% constantly - even
>> during the quiet periods?
> 
> Could be background GC, depending on what you've got your JVM configured to 
> use.
> 
> Though that shouldn't stay at 100% for very long.
> 
> -- Ken
> 
> 
>> -Original Message-
>> From: Johannes Goll 
>> Reply-to: solr-user@lucene.apache.org
>> To: solr-user@lucene.apache.org 
>> Subject: Re: Multi CPU Cores
>> Date: Sat, 15 Oct 2011 21:30:11 -0400
>> 
>> Did you try to submit multiple search requests in parallel? The apache ab 
>> tool is great tool to simulate simultaneous load using (-n and -c).
>> Johannes
>> 
>> On Oct 15, 2011, at 7:32 PM, Rob Brown  wrote:
>> 
>>> Hi,
>>> 
>>> I'm running Solr on a machine with 16 CPU cores, yet watching "top"
>>> shows that java is only apparently using 1 and maxing it out.
>>> 
>>> Is there anything that can be done to take advantage of more CPU cores?
>>> 
>>> Solr 3.4 under Tomcat
>>> 
>>> [root@solr01 ~]# java -version
>>> java version "1.6.0_20"
>>> OpenJDK Runtime Environment (IcedTea6 1.9.8)
>>> (rhel-1.22.1.9.8.el5_6-x86_64)
>>> OpenJDK 64-Bit Server VM (build 19.0-b09, mixed mode)
>>> 
>>> 
>>> top - 14:36:18 up 22 days, 21:54,  4 users,  load average: 1.89, 1.24,
>>> 1.08
>>> Tasks: 317 total,   1 running, 315 sleeping,   0 stopped,   1 zombie
>>> Cpu0  :  0.0%us,  0.0%sy,  0.0%ni, 99.6%id,  0.4%wa,  0.0%hi,  0.0%si,
>>> 0.0%st
>>> Cpu1  :  0.0%us,  0.0%sy,  0.0%ni,100.0%id,  0.0%wa,  0.0%hi,  0.0%si,
>>> 0.0%st
>>> Cpu2  :  0.0%us,  0.0%sy,  0.0%ni,100.0%id,  0.0%wa,  0.0%hi,  0.0%si,
>>> 0.0%st
>>> Cpu3  :  0.0%us,  0.0%sy,  0.0%ni,100.0%id,  0.0%wa,  0.0%hi,  0.0%si,
>>> 0.0%st
>>> Cpu4  :  0.0%us,  0.0%sy,  0.0%ni,100.0%id,  0.0%wa,  0.0%hi,  0.0%si,
>>> 0.0%st
>>> Cpu5  :  0.0%us,  0.0%sy,  0.0%ni,100.0%id,  0.0%wa,  0.0%hi,  0.0%si,
>>> 0.0%st
>>> Cpu6  : 99.6%us,  0.4%sy,  0.0%ni,  0.0%id,  0.0%wa,  0.0%hi,  0.0%si,
>>> 0.0%st
>>> Cpu7  :  0.0%us,  0.0%sy,  0.0%ni,100.0%id,  0.0%wa,  0.0%hi,  0.0%si,
>>> 0.0%st
>>> Cpu8  :  0.0%us,  0.0%sy,  0.0%ni,100.0%id,  0.0%wa,  0.0%hi,  0.0%si,
>>> 0.0%st
>>> Cpu9  :  0.0%us,  0.0%sy,  0.0%ni,100.0%id,  0.0%wa,  0.0%hi,  0.0%si,
>>> 0.0%st
>>> Cpu10 :  0.0%us,  0.0%sy,  0.0%ni,100.0%id,  0.0%wa,  0.0%hi,  0.0%si,
>>> 0.0%st
>>> Cpu11 :  0.0%us,  0.0%sy,  0.0%ni,100.0%id,  0.0%wa,  0.0%hi,  0.0%si,
>>> 0.0%st
>>> Cpu12 :  0.0%us,  0.0%sy,  0.0%ni,100.0%id,  0.0%wa,  0.0%hi,  0.0%si,
>>> 0.0%st
>>> Cpu13 :  0.7%us,  0.0%sy,  0.0%ni, 99.3%id,  0.0%wa,  0.0%hi,  0.0%si,
>>> 0.0%st
>>> Cpu14 :  0.0%us,  0.0%sy,  0.0%ni,100.0%id,  0.0%wa,  0.0%hi,  0.0%si,
>>> 0.0%st
>>> Cpu15 :  0.0%us,  0.0%sy,  0.0%ni,100.0%id,  0.0%wa,  0.0%hi,  0.0%si,
>>> 0.0%st
>>> Mem:  132088928k total, 23760584k used, 108328344k free,   318228k
>>> buffers
>>> Swap: 25920868k total,0k used, 25920868k free, 18371128k cached
>>> 
>>> PID USER  PR  NI  VIRT  RES  SHR S %CPU %MEMTIME+
>>> COMMAND 
>>> 
>>>
>>> 4466 tomcat20   0 31.2g 4.0g 171m S 101.0  3.2   2909:38
>>> java
>>> 
>>>   
>>> 6495 root  15   0 42416 3892 1740 S  0.4  0.0   9:34.71
>>> openvpn 
>>> 
>>>
>>> 11456 root  16   0 12892 1312  836 R  0.4  0.0   0:00.08
>>> top 
>>> 
>>>
>>>  1 root  15   0 10368  632  536 S  0.0  0.0   0:04.69
>>> init 
>>> 
>>> 
>>> 
>> 
> 
> --
> Ken Krugler
> +1 530-210-6378
> http://bixolabs.com
> custom big data solutions & training
> Hadoop, Cascading, Mahout & Solr
> 
> 
> 


Re: Multi CPU Cores

2011-10-16 Thread Johannes Goll
we use the following in production

java -server -XX:+UseParallelGC -XX:+AggressiveOpts
-XX:+DisableExplicitGC -Xms3G -Xmx40G -Djetty.port=
-Dsolr.solr.home= -jar start.jar

more information
http://www.oracle.com/technetwork/java/javase/tech/vmoptions-jsp-140102.html

Johannes


Re: Multi CPU Cores

2011-10-17 Thread Johannes Goll
Yes, same thing. This was for the Jetty servlet container, not Tomcat. I would 
refer to the Tomcat documentation on how to modify/configure the Java runtime 
environment (JRE) arguments for your running instance.
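
For a Tomcat deployment, a minimal sketch (assuming a standard Tomcat layout;
the heap sizes are made up) is to put the options into CATALINA_OPTS, e.g. in
bin/setenv.sh:

CATALINA_OPTS="-server -XX:+UseParallelGC -Xms1g -Xmx4g"

Tomcat picks this file up on startup, so the flags end up on the JVM that runs
the Solr webapp.
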
Johannes

On Oct 17, 2011, at 4:01 AM, Robert Brown  wrote:

> Where exactly do you set this up?  We're running Solr3.4 under tomcat,
> OpenJDK 1.6.0.20
> 
> btw, is the JRE just a different name for the VM?  Apologies for such a
> newbie Java question.
> 
> 
> 
> On Sun, 16 Oct 2011 12:51:44 -0400, Johannes Goll
>  wrote:
>> we use the the following in production
>> 
>> java -server -XX:+UseParallelGC -XX:+AggressiveOpts
>> -XX:+DisableExplicitGC -Xms3G -Xmx40G -Djetty.port=
>> -Dsolr.solr.home= jar start.jar
>> 
>> more information
>> http://www.oracle.com/technetwork/java/javase/tech/vmoptions-jsp-140102.html
>> 
>> Johannes
> 


Re: Hierarchical faceting in UI

2012-01-23 Thread Johannes Goll
Another way is to store the original hierarchy in a SQL database (in
the form: id, parent_id, name, level) and, in the Lucene index, store the
complete hierarchy from the root level down to the leaf node for each
document in one field, using the ids of the SQL database, e.g.
"1 13 32 42 23 12". In that way you can get documents at any level of
the hierarchy. You can use the SQL database to dynamically expand the
tree by building facet queries to fetch the document collections of
child nodes.
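
As an illustration of the queries this enables (the field name hierarchy_path
is a placeholder and assumes the id path is whitespace-tokenized): a document
stored with hierarchy_path:"1 13 32" can be found from any of its ancestor
nodes, e.g.

fq=hierarchy_path:13&facet=true&facet.field=hierarchy_path

returns all documents below node 13 together with per-node counts, while the
SQL table (id, parent_id, name, level) supplies the labels and the children of
node 13 for the next level of the tree.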

Johannes



2012/1/23  :
>
> On Mon, 23 Jan 2012 14:33:00 -0800 (PST), Yuhao 
> wrote:
>> Programmatically, something like this might work: for each facet field,
>> add another hidden field that identifies its parent.  Then, program
>> additional logic in the UI to show only the facet terms at the currently
>> selected level.  For example, if one filters on "cat:electronics", the
> new
>> UI logic would apply the additional filter "cat_parent:electronics".
> Can
>> this be done?
>
> Yes. This is how I do it.
>
>> Would it be a lot of work?
> No. Its not a lot of work, simply represent your hierarchy as parent/child
> relations in the document fields and in your UI drill down by issuing new
> faceted searches. Use the current facet (tree level) as the parent:
> in the next query. Its much easier than other suggestions for this.
>
>> Is there a better way?
> Not in my opinion, there isn't. This is the simplest to implement and
> understand.
>
>>
>> By the way, Flamenco (another faceted browser) has built-in support for
>> hierarchies, and it has worked well for my data in this aspect (but less
>> well than Solr in others).  I'm looking for the same kind of
> hierarchical
>> UI feature in Solr.


AW: Preferred query notation for alternative field values

2012-11-28 Thread Charra, Johannes
Thanks for the hint. You are right: Both queries are identical after parsing.

>>> -----Original Message-----
>>> From: Upayavira [mailto:u...@odoko.co.uk]
>>> Sent: Wednesday, 28 November 2012 12:04
>>> To: solr-user@lucene.apache.org
>>> Subject: Re: Preferred query notation for alternative field values
>>> 
>>> Use debugQuery=true to see the format of the parsed query.
>>> 
>>> Solr will parse the query that you provide into Lucene Query objects, which 
>>> are
>>> then used to execute the query. The parsed query info provided by
>>> debugQuery=true is basically these Query objects converted back into a 
>>> string
>>> representation, showing exactly what the query was parsed into.
>>> 
>>> I bet you they are both parsed to more or less the same thing, and thus no 
>>> real
>>> impact on query time.
>>> 
>>> Upayavira
>>> 
>>> On Wed, Nov 28, 2012, at 10:54 AM, Charra, Johannes wrote:
>>> >
>>> > Hi all,
>>> >
>>> > Is there any reason to prefer a query
>>> >
>>> > field:value1 OR field:value2 OR field:value3 OR field:value4
>>> >
>>> > over
>>> >
>>> > field:(value1 OR value2 OR value3 OR value4)
>>> >
>>> > in terms of performance? From what I perceive, there is no difference,
>>> > so I'd prefer the second query for readability reasons.
>>> >
>>> > Regards,
>>> > Johannes


Index-time synonyms and trailing wildcard issue

2013-02-13 Thread Johannes Rodenwald
Hi,

I use Solr 3.6.0 with a synonym filter as the last filter at index time, using 
a list of stemmed terms. When i do a wildcard search that matches a part of an 
entry on the synonym list, the synonyms found are used by solr to generate the 
search results. I am trying to disable that behaviour, but with no success.

Example:

Stemmed synonyms: 
apfelsin, orang

Search term:
apfel*

Matches:
Apfelkuchen, Apfelsaft, Apfelsine... (good, i want these matches)
Orange (bad, i dont want this match)

My questions are:
- Why does the synonym filter react on a wildcard query? For it is not a 
multiterm-aware component (see 
http://lucene.apache.org/solr/api-3_6_1/org/apache/solr/analysis/MultiTermAwareComponent.html)
- How can i disable this behaviour, so that "Orange" is no longer returned by 
the query for "apfel*"?

Regards,

Johannes


Re: Index-time synonyms and trailing wildcard issue

2013-02-14 Thread Johannes Rodenwald
Hello Jack,

Thanks for your answer, it helped me gain a deeper understanding of what happens
at index time and find a solution myself:

It seems that putting the synonym filter in both filter chains (index and
query), setting expand="false", and putting the desired synonym first in the
line, does the trick:
Synonyms line (reversed order!):
orange, apfelsine

All documents containing "apfelsine" are now mapped to "orange", so there are
no more documents containing "apfelsine" that would match a wildcard query for
"apfel*"  ("Apfelsine" is a true synonym for "Orange" in German, meaning
"chinese apple". "Apfel" = apple, which shouldn't match oranges).

Problem solved, thanks again for the help!

Johannes Rodenwald 

----- Original Mail -----
From: "Jack Krupansky" 
To: solr-user@lucene.apache.org
Sent: Wednesday, 13 February 2013 17:17:40
Subject: Re: Index-time synonyms and trailing wildcard issue

By doing synonyms at index time, you cause "apfelsin" to be added to 
documents that contain only "orang", so of course documents that previously 
only contained "orang" will now match for "apfelsin" or any term query that 
matches "apfelsin", such as a wildcard. At query time, Lucene cannot tell 
whether your original document contained "apfelsin" or if "apfelsin" was 
added when the document was indexed due to an index-time synonym.

Solution: Either disable index time synonyms, or have a parallel field (via 
copyField) that does not have the index-time synonyms.

But... perhaps you should clarify what you really intend to happen with 
these pseudo-synonyms.

-- Jack Krupansky




Re: Solr Grouping and empty fields

2013-02-22 Thread Johannes Rodenwald
Hi Oussama,

If you have only a few distinct, unchanging values in the field that you group 
upon, you could implement a FilterQuery (query parameter "fq") and add it to 
the query, allowing all valid values, but not an empty field. For example:

fq=my_grouping_string_field:( value_a OR value_b OR value_c OR value_d ) 

If you use SOLR 4.x, you should be able to group upon an integer field, 
allowing a range filter:
(I still work with 3.6, which can only group on string fields, so I didn't test
this one)

fq=my_grouping_integer_field:[1 TO *]

--
Johannes Rodenwald 


----- Original Mail -----
From: "Oussama Jilal" 
To: solr-user@lucene.apache.org
Sent: Friday, 22 February 2013 12:32:13
Subject: Solr Grouping and empty fields

Hi,

I need to group some results in solr based on a field, but I don't want 
documents having that field empty to be grouped together, does anyone 
know how to achieve that ?

-- 
Oussama Jilal



Update Solr Schema To Store Field

2012-02-01 Thread Johannes Goll
Hi,

I am running apache-solr-3.1.0 and would like to change a field
attribute from stored="false" to
stored="true".

I have several hundred cores that have been indexed without storing
the field which is fine
as I only would like to retrieve the value for new data that I plan to
index with the
updated schema.

My question is whether this change affects the query behavior for the
existing indexed
documents which were loaded with stored ="false"

Thanks a lot,
Johannes


Re: summing facets on a specific field

2012-02-06 Thread Johannes Goll
you can use the StatsComponent

http://wiki.apache.org/solr/StatsComponent

with stats=true&stats.price=category&stats.facet=category

and pull the sum fields from the resulting stats facets.

Johannes

2012/2/5 Paul Kapla :
> Hi everyone,
> I'm pretty new to solr and I'm not sure if this can even be done. Is there
> a way to sum a specific field per each item in a facet. For example, you
> have an ecommerce site that has the following documents:
>
> id,category,name,price
> 1,books,'solr book', $10.00
> 2,books,'lucene in action', $12.00
> 3.video, 'cool video', $20.00
>
> so instead of getting (when faceting on category)
> books(2)
> video(1)
>
> I'd like to get:
> books ($22)
> video ($20)
>
> Is this something that can be even done? Any feedback would be much
> appreciated.



-- 
Dipl.-Ing.(FH)
Johannes Goll
211 Curry Ford Lane
Gaithersburg, Maryland 20878
USA


Re: summing facets on a specific field

2012-02-06 Thread Johannes Goll
I meant

stats=true&stats.field=price&stats.facet=category
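
In SolrJ the same sums can be read from the stats response. A minimal sketch
(class names per SolrJ 3.6+/4.x; the core URL and the price/category fields are
assumptions taken from the example above, not tested code):

import java.util.List;
import java.util.Map;

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.impl.HttpSolrServer;
import org.apache.solr.client.solrj.response.FieldStatsInfo;
import org.apache.solr.client.solrj.response.QueryResponse;

public class StatsSumExample {
    public static void main(String[] args) throws Exception {
        HttpSolrServer solr = new HttpSolrServer("http://localhost:8983/solr");
        SolrQuery query = new SolrQuery("*:*");
        query.setRows(0);                     // only the stats are needed
        query.set("stats", true);
        query.set("stats.field", "price");
        query.set("stats.facet", "category");

        QueryResponse response = solr.query(query);
        FieldStatsInfo priceStats = response.getFieldStatsInfo().get("price");

        // one FieldStatsInfo per category value; getSum() holds the summed price
        Map<String, List<FieldStatsInfo>> facets = priceStats.getFacets();
        for (FieldStatsInfo perCategory : facets.get("category")) {
            System.out.println(perCategory.getName() + " -> " + perCategory.getSum());
        }
    }
}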

2012/2/6 Johannes Goll :
> you can use the StatsComponent
>
> http://wiki.apache.org/solr/StatsComponent
>
> with stats=true&stats.price=category&stats.facet=category
>
> and pull the sum fields from the resulting stats facets.
>
> Johannes
>
> 2012/2/5 Paul Kapla :
>> Hi everyone,
>> I'm pretty new to solr and I'm not sure if this can even be done. Is there
>> a way to sum a specific field per each item in a facet. For example, you
>> have an ecommerce site that has the following documents:
>>
>> id,category,name,price
>> 1,books,'solr book', $10.00
>> 2,books,'lucene in action', $12.00
>> 3.video, 'cool video', $20.00
>>
>> so instead of getting (when faceting on category)
>> books(2)
>> video(1)
>>
>> I'd like to get:
>> books ($22)
>> video ($20)
>>
>> Is this something that can be even done? Any feedback would be much
>> appreciated.
>
>
>
> --
> Dipl.-Ing.(FH)
> Johannes Goll
> 211 Curry Ford Lane
> Gaithersburg, Maryland 20878
> USA



-- 
Dipl.-Ing.(FH)
Johannes Goll
211 Curry Ford Lane
Gaithersburg, Maryland 20878
USA


Re: UI

2012-05-21 Thread Johannes Goll
yes, I am using this library and it works perfectly so far. If
something does not work you can just modify it
http://code.google.com/p/solr-php-client/

Johannes
2012/5/21 Tolga :
> Hi,
>
> Can you recommend a good PHP UI to search? Is SolrPHPClient good?


Solr 1.4.1 stats component count not matching facet count for multi valued field

2010-12-23 Thread Johannes Goll
Hi,

I have a facet field called option which may be multi-valued and
a weight field which is single-valued.

When I use the Solr 1.4.1 stats component with a facet field, i.e.

q=*:*&version=2.2&stats=true&
stats.field=weight&stats.facet=option

I get a conflicting stats count result (count = 1)

when compared with the faceting counts obtained by

q=*:*&version=2.2&facet=true&facet.field=option

I would expect the same count for either method.

This happens if multiple values are stored in the options field.

It seems that for multiple values only the last entered value is being
considered in the stats component? What am I doing wrong here?

Thanks,
Johannes


solrconfig luceneMatchVersion 2.9.3

2011-01-06 Thread Johannes Goll
Hi,

our index files have been created using Lucene 2.9.3 and solr 1.4.1.

I am trying to use a patched version of the current trunk (solr 1.5.0 ? ).
The patched version works fine with newly generated index data but
not with our existing data:

After adjusting the solrconfig.xml - I added the line

  <luceneMatchVersion>LUCENE_40</luceneMatchVersion>

also tried

  <luceneMatchVersion>LUCENE_30</luceneMatchVersion>

I am getting the following exception

"java.lang.RuntimeException:
org.apache.lucene.index.IndexFormatTooOldException:
Format version is not supported in file '_q.fdx': 1 (needs to be
between 2 and 2)"

When I try to change it to

  <luceneMatchVersion>LUCENE_29</luceneMatchVersion>

or

  <luceneMatchVersion>2.9</luceneMatchVersion>

or

  <luceneMatchVersion>2.9.3</luceneMatchVersion>

I am getting

"SEVERE: org.apache.solr.common.SolrException: Invalid luceneMatchVersion
'2.9', valid values are: [LUCENE_30, LUCENE_31, LUCENE_40, LUCENE_CURRENT]
or a string in format 'V.V'"

Do you know a way to make this work with Lucene version 2.9.3 ?

Thanks,
Johannes


Re: solrconfig luceneMatchVersion 2.9.3

2011-01-07 Thread Johannes Goll
according to
http://www.mail-archive.com/solr-user@lucene.apache.org/msg40491.html

there is no more trunk support for 2.9 indexes.

So I tried the suggested solution to execute an optimize to convert a 2.9.3
index to a 3.x index.

However, when I tried to the optimize a 2.9.3 index using the Solr 4.0 trunk
version with luceneMatchVersion set to LUCENE_30 in the solrconfig.xml,
I am getting

SimplePostTool: POSTing file optimize.xml
SimplePostTool: FATAL: Solr returned an error: Severe errors in solr
configuration.  Check your log files for more detailed information on what
may be wrong.  -
java.lang.RuntimeException:
org.apache.lucene.index.IndexFormatTooOldException: Format version is not
supported in file '_0.fdx': 1 (needs to be between 2 and 2). This version of
Lucene only supports indexes created with release 3.0 and later.

Is there any other mechanism for converting index files to 3.x?



2011/1/6 Johannes Goll 

> Hi,
>
> our index files have been created using Lucene 2.9.3 and solr 1.4.1.
>
> I am trying to use a patched version of the current trunk (solr 1.5.0 ? ).
> The patched version works fine with newly generated index data but
> not with our existing data:
>
> After adjusting the solrconfig.xml - I added the line
>
>   <luceneMatchVersion>LUCENE_40</luceneMatchVersion>
>
> also tried
>
>   <luceneMatchVersion>LUCENE_30</luceneMatchVersion>
>
> I am getting the following exception
>
> "java.lang.RuntimeException: 
> org.apache.lucene.index.IndexFormatTooOldException:
> Format version is not supported in file '_q.fdx': 1 (needs to be between 2 
> and 2)"
>
> When I try to change it to
>
>   <luceneMatchVersion>LUCENE_29</luceneMatchVersion>
>
> or
>
>   <luceneMatchVersion>2.9</luceneMatchVersion>
>
> or
>
>   <luceneMatchVersion>2.9.3</luceneMatchVersion>
>
> I am getting
>
> "SEVERE: org.apache.solr.common.SolrException: Invalid luceneMatchVersion
> '2.9', valid values are: [LUCENE_30, LUCENE_31, LUCENE_40, LUCENE_CURRENT]
> or a string in format 'V.V'"
>
> Do you know a way to make this work with Lucene version 2.9.3 ?
>
> Thanks,
> Johannes
>



-- 
Johannes Goll
211 Curry Ford Lane
Gaithersburg, Maryland 20878


Re: Tuning StatsComponent

2011-01-13 Thread Johannes Goll
What field type do you recommend for a  float stats.field for optimal Solr
1.4.1 StatsComponent performance ?

float, pfloat or tfloat ?

Do you recommend to index the field ?


2011/1/12 stockii 

>
> my field Type  is "double" maybe "sint" is better ? but i need double ...
> =(
> --
> View this message in context:
> http://lucene.472066.n3.nabble.com/Tuning-StatsComponent-tp2225809p2241903.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>


Re: Adding weightage to the facets count

2011-01-25 Thread Johannes Goll
Hi Siva,

try using the Solr Stats Component
http://wiki.apache.org/solr/StatsComponent

similar to
select/?&q=*:*&stats=true&stats.field={your-weight-field}&stats.facet={your-facet-field}

and get the sum field from the response. You may need to resort the weighted
facet counts to get a descending list of facet counts.

Note, there is a bug for using the Stats Component with multi-valued facet
fields.

For details see
https://issues.apache.org/jira/browse/SOLR-1782

Johannes

2011/1/24 Chris Hostetter 

>
> : prod1 has tag called “Light Weight” with weightage 20,
> : prod2 has tag called “Light Weight” with weightage 100,
> :
> : If i get facet for “Light Weight” , i will get Light Weight (2) ,
> : here i need to consider the weightage in to account, and the result will
> be
> : Light Weight (120)
> :
> : How can we achieve this?Any ideas are really helpful.
>
>
> It's not really possible with Solr out of the box.  Faceting is fast and
> efficient in Solr because it's all done using set intersections (and most
> of the sets can be kept in ram very compactly and reused).  For what you
> are describing you'd need to not only associate a weighted payload with
> every TermPosition, but also factor that weight in when doing the
> faceting, which means efficient set operations are now out the window.
>
> If you know java it would be probably be possible to write a custom
> SolrPlugin (a SearchComponent) to do this type of faceting in special
> cases (assuming you indexed in a particular way) but i'm not sure off hte
> top of my head how well it would scale -- the basic algo i'm thinking of
> is (after indexing each facet term wit ha weight payload) to iterate over
> the DocSet of all matching documents in parallel with an iteration over
> a TermPositions, skipping ahead to only the docs that match the query, and
> recording the sum of the payloads for each term.
>
> Hmmm...
>
> except TermPositions iterates over (term, doc, positions) tuples,
> so you would have to iterate over every term, and for every term then loop
> over all matching docs ... like i said, not sure how efficient it would
> wind up being.
>
> You might be happier all around if you just do some sampling -- store the
> tag+weight pairs so that they can be retrieved with each doc, and then
> when you get your top facet constraints back, look at the first page of
> results, and figure out what the sun "weight" is for each of those
> constraints based solely on the page#1 results.
>
> I've had happy users using a similar approach in the past.
>
> -Hoss




-- 
Johannes Goll
211 Curry Ford Lane
Gaithersburg, Maryland 20878


Re: solr upgrade question

2011-03-31 Thread Johannes Goll
Hi Alexander,

I posted the same question a few months ago. The only solution that came up
was to regenerate the index files using the new version. How did you do
this exactly with
Luke 1.0.1? Would you mind sharing some of that magic?

Best,
Johannes


2011/3/31 Alexander Aristov 

> Didn't get any responses.
>
> But I tried luke 1.0.1 and it did the magic. I run optimization and after
> that solr got up.
>
> Best Regards
> Alexander Aristov
>
>
> On 30 March 2011 15:47, Alexander Aristov  >wrote:
>
> > People
> >
> > Is there a way to upgrade an existing index from solr 1.4 to solr 4 (trunk)?
> When
> > I configured solr 4 and launched it, it complained about an incorrect lucene
> file
> > version (3 instead of old 2)
> >
> > Are there any procedures to convert index?
> >
> >
> > Best Regards
> > Alexander Aristov
> >
>


apache-solr-3.1 slow stats component queries

2011-04-05 Thread Johannes Goll
Hi,

thank you for making the new apache-solr-3.1 available.

I have installed the version from

http://apache.tradebit.com/pub//lucene/solr/3.1.0/

and am running into very slow stats component queries (~ 1 minute)
for fetching the computed sum of the stats field

url: ?q=*:*&start=0&rows=0&stats=true&stats.field=weight

QTime: 52825

#documents: 78,359,699
total RAM: 256G
vm arguments:  -server -xmx40G

the stats.field specification is as follows:

<field name="weight" type="..." indexed="..." stored="false" required="true" multiValued="false" default="1"/>
filter queries that narrow down the #docs help to reduce it -
QTime seems to be proportional to the number of docs being returned
by a filter query.
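
(For example, restricting the sum to a subset before aggregating, with a
made-up filter field: ?q=*:*&fq=dataset_id:123&rows=0&stats=true&stats.field=weight)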

Is there any way to improve the performance of such stats queries ?
Caching only helped to improve the filter query performance but if
larger subsets are being returned, QTime increases unacceptably.

Since I only need the sum and not the STD or sumsOfSquares/Min/Max,
I have created a custom 3.1 version that does only return the sum. But this
only slightly improved the performance. Of course I could somehow cache
the larger sum queries on the client side but I want to do this only as a
last resort.

Thank you very much in advance for any ideas/suggestions.

Johannes


Re: apache-solr-3.1 slow stats component queries

2011-04-18 Thread Johannes Goll
any ideas why in this case the stats summaries are so slow  ?  Thank you
very much in advance for any ideas/suggestions. Johannes

2011/4/5 Johannes Goll 

> Hi,
>
> thank you for making the new apache-solr-3.1 available.
>
> I have installed the version from
>
> http://apache.tradebit.com/pub//lucene/solr/3.1.0/
>
> and am running into very slow stats component queries (~ 1 minute)
> for fetching the computed sum of the stats field
>
> url: ?q=*:*&start=0&rows=0&stats=true&stats.field=weight
>
> QTime: 52825
>
> #documents: 78,359,699
> total RAM: 256G
> vm arguments:  -server -xmx40G
>
> the stats.field specification is as follows:
> <field name="weight" type="..." indexed="..." stored="false" required="true" multiValued="false"
> default="1"/>
>
> filter queries that narrow down the #docs help to reduce it -
> QTime seems to be proportional to the number of docs being returned
> by a filter query.
>
> Is there any way to improve the performance of such stats queries ?
> Caching only helped to improve the filter query performance but if
> larger subsets are being returned, QTime increases unacceptably.
>
> Since I only need the sum and not the STD or sumsOfSquares/Min/Max,
> I have created a custom 3.1 version that does only return the sum. But this
> only slightly improved the performance. Of course I could somehow cache
> the larger sum queries on the client side but I want to do this only as a
> last resort.
>
> Thank you very much in advance for any ideas/suggestions.
>
> Johannes
>
>


-- 
Johannes Goll
211 Curry Ford Lane
Gaithersburg, Maryland 20878


Re: apache-solr-3.1 slow stats component queries

2011-05-05 Thread Johannes Goll
Hi,

I bench-marked the slow stats queries (6 point estimate) using the same
hardware on an index of size 104M. We use a Solr/Lucene 3.1-mod which
returns only the sum and count for statistics component results. Solr/Lucene
is run on jetty.

The relationship between query time and set of found documents is linear
when using the stats component (R^2 0.99). I guess this is expected as the
application needs to scan/sum-up the stat field for all matching documents?

Are there any plans for caching stat results for a certain stat field along
with the documents that match a filter query ? Any other ideas that could
help to improve this (hardware/software configuration) ?  Even for a subset
of 10M entries, the stat search takes on the order of 10 seconds.

Thanks in advance.
Johannes



2011/4/18 Johannes Goll 

> any ideas why in this case the stats summaries are so slow  ?  Thank you
> very much in advance for any ideas/suggestions. Johannes
>
>
> 2011/4/5 Johannes Goll 
>
>> Hi,
>>
>> thank you for making the new apache-solr-3.1 available.
>>
>> I have installed the version from
>>
>> http://apache.tradebit.com/pub//lucene/solr/3.1.0/
>>
>> and am running into very slow stats component queries (~ 1 minute)
>> for fetching the computed sum of the stats field
>>
>> url: ?q=*:*&start=0&rows=0&stats=true&stats.field=weight
>>
>> QTime: 52825
>>
>> #documents: 78,359,699
>> total RAM: 256G
>> vm arguments:  -server -xmx40G
>>
>> the stats.field specification is as follows:
>> <field name="weight" type="..." indexed="..." stored="false" required="true" multiValued="false"
>> default="1"/>
>>
>> filter queries that narrow down the #docs help to reduce it -
>> QTime seems to be proportional to the number of docs being returned
>> by a filter query.
>>
>> Is there any way to improve the performance of such stats queries ?
>> Caching only helped to improve the filter query performance but if
>> larger subsets are being returned, QTime increases unacceptably.
>>
>> Since I only need the sum and not the STD or sumsOfSquares/Min/Max,
>> I have created a custom 3.1 version that does only return the sum. But
>> this
>> only slightly improved the performance. Of course I could somehow cache
>> the larger sum queries on the client side but I want to do this only as a
>> last resort.
>>
>> Thank you very much in advance for any ideas/suggestions.
>>
>> Johannes
>>
>>
>
>
> --
> Johannes Goll
> 211 Curry Ford Lane
> Gaithersburg, Maryland 20878
>


Re: Huge performance drop in distributed search w/ shards on the same server/container

2011-06-12 Thread Johannes Goll
Hi Fred,

we are having similar issues of scaling Solr 3.1 distributed searches on a
single box with 18 cores. We use the StatsComponent which seems to be mainly
CPU bound. Using distributed searches resulted in a 9 fold decrease in
response time. However, Jetty 6.1.2X (shipped with Solr 3.1)
sporadically throws Socket connect exceptions when executing distributed
searches. Our next step is to switch from jetty to tomcat. Did you
find a solution for improving the CPU utilization and requests per second
for your system?

Johannes
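
(A distributed stats request of the kind discussed here looks roughly like this;
host and core names are placeholders:
?q=*:*&rows=0&shards=localhost:8983/solr/lib1,localhost:8983/solr/lib2&stats=true&stats.field=weight&stats.facet=library_id)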


2011/5/26 pravesh 

> Do you really require multi-shards? Single core/shard will do for even
> millions of documents and the search will be faster than searching on
> multi-shards.
>
> Consider multi-shard when you cannot scale-up on a single
> shard/machine(e.g,
> CPU,RAM etc. becomes major block).
>
> Also read through the SOLR distributed search wiki to check on all tuning
> up
> required at application server(Tomcat) end, like maxHTTP request settings.
> For a single request in a multi-shard setup internal HTTP requests are made
> through all queried shards, so, make sure you set this parameter higher.
>
> --
> View this message in context:
> http://lucene.472066.n3.nabble.com/Huge-performance-drop-in-distributed-search-w-shards-on-the-same-server-container-tp2938421p2988464.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>



-- 
Johannes Goll
211 Curry Ford Lane
Gaithersburg, Maryland 20878


Re: Huge performance drop in distributed search w/ shards on the same server/container

2011-06-14 Thread Johannes Goll
I increased the maximum POST size and headerBufferSize to 10MB; lowThreads
to 50, maxThreads to 10 and lowResourceMaxIdleTime=15000. We tried
tomcat 6 using the following Connector settings:



I am getting the same exception as for jetty

SEVERE: org.apache.solr.common.SolrException:
org.apache.solr.client.solrj.SolrServerException: java.net.SocketException:
Connection reset

This seem to point towards a Solr specific issue (solrj.SolrServerException
during individual shard searches).  I monitored the CPU utilization
executing sequential distributed searches and noticed that in the beginning
all CPUs are getting used for a short period of time (multiple lines for
shard searches are shown in the log with isShard=true arguments), then all
CPU except one become idle and the request is being processed by this one
CPU for the longest period of time.

I also noticed in the logs that while most of the individual shard searches
(isShard=true) have low QTimes (5-10), a minority has extreme QTimes
(104402-105126). All shards are fairly similar in size and content (1.2 M
documents) and the StatsComponent is being used
[stats=true&stats.field=weight&stats.facet=library_id]. Here library_id
equals the shard/core name.

Is there an internal timeout for gathering shard results or other fixed
resource limitation ?

Johannes






2011/6/13 Yonik Seeley 

> On Sun, Jun 12, 2011 at 9:10 PM, Johannes Goll 
> wrote:
> > However, sporadically, Jetty 6.1.2X (shipped with  Solr 3.1.)
> > sporadically throws Socket connect exceptions when executing distributed
> > searches.
>
> Are you using the exact jetty.xml that shipped with the solr example
> server,
> or did you make any modifications?
>
> -Yonik
> http://www.lucidimagination.com
>



-- 
Johannes Goll
211 Curry Ford Lane
Gaithersburg, Maryland 20878


refiltering search results

2012-08-28 Thread Johannes . Schwendinger
Hello,

I'm trying to develop a search component to filter the search results again
with current data so that the user only sees results he is permitted to
see.

Can someone give me a hint where to start and how to do this? Is a Search 
Component the right place to do this?

Regards
Johannes

Antwort: Re: refiltering search results

2012-08-28 Thread Johannes . Schwendinger
The main idea is to filter results as much as possible with solr and then
check this result again.
To do this I have to read some information from some fields of the
documents in the result.
At the moment I am trying to do this in the process method of a Search
Component. But I don't even know
how to get access to the search results or the index fields of the
documents.
I have thought of ResponseBuilder.getResults() but after I have the
DocListAndSet object I get stuck.

I know the time of the search will increase but security has priority

Regards,
Johannes



From:
Alexandre Rafalovitch 
To:
solr-user@lucene.apache.org
Date:
28.08.2012 16:48
Subject:
Re: refiltering search results



I think there was a JOIN example (for version 4) somewhere with the
permission restrictions. Or, if you have very broad categories, you
can use different search handlers with restriction queries baked in.

These might be enough. Otherwise, you have to send the list of IDs
back and forth and it could be expensive.

Regards,
   Alex.
Personal blog: http://blog.outerthoughts.com/
LinkedIn: http://www.linkedin.com/in/alexandrerafalovitch
- Time is the quality of nature that keeps events from happening all
at once. Lately, it doesn't seem to be working.  (Anonymous  - via GTD
book)


On Tue, Aug 28, 2012 at 9:28 AM,   wrote:
> Hello,
>
> Im trying to develop a search component to filter the search results 
agein
> with current data so that the user only sess results he is permitted to
> see.
>
> Can someone give me a hint where to start and how to do this? Is a 
Search
> Component the right place to do this?
>
> Regards
> Johannes



Antwort: Re: Antwort: Re: refiltering search results

2012-08-29 Thread Johannes . Schwendinger
From:
Ahmet Arslan 
To:
solr-user@lucene.apache.org
Date:
29.08.2012 10:50
Subject:
Re: Antwort: Re: refiltering search results


Thanks for the answer. 

My next question is how can I filter the result, or how to replace the old
ResponseBuilder result with a new one?


--- On Wed, 8/29/12, johannes.schwendin...@blum.com 
 wrote:

> From: johannes.schwendin...@blum.com 
> Subject: Antwort: Re: refiltering search results
> To: solr-user@lucene.apache.org
> Date: Wednesday, August 29, 2012, 8:22 AM
> The main idea is to filter results as
> much as possible with solr an then 
> check this result again. 
> To do this I have to read some information from some fields
> of the 
> documents in the result. 
> At the moment I am trying to do this in the process method
> of a Search 
> Component. But I even dont know 
> how to get access to the search results or the index Fields
> of the 
> documents. 
> I have thought of ResponseBuilder.getResults() but after I
> have the 
> DocListandSet Object I get stuck. 


You can read information from some fields using the DocListAndSet with the

org.apache.solr.util.SolrPluginUtils#docListToSolrDocumentList

method.
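
A minimal sketch of what such a component's process() method could look like;
the class name, the acl_token field and the follow-up permission check are
invented for illustration, only the docListToSolrDocumentList call is the one
mentioned above:

import java.io.IOException;
import java.util.HashSet;
import java.util.Set;

import org.apache.solr.common.SolrDocument;
import org.apache.solr.common.SolrDocumentList;
import org.apache.solr.handler.component.ResponseBuilder;
import org.apache.solr.handler.component.SearchComponent;
import org.apache.solr.util.SolrPluginUtils;

public class PermissionCheckComponent extends SearchComponent {

    @Override
    public void prepare(ResponseBuilder rb) throws IOException {
        // nothing to prepare in this sketch
    }

    @Override
    public void process(ResponseBuilder rb) throws IOException {
        if (rb.getResults() == null || rb.getResults().docList == null) {
            return;
        }
        // materialize the matched documents so their stored fields can be read
        Set<String> fields = new HashSet<String>();
        fields.add("acl_token"); // hypothetical field holding permission data
        SolrDocumentList docs = SolrPluginUtils.docListToSolrDocumentList(
                rb.getResults().docList, rb.req.getSearcher(), fields, null);

        for (SolrDocument doc : docs) {
            Object aclToken = doc.getFieldValue("acl_token");
            // here the value would be checked against the external permission source
        }
    }

    @Override
    public String getDescription() {
        return "reads a field from each matched document for a permission re-check";
    }

    @Override
    public String getSource() {
        return "";
    }
}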



LateBinding

2012-08-29 Thread Johannes . Schwendinger
Hello,

Has anyone ever implemented the security feature called late-binding? 

I am trying this but I am very new to solr and I would be very glad to
get some hints on this.

Regards,
Johannes

Query during a query

2012-08-30 Thread Johannes . Schwendinger
Hi list,

I want to get distinct data from a single solr field whenever a search
query is started by a user.

How can I do this?

Regards,
Johannes

Antwort: Re: Query during a query

2012-08-30 Thread Johannes . Schwendinger
Thanks for the answer, but I want to know how I can do a separate query
before the main query.
And I only want this data in my programm. The user won't see it. 
I need the values from one field to get some information from an external 
source while the main query is executed.

pravesh  wrote on 31.08.2012 07:42:48:

> Von:
> 
> pravesh 
> 
> An:
> 
> solr-user@lucene.apache.org
> 
> Datum:
> 
> 31.08.2012 07:43
> 
> Betreff:
> 
> Re: Query during a query
> 
> Did you checked SOLR Field Collapsing/Grouping.
> http://wiki.apache.org/solr/FieldCollapsing
> http://wiki.apache.org/solr/FieldCollapsing 
> If this is what you are looking for.
> 
> 
> Thanx
> Pravesh
> 
> 
> 
> --
> View this message in context: http://lucene.472066.n3.nabble.com/
> Query-during-a-query-tp4004624p4004631.html
> Sent from the Solr - User mailing list archive at Nabble.com.


Antwort: Re: Antwort: Re: Query during a query

2012-09-02 Thread Johannes . Schwendinger
The problem is that I don't know how to do this. :P

My sequence: the user enters his search words. This is sent to solr. There
I need to make another query first to get metadata from the index. With
this metadata I have to connect to an external source to get some
information about the user. With this information and the first search
words I then query the solr index to get the search result.

I hope it's clear now where my problem is and what I want to do.

Regards,
Johannes
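
One way to structure this inside Solr is a custom SearchComponent that runs the
preliminary query in prepare(), before the QueryComponent executes the main
query. A rough sketch; the field names, the metadata document id, the external
lookup (omitted) and the added filter are all invented for illustration:

import java.io.IOException;

import org.apache.lucene.document.Document;
import org.apache.lucene.index.Term;
import org.apache.lucene.search.TermQuery;
import org.apache.lucene.search.TopDocs;
import org.apache.solr.common.params.ModifiableSolrParams;
import org.apache.solr.handler.component.ResponseBuilder;
import org.apache.solr.handler.component.SearchComponent;
import org.apache.solr.search.SolrIndexSearcher;

public class MetadataLookupComponent extends SearchComponent {

    @Override
    public void prepare(ResponseBuilder rb) throws IOException {
        SolrIndexSearcher searcher = rb.req.getSearcher();

        // 1. preliminary query: fetch the metadata document (made-up id)
        TopDocs hits = searcher.search(new TermQuery(new Term("id", "metadata-doc")), 1);
        if (hits.totalHits == 0) {
            return;
        }
        Document metadata = searcher.doc(hits.scoreDocs[0].doc);
        String group = metadata.get("user_group"); // hypothetical stored field

        // 2. use it (together with the external lookup, omitted here) to
        //    restrict the main query with an additional filter
        ModifiableSolrParams params = new ModifiableSolrParams(rb.req.getParams());
        params.add("fq", "allowed_groups:" + group);
        rb.req.setParams(params);
    }

    @Override
    public void process(ResponseBuilder rb) throws IOException {
        // the main query then runs in the QueryComponent as usual
    }

    @Override
    public String getDescription() {
        return "runs a metadata lookup before the main query";
    }

    @Override
    public String getSource() {
        return "";
    }
}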



From:
"Jack Krupansky" 
To:

Date:
31.08.2012 15:03
Subject:
Re: Antwort: Re: Query during a query



So, just do another query before doing the main query. What's the problem? 

Be more specific. Walk us through the sequence of processing that you 
need.

-- Jack Krupansky

-Original Message- 
From: johannes.schwendin...@blum.com
Sent: Friday, August 31, 2012 1:52 AM
To: solr-user@lucene.apache.org
Subject: Antwort: Re: Query during a query

Thanks for the answer, but I want to know how I can do a seperate query
before the main query.
And I only want this data in my programm. The user won't see it.
I need the values from one field to get some information from an external
source while the main query is executed.

pravesh  schrieb am 31.08.2012 07:42:48:

> Von:
>
> pravesh 
>
> An:
>
> solr-user@lucene.apache.org
>
> Datum:
>
> 31.08.2012 07:43
>
> Betreff:
>
> Re: Query during a query
>
> Did you checked SOLR Field Collapsing/Grouping.
> http://wiki.apache.org/solr/FieldCollapsing
> http://wiki.apache.org/solr/FieldCollapsing
> If this is what you are looking for.
>
>
> Thanx
> Pravesh
>
>
>
> --
> View this message in context: http://lucene.472066.n3.nabble.com/
> Query-during-a-query-tp4004624p4004631.html
> Sent from the Solr - User mailing list archive at Nabble.com. 




Solr Cell Questions

2012-09-24 Thread Johannes . Schwendinger
Hi,

I'm currently experimenting with Solr Cell to index files to Solr. During
this some questions came up.

1. Is it possible (and wise) to connect to Solr Cell with multiple threads
at the same time to index several documents at the same time?
This question came up because my program takes about 6 hours to index
around 35000 docs (no production environment, only the example solr and a
little desktop machine, but I think it's very slow, and I know solr isn't
the bottleneck (yet)).

2. If 1 is possible, how many threads should do this and how much memory
does Solr need? I've tried it but I ran into an out-of-memory exception.

Thanks in advance

Best Regards
Johannes

Antwort: Re: Solr Cell Questions

2012-09-25 Thread Johannes . Schwendinger
Thank you Erick for your response,

I've already tried what you've suggested and got some out-of-memory
exceptions. Because of this I like the solution with Solr Cell where I can
send the file directly to solr via a stream and don't collect the documents
in my memory.

And another question that came to my mind: how many documents per minute or
second can I put into solr? Say XML format and from 100kb to
100MB.
Is there a number or is it too dependent on hardware and settings?


Best
Johannes

Erick Erickson  wrote on 25.09.2012 00:22:26:

> Von:
> 
> Erick Erickson 
> 
> An:
> 
> solr-user@lucene.apache.org
> 
> Datum:
> 
> 25.09.2012 00:23
> 
> Betreff:
> 
> Re: Solr Cell Questions
> 
> If you're concerned about throughput, consider moving all the
> SolrCell (Tika) processing off the server. SolrCell is way cool
> for showing what can be done, but its downside is you're
> moving all the processing of the structured documents to the
> same machine doing the indexing. Pretty soon, especially
> with significant size files, you're spending all your CPU cycles
> parsing the files...
> 
> Happens there's a blog about this:
> http://searchhub.org/dev/2012/02/14/indexing-with-solrj/
> 
> By moving the indexing to N clients, you can increase
> throughput until you make Solr work hard to do the indexing
> 
> Best
> Erick
> 
> On Mon, Sep 24, 2012 at 10:04 AM,   
wrote:
> > Hi,
> >
> > Im currently experimenting with Solr Cell to index files to Solr. 
During
> > this some questions came up.
> >
> > 1. Is it possible (and wise) to connect to Solr Cell with multiple 
Threads
> > at the same time to index several documents at the same time?
> > This question came up because my prrogramm takes about 6hours to index
> > round 35000 docs. (no production environment, only example solr and a
> > little desktop machine but I think its very slow, and I know solr 
isn't
> > the bottleneck (yet))
> >
> > 2. If 1 is possible, how many Threads should do this and how many 
memory
> > Solr needs? I've tried it but i run into an out of memory exception.
> >
> > Thanks in advantage
> >
> > Best Regards
> > Johannes


Antwort: Re: Re: Solr Cell Questions

2012-09-25 Thread Johannes . Schwendinger
The difference with Solr Cell is that I'm sending every single document
to Solr Cell and don't collect them until I have a couple of them in my
memory.
I am mainly using the code from here:
http://wiki.apache.org/solr/ExtractingRequestHandler#SolrJ
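
For reference, the pattern on that wiki page looks roughly like this (file name,
id and parameter values are placeholders; newer SolrJ versions also take a
content type argument in addFile):

import java.io.File;

import org.apache.solr.client.solrj.SolrServer;
import org.apache.solr.client.solrj.impl.HttpSolrServer;
import org.apache.solr.client.solrj.request.AbstractUpdateRequest;
import org.apache.solr.client.solrj.request.ContentStreamUpdateRequest;

public class ExtractExample {
    public static void main(String[] args) throws Exception {
        SolrServer solr = new HttpSolrServer("http://localhost:8983/solr");

        ContentStreamUpdateRequest up = new ContentStreamUpdateRequest("/update/extract");
        up.addFile(new File("document.pdf"));          // streams the file to Solr Cell
        up.setParam("literal.id", "document.pdf");     // unique key for the extracted doc
        up.setParam("uprefix", "attr_");               // catch-all prefix for unknown fields
        up.setParam("fmap.content", "attr_content");   // map the extracted body text
        up.setAction(AbstractUpdateRequest.ACTION.COMMIT, true, true);

        solr.request(up);
    }
}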


Erick Erickson  wrote on 25.09.2012 15:47:34:

> Von:
> 
> Erick Erickson 
> 
> An:
> 
> solr-user@lucene.apache.org
> 
> Datum:
> 
> 25.09.2012 15:48
> 
> Betreff:
> 
> Re: Re: Solr Cell Questions
> 
> bq: how many documents per minute, second, what ever can i put into solr
> 
> Too many variables to say. I've seen several thousand truly simple
> docs/sec. But since you're doing the Tika processing that's probably
> going to be your limiting factor. And it'll be many fewer...
> 
> I don't understand your OOM issue when running Tika on the client. Or,
> rather, why you think using SolrCell makes this different. SolrCell also
> uses Tika. So my suspicion it that your client-side process simply isn't
> allocating much memory to the JVM, did you try bumping the memory
> on your client?
> 
> Best
> Erick
> 
> On Tue, Sep 25, 2012 at 5:23 AM,   
wrote:
> > Thank you Erick for your respone,
> >
> > I've already tried what you've suggested and got some out of memory
> > exceptions. Because of this i like the solution with solr Cell where i 
can
> > send the file directly to solr via stream and don't collect them in my
> > memory.
> >
> > And another question that came to my mind, how many documents per 
minute,
> > second, what ever can i put into solr. Say XML format and from 100kb 
to
> > 100MB.
> > Is there a number or is it to dependent from hardware and settings?
> >
> >
> > Best
> > Johannes
> >
> > Erick Erickson  schrieb am 25.09.2012 
00:22:26:
> >
> >> Von:
> >>
> >> Erick Erickson 
> >>
> >> An:
> >>
> >> solr-user@lucene.apache.org
> >>
> >> Datum:
> >>
> >> 25.09.2012 00:23
> >>
> >> Betreff:
> >>
> >> Re: Solr Cell Questions
> >>
> >> If you're concerned about throughput, consider moving all the
> >> SolrCell (Tika) processing off the server. SolrCell is way cool
> >> for showing what can be done, but its downside is you're
> >> moving all the processing of the structured documents to the
> >> same machine doing the indexing. Pretty soon, especially
> >> with significant size files, you're spending all your CPU cycles
> >> parsing the files...
> >>
> >> Happens there's a blog about this:
> >> http://searchhub.org/dev/2012/02/14/indexing-with-solrj/
> >>
> >> By moving the indexing to N clients, you can increase
> >> throughput until you make Solr work hard to do the indexing
> >>
> >> Best
> >> Erick
> >>
> >> On Mon, Sep 24, 2012 at 10:04 AM,  
> > wrote:
> >> > Hi,
> >> >
> >> > Im currently experimenting with Solr Cell to index files to Solr.
> > During
> >> > this some questions came up.
> >> >
> >> > 1. Is it possible (and wise) to connect to Solr Cell with multiple
> > Threads
> >> > at the same time to index several documents at the same time?
> >> > This question came up because my prrogramm takes about 6hours to 
index
> >> > round 35000 docs. (no production environment, only example solr and 
a
> >> > little desktop machine but I think its very slow, and I know solr
> > isn't
> >> > the bottleneck (yet))
> >> >
> >> > 2. If 1 is possible, how many Threads should do this and how many
> > memory
> >> > Solr needs? I've tried it but i run into an out of memory 
exception.
> >> >
> >> > Thanks in advantage
> >> >
> >> > Best Regards
> >> > Johannes


Antwort: RE: Group.query

2012-09-26 Thread Johannes . Schwendinger
I think what you need is faceting, or is this another thing?
http://searchhub.org/dev/2009/09/02/faceted-search-with-solr/
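
For example (field name assumed), q=bucket&facet=true&facet.field=group returns
each matching product once plus a count per group; to actually list the products
under every group you would still issue one filtered query per group, e.g.
fq=group:boys_toys.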

Peter Kirk  wrote on 26.09.2012 12:18:32:

> Von:
> 
> Peter Kirk 
> 
> An:
> 
> "solr-user@lucene.apache.org" 
> 
> Datum:
> 
> 26.09.2012 12:19
> 
> Betreff:
> 
> RE: Group.query
> 
> Thanks. Yes I can do this - but doesn't it mean I need to execute a 
> query per group?
> 
> What I really want to do (and I'm sorry I'm not so good at 
> explaining) is to execute one query for products, and receive 
> results grouped by the groups - but where a particular product may 
> be found in several groups.
> 
> For example, I'd like to execute a query for all products which 
> match "bucket".
> There are several products which are "buckets", each of which can 
> belong to several groups.
> Would it be possible to generate a query which would return the 
> groups, each with a list of the buckets?
> 
> Example result, with 3 groups, and several products (which may occur
> in several groups).
> 
> Children_sand_toys
>   Castle bucket
>   Plain bucket
> 
> Boys_toys
>   Castle bucket
>   Truck bucket
> 
> Girls_toys
>   Castle bucket
>   Large Pony bucket
> 
> Thanks,
> Peter
> 
> -Original Message-
> From: Ingar Hov [mailto:ingar@gmail.com] 
> Sent: 26. september 2012 11:57
> To: solr-user@lucene.apache.org
> Subject: Re: Group.query
> 
> I hope I understood the question, if so this may be a solution:
> 
> Why don't you make the field group for product multiple?
> 
> Example:
> 
>  multiValued="true"/>
> 
> If the product is a member of group1 and group2, just add both for 
> the product document so that each product has an array of group. 
> Then you can easily get all products for group1 by doing query: 
group:group1
> 
> Regards,
> Ingar
> 
> 
> 
> On Wed, Sep 26, 2012 at 10:48 AM, Peter Kirk  
wrote:
> > Thanks. Yes, the only solution I could think of was to execute 
> several queries.
> > I would like it to be a single query if at all possible. If anyone
> has ideas I could look into that would be great.
> > Thanks,
> > Peter
> >
> >
> > -Original Message-
> > From: Aditya [mailto:findbestopensou...@gmail.com]
> > Sent: 26. september 2012 10:41
> > To: solr-user@lucene.apache.org
> > Subject: Re: Group.query
> >
> > Hi
> >
> > You are doing AND search, so you are getting results prod1 and 
> prod2. I guess, you should query only for group1 and another query for 
group2.
> >
> > Regards
> > Aditya
> > www.findbestopensource.com
> >
> >
> >
> > On Wed, Sep 26, 2012 at 12:26 PM, Peter Kirk  
wrote:
> >
> >> Hi
> >>
> >> I have "products" which belong to one or more "groups".
> >> Products are documents in Solr, while the groups are fields (eg.
> >> group_1_bool:true).
> >>
> >> For example:
> >>
> >> Prod1 => group1, group2
> >> Prod2 => group1, group2
> >> Prod3 => group1
> >> Prod4 => group2
> >>
> >> I would like to execute a query which results in the groups with 
> >> their products. That is, the result should be something like:
> >>
> >> Group1 => Prod1, Prod2, Prod3
> >> Group2 => Prod1, Prod2, Prod4
> >>
> >> How can I do this?
> >>
> >> I've been looking at group.query, but I don't think this is what I 
want.
> >>
> >> For example, 
"q=*:*&group.query=group_1_bool:true+AND+group_2_bool:true"
> >> Results in 1 group called "group_1_bool:true AND group_2_bool:true", 
> >> which contains prod1 and prod2.
> >>
> >>
> >> Thanks,
> >> Peter
> >>
> >>
> >
> 
> 


System collection - lazy loading mechanism not working for custom UpdateProcessors?

2018-04-25 Thread Johannes Brucher
Hi all,

I'm facing an issue regarding custom code inside a .system-collection and 
starting up a Solr Cloud cluster.
I thought, like its stated in the documentation, that in case using the .system 
collection custom code is lazy loaded, because it can happen that a collection 
that uses custom code is initialized before the system collection is up and 
running.

I did all the necessary configuration and while debugging, I can see that the 
custom code is wrapped via a PluginBag$LazyPluginHolder. So far it seems good, 
but I still get Exceptions when starting the Solr Cloud with the following 
errors:

SolrException: Blob loading failed: .no active replica available for .system 
collection...

In my case I'm using custom code for a couple of UpdateProcessors. So it seems, 
that this lazy mechanism is not working well for UpdateProcessors.
Inside the class LazyPluginHolder the comment says:

"A class that loads plugins Lazily. When the get() method is invoked the Plugin 
is initialized and returned."

When a core is initialized and you have a custom UpdateProcessor, the 
get-method is invoked directly and the lazy loading mechanism tries to get the 
custom class from the MemClassLoader, but in most scenarios the system 
collection is not up and the above Exception is thrown...
So maybe it’s the case that for UpdateProcessors, while initializing a core, the
routine is not implemented optimally for the lazy loading mechanism?

Please let me know if it would help to share my configuration!

Many thanks,

Johannes




AW: System collection - lazy loading mechanism not working for custom UpdateProcessors?

2018-04-26 Thread Johannes Brucher
Maybe I have found a more accurate example constellation to reproduce the error.

By default the .system-collection is created with 1 shard and 1 replica.


In this constellation, everything works as expected and no matter how often I 
try to restart the Solr Cloud,

the error "SolrException: Blob loading failed: .no active replica available for 
.system collection" is never thrown...








But once I started to add one more replica to the .system collection things are 
messing up!




With this setup, I'm not able to start the Solr Cloud server without any error:






Sometimes one or two collections are Active but most of the time all 
collections are permanently marked as Down…




Are there any restrictions how to setup the .system collection?





Johannes


-----Original Message-----
From: Johannes Brucher [mailto:johannes.bruc...@shi-gmbh.com]
Sent: Wednesday, April 25, 2018 10:57
To: solr-user@lucene.apache.org
Subject: System collection - lazy loading mechanism not working for custom 
UpdateProcessors?



Hi all,



I'm facing an issue regarding custom code inside a .system-collection and 
starting up a Solr Cloud cluster.

I thought, like its stated in the documentation, that in case using the .system 
collection custom code is lazy loaded, because it can happen that a collection 
that uses custom code is initialized before the system collection is up and 
running.



I did all the necessary configuration and while debugging, I can see that the 
custom code is wrapped via a PluginBag$LazyPluginHolder. So far its seems good, 
but I still get Exceptions when starting the Solr Cloud with the following 
errors:



SolrException: Blob loading failed: .no active replica available for .system 
collection...



In my case I'm using custom code for a couple of UpdateProcessors. So it seems, 
that this lazy mechanism is not working well for UpdateProcessors.

Inside the class LazyPluginHolder the comment says:



"A class that loads plugins Lazily. When the get() method is invoked the Plugin 
is initialized and returned."



When a core is initialized and you have a custom UpdateProcessor, the 
get-method is invoked directly and the lazy loading mechanism tries to get the 
custom class from the MemClassLoader, but in most scenarios the system 
collection is not up and the above Exception is thrown...

So maybe it’s the case that for UpdateProcessors while initializing a core, the 
routine is not implemented optimal for the lazy loading mechanism?



Pls let me know if it helps sharing my configuration!



Many thanks,



Johannes






AW: System collection - lazy loading mechanism not working for custom UpdateProcessors?

2018-04-26 Thread Johannes Brucher
Thank you, Shawn,

I’m trying to use JustPaste.it to share my screenshots…


Hi all,

maybe I have found a more accurate example constellation to reproduce the error.

By default the .system-collection is created with 1 shard and 1 replica if you
are using just one node.

In this constellation, everything works as expected and no matter how often I 
try to restart the Solr Cloud,

the error "SolrException: Blob loading failed: .no active replica available for 
.system collection" is never thrown...

https://justpaste.it/685gf



But once I started to add one more replica to the .system collection things are 
messing up!

With this setup, I'm not able to start the Solr Cloud server without any error:

https://justpaste.it/4t66c



Sometimes one or two collections are Active but most of the time all 
collections are permanently marked as Down…

Here are the Exceptions I’m constantly getting:

https://justpaste.it/5ziem


Are there any restrictions how to setup the .system collection?





Johannes


-----Original Message-----
From: Johannes Brucher [mailto:johannes.bruc...@shi-gmbh.com]
Sent: Wednesday, April 25, 2018 10:57
To: solr-user@lucene.apache.org<mailto:solr-user@lucene.apache.org>
Subject: System collection - lazy loading mechanism not working for custom 
UpdateProcessors?



Hi all,



I'm facing an issue regarding custom code inside a .system-collection and 
starting up a Solr Cloud cluster.

I thought, like its stated in the documentation, that in case using the .system 
collection custom code is lazy loaded, because it can happen that a collection 
that uses custom code is initialized before the system collection is up and 
running.



I did all the necessary configuration and while debugging, I can see that the 
custom code is wrapped via a PluginBag$LazyPluginHolder. So far its seems good, 
but I still get Exceptions when starting the Solr Cloud with the following 
errors:



SolrException: Blob loading failed: .no active replica available for .system 
collection...



In my case I'm using custom code for a couple of UpdateProcessors. So it seems, 
that this lazy mechanism is not working well for UpdateProcessors.

Inside the class LazyPluginHolder the comment says:



"A class that loads plugins Lazily. When the get() method is invoked the Plugin 
is initialized and returned."



When a core is initialized and you have a custom UpdateProcessor, the 
get-method is invoked directly and the lazy loading mechanism tries to get the 
custom class from the MemClassLoader, but in most scenarios the system 
collection is not up and the above Exception is thrown...

So maybe it’s the case that for UpdateProcessors while initializing a core, the 
routine is not implemented optimal for the lazy loading mechanism?



Pls let me know if it helps sharing my configuration!



Many thanks,



Johannes






gzip compression solr 8.4.1

2020-04-23 Thread Johannes Siegert
Hi,

we want to use gzip-compression between our application and the solr server.

We use a standalone solr server version 8.4.1 and the prepackaged jetty as
application server.

We have enabled the jetty gzip module by adding these two files:

{path_to_solr}/server/modules/gzip.mod (see below the question)
{path_to_solr}/server/etc/jetty-gzip.xml (see below the question)

Within the application we use a HttpSolrServer that is configured with
allowCompression=true.
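
(With SolrJ 8.x the compression flag is typically set through the client builder;
a minimal sketch with a placeholder URL and core name:

import org.apache.solr.client.solrj.impl.HttpSolrClient;

public class CompressedClientExample {
    public static void main(String[] args) throws Exception {
        // allowCompression(true) makes the client send Accept-Encoding: gzip
        // and transparently decompress the responses
        HttpSolrClient client = new HttpSolrClient.Builder("http://localhost:8983/solr/mycore")
                .allowCompression(true)
                .build();
        System.out.println("base url: " + client.getBaseURL());
        client.close();
    }
}
)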

After we had released our application we saw the number of connections
in the TCP state CLOSE_WAIT rise until the application was not
able to open new connections.


After a long debugging session we think the problem is that the header
"Content-Length" that is returned by jetty is sometimes wrong when
gzip-compression is enabled.

The solrj client uses a ContentLengthInputStream, which uses the header
"Content-Length" to detect if all data was received. But the InputStream
cannot be fully consumed because the value of the header "Content-Length"
is higher than the actual content length.

Usually the method PoolingHttpClientConnectionManager.releaseConnection is
called after the InputStream was fully consumed. This frees the connection
to be reused or to be closed by the application.

Due to the incorrect header "Content-Length" the
PoolingHttpClientConnectionManager.releaseConnection method is never called
and the connection stays active. After the connection timeout of jetty
is reached, it closes the connection from the server side and the TCP state
switches into CLOSE_WAIT. The client never closes the connection and so the
number of connections in use rises.


Currently we try to configure the jetty gzip module to return no
"Content-Length" if gzip-compression was used. We hope that in this case
another InputStream implementation is used that uses the NULL-terminator to
see when the InputStream was fully consumed.

Do you have any experiences with this problem or any suggestions for us?

Thanks,

Johannes


gzip.mod

-

DO NOT EDIT - See:
https://www.eclipse.org/jetty/documentation/current/startup-modules.html

[description]
Enable GzipHandler for dynamic gzip compression
for the entire server.

[tags]
handler

[depend]
server

[xml]
etc/jetty-gzip.xml

[ini-template]
## Minimum content length after which gzip is enabled
jetty.gzip.minGzipSize=2048

## Check whether a file with *.gz extension exists
jetty.gzip.checkGzExists=false

## Gzip compression level (-1 for default)
jetty.gzip.compressionLevel=-1

## User agents for which gzip is disabled
jetty.gzip.excludedUserAgent=.*MSIE.6\.0.*

-

jetty-gzip.xml

-


<?xml version="1.0"?>
<!DOCTYPE Configure PUBLIC "-//Jetty//Configure//EN" "http://www.eclipse.org/jetty/configure_9_3.dtd">
(the GzipHandler configuration itself was lost in the archived copy of this mail)
-


Re: gzip compression solr 8.4.1

2020-05-05 Thread Johannes Siegert
Hi,

We did further tests to see where the problem exactly is. These are our
outcomes:

The content-length is calculated correctly, a quick test with curl showed
this.
The problem is that the stream with the gzip data is not fully consumed and
afterwards not closed.

Using the debugger with a breakpoint at
org/apache/solr/common/util/Utils.java:575 shows that it won't enter the
function readFully(entity.getContent()), most likely due to how the gzip
stream content is wrapped and extracted beforehand.

On line org/apache/solr/common/util/Utils.java:582 the
consumeQuietly(entity) should close the stream but does not because of a
silent exception.

This seems to be the same as it is described in
https://issues.apache.org/jira/browse/SOLR-14457

We saw that the problem happened also with correct GZIP responses from
jetty. Not only with non-GZIP as described within the jira issue.

Best,

Johannes

On Thu, 23 Apr 2020 at 09:55, Johannes Siegert <
johannes.sieg...@offerista.com> wrote:

> Hi,
>
> we want to use gzip-compression between our application and the solr
> server.
>
> We use a standalone solr server version 8.4.1 and the prepackaged jetty as
> application server.
>
> We have enabled the jetty gzip module by adding these two files:
>
> {path_to_solr}/server/modules/gzip.mod (see below the question)
> {path_to_solr}/server/etc/jetty-gzip.xml (see below the question)
>
> Within the application we use a HttpSolrServer that is configured with
> allowCompression=true.
>
> After we had released our application we saw that the number of
> connections within the TCP-state CLOSE_WAIT rising up until the application
> was not able to open new connections.
>
>
> After a long debugging session we think the problem is that the header
> "Content-Length" that is returned by the jetty is sometimes wrong when
> gzip-compression is enabled.
>
> The solrj client uses a ContentLengthInputStream, that uses the header
> "Content-Lenght" to detect if all data was received. But the InputStream
> can not be fully consumed because the value of the header "Content-Lenght"
> is higher than the actual content-length.
>
> Usually the method PoolingHttpClientConnectionManager.releaseConnection is
> called after the InputStream was fully consumed. This give the connection
> free to be reused or to be closed by the application.
>
> Due to the incorrect header "Content-Length" the
> PoolingHttpClientConnectionManager.releaseConnection method is never called
> and the connection stays active. After the connection-timeout of the jetty
> is reached, it closes the connection from the server-side and the TCP-state
> switches into CLOSE_WAIT. The client never closes the connection and so the
> number of connections in use rises up.
>
>
> Currently we try to configure the jetty gzip module to return no
> "Content-Length" if gzip-compression was used. We hope that in this case
> another InputStream implementation is used that uses the NULL-terminator to
> see when the InputStream was fully consumed.
>
> Do you have any experiences with this problem or any suggestions for us?
>
> Thanks,
>
> Johannes
>
>
> gzip.mod
>
> -
>
> DO NOT EDIT - See:
> https://www.eclipse.org/jetty/documentation/current/startup-modules.html
>
> [description]
> Enable GzipHandler for dynamic gzip compression
> for the entire server.
>
> [tags]
> handler
>
> [depend]
> server
>
> [xml]
> etc/jetty-gzip.xml
>
> [ini-template]
> ## Minimum content length after which gzip is enabled
> jetty.gzip.minGzipSize=2048
>
> ## Check whether a file with *.gz extension exists
> jetty.gzip.checkGzExists=false
>
> ## Gzip compression level (-1 for default)
> jetty.gzip.compressionLevel=-1
>
> ## User agents for which gzip is disabled
> jetty.gzip.excludedUserAgent=.*MSIE.6\.0.*
>
> -
>
> jetty-gzip.xml
>
> -
>
> <?xml version="1.0"?>
> <!DOCTYPE Configure PUBLIC "-//Jetty//Configure//EN" "http://www.eclipse.org/jetty/configure_9_3.dtd">
> (most of the GzipHandler configuration was lost in the archived copy; the
> surviving fragments show class="org.eclipse.jetty.server.handler.gzip.GzipHandler"
> with minGzipSize default 2048, checkGzExists false and compressionLevel -1)

ManagedFilter for stemming

2019-07-09 Thread Johannes Siegert
Hi,

we are using the SnowballPorterFilter to stem our tokens for several
languages.

Now we want to update the list of protected words over the Solr API.

As far as I can see, there are only solutions for the SynonymFilter and the
StopwordFilter with ManagedSynonymFilter and ManagedStopFilter.

Do you know any solution for my problem?

Thanks,

Johannes