Re: Replication lag after cache optimizations

2012-09-02 Thread Damien DUDOGNON
Thanks for your answer Erick.

For the polling interval, we use 1 second for the small index and 1 minute
for the big one. I'll try increasing it to 5 minutes and see if that
solves the problem. The issue doesn't occur with the default cache settings
(i.e. cache size=512).
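
For reference, the pollInterval value in the replication config is written as
HH:mm:ss, so the setting shown in the quoted config (00:00:01) and the proposed
five minutes differ by a factor of 300. A minimal sketch of that arithmetic
follows; the class and method names are hypothetical and are not part of Solr.

```java
import java.util.concurrent.TimeUnit;

public class PollInterval {
    /** Converts an HH:mm:ss string (the pollInterval format) to milliseconds. */
    static long toMillis(String hhmmss) {
        String[] parts = hhmmss.split(":");
        if (parts.length != 3) {
            throw new IllegalArgumentException("expected HH:mm:ss, got " + hhmmss);
        }
        long hours = Long.parseLong(parts[0]);
        long minutes = Long.parseLong(parts[1]);
        long seconds = Long.parseLong(parts[2]);
        return TimeUnit.HOURS.toMillis(hours)
                + TimeUnit.MINUTES.toMillis(minutes)
                + TimeUnit.SECONDS.toMillis(seconds);
    }

    public static void main(String[] args) {
        long current = toMillis("00:00:01");  // interval in the quoted slave config
        long proposed = toMillis("00:05:00"); // the five-minute interval to try
        System.out.println("current=" + current + " ms, proposed=" + proposed + " ms");
        System.out.println("polls avoided per proposed interval: " + (proposed / current - 1));
    }
}
```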

Indeed, we strive for near real time searching, so I'll take a look at 4.0
with NRT support, hoping it is sufficiently stable for a production
environment.

Kind regards,
Damien

2012/9/2 Erick Erickson 

> Your polling interval is much too short. 1 second
> is probably getting you into resource contention
> issues.
>
> A more reasonable interval is on the order of several
> minutes. If you really need near real time
> searching, consider 4.0 which supports NRT
>
> Best
> Erick
>
> On Fri, Aug 31, 2012 at 10:02 AM, Damien Dudognon wrote:
> > Hi,
> >
> > We are having trouble with Solr replication after cache
> > optimizations. We make heavy use of the facet features.
> >
> > We have increased the cache size and its initial size. We have also
> > changed the queryResultCache from LRU to FastLRU and activated the
> > fieldValueCache (see the detailed configuration below). This
> > optimization step divided the average time per request by about 50
> > (from 260ms to 5ms).
> >
> > However, with these modifications we noticed a significant replication
> > lag. This is not a problem for smaller indexes (about 300,000
> > elements, 3 GB), but it becomes critical when the index is large
> > (30 million elements, 70 GB). The slaves cannot make up the lag and
> > they become out of date (and consequently unusable).
> >
> > Has anyone ever faced this kind of problem?
> >
> > Our environment :
> > - Solr 3.4.0
> > - Java 1.6.0_26
> > - Debian 6.0.3
> >
> > Best regards,
> > Damien
> >
> > --
> > My previous cache settings (fieldValueCache was disabled):
> > --
> >  autowarmCount="0" />
> >  autowarmCount="0" />
> >  autowarmCount="0" />
> > 
> >
> > --
> > The settings now used:
> > --
> >  autowarmCount="0" />
> >  initialSize="4096" autowarmCount="0" />
> >  autowarmCount="0" />
> >  autowarmCount="1024" showItems="32" />
> >
> > --
> > The replication config:
> > --
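> >   <requestHandler name="/replication" class="solr.ReplicationHandler">
> >     <lst name="master">
> >       <str name="enable">${solr.enable.master:false}</str>
> >       <str name="replicateAfter">commit</str>
> >       <str name="replicateAfter">startup</str>
> >       <str name="confFiles">schema.xml,stopwords.txt</str>
> >     </lst>
> >     <lst name="slave">
> >       <str name="enable">${solr.enable.slave:false}</str>
> >       <str name="masterUrl">http://solrmaster:${jetty.port:8083}/solr/en/replication</str>
> >       <str name="pollInterval">00:00:01</str>
> >     </lst>
> >   </requestHandler>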
>



-- 

Damien DUDOGNON

R&D Engineer - PhD Student

   - +33 (0)6 62 79 34 10
   - damien.dudog...@ebuzzing.com
   - linkedin.com/in/damiendudognon
   - www.irit.fr/~Damien.Dudognon


   - 1, avenue Jean Rieux
   - 31500 Toulouse
   - France
   - +33 (0)5 62 48 33 90


RE: solrj api for partial document update

2012-09-02 Thread Yoni Amir
In the solrj api, the value of a SolrInputField can be a map, in which case 
solrj adds an additional attribute to the field's xml element.
For example,
This code:

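SolrInputDocument doc = new SolrInputDocument();
// The map key is the update modifier, the map value is the new field value.
Map<String, String> partialUpdate = new HashMap<String, String>();
partialUpdate.put("set", "foo");
doc.addField("id", "test_123");
doc.addField("description", partialUpdate);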

yields this document:

<doc>
<field name="id">test_123</field>
<field name="description" update="set">foo</field>
</doc>

In this example I used the value "set" for this additional attribute, but it 
doesn't work. Solr doesn't update the field as I expected.
According to this link: 
http://solr.pl/en/2012/07/09/solr-4-0-partial-documents-update/
valid values are "set" and "add".
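
For comparison, the update message described in the linked article carries the
modifier in an `update` attribute on the field element. The sketch below only
renders that XML with plain Java strings so the expected shape is easy to
compare against what solrj sends. It does not use solrj, and the class and
method names are hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class AtomicUpdateXml {
    /** Escapes the characters that are special in XML text and attributes. */
    static String escape(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;")
                .replace(">", "&gt;").replace("\"", "&quot;");
    }

    /** Renders one document; a Map value is treated as {modifier -> new value}. */
    static String render(Map<String, Object> fields) {
        StringBuilder xml = new StringBuilder("<add><doc>");
        for (Map.Entry<String, Object> field : fields.entrySet()) {
            String name = escape(field.getKey());
            Object value = field.getValue();
            if (value instanceof Map) {
                // Partial update: emit the modifier as the update attribute.
                for (Map.Entry<?, ?> op : ((Map<?, ?>) value).entrySet()) {
                    xml.append("<field name=\"").append(name)
                       .append("\" update=\"").append(escape(String.valueOf(op.getKey())))
                       .append("\">").append(escape(String.valueOf(op.getValue())))
                       .append("</field>");
                }
            } else {
                // Plain value: emit an ordinary field element.
                xml.append("<field name=\"").append(name).append("\">")
                   .append(escape(String.valueOf(value))).append("</field>");
            }
        }
        return xml.append("</doc></add>").toString();
    }

    public static void main(String[] args) {
        Map<String, Object> partialUpdate = new LinkedHashMap<String, Object>();
        partialUpdate.put("set", "foo");
        Map<String, Object> doc = new LinkedHashMap<String, Object>();
        doc.put("id", "test_123");
        doc.put("description", partialUpdate);
        System.out.println(render(doc));
    }
}
```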

Any idea?

Thanks,
Yoni


-Original Message-
From: Yoni Amir [mailto:yoni.a...@actimize.com] 
Sent: Saturday, September 01, 2012 1:48 PM
To: solr-user@lucene.apache.org
Subject: RE: solrj api for partial document update

Any word on this?
I inspected the solrj code and found nothing. It's a shame if the GA version 
comes out without such an api.
Thanks again,
Yoni

-Original Message-
From: Yoni Amir [mailto:yoni.a...@actimize.com] 
Sent: Thursday, August 30, 2012 8:48 AM
To: solr-user@lucene.apache.org
Subject: solrj api for partial document update

Is there a solrj api for partial document update in solr 4?

It is described here: 
http://solr.pl/en/2012/07/09/solr-4-0-partial-documents-update/

That article explains how the xml structure should be. I want to use solrj api, 
but I can't figure out if it is supported.

Thanks,
Yoni



stem porter with tokenizer..

2012-09-02 Thread Emiliana Suci
Which class in Lucene should I use to apply the PorterStemmer with a tokenizer?



--
View this message in context: 
http://lucene.472066.n3.nabble.com/stem-porter-with-tokenizer-tp4004913.html
Sent from the Solr - User mailing list archive at Nabble.com.


Solr Not releasing memory

2012-09-02 Thread Rohit
Hi,

We are running Solr 3.5 on Tomcat 6.26 on a Windows Enterprise RC2
server; our index size is pretty large.

We have noticed that once Tomcat starts using/reserving RAM it never
releases it, even when there is not a single user on the system.  I have
tried forced garbage collection, but that doesn't seem to help either.

Regards,

Rohit



Re: LineEntityProcessor process only one file

2012-09-02 Thread Lance Norskog
Ahmet, please post your dih script in a message (not as an attachment).

- Original Message -
| From: "James Dyer" 
| To: solr-user@lucene.apache.org
| Sent: Friday, August 31, 2012 12:53:50 PM
| Subject: RE: LineEntityProcessor process only one file
| 
| No, it should process all of the files that get listed.  I'm taking a
| look at the issue you opened, SOLR-3779.  This is also similar to
| SOLR-3307, although that was reported as a bug with "threads" in
| 3.6, which is no longer a feature in 4.0.
| 
| James Dyer
| E-Commerce Systems
| Ingram Content Group
| (615) 213-4311
| 
| 
| -Original Message-
| From: Ahmet Arslan [mailto:iori...@yahoo.com]
| Sent: Friday, August 31, 2012 1:53 PM
| To: solr-user@lucene.apache.org
| Subject: LineEntityProcessor process only one file
| 
| LineEntityProcessor processes only one document when combined with
| FileListEntityProcessor. Is this by design?
| 
| 
| 
| 
| 


Re: stem porter with tokenizer..

2012-09-02 Thread Lance Norskog
If you want to know class names, you want to check out the source code!

http://lucene.apache.org/core/developer.html
http://find.searchhub.org/s:javadoc,wiki?q=porterStemmer

- Original Message -
| From: "Emiliana Suci" 
| To: solr-user@lucene.apache.org
| Sent: Sunday, September 2, 2012 12:22:05 AM
| Subject: stem porter with tokenizer..
| 
| PorterStemmer using tokenizer which class in lucene??
| 
| 
| 
| --
| View this message in context:
| http://lucene.472066.n3.nabble.com/stem-porter-with-tokenizer-tp4004913.html
| Sent from the Solr - User mailing list archive at Nabble.com.
| 


Re: Solr Not releasing memory

2012-09-02 Thread Lance Norskog
1) I believe Java 1.7 releases memory back to the OS.
2) All of the Javas I've used on Windows do this.

Is the physical memory use a problem? Does it push out all other programs?

Or is it just that the Java process appears larger? This explains the latter:
http://blog.thetaphi.de/2012/07/use-lucenes-mmapdirectory-on-64bit.html
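
One way to tell these cases apart from inside the JVM is to compare the used,
committed, and maximum heap sizes. The sketch below uses only
`java.lang.Runtime`; the class name is hypothetical. It reports the Java heap
only, so memory-mapped index files, which the article above discusses, do not
show up in these numbers.

```java
public class HeapReport {
    /** Returns {used, committed, max} heap sizes in bytes. */
    static long[] snapshot() {
        Runtime rt = Runtime.getRuntime();
        long committed = rt.totalMemory();        // heap currently reserved by the JVM
        long used = committed - rt.freeMemory();  // part of it occupied by objects
        long max = rt.maxMemory();                // upper bound, usually set with -Xmx
        return new long[] { used, committed, max };
    }

    public static void main(String[] args) {
        long mb = 1024L * 1024L;
        long[] before = snapshot();
        System.gc(); // only a hint; the JVM may keep the committed heap as it is
        long[] after = snapshot();
        System.out.println("before gc: used=" + before[0] / mb + " MB, committed="
                + before[1] / mb + " MB, max=" + before[2] / mb + " MB");
        System.out.println("after gc:  used=" + after[0] / mb + " MB, committed="
                + after[1] / mb + " MB, max=" + after[2] / mb + " MB");
    }
}
```

If "used" drops after a collection while "committed" stays flat, the JVM is
holding on to heap it has reserved rather than leaking objects.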

- Original Message -
| From: "Rohit" 
| To: solr-user@lucene.apache.org
| Sent: Sunday, September 2, 2012 1:22:14 AM
| Subject: Solr Not releasing memory
| 
| Hi,
| 
| We are running Solr 3.5 on Tomcat 6.26 on a Windows Enterprise RC2
| server; our index size is pretty large.
| 
| We have noticed that once Tomcat starts using/reserving RAM it never
| releases it, even when there is not a single user on the system.  I
| have tried forced garbage collection, but that doesn't seem to help
| either.
| 
| Regards,
| 
| Rohit


Antwort: Re: Antwort: Re: Query during a query

2012-09-02 Thread Johannes . Schwendinger
The problem is that I don't know how to do this. :P

My sequence: the user enters his search words. These are sent to Solr. There 
I need to make another query first to get metadata from the index. With 
this metadata I have to connect to an external source to get some 
information about the user. With this information and the original search 
words I then query the Solr index to get the search result.

I hope it's clear now where my problem is and what I want to do.
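
The sequence described above can be sketched as three chained steps. This is
only an outline under assumptions: the three functions stand in for the
metadata query, the external lookup, and the main query, and all names are
hypothetical. Whether this runs in the client application or inside a custom
Solr component is left open.

```java
import java.util.Arrays;
import java.util.List;
import java.util.function.BiFunction;
import java.util.function.Function;

public class TwoStepSearch {
    /**
     * Runs the metadata query, then the external lookup, then the main query.
     * Only the result of the main query is returned to the caller.
     */
    static List<String> search(String userWords,
                               Function<String, List<String>> metadataQuery,
                               Function<List<String>, String> externalLookup,
                               BiFunction<String, String, List<String>> mainQuery) {
        List<String> metadata = metadataQuery.apply(userWords); // step 1: index metadata
        String userInfo = externalLookup.apply(metadata);       // step 2: external source
        return mainQuery.apply(userWords, userInfo);            // step 3: final search
    }

    public static void main(String[] args) {
        // Stub implementations so the flow can be run without Solr.
        List<String> hits = search("example words",
                words -> Arrays.asList("meta-a", "meta-b"),
                metadata -> "user-info(" + String.join(",", metadata) + ")",
                (words, info) -> Arrays.asList(words + " filtered by " + info));
        System.out.println(hits);
    }
}
```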

Regards,
Johannes



From: "Jack Krupansky"
Date: 31.08.2012 15:03
Subject: Re: Antwort: Re: Query during a query



So, just do another query before doing the main query. What's the problem? 

Be more specific. Walk us through the sequence of processing that you 
need.

-- Jack Krupansky

-Original Message- 
From: johannes.schwendin...@blum.com
Sent: Friday, August 31, 2012 1:52 AM
To: solr-user@lucene.apache.org
Subject: Antwort: Re: Query during a query

Thanks for the answer, but I want to know how I can do a separate query
before the main query.
And I only want this data in my program. The user won't see it.
I need the values from one field to get some information from an external
source while the main query is executed.

pravesh wrote on 31.08.2012 07:42:48:

> From: pravesh
> To: solr-user@lucene.apache.org
> Date: 31.08.2012 07:43
> Subject: Re: Query during a query
>
> Did you check Solr Field Collapsing/Grouping, in case this is what you
> are looking for?
> http://wiki.apache.org/solr/FieldCollapsing
>
>
> Thanx
> Pravesh
>
>
>
> --
> View this message in context:
> http://lucene.472066.n3.nabble.com/Query-during-a-query-tp4004624p4004631.html
> Sent from the Solr - User mailing list archive at Nabble.com. 




Re: need basic information

2012-09-02 Thread pravesh
Do logstash/graylog2 do log processing/searching in real time? Or can they
scale for real-time needs?
I guess harshadmehta is looking for real-time indexing/search.

Regards
Pravesh



--
View this message in context: 
http://lucene.472066.n3.nabble.com/need-basic-information-tp4004588p4004996.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: use of filter queries in Lucene/Solr Alpha40 and Beta4.0

2012-09-02 Thread Günter Hipler
I made more tests with the Lucene/SOLR 4.0 version deployed in March and
the latest Lucene 4.0 beta version over the weekend.


My findings:

- the version deployed in March doesn't show the error I now come across
in Beta 4.0 (the number of documents reported in the facet counts differs
from the actual number of documents returned by a subsequent drill-down
request using a filter query).
This is true even when a lot of updates have been made against the index.
At the moment this can be seen at
http://sb-tp1.swissbib.unibas.ch/ (e.g. with the term 'mitbestimmung'
and the facet value 'nebis', which I used for all my tests).
As a note: because we have to migrate the OS of our servers, the host might
be down for one or two days in the course of the current week.

- using the latest Lucene/Solr Beta version, the error occurs when updates
are committed against the index, as I described in my earlier messages.
When the index is new and freshly built the error doesn't occur (I made
these tests on a host which is not accessible to the public).

From my point of view this is a severe bug in Lucene/Solr Beta 4.0 because
filter queries are used very, very often!
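
The comparison behind this report can be reproduced by building the base
request and the drill-down request side by side. The sketch below uses only
the JDK; the parameter names and values come from the quoted message further
down, the `q` value is simplified, and the class and method names are
hypothetical.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;
import java.util.LinkedHashMap;
import java.util.Map;

public class FacetDrillDown {
    /** Builds a term-parser filter such as {!term f=navNetwork}nebis. */
    static String termFilter(String field, String value) {
        return "{!term f=" + field + "}" + value;
    }

    /** URL-encodes the parameters in insertion order and joins them with '&'. */
    static String toQueryString(Map<String, String> params) {
        StringBuilder query = new StringBuilder();
        try {
            for (Map.Entry<String, String> param : params.entrySet()) {
                if (query.length() > 0) {
                    query.append('&');
                }
                query.append(URLEncoder.encode(param.getKey(), "UTF-8"))
                     .append('=')
                     .append(URLEncoder.encode(param.getValue(), "UTF-8"));
            }
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException("UTF-8 is always available", e);
        }
        return query.toString();
    }

    public static void main(String[] args) {
        Map<String, String> base = new LinkedHashMap<String, String>();
        base.put("q", "+mitbestimmung"); // simplified form of the query in the thread
        base.put("rows", "0");
        base.put("facet", "on");
        base.put("facet.mincount", "1");
        base.put("facet.field", "navNetwork");
        base.put("qt", "only_queryfields_edismax");

        // Same request plus the filter query on the facet value under test.
        Map<String, String> drillDown = new LinkedHashMap<String, String>(base);
        drillDown.put("fq", termFilter("navNetwork", "nebis"));

        System.out.println(toQueryString(base));
        System.out.println(toQueryString(drillDown));
    }
}
```

The expectation being tested is that the number of documents found by the
second request equals the count shown for 'nebis' in the facet counts of the
first.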

I would be very happy if someone from the Solr core team could comment on it.

Thanks a lot for your support!

Günter Hipler

2012/8/31 Günter Hipler 

>
> Hi,
>
> thanks for your responses!
>
> I made a simpler query with only one facet and without any boosting,
> so it should be easier to focus on the problem
>
>
> facet=on&facet.mincount=1&facet.limit=100&rows=0&start=0&q=+(+%2Bmitbestimmung++)+&facet.field=navNetwork&qt=only_queryfields_edismax&debugQuery=true
> ->
> facet=on&
> facet.mincount=1&
> facet.limit=100&
> rows=0&
> start=0&
> q=+(+%2Bmitbestimmung++)+&
> facet.field=navNetwork&
> qt=only_queryfields_edismax&
> debugQuery=true
>
> facet counts say 2734 documents for nebis
> parsedQuery
> (+(+DisjunctionMaxQuery((title_series:mitbestimmung |
> title_uniform:mitbestimmung | authorfull:mitbestimmung |
> callnum:mitbestimmung | sfulltext:mitbestimmung | title_short:mitbestimmung
> | sbranchlib:mitbestimmung | bibid:mitbestimmung |
> sfullTextRemoteData:mitbestimmung | title_long:mitbestimmung |
> autnum:mitbestimmung | subfull:mitbestimmung |
> publplace:mitbestimmung/no_coord
> parsedQuery_toString
> +(+(title_series:mitbestimmung | title_uniform:mitbestimmung |
> authorfull:mitbestimmung | callnum:mitbestimmung | sfulltext:mitbestimmung
> | title_short:mitbestimmung | sbranchlib:mitbestimmung |
> bibid:mitbestimmung | sfullTextRemoteData:mitbestimmung |
> title_long:mitbestimmung | autnum:mitbestimmung | subfull:mitbestimmung |
> publplace:mitbestimmung))
>
>
>
> facet=on&facet.mincount=1&facet.limit=100&rows=0&start=0&q=+(+%2Bmitbestimmung++)+&facet.field=navNetwork&qt=only_queryfields_edismax&debugQuery=true&fq={!term+f%3DnavNetwork}nebis
> ->
> facet=on&facet.mincount=1&
> facet.limit=100&
> rows=0&
> start=0&
> q=+(+%2Bmitbestimmung++)+&
> facet.field=navNetwork&
> qt=only_queryfields_edismax&
> debugQuery=true&
> fq={!term+f%3DnavNetwork}nebis
>
> delivers 2871 (not the same as the number indicated in the base query)
> What is interesting:
> the facetcount of the second query itself shows the 'correct' number
> indicated in the base query (2734)
>
> parsedQuery and parsedQuery_ToString same as in base query
> @Jack: and is exactly the same for a filter query with fq=navNetwork:nebis
> we are using the term query parser to overcome problems with escaping
> special characters (as it is also described in the
> Solr Enterprise Search server book on page 189)
>
>
> Using the alternatives suggested by Hoss
>
> http://sb-s7.swissbib.unibas.ch:8080/solr/collection1/select?facet=on&facet.mincount=1&facet.limit=100&rows=0&start=0&q=+%28+%2Bmitbestimmung++%29+&facet.field=navNetwork&qt=only_queryfields_edismax&debugQuery=true&fq={!raw%20f=navNetwork}nebis
> and
>
> facet=on&facet.mincount=1&facet.limit=100&rows=0&start=0&q=+(+%2Bmitbestimmung++)+&facet.field=navNetwork&qt=only_queryfields_edismax&debugQuery=true&fq={!lucene}navNetwork:nebis
> don't change the result. The number of returned documents is higher than
> it should be according to the facet counts displayed in the base query
>
>
> the type we are using for navNetwork:
>  
> 
> omitNorms="true">
>   
>  
>  
>   pattern="^(\([a-z]+\))vtls0"
> replacement="$10"
> replace="all"
>  />
>   pattern="[^\w]+"
> replacement=""
> replace="all"
>  />
>  
>  
>   
>
>
>
> which in my opinion should be a common treatment for facet types
>
> the new requestHandler I'm using is quite simple (without any boost