Re: Indexing gets significantly slower after every batch commit

2015-05-22 Thread Siegfried Goeschl
Hi Angel, a while ago I had issues with a VMware VM - somehow snapshots were created regularly, which dragged down the machine. So I think it is a good idea to baseline the performance on a physical box before moving to VMs, production boxes or whatever is thrown at you. Cheers, Siegfried Goeschl

Re: New article on ZK "Poison Packet"

2015-05-10 Thread Siegfried Goeschl
Cool stuff - thanks for sharing Siegfried Goeschl > On 09 May 2015, at 08:43, steve wrote: > > While very technical and unusual, a very interesting view of the world of > Linux and ZooKeeper Clusters... > http://www.pagerduty.com/blog/the-discovery-of-apache-zookeepers-poison-packet/ >

Re: Indexing PDF and MS Office files

2015-04-16 Thread Siegfried Goeschl
look at commons-exec :-) Cheers, Siegfried Goeschl PS: one more thing - please, tell your management that you will never ever successfully process all real-world PDFs and cater for that fact in your requirements :-) On 16.04.15 13:10, Vijaya Narayana Reddy Bhoomi Reddy wrote: Erick, I tried ind

Re: Measuring QPS

2015-04-06 Thread Siegfried Goeschl
uckets * the XML processor uses a StAX parser to handle huge JTL files (exceeding 1 GB) * it also caters for merging JTL files when running multiple JMeter instances Cheers, Siegfried Goeschl > On 06 Apr 2015, at 22:57, Walter Underwood wrote: > > The load testing is the easiest pa
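To illustrate why a StAX parser copes with JTL files that exceed 1 GB: it pulls one XML event at a time, so memory use stays flat regardless of file size. The following is only a minimal sketch and not the processor described in the message. The class `JtlStreamStats` and the method `summarize` are hypothetical names, and the sketch assumes the XML flavour of JTL where each `httpSample` or `sample` element carries its elapsed time in the `t` attribute.

```java
import java.io.Reader;
import java.io.StringReader;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

// Hypothetical helper, not part of JMeter or of the tool mentioned in the thread.
class JtlStreamStats {

    /** Streams over a JTL document and returns {sampleCount, totalElapsedMillis}. */
    static long[] summarize(Reader jtl) {
        long count = 0;
        long totalMillis = 0;
        try {
            XMLStreamReader xml = XMLInputFactory.newInstance().createXMLStreamReader(jtl);
            try {
                while (xml.hasNext()) {
                    // Only element starts are interesting; everything else is skipped.
                    if (xml.next() != XMLStreamConstants.START_ELEMENT) {
                        continue;
                    }
                    String name = xml.getLocalName();
                    if (name.equals("httpSample") || name.equals("sample")) {
                        // Assumption: "t" holds the elapsed time in milliseconds.
                        String elapsed = xml.getAttributeValue(null, "t");
                        if (elapsed != null) {
                            count++;
                            totalMillis += Long.parseLong(elapsed);
                        }
                    }
                }
            } finally {
                xml.close();
            }
        } catch (XMLStreamException e) {
            throw new IllegalStateException("Could not parse JTL stream", e);
        }
        return new long[] {count, totalMillis};
    }

    public static void main(String[] args) {
        String jtl = "<testResults version=\"1.2\">"
                + "<httpSample t=\"120\" lb=\"search\" s=\"true\"/>"
                + "<httpSample t=\"80\" lb=\"search\" s=\"true\"/>"
                + "</testResults>";
        long[] stats = summarize(new StringReader(jtl));
        System.out.println(stats[0] + " samples, " + stats[1] + " ms in total");
    }
}
```

For a real file the `StringReader` would be replaced by a buffered file reader; the loop itself does not change.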

Re: Measuring QPS

2015-04-06 Thread Siegfried Goeschl
The good-sounding thing - you can do that easily with JMeter running the GUI or the command-line Cheers, Siegfried Goeschl > On 06 Apr 2015, at 21:35, Davis, Daniel (NIH/NLM) [C] > wrote: > > This sounds really good: > > "For load testing, we replay production logs t

Re: Measuring QPS

2015-04-06 Thread Siegfried Goeschl
Appreciated :-) Siegfried Goeschl > On 06 Apr 2015, at 20:31, Davis, Daniel (NIH/NLM) [C] > wrote: > > OK, > > I have a lot of chutzpah posting that here ;) The other guys answering the > questions can probably explain it better. > I love showing off, howev

Re: Measuring QPS

2015-04-06 Thread Siegfried Goeschl
, Siegfried Goeschl > On 06 Apr 2015, at 20:04, Davis, Daniel (NIH/NLM) [C] > wrote: > > Siegfried, > > It is early days as yet. I don't think we need a code drop. AFAIK, none > of our current Solr applications autocomplete the search box based on popular > qu

Re: Measuring QPS

2015-04-06 Thread Siegfried Goeschl
rch terms so this might be a better solution in the long run. But this requires a separate SOLR core & ingest plus a GUI (check out SILK or ELK) - in other words more moving parts in production :-) * If there is sufficient interest I can make a code drop on GitHub Cheers, Siegfried Go

Re: Measuring QPS

2015-04-04 Thread Siegfried Goeschl
elopment-to-production-20121210.pdf> http://people.apache.org/~sgoeschl/presentations/jsug-2015/jee-performance-monitoring.pdf Cheers, Siegfried Goeschl > On 03 Apr 2015, at 17:53, Shawn Hei

Re: Trending functionality in Solr

2015-02-09 Thread Siegfried Goeschl
If you are interested we could team up and make a proper SOLR contribution :-) Cheers, Siegfried Goeschl On 08.02.15 05:26, S.L wrote: Folks, Is there a way to implement the trending functionality using Solr , to give the results using a query for say the most searched terms in the past

Re: OutOfMemoryError for PDF document upload into Solr

2015-01-16 Thread Siegfried Goeschl
Hi Dan, neat idea - made a mental note :-) That brings us back to the point that in complex setups you should not do the document pre-processing directly in SOLR but have an import process which can safely crash when processing a 4GB PDF file Cheers, Siegfried Goeschl On 16.01.15 05:02

Re: OutOfMemoryError for PDF document upload into Solr

2015-01-15 Thread Siegfried Goeschl
Hi Ganesh, you can increase the heap size but parsing a 4 GB PDF document will very likely consume A LOT OF memory - I think you need to check if that large PDF can be parsed at all :-) Cheers, Siegfried Goeschl On 14.01.15 18:04, Michael Della Bitta wrote: Yep, you'll have to inc

Re: Slow queries

2014-12-08 Thread Siegfried Goeschl
expensive SOLR queries, what your server code is doing - many questions and even more answers to that - in other words nobody can help you when the basic work is not done. And when you know your application performance-wise you probably also know the solution :-) Cheers, Siegfried Goeschl >

Re: Slow queries

2014-12-02 Thread Siegfried Goeschl
the next three years :-) Cheers, Siegfried Goeschl > On 02 Dec 2014, at 10:02, melb wrote: > > Yes performance degraded over the time, I can raise the memory but I can't > do it every time and the volume will keep growing > Is it better to put the solr on dedicated machi

Re: Slow queries

2014-12-02 Thread Siegfried Goeschl
If your performance was fine but degraded over time it might be easier to check / increase the memory to have better disk caching. Cheers, Siegfried Goeschl On 02.12.14 09:27, melb wrote: Hi, I have a solr collection with 16 millions documents and growing daily with 1 documents

Re: AW: AW: slorj -> httpclient 4, but we already have httpclient 3 in use

2014-09-19 Thread Siegfried Goeschl
Lucky you :-) Siegfried Goeschl On 19.09.14 07:31, Clemens Wyss DEV wrote: I'd like to mention, that substituting the httpcore.jar with the latest (4.3) "sufficed"... -Ursprüngliche Nachricht- Von: Guido Medina [mailto:guido.med...@temetra.com] Gesendet: Donnerstag

Re: AW: slorj -> httpclient 4, but we already have httpclient 3 in use

2014-09-18 Thread Siegfried Goeschl
AFAIK even the different minor versions are source/binary compatible so you might need to tinker with the right "version" to get your server running Cheers, Siegfried Goeschl On 18.09.14 17:45, Guido Medina wrote: Hi Clemens, If you are going thru the effort of migrating from So

Re: slorj -> httpclient 4, but we already have httpclient 3 in use

2014-09-18 Thread Siegfried Goeschl
not on the production server or might change due to a change in the project Cheers, Siegfried Goeschl On 18.09.14 15:08, Clemens Wyss DEV wrote: I am doing initial steps with solrj which is based on httpclient 4. Unfortunately parts of our framework are based on httpclient 3. So when I instantiat

Re: Mongo DB Users

2014-09-16 Thread Siegfried Goeschl
remove please On 16.09.14 15:42, Karolina Dobromiła Jeleń wrote: remove please On Tue, Sep 16, 2014 at 9:35 AM, Amey Patil wrote: Remove. On Tue, Sep 16, 2014 at 12:58 PM, Joan wrote: Remove please 2014-09-16 6:59 GMT+02:00 Patti Kelroe-Cooke : Remove Kind regards Patti On Mon, Sep

Re: external indexer for Solr Cloud

2014-09-01 Thread Siegfried Goeschl
the Camel Solr Integration http://comments.gmane.org/gmane.comp.jakarta.lucene.solr.user/99739 Cheers, Siegfried Goeschl On 01.09.14 18:05, Jack Krupansky wrote: Packaging SolrCell in the same manner, with parallel threads and able to talk to multiple SolrCloud servers in parallel would have a

Re: SOLR Performance benchmarking

2014-07-13 Thread Siegfried Goeschl
, optimise them, re-run your tests ** check your cache warming and how fast you start your load injector threads Cheers, Siegfried Goeschl On 13 Jul 2014, at 09:53, rashi gandhi wrote: > Hi, > > I am using SolrMeter for load/stress testing solr performance. > Tomcat is configured

Re: SOLR: getting documents in the given order

2014-06-03 Thread Siegfried Goeschl
Assuming that you just want to sort - have you tried using sort=id desc Cheers, Siegfried Goeschl On 04 Jun 2014, at 06:19, sachinpkale wrote: > I have a following field in SOLR schema. > > > required="false" multiValued="false"/> > > If I issue

iText hitting infinite loop - Was Re: pdfs

2014-06-02 Thread Siegfried Goeschl
d by * Apache PDFBox 1.8.4 onwards * Apache Tika 1.5 * Apache SOLR 4.8 Cheers, Siegfried Goeschl On 26.05.14 18:20, Erick Erickson wrote: Brian: Yeah, if you can share the PDF that would be great. Parsing via Tika should not bring down Solr, although I supposed there could be something in Tika

Re: ExtractingRequestHandler indexing zip files

2014-05-27 Thread Siegfried Goeschl
Hi Sergio, you either do the stuff on the caller side (which is probably a good idea since you are off-loading the SOLR server) or extend the ExtractingRequestHandler Cheers, Siegfried Goeschl On 27 May 2014, at 10:37, marotosg wrote: > Hi, > > Thanks for your answer Alexandre. >

Re: SolrCloud Nodes autoSoftCommit and (temporary) missing documents

2014-05-25 Thread Siegfried Goeschl
Hi folks, I think that the timestamp should be rounded down to a minute (or whatever) to avoid thrashing the filter query cache Cheers, Siegfried Goeschl On 25 May 2014, at 18:19, Steve McKay wrote: > Solr can add the filter for you: > > > >timestamp:[*
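The reasoning behind the rounding: a filter query is cached by its exact text, so a bound that changes every millisecond produces a new cache entry on every request and none of them is ever reused. Truncating the bound to the minute lets all requests within that minute share one entry. A minimal sketch, assuming a client that builds the filter string itself; `RoundedTimestampFilter` and `roundedFilter` are hypothetical names, and `timestamp` is the field name from the quoted message.

```java
import java.time.Instant;
import java.time.temporal.ChronoUnit;

// Hypothetical helper for illustration only.
class RoundedTimestampFilter {

    /** Builds an open-ended range filter whose upper bound is truncated to the full minute. */
    static String roundedFilter(String field, Instant now) {
        Instant rounded = now.truncatedTo(ChronoUnit.MINUTES);
        // Instant.toString() yields ISO-8601 in UTC, the date syntax Solr expects.
        return field + ":[* TO " + rounded + "]";
    }

    public static void main(String[] args) {
        System.out.println(roundedFilter("timestamp", Instant.now()));
        // Solr's date math expresses the same rounding on the server side:
        System.out.println("timestamp:[* TO NOW/MINUTE]");
    }
}
```

When the filter is configured inside Solr rather than built by the client, the `NOW/MINUTE` form avoids any client code at all.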

Re: pdfs

2014-05-25 Thread Siegfried Goeschl
Sorry, typo - can you send me the PDF by email directly :-) Siegfried Goeschl On 25 May 2014, at 10:06, Siegfried Goeschl wrote: > Hi Brian, > > can you send me the email? I would like to play around :-) > > Have you opened a JIRA for PdfBox? If not I will open one if I can r

Re: pdfs

2014-05-25 Thread Siegfried Goeschl
Hi Brian, can you send me the email? I would like to play around :-) Have you opened a JIRA for PdfBox? If not I will open one if I can reproduce the issue … Thanks in advance Siegfried Goeschl On 25 May 2014, at 04:18, Brian McDowell wrote: > Our feeding (indexing) tool halts beca

Re: pdfs

2014-05-22 Thread Siegfried Goeschl
g the document extraction stuff out of SOLR * provide monitoring and recovery of stuck document extractions ** killing worker threads ** using external processes and killing them when spinning out of control Cheers, Siegfried Goeschl On 22.05.14 06:46, Jack Krupansky wrote: Yeah, PDF extractio
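The "killing worker threads" idea can be sketched with the JDK's executor framework: run each extraction on a worker, wait for a bounded time, and cancel it when the deadline passes. This is an illustration under assumptions, not code from the thread; `GuardedExtraction` and `extractWithTimeout` are hypothetical names, and the `Callable` stands in for whatever parser call does the real work.

```java
import java.util.Optional;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Hypothetical helper for illustration only.
class GuardedExtraction {

    /** Runs the extraction task and gives up (interrupting the worker) after timeoutMillis. */
    static Optional<String> extractWithTimeout(Callable<String> extraction, long timeoutMillis) {
        ExecutorService worker = Executors.newSingleThreadExecutor();
        Future<String> result = worker.submit(extraction);
        try {
            return Optional.ofNullable(result.get(timeoutMillis, TimeUnit.MILLISECONDS));
        } catch (TimeoutException e) {
            result.cancel(true); // sends an interrupt to the stuck worker thread
            return Optional.empty();
        } catch (ExecutionException e) {
            return Optional.empty(); // the extraction itself threw an exception
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            result.cancel(true);
            return Optional.empty();
        } finally {
            worker.shutdownNow();
        }
    }

    public static void main(String[] args) {
        System.out.println(extractWithTimeout(() -> "extracted text", 1000).isPresent());
        System.out.println(extractWithTimeout(() -> {
            Thread.sleep(60_000); // simulates a parser that never finishes
            return "never";
        }, 200).isPresent());
    }
}
```

Interruption is cooperative in Java: a parser spinning in a tight loop that never checks the interrupt flag keeps running after `cancel(true)`. That limitation is what makes the second option in the message, an external process that can be killed outright, the more robust choice.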

Re: Indexing PDF in Apache Solr 4.8.0 - Problem.

2014-05-12 Thread Siegfried Goeschl
Hi Vignesh, can you check your SOLR Server Log?! Not all PDF documents on this planet can be processed using Tika :-) Cheers, Siegfried Goeschl On 07 May 2014, at 09:40, vignesh wrote: > Dear Team, > > I am Vignesh using the latest version 4.8.0 Apache Solr and am >

Re: Export big extract from Solr to [My]SQL

2014-05-02 Thread Siegfried Goeschl
Hi Per, basically I see three options * use a lot of memory to cope with huge result sets * use result set paging * SOLR 4.7 supports cursors (https://issues.apache.org/jira/browse/SOLR-5463) Cheers, Siegfried Goeschl On 02.05.14 13:32, Per Steffensen wrote: Hi I want to make extracts
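The cursor option follows a simple protocol: the first request sends `cursorMark=*`, every response returns a `nextCursorMark`, and the export is finished when the returned mark equals the one that was sent. The sketch below shows only that loop. The HTTP call is abstracted behind a function so the example stays self-contained; `CursorExport`, `Page`, `exportAll` and `fakeSolr` are hypothetical names, and `fakeSolr` is an in-memory stand-in, not a Solr client.

```java
import java.util.Arrays;
import java.util.List;
import java.util.function.Consumer;
import java.util.function.Function;

// Hypothetical sketch of a cursor-driven export loop.
class CursorExport {

    /** One page of results plus the cursor to send with the next request. */
    static final class Page {
        final List<String> docs;
        final String nextCursorMark;

        Page(List<String> docs, String nextCursorMark) {
            this.docs = docs;
            this.nextCursorMark = nextCursorMark;
        }
    }

    /** Fetches page after page and hands each batch to the sink; returns the document count. */
    static long exportAll(Function<String, Page> fetchPage, Consumer<List<String>> sink) {
        long exported = 0;
        String cursorMark = "*"; // start value defined by the cursor protocol
        while (true) {
            Page page = fetchPage.apply(cursorMark);
            if (!page.docs.isEmpty()) {
                sink.accept(page.docs); // e.g. one batched SQL INSERT per page
                exported += page.docs.size();
            }
            if (page.nextCursorMark.equals(cursorMark)) {
                return exported; // same mark returned: nothing left to fetch
            }
            cursorMark = page.nextCursorMark;
        }
    }

    /** In-memory stand-in for the search server; the cursor is simply the next offset. */
    static Function<String, Page> fakeSolr(List<String> ids, int rows) {
        return cursor -> {
            int start = cursor.equals("*") ? 0 : Integer.parseInt(cursor);
            int end = Math.min(start + rows, ids.size());
            String next = start >= ids.size() ? cursor : String.valueOf(end);
            return new Page(ids.subList(start, end), next);
        };
    }

    public static void main(String[] args) {
        List<String> ids = Arrays.asList("a", "b", "c", "d", "e");
        long n = exportAll(fakeSolr(ids, 2), batch -> System.out.println("INSERT batch " + batch));
        System.out.println(n + " documents exported");
    }
}
```

Because each page is passed to the sink and then dropped, memory use is bounded by the page size rather than by the size of the whole extract. A real request additionally needs a sort that includes the unique key field, which cursors require.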

Re: Having trouble with German compound words in Solr 4.7

2014-04-24 Thread Siegfried Goeschl
ies I do not tinker with DictionaryCompoundWordTokenFilterFactory in the "query" phase of the field so the following queries would work with the indexed word "Leinenhose" * "leinenhosen" * "leinenhose" * "leinen hose" * "leinen hosen"

Re: Having trouble with German compound words in Solr 4.7

2014-04-18 Thread Siegfried Goeschl
actually executed * one thing I always do for prototyping is setting up the Solritas GUI using the same query handler as the application server Cheers, Siegfried Goeschl On 18 Apr 2014, at 06:06, Alistair wrote: > Hey Jack, > > thanks for the reply. I added autoGeneratePhraseQueries=

Re: No route to host

2014-04-09 Thread Siegfried Goeschl
Hi folks, the URL looks wrong (misconfigured) http://:8080/solr/collection1 Cheers, Siegfried Goeschl On 09 Apr 2014, at 14:28, Rallavagu wrote: > All, > > I see the following error in the log file. The host that it is trying to find > is itself. Wondering if anybody expe

Re: Anyone going to ApacheCon in Denver next week?

2014-04-06 Thread Siegfried Goeschl
Hi folks, I’m already here and would love to join :-) Cheers, Siegfried Goeschl On 05 Apr 2014, at 20:43, Doug Turnbull wrote: > I'll be there. I'd love to meet up. Let me know! > > Sent from my Windows Phone From: William Bell > Sent: 4/5/2014 10:40 PM > To: so

Re: Apache Solr.

2014-02-03 Thread Siegfried Goeschl
Hi Vignesh, a few keywords for further investigations * Solr Data Import Handler * Apache Tika * Apache PDFBox Cheers, Siegfried Goeschl On 03.02.14 09:15, vignesh wrote: Hi Team, I am Vignesh, am using Apache Solr 3.6 and able to Index XML file and now trying to

Re: Why do people want to deploy to Tomcat?

2013-11-12 Thread Siegfried Goeschl
Hi Alex, in my case * ignorance that Tomcat is not fully supported * Tomcat configuration and operations know-how in-house * could migrate to Jetty but need approved change request to do so Cheers, Siegfried Goeschl On 12.11.13 04:54, Alexandre Rafalovitch wrote: Hello, I keep seeing here

Re: how to debug my own analyzer in solr

2013-10-21 Thread Siegfried Goeschl
Thread Dump and/or Remote Debugging?! Cheers, Siegfried Goeschl On 21.10.13 11:58, Mingzhu Gao wrote: More information about this , the custom analyzer just implement "createComponents" of Analyzer. And my configure in schema.xml is just something like : From the log I

Re: solr 4.4 config trouble

2013-09-30 Thread Siegfried Goeschl
Hi Marc, what exactly is not working - no obvious problems in the logs as far as I can see Cheers, Siegfried Goeschl Am 30.09.2013 um 11:44 schrieb Marc des Garets : > Hi, > > I'm running solr in tomcat. I am trying to upgrade to solr 4.4 but I can't > get it to work. If someon

Re: how to suppress result

2008-04-07 Thread Siegfried Goeschl
Hi Evgeniy +) delete the documents if you really don't need them +) create a field "ignored" and build an appropriate query to exclude the documents where 'ignored' is true Cheers, Siegfried Goeschl Evgeniy Strokin wrote: Hello,.. I have odd problem. I use Sol
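The second suggestion amounts to a negative filter on the new field. A minimal sketch of how a client might assemble the request parameters, assuming the field is called `ignored` as in the message; `IgnoredFilter` and `selectParams` are hypothetical names, and the `Charset` overload of `URLEncoder.encode` requires Java 10 or later.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

// Hypothetical helper for illustration only.
class IgnoredFilter {

    /** Builds the query string for a select request that hides documents flagged as ignored. */
    static String selectParams(String userQuery) {
        // The filter query is kept separate from q so it does not affect scoring
        // and can be cached independently of the user's query.
        return "q=" + URLEncoder.encode(userQuery, StandardCharsets.UTF_8)
                + "&fq=" + URLEncoder.encode("-ignored:true", StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        System.out.println(selectParams("name:foo"));
    }
}
```

Documents that have no value in `ignored` still match this filter, so existing documents do not need to be reindexed just to set the flag to false.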

Re: Can We append a field to the response that is not in the index but computed at runtime.

2008-03-31 Thread Siegfried Goeschl
code) Cheers, Siegfried Goeschl Umar Shah wrote: On Mon, Mar 31, 2008 at 7:38 PM, Ryan McKinley <[EMAIL PROTECTED]> wrote: Two approaches: 1. make a map and add it to the response: rb.rsp.add( "mystuff", mymap ); I tried using both Map/ NamedList it appends to t

Re: Solr interprets UTF-8 as ISO-8859-1

2008-03-31 Thread Siegfried Goeschl
oding(); if (null == encoding) { // Set your default encoding here request.setCharacterEncoding("UTF-8"); } else { request.setCharacterEncoding(encoding); } super.doFilter(request, response, chain); } } Cheers, Siegfried Goeschl Daniel Löfquist wrote:

Re: Combining SOLR and JAMon to monitor query execution times from a browser

2007-11-28 Thread Siegfried Goeschl
Hi Norberto, JAMon is all about aggregating statistical data and displaying the information for a web browser - the main beauty is that it is easy to define what you are monitoring such as querying domain objects per customer. Cheers, Siegfried Goeschl Norberto Meijome wrote: On Tue, 27

Combining SOLR and JAMon to monitor query execution times from a browser

2007-11-27 Thread Siegfried Goeschl
cess to the access.log from a web browser +) a small presentation can be found at http://people.apache.org/~sgoeschl/presentations/jamon-20070717.pdf +) if it is of general interest I can rewrite the code as a contribution Cheers, Siegfried Goeschl

Re: Any clever ideas to inject into solr? Without http?

2007-08-09 Thread Siegfried Goeschl
abase import service running within the servlet container to avoid writing out the data to the file system and transmitting it over HTTP. +) I think there was some discussion regarding a generic database importer but nothing I'm aware of Cheers, Siegfried Goeschl Kevin Holmes

Re: Need question to configure Log4j for solr

2007-07-13 Thread Siegfried Goeschl
Hi Ken, and we stopped using Resin's support for daily rolling log files since it blocks the server for 20 minutes when rotating a 20 GB logfile - please don't ask what we are doing with the daily 20 GB ... :-( Cheers, Siegfried Goeschl Ken Krugler wrote: : the troubles come

Re: Need question to configure Log4j for solr

2007-07-12 Thread Siegfried Goeschl
Hi Erik, the trouble comes when you integrate third-party stuff depending on log4j (as I currently do). Having said this you have a strong point when looking at http://www.qos.ch/logging/classloader.jsp Cheers, Siegfried Goeschl Erik Hatcher wrote: On Jul 12, 2007, at 9:03 AM, Siegfried

Re: Need question to configure Log4j for solr

2007-07-12 Thread Siegfried Goeschl
Hi folks, would using commons-logging be an improvement? It is a common requirement to hook up different logging infrastructure .. Cheers, Siegfried Goeschl Erik Hatcher wrote: On Jul 11, 2007, at 9:07 PM, solruser wrote: How do I configure solr to use log4j logging. I am able to

Re: How to use bit fields to narrow a search

2007-06-26 Thread Siegfried Goeschl
Hi Yonik, looks interesting - I'll give it a try Cheers, Siegfried Goeschl Yonik Seeley wrote: On 6/26/07, Siegfried Goeschl <[EMAIL PROTECTED]> wrote: Hi folks, I'm currently evaluating SOLR to implement fulltext search and within 8 hours I have my content imported and able t

How to use bit fields to narrow a search

2007-06-26 Thread Siegfried Goeschl
rameters Any suggestions/ideas where to add this processing within SOLR ... Thanks in advance Siegfried Goeschl