Re: SOLR 1.4 and Lucene 3.0.3 index problem

2011-02-01 Thread Peter Karich
Solr 1.4.x uses Lucene 2.9.x. You could try the trunk, which uses Lucene 3.0.3 and should be compatible, if I'm correct. Regards, Peter. > I have the exact opposite problem where Luke won't even load the index but > Solr starts fine. I believe there are major differences between the two > index

Re: Search for social networking sites

2011-01-21 Thread Peter Karich
First, it's more Solandra now (although the project is still named lucandra) ;) Second, it can help because data which is written to the index is immediately (configurable) available for search. solandra is distributed + real time solr, with no changes required on client side (be it SolrJ or othe

Re: [POLL] Where do you get Lucene/Solr from? Maven? ASF Mirrors?

2011-01-18 Thread Peter Karich
Am 18.01.2011 22:33, schrieb Steven A Rowe: >> [] ASF Mirrors (linked in our release announcements or via the Lucene >> website) >> >> [x] Maven repository (whether you use Maven, Ant+Ivy, Buildr, etc.) >> >> [x] I/we build them from source via an SVN/Git checkout.

Re: verifying that an index contains ONLY utf-8

2011-01-13 Thread Peter Karich
take a look also into icu4j which is one of the contrib projects ... > converting on the fly is not supported by Solr but should be relatively > easy in Java. > Also scanning is relatively simple (accept only a range). Detection too: > http://www.mozilla.org/projects/intl/chardet.html > >> We've crea

Exciting Solr Use Cases

2011-01-12 Thread Peter Karich
Hi all! Would you mind writing about your Solr project if it has an uncommon approach or if it is somehow exciting? I would like to extend my list for a new blog post. Examples I have in mind at the moment are: loggly (real time + big index), solandra (nice solr + cassandra combination), haiti

Re: verifying that an index contains ONLY utf-8

2011-01-12 Thread Peter Karich
converting on the fly is not supported by Solr but should be relatively easy in Java. Also scanning is relatively simple (accept only a range). Detection too: http://www.mozilla.org/projects/intl/chardet.html > We've created an index from a number of different documents that are > supplied by third p
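
The scanning step mentioned above can be sketched in plain Java. This is only a minimal illustration, not code from the thread: the class name `Utf8Check` and the method `isValidUtf8` are made up for this example, and it relies on the JDK's strict `CharsetDecoder` instead of a hand-written byte-range check.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

// Hypothetical helper (not part of Solr): checks raw bytes before they are indexed.
final class Utf8Check {

    /** Returns true if the bytes form a well-formed UTF-8 sequence. */
    static boolean isValidUtf8(byte[] bytes) {
        // REPORT makes the decoder throw instead of silently replacing bad input.
        CharsetDecoder decoder = StandardCharsets.UTF_8.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT);
        try {
            decoder.decode(ByteBuffer.wrap(bytes));
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        String text = "Gr\u00fc\u00dfe"; // contains non-ASCII characters
        byte[] asUtf8 = text.getBytes(StandardCharsets.UTF_8);
        byte[] asLatin1 = text.getBytes(StandardCharsets.ISO_8859_1);
        System.out.println("utf-8 bytes valid:   " + isValidUtf8(asUtf8));
        System.out.println("latin-1 bytes valid: " + isValidUtf8(asLatin1));
    }
}
```

A check like this only tells you that the bytes are well-formed UTF-8. It cannot tell you whether text that happens to decode cleanly was really meant as UTF-8, which is where a detector such as the chardet library linked above comes in.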

Re: Input raw log file

2011-01-12 Thread Peter Karich
Dinesh, it will stay 'real time' even if you convert it. Converting should be done in the millisecond range if at all measurable (e.g. if you apply streaming). Beware: To use the real features you'll need the latest trunk of solr IMHO. I've done similar log-feeding stuff here (with code!): http

Re: Luke for inspecting indexes on remote solr servers?

2011-01-04 Thread Peter Karich
Am 04.01.2011 21:43, schrieb Ahmet Arslan: >> Is that supported? Pointer(s) >> to how to do it? > perhaps http://wiki.apache.org/solr/LukeRequestHandler ? or via ssh u...@host -X ;-)

Re: Removing deleted terms from spellchecker index

2010-12-29 Thread Peter Karich
how did you remove the term? In the spellcheck file? did you rebuild the spellcheck index? Regards, Peter. > Hi, > > I have configured spellchecker in solrconfig.xml and it is working fine for > existing terms. However, if i delete a term, it is still being returned as a > suggestion from the sp

Re: solr equiv of : SELECT count(distinct(field)) FROM index WHERE length(field) > 0 AND other_criteria

2010-12-22 Thread Peter Karich
facet=true&facet.field=field // SELECT count(distinct(field))
&fq=field:[* TO *] // WHERE length(field) > 0
&q=other_criteriaA&fq=other_criteriaB // AND other_criteria
advantage: you can look into several fields at one time when adding another facet.field
disadvantage: you get the counts splitte
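
For anyone assembling these parameters by hand, here is a rough Java sketch that builds the query string with proper URL encoding. The class `DistinctCountQuery` and its methods are invented for this example. Note that Solr's switch is spelled `facet=true`. The parameters `facet.limit=-1`, `facet.mincount=1` and `rows=0` are additions that are not in the mail above: they ask Solr for all non-empty facet values and no documents, so the number of returned facet entries corresponds to the distinct count.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

// Hypothetical helper, only string building; it does not talk to Solr.
final class DistinctCountQuery {

    private static String enc(String s) {
        return URLEncoder.encode(s, StandardCharsets.UTF_8);
    }

    /** Builds the query string for "count distinct values of field where other criteria match". */
    static String build(String field, String otherCriteria) {
        return "q=" + enc(otherCriteria)            // AND other_criteria
                + "&fq=" + enc(field + ":[* TO *]") // WHERE length(field) > 0
                + "&facet=true"
                + "&facet.field=" + enc(field)      // the values to count
                + "&facet.limit=-1"                 // assumption: return all values
                + "&facet.mincount=1"               // assumption: skip values with no hits
                + "&rows=0";                        // assumption: no documents needed
    }

    public static void main(String[] args) {
        System.out.println("/solr/select?" + build("field", "other_criteria"));
    }
}
```

The distinct count is then the number of entries in the facet list of the response, not a single number returned by Solr.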

Re: White space in facet values

2010-12-22 Thread Peter Karich
you should try fq=Product:"Electric Guitar" > How do I handle facet values that contain whitespace? Say I have a field > "Product" that I want to facet on. A value for "Product" could be "Electric > Guitar". How should I handle the white space in "Electric Guitar" during > indexing? What abo
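
When the facet value comes from user input or from a previous facet response, the quoting can be done in code. A minimal sketch, assuming the field is an untokenized string field; `FacetFilter.phrase` is a made-up helper, not a SolrJ method:

```java
// Hypothetical helper: wraps a facet value in quotes so whitespace stays part of the value.
final class FacetFilter {

    /** Returns e.g. Product:"Electric Guitar" for use as an fq parameter value. */
    static String phrase(String field, String value) {
        // Escape backslashes first, then double quotes, so the value cannot end the phrase early.
        String escaped = value.replace("\\", "\\\\").replace("\"", "\\\"");
        return field + ":\"" + escaped + "\"";
    }

    public static void main(String[] args) {
        System.out.println("fq=" + phrase("Product", "Electric Guitar"));
    }
}
```

The result still has to be URL-encoded when it is sent over plain HTTP; SolrJ does that for you if the string is passed as a filter query.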

Re: Solr (and mabye Java?) version numbering systems

2010-12-17 Thread Peter Karich
the current stable release is 1.4.1 (before there was 1.4) it has nothing to do with java's version numbers! (own release cycle) the next release will be 3.x: https://svn.apache.org/repos/asf/lucene/dev/branches/branch_3x/ and then 4.x (current trunk): https://svn.apache.org/repos/asf/lucene/dev

Re: Rebuild Spellchecker based on cron expression

2010-12-13 Thread Peter Karich
Building on optimize is not possible as index optimization is done on the master and the slaves don't even run an optimize but only fetch the optimized index. isn't the spellcheck index replicated to the slaves too? -- http://jetwick.com open twitter search

Re: Solr & JVM performance issue after 2 days

2010-12-07 Thread Peter Karich
tive should also be ok. Thanks in advance ____ From: Peter Karich To: solr-user@lucene.apache.org Sent: Tue, December 7, 2010 2:06:49 PM Subject: Re: Solr& JVM performance issue after 2 days Hi Hamid, try to avoid autowarming when indexing (see solrcon

Re: Solr & JVM performance issue after 2 days

2010-12-07 Thread Peter Karich
Hi Hamid, try to avoid autowarming when indexing (see solrconfig.xml: caches->autowarm + newSearcher + maxSearcher). If you need to query and index at the same time, then probably you'll need one read-only core and one for writing with no autowarming configured. See: http://wiki.apache.or

Re: Taxonomy and Faceting

2010-12-06 Thread Peter Karich
I'm unsure but maybe you mean something like clustering? Then carrot^2 can do this (at index time I think): http://search.carrot2.org/stable/search?query=jetwick&view=visu (There is a plugin for solr) Or do you already know the categories of your docs. E.g. you already have a category tree and

Re: How to get all the search results?

2010-12-06 Thread Peter Karich
for dismax just pass an empty query all q= or none at all Hello, shouldn't that query syntax be *:* ? Regards, -- Savvas. On 6 December 2010 16:10, Solr User wrote: Hi, First off thanks to the group for guiding me to move from default search handler to dismax. I have a question related

Re: Solr Got Exceptions When "schema.xml" is Changed

2010-12-04 Thread Peter Karich
QueryElevationComponent requires the schema to have a uniqueKeyField implemented using StrField you should use the type StrField ('string') for the field used in

Re: Restrict access to localhost

2010-12-02 Thread Peter Karich
for 1) use the tomcat configuration in conf/server.xml address="127.0.0.1" port="8080" ... for 2) if they have direct access to solr either insert a middleware layer or create a write lock ;-) Hello all, 1) I want to restrict access to Solr only in localhost. How to achieve that? 2) If i wan

Re: entire farm fails at the same time with OOM issues

2010-12-01 Thread Peter Karich
also try to minimize maxWarmingSearchers to 1(?) or 2. And decrease cache usage (especially autowarming) if possible at all. But again: only if it doesn't affect performance ... Regards, Peter. On Tue, Nov 30, 2010 at 6:04 PM, Robert Petersen wrote: My question is this. Why in the world

Re: distributed architecture

2010-12-01 Thread Peter Karich
Hi, also take a look at solandra: https://github.com/tjake/Lucandra/tree/solandra I don't have it in prod yet but regarding administration overhead it looks very promising. And you'll get some other neat features like (soft) real time, for free. So it's the same as A) + C) + X) - Y) ;-) Rega

Re: SOLR for Log analysis feasibility

2010-11-30 Thread Peter Karich
take a look into this: http://vimeo.com/16102543 for that amount of data it isn't that easy :-) We are looking into building a reporting feature and investigating solutions which will allow us to search through our logs for downloads, searches and view history. Each log item is relatively smal

Re: How to generate tag cloud in SOLR?

2010-11-23 Thread Peter Karich
Hi, another way is to use facets for the tagcloud as we did it in jetwick. Every document then needs a tag field (multivalued). See: https://github.com/karussell/Jetwick/blob/master/src/main/java/de/jetwick/ui/TagCloudPanel.java for an example with wicket and SolrJ. With that you could also

Jetwick Twitter Search now Open Source

2010-11-22 Thread Peter Karich
Jetwick is now available under the Apache 2 license: http://www.pannous.info/2010/11/jetwick-is-now-open-source/ Regards, Peter. PS: features http://www.pannous.info/products/jetwick-twitter-search/ installation https://github.com/karussell/Jetwick/wiki for devs http://karussell.wordpress.com/

Re: WordDelimiterFilterFactory + CamelCase query

2010-11-19 Thread Peter Karich
Hi, the final solution is explained here in context: http://mail-archives.apache.org/mod_mbox/lucene-dev/201011.mbox/%3caanlktimatgvplph_mgfbsughdoedc8tc2brrwxhid...@mail.gmail.com%3e " /If you are using Solr branch_3x or trunk, you can turn this off, by setting autoGeneratePhraseQueries to fa

Re: WordDelimiterFilterFactory + CamelCase query

2010-11-18 Thread Peter Karich
Peter, I recently had this issue, and I had to set splitOnCaseChange="0" to keep the word delimiter filter from doing what you describe. Can you try that and see if it helps? - Ken Hi Ken, yes this would solve my problem, but then I would lose a match for 'SuperMario' if I query 'mario', r
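
To make the trade-off concrete, here is a toy Java illustration of what "split on case change" means. It is deliberately NOT the WordDelimiterFilter implementation; the class `CaseChangeSplit` and the regular expression are assumptions that only mimic the lower-to-upper case rule.

```java
import java.util.Arrays;
import java.util.List;

// Toy model of a lower-to-upper case split; the real filter has many more rules and options.
final class CaseChangeSplit {

    /** Splits a token at every position where a lowercase letter is followed by an uppercase one. */
    static List<String> split(String token) {
        return Arrays.asList(token.split("(?<=[a-z])(?=[A-Z])"));
    }

    public static void main(String[] args) {
        System.out.println(split("SuperMario")); // the parts that make a query for 'mario' match
        System.out.println(split("aBc"));        // the same rule also breaks up 'aBc'
        System.out.println(split("abc"));        // no case change, token stays whole
    }
}
```

With splitting switched on, 'SuperMario' yields a part that can match 'mario' after lowercasing; with it switched off, the token stays whole and that match is lost, which is the concern raised in the reply above.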

Re: WordDelimiterFilterFactory + CamelCase query

2010-11-18 Thread Peter Karich
Hi, Please add preserveOriginal="1" to your WDF [1] definition and reindex (or just try with the analysis page). but it is already there!? Regards, Peter. Hi, Please add preserveOriginal="1" to your WDF [1] definition and reindex (or just try with the analysis page). This will make

Re: Possibilities of (near) real time search with solr

2010-11-18 Thread Peter Karich
y? We used our own round robin code because it was pre-Solr Cloud... I'm not too familiar with them, but I believe it's certainly worth having a look at Solr Cloud or Katta - could be useful here in dynamically allocating shards. Peter On Thu, Nov 18, 2010 at 5:41 PM, Peter Karich wrote:

WordDelimiterFilterFactory + CamelCase query

2010-11-18 Thread Peter Karich
Hi, I am going crazy but which config is necessary to include the missing doc 2? I have: doc1 tw:aBc doc2 tw:abc Now a query "aBc" returns only doc 1 although when I try doc2 from admin/analysis.jsp then the term text 'abc' of the index gets highlighted as intended. I even indexed a simple ex

Re: Possibilities of (near) real time search with solr

2010-11-18 Thread Peter Karich
Hi Peter! * I believe the NRT patches are included in the 4.x trunk. I don't think there's any support as yet in 3x (uses features in Lucene 3.0). I'll investigate how much effort it is to update to solr4 * For merging, I'm talking about commits/writes. If you merge while commits are going on

Re: Spell-Check Component Functionality

2010-11-18 Thread Peter Karich
Hi Rajani, some notes: * try spellcheck.q=curst or completely without spellcheck.q but with q * compared to the normal q parameter spellcheck.q can have a different analyzer/tokenizer and is used if present * do not do spellcheck.build=true for every request (creating the spellcheck index c

Re: Solr context search

2010-11-17 Thread Peter Karich
take a look if the 'more like this' handler can solve your problem. Hi. I wonder is it possible in built-in way to make context search in Solr? I have about 50k documents (mainly 'name' of char(150)), so i receive a content of a page and should show found documents. Of course i can

Re: sort desc and out of memory exception

2010-11-17 Thread Peter Karich
You are applying the sort against a (tokenized) text field? You should rather sort against a number or a string field, probably using the copyField directive. Regards, Peter. hi all: I configure a solr application and there is a field of type text,and some kind like this 123456, that is a string

Re: Possibilities of (near) real time search with solr

2010-11-16 Thread Peter Karich
'old' and then moving that data in an efficient manner to a read-only shard. Using SolrJ can help in this regard as it can offload some of the administration from the server(s). Thanks, Peter On Mon, Nov 15, 2010 at 8:06 PM, Peter Karich wrote: Hi, I wanted to provide my indexed do

Re: encoding messy code

2010-11-16 Thread Peter Karich
Am 16.11.2010 07:25, schrieb xu cheng: hi all: I configure an app with solr to index documents and there are some Chinese content in the documents and I've configure the apache tomcat URIEncoding to be utf-8 and I use the program curl to sent the documents in xml format however , when I query th

Re: Tuning Solr caches with high commit rates (NRT)

2010-11-15 Thread Peter Karich
a to learn from others’ mistakes, so you do not have to make them yourself. from 'http://blogs.techrepublic.com.com/security/?p=4501&tag=nl.e036' EARTH has a Right To Life, otherwise we all die. - Original Message From: Peter Karich To: solr-user@lucene.apache.org Sent:

Re: Tuning Solr caches with high commit rates (NRT)

2010-11-15 Thread Peter Karich
d significantly in 1.4, and generally no longer takes a lot of memory -- for facets with "many" unique values, method fc in fact should take less than enum, I think? Peter Karich wrote: Just in case someone is interested: I put the emails of Peter Sturge with some minor

Re: Tuning Solr caches with high commit rates (NRT)

2010-11-15 Thread Peter Karich
Just in case someone is interested: I put the emails of Peter Sturge with some minor edits in the wiki: http://wiki.apache.org/solr/NearRealtimeSearchTuning I found myself searching the thread again and again ;-) Feel free to add and edit content! Regards, Peter. Hi Erik, I thought this woul

Possibilities of (near) real time search with solr

2010-11-15 Thread Peter Karich
Hi, I wanted to provide my indexed docs (tweets) relatively fast: so 1 to 10 sec or even 30 sec would be ok. At the moment I am using the read only core scenario described here (point 5)* with a commit frequency of 180 seconds which was fine until a few days ago. (I am using solr1.4.1) Now the time

Re: How to Facet on a price range

2010-11-05 Thread Peter Karich
take a look here http://stackoverflow.com/questions/33956/how-to-get-facet-ranges-in-solr-results I am able to facet on a particular field because I have index on that field. But I am not sure how to facet on a price range when I have the exact price in the 'price' field. Can anyone help here.

Re: Testing/packaging question

2010-11-04 Thread Peter Karich
Hi, don't know if the python package provides one but solrj offers to start solr embedded (EmbeddedSolrServer) and setting up different schema + config is possible. for this see: https://karussell.wordpress.com/2010/06/10/how-to-test-apache-solrj/ if you need an 'external solr' (via jetty a

Re: Optimize Index

2010-11-04 Thread Peter Karich
what you can try maxSegments=2 or more as a 'partial' optimize: "If the index is so large that optimizes are taking longer than desired or using more disk space during optimization than you can spare, consider adding the maxSegments parameter to the optimize command. In the XML message, this

Re: Using setStart in solrj

2010-11-04 Thread Peter Karich
Hi Ron, how do I know what the starting row Always 0. especially if the original SolrQuery object has them all that's the point. solr will normally cache it for you. This is your friend: 40 just try it first with http to get an impression what start is good for: it just sets the starti
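
To illustrate what the start parameter does when paging, here is a small plain-Java sketch. The helper `Paging.startFor` is hypothetical; in SolrJ the resulting values would be handed to `SolrQuery.setStart(...)` and `SolrQuery.setRows(...)`. The page size of 40 is only an example value.

```java
// Hypothetical helper: maps a zero-based page number to the start offset of that page.
final class Paging {

    /** Returns the offset of the first document on the given zero-based page. */
    static int startFor(int page, int rows) {
        if (page < 0 || rows <= 0) {
            throw new IllegalArgumentException("page must be >= 0 and rows must be > 0");
        }
        // multiplyExact fails loudly instead of overflowing for absurd page numbers.
        return Math.multiplyExact(page, rows);
    }

    public static void main(String[] args) {
        int rows = 40; // example page size
        for (int page = 0; page < 3; page++) {
            System.out.println("start=" + startFor(page, rows) + "&rows=" + rows);
        }
    }
}
```

So the first page always starts at 0, and every following request reuses the same query with only the start value moved forward by the page size.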

Re: Which is faster -- delete or update?

2010-11-01 Thread Peter Karich
From the user perspective I wouldn't delete it, because it could be that down-voting by mistake or spam or something and up-voting can resurrect it. It could be also wise to keep the docs to see which content (from which users?) are down voted to get spam accounts? From the dev perspective yo

Re: problem of solr replcation's speed

2010-10-31 Thread Peter Karich
we have an identical-sized index and it takes ~5 minutes It takes about one hour to replicate 6G index for solr in my env. But my network can transfer file about 10-20M/s using scp. So solr's http replication is too slow, it's normal or I do something wrong?

Feeding Solr with its own Logs

2010-10-27 Thread Peter Karich
In case someone is interested: http://karussell.wordpress.com/2010/10/27/feeding-solr-with-its-own-logs/ a lot of TODOs but: it is working. I could also imagine that this kind of example would be suited for an intro-tutorial, because it covers dynamic fields, rapid solr prototyping, filter and

After java replication: field not found exception on slaves

2010-10-26 Thread Peter Karich
Hi, we had the following problem. We added a field to schema.xml and fed our master with the new data. After that querying on the master is fine. But when we replicated (solr1.4.0) to our slaves. All slaves said they cannot find the new field (standard exception for missing fields). And that a

Re: Does Solr reload schema.xml dynamically?

2010-10-26 Thread Peter Karich
Hi, See this: http://wiki.apache.org/solr/CoreAdmin#RELOAD Solr will also load the new configuration (without restarting the webapp) on the slaves when using replication: http://wiki.apache.org/solr/SolrReplication Regards, Peter. Hi Everybody, If I change my schema.xml to, do I have to rest

Re: command line to check if Solr is up running

2010-10-26 Thread Peter Karich
Hi Xin, from the wiki: http://wiki.apache.org/solr/SolrConfigXml The URL of the "ping" query is /admin/ping. You can also check (via wget) the number of documents. It might look like a rusty hack but it works for me: wget -T 1 -q "http://localhost:8080/solr/select?q=*:*" -O - | tr '/>'
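
If the check is done from Java instead of the shell, the document count can be pulled out of the XML response along these lines. This is a rough sketch, not the wget pipeline from the mail: `NumFound.parse` is a made-up helper, fetching the response is left out, and the regular expression assumes the default XML response format with a numFound attribute on the result element.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: extracts the hit count from a Solr XML response body.
final class NumFound {

    private static final Pattern NUM_FOUND = Pattern.compile("numFound=\"(\\d+)\"");

    /** Returns the numFound value, or -1 if the response does not contain one. */
    static long parse(String responseXml) {
        Matcher m = NUM_FOUND.matcher(responseXml);
        return m.find() ? Long.parseLong(m.group(1)) : -1L;
    }

    public static void main(String[] args) {
        // Shortened sample of what /solr/select?q=*:* returns in XML format.
        String sample = "<response><result name=\"response\" numFound=\"1234\" start=\"0\"/></response>";
        System.out.println("documents in index: " + parse(sample));
    }
}
```

A return value of -1 (or an exception while fetching) can then be treated as "Solr is not up", just like a failing wget.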

Re: API for using Multi cores with SolrJ

2010-10-18 Thread Peter Karich
I asked this myself ... here could be some pointers: http://lucene.472066.n3.nabble.com/SolrJ-and-Multi-Core-Set-up-td1411235.html http://lucene.472066.n3.nabble.com/EmbeddedSolrServer-in-Single-Core-td475238.html > Hi everyone, > > I'm trying to write some code for creating and using multi cores

Re: how can i use solrj binary format for indexing?

2010-10-18 Thread Peter Karich
Hi, you can try to parse the xml via Java yourself and then push the SolrInputDocuments via SolrJ to solr. setting format to binary + using the streaming update processor should improve performance, but I am not sure... and performant (+less mem!) reading xml in Java is another topic ... ;-)

Re: weighted facets

2010-10-15 Thread Peter Karich
Hi, answering my own question(s). Result grouping could be the solution as I explained here: https://issues.apache.org/jira/browse/SOLR-385 > http://www.cs.cmu.edu/~ddash/papers/facets-cikm.pdf (the file is dated to Aug > 2008) yonik implemented this here: https://issues.apache.org/jira/browse

Re: Upgrade to Solr 1.4, very slow at start up when loading all cores

2010-10-14 Thread Peter Karich
just a blind shot (didn't read the full thread): what is your maxWarmingSearchers settings? For large indices we set it to 2 (maximum) Regards, Peter. > just update on this issue... > > we turned off the new/first searchers (upgrade to Solr 1.4.1), and ran > benchmark tests, there is no noticeabl

Re: using score to find high confidence duplicates

2010-10-13 Thread Peter Karich
Hi, are you using moreLikeThis for that feature? I have no suggestion for a reliable threshold, I think this depends on the domain you are operating and is IMO only solvable with a heuristic. It also depends on fields, boosts, ... It could be that there is a 'score gap' between duplicates and none

Re: About setting solrconfig.xml

2010-10-13 Thread Peter Karich
Hi Jason, > Hi, all. > I got some question about solrconfig.xml. > I have 10 fields in a document for index. > (Suppose that field names are f1, f2, ... , f10.) > Some user will want to search in field f1 and f5. > Another user will want to search in field f2, f3 and f7. > > I am going to use dism

Re: NPE for a MLT query on a missing doc due to null facet_counts in solrj

2010-10-13 Thread Peter Karich
> Should I create a JIRA ticket? already there: https://issues.apache.org/jira/browse/SOLR-2005 we should provide a patch though ... Regards, Peter. > With solrj doing a more like this query for a missing document: > /mlt?q=docId:SomeMissingId > always throws a null pointer exception: > Ca

Re: Replication and CPU

2010-10-12 Thread Peter Karich
Hi Olivier, the index size is relatively big and you enabled replication after startup: startup This could explain why the slave is replicating from the very beginning. Are the index versions/generations the same? (via command or admin/replication) If not, the slave tries to replicate and if that

Re: Replication and CPU

2010-10-12 Thread Peter Karich
Hi Olivier, maybe the slave replicates after startup? check replication status here: http://localhost/solr/admin/replication/index.jsp what is your poll frequency (could you paste the replication part)? Regards, Peter. > Hello, > > I setup a server for the replication of Solr. I used 2 cores an

Re: StatsComponent and multi-valued fields

2010-10-12 Thread Peter Karich
I'm not sure ... just reading it yesterday night ... but isn't the unapplied patch from Harish https://issues.apache.org/jira/secure/attachment/12400054/SOLR-680.patch what you want? Regards, Peter. > Running 1.4.1. > > I'm able to execute stats queries against multi-valued fields, but when > giv

weighted facets

2010-10-11 Thread Peter Karich
Hi, I need a feature which is well explained from Mr Goll at this site ** So, it then would be nice to do sth. like: facet.stats=sum(fieldX)&facet.stats.sort=fieldX And the output (sorted against the sum-output) can look sth. like this: 767 892 Is there something similar or wa

Re: multi level faceting

2010-10-09 Thread Peter Karich
Hi, there are two relatively similar solutions for this problem. I will describe one of them: * create a multivalued string field called 'category' * you have a category tree. so make sure a document gets not only the leaf category, but all categories (name or id) until the root * now facet over
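
The second step above (give each document its leaf category plus all ancestors) can be sketched like this. The slash-separated path format and the names `CategoryPaths` and `withAncestors` are assumptions made for the example; the mail only says that names or ids of all categories up to the root are stored.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: expands a leaf category path into the values for the multivalued field.
final class CategoryPaths {

    /** "a/b/c" becomes [a, a/b, a/b/c]; every entry is one value of the 'category' field. */
    static List<String> withAncestors(String leafPath) {
        List<String> values = new ArrayList<>();
        StringBuilder prefix = new StringBuilder();
        for (String part : leafPath.split("/")) {
            if (part.isEmpty()) {
                continue; // tolerate leading or doubled slashes
            }
            if (prefix.length() > 0) {
                prefix.append('/');
            }
            prefix.append(part);
            values.add(prefix.toString());
        }
        return values;
    }

    public static void main(String[] args) {
        // Each printed value would be added to the multivalued 'category' field of the document.
        System.out.println(withAncestors("instruments/guitars/electric"));
    }
}
```

Storing full paths instead of bare names keeps categories with the same name in different branches apart, and it makes it possible to narrow the facet to one level of the tree with a prefix.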

Re: multi level faceting

2010-10-06 Thread Peter Karich
Hi, there is a solution without the patch. Here it should be explained: http://www.lucidimagination.com/blog/2010/08/11/stumped-with-solr-chris-hostetter-of-lucene-pmc-at-lucene-revolution/ If not, I will do on 9.10.2010 ;-) Regards, Peter. > I've a similar problem with a project I'm working on

Re: multi level faceting

2010-10-05 Thread Peter Karich
also take a look at: http://wiki.apache.org/solr/HierarchicalFaceting + SOLR-64, SOLR-792 + http://markmail.org/message/jxbw2m5a6zq5jhlp Regards, Peter. > Take a look at "Mastering the Power of Faceted Search with Chris > Hostetter" > (http://www.lucidimagination.com/solutions/webcasts/faceting).

Re: Best way to check Solr index for completeness

2010-09-29 Thread Peter Karich
How long does it take to get 1000 docs? Why not ensure this while indexing? I think besides your suggestion or the suggestion of Luke there is no other way... Regards, Peter. > Hello, > What would be the best way to check Solr index against original system > (Database) to make sure index is up t

Re: Autocomplete: match words anywhere in the token

2010-09-24 Thread Peter Karich
Jonathan, this field described here from Chantal: > 2.) create an additional field that stores uses the > String type with the same content (use copy field to fill either) can be multivalued. Or what did you mean? BTW: The nice thing about facet.prefix is that you can add an arbitrary (filter)

Re: Help: java.lang.OutOfMemoryError: PermGen space

2010-09-20 Thread Peter Karich
see http://stackoverflow.com/questions/88235/how-to-deal-with-java-lang-outofmemoryerror-permgen-space-error and the links there. There seems to be no good solution :-/ The only reliable solution is to restart before you run out of PermGen space (use jvisualvm to monitor) And try to increase -XX:

Re: Full text search in facet scope

2010-09-16 Thread Peter Karich
Hi, if you index your doc with text='operating system' with an additional keyword field='linux' (of type string, can be multivalued) then solr faceting should be what you want: solr/select?q=*:*&facet=true&facet.field=keyword&rows=10 or rows=0 depending on your needs Does this help? Regards, P

Re: Solr for statistical data

2010-09-16 Thread Peter Karich
Hi Kjetil, is this custom component (which performs group by + calcs stats) somewhere available? I would like to do something similar. Would you mind sharing it if it isn't already available? The grouping stuff sounds similar to https://issues.apache.org/jira/browse/SOLR-236 where you can have me

Re: Tuning Solr caches with high commit rates (NRT)

2010-09-14 Thread Peter Karich
Peter Sturge, this was a nice hint, thanks again! If you are here in Germany anytime I can invite you to a beer or an apfelschorle ! :-) I only needed to change the lockType to none in the solrconfig.xml, disable the replication and set the data dir to the master data dir! Regards, Peter Karich

Re: Tuning Solr caches with high commit rates (NRT)

2010-09-14 Thread Peter Karich
> > 2. We use sharding all the time, and it works just fine with this > scenario, as the RO instance is simply another shard in the pack. > > > On Sun, Sep 12, 2010 at 8:46 PM, Peter Karich wrote: > >> Peter, >> >> thanks a lot for your in-depth explanations

Re: Tuning Solr caches with high commit rates (NRT)

2010-09-12 Thread Peter Karich
Peter, thanks a lot for your in-depth explanations! Your findings will be definitely helpful for my next performance improvement tests :-) Two questions: 1. How would I do that: > or a local read-only instance that reads the same core as the indexing > instance (for the latter, you'll need som

Re: Autocomplete with Filter Query

2010-09-10 Thread Peter Karich
Hi there, I don't know if my idea is perfect but it seems to work ok in my twitter-search prototype: http://www.jetwick.com (keep in mind it is a vhost and only one fat index, no sharding, etc... so performance isn't perfect ;-)) That said, type in 'so' and you will get 'soldier', 'solar', ... bu

Re: How to enable Unicode Support in Solr

2010-09-06 Thread Peter Karich
Hi, Solr is only able to handle unicode (UTF-8). Make really sure that you push it into the index in the correct encoding. See my (accepted ;-)) answer: http://stackoverflow.com/questions/3086367/how-to-view-the-xml-documents-sent-to-solr/3088515#3088515 Regards, Peter. > I have an index that

Re: Purpose of SolrDocument.java

2010-09-03 Thread Peter Karich
> aaah okay. > > so its SolrDocument in "normal" search never been used ? its only for other > solr-plugins ? > SolrDocument is under org.apache.solr.common which is for the solr-solrj.jar and not available for the solr-core.jar see e.g.: http://lucene.apache.org/solr/api/org/apache/solr/commo

Re: Purpose of SolrDocument.java

2010-09-03 Thread Peter Karich
Hi, you can use it via SolrJ: QueryResponse rsp = solrServer.query(query); SolrDocumentList docs = rsp.getResults(); for (SolrDocument doc : docs) { long id = (Long) doc.getFieldValue("id"); // create your higher level object here ... } SolrJ get the docs either from xml or binary stre

Re: java.lang.OutOfMemoryError: PermGen space when reopening solr server

2010-09-02 Thread Peter Karich
Hi, that issue is not really related to solr. See this: http://stackoverflow.com/questions/88235/how-to-deal-with-java-lang-outofmemoryerror-permgen-space-error Increasing maxpermsize -XX:MaxPermSize=128m does not really solve this issue but you will see fewer errors :-) I have written a mini mon

Re: solr working...

2010-08-26 Thread Peter Karich
Hi! What do you mean? You want a quickstart? Then see http://lucene.apache.org/solr/tutorial.html (But I thought you already got solr working (from previous threads)!?) Or do you want to know if solr is running? Then try the admin view: http://localhost:8080/solr/admin/ Regards, Peter. > Hi al

Re: solr

2010-08-21 Thread Peter Karich
Hi Ankita, first: thanks for trying apache solr. > does all the data to be indexed has to be in exampledocs folder? No. And there are several ways to push data into solr: via indexing, dataimporthandler, solrj, ... I know that getting comfortable with a new project is a bit complicated at first

Re: queryResultCache has no hits for date boost function

2010-08-18 Thread Peter Karich
forgot to say: thanks again! Now the cache gets hits! Regards, Peter. > On Wed, Aug 18, 2010 at 4:34 PM, Peter Karich wrote: > >> Thanks a lot Yonik! Rounding makes sense. >> Is there a date math for the 'LAST_COMMIT'? >> > No - but it

Re: queryResultCache has no hits for date boost function

2010-08-18 Thread Peter Karich
Hi Yonik, would you point me to the Java classes where solr handles a commit or an optimize and then the date math definitions? Regards, Peter. > On Wed, Aug 18, 2010 at 4:34 PM, Peter Karich wrote: > >> Thanks a lot Yonik! Rounding makes sense. >> Is there a date math fo

Re: queryResultCache has no hits for date boost function

2010-08-18 Thread Peter Karich
Thanks a lot Yonik! Rounding makes sense. Is there a date math for the 'LAST_COMMIT'? Peter. > On Tue, Aug 17, 2010 at 6:29 PM, Peter Karich wrote: > >> my queryResultCache has no hits. But if I am removing one line from the >> bf section in my dismax handler a

queryResultCache has no hits for date boost function

2010-08-17 Thread Peter Karich
Hi all, my queryResultCache has no hits. But if I am removing one line from the bf section in my dismax handler all is fine. Here is the line: recip(ms(NOW,date),3.16e-11,1,1) According to http://wiki.apache.org/solr/SolrRelevancyFAQ#How_can_I_boost_the_score_of_newer_documents this should be fi
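
To see why this line can defeat the cache, it helps to compute the function by hand. Solr's recip(x,m,a,b) evaluates to a / (m*x + b), and ms(NOW,date) is the document age in milliseconds. The Java below is only a sketch with invented names (`RecencyBoost`, `roundDownToHour`). The idea that an unrounded NOW differs on every request, and that rounding it (for example to the hour) makes repeated queries identical again, is an assumption based on the "rounding makes sense" remark in the replies.

```java
// Hypothetical sketch of the boost arithmetic; not Solr code.
final class RecencyBoost {

    /** Same formula as Solr's recip(x, m, a, b): a / (m * x + b). */
    static double recip(double x, double m, double a, double b) {
        return a / (m * x + b);
    }

    /** Rounds an epoch-millisecond timestamp down to the start of its hour. */
    static long roundDownToHour(long epochMs) {
        return epochMs - Math.floorMod(epochMs, 3_600_000L);
    }

    public static void main(String[] args) {
        double oneYearMs = 365.25 * 24 * 3600 * 1000; // roughly 3.16e10 ms
        System.out.println("boost for a new doc:        " + recip(0, 3.16e-11, 1, 1));
        System.out.println("boost for a 1 year old doc: " + recip(oneYearMs, 3.16e-11, 1, 1));

        long now = System.currentTimeMillis();
        // Two requests within the same hour see the same rounded value.
        System.out.println(roundDownToHour(now) == roundDownToHour(now + 1));
    }
}
```

With m = 3.16e-11 a document that is one year old gets about half the boost of a brand-new one, so rounding NOW to a coarser unit changes the score only marginally.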

Re: Search document design problem

2010-08-17 Thread Peter Karich
> multiple SQL queries joining multiple tables each time a user changes > the search criteria, so the load of DB servers is quite high. > > I would like to use Solr for the search form because after some test > the Lucene seems to be really very fast and I believe it would improve

Re: OutOfMemoryErrors

2010-08-17 Thread Peter Karich
R : JAVA > HEAP SPACE , OUT OF MEMORY ERROR ) > I could see one lock file generated in the data/index path just after this > error. > > > > On Tue, Aug 17, 2010 at 4:49 PM, Peter Karich wrote: > > >> >>> Is there a way to verify that I have added

Re: Search document design problem

2010-08-17 Thread Peter Karich
Hi Wenca, I am not sure whether my information here is really helpful for you, sorry if not ;-) > I want only hotels that have room with 2 beds and the room has a package with all inclusive boarding and price lower than 400. you should tell us what you want to search and filter? Do you want only

Re: OutOfMemoryErrors

2010-08-17 Thread Peter Karich
> Is there a way to verify that I have added correctly? > on linux you can do ps -elf | grep Boot and see if the java command has the parameters added. @all: why and when do you get those OOMs? while querying? which queries in detail? Regards, Peter.

Re: Analysing SOLR logfiles

2010-08-12 Thread Peter Karich
I wonder too, that there shouldn't be a special tool which analyzes solr logfiles (e.g. parses qtime, the parameters q, fq, ...) Because there are some other open source log analyzers out there: http://yaala.org/ http://www.mrunix.net/webalizer/ Another free tool is newrelic.com (you will submit

Re: Improve Query Time For Large Index

2010-08-12 Thread Peter Karich
re optimizing and refreshing >> every 10-15 minutes, that will invalidate all the caches, since an optimized >> index is essentially a set of new files. >> >> Can you give us some examples of the slow queries? Are you using stop >> words? >> >> If

Re: Improve Query Time For Large Index

2010-08-12 Thread Peter Karich
ted and is in Solr 1.4. If you decide to use > CommonGrams you definitely need to re-index and you also need to use both the > index time filter and the query time filter. Your index will be larger. > > > > > > > > > > > > > > Tom >

Re: Improve Query Time For Large Index

2010-08-12 Thread Peter Karich
Hi Robert! > Since the example given was "http" being slow, it's worth mentioning that if > queries are "one word" urls [for example http://lucene.apache.org] these > will actually form slow phrase queries by default. > Do you mean that http://lucene.apache.org will be split up into "http luce

Re: Improve Query Time For Large Index

2010-08-10 Thread Peter Karich
nd-common-words-part-2) > > Tom Burton-West > > -Original Message- > From: Peter Karich [mailto:peat...@yahoo.de] > Sent: Tuesday, August 10, 2010 9:54 AM > To: solr-user@lucene.apache.org > Subject: Improve Query Time For Large Index > > Hi, > > I have 5 Mil

Improve Query Time For Large Index

2010-08-10 Thread Peter Karich
Hi, I have 5 million small documents/tweets (=> ~3GB) and the slave index replicates itself from the master every 10-15 minutes, so the index is optimized before querying. We are using Solr 1.4.1 (patched with SOLR-1624) via SolrJ. Now the search is slow (>2s) for common terms which hit more tha

Re: Is there a better for solor server side loadbalance?

2010-08-04 Thread Peter Karich
>> The default solr solution is client side loadbalance. >> Is there a solution provide the server side loadbalance? >> >> >> > No. Most of us stick a HTTP load balancer in front of multiple Solr servers. > E.g. mod_jk is a very easy solution (maybe too simple/stupid?) for a load balancer

Re: Migrating from Lucene 2.9.1 to Solr 1.4.0 - Performance issues under heavy load

2010-08-04 Thread Peter Karich
Ophir, this sounds a bit strange: > CommonsHttpSolrServer.java, line 416 takes about 95% of the application's > total search time Is this only for heavy load? Some other things: * with lucene you accessed the indices with MultiSearcher in a LAN, right? * did you look into the logs of the se

Re: Solr Indexing slows down

2010-08-02 Thread Peter Karich
er > needs > to be reopened, and this happens on commit. > Otis > > Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch > Lucene ecosystem search :: http://search-lucene.com/ > > > > - Original Message > >> From: Peter Karich >> To: s

Re: Solr Indexing slows down

2010-07-30 Thread Peter Karich
e. > > 30-60 seconds is pretty frequent for Solr. > > Otis > > Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch > Lucene ecosystem search :: http://search-lucene.com/ > > > > ----- Original Message > >> From: Peter Karich >> To: solr-u

Re: Solr searching performance issues, using large documents

2010-07-30 Thread Peter Karich
Hi Peter :-), did you already try other values for hl.maxAnalyzedChars=2147483647? Also, regular expression highlighting is more expensive, I think. What does the 'fuzzy' variable mean? If you use this to query via "~someTerm" instead of "someTerm" then you should try the trunk of Solr which is a l

Re: Programmatically retrieving numDocs (or any other statistic)

2010-07-30 Thread Peter Karich
Both approaches are ok, I think. (although I don't know the python API) BTW: If you query q=*:* then add rows=0 to avoid some traffic. Regards, Peter. > I want to programmatically retrieve the number of indexed documents. I.e., > get the value of numDocs. > > The only two ways I've come up with
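The q=*:* with rows=0 suggestion can be sketched in Python using only the standard library. Assumptions are labeled in the code: the base URL is a hypothetical local installation, the JSON response writer (wt=json) is requested, and count_url and num_found are made-up helper names. The numFound value counts the documents matching the query, which for *:* is every searchable document.

```python
import json
from urllib.parse import urlencode


def count_url(base_url):
    """Build a match-all query that returns no documents (rows=0), only the count."""
    query = urlencode({"q": "*:*", "rows": 0, "wt": "json"})
    return base_url.rstrip("/") + "/select?" + query


def num_found(response_text):
    """Read the hit count out of a JSON search response."""
    return json.loads(response_text)["response"]["numFound"]


# Hypothetical base URL; adjust to the actual installation.
url = count_url("http://localhost:8983/solr")

# Shape of a JSON response with rows=0 (values made up for illustration).
sample_response = ('{"responseHeader": {"status": 0, "QTime": 1}, '
                   '"response": {"numFound": 1234, "start": 0, "docs": []}}')

if __name__ == "__main__":
    print(url)
    print(num_found(sample_response))
```

Against a running server, the text passed to num_found would be the body returned by fetching the URL, for example with urllib.request.urlopen(url).read().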

Re: Solr Indexing slows down

2010-07-30 Thread Peter Karich
often you're committing. If you're committing before the warmup queries > from the previous commit have done their magic, you might be getting > into a death spiral. > > HTH > Erick > > On Thu, Jul 29, 2010 at 7:02 AM, Peter Karich wrote: > > >> Hi, >> &

Re: slave index is bigger than master index

2010-07-29 Thread Peter Karich
Hi Muneeb, I fear you'll have no chance: replicating an index will use more disc space on the slave nodes. Of course, you could minimize disc usage AFTER the replication via the 'optimize-hack'. But are you sure the reason the slave node dies is disc limitations? Try to observe the sla
