Re: WordDelimiterFilter combined with PositionFilter

2010-09-29 Thread Mathias Walter
Hi Robert, > On Fri, Sep 24, 2010 at 3:54 AM, Mathias Walter wrote: > > > Hi, > > > > I've combined the WordDelimiterFilter with the PositionFilter to prevent the > > creation of expensive Phrase and MultiPhraseQueries. But > > if I now parse an escaped string consisting of two terms, the analyser

Shard Query in Solrsharp

2010-09-29 Thread Maddy.Jsh
Hi, I have been using solrsharp to integrate solr in my project. Everything was going fine until I tried to incorporate shard query. I tested the shard query using the browser and everything went fine. I tried to do the same in solrsharp by adding the following line queryBuilder.AddSearchParame

Where is the lock file?

2010-09-29 Thread Steve Cohen
Hello, We were testing nutch configurations and apparently we got heavy handed with our approach to stopping things. Now when nutch starts indexing solr, we are seeing these messages: org.apache.solr.common.SolrException: Lock obtain timed out: SingleInstanceLock: write.lock org.apache.lucene.st

How to get line numbers from Solr plugin to show up in stack trace

2010-09-29 Thread Gregory Solovyev
Hello, I am writing a clustering component for Solr. It registers, loads and works properly. However, whenever there is an exception inside my plugin, I cannot get tomcat to show me the line numbers. It always says "Unknown source" for my classes. The stack trace in tomcat shows line numbers for

Re: Why the query performance is so different for queries?

2010-09-29 Thread Walter Underwood
Stop running 32-bit operating systems. You'll never get good performance with a toy like that. --wunder On Sep 29, 2010, at 8:18 PM, newsam wrote: > Thanks for your reply. > > Our box is win server 2003 (32bits) and 6G RAM totally. Large heap (>2G) may > not be helpful for JVM in 32bits box. T

Re: Why the query performance is so different for queries?

2010-09-29 Thread newsam
Thanks for your reply. Our box is win server 2003 (32bits) and 6G RAM totally. Large heap (>2G) may not be helpful for JVM in 32bits box. Therefore we set JAVA_OPTIONS to "-Xms521m -Xmx1400m". Is my understanding right? Thanks. >From: Lance Norskog >Reply-To: solr-user@lucene.apache.org >To:

Re: Why the query performance is so different for queries?

2010-09-29 Thread Lance Norskog
How much RAM does the JVM have? Wildcard queries are slow. Queries starting with '*' are even slower. If you want all values, try "field:[* TO *]". This is a range query and lets you pick a range of values; this one picks everything. The "*:*" is not a wildcard. It is a magic syntax for "all documents" and do
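For illustration, the two "match everything" forms mentioned in this message can be turned into request URLs roughly as follows. This is only a sketch: the host, port, and `/select` path are assumed from a stock example install and are not taken from the thread, and `title` is a placeholder field name.

```python
from urllib.parse import urlencode

# Assumed base URL of a local example Solr instance (not from the thread).
SOLR_SELECT = "http://localhost:8983/solr/select"


def all_values_query(field):
    """Open-ended range query: matches every document with a value in `field`."""
    return f"{field}:[* TO *]"


def select_url(query, rows=10):
    """Build a select URL for the given query string, URL-encoding it."""
    return SOLR_SELECT + "?" + urlencode({"q": query, "rows": rows})


if __name__ == "__main__":
    print(select_url(all_values_query("title")))  # range query over one field
    print(select_url("*:*"))                      # the "all documents" syntax
```

Note that the range form only matches documents that actually have a value in the field, whereas "*:*" matches every document.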

Re: Is Solr right for my business situation ?

2010-09-29 Thread Lance Norskog
Some of these are big questions; try them in different emails. On Wed, Sep 29, 2010 at 9:40 AM, Sharma, Raghvendra wrote: > Some questions. > > 1. I have about 3-5 tables. Now designing schema.xml for a single table looks > ok, but what's the direction for handling multiple table structures is >

Re: Memory usage

2010-09-29 Thread Lance Norskog
How many documents are there? How many unique words are in a text field? Both of these numbers can have a non-linear effect on the amount of space used. But usually a 22GB index (on disk) might need 6-12GB of RAM total. There is something odd going on here. Lance On Wed, Sep 29, 2010 at 4:34 PM,

Re: Swap on large memory multi-core multi-cpu NUMA

2010-09-29 Thread Lance Norskog
This would be a Java VM option, not something Solr or other apps can know about. Using this or procset seems like a great way to handle it. On Wed, Sep 29, 2010 at 8:46 AM, Glen Newton wrote: > In a recent blog entry ("The MySQL “swap insanity” problem and the > effects of the NUMA architecture"

Re: How to Index Pure Text into Seperate Fields?

2010-09-29 Thread Lance Norskog
Simple text .txt files and MS Office .doc files are very, very different beasts. You can do simple .txt files with some more lines in your DataImportHandler script. With DOC files it is easiest to use the extracting request handler "/extract". This is on the wiki. If you want to do this inside the D

Re: Solr with example Jetty and score problem

2010-09-29 Thread Floyd Wu
Can anybody help with this? Many thanks 2010/9/29 Floyd Wu > Hi there > > I have a problem, the situation is when I issue a query to single instance, > Solr response XML like following > as you can see, the score is normal() > === > > > 0 > 23 > > _l_title,score > 0 >

DataImportHandler dynamic fields clarification

2010-09-29 Thread harrysmith
Looking for some clarification on DIH to make sure I am interpreting this correctly. I have a wide DB table, 100 columns. I'd rather not have to add 100 values in schema.xml and data-config.xml. I was under the impression that if the column name matched a dynamic Field name, it would be added. I

Memory usage

2010-09-29 Thread Jeff Moss
My server has 128GB of ram, the index is 22GB large. It seems the memory consumption goes up on every query and the garbage collector will never free up as much memory as I expect it to. The memory consumption looks like a curve, it eventually levels off but the old gen is always 60 or 70GB. I have

Re: How to Index Pure Text into Seperate Fields?

2010-09-29 Thread Savannah Beckett
No, I am using xpath for html, this is not the question.  I am indexing pure text in addition to html that I was indexing.  Pure text like TXT file or Microsoft Word doc.  So, no xpath for TXT, how do I index TXT file into different fields in my index like the way I use xpath to index html into

Re: terms / stemming?

2010-09-29 Thread Erick Erickson
Yes, this is almost certainly stemming. Take a look at solr/admin, [schema browser], then click on Home>fields>>. Then the index and query "details" link shows you exactly what's happening. You can also get some joy from the admin [analysis] page. That takes input and shows you exactly what transf

Re: How to Index Pure Text into Seperate Fields?

2010-09-29 Thread Erick Erickson
Can you provide a few more details? You mention xpath, which leads me to believe that you are using DIH, is that true? How are you getting your documents to index? Parts of a filesystem? Because it's possible to do many things. If you're using DIH against a filesystem, you could use two fileDataSo

Re: terms / stemming?

2010-09-29 Thread Luke Crouch
Make sure your index and query analyzers are identical, and pay special attention if you're using any of the http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#Stemming analyzers - many of them have a number of configurable attributes that could cause differences. -L On Wed, Sep 29, 2010

terms / stemming?

2010-09-29 Thread Peter A. Kirk
Hi I issue a request like the following, in order to get a list of search-terms in a particular field: http://localhost:8983/solr/terms?terms.limit=-1&terms.fl=bodytext But some of the terms which are returned are not quite the same as those which were indexed (or which are returned in a searc

Re: Dismax Request handler and Solrconfig.xml

2010-09-29 Thread Chris Hostetter
: In Solrconfig.xml, default request handler is set to "standard". I am : planning to change that to use dismax as the request handler but when I : set "default=true" for dismax - Solr does not return any results - I get : results only when I comment out "dismax". you need to elaborate on what yo

Re: Data Import Handler Rich Format Documents

2010-09-29 Thread Chris Hostetter
: What's a GA release? http://en.wikipedia.org/wiki/Software_release_life_cycle#General_availability -Hoss -- http://lucenerevolution.org/ ... October 7-8, Boston http://bit.ly/stump-hoss ... Stump The Chump!

Re: Solr rate limiting / DoS attacks

2010-09-29 Thread Shawn Heisey
I am using HAProxy for load balancing on my Solr installation, for redundancy. Very recently, request throttling (and by extension, DoS mitigation) was added to the development branch (1.5) of HAProxy. You could probably use that, even if you don't need actual load balancing. http://haproxy.

Re: How to Index Pure Text into Seperate Fields?

2010-09-29 Thread Savannah Beckett
No, these new documents are not html, these are pure text, like the ones you see in notepad or Microsoft Word.  I have no problem indexing Html, but I got stuck with these pure text. From: Scott Gonyea To: solr-user@lucene.apache.org Sent: Wed, September 29,

Re: How to Index Pure Text into Seperate Fields?

2010-09-29 Thread Scott Gonyea
Break your HTML pages into the desired fields, format it as follows: http://wiki.apache.org/solr/UpdateXmlMessages And away you go. You may want to search / review the Wiki. Also, if you're indexing websites and want to place it in Solr, you should look at Nutch. It can do all that work for yo

How to Index Pure Text into Seperate Fields?

2010-09-29 Thread Savannah Beckett
Hi, I am using xpath to index different parts of the html pages into different fields. Now, I have some pure text documents that have no html. So I can't use xpath. How do I index this pure text into different fields of the index? How do I make nutch/solr understand these different parts b

Re: Solr rate limiting / DoS attacks

2010-09-29 Thread Allistair Crossley
This kind of thing is not limited to Solr and you normally wouldn't solve it in software - it's more a network concern. I'd be looking at a web server solution such as Apache mod_evasive combined with a good firewall for more conventional DOS attacks. Just hide your Solr install behind the firew

Solr rate limiting / DoS attacks

2010-09-29 Thread Ian Upright
Hi, I'm curious as to what approaches one would take to defend against users attacking a Solr service, especially if exposed to the internet as opposed to an intranet. I'm fairly new to Solr, is there anything built in? Is there anything in place to prevent the search engine from getting overwhel

Issues with SolrJ and IndexReader reopening (again)

2010-09-29 Thread Antoniya Statelova
I saw there had been a previous discussion on commit failing for EmbeddedSolrServer here: http://www.mail-archive.com/solr-user@lucene.apache.org/msg28236.html But it was never resolved. I have an embedded solr server and it does not seem to pick up changes in the index after a commit through Solr

Re: Missing facet values for zero counts

2010-09-29 Thread Allistair Crossley
Hi, For us this is a usability concern. You either don't show Sweden in a pick-list called Country and some users go away thinking you don't *ever* support Sweden (not true). OR you allow a user to execute an empty result search - but at least they know you do support Sweden. It is we believe

RE: Queries, Functions, and Params

2010-09-29 Thread Robert Thayer
Yes, just after sending the email I reread the wiki and noticed the 4.0 requirement. I will try that, thanks. From: ysee...@gmail.com on behalf of Yonik Seeley Sent: Wed 9/29/2010 8:12 AM To: solr-user@lucene.apache.org Subject: Re: Queries, Functions, and Param

Re: Missing facet values for zero counts

2010-09-29 Thread kenf_nc
I don't understand why you would want to show Sweden if it isn't in the index, what will your UI do if the user selects Sweden? However, one way to handle this would be to make a second document type. Have a field called type or some such, and make the new document type be 'dummy' or 'system' or

Re: Is Solr right for my business situation ?

2010-09-29 Thread Erick Erickson
If at all possible, denormalize the data. Anytime you find yourself trying to make Solr behave like a database, the probability is high that you're mis-using Solr or the DB. Best Erick On Wed, Sep 29, 2010 at 12:40 PM, Sharma, Raghvendra < sraghven...@corelogic.com> wrote: > Some questions. > >

Re: Best way to check Solr index for completeness

2010-09-29 Thread Erick Erickson
Yep, I was thinking of this on a field. I was assuming that there was a PK in the database that you were mapping to the uniqueKey field, but if that's not so then it's more of a problem. But you'd have problems anyway if you *don't* have a uniqueKey when it comes time to update any records, so it

RE: Is Solr right for my business situation ?

2010-09-29 Thread Sharma, Raghvendra
Some questions. 1. I have about 3-5 tables. Now designing schema.xml for a single table looks ok, but what's the direction for handling multiple table structures is something I am not sure about. Would it be like a big huge xml, wherein those three tables (assuming it's three) would show up as

Re: Best way to check Solr index for completeness

2010-09-29 Thread Walter Underwood
Think about what fields you need to return. For this, you probably only need the id. That could be a lot faster than the default set of fields. wunder On Sep 29, 2010, at 9:04 AM, dshvadskiy wrote: > > Actually retrieving 1000 docs via search isn't that bad. Turned out it takes > under 1 sec.
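A rough sketch of what "only return the id" could look like from a client. It assumes the uniqueKey field is named `id` and that the response is requested as JSON (`wt=json`); `fl`, `rows`, and `wt` are standard Solr request parameters, while the function names here are made up for the example.

```python
import json


def id_only_params(query="*:*", rows=1000):
    """Request parameters that restrict the returned fields to the id."""
    return {"q": query, "fl": "id", "rows": rows, "wt": "json"}


def extract_ids(response_text):
    """Pull the id values out of a JSON select response body."""
    body = json.loads(response_text)
    return [doc["id"] for doc in body["response"]["docs"]]


if __name__ == "__main__":
    # Hand-written sample response, shaped like Solr's JSON output.
    sample = '{"response": {"numFound": 2, "start": 0, "docs": [{"id": "a1"}, {"id": "b2"}]}}'
    print(id_only_params())
    print(extract_ids(sample))
```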

Re: Best way to check Solr index for completeness

2010-09-29 Thread dshvadskiy
Actually retrieving 1000 docs via search isn't that bad. Turned out it takes under 1 sec. I still like the idea of using TermsComponent and will use it in the future if the number of docs in the index grows. Thanks for all suggestions. Dmitriy -- View this message in context: http://lucene.47206

Swap on large memory multi-core multi-cpu NUMA

2010-09-29 Thread Glen Newton
In a recent blog entry ("The MySQL “swap insanity” problem and the effects of the NUMA architecture" http://jcole.us/blog/archives/2010/09/28/mysql-swap-insanity-and-the-numa-architecture/), Jeremy Cole describes a particular but common problem with large memory installations of MySQL on multi-core

Re: Queries, Functions, and Params

2010-09-29 Thread Yonik Seeley
On Tue, Sep 28, 2010 at 6:08 PM, Robert Thayer wrote: > On the http://wiki.apache.org/solr/FunctionQuery page, the following query > function is listed: > > q={!func}add($v1,$v2)&v1=sqrt(popularity)&v2=100.0 > > When run against the default solr instance, server returns the error(400): > "undefi
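To make the intent of the quoted query concrete, here is a small sketch of what the `$v1` / `$v2` references are meant to stand for. The `expand_refs` helper is purely illustrative and written for this example; it is not how Solr parses the request, it just substitutes each `$name` with the request parameter of that name.

```python
import re

# The request parameters from the quoted wiki example.
params = {
    "q": "{!func}add($v1,$v2)",
    "v1": "sqrt(popularity)",
    "v2": "100.0",
}


def expand_refs(expr, request_params):
    """Replace each $name in expr with the value of that request parameter."""
    return re.sub(r"\$(\w+)", lambda m: request_params[m.group(1)], expr)


if __name__ == "__main__":
    # Shows the function query the references are intended to resolve to.
    print(expand_refs(params["q"], params))
```

On a server without support for parameter references, writing the expanded form directly in `q` would be the obvious thing to try.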

Re: How to set up multiple indexes?

2010-09-29 Thread Luke Crouch
Check http://doc.ez.no/Extensions/eZ-Find/2.2/Advanced-Configuration/Using-multi-core-features It's for eZ-Find, but it's the basic setup for multiple cores in any environment. We have cores designed like so: solr/sfx/ solr/forum/ solr/mail/ solr/news/ solr/tracker/ each of those core directori

Re: How to set up multiple indexes?

2010-09-29 Thread Christopher Gross
Hi Andy! I configured this a few days ago, and found a good resource -- http://wiki.apache.org/solr/MultipleIndexes That page has links that will give you the instructions for setting up Tomcat, Jetty and Resin. I used the Tomcat ones the other day, and it gave me everything that I needed to get

Re: Best way to check Solr index for completeness

2010-09-29 Thread dshvadskiy
Regenerating the index is a slow operation due to limitations of the source systems. We run several complex SQL statements to generate 1 Solr document. A full reindex takes about 24 hours. -- View this message in context: http://lucene.472066.n3.nabble.com/Best-way-to-check-Solr-index-for-completenes

How to set up multiple indexes?

2010-09-29 Thread Andy
I installed Solr according to the tutorial. My schema.xml & solrconfig.xml are in ~/apache-solr-1.4.1/example/solr/conf Everything so far is just like that in the tutorial. But I want to set up a 2nd index (separate from the "main" index) just for the purpose of auto-complete. I understand that

Re: Best way to check Solr index for completeness

2010-09-29 Thread dshvadskiy
Using TermsComponent is an interesting suggestion. However, my understanding is that it will work only for unique terms. For example, compare the database primary key with the Solr id field. A variation of that is to calculate some kind of unique record hash and store it in the index. Then retrieve id and hash via T
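The record-hash variation could be sketched along these lines. Everything here is an assumption made for illustration: the choice of SHA-1, the field-joining scheme, and the shape of the inputs (a dict of database records keyed by id, and a dict of id to hash as read back from the index).

```python
import hashlib


def record_hash(fields):
    """Stable hash over a record's field values, independent of dict order."""
    h = hashlib.sha1()
    for name in sorted(fields):
        # The unit separator keeps adjacent fields from running together.
        h.update(f"{name}={fields[name]}\x1f".encode("utf-8"))
    return h.hexdigest()


def find_stale(db_records, index_hashes):
    """Return ids that are missing from the index or whose hash differs."""
    stale = []
    for rec_id, fields in db_records.items():
        if index_hashes.get(rec_id) != record_hash(fields):
            stale.append(rec_id)
    return sorted(stale)


if __name__ == "__main__":
    db = {"1": {"title": "a"}, "2": {"title": "b"}}
    idx = {"1": record_hash({"title": "a"}), "2": "outdated"}
    print(find_stale(db, idx))
```

Only the ids reported as stale would then need to be regenerated, which matters when building a single document is expensive.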

Re: Missing facet values for zero counts

2010-09-29 Thread Allistair Crossley
OK good to know I'm not going bonkers :) Cheers On Sep 29, 2010, at 9:45 AM, Luke Crouch wrote: > We had to do the same thing - we draw our facet navigation links by looping > over the full result set from our database, and then we add the facet counts > and draw the link url's using the solr da

Re: Missing facet values for zero counts

2010-09-29 Thread Chantal Ackermann
Hi Allistair, On Wed, 2010-09-29 at 15:37 +0200, Allistair Crossley wrote: > Hello list, > > I am implementing a directory using Solr. The user is able to search with a > free-text query or 2 filters (provided as pick-lists) for country. A > directory entry only has one country. > > I am usin

Re: Missing facet values for zero counts

2010-09-29 Thread Luke Crouch
We had to do the same thing - we draw our facet navigation links by looping over the full result set from our database, and then we add the facet counts and draw the link url's using the solr data. -L On Wed, Sep 29, 2010 at 8:42 AM, Markus Jelsma wrote: > I'm afraid you'd have to add the missin

RE: Missing facet values for zero counts

2010-09-29 Thread Markus Jelsma
I'm afraid you'd have to add the missing countries in your application. If it's not in the index, it will not be returned. Your last question is possible, the facet.query parameter allows you to rely on other conditions to generate a facet count. But if the missing countries are not in the index,
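Adding the missing countries on the application side can be as simple as the following sketch. It assumes the application already has the full country list (for example from its database) and has turned the facet counts from Solr into a plain dict; both input shapes and the function name are assumptions for the example.

```python
def with_zero_counts(all_values, facet_counts):
    """Pair every known value with its facet count, defaulting to 0."""
    return [(value, facet_counts.get(value, 0)) for value in all_values]


if __name__ == "__main__":
    countries = ["Denmark", "Norway", "Sweden"]  # full list kept by the application
    counts = {"Denmark": 3, "Norway": 1}         # facet counts as returned for the index
    for country, count in with_zero_counts(countries, counts):
        print(f"{country} ({count})")
```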

Missing facet values for zero counts

2010-09-29 Thread Allistair Crossley
Hello list, I am implementing a directory using Solr. The user is able to search with a free-text query or 2 filters (provided as pick-lists) for country. A directory entry only has one country. I am using Solr facets for country and I use the facet counts generated initially by a *:* search t

Re: deadlock in solrj?

2010-09-29 Thread Avi Rosenschein
This sounds like https://issues.apache.org/jira/browse/SOLR-1711. It is a known issue in Solr 1.4.0, which is apparently fixed in Solr 1.4.1. We also encountered it when indexing large numbers of documents with SolrJ, and are therefore in the process of upgrading to 1.4.1. -- Avi On Wed, Sep 29,

Re: Best way to check Solr index for completeness

2010-09-29 Thread Peter Karich
How long does it take to get 1000 docs? Why not ensure this while indexing? I think besides your suggestion or the suggestion of Luke there is no other way... Regards, Peter. > Hello, > What would be the best way to check Solr index against original system > (Database) to make sure index is up t