Re: Question on Solr Scalability

2010-02-10 Thread David Stuart
Hi, I think your needs would be better met by Distributed Search http://wiki.apache.org/solr/DistributedSearch which allows shards to live on different servers and will search across all of those shards when a query comes in. There are a few patches which will hopefully be available in the S
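
As a rough illustration of what the wiki page describes, a minimal SolrJ sketch of the shards parameter (the host names and core paths are placeholders):

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;
import org.apache.solr.client.solrj.response.QueryResponse;

public class ShardedQuery {
    public static void main(String[] args) throws Exception {
        // Any shard (or a dedicated coordinator instance) can receive the query.
        CommonsHttpSolrServer solr =
            new CommonsHttpSolrServer("http://shard1.example.com:8983/solr");

        SolrQuery q = new SolrQuery("title:lucene");
        // Comma-separated list of shards to fan the request out to;
        // note the entries are given without the http:// prefix.
        q.set("shards", "shard1.example.com:8983/solr,shard2.example.com:8983/solr");

        QueryResponse rsp = solr.query(q);
        System.out.println("Merged hits: " + rsp.getResults().getNumFound());
    }
}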

Re: Question on Solr Scalability

2010-02-10 Thread Juan Pedro Danculovic
To scale Solr, take a look at this article: http://www.lucidimagination.com/Community/Hear-from-the-Experts/Articles/Scaling-Lucene-and-Solr Juan Pedro Danculovic CTO - www.linebee.com On Thu, Feb 11, 2010 at 4:12 AM, abhishes wrote: > > Suppose I am indexing very large data (5 billion rows

Question on Solr Scalability

2010-02-10 Thread abhishes
Suppose I am indexing very large data (5 billion rows in a database). Now I want to use the Solr Core feature to split the index into manageable chunks. However, I have two questions: 1. Can cores reside on different physical servers? 2. When a query comes, will the query be answered by index i

Re: dismax and multi-language corpus

2010-02-10 Thread Jason Rutherglen
> Claudio - fields with '-' in them can be problematic. Why's that? On Wed, Feb 10, 2010 at 2:38 PM, Otis Gospodnetic wrote: > Claudio - fields with '-' in them can be problematic. > > Side comment: do you really want to search across all languages at once?  If > not, maybe 3 different dismax c

hl.maxAlternateFieldLength defaults in solrconfig.xml

2010-02-10 Thread Yao Ge
It appears the hl.maxAlternateFieldLength parameter default setting in solrconfig.xml does not take effect. I can only get it to work by explicitly sending the parameter via the client request. It is not a big deal, but it appears to be a bug. -- View this message in context: http://old.nabble.com/
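
Until the config default is sorted out, the per-request workaround described above looks roughly like this in SolrJ (the field and parameter values are placeholders):

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;
import org.apache.solr.client.solrj.response.QueryResponse;

public class HighlightWorkaround {
    public static void main(String[] args) throws Exception {
        CommonsHttpSolrServer solr = new CommonsHttpSolrServer("http://localhost:8983/solr");

        SolrQuery q = new SolrQuery("body:solr");
        q.setHighlight(true);                             // hl=true
        q.setParam("hl.fl", "body");                      // field(s) to highlight
        q.setParam("hl.alternateField", "body");          // fall back to the stored field when nothing matches
        q.setParam("hl.maxAlternateFieldLength", "200");  // sent explicitly, since the config default is ignored

        QueryResponse rsp = solr.query(q);
        System.out.println(rsp.getHighlighting());
    }
}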

Re: Which schema changes are incompatible?

2010-02-10 Thread Chris Hostetter
: http://wiki.apache.org/solr/FAQ#How_can_I_rebuild_my_index_from_scratch_if_I_change_my_schema.3F : : but it is not clear about the times when this is needed. So I wonder, do I : need to do it after adding a field, removing a field, changing field type, : changing indexed/stored/multiValue prop

RE: HTTP caching and distributed search

2010-02-10 Thread Chris Hostetter
: I tried your suggestion, Hoss, but committing to the new coordinator : core doesn't change the indexVersion and therefore the ETag value isn't : changed. Hmmm... so the "empty" commit doesn't change the indexVersion? ... i didn't realize that. Well, I suppose you could replace your empty comm

The Riddle of the Underscore and the Dollar Sign . . .

2010-02-10 Thread Christopher Ball
I am perplexed by the behavior I am seeing from the Solr Analyzer and Filters with regard to underscores. I am trying to get rid of underscores ('_') when shingling, but seem unable to do so with a Stopwords Filter. And yet underscores are being removed when I am not even trying, by the WordDelimi

Re: source tree for lucene

2010-02-10 Thread Chris Hostetter
: i want to recompile lucene with : http://issues.apache.org/jira/browse/LUCENE-2230, but im not sure : which source tree to use, i tried using the implied trunk revision : from the admin/system page but solr fails to build with the generated : jars, even if i exclude the patches from 2230... Hmm

RE: Index Corruption after replication by new Solr 1.4 Replication

2010-02-10 Thread Osborn Chan
Hi All, I found out there is file corruption issue by using both "EmbeddedSolrServer" & "Solr 1.4 Java based replication" together in slave server. In my slave server, I have 2 webapps in a tomcat instance. 1) "multicore" webapp with slave config 2) "my custom" webapp using EmbeddedSolrServer

Query elevation based on field

2010-02-10 Thread Jason Chaffee
Is it possible to do query elevation based on field? Basically, I would like to search the same term on three different fields: q=field1:term OR field2:term OR field3:term and I would like to sort the results by a fourth field: sort=field4+asc. However, I would like to elevate all

Re: How to not limit maximum number of documents?

2010-02-10 Thread Chris Hostetter
: Okay. So we have to leave this question open for now. There might be : other (more advanced) users that can answer this question. It's for : sure, the solution we found is not quite good. The question really isn't "open", it's a FAQ... http://wiki.apache.org/solr/FAQ#How_can_I_get_ALL_the_ma

Re: Faceting

2010-02-10 Thread Chris Hostetter
: NOTE: Please start a new email thread for a new topic (See : http://en.wikipedia.org/wiki/User:DonDiego/Thread_hijacking) FWIW: I'm the most nit-picky person i know about Thread-Hijacking, but i don't see any MIME headers to indicate that Jose did that. : > If i follow this path can i then

Re: Indexing / querying multiple data types

2010-02-10 Thread Chris Hostetter
: Subject: Indexing / querying multiple data types : In-Reply-To: <8cf3f00d0572f8479efcd0783be11eb1927...@xmb-rcd-104.cisco.com> http://people.apache.org/~hossman/#threadhijack Thread Hijacking on Mailing Lists When starting a new discussion on a mailing list, please do not reply to an existing

Re: How to configure multiple data import types

2010-02-10 Thread Chris Hostetter
: Subject: How to configure multiple data import types : In-Reply-To: <4b6c0de5.8010...@zib.de> http://people.apache.org/~hossman/#threadhijack Thread Hijacking on Mailing Lists When starting a new discussion on a mailing list, please do not reply to an existing message, instead start a fresh e

Re: Need a bit of help, Solr 1.4: type "text".

2010-02-10 Thread Yu-Shan Fung
Check out the configuration of WordDelimiterFilterFactory in your schema.xml. Depending on your settings, it's probably tokenizing 13th into "13" and "th". You can also have them concatenated back into a single token, but I can't remember the exact parameter. I think it could be catenateAll. O

Need a bit of help, Solr 1.4: type "text".

2010-02-10 Thread Dickey, Dan
I'm using the standard "text" type for a field, and part of the data being indexed is "13th", as in "Friday the 13th". I can't seem to get it to match when I'm querying for "Friday the 13th" either quoted or not. One thing that does match is "13 th" if I send the search query with a space between

Re: dismax and multi-language corpus

2010-02-10 Thread Otis Gospodnetic
Claudio - fields with '-' in them can be problematic. Side comment: do you really want to search across all languages at once? If not, maybe 3 different dismax configs would make your searches better. Otis Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch Hadoop ecosystem search :

implementing profanity detector

2010-02-10 Thread Mike Perham
FYI this does not work. It appears that the update runs on a different thread from the analysis, perhaps because the update is done when the commit happens? I'm sending the document XML with commitWithin="6". I would appreciate any other ideas. I'm drawing a blank on how to implement

DataImportHandler - "too many connections" MySQL error after upgrade to Solr 1.4 release

2010-02-10 Thread Bojan Šmid
Hi all, I had DataImportHandler working perfectly on a Solr 1.4 nightly build from June 2009. I upgraded Solr to the 1.4 release and started getting errors: Caused by: com.mysql.jdbc.exceptions.MySQLNonTransientConnectionException: Server connection failure during transaction. Due to underlying

RE: Indexing / querying multiple data types

2010-02-10 Thread Stefan Maric
Lance, after a bit more reading and cleaning up my configuration (case sensitivity corrected, but it didn't appear to be affecting the indexing, and I don't use the atomID field for querying anyhow), I've added a docType field when I index my data and now use the fq parameter to filter on that new fiel

dismax and multi-language corpus

2010-02-10 Thread Claudio Martella
Hello list, I have a corpus with 3 languages, so I set up a text content field (with no stemming) and 3 text-[en|it|de] fields with specific Snowball stemmers. I copyField the text to my language-aware fields. So, I set up this dismax searchHandler: dismax title^1.2 content-en^0.8 content-it

Re: Distributed search and haproxy and connection build up

2010-02-10 Thread Ian Connor
Thanks, I bypassed haproxy as a test and it did reduce the number of connections - but it did not seem as though these connections were hurting anything. Ian. On Tue, Feb 9, 2010 at 11:01 PM, Lance Norskog wrote: > This goes through the Apache Commons HTTP client library: > http://hc.apache.org

RE: analysing wild carded terms

2010-02-10 Thread Steven A Rowe
Hi Joe, See this recent thread from a user with a very similar issue: http://old.nabble.com/No-wildcards-with-solr.ASCIIFoldingFilterFactory--td24162104.html In the above thread, Mark Miller mentions that Lucene's AnalyzingQueryParser should do the trick, but would need to be integrated into So
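
Until something like AnalyzingQueryParser is wired into Solr, one client-side workaround is to normalize the wildcard prefix before sending it, so that it matches what ISOLatin1AccentFilter/ASCIIFoldingFilter produced at index time. A rough JDK-only sketch; the normalization must mirror the actual index-time analysis chain:

import java.text.Normalizer;

public class WildcardPrefix {

    /** Lowercase and strip combining diacritics, roughly mimicking
     *  LowerCaseFilter plus an accent-folding filter for Latin-1 input. */
    static String normalize(String userPrefix) {
        String folded = Normalizer.normalize(userPrefix, Normalizer.Form.NFD)
                                  .replaceAll("\\p{InCombiningDiacriticalMarks}+", "");
        return folded.toLowerCase();
    }

    public static void main(String[] args) {
        // "Métal" becomes "metal", so "metal*" matches the folded, lowercased index terms.
        System.out.println(normalize("Métal") + "*");
    }
}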

Re: question/suggestion for Solr-236 patch

2010-02-10 Thread gdeconto
Joe Calderon-2 wrote: > > you can do that very easily yourself in a post processing step after > you receive the solr response > true (and I am already doing so). I was thinking that if this were done as part of the field collapsing code, it might be faster than doing so via post processing (ie no

Re: analysing wild carded terms

2010-02-10 Thread Joe Calderon
Sorry, what I meant to say is: apply text analysis to the part of the query that is wildcarded. For example, if a term with Latin-1 diacritics is wildcarded, I'd still like to run it through ISOLatin1Filter. On Wed, Feb 10, 2010 at 4:59 AM, Fuad Efendi wrote: >> hello *, quick question, what would i h

Re: question/suggestion for Solr-236 patch

2010-02-10 Thread Joe Calderon
you can do that very easily yourself in a post processing step after you receive the solr response On Wed, Feb 10, 2010 at 8:12 AM, gdeconto wrote: > > I have been able to apply and use the solr-236 patch (field collapsing) > successfully. > > Very, very cool and powerful. > > My one comment/conc
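
For what it's worth, that post-processing can be as simple as walking the extra section the patch adds to the response. The request parameter and the layout of collapse_counts vary between versions of the SOLR-236 patch, so the names below are placeholders:

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;
import org.apache.solr.client.solrj.response.QueryResponse;
import org.apache.solr.common.util.NamedList;

public class CollapseCounts {
    public static void main(String[] args) throws Exception {
        CommonsHttpSolrServer solr = new CommonsHttpSolrServer("http://localhost:8983/solr");

        SolrQuery q = new SolrQuery("*:*");
        q.set("collapse.field", "product_id");  // parameter name/value depend on the patch version in use

        QueryResponse rsp = solr.query(q);
        // The patch adds its bookkeeping as an extra NamedList; walk it generically.
        NamedList<?> collapse = (NamedList<?>) rsp.getResponse().get("collapse_counts");
        if (collapse != null) {
            for (int i = 0; i < collapse.size(); i++) {
                System.out.println(collapse.getName(i) + " -> " + collapse.getVal(i));
            }
        }
    }
}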

question/suggestion for Solr-236 patch

2010-02-10 Thread gdeconto
I have been able to apply and use the solr-236 patch (field collapsing) successfully. Very, very cool and powerful. My one comment/concern is that the collapseCount and aggregate function values in the collapse_counts list only represent the collapsed documents (ie the ones that are not shown in

delete via DIH

2010-02-10 Thread Lukas Kahwe Smith
Hi, There is a solution to update via DIH, but is there also a way to define a query that fetches IDs for documents that should be removed? regards, Lukas Kahwe Smith m...@pooteeweet.org
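
For what it's worth, DIH's delta-import supports a deletedPkQuery attribute for exactly this kind of "fetch the IDs to remove" query. Outside of DIH, the same effect can also be had from SolrJ; a minimal sketch, assuming a hypothetical "deleted" flag was indexed with each document:

import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;

public class DeleteStale {
    public static void main(String[] args) throws Exception {
        CommonsHttpSolrServer solr = new CommonsHttpSolrServer("http://localhost:8983/solr");
        // Hypothetical: the import also indexed a boolean "deleted" flag,
        // so stale documents can be removed with a delete-by-query.
        solr.deleteByQuery("deleted:true");
        solr.commit();
    }
}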

Re: How to not limit maximum number of documents?

2010-02-10 Thread Ron Chan
I meant available in total, not just what satisfies the particular query. You should have at least an estimate of the total number of documents, even if it grows daily. And if you are talking about millions of rows and you are trying to retrieve them all, IMHO, not getting all of them will

Re: How to not limit maximum number of documents?

2010-02-10 Thread egon . o
Okay. So we have to leave this question open for now. There might be other (more advanced) users that can answer this question. It's for sure, the solution we found is not quite good. In the meantime, I will look for a way to submit a feature request. :) Original-Message > D

Re: How to not limit maximum number of documents?

2010-02-10 Thread Walter Underwood
Solr will not do this efficiently. Getting all rows will be very slow. Adding a parameter will not make it fast. Why do you want to do this? wunder On Feb 10, 2010, at 7:06 AM, ego...@gmx.de wrote: > Setting the 'rows' parameter to a number larger than the number of documents > available requ

RE: How to not limit maximum number of documents?

2010-02-10 Thread stefan.maric
Yes, I tried q=&rows=-1 the other day and gave up. But as you say it wouldn't help, because you might get a) timeouts, because you have to wait a 'long' time for the large set of results to be returned, or b) exceptions being thrown because you're retrieving too much info to be thrown around the

Re: How to not limit maximum number of documents?

2010-02-10 Thread egon . o
Setting the 'rows' parameter to a number larger than the number of documents available requires that you know how many are available. That's what I intended to retrieve via the LukeRequestHandler. Anyway, nice approach, Stefan. I'm afraid I forgot this 'numFound' aspect. :) But still, it feels li

AW: Solr-JMX/Jetty agentId

2010-02-10 Thread Jan Simon Winkelmann
2010/2/10 Jan Simon Winkelmann : > I am (still) trying to get JMX to work. I have finally managed to get a Jetty > installation running with the right parameters to enable JMX. Now the next > problem appeared. I need to get Solr to register its MBeans with the Jetty > MBeanServer. Using service

Re: How to not limit maximum number of documents?

2010-02-10 Thread Ron Chan
Just set the rows to a very large number, larger than the number of documents available. It is useful to set the fl parameter with the fields required, to avoid memory problems if each document contains a lot of information. - Original Message - From: "stefan maric" To: solr-user@lucene.a

RE: How to not limit maximum number of documents?

2010-02-10 Thread stefan.maric
Egon, if you first run your query with q=&rows=0, then you get back an indication of the total number of docs. Now your app can query again to get the 1st n rows and manage forward|backward traversal of results with subsequent queries. Regards Stefan Maric -Original Message- From: ego..
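
A SolrJ version of this suggestion: ask for zero rows to read numFound, then page through with start/rows (page size and field list below are arbitrary):

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;
import org.apache.solr.client.solrj.response.QueryResponse;
import org.apache.solr.common.SolrDocument;

public class FetchAll {
    public static void main(String[] args) throws Exception {
        CommonsHttpSolrServer solr = new CommonsHttpSolrServer("http://localhost:8983/solr");

        SolrQuery count = new SolrQuery("*:*");
        count.setRows(0);  // no documents, just the header with numFound
        long total = solr.query(count).getResults().getNumFound();

        int pageSize = 1000;  // keep pages modest to avoid memory and timeout problems
        for (int start = 0; start < total; start += pageSize) {
            SolrQuery page = new SolrQuery("*:*");
            page.setStart(start);
            page.setRows(pageSize);
            page.addField("id");  // fl: only fetch the fields you actually need
            QueryResponse rsp = solr.query(page);
            for (SolrDocument doc : rsp.getResults()) {
                // process doc
            }
        }
    }
}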

Re: How to not limit maximum number of documents?

2010-02-10 Thread egon . o
Hi Stefan, you are right. I noticed this page-based result handling too. For web pages it is handy to maintain a number-of-results-per-page parameter together with an offset to browse result pages. Both can be done with Solr's 'start' and 'rows' parameters. But as I don't use Solr in a web contex

Cannot get like exact searching to work

2010-02-10 Thread Aaron Zeckoski
I am using SOLR 1.3 and my server is embedded and accessed using SOLRJ. I would like to set up my searches so that exact matches are the first results returned, followed by near matches, and finally token-based matches. For example, if I have a summary field in the schema which is created using copyFiel
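
One pattern that may fit here, sketched with hypothetical field names and not necessarily the best answer: copyField the summary into a lightly analyzed exact-match field as well as the normal tokenized one, then boost the exact clause above phrase and plain token clauses:

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;

public class ExactFirst {
    public static void main(String[] args) throws Exception {
        CommonsHttpSolrServer solr = new CommonsHttpSolrServer("http://localhost:8983/solr");

        String input = "solr scalability";
        // summary_exact is assumed to be a copyField of summary analyzed with a
        // KeywordTokenizer-style type; summary is the normal tokenized field.
        String q = "summary_exact:\"" + input + "\"^100"
                 + " OR summary:\"" + input + "\"^10"   // phrase (near) match
                 + " OR summary:(" + input + ")";       // plain token match
        System.out.println(solr.query(new SolrQuery(q)).getResults().getNumFound());
    }
}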

RE: How to not limit maximum number of documents?

2010-02-10 Thread stefan.maric
I was just thinking along similar lines. As far as I can tell you can use the parameters start & rows in combination to control the retrieval of query results. So http://<host>:<port>/solr/select/?q=<query> will retrieve results 1..10, and http://<host>:<port>/solr/select/?q=<query>&start=11&rows=10 will retrieve results 11..20

Getting max/min dates from solr index

2010-02-10 Thread Mark N
How can we get the max and min date from the Solr index? I would need these dates to draw a graph (for example, a timeline graph). Also, can we use date faceting to show how many documents are indexed every month? Consider that I need to draw a timeline graph for the current year to show how many records
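
One straightforward approach (the field name is a placeholder): two rows=1 queries sorted on the date field give the min and max, and Solr 1.4's date faceting gives the per-month counts:

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.SolrQuery.ORDER;
import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;
import org.apache.solr.client.solrj.response.QueryResponse;

public class DateBounds {
    public static void main(String[] args) throws Exception {
        CommonsHttpSolrServer solr = new CommonsHttpSolrServer("http://localhost:8983/solr");

        // Oldest document: sort ascending on the date field and ask for one row.
        SolrQuery min = new SolrQuery("*:*");
        min.setRows(1);
        min.addField("created_date");
        min.setSortField("created_date", ORDER.asc);
        System.out.println("min: " + solr.query(min).getResults().get(0).getFieldValue("created_date"));

        // Newest document: the same query, sorted descending.
        SolrQuery max = new SolrQuery("*:*");
        max.setRows(1);
        max.addField("created_date");
        max.setSortField("created_date", ORDER.desc);
        System.out.println("max: " + solr.query(max).getResults().get(0).getFieldValue("created_date"));

        // Documents per month for the current year via date faceting.
        SolrQuery facet = new SolrQuery("*:*");
        facet.setRows(0);
        facet.setFacet(true);
        facet.set("facet.date", "created_date");
        facet.set("facet.date.start", "2010-01-01T00:00:00Z");
        facet.set("facet.date.end", "2011-01-01T00:00:00Z");
        facet.set("facet.date.gap", "+1MONTH");
        QueryResponse rsp = solr.query(facet);
        System.out.println(rsp.getResponse().get("facet_counts"));  // the facet_dates section is in here
    }
}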

RE: analysing wild carded terms

2010-02-10 Thread Fuad Efendi
> hello *, quick question, what would i have to change in the query > parser to allow wildcarded terms to go through text analysis? I believe it is illogical. "wildcarded terms" will go through terms enumerator.

Re: Solr-JMX/Jetty agentId

2010-02-10 Thread Tim Terlegård
2010/2/10 Jan Simon Winkelmann : > I am (still) trying to get JMX to work. I have finally managed to get a Jetty > installation running with the right parameters to enable JMX. Now the next > problem appeared. I need to get Solr to register its MBeans with the Jetty > MBeanServer. Using service

How to not limit maximum number of documents?

2010-02-10 Thread egon . o
Hi all, I'm working with Solr 1.4 and came across the point that Solr limits the number of documents returned in a response. This number can be changed with the common query parameter 'rows'. In my scenario it is very important that the response contains ALL documents in the index! I pl

spellcheck

2010-02-10 Thread michaelnazaruk
Hello, all! I have a problem with spellcheck. I downloaded, built and connected a dictionary (~500,000 words), and it works fine. But I get suggestions for every word (even correct words)! Is it possible to get suggestions only for wrong words? -- View this message in context: http://old.nabble.com/spellch
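
One guess, since the config isn't shown: spellcheck.onlyMorePopular=true makes the checker suggest higher-frequency alternatives even for words that are in the dictionary. A SolrJ sketch with it turned off and extended results enabled (the handler name is a placeholder):

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;
import org.apache.solr.client.solrj.response.QueryResponse;

public class SpellCheckDemo {
    public static void main(String[] args) throws Exception {
        CommonsHttpSolrServer solr = new CommonsHttpSolrServer("http://localhost:8983/solr");

        SolrQuery q = new SolrQuery("speling misteak");
        q.setParam("qt", "/spell");                          // whatever handler has the spellcheck component
        q.setParam("spellcheck", "true");
        q.setParam("spellcheck.onlyMorePopular", "false");   // don't second-guess words already in the dictionary
        q.setParam("spellcheck.extendedResults", "true");    // adds per-token frequency/correctlySpelled info

        QueryResponse rsp = solr.query(q);
        // Tokens found in the dictionary should now come back without suggestions.
        System.out.println(rsp.getResponse().get("spellcheck"));
    }
}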

Solr-JMX/Jetty agentId

2010-02-10 Thread Jan Simon Winkelmann
Hi, I am (still) trying to get JMX to work. I have finally managed to get a Jetty installation running with the right parameters to enable JMX. Now the next problem appeared. I need to get Solr to register its MBeans with the Jetty MBeanServer. Using , Solr doesn't complain on loading, but the
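
Independent of which MBeanServer Solr ends up registering with, a quick way to check whether its MBeans are visible at all is to attach over standard remote JMX. The sketch below assumes the Jetty JVM was started with com.sun.management.jmxremote.port=3000; adjust the URL to your setup:

import java.util.Set;
import javax.management.MBeanServerConnection;
import javax.management.ObjectName;
import javax.management.remote.JMXConnector;
import javax.management.remote.JMXConnectorFactory;
import javax.management.remote.JMXServiceURL;

public class ListSolrMBeans {
    public static void main(String[] args) throws Exception {
        JMXServiceURL url =
            new JMXServiceURL("service:jmx:rmi:///jndi/rmi://localhost:3000/jmxrmi");
        JMXConnector connector = JMXConnectorFactory.connect(url);
        MBeanServerConnection conn = connector.getMBeanServerConnection();

        // Solr typically registers its beans under a "solr" (or "solr/<corename>") domain.
        Set<ObjectName> names = conn.queryNames(new ObjectName("solr*:*"), null);
        for (ObjectName name : names) {
            System.out.println(name);
        }
        connector.close();
    }
}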

Re: "after flush: fdx size mismatch" on query durring writes

2010-02-10 Thread Michael McCandless
Yes, more details would be great... Is this easily repeated? The exists?=false is particularly spooky. It means, somehow, a new segment was being flushed, containing 1285 docs, but then after closing the doc stores, the stored fields index file (_X.fdx) had been deleted. Can you turn on IndexWr

Re: Replication and querying

2010-02-10 Thread Julian Hille
Hi, it would be possible to add that to the main Solr, but the problem is, let's face it (example): we have about 1.5 million documents in the Solr master. These documents are books. These books have fields like title, IDs, numbers, authors and more. This Solr is global. Now: the slave Solr