Re: Our SOLR instance seems to be single-threading and therefore not taking advantage of its multi-proc host

2010-09-14 Thread Jan Høydahl / Cominvent
Hi, Tell us more about your deploy. How many documents, and how large? How much RAM? What kind of physical disk system and how is it allocated to the VM? When do you measure - during indexing or during search load? Have you tried to throw more search load on the system? When (how many QPS) does i

Re: Tuning Solr caches with high commit rates (NRT)

2010-09-14 Thread Peter Karich
Hi Peter, this scenario would be really great for us - I didn't know that this is possible and works, so: thanks! At the moment we are doing similar with replicating to the readonly instance but the replication is somewhat lengthy and resource-intensive at this datavolume ;-) Regards, Peter. > 1

A question on WordDelimiterFilterFactory

2010-09-14 Thread yandong yao
Hi Guys, I encountered a problem when enabling WordDelimiterFilterFactory for both index and query (pasted the relevant part of schema.xml at the bottom of the email). *1. Steps to reproduce:* 1.1 The indexed sample document contains only one sentence: "This is a TechNote." 1.2 Query is: q=TechNo

Swapping cores with SolrJ

2010-09-14 Thread Shaun Campbell
I've got Solr set up now with two cores which I call live and rebuild and which point to core0 and core1 directories respectively. My solr.xml file contains: In my Spring MVC application I have Solr set up as an embedded server and have two singleton beans which I use to refer to

Re: A question on WordDelimiterFilterFactory

2010-09-14 Thread Erick Erickson
Really well done problem statement by the way On Tue, Sep 14, 2010 at 5:40 AM, yandong yao wrote: > Hi Guys, > > I encountered a problem when enabling WordDelimiterFilterFactory for both > index and query (pasted relative part of schema.xml at the bottom of > email). > > *1. Steps to reprodu

order of analyzers, tokeinizers and filters

2010-09-14 Thread Markus.Rietzler
hi, it's the second time I have stumbled across some strange behaviour: in my schema.xml i have defined i can't place the PatternReplaceFilter before the WhitespaceTokenizer. i have the schema like above, di

Re: order of analyzers, tokeinizers and filters

2010-09-14 Thread Rafał Kuć
Hello! The tokenizer is executed before the filters, because the tokenizer "generates" the tokens and then the filters operate on them. > hi, > it's the second time i am stumble across some strange behaviour: > in my schema.xml i have defined > positionIncrementGap="100"> > > >

Re: A question on WordDelimiterFilterFactory

2010-09-14 Thread Robert Muir
did you index with solr 1.4 (or are you using solr 1.4) ? at a quick glance, it looks like it might be this: https://issues.apache.org/jira/browse/SOLR-1852 , which was fixed in 1.4.1 On Tue, Sep 14, 2010 at 5:40 AM, yandong yao wrote: > Hi Guys, > > I encountered a problem when enabling WordDe

How to install DuplicatesDetectorService

2010-09-14 Thread hellboy
I found http://www.jarvana.com/jarvana/browse/org/ow2/weblab/service/solr-duplicates-detector/2.0/ Does anybody know how to install and use this lib on an existing Solr instance? -- View this message in context: http://lucene.472066.n3.nabble.com/How-to-install-DuplicatesDetectorService-tp147256

Re: How to install DuplicatesDetectorService

2010-09-14 Thread Erick Erickson
Why do you want to? Perhaps there's a better solution for your underlying problem if you'd explain what it is... Best Erick On Tue, Sep 14, 2010 at 8:05 AM, hellboy wrote: > > I found > > > http://www.jarvana.com/jarvana/browse/org/ow2/weblab/service/solr-duplicates-detector/2.0/ > > Is anybody

Re: Tuning Solr caches with high commit rates (NRT)

2010-09-14 Thread Peter Karich
Peter Sturge, this was a nice hint, thanks again! If you are here in Germany anytime I can invite you to a beer or an apfelschorle ! :-) I only needed to change the lockType to none in the solrconfig.xml, disable the replication and set the data dir to the master data dir! Regards, Peter Karich.

RE: order of analyzers, tokeinizers and filters

2010-09-14 Thread Jonathan Rochkind
CharFilters go before Tokenizers which go before (token) Filters. Token filters (called just <filter> in the config) operate on tokens, so need to go after the tokenizer. WhitespaceTokenizer is a tokenizer. PatternReplaceFilterFactory is a token filter. What you probably want instead is solr.Patter
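The ordering described above (char filters, then the tokenizer, then token filters) can be pictured with a small toy pipeline. This is only a sketch in plain Python, not Solr code; the function names and the hyphen-replacement rule are made up for illustration.

```python
import re

def char_filter(text):
    # Char filter stage: rewrites the raw character stream BEFORE tokenizing.
    # Hypothetical rule for illustration: turn hyphens into spaces.
    return re.sub(r"-", " ", text)

def whitespace_tokenizer(text):
    # Tokenizer stage: splits the character stream into tokens.
    return text.split()

def lowercase_filter(tokens):
    # Token filter stage: operates on tokens that already exist.
    return [t.lower() for t in tokens]

def hyphen_token_filter(tokens):
    # The same replacement done as a token filter: it can change a token's
    # text, but it cannot split one token into two.
    return [t.replace("-", " ") for t in tokens]

def analyze(text):
    # char filter -> tokenizer -> token filter
    return lowercase_filter(whitespace_tokenizer(char_filter(text)))

print(analyze("Tech-Note Search"))
# ['tech', 'note', 'search']
print(hyphen_token_filter(whitespace_tokenizer("Tech-Note Search")))
# ['Tech Note', 'Search']
```

The second call shows why a pattern replacement placed after the tokenizer cannot influence how the text is split: by then the token boundaries are already fixed.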

possible bug in zookeeper / solrCloud ?

2010-09-14 Thread Yatir Ben Shlomo
Hi I am using solrCloud which uses an ensemble of 3 zookeeper instances. I am performing survivability tests: Taking one of the zookeeper instances down I would expect the client to use a different zookeeper server instance. But as you can see in the below logs attached Depending on which insta

Re: A question on WordDelimiterFilterFactory

2010-09-14 Thread yandong yao
Hi Robert, I am using solr 1.4, will try with 1.4.1 tomorrow. Thanks very much! Regards, Yandong Yao 2010/9/14 Robert Muir > did you index with solr 1.4 (or are you using solr 1.4) ? > > at a quick glance, it looks like it might be this: > https://issues.apache.org/jira/browse/SOLR-1852 , whi

Re: Swapping cores with SolrJ

2010-09-14 Thread MitchK
Hi Shaun, I think it would be easier to fix this problem if we had more information about what is going on in your application. Could you please provide the CoreAdminResponse returned by car.process() for us? Kind regards, - Mitch -- View this message in context: http://lucene.472066.n3.nabble.

Re: Swapping cores with SolrJ

2010-09-14 Thread Shaun Campbell
Hi Mitch Thanks for responding. Not actually sure what you wanted from CoreAdminResponse but I put the following in: CoreAdminRequest car = new CoreAdminRequest(); car.setCoreName("live"); car.setOtherCoreName("rebuild"); car.setAction(CoreAdminPar

Returning max value of fields within documents

2010-09-14 Thread Kura
Hey guys, Is there a way of doing the following: We want to get the highest value from a list of multiple fields within a document. Example below: max(field1,field2,field3,field4) The values are as follows: field1 = 100 field2 = 300 field3 = 250 field4 = not indexed in document (null) The hig

Re: PatternReplaceCharFilterFactory?

2010-09-14 Thread Jonathan Rochkind
Shawn Heisey wrote: The one called PatternReplaceFilterFactory (no Char) has been around forever. It is not mentioned on the Wiki page about analyzers. The one called PatternReplaceCharFilterFactory is only available from svn. This seems to be true, which I hadn't realized either. The

Re: Returning max value of fields within documents

2010-09-14 Thread Jonathan Rochkind
The stats component will give you the maximum value within one field: http://wiki.apache.org/solr/StatsComponent You're going to have to compute the max amongst several fields client-side, having StatsComponent return the max for each field, and then just max-ing them client side. Not hard.

Re: Returning max value of fields within documents

2010-09-14 Thread Jonathan Rochkind
Oh wait, I misunderstood, you want just the highest value _for one document_, from stored fields, given for each document? StatsComponent won't help you there. Either do it client side, or do it at index time in a single stored field, that's it. Maybe there's some confusing way to use a quer
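The "do it client side" option could look roughly like the sketch below. It assumes each returned document is available as a plain dict of stored field values; the helper name max_field_value is invented for this example and is not part of any Solr client.

```python
def max_field_value(doc, field_names):
    # Collect the values that are actually present; a field that was not
    # indexed for this document is simply missing (or None) and is skipped.
    values = [doc[name] for name in field_names if doc.get(name) is not None]
    return max(values) if values else None

# Values taken from the question: field4 is not present in the document.
doc = {"field1": 100, "field2": 300, "field3": 250}
print(max_field_value(doc, ["field1", "field2", "field3", "field4"]))
# 300
```

The same function could be run at index time to fill a single stored "max" field, which is the other option mentioned above.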

Solr 1.4.1 and field collapsing

2010-09-14 Thread Moazzam Khan
Hey guys, Has anyone successfully compiled and used Field Collapsing patch (236) with Solr 1.4.1? I keep getting this exception when I search: null java.lang.NullPointerException at org.apache.solr.search.fieldcollapse.NonAdjacentDocumentCollapser$FloatValueFieldComparator.compare(NonA

LowerCaseTokenizerFactory - Tokenizer Options? Why does it behave this way?

2010-09-14 Thread Scott Gonyea
Hi, I'm tweaking my schema and the LowerCaseTokenizerFactory doesn't create tokens, based solely on lower-casing characters. Is there a way to tell it NOT to drop non-characters? It's amazingly frustrating that the TokenizerFactory and the FilterFactory have two entirely different modes of behav

Solr Rolling Log Files

2010-09-14 Thread Vladimir Sutskever
Can SOLR be configured out of the box to handle rolling log files? Kind regards, Vladimir Sutskever Investment Bank - Technology JPMorgan Chase, Inc. Tel: (212) 552.5097

Geographic clustering

2010-09-14 Thread Charlie DeTar
Hi, I'm interested in using geographic clustering of records in a Solr search index. Specifically, I want to be able to efficiently produce a map with clustered bubbles that represent the number of documents that are indexed with points in that general area. I'd like to combine this with other f
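One simple way to picture "bubbles with document counts per general area" is to snap each point to a grid cell and count per cell. The sketch below is only an illustration in plain Python under that assumption; it is not a Solr feature, and the cell size and sample coordinates are made up.

```python
from collections import Counter
from math import floor

def grid_clusters(points, cell_deg=1.0):
    # Map every (lat, lon) point to the grid cell that contains it and
    # count how many points fall into each cell.
    counts = Counter()
    for lat, lon in points:
        cell = (floor(lat / cell_deg), floor(lon / cell_deg))
        counts[cell] += 1
    return counts

points = [(42.36, -71.06), (42.37, -71.11), (40.71, -74.01)]
for cell, count in sorted(grid_clusters(points).items()):
    print(cell, count)
# (40, -75) 1
# (42, -72) 2
```

Each cell key could be stored as an indexed field at index time so that an ordinary facet on that field returns the bubble counts, combined with whatever other filters the query uses.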

Re: LowerCaseTokenizerFactory - Tokenizer Options? Why does it behave this way?

2010-09-14 Thread Robert Muir
On Tue, Sep 14, 2010 at 1:54 PM, Scott Gonyea wrote: > Hi, > > I'm tweaking my schema and the LowerCaseTokenizerFactory doesn't create > tokens, based solely on lower-casing characters. Is there a way to tell it > NOT to drop non-characters? It's amazingly frustrating that the > TokenizerFactor

Facet Field Value truncation

2010-09-14 Thread Niall O'Connor
Hi, Has anyone come across a situation where they have seen their facet field values wrap into a new facet entry when the value exceeds 256 characters? For example: 2302 1403 1382 419 236 236* As you can see the last value in the tissue-antology list is split between two facet values.

Re: Facet Field Value truncation

2010-09-14 Thread Jonathan Rochkind
Faceting on a multi-value field? I wonder if your positionIncrementGap for your field definition in your schema is 256. I am not sure what it defaults to. But it seems possible if it's 256 it could lead to what you observed. Try explicitly defining it to be really really big maybe? I'm not

Re: Facet Field Value truncation

2010-09-14 Thread Yonik Seeley
On Tue, Sep 14, 2010 at 3:35 PM, Niall O'Connor wrote: > Has anyone come across a situation where they have seen their facet field > values wrap into a new facet entry when the value exceeds 256 characters? Yes, for indexed string fields, there currently is a limit of 256 chars per token. It's b

RE: Field names

2010-09-14 Thread Peter A. Kirk
From: Simon Willnauer [simon.willna...@googlemail.com] Sent: Tuesday, 14 September 2010 17:47 To: solr-user@lucene.apache.org Subject: Re: Field names >On Tue, Sep 14, 2010 at 1:39 AM, Peter A. Kirk wrote: >> >> >> >> So it only finds 9? > >Since the "gb" term says 18 occurrences throughout th

Re: PatternReplaceCharFilterFactory?

2010-09-14 Thread Erick Erickson
Hmmm, were you logged in on the Wiki? If not, you can create a login pretty easily... Or someone might pick it up.. Erick On Tue, Sep 14, 2010 at 12:18 PM, Jonathan Rochkind wrote: > > > Shawn Heisey wrote: > >> >> The one called PatternReplaceFilterFactory (no Char) has been around >> forever.

Re: Solr Rolling Log Files

2010-09-14 Thread Erick Erickson
What does "handle" mean? Create them or index them? Erick On Tue, Sep 14, 2010 at 2:02 PM, Vladimir Sutskever < vladimir.sutske...@jpmorgan.com> wrote: > Can SOLR be configured out of the box to handle rolling log files? > > > Kind regards, > > Vladimir Sutskever > Investment Bank - Technology >

solr.DateField: org.apache.solr.common.SolrException: Error while creating field

2010-09-14 Thread h00kpub...@gmail.com
hi... i am using solr for indexing local files (solrj) and indexing crawled nutch-documents... i have configured the fieldtype and use this type by field stored="true"/> the pattern for date is as described in DateField.java: yyyy-MM-dd'T'HH:mm:ssZ i need this date for sorting my se

Re: solr.DateField: org.apache.solr.common.SolrException: Error while creating field

2010-09-14 Thread Yonik Seeley
On Tue, Sep 14, 2010 at 4:54 PM, h00kpub...@gmail.com wrote: > SEVERE: org.apache.solr.common.SolrException: Error while creating field > 'metadata_last_modified{type=date,properties=indexed,stored,omitNorms}' from > value '2010-09-14T22:29:24+0200' Different timezones are currently not allowed -
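Since the rejected value carries a +0200 offset, one workaround is to convert it to UTC and emit the trailing "Z" form before sending it to Solr. A minimal sketch in Python, assuming the input always has the shape shown in the error message; the helper name to_solr_utc is made up here.

```python
from datetime import datetime, timezone

def to_solr_utc(value):
    # Parse a timestamp with a numeric offset such as +0200 ...
    parsed = datetime.strptime(value, "%Y-%m-%dT%H:%M:%S%z")
    # ... shift it to UTC and print it with a literal trailing Z.
    return parsed.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")

print(to_solr_utc("2010-09-14T22:29:24+0200"))
# 2010-09-14T20:29:24Z
```

Note that this changes the clock time (22:29 local becomes 20:29 UTC) rather than just relabelling the offset, so the instant stays the same.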

Re: Facet Field Value truncation

2010-09-14 Thread Niall O'Connor
I opened a bug for this issue: https://issues.apache.org/jira/browse/SOLR-2120 On 09/14/2010 03:51 PM, Yonik Seeley wrote: On Tue, Sep 14, 2010 at 3:35 PM, Niall O'Connor wrote: Has anyone come across a situation where they have seen their facet field values wrap into a new facet entry whe

RE: Re: solr.DateField: org.apache.solr.common.SolrException: Error while creating field

2010-09-14 Thread Markus Jelsma
It would be a nice feature if Solr supports queries with time zone support on an index where all times are UTC. There is some chatter about this in SOLR-750 but i haven't found an issue that would add support for time zone queries.   Did i do a lousy search or is the issue missing as of yet?  

Re: LowerCaseTokenizerFactory - Tokenizer Options? Why does it behave this way?

2010-09-14 Thread Scott Gonyea
I went for a different route: https://issues.apache.org/jira/browse/LUCENE-2644 Scott On Tue, Sep 14, 2010 at 11:18 AM, Robert Muir wrote: > On Tue, Sep 14, 2010 at 1:54 PM, Scott Gonyea wrote: > > > Hi, > > > > I'm tweaking my schema and the LowerCaseTokenizerFactory doesn't create > > token

Re: PatternReplaceCharFilterFactory?

2010-09-14 Thread Jonathan Rochkind
Erick Erickson wrote: Hmmm, were you logged in on the Wiki? If not, you can create a login pretty easily... Or someone might pick it up.. I was logged in, created an account just for that purpose in fact. The page still said "protected" or something and wouldn't let me edit it. I tried, rea

Re: solr.DateField: org.apache.solr.common.SolrException: Error while creating field

2010-09-14 Thread Erick Erickson
If you're using Java's SimpleDateFormat, try enclosing your Z in the format string with single quotes, like: SimpleDateFormat sdf = new SimpleDateFormat("yyyy-MM-dd'T'HH:mm:ss'Z'"); HTH Erick On Tue, Sep 14, 2010 at 4:54 PM, h00kpub...@gmail.com < h00kpub...@googlemail.com> wrote: > hi... i am u

Re: LowerCaseTokenizerFactory - Tokenizer Options? Why does it behave this way?

2010-09-14 Thread Jonathan Rochkind
Why would you want to do that, instead of just using another tokenizer and a lowercasefilter? It's more confusing less DRY code to leave them separate -- the LowerCaseTokenizerFactory combines anyway because someone decided it was such a common use case that it was worth it for the demonstrat

Re: LowerCaseTokenizerFactory - Tokenizer Options? Why does it behave this way?

2010-09-14 Thread Robert Muir
Jonathan, you bring up an excellent point. I think it's worth our time to actually benchmark this LowerCaseTokenizer versus LetterTokenizer + LowerCaseFilter This tokenizer is quite old, and although I can understand there is no doubt it's technically faster than LetterTokenizer + LowerCaseFilter e
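The comparison being discussed (one combined tokenizer versus a letter tokenizer followed by a lowercase filter) can be modelled with a toy version in Python. This is a sketch of the behaviour only, not Lucene's implementation; the regex is an assumption that approximates "runs of letters".

```python
import re

LETTERS = re.compile(r"[^\W\d_]+")  # runs of letters; digits and punctuation split tokens

def letter_tokenizer(text):
    # Keeps only runs of letters, dropping everything else.
    return LETTERS.findall(text)

def lowercase_filter(tokens):
    return [t.lower() for t in tokens]

def lowercase_tokenizer(text):
    # Combined form: tokenize on letters and lowercase in the same pass.
    return [m.group(0).lower() for m in LETTERS.finditer(text)]

text = "This is a TechNote 2010."
print(lowercase_filter(letter_tokenizer(text)))
# ['this', 'is', 'a', 'technote']
print(lowercase_tokenizer(text))
# ['this', 'is', 'a', 'technote']
```

Both paths produce the same tokens, and both drop "2010", which is the non-letter behaviour the original question objected to.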

Shingle filter factory and the min shingles

2010-09-14 Thread Jason Rutherglen
I'm using for a field, indexing, then looking at the terms component. I'm seeing shingles that consist of only 2 terms, whereas I'm expecting all the terms to be at least 4 terms... What's up? Thanks.

Re: LowerCaseTokenizerFactory - Tokenizer Options? Why does it behave this way?

2010-09-14 Thread Scott Gonyea
There doesn't seem to have been anything readily available. All of the tokenizers make their own assumptions about how I want to treat the data. The end result is that this felt like the most direct approach. The default behavior of "LowerCaseTokenizer"(+Factory) was retained, while allowing it

wildcard searches not consistent

2010-09-14 Thread Rico Lelina
Hi, Still working on extending my proof of concept by working off the example configuration and modifying the schema.xml. Having trouble with wildcard searches: factory OR faction -- 40 results (ok) factory -- 1 result (ok) faction -- 39 results (ok) facti?n -- 39 results (ok) fact* -- 40 resul

Re: LowerCaseTokenizerFactory - Tokenizer Options? Why does it behave this way?

2010-09-14 Thread Scott Gonyea
I'd agree with your point entirely. My attacking LowerCaseTokenizer was a result of not wanting to create yet more Classes. That said, rightfully dumping LowerCaseTokenizer would probably have me creating my own Tokenizer. I could very well be thinking about this wrong... But what if I wanted t

Re: Shingle filter factory and the min shingles

2010-09-14 Thread Jason Rutherglen
To answer my own question, and this sucks :) the minShingleSize isn't set in at least 1.4.2. I'm guessing a later version though? On Tue, Sep 14, 2010 at 5:49 PM, Jason Rutherglen wrote: > positionIncrementGap="100"> > > > > words="stopwords.txt"/> > maxShingleSize="4" outputUnigrams="fal
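To make the expectation concrete, here is a toy shingle generator in Python with an explicit minimum and maximum size. It is a sketch of the idea only, not the ShingleFilter implementation, and the parameter names are chosen for this example.

```python
def shingles(tokens, min_size=2, max_size=2):
    # Emit every run of consecutive tokens whose length is between
    # min_size and max_size (inclusive), joined with a single space.
    out = []
    for start in range(len(tokens)):
        for size in range(min_size, max_size + 1):
            if start + size <= len(tokens):
                out.append(" ".join(tokens[start:start + size]))
    return out

tokens = "a b c d e".split()
print(shingles(tokens, min_size=4, max_size=4))
# ['a b c d', 'b c d e']
print(shingles(tokens, min_size=2, max_size=4)[:3])
# ['a b', 'a b c', 'a b c d']
```

The second call shows what the output looks like when the minimum stays at 2 while the maximum is 4: two-term shingles appear, which matches what was seen in the terms component.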

Re: LowerCaseTokenizerFactory - Tokenizer Options? Why does it behave this way?

2010-09-14 Thread Jonathan Rochkind
How about patching the LetterTokenizer to be capable of tokenizing how you want, which can then be combined with a LowerCaseFilter (or not) as desired. Or indeed creating a new tokenizer to do exactly what you want, possibly (but one that doesn't combine an embedded lowercasefilter in there too

Re: Shingle filter factory and the min shingles

2010-09-14 Thread Jason Rutherglen
And here's the issue... https://issues.apache.org/jira/browse/SOLR-1740 On Tue, Sep 14, 2010 at 6:08 PM, Jason Rutherglen wrote: > To answer my own question, and this sucks :)  the minShingleSize isn't > set in at least 1.4.2.  I'm guessing a later version though? > > On Tue, Sep 14, 2010 at 5:49

Re: wildcard searches not consistent

2010-09-14 Thread Robert Muir
> but > > facto?y -- 0 (expecting 1) > > you have stemming enabled for the field? stemming will make your wildcards behave strangely. I would recommend you turn it off. because stemming likely turned factory into factori or similar > I thought these are all valid searches but am I missing somethi
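The effect described above can be reproduced with a toy model: the index holds stemmed terms, while the wildcard pattern is matched against those terms as typed. The sketch below is plain Python; toy_stem is a made-up stand-in for a real stemmer and only rewrites a trailing "y" to "i".

```python
from fnmatch import fnmatchcase

def toy_stem(term):
    # Hypothetical stand-in for a real stemmer (illustration only).
    return term[:-1] + "i" if term.endswith("y") else term

# What ends up in the index after stemming: "factori" and "faction".
indexed_terms = {toy_stem(t) for t in ["factory", "faction"]}

def wildcard_matches(pattern, terms):
    # The pattern is compared with the indexed (stemmed) terms directly.
    return sorted(t for t in terms if fnmatchcase(t, pattern))

print(wildcard_matches("facto?y", indexed_terms))
# []
print(wildcard_matches("facti?n", indexed_terms))
# ['faction']
print(wildcard_matches("fact*", indexed_terms))
# ['faction', 'factori']
```

"facto?y" finds nothing because no indexed term ends in "y" any more, whereas "fact*" still matches both stemmed forms.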

Re: Null pointer exception when mixing highlighter & shards & q.alt

2010-09-14 Thread Chris Hostetter
I didn't see any open Jira issues for this, so i created one... https://issues.apache.org/jira/browse/SOLR-2121 : Date: Tue, 7 Sep 2010 01:35:39 -0700 (PDT) : From: Marc Sturlese : Reply-To: solr-user@lucene.apache.org : To: solr-user@lucene.apache.org : Subject: Re: Null pointer exception when

Re: PatternReplaceCharFilterFactory?

2010-09-14 Thread Erick Erickson
K, just making sure. Erick On Tue, Sep 14, 2010 at 5:20 PM, Jonathan Rochkind wrote: > Erick Erickson wrote: > >> Hmmm, were you logged in on the Wiki? If not, you can create a login >> pretty easily... >> >> Or someone might pick it up.. >> >> > I was logged in, created an account just for

Re: wildcard searches not consistent

2010-09-14 Thread Rico Lelina
That was it! Thank you very much. - Original Message From: Robert Muir To: solr-user@lucene.apache.org Sent: Tue, September 14, 2010 5:58:03 PM Subject: Re: wildcard searches not consistent > but > > facto?y -- 0 (expecting 1) > > you have stemming enabled for the field? stemming will

Re: Geographic clustering

2010-09-14 Thread Dennis Gearon
You are probably not talking about clusters in the physical structure of data on this disk, right? What do YOU mean by clusters if not? Dennis Gearon Signature Warning EARTH has a Right To Life, otherwise we all die. Read 'Hot, Flat, and Crowded' Laugh at http://www.yert.com/

Re: LowerCaseTokenizerFactory - Tokenizer Options? Why does it behave this way?

2010-09-14 Thread Scott Gonyea
There's a lot of reasons, with the performance hit being notable--but also because I feel that using a regex on something this basic amounts to a lazy hack. I'm typically against regular expressions in XML. I'm vehemently opposed to them in cases where not using them should otherwise be quite tri

Re: LowerCaseTokenizerFactory - Tokenizer Options? Why does it behave this way?

2010-09-14 Thread Jonathan Rochkind
Because (just IMO, I'm not an expert here either) the basic framework in Solr is that tokenizers tokenize, but they don't generally change bytes inside values. What changes bytes (or adds or removes tokens to the token stream initially created by a tokenizer, etc) is filters. And there's alrea

Re: Geographic clustering

2010-09-14 Thread Charlie DeTar
On 09/14/2010 07:48 PM, Dennis Gearon wrote: > You are probably not talking about clusters in the physical structure of data > on this disk, right? > > What do YOU mean by clusters if not? I mean basically "range facets", where the ranges are 2-dimensional distances between documents that have i

Re: org.apache.lucene.store.LockObtainFailedException: Lock obtain timed out : SingleInstanceLock: write.lock

2010-09-14 Thread Bharat Jain
Thanks Mark for taking the time to reply. What else could cause this issue to happen so frequently? We have a master/slave configuration and only one update server that writes to the index. We have plenty of disk space available. Thanks Bharat Jain On Fri, Sep 10, 2010 at 8:19 AM, Mark Miller wrote:

about SolrCloud

2010-09-14 Thread 郭芸
Dear All: I am studying SolrCloud now. I downloaded it from: https://svn.apache.org/repos/asf/lucene/solr/branches/cloud/ but I found that there are no webapps: https://svn.apache.org/repos/asf/lucene/solr/branches/cloud/example/webapps/ but we need http://localhost:8983/solr/collection1/admin/zookee

Re: A question on WordDelimiterFilterFactory

2010-09-14 Thread yandong yao
After upgrading to 1.4.1, it is fixed. Thanks very much for your help! Regards, Yandong Yao 2010/9/14 yandong yao > Hi Robert, > > I am using solr 1.4, will try with 1.4.1 tomorrow. > > Thanks very much! > > Regards, > Yandong Yao > > 2010/9/14 Robert Muir > > did you index with solr 1.4 (or

LukeRequestHandler numTerms

2010-09-14 Thread Peter A. Kirk
Hi when using LukeRequestHandler, I can for example call: http://localhost:8983/solr/admin/luke?fl=name&fl=cat which will return data including the frequency of the top 10 search terms in the specified fields. I can also add a "numTerms" parameter to obtain more than the top 10. But how do I e

RE: LukeRequestHandler numTerms

2010-09-14 Thread Jonathan Rochkind
> when using LukeRequestHandler... > But how do I ensure I get *all* the terms in the index returned? Can I set > "numTerms=ALL" or something like that? I'm not sure about LukeRequestHandler, but you can do that with the TermsComponent instead. /terms?terms.fl=name&terms.limit=-1 Will give y
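For reference, the request above can be assembled programmatically. The sketch below only builds the URL; it assumes the host and port from the Luke example in the question and a /terms handler mapped as in the reply, and the helper name terms_url is made up.

```python
from urllib.parse import urlencode

def terms_url(base_url, field, limit=-1):
    # terms.fl names the field; terms.limit=-1 asks for all terms
    # instead of only the top few.
    query = urlencode({"terms.fl": field, "terms.limit": limit})
    return f"{base_url}/terms?{query}"

print(terms_url("http://localhost:8983/solr", "name"))
# http://localhost:8983/solr/terms?terms.fl=name&terms.limit=-1
```

On a large index an unlimited terms request can return a very big response, so a finite limit may be more practical for exploratory use.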

Re: Geographic clustering

2010-09-14 Thread Dennis Gearon
So, basically, faceting geographically? within 100 meters within 300 meters within 1km within 3km within 10km within 100km This type of results? Dennis Gearon

RE: Re: solr.DateField: org.apache.solr.common.SolrException: Error while creating field

2010-09-14 Thread Dennis Gearon
Can you give us a scenario: 1/ Like an OOP sequence diagram, "This happens, that happens, now that" 2/ Where you see it useful? Isn't it possible to convert before storing/after retrieving? Couldn't a timezone offset (or local timezone designation) be stored as a separate field to fil

Re: Geographic clustering

2010-09-14 Thread Dennis Gearon
From what I can tell, it's being controlled in the browser. I CAN'T tell if it's being generated in the browser or in the server. Which is it in the example, and where do you want it generated? Do you want the DATA for the clusters, or the actual icons also? Looks like a display object way to

Re: org.apache.lucene.store.LockObtainFailedException: Lock obtain timed out : SingleInstanceLock: write.lock

2010-09-14 Thread Dennis Gearon
I saw something about having separate reader vs writer to an index. The email said that the reader had to do occasional (empty) commits to keep the cache warm and for another reason. Is this relevant? Dennis Gearon

cloud or zookeeper

2010-09-14 Thread satya swaroop
Hi All, What is the difference between using shards, SolrCloud and ZooKeeper? Which is the best way to scale Solr? I need to reduce the index size on every system and reduce the search time for a query... Regards, satya