Re: drastic performance decrease with 20 cores

2011-09-26 Thread Toke Eskildsen
On Tue, 2011-09-27 at 02:43 +0200, Bictor Man wrote: > thanks for your replies. indeed the filesystem caching seems to be the > difference. sadly I can't add more memory and the 6GB/20core combination > doesn't work. so I'll just try to tweak it as much as I can. A (better) alternative to more mem

Re: multiple dateranges/timeslots per doc: modeling openinghours.

2011-09-26 Thread David Smiley (@MITRE.org)
In case anyone is curious, I responded to him with a solution using either SOLR-2155 (Geohash prefix query filter) or LSP: https://issues.apache.org/jira/browse/SOLR-2155?focusedCommentId=13115244#comment-13115244 ~ David Smiley - Author: https://www.packtpub.com/solr-1-4-enterprise-search-s

Re: what is delata query and how to write?

2011-09-26 Thread Gora Mohanty
On Tue, Sep 27, 2011 at 11:25 AM, nagarjuna wrote: > Hi gora can u pls quit ur answers like these.. >            i may get the perfect answer from anybody but not u,so kindly > please be quit Sorry, didn't mean to be particularly obnoxious. > i already googled and i saw many links as

Re: How to reserve ids?

2011-09-26 Thread Gabriele Kahlout
I'm interested in the stopwords solution, as it sounds like less work, but I'm not sure I understand how it works. Having msn.com as a stopword doesn't mean I won't get msn.com as a result for, say, 'hotmail'. My understanding is that msn.com will never make it to the similarity function and thu

Re: what is delata query and how to write?

2011-09-26 Thread nagarjuna
Hi Gora, can you please stop giving answers like these? I may get the perfect answer from anybody but not you, so kindly please be quiet. I already googled and saw many links. As a beginner I am unable to get the main intention behind using the delta query, even though we have query.and i di

Re: what is delata query and how to write?

2011-09-26 Thread Gora Mohanty
On Tue, Sep 27, 2011 at 10:51 AM, nagarjuna wrote: > Hi everybody. > > right now i have little bit idea about the solr query ..but i am not > clear about delta query > wht it is? and how to write ?any sample delta query? http://lmgtfy.com/?q=solr+delta+query There are many useful links a

Re: Solr stopword problem in Query

2011-09-26 Thread Isan Fulia
Hi Rahul, I also tried searching "Coke Studio MTV" but no documents were returned. Here is the snippet of my schema file. *

Re: How to implement Spell Checker using Solr?

2011-09-26 Thread tamanjit.bin...@yahoo.co.in
Firstly, just to make it clear: the dictionary is made out of already indexed terms, or rather it is built upon them, if you are using *solr.IndexBasedSpellChecker*, which you are. Next, a lot of changes are required in your *solrconfig.xml*: 1. spell is the name of the field which will be used to create yo

Re: How to implement Spell Checker using Solr?

2011-09-26 Thread anupamxyz
I have been able to set up the Solr spell checker on my web application. It is a file-based spell checker that I have implemented. I would like to add that it isn't that accurate, since I haven't applied any specific algorithm for getting the most relevant search result. Kindly do let me know in ca

Re: solr DIH for mongodb

2011-09-26 Thread Otis Gospodnetic
>From: Kiwi de coder > >wow, this search engine is powerful ! Thanks, glad it helps. >too bad after look throught it, still got not solution. > >seem like I need to get my hand dirty to make one :) :) Please consider contributing: http://wiki.apache.org/solr/HowToContribute Otis >kiwi > >

Re: drastic performance decrease with 20 cores

2011-09-26 Thread Otis Gospodnetic
The following should help with size estimation: http://search-lucene.com/?q=estimate+memory&fc_project=Solr http://issues.apache.org/jira/browse/LUCENE-3435 I'll just add that with that much RAM you'll be more than fine. Otis Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch Lucene

Re: Boost Exact matches on Specific Fields

2011-09-26 Thread Balaji S
Hi You mean to say copy the String field to a Text field or the reverse . This is the approach I am currently following Step 1: Created a FieldType Step 2 : Step 3 : And in the SOLR Query planning to q=hospitals&qf=body^4.0 title^5.0

Re: solr DIH for mongodb

2011-09-26 Thread Kiwi de coder
Wow, this search engine is powerful! Too bad that after looking through it I still have no solution. Seems like I need to get my hands dirty and make one :) kiwi On Tue, Sep 27, 2011 at 12:08 PM, Otis Gospodnetic < otis_gospodne...@yahoo.com> wrote: > Hi, > > Here is a 1 month old thread I found on sea

Re: matching reponse and request

2011-09-26 Thread Otis Gospodnetic
Hi Roland, Have a look at hit #1 here: http://search-lucene.com/?q=manifoldcf&fc_project=Solr I think this is what you are after. Otis Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch Lucene ecosystem search :: http://search-lucene.com/ > >From: R

Re: error while replication

2011-09-26 Thread Otis Gospodnetic
Rajat, What version?  If < 3.4.0, I'd try 3.4.0 first. Otis Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch Lucene ecosystem search :: http://search-lucene.com/ > >From: shinkanze >To: solr-user@lucene.apache.org >Sent: Monday, September 26, 2011

Re: SOLR Index Speed

2011-09-26 Thread Otis Gospodnetic
Hello, > PS: solr streamindex  is not option because we need to submit javabin... If you are referring to StreamingUpdateSolrServer, then the above statement makes no sense and you should give SUSS a try. Are you sure your 16 reducers produce more than 500 docs/second? I think somebody already

Re: Update ingest rate drops suddenly

2011-09-26 Thread Otis Gospodnetic
Aha!  See, it was the DB after all! ;)  Thanks for following up, I was curious. Otis Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch Lucene ecosystem search :: http://search-lucene.com/ > >From: eks dev >To: solr-user >Sent: Monday, September 26,

Re: solr DIH for mongodb

2011-09-26 Thread Otis Gospodnetic
Hi, Here is a 1 month old thread I found on search-lucene -- didn't even have to do a search, I got it as a suggestion from AutoComplete when I started typing the word mongodb :) http://search-lucene.com/m/8AEE31AaTd32 Otis Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch Lucene

Re: Boost Exact matches on Specific Fields

2011-09-26 Thread Way Cool
If I were you, probably I will try defining two fields: 1. ts_category as a string type 2. ts_category1 as a text_en type Make sure copy ts_category to ts_category1. You can use the following as qf in your dismax: qf=body^4.0 title^5.0 ts_category^10.0 ts_category1^5.0 or something like that. YH
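For anyone following along, here is a minimal sketch of how the qf value suggested above could be assembled into a request query string. The field names and boosts come from the message; the query term "hospitals" is borrowed from elsewhere in this thread, and the helper names `build_qf` and `build_query_string` are purely illustrative, not part of any Solr client.

```python
from urllib.parse import urlencode

# Boosts as suggested in the message: the exact (string) field is weighted
# higher than its analyzed text copy.
boosts = {"body": 4.0, "title": 5.0, "ts_category": 10.0, "ts_category1": 5.0}


def build_qf(field_boosts):
    """Join field^boost pairs into a single dismax qf value."""
    return " ".join(f"{field}^{boost}" for field, boost in field_boosts.items())


def build_query_string(user_query, field_boosts):
    """URL-encode q, defType and qf into one query string."""
    params = {"q": user_query, "defType": "dismax", "qf": build_qf(field_boosts)}
    return urlencode(params)


qf = build_qf(boosts)
query_string = build_query_string("hospitals", boosts)
print(qf)
print(query_string)
```

The encoded string can be appended to a select URL; the `^` characters are percent-encoded and the spaces become `+`, which is what a servlet container expects.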

Re: How to reserve ids?

2011-09-26 Thread Otis Gospodnetic
Hi Gabriele, Either the latter option, or just treat them as stop words if you just want to remove those urls/ids from indexed docs (may still get highlighted). Otis Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch Lucene ecosystem search :: http://search-lucene.com/ >___

Re: Searching multiple fields

2011-09-26 Thread Otis Gospodnetic
Hi Mark, Eh, I don't have Lucene/Solr source code handy, but I *think* for that you'd need to write custom Lucene similarity. Otis Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch Lucene ecosystem search :: http://search-lucene.com/ > >From: Mark

Any plans to support function queries on score?

2011-09-26 Thread Way Cool
Hi, guys, Do you have any plans to support function queries on score field? for example, sort=floor(product(score, 100)+0.5) desc? So far I am getting the following error: undefined field score I can't use subquery in this case because I am trying to use secondary sorting, however I will be open
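To make the intent of that sort expression concrete, here is a small client-side sketch of what floor(product(score, 100)+0.5) computes: it rounds the score to two decimal places (as an integer bucket), so documents with nearly equal scores tie and a secondary sort can decide between them. This only illustrates the arithmetic in plain Python; it says nothing about whether Solr accepts such a sort. The `price` field and the sample documents are made up for the example.

```python
import math


def score_bucket(score):
    """Same arithmetic as floor(product(score, 100) + 0.5)."""
    return math.floor(score * 100 + 0.5)


# Hypothetical documents; "price" stands in for whatever the secondary sort is.
docs = [
    {"id": "a", "score": 1.234, "price": 30},
    {"id": "b", "score": 1.2312, "price": 10},
    {"id": "c", "score": 0.9, "price": 5},
]

# Bucket descending first, then the secondary key ascending.
ranked = sorted(docs, key=lambda d: (-score_bucket(d["score"]), d["price"]))
order = [d["id"] for d in ranked]
print(order)
```

Documents a and b land in the same bucket (123), so the secondary key puts b ahead of a, while c stays last because its bucket is lower.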

Re: external file field partial data match in key field

2011-09-26 Thread abhayd
i found answer to my question .. basically it works only with complete match.. -- View this message in context: http://lucene.472066.n3.nabble.com/external-file-field-partial-data-match-in-key-field-tp3368547p3371328.html Sent from the Solr - User mailing list archive at Nabble.com.

Re: How to apply filters to stored data

2011-09-26 Thread Jithin
Is UpdateProcessor triggered when updating an existing document or for new documents also? On Tue, Sep 27, 2011 at 6:00 AM, Chris Hostetter-3 [via Lucene] < ml-node+s472066n3371110...@n3.nabble.com> wrote: > > : Hi Erick, The problem I am trying to solve is to filter invalid entities. > > : User

Re: drastic performance decrease with 20 cores

2011-09-26 Thread Bictor Man
Hi guys, thanks for your replies. indeed the filesystem caching seems to be the difference. sadly I can't add more memory and the 6GB/20core combination doesn't work. so I'll just try to tweak it as much as I can. thanks a lot. 2011/9/26 François Schiettecatte > You have not said how big your

Re: How to apply filters to stored data

2011-09-26 Thread Chris Hostetter
: Hi Erick, The problem I am trying to solve is to filter invalid entities. : Users might misspell or enter a new entity name. These new/invalid entities : need to pass through a KeepWordFilter so that they won't pollute our : autocomplete result. how are you doing autocomplete? if you are using th

Searching multiple fields

2011-09-26 Thread Mark
I have a use case where I would like to search across two fields but I do not want to weight a document that has a match in both fields higher than a document that has a match in only 1 field. For example. Document 1 - Field A: "Foo Bar" - Field B: "Foo Baz" Document 2 - Field A: "Foo Blar

Re: Unique Key error on trunk

2011-09-26 Thread Chris Hostetter
: Subject: Re: Unique Key error on trunk : : : You can replicate it with the example app by replacing the id definition in schema.xml with : : > thanks for reporting this Viswa, I've filed a bug to track it... https://issues.apache.org/jira/browse/SOLR-2796 -Hoss

Re: how to implemente a query like " like '%pattern%' "

2011-09-26 Thread Chris Hostetter
: References: : : In-Reply-To: : : Subject: how to implemente a query like " like '%pattern%' " https://people.apache.org/~hossman/#threadhijack Thread Hijacking on Mailing Lists When starting a new discussion on a mailing list, please do not reply to an existing message, instead star

RE: SOLR Index Speed

2011-09-26 Thread Lan
Are you batching the documents before sending them to the solr server? Are you doing a commit only at the end? Also since you have 32 cores, you can try upping the number of concurrent updaters from 16 to 32. Jaeger, Jay - DOT wrote: > > 500 / second would be 1,800,000 per hour (much more than
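A rough sketch of the two suggestions above (send documents in batches, commit once at the end). The `RecordingClient` class is a hypothetical stand-in that only records calls; a real client would issue HTTP requests instead. The batch size of 1000 is an arbitrary assumption, not a recommendation from the thread.

```python
from itertools import islice


def batches(docs, size):
    """Yield successive lists of at most `size` documents."""
    it = iter(docs)
    while True:
        chunk = list(islice(it, size))
        if not chunk:
            return
        yield chunk


class RecordingClient:
    """Hypothetical stand-in for a Solr client; it only records calls."""

    def __init__(self):
        self.calls = []

    def add(self, docs):
        self.calls.append(("add", len(docs)))

    def commit(self):
        self.calls.append(("commit", 0))


def index_all(client, docs, batch_size=1000):
    for chunk in batches(docs, batch_size):
        client.add(chunk)  # one request per batch instead of one per document
    client.commit()        # a single commit after everything has been sent


client = RecordingClient()
index_all(client, ({"id": i} for i in range(2500)), batch_size=1000)
print(client.calls)
```

With 2500 documents this produces three add calls (1000, 1000, 500) followed by one commit.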

RE: A fieldType for a address street

2011-09-26 Thread Jaeger, Jay - DOT
We used copyField to copy the address to two fields: 1. Which contains just the first token up to the first whitespace 2. Which copies all of it, but translates to lower case. Then our users can enter either a street number, a street name, or both. We copied all of it to the second field bec
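As an illustration only, the effect of the two destination fields described above can be mimicked in a few lines. In Solr the work would be done by the analyzers on each field type; the function names and the sample address here are assumptions made up for this sketch.

```python
def street_number_field(address):
    """First token up to the first whitespace (field 1 in the message)."""
    parts = address.split(None, 1)
    return parts[0] if parts else ""


def street_text_field(address):
    """The whole address, translated to lower case (field 2 in the message)."""
    return address.lower()


address = "123 Main Street"
number_value = street_number_field(address)
text_value = street_text_field(address)
print(number_value)
print(text_value)
```

A user typing only a street number would match against the first value, while a street name (or number plus name) would match against the second.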

Re: how to implemente a query like " like '%pattern%' "

2011-09-26 Thread Tomás Fernández Löbbe
If you need those kinds of searches then you should probably not be using the KeywordTokenizerFactory, is there any reason why you can't switch to a WhitespaceTokenizer for example? then you could use a simple phrase query for your search case. if you need everything as a Token, you could use a cop
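To show why a whitespace tokenizer plus a phrase query covers the case asked about, here is a simplified sketch: the field is split on whitespace and a phrase matches when its tokens appear contiguously and in order. This models the idea only; it is not Lucene's implementation, and the function names are illustrative.

```python
def whitespace_tokens(text):
    """Split on runs of whitespace, keeping case as-is."""
    return text.split()


def phrase_matches(field_value, phrase):
    """True if the phrase tokens occur contiguously, in order, in the field."""
    tokens = whitespace_tokens(field_value)
    wanted = whitespace_tokens(phrase)
    if not wanted:
        return False
    n = len(wanted)
    return any(tokens[i:i + n] == wanted for i in range(len(tokens) - n + 1))


field = "This phrase has various words"
print(phrase_matches(field, "phrase has various"))
print(phrase_matches(field, "various words"))
print(phrase_matches(field, "rase has"))
```

Note the difference from SQL LIKE: matching happens on whole tokens, so a fragment such as "rase has" does not match even though it is a substring of the original text.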

Boost Exact matches on Specific Fields

2011-09-26 Thread balaji
Hi all, I am new to SOLR and have a question about boosting exact terms to the top on a particular field. For example: I have a text field named ts_category and I want to give more boost to this field than to other fields, so in my query I pass the following in the QF params "qf=body^4.0 ti

How to reserve ids?

2011-09-26 Thread Gabriele Kahlout
Hello, While indexing there are certain urls/ids I'd never want to appear in the search results (or be indexed at all). Is there already a 'supported by design' mechanism for that which you can point me to, or should I just create this blacklist as a processor in the update chain? -- Regards, K. Gabriele ---

RE: SOLR Index Speed

2011-09-26 Thread Jaeger, Jay - DOT
500 / second would be 1,800,000 per hour (much more than 500K documents). 1) how big is each document? 2) how big are your index files? 3) as others have recently written, make sure you don't give your JRE so much memory that your OS is starved for memory to use for file system cache. JRJ --
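The arithmetic behind that first sentence, spelled out as a quick check. The 500 docs/second rate and the 500K total both come from this thread; nothing else is assumed.

```python
DOCS_PER_SECOND = 500
TOTAL_DOCS = 500_000

docs_per_hour = DOCS_PER_SECOND * 3600           # seconds in an hour
seconds_needed = TOTAL_DOCS / DOCS_PER_SECOND    # time to index everything
minutes_needed = seconds_needed / 60

print(docs_per_hour)
print(seconds_needed)
print(round(minutes_needed, 1))
```

In other words, at a sustained 500 docs/second the whole 500K set would be indexed in well under an hour.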

Re: mlt content stream help

2011-09-26 Thread Chris Hostetter
Dan: The disconnect here seems to be that these examples urls on the MoreLikeThisHandler wiki page assume a "/mlt" request handler exists, but no handler by that name has ever actually existed in the solr example configs. (the wiki page doesn't explicitly state that those URLs will work with

Re: Unique Key error on trunk

2011-09-26 Thread Viswa S
You can replicate it with the example app by replacing the id definition in schema.xml with > Removing the id fields in the one of the example doc.xml and posting it to solr. Thanks Viswa On Sep 26, 2011, at 12:15 AM, Viswa S wrote: > Hello, > > We use solr.UUIDField to generate unique

aggregate functions in Solr?

2011-09-26 Thread Esteban Donato
Hello guys,   I need to implement a functionality which requires something similar to aggregate functions in SQL.  My Solr schema looks like this: -doc_id: integer -date: date -value1: integer -value2: integer   Basically the index contains some numerical values (value1, value2, etc) per doc and

Re: SOLR error with custom FacetComponent

2011-09-26 Thread Chris Hostetter
: : Unfortunately the facet fields are not static. The field are dynamic SOLR : fields and are generated by different applications. : The field names will be populated into a data store (like memcache) and : facets have to be driven from that data store. : : I need to write a Custom FacetComponen

how to implemente a query like " like '%pattern%' "

2011-09-26 Thread libnova
Hi all. How can we do a query similar to 'like'? If I have this phrase as a single token in the index: "This phrase has various words" (using KeywordTokenizerFactory) and I would like an exact match on: "phrase has various" or "various words", for instance... How can I do this? Thanks a lot.

Re: drastic performance decrease with 20 cores

2011-09-26 Thread François Schiettecatte
You have not said how big your index is but I suspect that allocating 13GB for your 20 cores is starving the OS of memory for caching file data. Have you tried 6GB with 20 cores? I suspect you will see the same performance as 6GB & 10 cores. Generally it is better to allocate just enough memory

Re: drastic performance decrease with 20 cores

2011-09-26 Thread Shawn Heisey
On 9/26/2011 9:33 AM, Bictor Man wrote: Hi everyone, Sorry if this issue has been discussed before, but I'm new to the list. I have a solr (3.4) instance running with 20 cores (around 4 million docs each). The instance has allocated 13GB in a 16GB RAM server. If I run several sets of queries se

drastic performance decrease with 20 cores

2011-09-26 Thread Bictor Man
Hi everyone, Sorry if this issue has been discussed before, but I'm new to the list. I have a solr (3.4) instance running with 20 cores (around 4 million docs each). The instance has allocated 13GB in a 16GB RAM server. If I run several sets of queries sequentially in each of the cores, the I/O a

solr DIH for mongodb

2011-09-26 Thread Kiwi de coder
Hi, is there any DIH plugin for mongodb? Regards, kiwi

Re: Solr stopword problem in Query

2011-09-26 Thread Rahul Warawdekar
Hi Isan, Does your search return any documents when you remove the 'at' keyword and just search for "Coke studio MTV" ? Also, can you please provide the snippet of schema.xml file where you have mentioned this field name and its "type" description ? On Mon, Sep 26, 2011 at 6:09 AM, Isan Fulia wro

Re: mlt content stream help

2011-09-26 Thread dan whelan
OK. This is exactly what I did. With a fresh download of solr 3.2, unpack and go to the example directory. Start solr: java -jar start.jar Then go to exampledocs and run: ./post.sh *xml Then go here: http://localhost:8983/solr/mlt?stream.body=electronics%20memory&mlt.fl=manu,cat&mlt.interestingTerm

Solr Cloud Number of Shard Limitation?

2011-09-26 Thread Jamie Johnson
Is there any limitation, be it technical or for sanity reasons, on the number of shards that can be part of a solr cloud implementation?

drastic performance decrease with 20 cores

2011-09-26 Thread Bictor Man
Hi everyone, Sorry if this issue has been discussed before, but I'm new to the list. I have a solr (3.4) instance running with 20 cores (around 4 million docs each). The instance has allocated 13GB in a 16GB RAM server. If I run several sets of queries sequentially in each of the cores, the I/O a

Re: mlt content stream help

2011-09-26 Thread Erick Erickson
Please don't say "it's just like the example". If it was, then it would most likely be working. If you don't take the time to show us what you've tried, and the results you get back, then there's not much we can do to help. Best Erick On Mon, Sep 26, 2011 at 7:18 AM, dan whelan wrote: > On 9/24

Re: Solr stopword problem in Query

2011-09-26 Thread Bill Bell
This is pretty serious issue Bill Bell Sent from mobile On Sep 26, 2011, at 4:09 AM, Isan Fulia wrote: > Hi all, > > I have a text field named* textForQuery* . > Following content has been indexed into solr in field textForQuery > *Coke Studio at MTV* > > when i fired the query as > *textFor

Re: Update ingest rate drops suddenly

2011-09-26 Thread eks dev
Just to bring closure on this one, we were slurping data from the wrong DB (hardly desktop class machine)... Solr did not cough on 41Mio records @34k updates / sec., single threaded. Great! On Sat, Sep 24, 2011 at 9:18 PM, eks dev wrote: > just looking for hints where to look for... > > We we

Re: mlt content stream help

2011-09-26 Thread dan whelan
On 9/24/11 12:17 PM, Erick Erickson wrote: What version of Solr? I am using solr 3.2 When you copied the default, did you set up default values for MLT? This is what I need help with. "How should the request handler / solrconfig be setup?" Showing us the request you used The request is

Re: email - DIH

2011-09-26 Thread jb
Hi Alonso, Gora, I ran into the same problem with the MailEntityProcessor. I have an email folder called "Test". Inside there are only two messages. When I run the DIH everything looks fine, except that the two emails don't get indexed. Is there any additional information on this problem? I'm

RE: Best Solr escaping?

2011-09-26 Thread Bob Sandiford
I won't guarantee this is the 'best algorithm', but here's what we use. (This is in a final class with only static helper methods): // Set of characters / Strings SOLR treats as having special meaning in a query, and the corresponding Escaped versions. // Note that the actual operators
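For readers who cannot see the rest of that snippet, here is an independent minimal sketch of query escaping. It is not the code from the message above. The character set is an assumption taken from the special characters listed in the Lucene query parser syntax documentation (+ - && || ! ( ) { } [ ] ^ " ~ * ? : \), with & and | escaped individually for simplicity.

```python
# Characters the Lucene query parser treats specially (assumed set, see above).
SPECIAL_CHARS = set('+-!(){}[]^"~*?:\\&|')


def escape_query(text):
    """Prefix every special character with a backslash."""
    out = []
    for ch in text:
        if ch in SPECIAL_CHARS:
            out.append("\\")
        out.append(ch)
    return "".join(out)


print(escape_query("(1+1):2"))
print(escape_query("plain text"))
```

Escaping character by character is blunt: it also neutralises operators a user may have typed on purpose, so whether to escape everything or only selected characters is a design decision for the application.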

Re: NRT and commit behavior

2011-09-26 Thread Vadim Kisselmann
Tirthankar, are you indexing 1. smaller docs or 2. books? If 1, your caches are too big for your memory, as Erick already said. Try to allocate 10GB for the JVM, leave 14GB for your HDD cache and make your caches smaller. If 2, read the blog posts on hathitrust.org. http://www.hathitrust.org/blogs/la

SOLR Index Speed

2011-09-26 Thread Lord Khan Han
Hi, We have 500K web documents and are using solr (trunk) to index them. We have a special analyzer which is a little bit heavy on CPU. Our machine config: 32 x cpu 32 gig ram SAS HD We are sending documents with 16 reduce clients (from hadoop) to the standalone solr server. The problem is we couldn't get speedi

Re: Seek your wisdom for implementing 12 million docs..

2011-09-26 Thread Toke Eskildsen
On Sun, 2011-09-25 at 22:00 +0200, Ikhsvaku S wrote: > Documents: We have close to ~12 million XML docs, of varying sizes average > size 20 KB. These documents have 150 fields, which should be searchable & > indexed. [...] Approximately ~6000 such documents are updated & 400-800 new > ones > are a

AW: How to map database table for facted search?

2011-09-26 Thread Chorherr Nikolaus
Thx for your response, we will try dynamic fields for this -Original Message- From: Erick Erickson [mailto:erickerick...@gmail.com] Sent: Saturday, 24 September 2011 21:33 To: solr-user@lucene.apache.org Subject: Re: How to map database table for facted search? In general, you

Solr stopword problem in Query

2011-09-26 Thread Isan Fulia
Hi all, I have a text field named *textForQuery*. The following content has been indexed into solr in field textForQuery: *Coke Studio at MTV* When I fired the query *textForQuery:("coke studio at mtv")* the results showed 0 documents. After running the same query in debugMode I got the following r

multiple dateranges/timeslots per doc: modeling openinghours.

2011-09-26 Thread britske
Sorry for the somewhat lengthy post. I would like to make clear that I have covered my bases here and am looking for an alternative solution, because the more trivial solutions don't seem to work for my use case. Consider bars, museums, etc. These places have multiple opening hours that can depend on: RE

error while replication

2011-09-26 Thread shinkanze
Hi, I am replicating solr and getting this error. I am unable to make out the cause, so please kindly help. 26 Sep, 2011 8:00:14 AM org.slf4j.impl.JDK14LoggerAdapter fillCallerData SEVERE: Error during auto-warming of key:org.apache.solr.search.QueryResultKey@150f0455:java.lang.NullPointerExcept

external file field partial data match in key field

2011-09-26 Thread abhayd
Hi, I have product inventory data in a solr index. I would like to boost or sort results by using some popularity. For instance, the SOLR index has a field named Title. Some docs have titles like: iphone 4 - white; iphone 3 - white; blackberry torch. I would like to boost docs where title contains word "iphone"

Unique Key error on trunk

2011-09-26 Thread Viswa S
Hello, We use solr.UUIDField to generate unique ids. Using the latest trunk (change list 1163767) seems to throw the error "Document is missing mandatory uniqueKey field: id". The schema is set up to generate an id field on updates. Thanks, Viswa SEVERE: org.apache.solr.common.SolrException: