query about the server configuration

2011-06-19 Thread Jonty Rhods
Dear all, I am quite new to Solr and have not used it under heavy request load. I have the following server configuration: 16GB RAM, 16 CPUs. I need to update the index every minute, with more than 5000 docs per day. The size of the data per day will be around 50 MB. I am expecting 10 to 30 concurrent hits on serv

Re: about the SolrServer server = new CommonsHttpSolrServer(URL);

2011-06-19 Thread Jonty Rhods
Will it work for heavy use (30 to 40 concurrent users)? How can I open and maintain more connections at a time, like a connection pool, so users can receive fast responses? regards On Fri, Jun 17, 2011 at 12:50 PM, Ahmet Arslan wrote: > > SolrServer server = new CommonsHttpSolrServer(URL); > > > > t

Re: about the SolrServer server = new CommonsHttpSolrServer(URL);

2011-06-19 Thread Ahmet Arslan
> for heavy use (30 to 40 concurrent users) will it work? > How to open and maintain more connections at a time, like a connection pool, so users can receive fast responses? It uses HttpClient under the hood. You can pass httpClient to its constructor too. It seems that MultiThreadedHttpConnectio

Weird optimize performance degradation

2011-06-19 Thread Santiago Bazerque
Hello! Here is a puzzling experiment: I build an index of about 1.2 million documents using Solr 3.1. The index has a large number of dynamic fields (about 15,000). Each document has about 100 fields. I add the documents in batches of 20, and every 50,000 documents I optimize the index. The first 10

Re: Is it true that I cannot delete stored content from the index?

2011-06-19 Thread François Schiettecatte
That is correct, but you only need to commit; optimize is not a requirement here. François On Jun 18, 2011, at 11:54 PM, Mohammad Shariq wrote: > I have defined a uniqueKey in my solr and am deleting the docs from solr using > this uniqueKey. > and then doing optimization once in a day. > is this right way to

"site:" feature in Solr?

2011-06-19 Thread Gabriele Kahlout
Hello, Besides creating an index with just the site in question, is it possible, as with Google, to search for results only in a given domain? -- Regards, K. Gabriele

Re: "site:" feature in Solr?

2011-06-19 Thread Ahmet Arslan
> Besides creating an index with just the site in question, is it possible, as with Google, to search for results only in a given domain? If you have an appropriate field that is indexed, yes: fq=site:foo.com http://wiki.apache.org/solr/CommonQueryParameters#fq
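A minimal sketch of how such a request could be put together, assuming the index really has an indexed field called site as in the reply above. The base URL and the helper name site_search_url are illustrative assumptions, not part of Solr.

```python
from urllib.parse import urlencode


def site_search_url(base_url, query, site):
    """Build a select URL whose results are restricted to one site via fq."""
    params = [
        ("q", query),             # main query, contributes to scoring
        ("fq", f"site:{site}"),   # filter query on the (assumed) site field
    ]
    return f"{base_url}/select?{urlencode(params)}"


if __name__ == "__main__":
    # Hypothetical local Solr instance.
    print(site_search_url("http://localhost:8983/solr", "camera", "foo.com"))
```

The colon is percent-encoded by urlencode, which Solr decodes back to site:foo.com on the server side.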

example doesn't run from source?

2011-06-19 Thread Jason Toy
I'm trying to run the example app from the svn source, but it doesn't seem to work. I am able to run : java -jar start.jar and Jetty starts with: INFO::Started SocketConnector@0.0.0.0:8983 But then when I go to my browser and go to this address: http://localhost:8983/solr/ I get a 404 error. What

Re: example doesn't run from source?

2011-06-19 Thread Stefan Matheis
Jason, which source did you use for the checkout and how did you build solr? Regards Stefan On 19.06.2011 15:00, Jason Toy wrote: I'm trying to run the example app from the svn source, but it doesn't seem to work. I am able to run : java -jar start.jar and Jetty starts with: INFO::Started So

Re: Multiple indexes

2011-06-19 Thread lee carroll
Your data is being used to build an inverted index rather than being stored as a set of records. De-normalising is fine in most cases. What is your use case that requires a normalised set of indices? 2011/6/18 François Schiettecatte: > You would need to run two independent searches and then 'jo

Re: Weird optimize performance degradation

2011-06-19 Thread Erick Erickson
First, there's absolutely no reason to optimize this often, if at all. Older versions of Lucene would search faster on an optimized index, but this is no longer necessary. Optimize will reclaim data from deleted documents, but is generally recommended to be performed fairly rarely, often at off-pea

Re: Is it true that I cannot delete stored content from the index?

2011-06-19 Thread Erick Erickson
That'll work, but you could just as easily add the document. Solr will automatically take care of deleting any other documents with the same uniqueKey as the document being added. Optimizing once a day is reasonable, but note that about all you're doing here is reclaiming some space. So if you only do

Re: example doesn't run from source?

2011-06-19 Thread Erick Erickson
Right, run "ant example" first to build the example code. You have to run it from the /solr directory. Best Erick On Sun, Jun 19, 2011 at 9:00 AM, Jason Toy wrote: > I'm trying to run the example app from the svn source, but it doesn't seem > to work. I am able to run : > java -jar start.jar > a

Re: Optimize taking two steps and extra disk space

2011-06-19 Thread Michael McCandless
With LogXMergePolicy (the default before 3.2), optimize respects mergeFactor, so it's doing 2 steps because you have 37 segments but 35 mergeFactor. With TieredMergePolicy (default on 3.2 and after), there is now a separate merge factor used for optimize (maxMergeAtOnceExplicit)... so you could eg
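The "37 segments but 35 mergeFactor" arithmetic can be checked with a small sketch. This is a deliberately simplified model (each pass merges at most mergeFactor segments into one), not Lucene's actual merge selection code; the function name is made up for illustration.

```python
def optimize_passes(segments, merge_factor):
    """Count merge passes needed to reach one segment in a simplified model.

    Assumption: every pass merges up to merge_factor segments into a
    single new segment, which is only a rough picture of LogMergePolicy.
    """
    if merge_factor < 2:
        raise ValueError("merge_factor must be at least 2")
    passes = 0
    while segments > 1:
        merged = min(merge_factor, segments)
        segments = segments - merged + 1  # merged segments become one
        passes += 1
    return passes


if __name__ == "__main__":
    print(optimize_passes(37, 35))  # the case discussed in the thread
    print(optimize_passes(35, 35))  # fits in a single pass
```

Under this model, 37 segments with a mergeFactor of 35 need two passes (35 merged into one, then the remaining three), while anything at or below 35 segments needs only one.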

Re: Weird optimize performance degradation

2011-06-19 Thread Santiago Bazerque
Hello Erick, thanks for your answer! Yes, our over-optimization is mainly due to paranoia over these strange commit times. The long optimize time persisted in all the subsequent commits, and this is consistent with what we are seeing in other production indexes that have the same problem. Once the

Solr Multithreading

2011-06-19 Thread Rahul Warawdekar
Hi, I am currently working on a search based project which involves indexing data from a SQL Server database including attachments using DIH. For indexing attachments (varbinary DB objects), I am using TikaEntityProcessor. I am trying to use the multithreading to speed up the indexing but it seem

fq vs adding to query

2011-06-19 Thread Jamie Johnson
Are there any hard and fast rules about when to use fq vs adding to the query? For instance if I started with a search of camera then wanted to add another keyword say digital, is it better to do q=camera AND digital or q=camera&fq=digital I know that fq isn't taken into account when doing h

Re: fq vs adding to query

2011-06-19 Thread Mohammad Shariq
fq is a filter query, for searching based on category, timestamp, language, etc., but I don't see any performance improvement if you use a 'keyword' in fq. Use cases: fq=lang:English&q=camera AND digital OR fq=time:[13023567 TO 13023900]&q=camera AND digital On 19 June 2011 20:17, Jamie Johnson wrote: > Are the

Building Solr 3.2 from sources - can't get war

2011-06-19 Thread Yuriy Akopov
Hi, This is my first post here so excuse me please if it is not really related. At the moment I'm using Solr 1.4.1 with SOLR-236 (https://issues.apache.org/jira/browse/SOLR-236) patch applied to support field collapsing. One of the mandatory fields of documents indexed is generated from the

Re: Optimize taking two steps and extra disk space

2011-06-19 Thread Shawn Heisey
On 6/19/2011 7:32 AM, Michael McCandless wrote: With LogXMergePolicy (the default before 3.2), optimize respects mergeFactor, so it's doing 2 steps because you have 37 segments but 35 mergeFactor. With TieredMergePolicy (default on 3.2 and after), there is now a separate merge factor used for op

Re: Weird optimize performance degradation

2011-06-19 Thread Mohammad Shariq
I also have a Solr index with around 100mn docs. I optimize once a week, and it takes around 1 hour 30 mins to optimize. On 19 June 2011 20:02, Santiago Bazerque wrote: > Hello Erick, thanks for your answer! > > Yes, our over-optimization is mainly due to paranoia over these strange > commit

Re: fq vs adding to query

2011-06-19 Thread Markus Jelsma
If you want to make good use of the filter cache then use filter queries. > fq is filter-query, search based on category, timestamp, language etc. but > I dont see any performance improvement if use 'keyword' in fq. > > useCases : > fq=lang:English&q=camera AND digital > OR > fq=time:[13023567 T

Re: Building Solr 3.2 from sources - can't get war

2011-06-19 Thread Shawn Heisey
On 6/19/2011 9:32 AM, Yuriy Akopov wrote: For 3.2, I can't see a similar build option. First, there is no release-3.2 folder, so I tried to checkout http://svn.apache.org/repos/asf/lucene/dev/trunk supposing this is the latest stable release (and I might be wrong there). However, there is no "

Re: fq vs adding to query

2011-06-19 Thread Shawn Heisey
On 6/19/2011 10:00 AM, Markus Jelsma wrote: If you want to make good use of the filter cache then use filter queries. Additionally, information in filter queries will not affect relevancy ranking. If you want the terms you are using to affect the document scores, include them in the main qu
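A toy illustration of that difference, not Solr code: the score here is just a count of matching query terms, standing in for Lucene's real scoring, and any matching term is enough for a hit. The documents and field names are invented for the example.

```python
def search(docs, query_terms, filters=None):
    """Score docs by query-term occurrences; filters only restrict the set."""
    filters = filters or {}
    hits = []
    for doc in docs:
        # A filter either keeps or drops a document; it never changes a score.
        if any(doc.get(field) != value for field, value in filters.items()):
            continue
        words = doc["text"].lower().split()
        score = sum(words.count(term) for term in query_terms)
        if score > 0:
            hits.append((doc["id"], score))
    # Highest score first, ties broken by id.
    return sorted(hits, key=lambda hit: (-hit[1], hit[0]))


DOCS = [
    {"id": 1, "lang": "English", "text": "digital camera with digital zoom"},
    {"id": 2, "lang": "English", "text": "film camera"},
    {"id": 3, "lang": "German", "text": "digital camera"},
]

if __name__ == "__main__":
    print(search(DOCS, ["camera", "digital"]))
    print(search(DOCS, ["camera", "digital"], {"lang": "English"}))
```

Adding the lang filter removes document 3 but leaves the scores of the remaining documents untouched, whereas adding "digital" to the query terms changes the scores themselves.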

Re: query about the server configuration

2011-06-19 Thread Ranveer
Please help, I am also in the same situation. regards On Sunday 19 June 2011 12:59 PM, Jonty Rhods wrote: Dear all, I am quite new to Solr and have not used it under heavy request load. I have the following server configuration: 16GB RAM, 16 CPUs. I need to update the index every minute, with more than 5000 d

Re: about the SolrServer server = new CommonsHttpSolrServer(URL);

2011-06-19 Thread Ranveer
Thanks. However, a few more queries: How do I maintain connection threads (max and min settings)? What would be the ideal max setting in the setMaxConnectionsPerHost method? Will it be OK for 30 to 40 concurrent users? How will threads be maintained for the MultiThreadedHttpConnectionManager class? On Sunday

Re: Building Solr 3.2 from sources - can't get war

2011-06-19 Thread Yuriy Akopov
In the checked out lucene (either trunk or one of the 3.x branches) source is a solr/ directory. You just cd into that directory, and dist-war becomes a build option. Thanks, Shawn! That worked and by invoking dist-war build I have received apache-solr-4.0-SNAPSHOT.war file successfully - but

RE: Building Solr 3.2 from sources - can't get war

2011-06-19 Thread Steven A Rowe
https://svn.apache.org/repos/asf/lucene/dev/tags/lucene_solr_3_2/ > -Original Message- > From: Yuriy Akopov [mailto:ako...@hotmail.co.uk] > Sent: Sunday, June 19, 2011 4:38 PM > To: solr-user@lucene.apache.org > Subject: Re: Building Solr 3.2 from sources - can't get war > > > In the chec

Re: Solr and Tag Cloud

2011-06-19 Thread Alexey Serba
Suppose you have a multivalued field _tag_ related to every document in your corpus. Then you can build a tag cloud relevant to the whole data set or to a specific query by retrieving facets for field _tag_ for "*:*" or any other query. You'll get a list of popular _tag_ values relevant to this query with occur
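A rough sketch of the faceting approach described above, assuming the field is literally named tag and that the JSON response uses Solr's default flat facet list (value, count, value, count, ...). The helper names and the 1 to 5 size scale are illustrative choices.

```python
from urllib.parse import urlencode


def tag_facet_params(query="*:*", field="tag", limit=50):
    """Query string asking only for facet counts on the tag field."""
    return urlencode([
        ("q", query),
        ("rows", 0),             # no documents needed, only the facets
        ("facet", "true"),
        ("facet.field", field),
        ("facet.limit", limit),
        ("wt", "json"),
    ])


def tag_weights(flat_counts, min_size=1, max_size=5):
    """Map a flat [value, count, ...] facet list to {tag: display size}."""
    pairs = list(zip(flat_counts[0::2], flat_counts[1::2]))
    if not pairs:
        return {}
    low = min(count for _, count in pairs)
    high = max(count for _, count in pairs)
    span = (high - low) or 1  # avoid dividing by zero when counts are equal
    return {
        tag: min_size + round((count - low) * (max_size - min_size) / span)
        for tag, count in pairs
    }


if __name__ == "__main__":
    print(tag_facet_params())
    print(tag_weights(["solr", 10, "lucene", 6, "jetty", 2]))
```

Passing a different q (or an fq) to tag_facet_params would give the tag cloud for that specific query instead of the whole data set.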

paging and maintaining a cursor just like ScrollableResultSet

2011-06-19 Thread Hiller, Dean x66079
As you probably know, using Query in hibernate/JPA gets slower and slower each page since it starts all over on the index tree :( WHILE ScrollableResultSet does NOT because the database maintains a cursor into the index that just picks up where it left off so as you go to the next page, next pag

Re: Why are not query keywords treated as a set?

2011-06-19 Thread lee carroll
do you mean a phrase query? "past past" can you give some more detail? On 18 June 2011 13:02, Gabriele Kahlout wrote: > q=past past > > 1.0 = (MATCH) sum of: > *  0.5 = (MATCH) fieldWeight(content:past in 0), product of:* >   1.0 = tf(termFreq(content:past)=1) >   1.0 = idf(docFreq=1, maxDocs=2)

Re: paging and maintaining a cursor just like ScrollableResultSet

2011-06-19 Thread Michael Sokolov
One technique I've used to page through huge result sets that could help: if you have a sortable key (like an id), you can just fetch all docs, sorted by the key, and then on subsequent page requests use the last value from the previous page as a filter in a range term like: id:[<last-id> TO *] where
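A minimal sketch of that paging technique, assuming a sortable unique field named id. Because the square-bracket range is inclusive, the last document of the previous page comes back again, so this sketch asks for one extra row and drops the repeat. The helper names are hypothetical, and escaping of special characters in the id is left out.

```python
def next_page_params(last_id=None, rows=100):
    """Request parameters for the first page or for the page after last_id."""
    params = {"q": "*:*", "sort": "id asc", "rows": rows}
    if last_id is not None:
        # Inclusive range: the doc with last_id is returned again,
        # so fetch one extra row to still end up with a full page.
        params["fq"] = f"id:[{last_id} TO *]"
        params["rows"] = rows + 1
    return params


def drop_seen(docs, last_id):
    """Remove the document that was already shown on the previous page."""
    return [doc for doc in docs if doc["id"] != last_id]


if __name__ == "__main__":
    print(next_page_params())
    print(next_page_params("doc42", rows=10))
    print(drop_seen([{"id": "doc42"}, {"id": "doc43"}], "doc42"))
```

Each request starts from a known key rather than from an offset, so the cost of a page should not grow with how deep into the result set it is.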

Re: solr highlighting feature

2011-06-19 Thread Jan Høydahl
Hi, First, you should consider the SolrJ API if you're working from Java/JSP. Then, say you want to highlight title. In your loop across the N hits, instead of pulling the title from the hits themselves, check whether you find a highlighted result with the same ID in the highlighting section. -- Jan Høydahl, search

why too many open files?

2011-06-19 Thread Jason, Kim
Hi all, I have 12 shards with ramBufferSizeMB=512 and mergeFactor=5, but Solr raises java.io.FileNotFoundException (Too many open files). mergeFactor is just 5. How can this happen? Below are the segments of one shard. There are far more segments than mergeFactor. What's wrong, and how should I set the mergeFa

Re: query about the server configuration

2011-06-19 Thread Jonty Rhods
I forgot an important point: I need to commit to the server every 2 to 5 minutes. Please help. regards On Sun, Jun 19, 2011 at 11:29 PM, Ranveer wrote: > Please help I am also in same situation. > > regards > > > > On Sunday 19 June 2011 12:59 PM, Jonty Rhods wrote: > >> Dear all, >> >> I am

score of Infinity on dismax query

2011-06-19 Thread Chris Book
Hello, I have a solr search server running and in at least one very rare case, I'm seeing a strange scoring result. The following example will cause solr to return a score of "Infinity": Query: {!dismax tie=0.1 qf=lyrics pf=lyrics ps=5}drugs the drugs Here is the debug output: Infinity = (MATCH)

Re: Why are not query keywords treated as a set?

2011-06-19 Thread Gabriele Kahlout
past past *past past* *content:past content:past* I was expecting the query to get parsed into content:past only and not content:past content:past. On Mon, Jun 20, 2011 at 12:12 AM, lee carroll wrote: > do you mean a phrase query? "past past" > can you give some more detail? > > On 18 June 2011
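If the goal is simply to have repeated keywords count once, one option is to de-duplicate them on the client before sending the query. This is a sketch of that workaround only; it is not what Solr's query parser does, it is case-sensitive, and it knows nothing about quoted phrases or operators.

```python
def unique_terms(query):
    """Drop repeated whitespace-separated keywords, keeping first-seen order."""
    # dict.fromkeys preserves insertion order, so the first occurrence wins.
    return " ".join(dict.fromkeys(query.split()))


if __name__ == "__main__":
    print(unique_terms("past past"))
    print(unique_terms("digital camera digital zoom"))
```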

Re: score of Infinity on dismax query

2011-06-19 Thread Robert Muir
This is a bug, thanks for including all the information necessary to reproduce! https://issues.apache.org/jira/browse/LUCENE-3215 On Sun, Jun 19, 2011 at 10:24 PM, Chris Book wrote: > Hello, I have a solr search server running and in at least one very rare > case, I'm seeing a strange scoring re

Re: solr highlighting feature

2011-06-19 Thread Romi
Yes, I find title in the highlighting section. If I am getting results by, say, parsing the JSON object, do I need to parse that section as well? - Thanks & Regards Romi -- View this message in context: http://lucene.472066.n3.nabble.com/solr-highliting-feature-tp3079239p3084890.html Sent from the Solr - User mailing list archive

Re: solr highlighting feature

2011-06-19 Thread Jan Høydahl
Perhaps I don't understand your question right, but if you're working with the json response format, yes, you need to pull the highlighted version of the field from the highlighting section. -- Jan Høydahl, search solution architect Cominvent AS - www.cominvent.com Solr Training - www.solrtraini
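A small sketch of what pulling the highlighted field from the highlighting section can look like on an already parsed JSON response. It assumes the usual layout in which highlighting is keyed by the document's unique key and each field maps to a list of snippets; the sample response and the function name are made up for illustration.

```python
def titles_with_highlights(response, field="title", id_field="id"):
    """Return one title per hit, preferring the highlighted snippet if present."""
    highlighting = response.get("highlighting", {})
    titles = []
    for doc in response["response"]["docs"]:
        # Look up this hit's snippets by its unique key.
        snippets = highlighting.get(str(doc[id_field]), {}).get(field)
        titles.append(snippets[0] if snippets else doc.get(field))
    return titles


SAMPLE = {
    "response": {"docs": [
        {"id": "1", "title": "Solr highlighting basics"},
        {"id": "2", "title": "Jetty setup"},
    ]},
    "highlighting": {
        "1": {"title": ["Solr <em>highlighting</em> basics"]},
        "2": {},
    },
}

if __name__ == "__main__":
    print(titles_with_highlights(SAMPLE))
```

Hits without a snippet for the field fall back to the stored title, which covers documents that matched on a different field.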

Request handle solrconfig.xml Spellchecker

2011-06-19 Thread Romi
I am trying to set up spellchecker, according to solr documentation. But when I am testing, I don't have any suggestion. My piece of code follows: textSpell solr.IndexBasedSpellChecker default name ./spellchecker explicit d

Re: why too many open files?

2011-06-19 Thread Mark Schoy
Hi, have you checked the maximum number of open files allowed by your OS? See: http://lj4newbies.blogspot.com/2007/04/too-many-open-files.html 2011/6/20 Jason, Kim > Hi, All > > I have 12 shards and ramBufferSizeMB=512, mergeFactor=5. > But solr raise java.io.FileNotFoundException (Too many open files). > m
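A quick way to look at that limit from a script, plus a back-of-the-envelope estimate of how many files an index might hold open. The resource module is Unix-only, and the segment and file counts below are purely illustrative assumptions, not figures from the thread.

```python
import resource


def open_file_limits():
    """Return the (soft, hard) limit on open file descriptors for this process."""
    return resource.getrlimit(resource.RLIMIT_NOFILE)


def estimated_open_files(segments_per_shard, files_per_segment, shards):
    """Rough upper bound: every file of every segment of every shard is open."""
    return segments_per_shard * files_per_segment * shards


if __name__ == "__main__":
    soft, hard = open_file_limits()
    print(f"soft limit: {soft}, hard limit: {hard}")
    # Hypothetical numbers: 40 segments per shard, 10 files each, 12 shards.
    print(estimated_open_files(40, 10, 12))
```

If the estimate comes close to the soft limit, either raising the limit or reducing the number of segments and files per segment are the usual directions to look in.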

Re: Why are not query keywords treated as a set?

2011-06-19 Thread lee carroll
this might help in your analysis chain http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#solr.RemoveDuplicatesTokenFilterFactory On 20 June 2011 04:21, Gabriele Kahlout wrote: > past past > *past past* > *content:past content:past* > > I was expecting the query to get parsed into con