RE: Using Solr with Hadoop ....

2008-11-28 Thread souravm
Yonik, I already tried with around 200M docs on a desktop-type box with 2 GB of memory. The simple queries (like getting data for a date range, queries without wildcards, etc.) work fine, with response times of 10-20 secs, provided the number of records hit is low (within couple of 1

Re: Using Solr with Hadoop ....

2008-11-28 Thread Yonik Seeley
Ah sorry, I had misread your original post. 3-6M docs per hour can be challenging. Using the CSV loader, I've indexed 4000 docs per second (14M per hour) on a 2.6GHz Athlon, but they were relatively simple and small docs. On Fri, Nov 28, 2008 at 9:54 PM, souravm <[EMAIL PROTECTED]> wrote: > There
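The rates quoted in this thread are easy to sanity-check. The following is a minimal sketch of the arithmetic only; the function names are made up for illustration, and it assumes a perfectly sustained rate, which real indexing runs rarely achieve.

```python
def docs_per_hour(docs_per_second: float) -> float:
    """Convert a sustained indexing rate from docs/second to docs/hour."""
    return docs_per_second * 3600


def required_docs_per_second(hourly_volume: float) -> float:
    """Sustained rate needed to index a given hourly volume within the hour."""
    return hourly_volume / 3600


# 4000 docs/second is 14.4M docs/hour (quoted as roughly 14M in the message).
measured = docs_per_hour(4000)

# The 3-6M docs/hour target needs roughly 833 to 1667 docs/second.
low = required_docs_per_second(3_000_000)
high = required_docs_per_second(6_000_000)
```

On these numbers alone, the measured single-machine rate is above the stated target, though the message itself notes that the measured documents were simple and small.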

RE: Using Solr with Hadoop ....

2008-11-28 Thread souravm
Hi Yonik, There is a case where I'm expecting at peak season around 36M doc per day, at hourly level peaking to 2-3M per hr. Now I need to do some processing of those docs before I index them. Then based on the performance figure of indexing I saw in http://wiki.apache.org/solr/SolrPerformanceF

Re: Function Queries

2008-11-28 Thread Yonik Seeley
On Fri, Nov 28, 2008 at 8:33 PM, outre <[EMAIL PROTECTED]> wrote: > > Hi, > > I was wondering if function queries are supported in SOLR1.3? > > I looked thru http://wiki.apache.org/solr/FunctionQuery, and tried to run an > example on my SOLR setup. It doesn't seem though that _val_ hook has any > e

Re: range queries on string field with millions of values

2008-11-28 Thread Yonik Seeley
On Wed, Nov 26, 2008 at 5:43 PM, Naomi Dushay <[EMAIL PROTECTED]> wrote: > sortCallNum["A123 B34 1970" TO *]&rows=10. If you really just want to get call numbers X through X+10, then you are in luck: https://issues.apache.org/jira/browse/SOLR-877 http://wiki.apache.org/solr/TermsComponent But loo
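As a rough illustration of the TermsComponent suggestion, a request for the next ten call numbers starting at a given value could be built as below. This is a sketch under assumptions: the handler path `/solr/terms` and the host are hypothetical and depend on solrconfig.xml, and the parameter names follow the TermsComponent wiki page as later released, which may differ from the SOLR-877 patch at the time of this message.

```python
from urllib.parse import urlencode


def terms_url(base_url: str, field: str, lower: str, limit: int = 10) -> str:
    """Build a TermsComponent request that lists terms of `field` from `lower` onward."""
    params = {
        "terms": "true",             # enable the component
        "terms.fl": field,           # field whose indexed terms are listed
        "terms.lower": lower,        # start listing at this term
        "terms.lower.incl": "true",  # include the starting term itself
        "terms.limit": str(limit),   # number of terms to return
    }
    return base_url + "?" + urlencode(params)


# Hypothetical host and handler path; adjust to the local configuration.
url = terms_url("http://localhost:8983/solr/terms", "sortCallNum", "A123 B34 1970")
```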

Re: range queries on string field with millions of values

2008-11-28 Thread Walter Underwood
Do you want to page through all items or through the result of a query (like all hits for "civil war" in call number order). If you want the former, then a text search engine is really the wrong tool. This problem only requires indexed sequential file formats (like B-trees), something that worked
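To illustrate the point about indexed sequential access, here is a minimal sketch that pages through an ordered key set from an arbitrary starting value. It uses a sorted in-memory list with binary search as a stand-in for a B-tree, and the call numbers are invented sample data, compared as plain strings rather than by real call-number rules.

```python
import bisect


def page_from(sorted_keys, start, rows=10):
    """Return up to `rows` keys that are >= `start`, in sorted order."""
    i = bisect.bisect_left(sorted_keys, start)  # O(log n) seek to the start position
    return sorted_keys[i:i + rows]              # sequential read from there


call_numbers = sorted([
    "A123 B34 1970",
    "A123 B35 1965",
    "A200 C1 1980",
    "B10 D4 1999",
])

page = page_from(call_numbers, "A123 B35", rows=2)
```

The seek cost does not depend on where in the key space the page starts, which is the property the range-query approach in this thread lacks.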

Re: Using Solr with Hadoop ....

2008-11-28 Thread Yonik Seeley
The indexing rate you need to achieve should be equal to the rate that new documents are produced. It shouldn't matter much how long it takes to index 3-6M documents the first time (within reason), given that you only need to do it once/occasionally. What is that rate (i.e. why do you think you c

Function Queries

2008-11-28 Thread outre
Hi, I was wondering if function queries are supported in SOLR1.3? I looked thru http://wiki.apache.org/solr/FunctionQuery, and tried to run an example on my SOLR setup. It doesn't seem though that _val_ hook has any effect on sorting, and "score" parameter doesn't seem to return computed values.
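For reference, the `_val_` hook described on the FunctionQuery wiki page is embedded in the `q` parameter, and the computed score is only returned if `score` is requested in `fl`. The sketch below builds such a request; the host, the `popularity` field, and the choice of `ord()` are assumptions for illustration, not details from the original post.

```python
from urllib.parse import urlencode


def function_query_url(select_url: str, user_query: str, function: str) -> str:
    """Build a select URL that adds a function query to the score via _val_."""
    q = f'{user_query} _val_:"{function}"'  # function value is added into the score
    params = {
        "q": q,
        "fl": "*,score",  # ask Solr to return the computed score with each doc
    }
    return select_url + "?" + urlencode(params)


# Hypothetical host and field name.
url = function_query_url("http://localhost:8983/solr/select", "ipod", "ord(popularity)")
```

Since the function value contributes to the relevance score, it influences ordering only when results are sorted by score, which is the default when no `sort` parameter is given.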

Re: range queries on string field with millions of values

2008-11-28 Thread Naomi Dushay
Gosh, I'm sorry to be so unclear. Hmm. Trying to clarify below: On Nov 28, 2008, at 3:52 PM, Chris Hostetter wrote: Having read through this thread, i'm not sure i understand what exactly the problem is. my naive understanding is... 1) you want to sort by a field 2) you want to be able t

RE: Using Solr with Hadoop ....

2008-11-28 Thread souravm
Hi Yonik, Let me explain why I thought using hadoop will help in achieving the parallel indexing better. Here are the set of requirements and constraints - 1. The 3-6M documents (around 300 to 600 MB size) would belong to the same schema 2. The resulting index of those 3-6M documents has to re

Re: range queries on string field with millions of values

2008-11-28 Thread Chris Hostetter
Having read through this thread, i'm not sure i understand what exactly the problem is. my naive understanding is... 1) you want to sort by a field 2) you want to be able to "paginate" through all docs in order of this field. 3) you want to be able to start your pagination at any arbitrary val

Re: facet.sort and distributed search

2008-11-28 Thread Chris Hostetter
: Thanks a lot for your answer : I've tried the patch and it works fine. please add your comments/opinions to the Jira issue -- knowing people have tested out patches helps vet them for being committed. -- Grégoire Neuville -Hoss

Re: omiting no price documents when sorting on price

2008-11-28 Thread Chris Hostetter
: when a product doesn't have a price, I index the price as 0. When sorting on : price, these values come up first or last. How can you omit these items when : sorting against price. FWIW: i'm not sure if by "omit these items" you really mean "don't include them in the results at all" .. in whi
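If "omit" means leaving unpriced products out of the results entirely, one possible approach is a filter query that excludes the zero price. This is a sketch under assumptions: it presumes `price` is indexed with a sortable numeric field type so that range queries behave numerically, and it is not necessarily what the reply above goes on to recommend.

```python
from urllib.parse import urlencode


def priced_only_params(user_query: str) -> dict:
    """Request parameters that drop docs indexed with price 0 and sort by price."""
    return {
        "q": user_query,
        "fq": "price:{0 TO *}",  # exclusive lower bound: price strictly greater than 0
        "sort": "price asc",
    }


query_string = urlencode(priced_only_params("camera"))
```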

Re: Problem generating summaries for redirected url´s

2008-11-28 Thread Chris Hostetter
I think you are more likely to get more feedback from the [EMAIL PROTECTED] list ... i don't think anyone here can tell you how Nutch generates summaries. : I would like to know what is the way Nutch generates summaries, why it : leaves them empty when redirecting. Perhaps there is a command t

Re: Compiling Solr 1.3.0 + KStem

2008-11-28 Thread Chris Hostetter
: /usr/local/build/apache-solr-1.3.0/src/java/org/apache/solr/analysis/ : KStemFilterFactory.java:63: : cannot find symbol : [javac] symbol : method : init(org.apache : .solr.core.SolrConfig,java.util.Map) : [javac] location: class org.apache.solr.analysis.BaseTokenFilterFactory : [ja

Re: Sorting and JVM heap size ....

2008-11-28 Thread Chris Hostetter
: Subject: Sorting and JVM heap size : In-Reply-To: <[EMAIL PROTECTED]> http://people.apache.org/~hossman/#threadhijack Thread Hijacking on Mailing Lists When starting a new discussion on a mailing list, please do not reply to an existing message, instead start a fresh email. Even if you

Re: Using Solr with Hadoop ....

2008-11-28 Thread Yonik Seeley
While future Solr-hadoop integration is a definite possibility (and will enable other cool stuff), it doesn't necessarily seem needed for the problem you are trying to solve. > indexing them in parallel is not an option as my target doc size per hr > itself can be very huge (3-6M) I'm not sure I

[RESULTS] Community Logo Preferences

2008-11-28 Thread Ryan McKinley
Check the results from the poll: http://people.apache.org/~ryan/solr-logo-results.html The obvious winner is: https://issues.apache.org/jira/secure/attachment/12394282/solr2_maho_impression.png But since things are never simple, given the similarity of this logo to the Solaris logo: http://to

Re: Stuck threads on Weblogic

2008-11-28 Thread Alexander Ramos Jardim
Thanks for answering. I isolated the problem and discovered it's not Solr's fault. It has something to do with the way I am manipulating the data, as the thread spends more than 10 minutes executing the same loop in some situations. 2008/11/28 Bill Au <[EMAIL PROTECTED]> > Take a threa

Re: range queries on string field with millions of values

2008-11-28 Thread Glen Newton
Hi Naomi, Try fixing your data. :-) No, really: 1 - Sort all of your call numbers using whatever sort makes sense to you. 2 - Assign them - in your sort order - sort keys that are floats, starting: 0.01 0.02 ... 1.01 1.02 ... 79,999.98 79,999.99 This should ap
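The two steps described above can be sketched as a preprocessing pass run before indexing. This is only an illustration: the function name and the sample call numbers are invented, the default ordering is plain string order rather than a real call-number sort, and the 0.01 step simply mirrors the example keys in the message.

```python
def assign_sort_keys(call_numbers, sort_key=None, step=0.01):
    """Map each distinct call number to a float key that follows the chosen sort order."""
    ordered = sorted(set(call_numbers), key=sort_key)  # step 1: sort however makes sense
    # step 2: hand out evenly spaced float keys in that order
    return {cn: round((i + 1) * step, 2) for i, cn in enumerate(ordered)}


keys = assign_sort_keys(["B10 D4 1999", "A123 B34 1970", "A200 C1 1980"])
```

Each document would then carry its float key in a separate field, so sorting and range queries run against numbers instead of long strings.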

Re: range queries on string field with millions of values

2008-11-28 Thread Naomi Dushay
The point isn't really how the exact sort works - it's the performance issues, coupled with an unpredictable distribution along the entire possible sort space. The sort works, the range queries work, the performance sucks, and I haven't thought of a clever workaround. - Naomi On Nov 27, 2008

Using Solr with Hadoop ....

2008-11-28 Thread souravm
Hi All, I have a huge number of documents to index (say per hr) and I cannot complete it within an hr using a single machine. Having them distributed across multiple boxes and indexing them in parallel is not an option as my target doc size per hr itself can be very huge (3-6M). So I am considering usi

Re: Stuck threads on Weblogic

2008-11-28 Thread Bill Au
Take a thread dump of the JVM next time it is stuck. That will tell you where and why the threads are stuck. Bill On Tue, Nov 25, 2008 at 4:15 PM, Alexander Ramos Jardim < [EMAIL PROTECTED]> wrote: > Hello guys, > > I am getting some stuck threads on my application when it connects to Solr. > Th

Re: Phrase query search with stopwords

2008-11-28 Thread Yonik Seeley
See https://issues.apache.org/jira/browse/SOLR-879 we never enabled position increments in the query parser. -Yonik On Mon, Nov 24, 2008 at 9:48 PM, Yonik Seeley <[EMAIL PROTECTED]> wrote: > Ack! I tried it too, and it failed for me also. > The analysis page indicates that the tokens are all in

Re: PatternReplaceFilterFactory and html tag

2008-11-28 Thread Antonio Zippo
I think my problem has been solved using (for whitespace and HTML tags) and (for all non-alphanumeric chars). Is that right? From: Antonio Zippo <[EMAIL PROTECTED]> To: solr-user@lucene.apache.org Sent: Friday, 28 November 2008, 17:

PatternReplaceFilterFactory and html tag

2008-11-28 Thread Antonio Zippo
Hi all, I have a text field with some HTML code, e.g. "blablabla hi this is a paragraph bbb". I need to exclude these tags from the index or query, so I think I need to use a PatternReplaceFilterFactory. This filter is to exclude all chars other than a-zA-Z0-9 (so I can exclude punctuation,
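The two replacements being discussed, removing tags and then removing everything outside a-zA-Z0-9, can be tried out in isolation before being put into an analyzer. The sketch below uses Python's `re` module purely to show the patterns; Solr applies Java regular expressions through its own configuration, the patterns are an assumption about what the poster wants, and the `<p>` tags in the sample are added because the tags in the original example were lost in the archive.

```python
import re

TAG = re.compile(r"<[^>]+>")             # anything that looks like an HTML tag
NON_ALNUM = re.compile(r"[^a-zA-Z0-9]+")  # runs of chars other than a-zA-Z0-9


def normalize(text: str) -> str:
    """Replace tags and non-alphanumeric runs with single spaces."""
    no_tags = TAG.sub(" ", text)
    return NON_ALNUM.sub(" ", no_tags).strip()


cleaned = normalize("blablabla <p>hi, this is a paragraph</p> bbb")
```

Order matters here: stripping non-alphanumerics first would remove the angle brackets and leave the tag names behind as ordinary words.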

Oracle Clob support

2008-11-28 Thread Joel Karlsson
Hello, Is there any built in support for indexing Oracle-databases with columns of type Clob? I've declared a field in schema.xml of type text. But after indexing, this field contains only the string representation of the object oracle.sql.Clob, and not the content of the column. // Joel

Re: Mock solr server

2008-11-28 Thread Robert Young
Will look into it, thanks. On Fri, Nov 28, 2008 at 9:01 AM, Erik Hatcher <[EMAIL PROTECTED]>wrote: > In solr-ruby there is a basic "mock" Solr server implementation: > > < > http://svn.apache.org/viewvc/lucene/solr/trunk/client/ruby/solr-ruby/test/unit/solr_mock_base.rb?view=markup > > > > It's

Re: Stuck threads on Weblogic

2008-11-28 Thread Geoff Hopson
Dunno if it helps, but I have had some issues with Solr running standalone using the BEA JRockit JVM with Solr hanging - when I switched to Sun's JVM, hanging disappeared completely. Not investigated the cause etc, so who knows the reason. YMMV with swapping out JRockit with Sun when running WebLo

Re: data import handler - going deeper...

2008-11-28 Thread Noble Paul നോബിള്‍ नोब्ळ्
The extension points are well documented as EntityProcessor, DataSource and Transformer. Adding fieldboost is a planned item. It must work as follows: add a special value $fieldBoost. to the row map, and DocBuilder should respect that. You can raise a bug and we can commit it soon. On Fri,

data import handler - going deeper...

2008-11-28 Thread Marc Sturlese
Hey there, After developing my own classes extending SqlEntityProcessor, JdbcDataSource and Transformer, I have my customized DataImportHandler almost working. I have one more goal to reach. On one hand, I don't always have to index all the fields from my db row. For example fields from db that

Re: Mock solr server

2008-11-28 Thread Erik Hatcher
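In solr-ruby there is a basic "mock" Solr server implementation: <http://svn.apache.org/viewvc/lucene/solr/trunk/client/ruby/solr-ruby/test/unit/solr_mock_base.rb?view=markup> It's used to test some core response handling routines, like this: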

Re: Mock solr server

2008-11-28 Thread Robert Young
I'm not using Java unfortunately. Is there anything that allows me to interact with it much like a normal mock object, setting expectations and return values? On Fri, Nov 28, 2008 at 12:06 AM, Jeryl Cook <[EMAIL PROTECTED]> wrote: > are you trying to unit test something? I would simply make use
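As a language-neutral illustration of the "expectations and return values" style asked about here, the sketch below injects a mock in place of the HTTP call to Solr. Python is used only for illustration, since the thread does not say which language is in use; `count_hits` and `http_get` are hypothetical names, and the canned response merely imitates the shape of a Solr JSON response.

```python
from unittest import mock


def count_hits(http_get, query):
    """Hypothetical code under test: queries Solr through an injected callable."""
    response = http_get("/select", {"q": query, "wt": "json"})
    return response["response"]["numFound"]


# Set the return value, as with any mock object.
fake_get = mock.Mock(return_value={"response": {"numFound": 3, "docs": []}})

hits = count_hits(fake_get, "civil war")

# Verify the expectation: exactly one call, with these arguments.
fake_get.assert_called_once_with("/select", {"q": "civil war", "wt": "json"})
```

Injecting the transport this way avoids running any server at all, at the cost of not exercising real Solr response parsing.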