Re: Newbie problem ordering results

2009-08-11 Thread Avlesh Singh
> > The error is thrown by Lucene. Actually, multi valued fields are not very > different from tokenized fields. Multiple values are indexed with their > respective token positions differing by the positionIncrementGap value as > specified in schema. > I truly understand that. But I guess, that is

Re: Newbie problem ordering results

2009-08-11 Thread Shalin Shekhar Mangar
On Tue, Aug 11, 2009 at 9:56 PM, Avlesh Singh wrote: > > However the RuntimeException that Solr throws has a misleading error > message > - "... but it's impossible to sort on tokenized fields". The field in this > case is untokenized. > The error is thrown by Lucene. Actually, multi valued fiel

Re: Indexing date into multiple fields

2009-08-11 Thread Shalin Shekhar Mangar
On Wed, Aug 12, 2009 at 7:15 AM, Bernadette Houghton < bernadette.hough...@deakin.edu.au> wrote: > Am very new to SOLR, so this question may seem overly basic - > > In schema.xml, I have a date field type - > > omitNorms="true"/> > > used by - > > >multiValued="true" /> > > and want to

Re: build.xml errors

2009-08-11 Thread viorelhojda
Thank you for your reply. You were right, I could run the targets from build.xml. All the best, Viorel Shalin Shekhar Mangar wrote: > > On Tue, Aug 11, 2009 at 1:31 PM, viorelhojda > wrote: > >> >> Hello. I've downloaded SOLR using the SVN and Eclipse IDE. After setting >> up >> the

Re: Performance Tuning: segment_merge:index_update=5:1 (timing)

2009-08-11 Thread Grant Ingersoll
Is there a time of day you could schedule merges? See http://www.lucidimagination.com/search/document/bd53b0431f7eada5/concurrentmergescheduler_and_mergepolicy_question Or, you might be able to implement a scheduler that only merges the small segments, and then does the larger ones at slow ti

Indexing date into multiple fields

2009-08-11 Thread Bernadette Houghton
Am very new to SOLR, so this question may seem overly basic - In schema.xml, I have a date field type - used by - and want to create additional indexes based on variations of the date, e.g. year, decade (which can then be used as facets). I'm assuming I need to set up another f
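One way to get year and decade facets is to compute the extra values on the client before the document is sent to Solr. The sketch below is only an illustration under that assumption; the field names `date_year` and `date_decade` are hypothetical and do not come from the poster's schema.

```python
from datetime import datetime


def date_facets(iso_date: str) -> dict:
    """Derive year and decade strings from a Solr-style UTC timestamp."""
    parsed = datetime.strptime(iso_date, "%Y-%m-%dT%H:%M:%SZ")
    # Round the year down to the start of its decade, e.g. 2009 -> 2000.
    decade_start = parsed.year - parsed.year % 10
    return {
        "date_year": str(parsed.year),       # hypothetical facet field
        "date_decade": f"{decade_start}s",   # hypothetical facet field
    }


if __name__ == "__main__":
    print(date_facets("2009-08-11T00:00:00Z"))
```

The derived values would then be added to the document as ordinary string fields, which keeps the facet values exact and untokenized.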

Re: Functions in search result

2009-08-11 Thread Chris Hostetter
: As far as I know, functions are executed on a per-document/field basis. : That is, I don't think any of them aggregate numeric field values from a : result set. correct. it sounds like what you are looking for is the StatsComponent... http://wiki.apache.org/solr/StatsComponent : >
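For readers who want to try the StatsComponent, here is a minimal sketch of building such a request. It assumes a local Solr at `http://localhost:8983/solr`, a `/select` handler with the component enabled, and a numeric field named `price`; the field name is purely illustrative.

```python
from urllib.parse import urlencode


def stats_url(base_url: str, query: str, field: str) -> str:
    """Build a select URL that asks the StatsComponent for aggregates on one field."""
    params = [
        ("q", query),
        ("rows", 0),             # only the aggregates are wanted, not the documents
        ("stats", "true"),       # switch the StatsComponent on
        ("stats.field", field),  # numeric field to aggregate over the result set
    ]
    return f"{base_url}/select?{urlencode(params)}"


if __name__ == "__main__":
    print(stats_url("http://localhost:8983/solr", "*:*", "price"))
```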

Re: DIH problem passing HTTP parameters into data-config

2009-08-11 Thread John Lowe
Oops, the url attribute of the element in the snippet should read: url="${dataimporter.request.feed}" to match the http parameter... John
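To illustrate the corrected attribute: `${dataimporter.request.feed}` is filled from an HTTP parameter named `feed` on the import request. The sketch below builds such a request; the handler path `/dataimport` and the example feed address are assumptions, so adjust them to whatever is registered in your solrconfig.xml.

```python
from urllib.parse import urlencode


def dih_import_url(solr_base: str, feed_url: str) -> str:
    """Build a full-import request that passes a custom 'feed' parameter to DIH."""
    params = {
        "command": "full-import",
        "feed": feed_url,  # read in data-config as ${dataimporter.request.feed}
    }
    # '/dataimport' is an assumed handler name; use the one your config registers.
    return f"{solr_base}/dataimport?{urlencode(params)}"


if __name__ == "__main__":
    print(dih_import_url("http://localhost:8983/solr", "http://example.com/rss.xml"))
```

The feed address has to be URL-encoded, which `urlencode` takes care of here.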

Re: NativeFSLockFactory, ConcurrentMergeScheduler: why locks?

2009-08-11 Thread Jason Rutherglen
> 1 minute of document updates (about 100,000 documents) and then SOLR stops 100,000 docs in a minute is a lot. Lucene is probably automatically flushing to disk and merging which is tying up the IO subsystem. You may want to set the ConcurrentMergeScheduler to 1 thread (which in Solr cannot be do

DIH problem passing HTTP parameters into data-config

2009-08-11 Thread John Lowe
I've read the documentation as carefully as I can, but I must be missing something. I'm running Solr 1.3. The doc sez that I can pass my own parameters in to DIH via the HTTP request: http://wiki.apache.org/solr/DataImportHandler#head-520f8e527d9da55e8ed1e274e29709c8805c8eae What I'd like

RE: Performance Tuning: segment_merge:index_update=5:1 (timing)

2009-08-11 Thread Fuad Efendi
Forgot to add: committing only once a day I tried mergeFactor=1000 and performance of index write was extremely good (more than 50,000,000 updates during part of a day) However, "commit" was taking 2 days or more and I simply killed process (suspecting that it can break my harddrive); I had about

Using Lucene's payload in Solr

2009-08-11 Thread Bill Au
It looks like things have changed a bit since this subject was last brought up here. I see that there is support in Solr/Lucene for indexing payload data (DelimitedPayloadTokenFilterFactory and DelimitedPayloadTokenFilter). Overriding the Similarity class is straightforward. So the last piece o

RE: Performance Tuning: segment_merge:index_update=5:1 (timing)

2009-08-11 Thread Fuad Efendi
Never tried profiling; 3000-5000 docs per second if SOLR is not busy with segment merge; During segment merge 99% CPU, no disk swap; I can't suspect I/O... During document updates (small batches 100-1000 docs) only 5-15% CPU -server 2048Gb option of JVM (which is JRockit) + 256M for RAM Buffer

Re: Trouble with Shingle filter and query parsing / expansion

2009-08-11 Thread Mark Bennett
One other idea I tried, which didn't work, was to see if I could get proper parsing via the stream arg: http://localhost:8983/solr/mlt?stream.body=hello+world&mlt.fl=shingle_field&mlt.mintf=0&debugQuery=true On Tue, Aug 11, 2009 at 9:09 AM, Mark Bennett wrote: > I've got an index building with

RE: NativeFSLockFactory, ConcurrentMergeScheduler: why locks?

2009-08-11 Thread Fuad Efendi
Hi Jason, I am using Master/Slave (two servers); I monitored few hours today - 1 minute of document updates (about 100,000 documents) and then SOLR stops for at least 5 minutes to do background jobs like RAM flush, segment merge... Documents are small; about 10Gb of total index size for 50,000,00

Re: Performance Tuning: segment_merge:index_update=5:1 (timing)

2009-08-11 Thread Grant Ingersoll
Have you tried profiling? How often are you committing? Have you looked at Garbage Collection or any of the usual suspects like that? On Aug 11, 2009, at 4:49 PM, Fuad Efendi wrote: In a heavily loaded Write-only Master SOLR, I have 5 minutes of RAM Buffer Flush / Segment Merge per 1 min

Performance Tuning: segment_merge:index_update=5:1 (timing)

2009-08-11 Thread Fuad Efendi
In a heavily loaded Write-only Master SOLR, I have 5 minutes of RAM Buffer Flush / Segment Merge per 1 minute of (heavy) batch document updates. I am using mergeFactor=100 etc (I already posted message...) So that... I can't see hardware is a problem: with more CPU and faster RAID-0 I'll get the

Re: NativeFSLockFactory, ConcurrentMergeScheduler: why locks?

2009-08-11 Thread Jason Rutherglen
Fuad, The lock indicates to external processes that the index is in use, meaning it does not cause ConcurrentMergeScheduler to block. ConcurrentMergeScheduler does merge in its own thread, however if the merges are large then they can spike IO, CPU, and cause the machine to be somewhat unresponsive. Wh

Re: Solr 1.4 Clustering / mlt AS search?

2009-08-11 Thread Mark Bennett
Thanks Grant. *** mlb: comments inline On Tue, Aug 11, 2009 at 12:40 PM, Grant Ingersoll wrote: > Inline... > > On Aug 11, 2009, at 12:44 PM, Mark Bennett wrote: > > I'm going somewhere with this... be patient. :-) I had asked about this >> briefly at the SF meetup, but there was a lot going

Re: ArrayIndexOutOfBounds on Some Searches

2009-08-11 Thread Stephen Duncan Jr
On Tue, Aug 11, 2009 at 2:09 PM, Stephen Duncan Jr wrote: > This is with trunk for Solr 1.4. It happened both with a build from 1 week > ago as well as with a build from today, so I'm not sure if it's something > recent, or even if it would happen on Solr 1.3 or not. Here's the stack > trace in

Re: Solr 1.4 Clustering / mlt AS search?

2009-08-11 Thread Grant Ingersoll
Inline... On Aug 11, 2009, at 12:44 PM, Mark Bennett wrote: I'm going somewhere with this... be patient. :-) I had asked about this briefly at the SF meetup, but there was a lot going on. 1: Suppose you had Solr 1.4 and all the Carrot^2 DOCUMENT clustering was all in, and you had built

ArrayIndexOutOfBounds on Some Searches

2009-08-11 Thread Stephen Duncan Jr
This is with trunk for Solr 1.4. It happened both with a build from 1 week ago as well as with a build from today, so I'm not sure if it's something recent, or even if it would happen on Solr 1.3 or not. Here's the stack trace indicating that a value looping around from Integer.MAX_VALUE to Integ

FW: NativeFSLockFactory, ConcurrentMergeScheduler: why locks?

2009-08-11 Thread Fuad Efendi
Most probably I need to play around UpdateHandler(s); I am using DirectUpdateHandler with allowDuplicates = false: solrj.SolrServer.add(docs, overwrite=true) Use case: I have a timestamp on a document; documents in an index get expired by timestamp; same document could be added to the index many

Re: Building documents using content residing both in database tables and text files

2009-08-11 Thread Sascha Szott
Hi Noble, Noble Paul wrote: isn't it possible to do this by having two datasources (one Jdbc and another File) and two entities. The outer entity can read from a DB and the inner entity can read from a file. Yes, it is. Here's my db-data-config.xml file:

NativeFSLockFactory, ConcurrentMergeScheduler: why locks?

2009-08-11 Thread Fuad Efendi
1. I always have files lucene--write.lock and lucene--n-write.lock which I believe shouldn't be used with NativeFSLockFactory 2. I use mergeFactor=100 and ramBufferSizeMB=256, few GB index size. I tried mergeFactor=10 and mergeFactor=1000. It seems ConcurrentMergeSchedul

Re: Solr 1.4 Clustering / mlt AS search?

2009-08-11 Thread Mark Bennett
With regards my second question, re. More Like this, I do see: "The MoreLikeThisHandler can also use a ContentStream to find similar documents. It will extract the "interesting terms" from the posted text." at http://wiki.apache.org/solr/MoreLikeThisHandler and that it uses the TF/IDF stuff. Still

Solr 1.4 Clustering / mlt AS search?

2009-08-11 Thread Mark Bennett
I'm going somewhere with this... be patient. :-) I had asked about this briefly at the SF meetup, but there was a lot going on. 1: Suppose you had Solr 1.4 and all the Carrot^2 DOCUMENT clustering was all in, and you had built the cluster index for all your docs. 2: Then, if you had a particula

Re: Newbie problem ordering results

2009-08-11 Thread Avlesh Singh
Ahhh, I should have seen this first. Your "contributororder" field is multi-valued, you cannot sort on that field. However the RuntimeException that Solr throws has a misleading error message - "... but it's impossible to sort on tokenized fields". The field in this case is untokenized. Cheers Av

Re: Searching for reservations/availability with Solr

2009-08-11 Thread Shalin Shekhar Mangar
On Tue, Aug 11, 2009 at 7:08 PM, Constantijn Visinescu wrote: > > > Room1 > 2000-08-01T00:00:00Z > 2000-08-31T23:59:59Z > > > Room2 > 2000-08-01T00:00:00Z > 2000-08-13T23:59:59Z > 2000-08-20T00:00:00Z > 2000-08-22T23:59:59Z > > > Now i want to run a query that gives me all documents(rooms) tha

Trouble with Shingle filter and query parsing / expansion

2009-08-11 Thread Mark Bennett
I've got an index building with the shingle filter and I can see the compound terms with Luke, etc. So far so good. One detail, I did tell it to not emit unigrams - I've got single words covered in a normal field. And a bit of poking around the other day explained why shingle queries weren't wor

Re: Searching for reservations/availability with Solr

2009-08-11 Thread Avlesh Singh
From what I understood, you need a day level granularity (i.e. booked on 15th, 16th and 17th of August) in your indexes. If this is true, then why even store a "date"? For your use case, I think this should suffice - Each document will have values like these for this particular field - reserved_d
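The day-level idea can be sketched as follows: expand each booking into one token per day, index those tokens in a multi-valued field, and search for rooms that have none of the requested days. This is only an illustration; the field name `booked_day` and the `YYYYMMDD` token format are hypothetical, since the field definition in the message above is cut off.

```python
from datetime import date, timedelta


def days_between(start: date, end: date) -> list:
    """Return every day from start to end (inclusive) as YYYYMMDD tokens."""
    count = (end - start).days + 1
    return [(start + timedelta(days=i)).strftime("%Y%m%d") for i in range(count)]


def availability_query(field: str, start: date, end: date) -> str:
    """Build a query matching documents with none of the requested days booked."""
    excluded = " ".join(f"-{field}:{day}" for day in days_between(start, end))
    return f"*:* {excluded}"


if __name__ == "__main__":
    # Tokens to index for a booking from 15 to 17 August 2000.
    print(days_between(date(2000, 8, 15), date(2000, 8, 17)))
    # Query for rooms that are free on 15 and 16 August 2000.
    print(availability_query("booked_day", date(2000, 8, 15), date(2000, 8, 16)))
```

Starting the query with `*:*` gives the negative clauses something to subtract from, so a room with no bookings at all still matches.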

Re: build.xml errors

2009-08-11 Thread Shalin Shekhar Mangar
On Tue, Aug 11, 2009 at 1:31 PM, viorelhojda wrote: > > Hello. I've downloaded SOLR using the SVN and Eclipse IDE. After setting up > the classpath and everything i've managed to have no errors in the code > files. The problem is that the BUILD XML files (build.xml, common-build.xml > etc) are f

Re: faceting/searching on multi-valued fields

2009-08-11 Thread Avlesh Singh
> > Let's say I have a dynamic field defined as type="string" > > Can I use those fields at query time, although they are not defined in > schema.xml? > Yes. Though I am not sure whether you can create a dynamic field without a prefix or suffix with the wild-card. I would rather suggest to name t

Re: Multiple Unique Ids

2009-08-11 Thread Shalin Shekhar Mangar
On Mon, Aug 10, 2009 at 2:05 PM, Ninad Raut wrote: > Hi, > I have two Ids DocumentId and AuthorId. I want both of them unique. Can i > have two in my document? > id > authorId > No. You can have only one uniqueKey in schema.xml. But during indexing you can create the uniqueKey value as "id
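A combined key of this kind can be built on the client before indexing. The sketch below is one possible reading of the suggestion, not the exact format from the reply (which is cut off above); the separator and the function name are assumptions.

```python
def composite_key(document_id: str, author_id: str, sep: str = "_") -> str:
    """Join two identifiers into a single value for the one uniqueKey field."""
    # Refuse ids that contain the separator, so two different pairs
    # can never collapse into the same combined key.
    if sep in document_id or sep in author_id:
        raise ValueError(f"identifiers must not contain {sep!r}")
    return f"{document_id}{sep}{author_id}"


if __name__ == "__main__":
    print(composite_key("doc42", "auth7"))
```

The two original ids can still be indexed as separate fields alongside the combined key, so each remains searchable on its own.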

Re: Newbie problem ordering results

2009-08-11 Thread Germán Biozzoli
Sure. The strange thing is that I could sort by other fields that are defined using string, but not by another defined as some tokenized field and after that copied as string. I attach the schema.xml in case there is another error, and the error log says the following INFO: UnInverted mult

Re: Building documents using content residing both in database tables and text files

2009-08-11 Thread Noble Paul നോബിള്‍ नोब्ळ्
isn't it possible to do this by having two datasources (one Jdbc and another File) and two entities. The outer entity can read from a DB and the inner entity can read from a file. On Tue, Aug 11, 2009 at 8:05 PM, Sascha Szott wrote: > Hello, > > is it possible (and if it is, how can I accompli

Building documents using content residing both in database tables and text files

2009-08-11 Thread Sascha Szott
Hello, is it possible (and if it is, how can I accomplish it) to configure DIH to build up index documents by using content that resides in different data sources? Here is an example scenario: Let's assume we have a table T with two columns, ID (which is the primary key of T) and TITLE. Furt

faceting/searching on multi-valued fields

2009-08-11 Thread AHMET ARSLAN
I have two parallel multivalued fields for holding key value pairs for each document. red other VS 10 cm. 50 GB ... Color Type Brand Size RAM There are about 300 different keys

Searching for reservations/availability with Solr

2009-08-11 Thread Constantijn Visinescu
Hello, I have a problem I'm trying to solve where I want to check if objects are reserved or not. (by reservation I mean like making a reservation at a hotel, because you would like to stay there on certain dates). I have the following in my schema.xml and the following 2 documents in Solr

Re: Retrieving the boost factor using Solrj CommonsHttpSolrServer

2009-08-11 Thread Avlesh Singh
> > The boost factor is available in the SolrInputDocument, but not in the > SolrDocument returned by the SolrServer 'query' method > Yes, you are right. There seems to be an inconsistency. And there is no relationship between the SolrInputDocument and the > SolrDocument (... which in itself is pr

Re: Querying Dynamic Fields.. simple query not working

2009-08-11 Thread Avlesh Singh
SOLR-1129 was for a different use case, Ninad. I have created an issue for this enhancement - https://issues.apache.org/jira/browse/SOLR-1357 Cheers Avlesh On Tue, Aug 11, 2009 at 12:09 PM, Ninad Raut wrote: > Hi, > SOLR-1129 seems to have > been

Re: How can i get lucene index format version information?

2009-08-11 Thread Licinio Fernández Maurelo
Thanks all for your responses, what i expect to get is the index format version as it appears in luke's overview tab (index format : -9 (UNKNOWN) 2009/7/31 Jay Hill : > Check the system request handler: http://localhost:8983/solr/admin/system > > Should look something like this: > > 1.3.0.2009.

build.xml errors

2009-08-11 Thread viorelhojda
Hello. I've downloaded SOLR using the SVN and Eclipse IDE. After setting up the classpath and everything i've managed to have no errors in the code files. The problem is that the BUILD XML files (build.xml, common-build.xml etc) are full of errors (couple of hundreds), such as: Attribute "defa