Re: SolrIndexWriter holding reference to deleted file?

2008-01-03 Thread amamare

I haven't been able to attach a profiler to the server yet, but I thought I
might show how my code works, because it's quite different from the example
in the link you provided...


public synchronized ResultItem[] search(String query) throws
    CorruptIndexException, IOException {
  SolrIndexSearcher searcher = new SolrIndexSearcher(solrCore.getSchema(),
      "MySearcher", solrCore.getIndexDir(), true);
  Hits hits = search(searcher, query);
  for (int i = 0; i < hits.length(); i++) {
    parse(hits.doc(i));
    // add to result-array
  }
  searcher.close();
  // return result-array
}

private Hits search(SolrIndexSearcher searcher, String pQuery) {
  try {
    // the default search field is called "text"
    SolrQueryParser parser = new SolrQueryParser(solrCore.getSchema(), "text");
    Query query = parser.parse(pQuery);
    return searcher.search(query);
  }
  // catch exceptions
}


This is the code that does the searching. The searcher is passed as a
parameter to the private search method because it needs to stay open while
I'm parsing the documents in the hits. I know I should move the call that
closes the searcher into a finally block, and I will do that in any case,
but I doubt it will solve the problem because I've never seen an exception
in this code. Might the problem be that I'm not using SolrQueryRequest
objects?
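
For completeness, here is roughly what that change will look like (same
fields and helper method as above):

public synchronized ResultItem[] search(String query) throws
    CorruptIndexException, IOException {
  SolrIndexSearcher searcher = new SolrIndexSearcher(solrCore.getSchema(),
      "MySearcher", solrCore.getIndexDir(), true);
  try {
    Hits hits = search(searcher, query);
    for (int i = 0; i < hits.length(); i++) {
      parse(hits.doc(i));
      // add to result-array
    }
    // return result-array
  } finally {
    // always release the searcher (and its file descriptors), even if
    // search() or parse() throws
    searcher.close();
  }
}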

Best regards, 


Yonik Seeley wrote:
> 
> This is probably related to "using Solr/Lucene embeddedly"
> See the warning at the top of http://wiki.apache.org/solr/EmbeddedSolr
> 
> It does sound like your SolrIndexSearcher objects aren't being closed.
> Solr (via SolrCore) doesn't rely on garbage collection to close the
> searchers (since gc unfortunately can't be triggered by running low on
> file descriptors).  SolrIndexSearcher objects are reference counted and
> closed when no longer in use.  This means that SolrQueryRequest
> objects must always be closed or the refcount will be off.
> 
> Not sure where you could start except perhaps trying to verify the
> number of live SolrIndexSearcher objects.
> 
> -Yonik
> 
> On Dec 20, 2007 8:20 AM, amamare <[EMAIL PROTECTED]> wrote:
>>
>> I have an application consisting of three web applications running on
>> JBoss 1.4.2 on a Red Hat Linux server. I'm using Solr/Lucene embedded to
>> create and maintain a frequently updated index. Once updated, the index
>> is copied to another directory used for searching. Old index files in
>> the search directory are then deleted. The streams used to copy the
>> files are closed in finally blocks. After a few days an IOException
>> occurs because of "too many open files". When I run the Linux command
>>
>> ls -l /proc/26788/fd/
>>
>> where 26788 is JBoss' process id, it gives me a seemingly ever-increasing
>> list of deleted files (1 per update, since I optimize on every update and
>> use the compound file format), marked with 'deleted' in parentheses. They
>> are all located in the search directory. From what I understand this
>> means that something still holds a reference to the file, and that the
>> file will be permanently deleted once this something loses its reference
>> to it.
>>
>> Only SolrIndexSearcher objects are in direct contact with these files in
>> the search application. The searchers are local objects in the search
>> methods, and are closed after every search operation. In theory, the
>> garbage collector should collect these objects later (though while
>> profiling other applications I've noticed that it often doesn't garbage
>> collect until the allocated memory starts running out).
>>
>> The other objects in contact with the files are the FileOutputStreams
>> used to copy them, but as stated above, these are closed in finally
>> blocks and thus should hold no reference to the files.
>>
>> I need to get rid of the "too many open files" problem. I suspect that
>> it is related to the almost-deleted files in the proc dir, but I know
>> too little of Linux to be sure. Does the problem ring a bell to anyone,
>> or do you have any ideas as to how I can get rid of it?
>>
>> All help is greatly appreciated.
>> --
>> View this message in context:
>> http://www.nabble.com/SolrIndexWriter-holding-reference-to-deleted-file--tp14436326p14436326.html
>> Sent from the Solr - User mailing list archive at Nabble.com.
> 
> 

-- 
View this message in context: 
http://www.nabble.com/SolrIndexWriter-holding-reference-to-deleted-file--tp14436326p14594325.html
Sent from the Solr - User mailing list archive at Nabble.com.



Re: big perf-difference between solr-server vs. SOlrJ req.process(solrserver)

2008-01-03 Thread Geert-Jan Brits
Hi Otis,

after some thought (I must have been sleeping or something) it seems that
it is indeed possible to remove the 2000 product-variant fields from the
index and store them in an external store. I was doubting this option
before, as I mistakenly thought that I would still need to keep the 2000
stored fields in place to hold the product-variant keys for accessing the
database. However, I have some way of identifying the product-variants
client-side once Solr returns the products.

This does mean, however, that the external datastore must have 1 row per
product-variant. With an upper range of about 200,000 products and up to
2,000 product variants per product, that gives a maximum of 400,000,000
product-variant records in the external datastore. I really don't have a
clue about the possible performance at these numbers, but it sounds rather
large to me, although it may sound like peanuts to you ;-). The query would
return 10 rows based on 10 product-variant ids. Any rough guesstimates on
whether this sounds doable? I guess I'm just going to find out.
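
As a sanity check, the lookup itself is tiny - something like this against
a hypothetical product_variant table keyed on variant_id:

SELECT variant_id, price, characteristics
FROM product_variant
WHERE variant_id IN (101, 102, 103, 104, 105, 106, 107, 108, 109, 110);

A 10-key primary-key lookup only touches the index pages for those keys, so
table size mostly affects index depth rather than per-query cost.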

Thanks for helping me think out of the box!

Geert-Jan

2008/1/2, Otis Gospodnetic <[EMAIL PROTECTED]>:
>
> Maybe I'm not following your situation 100%, but it sounded like pulling
> the values of purely stored fields is the slow part. *Perhaps* using a
> non-Lucene data store just for the stored fields would be faster.
>
> Otis
> --
> Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch
>
> - Original Message 
> From: Geert-Jan Brits <[EMAIL PROTECTED] >
> To: solr-user@lucene.apache.org
> Sent: Monday, December 31, 2007 8:49:43 AM
> Subject: Re: big perf-difference between solr-server vs. SOlrJ 
> req.process(solrserver)
>
>
> Hi Otis,
>
> I don't really see how this would minimize my number of fields.
> At the moment I have 1 price field (stored / indexed) and 1 multivalued
> field (stored) per product-variant. I have about 2000 product variants.
>
> I could indeed replace each multivalued field by a single-valued field
> with an id pointing to an external store, where I would fetch the needed
> fields. However, this would not change the number of fields in my index
> (correct?) and thus wouldn't matter for the big scanning time I'm seeing.
> Moreover, I guess it wouldn't matter for the query time either.
>
> Thanks,
> Geert-Jan
>
>
>
>
>
> 2007/12/29, Otis Gospodnetic < [EMAIL PROTECTED]>:
> >
> > Hi Geert-Jan,
> >
> > Have you considered storing this data in an external data store and not
> > the Lucene index?  In other words, use the Lucene index only to index
> > the content you need to search.  Then, when you search this index, just
> > pull out the single stored fields, the unique ID for each of the top N
> > hits, and use those IDs to pull the actual content for display purposes
> > from the external store.  This external store could be an RDBMS, an
> > ODBMS, a BDB, etc.  I've worked with very large indices where we
> > successfully used BDBs for this purpose.
> >
> > Otis
> >
> > --
> > Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch
> >
> > - Original Message 
> > From: Geert-Jan Brits < [EMAIL PROTECTED]>
> > To: solr-user@lucene.apache.org
> > Sent: Thursday, December 27, 2007 11:44:13 AM
> > Subject: Re: big perf-difference between solr-server vs. SOlrJ
> req.process
> > (solrserver)
> >
> > yeah, that makes sense.
> > so, all in all, could scanning all the fields and loading the 10 fields
> > add up to cost about the same as, or even more than, performing the
> > initial query? (Just making sure)
> >
> > I am wondering if the following change to the schema would help in this
> > case:
> >
> > current setup:
> > It's possible to have up to 2000 product-variants.
> > each product-variant has:
> > - 1 price field (stored / indexed)
> > - 1 multivalued field which contains product-variant characteristics
> >   (stored / not indexed).
> >
> > This adds up to the 4000 fields described. Moreover, there are some
> > fields on the product level, but these would contribute just a tiny bit
> > to the overall scanning / loading costs (about 50 -stored and indexed-
> > fields in total).
> >
> > possible new setup (only the changes):
> > - index but not store the price field.
> > - store the price as just another one of the product-variant
> >   characteristics in the multivalued product-variant field.
> >
> > as a result this would bring the maximum number of stored fields back
> > down from about 4050 to about 2050, thereby roughly halving the
> > scanning / loading costs while leaving the current querying costs
> > intact. Indexing costs would increase a bit.
> >
> > Would you expect the same performance gain?
> >
> > Thanks,
> > Geert-Jan
> >
> > 2007/12/27, Yonik Seeley <[EMAIL PROTECTED]>:
> > >
> > > On Dec 27, 2007 11:01 AM, Britske < [EMAIL PROTECTED]> wrote:
> > > > after inspecting solrconfig.xml I see that I already have enabled
> > > > lazy field loading by:
> > > > <enableLazyFieldLoading>true</enableLazyFieldLoading> (I guess it was
> > > > enabl

Field collapsing

2008-01-03 Thread Doug Steigerwald
Being able to collapse multiple documents into one result with Solr is a big deal for us here.  Has 
anyone been able to get field collapsing (http://issues.apache.org/jira/browse/SOLR-236) to patch to 
a recent checkout of Solr?  I've been unsuccessful so far in trying to modify the latest patch to work.


Thanks.
Doug


Re: correct escapes in csv-Update files

2008-01-03 Thread Yonik Seeley
CSV doesn't use backslash escaping.
http://www.creativyst.com/Doc/Articles/CSV/CSV01.htm

"This is text with a ""quoted"" string"

-Yonik

On Jan 2, 2008 8:21 AM, Michael Lackhoff <[EMAIL PROTECTED]> wrote:
> I use UpdateCSV to feed my data into SOLR and it works very well. The
> only thing I don't understand is how to properly escape the encapsulator
> and the backslash.
> An example with the default encapsulator ("):
> "This is a text with a \"quote\""
> "This gives one \ backslash"
> "This gives two backslashes before the \\\"quote\""
> "This gives an error \\"quote\""
>
> So what if I want only one backslash before the quote, e.g. the
> unescaped data looks like this:
> Text with \"funny characters
> (a real backslash before a real quote not an escaped quote)
>
> I know this isn't common and perhaps it would be possible to find an
> encapsulator that will be very, very unlikely to be found in the data
> but you can never be sure.
> So is there a way to correctly escape or otherwise encode all possible
> combinations of special characters?
>
> -Michael
>
>


Re: Leading WildCard in Query

2008-01-03 Thread Yonik Seeley
On Dec 12, 2007 6:51 AM, Michael Kimsal <[EMAIL PROTECTED]> wrote:
> Please vote for SOLR-218.  I'm not aware of any other way to accomplish the
> leading wildcard functionality that would be convenient.  SOLR-218 is not
> asking that it be enabled by default, only that it be functionality that is
> exposed to SOLR admins via config.xml.

I'm actually still in favor of it being enabled by default.
There are a lot of ways to make really slow queries, and it's not
Solr's job to protect against these IMO (that's the job of the app
that uses Solr).  Preventing a leading wildcard simply reduces
functionality.

-Yonik


Re: Backup of a Solr index

2008-01-03 Thread Jörg Kiegeland

Charlie Jackson wrote:
> Solr indexes are file-based, so there's no need to "dump" the index to a
> file.

But does one first have to shut down the Solr server before copying the
index folder?

> In terms of how to create backups and move those backups to other servers,
> check out this page http://wiki.apache.org/solr/CollectionDistribution.

It mentions a script "abc", but I cannot find it in my Solr distribution
(nightly build). Can those scripts be run on Windows XP?




RE: Backup of a Solr index

2008-01-03 Thread Charlie Jackson
> But does one first have to shut down the Solr server before copying the
> index folder?

If you want to copy the hard files from the data/index directory, yes, you'll 
probably want to shut down the server first. You may be able to get away with 
leaving the server up but stopping any index/commit operations, but I could be 
wrong.

> It mentions a script "abc", but I cannot find it in my Solr distribution
> (nightly build).

All of the collection distribution scripts can be found in src/scripts in the 
nightly build if they aren't in the bin directory of the example solr 
directory. 

> Can those scripts be run on Windows XP?

No, unfortunately the Collection Distribution scripts won't work in Windows 
because they use Unix filesystem trickery to operate. 
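
For the curious, the "trickery" is mainly hard links: a snapshot is roughly
a hard-link copy of the index directory, along the lines of (made-up
timestamp):

cp -lr data/index data/snapshot.20080103120000

Hard links make snapshots fast and nearly free space-wise, and they're also
why the scripts need a Unix filesystem.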


-Original Message-
From: Jörg Kiegeland [mailto:[EMAIL PROTECTED] 
Sent: Thursday, January 03, 2008 11:00 AM
To: solr-user@lucene.apache.org
Subject: Re: Backup of a Solr index

Charlie Jackson wrote:
> Solr indexes are file-based, so there's no need to "dump" the index to a
> file.

But does one first have to shut down the Solr server before copying the
index folder?

> In terms of how to create backups and move those backups to other servers,
> check out this page http://wiki.apache.org/solr/CollectionDistribution.

It mentions a script "abc", but I cannot find it in my Solr distribution
(nightly build). Can those scripts be run on Windows XP?



Re: Field collapsing

2008-01-03 Thread Grant Ingersoll

Hi Doug,

Is the problem in applying the patch or getting it to work once it is  
applied?


-Grant

On Jan 3, 2008, at 8:52 AM, Doug Steigerwald wrote:

Being able to collapse multiple documents into one result with Solr  
is a big deal for us here.  Has anyone been able to get field  
collapsing (http://issues.apache.org/jira/browse/SOLR-236) to patch  
to a recent checkout of Solr?  I've been unsuccessful so far in  
trying to modify the latest patch to work.


Thanks.
Doug




Re: Field collapsing

2008-01-03 Thread Ryan McKinley
I think the last patch predates the QueryComponent infrastructure ... it
needs to be transformed into a QueryComponent to work.


I don't think anyone has tackled that yet...

ryan


Doug Steigerwald wrote:
Modifying the patch to apply.  StandardRequestHandler and 
DisMaxRequestHandler were changed a lot in mid-November and I've been 
having a hard time figuring out where the changes should be reapplied.


Doug

Grant Ingersoll wrote:

Hi Doug,

Is the problem in applying the patch or getting it to work once it is 
applied?


-Grant

On Jan 3, 2008, at 8:52 AM, Doug Steigerwald wrote:

Being able to collapse multiple documents into one result with Solr 
is a big deal for us here.  Has anyone been able to get field 
collapsing (http://issues.apache.org/jira/browse/SOLR-236) to patch 
to a recent checkout of Solr?  I've been unsuccessful so far in 
trying to modify the latest patch to work.


Thanks.
Doug






Re: Performance stats for indeces with over 10MM documents

2008-01-03 Thread Walter Underwood
I had exactly the same thought. That query is not an information
retrieval (text search) query. It is data retrieval and would
work great on a relational database.

wunder

On 1/2/08 9:53 PM, "John Stewart" <[EMAIL PROTECTED]> wrote:

> Alex,
> 
> Not to be a pain, but the response I had when looking at the query
> was, why not do this in a SQL database, which is designed precisely to
> process this sort of request at speed?  I've noticed that people
> sometimes try to get Solr to act as a generalized information store --
> I'm not sure that's what you're doing, but be aware of this pitfall.
> 
> jds
> 
> On Jan 3, 2008 12:52 AM, Alex Benjamen <[EMAIL PROTECTED]> wrote:
>> Mike,
>> 
>> Thanks for the input, it's really valuable. Several forum users have
>> suggested using fq to separate the caching of filters, and I can
>> immediately see how this would help. I'm changing the code right now and
>> am going to run some benchmarks; hopefully I'll see a big gain just from
>> that.
>> 
>> 
>>> - use range queries when querying contiguous disjunctions (age:[28 TO 33]
>>> rather than what you have above).
>> I actually started with that, using an int-type field, and it somehow
>> seemed slower than the explicit disjunction, but I will certainly try
>> again.
>> 
>> 
>>>  - convert the expensive, heap-based age filter disjunction into a bitset
>>> created directly from the term enum
>> Can you please elaborate a little more? Are you advising to use
>> fq=age:[28 TO 33], or should that simply be part of the regular query?
>> Also, what is the best field type to use when defining age? I'm currently
>> using "text"; should I use "int" instead? I didn't see any difference
>> when using the type "int".
>> 
>> One of the issues is that the age ranges are not pre-defined - they can
>> be any combination: 22-23, 22-85, 45-49, etc. I realize that pre-defining
>> age ranges would drastically improve performance, but then we'd be
>> greatly reducing the value of this type of search.
>> 
>> Thanks,
>> Alex
>> 



Re: Field collapsing

2008-01-03 Thread Doug Steigerwald
Modifying the patch to apply.  StandardRequestHandler and DisMaxRequestHandler were changed a lot in 
mid-November and I've been having a hard time figuring out where the changes should be reapplied.


Doug

Grant Ingersoll wrote:

Hi Doug,

Is the problem in applying the patch or getting it to work once it is 
applied?


-Grant

On Jan 3, 2008, at 8:52 AM, Doug Steigerwald wrote:

Being able to collapse multiple documents into one result with Solr is 
a big deal for us here.  Has anyone been able to get field collapsing 
(http://issues.apache.org/jira/browse/SOLR-236) to patch to a recent 
checkout of Solr?  I've been unsuccessful so far in trying to modify 
the latest patch to work.


Thanks.
Doug


RE: Performance stats for indeces with over 10MM documents

2008-01-03 Thread Alex Benjamen
We currently use a relational system, and it doesn't perform. Also, even
though a lot of our queries are structured, we do combine them with text
search; for instance, there could be an additional clause which is a free
text search for a favorite TV show.

--

I had exactly the same thought. That query is not an information
retrieval (text search) query. It is data retrieval and would
work great on a relational database.

wunder





Re: Mixing adds, deletes and commit in the same message

2008-01-03 Thread Mike Klaas

On 3-Jan-08, at 11:38 AM, Leonardo Santagada wrote:

I tried to put some adds and deletes in the same request to Solr but it
didn't work. Have I done something wrong, or is this really not supported?


It isn't supported.

-Mike


Re: Solr RPS is painfully low

2008-01-03 Thread Chris Hostetter

: fq=gender:f&fq=(friends:y)&fq=country:us&fq=age:(18 || 19 || 20 ||
: 21)&fq=photos:y

that would be my suggestion based on what i'm guessing your typical use 
cases are ... but it's really hard to infer patterns from only a single 
example URL.

the queryResultCache isn't nearly as interesting in cases like this as the 
filterCache is ... your filterCache doesn't even need to be very big to 
give you huge wins for the type of use cases i'm guessing you have.
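
if the age bounds really are arbitrary, a range query keeps each filter
cacheable without enumerating every value ... a sketch of the same request,
assuming age is indexed with a sortable numeric field type:

fq=gender:f&fq=friends:y&fq=country:us&fq=age:[18 TO 21]&fq=photos:y

each fq gets its own filterCache entry, so the common filters (gender,
country, etc) are computed once and reused across queries.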



-Hoss



RE: Solr RPS is painfully low

2008-01-03 Thread Chris Hostetter

: I'm only requesting 20 rows, and I'm not specifically sorting by any
: field. Does solr automatically induce sort by default, and if so, how do
: I disable it?

default sorting is by score, which is cheap ... walter's question was 
mainly to verify that you are not sorting since it is expensive (we 
have to make guesses as to what might be causing you problems in 
the absence of seeing your configs or full URLs)

-Hoss



Mixing adds, deletes and commit in the same message

2008-01-03 Thread Leonardo Santagada
I tried to put some adds and deletes in the same request to Solr but it
didn't work. Have I done something wrong, or is this really not supported?


This is one example (the XML tags were stripped by the mail archive; only
the text content survives):

   document
   one9a10b11c12d
   Test Document

Thanks in advance
[]'s
--
Leonardo Santagada





Re: Mixing adds, deletes and commit in the same message

2008-01-03 Thread Ryan McKinley

You can commit after a update command by adding a request parameter:

/update?commit=true
POST: your xml ...
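
For example, with curl (localhost:8983 is the example server's default, and
adds.xml is a hypothetical file of add commands):

curl 'http://localhost:8983/solr/update?commit=true' \
  -H 'Content-Type: text/xml' --data-binary @adds.xml

The deletes still need their own request; commit=true just saves sending a
separate <commit/> message.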

ryan


Mike Klaas wrote:

On 3-Jan-08, at 11:38 AM, Leonardo Santagada wrote:

I tried to put some adds and deletes in the same request to Solr but it
didn't work. Have I done something wrong, or is this really not supported?


It isn't supported.

-Mike





Re: Field collapsing

2008-01-03 Thread Doug Steigerwald
I finally took more than 30 minutes to try and apply the patch and got it to (mostly) work.  Will 
try to submit it tomorrow for review if there's interest.


Doug

Ryan McKinley wrote:
I think the last patch predates the QueryComponent infrastructure ... it
needs to be transformed into a QueryComponent to work.


I don't think anyone has tackled that yet...

ryan


Doug Steigerwald wrote:
Modifying the patch to apply.  StandardRequestHandler and 
DisMaxRequestHandler were changed a lot in mid-November and I've been 
having a hard time figuring out where the changes should be reapplied.


Doug

Grant Ingersoll wrote:

Hi Doug,

Is the problem in applying the patch or getting it to work once it is 
applied?


-Grant

On Jan 3, 2008, at 8:52 AM, Doug Steigerwald wrote:

Being able to collapse multiple documents into one result with Solr 
is a big deal for us here.  Has anyone been able to get field 
collapsing (http://issues.apache.org/jira/browse/SOLR-236) to patch 
to a recent checkout of Solr?  I've been unsuccessful so far in 
trying to modify the latest patch to work.


Thanks.
Doug




Re: Field collapsing

2008-01-03 Thread Ryan McKinley

excellent!  Yes, there is interest.


Doug Steigerwald wrote:
I finally took more than 30 minutes to try and apply the patch and got 
it to (mostly) work.  Will try to submit it tomorrow for review if 
there's interest.


Doug

Ryan McKinley wrote:
I think the last patch predates the QueryComponent infrastructure ... it
needs to be transformed into a QueryComponent to work.


I don't think anyone has tackled that yet...

ryan


Doug Steigerwald wrote:
Modifying the patch to apply.  StandardRequestHandler and 
DisMaxRequestHandler were changed a lot in mid-November and I've been 
having a hard time figuring out where the changes should be reapplied.


Doug

Grant Ingersoll wrote:

Hi Doug,

Is the problem in applying the patch or getting it to work once it 
is applied?


-Grant

On Jan 3, 2008, at 8:52 AM, Doug Steigerwald wrote:

Being able to collapse multiple documents into one result with Solr 
is a big deal for us here.  Has anyone been able to get field 
collapsing (http://issues.apache.org/jira/browse/SOLR-236) to patch 
to a recent checkout of Solr?  I've been unsuccessful so far in 
trying to modify the latest patch to work.


Thanks.
Doug








Configure solr on tomcat with different indexes

2008-01-03 Thread Laxmilal Menaria
Hello,

I have configured Solr with Tomcat for multiple webapps. This configuration
uses a common index, so now I want to configure Solr with a different index
per webapp under Tomcat. Please let me know how this is possible.

-- 
Thanks,
Laxmilal menaria

http://www.chambal.com/
http://www.minalyzer.com/
http://www.bucketexplorer.com/


Re: Configure solr on tomcat with different indexes

2008-01-03 Thread Ryan McKinley

http://wiki.apache.org/solr/SolrTomcat#head-024d7e11209030f1dbcac9974e55106abae837ac

using different values for solr home should give you new indexes for each.
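
For example, one context fragment per webapp, along the lines of the wiki
page (paths here are made up):

<!-- conf/Catalina/localhost/solr1.xml -->
<Context docBase="/opt/solr/solr.war" debug="0" crossContext="true">
  <Environment name="solr/home" type="java.lang.String"
               value="/opt/solr/home1" override="true"/>
</Context>

A second fragment (solr2.xml) pointing at a different solr/home value gives
that webapp its own conf/ and data/index.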

ryan

Laxmilal Menaria wrote:

Hello,

I have configured Solr with Tomcat for multiple webapps. This configuration
uses a common index, so now I want to configure Solr with a different index
per webapp under Tomcat. Please let me know how this is possible.





Re: Configure solr on tomcat with different indexes

2008-01-03 Thread Laxmilal Menaria
I have tried with solr1.xml and added a solr/home entry in it, but after
that it's not showing any results, because it searches by default in
Tomcat\solr\data\index.

LM

On 1/4/08, Ryan McKinley <[EMAIL PROTECTED]> wrote:
>
>
> http://wiki.apache.org/solr/SolrTomcat#head-024d7e11209030f1dbcac9974e55106abae837ac
>
> using different values for solr home should give you new indexes for each.
>
> ryan
>
> Laxmilal Menaria wrote:
> > Hello,
> >
> > I have configured Solr with Tomcat for multiple webapps. This
> > configuration uses a common index, so now I want to configure Solr with
> > a different index per webapp under Tomcat. Please let me know how this
> > is possible.
> >
>
>


-- 
Thanks,
Laxmilal menaria

http://www.chambal.com/
http://www.minalyzer.com/
http://www.bucketexplorer.com/