Re: order of word in the request

2009-02-27 Thread sunnyfr

Thanks Yonik,



Yonik Seeley-2 wrote:
> 
> On Thu, Feb 26, 2009 at 11:25 AM, sunnyfr  wrote:
>> How can I tell it to put a lot more weight on the book which has
>> exactly the same title?
> 
> A sloppy phrase query should work.
> See the "pf" param in the dismax query parser.
> 
> -Yonik
> http://www.lucidimagination.com
> 
> 
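
For anyone finding this in the archive, a rough SolrJ sketch of Yonik's
suggestion; the field names, boosts, and URL below are made-up placeholders,
not anything from this thread:

    import org.apache.solr.client.solrj.SolrQuery;
    import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;

    public class TitleBoostExample {
        public static void main(String[] args) throws Exception {
            CommonsHttpSolrServer solr =
                new CommonsHttpSolrServer("http://localhost:8983/solr");
            SolrQuery q = new SolrQuery("the great gatsby");
            q.set("defType", "dismax");
            q.set("qf", "title^2 description"); // fields matched term by term
            q.set("pf", "title^20");  // big boost when the whole query matches
                                      // "title" as a (sloppy) phrase
            q.set("ps", "0");         // slop 0: exact-order title matches win
            System.out.println(solr.query(q).getResults().getNumFound());
        }
    }

Books whose title is exactly the query then pick up the large pf boost on
top of their normal term score.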

-- 
View this message in context: 
http://www.nabble.com/order-of-word-in-the-request-tp7783p22241361.html
Sent from the Solr - User mailing list archive at Nabble.com.



Re: SolrCoreAware analyzer

2009-02-27 Thread Bojan Šmid
Thanks for your suggestions.

I do need SolrCore, but I could probably live with just SolrResourceLoader,
while also creating my own FieldType (which can be ResourceLoaderAware).

Bojan


On Thu, Feb 26, 2009 at 11:48 PM, Chris Hostetter
wrote:

>
> : I am writing a custom analyzer for my field type. This analyzer would
> need
> : to use SolrResourceLoader and SolrConfig, so I want to make it
> : SolrCoreAware.
>
> 1) Solr's support for using Analyzer instances is mainly just to make it
> easy for people who already have existing Analyzer impls that they want to
> use -- if you're writing something new, I would suggest implementing the
> TokenizerFactory API.
>
> 2) Do you really need access to the SolrCore, or do you just need access
> to the SolrResourceLoader?  Because there is also the ResourceLoaderAware
> API.  If you take a look at StopFilterFactory you can see an example of
> how it's used.
>
> FWIW: The reasons Solr doesn't support SolrCoreAware Analysis related
> plugins (TokenizerFactory and TokenFilterFactory) are:
>
> a. it kept the initialization a lot simpler.  Currently SolrCore knows
> about the IndexSchema, but the IndexSchema doesn't know anything about the
> SolrCore.
> b. it allows for more reuse of the schema related code independent of the
> rest of Solr (there was talk at one point of promoting all of the
> IndexSchema/FieldType/Token*Factory code into a Lucene-Java contrib but
> so far no one has stepped up to work out the refactoring)
>
>
> -Hoss
>
>
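
A bare-bones sketch of Hoss's second suggestion, modeled loosely on
StopFilterFactory; the class name and the "rules" attribute are hypothetical,
and the actual token filter is left as a comment:

    import java.io.IOException;
    import java.util.List;

    import org.apache.lucene.analysis.TokenStream;
    import org.apache.solr.analysis.BaseTokenFilterFactory;
    import org.apache.solr.common.ResourceLoader;
    import org.apache.solr.util.plugin.ResourceLoaderAware;

    public class MyFilterFactory extends BaseTokenFilterFactory
            implements ResourceLoaderAware {

        private List<String> rules;

        // called once at startup with the core's resource loader;
        // "rules" would come from the <filter .../> attributes in schema.xml
        public void inform(ResourceLoader loader) {
            try {
                rules = loader.getLines(args.get("rules"));
            } catch (IOException e) {
                throw new RuntimeException(e);
            }
        }

        public TokenStream create(TokenStream input) {
            // a real implementation would wrap `input` in a filter
            // driven by `rules`
            return input;
        }
    }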


Search in two core of solr with a single search query

2009-02-27 Thread Sagar Khetkade

Hi,

 

I have an issue here: I want to fire a single search query against two Solr
cores that have different indexes, and merge the result sets. This was
possible in Lucene using MultiSearcher and then merging the results. Please
suggest how I can do it in Solr.

 

Thanks,

Sagar Khetakde


Re: Direct control over document position in search results

2009-02-27 Thread Erik Hatcher


On Feb 23, 2009, at 7:46 PM, Ercan, Tolga wrote:
I was wondering if there was any facility to directly manipulate  
search results based on business criteria to place documents at a  
fixed position in those results. For example, when I issue a query,  
the first four results would be based on natural search relevancy,  
then the fifth result would be based on the most relevant document  
when doctype:video (if I had a doctype field of course), then  
results 6...* would resume natural search relevancy?


Yes, Query Elevation: http://wiki.apache.org/solr/QueryElevationComponent

Or perhaps a variation on this, if the document where doctype:video  
would appear at a fixed position or better... For example, if  
somebody searched for "my widget video", there would be a relevant  
document at a higher position than #5...


I don't believe the query elevation component does this, but you could
certainly do boosting to nudge particular document types' scores around.
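
For the archive: the elevation component is driven by an elevate.xml along
these lines (the query text and ids are placeholders):

    <elevate>
      <query text="my widget video">
        <!-- pinned to the top of the results, in this order -->
        <doc id="video-101" />
        <doc id="video-102" />
      </query>
    </elevate>

And for the softer nudging Erik mentions, a dismax boost query such as
bq=doctype:video^5 (field name and boost hypothetical) raises matching
documents without pinning their positions.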


Erik



Re: Search schema using q Query

2009-02-27 Thread Erik Hatcher
One first step is to use debugQuery=true as an additional parameter to  
your search request.  That'll return debug info in the response, which  
includes a couple of views of the parsed query.
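
(Concretely: append &debugQuery=true to the request, e.g.
http://localhost:8983/solr/select?q=test&debugQuery=true, with host and
query as placeholders.)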


Erik

On Feb 26, 2009, at 2:05 AM, dabboo wrote:



Hi,

I am trying to search the schema with the q query parameter. The query
which gets formed is:

+(programJacketImage_program_s:test | courseCodeSeq_course_s:test |
authorLastName_product_s:test | Index_Type_s:test |  
prdMainTitle_s:test^10.0

| discCode_course_s:test | sourceGroupName_course_s:test |
indexType_course_s:test | prdMainTitle_product_s:test |
isbn10_product_s:test | displayName_course_s:test |  
groupNm_program_s:test |

discipline_product_s:test | courseJacketImage_course_s:test |
imprint_product_s:test | introText_program_s:test |
productType_product_s:test | isbn13_product_s:test |
copyrightYear_product_s:test | prdPubDate_product_s:test |
programType_program_s:test | editor_product_s:test |
courseType_course_s:test | productURL_s:test^1.0 |
courseId_course_s:test | categoryIds_product_s:test |
indexType_program_s:test | strapline_product_s:test |
subCompany_course_s:test | aluminator_product_s:test |  
readBy_product_s:test

| subject_product_s:test | edition_product_s:test |
programId_program_s:test)~0.01 () all:english^90.0 all:hindi^123.0
all:glorious^2000.0 all:highlight^1.0E7 all:math^100.0 all:ab^12.0
all:erer^4545.0

This query is correct and returns results. I am looking for the class
file where the actual searching takes place. I want to see how it
interprets the query and how it returns the result.

I am trying to customize the searching logic for our specific needs.

Please help.

Thanks,
Amit Garg
--
View this message in context: 
http://www.nabble.com/Search-schema-using-q-Query-tp22218801p22218801.html
Sent from the Solr - User mailing list archive at Nabble.com.




Re: Search in two core of solr with a single search query

2009-02-27 Thread Otis Gospodnetic

Sagar,

You can use DistributedSearch (check this on the Wiki) for that.
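
Roughly like this (host and core names are placeholders; the cores must
share a compatible schema and a unique key field):

    import org.apache.solr.client.solrj.SolrQuery;
    import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;

    public class TwoCoreSearch {
        public static void main(String[] args) throws Exception {
            // the core the request is sent to coordinates the merge
            CommonsHttpSolrServer solr =
                new CommonsHttpSolrServer("http://localhost:8983/solr/core0");
            SolrQuery q = new SolrQuery("foo");
            // Solr fans the query out to every listed shard and merges
            // the results, much like Lucene's MultiSearcher did
            q.set("shards",
                  "localhost:8983/solr/core0,localhost:8983/solr/core1");
            System.out.println(solr.query(q).getResults().getNumFound());
        }
    }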

Otis
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch



- Original Message 
> From: Sagar Khetkade 
> To: "solr-user@lucene.apache.org" 
> Sent: Friday, February 27, 2009 3:29:57 AM
> Subject: Search in two core of solr with a single search query
> 
> 
> Hi,
> 
> 
> 
> I have an issue here: I want to fire a single search query against two Solr
> cores that have different indexes, and merge the result sets. This was
> possible in Lucene using MultiSearcher and then merging the results. Please
> suggest how I can do it in Solr.
> 
> 
> 
> Thanks,
> 
> Sagar Khetakde
> 



Re: warming question

2009-02-27 Thread Marc Sturlese

Hey, 
I am working with a nightly and just had to apply the modifications in the
code and add a couple of lines in solrconfig.xml (as it's shown in the
patch). Didn't it work for you?
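
For Jonathan's question below, the stock static setup in solrconfig.xml
looks roughly like this (the query values are placeholders):

    <listener event="newSearcher" class="solr.QuerySenderListener">
      <arr name="queries">
        <lst><str name="q">popular query</str><str name="rows">10</str></lst>
        <lst><str name="q">another one</str><str name="sort">price asc</str></lst>
      </arr>
    </listener>

A firstSearcher listener with the same shape covers the very first searcher
after startup.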

Jonathan Haddad wrote:
> 
> Does anyone have any good documentation that explains how to set up
> the warming feature within the config?
> 
> On Wed, Feb 25, 2009 at 11:58 AM, Marc Sturlese 
> wrote:
>>
>> Shalin your patch worked perfect for my use case.
>> Thank's both for the information!
>>
>>
>>
>> Amit Nithian wrote:
>>>
>>> I'm actually working on one for my company which parses our tomcat log
>>> files
>>> to obtain queries to feed as warming queries (since GET queries are the
>>> dominant source of queries) to the firstSearcher. I am not sure what the
>>> interface is in Solr 1.3, but in 1.2, I implemented the
>>> SolrEventListener
>>> interface and overrode the newSearcher method. If you look at the source
>>> for
>>> the default warmer, you should be able to construct a list of queries
>>> from
>>> a
>>> different source without much trouble.
>>> I might be able to send you some code if you need it.
>>>
>>> - Amit
>>>
>>> On Tue, Feb 24, 2009 at 10:15 AM, Marc Sturlese
>>> wrote:
>>>

>>>> Hey there,
>>>> Is there any dynamic way to specify the queries to do the warming? I
>>>> mean,
>>>> not writing them hardcoded in solrconfig.xml but getting them from a
>>>> database or from another file??
>>>> Thanks in advance
>>>> --
>>>> View this message in context:
>>>> http://www.nabble.com/warming-question-tp22187322p22187322.html
>>>> Sent from the Solr - User mailing list archive at Nabble.com.


>>>
>>>
>>
>> --
>> View this message in context:
>> http://www.nabble.com/warming-question-tp22187322p22210458.html
>> Sent from the Solr - User mailing list archive at Nabble.com.
>>
>>
> 
> 
> 
> -- 
> Jonathan Haddad
> http://www.rustyrazorblade.com
> 
> 

-- 
View this message in context: 
http://www.nabble.com/warming-question-tp22187322p22242609.html
Sent from the Solr - User mailing list archive at Nabble.com.



Re: Use of scanned documents for text extraction and indexing

2009-02-27 Thread Vikram Kumar
Check this: http://code.google.com/p/ocropus/wiki/FrequentlyAskedQuestions

> How well does it work?
>
> The character recognition accuracy of OCRopus right now (04/2007) is about
> like Tesseract. That's because the only character recognition plug-in in
> OCRopus is, in fact, Tesseract. In the future, there will be additional
> character recognition plug-ins, both for Latin and for other character sets.
>
> The big area of improvement relative to other open source OCR systems right
> now is in the area of layout analysis; in our benchmarks, OCRopus greatly
> reduces layout errors compared to other open source systems.

OCR is only a part of the solution with scanned documents, i.e., it
recognizes text.

For structural/semantic understanding of documents, you need engines like
OCRopus that can do layout analysis and provide meaningful data for document
analysis and understanding.

From their own Wiki:

> Should I use OCRopus or Tesseract?
>
> You might consider using OCRopus right now if you require layout analysis,
> if you want to contribute to it, if you find its output format more
> convenient (HTML with embedded OCR information), and/or if you anticipate
> requiring some of its other capabilities in the future (pluggability,
> multiple scripts, statistical language models, etc.).
>
> In terms of character error rates, OCRopus performs similar to Tesseract. In
> terms of layout analysis, OCRopus is significantly better than Tesseract.
>
> The main reasons not to use OCRopus yet is that it hasn't been packaged yet,
> that it has limited multi-platform support, and that it runs somewhat
> slower. We hope to address all those issues by the beta release.


On Thu, Feb 26, 2009 at 11:35 PM, Shashi Kant  wrote:

> Can anyone back that up?
>
> IMHO Tesseract is the state-of-the-art in OCR, but not sure that "Ocropus
> builds on Tesseract".
> Can you confirm that Vikram has a point?
>
> Shashi
>
>
>
>
> - Original Message 
> From: Vikram Kumar 
> To: solr-user@lucene.apache.org; Shashi Kant 
> Sent: Thursday, February 26, 2009 9:21:07 PM
> Subject: Re: Use of scanned documents for text extraction and indexing
>
> Tesseract is pure OCR. Ocropus builds on Tesseract.
> Vikram
>
> On Thu, Feb 26, 2009 at 12:11 PM, Shashi Kant 
> wrote:
>
> > Another project worth investigating is Tesseract.
> >
> > http://code.google.com/p/tesseract-ocr/
> >
> >
> >
> >
> > - Original Message 
> > From: Hannes Carl Meyer 
> > To: solr-user@lucene.apache.org
> > Sent: Thursday, February 26, 2009 11:35:14 AM
> > Subject: Re: Use of scanned documents for text extraction and indexing
> >
> > Hi Sithu,
> >
> > there is a project called ocropus done by the DFKI, check the online demo
> > here: http://demo.iupr.org/cgi-bin/main.cgi
> >
> > And also http://sites.google.com/site/ocropus/
> >
> > Regards
> >
> > Hannes
> >
> > m...@hcmeyer.com
> > http://mimblog.de
> >
> > On Thu, Feb 26, 2009 at 5:29 PM, Sudarsan, Sithu D. <
> > sithu.sudar...@fda.hhs.gov> wrote:
> >
> > >
> > > Hi All:
> > >
> > > Is there any study / research done on using scanned paper documents as
> > > images (may be PDF), and then use some OCR or other technique for
> > > extracting text, and the resultant index quality?
> > >
> > >
> > > Thanks in advance,
> > > Sithu D Sudarsan
> > >
> > > sithu.sudar...@fda.hhs.gov
> > > sdsudar...@ualr.edu
> > >
> > >
> > >
> >
> >
>
>


Re: uploading binary file or rich document through SOLRJ

2009-02-27 Thread Erik Hatcher
It actually is possible now to make that sort of request... note that
he's not actually posting file content in the request, but using a
simple HTTP GET parameter, stream.file.  Using the SolrJ library, you
can simply use the add/set methods on SolrQuery.
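
A sketch of that, reusing Erwin's own parameters; whether qt can route to
/update/rich this way depends on the handler configuration, so treat it as
an untested outline:

    import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;
    import org.apache.solr.common.params.ModifiableSolrParams;

    public class RichDocPost {
        public static void main(String[] args) throws Exception {
            CommonsHttpSolrServer solr =
                new CommonsHttpSolrServer("http://localhost:8983/solr");
            ModifiableSolrParams p = new ModifiableSolrParams();
            p.set("qt", "/update/rich");     // route to the rich handler
            p.set("stream.type", "doc");
            p.set("stream.file", "SOLR_HOME/test.pdf.doc"); // read server-side
            p.set("stream.fieldname", "name");
            p.set("id", "101");
            p.set("commit", true);
            solr.query(p);                   // issues the GET request
        }
    }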


But yes, it would be great for SolrJ to support posting arbitrary  
streams.


Erwin, the rich content update handler patch that you are apparently  
using has been refactored and built into Solr's trunk now (nicknamed  
Solr Cell, aka ExtractingRequestHandler).   You'll be better off using  
the new stuff.


Erik

On Feb 25, 2009, at 10:51 PM, Noble Paul നോബിള്‍  
नोब्ळ् wrote:



it is possible , but with some work.

you may need to write a new RequestWriter implementation which extends
org.apache.solr.client.solrj.request.RequestWriter for that.

It will be a nice addition to SolrJ if it can be contributed back.



On Thu, Feb 26, 2009 at 9:04 AM, Erwin Lawardy   
wrote:


Hi All,

I have been uploading my rich documents (pdf/doc/xls) through a URL and
it works properly.

http://localhost:8983/solr/update/rich?stream.type=doc&stream.file=SOLR_HOME/test.pdf.doc&id=101&stream.fieldname=name&commit=true

Is there a way to do it through SolrJ? I am trying to build an
application to post/upload it programmatically.


Thanks,

Erwin






--
--Noble Paul




passing parameters into the XSLTResponseWriter: particularly hostname

2009-02-27 Thread Fergus McMenemie
Hello all,

I was wondering if there was a way of passing parameters into 
the XSLTResponseWriter writer.

I always like the option of formatting my search results as an 
RSS feed. Users can therefore configure their phone, browser etc
to automatically redo a search every so often and have new items
in the result set highlighted to them.

However many RSS clients require links to the underlying content 
to be absolute. So I need to pass in the full hostname, of the
machine serving the results, to the transform generating my RSS
feed. How do I do this?

Regards Fergus
-- 

===
Fergus McMenemie   Email:fer...@twig.me.uk
Techmore Ltd   Phone:(UK) 07721 376021

Unix/Mac/Intranets Analyst Programmer
===


Re: custom reranking

2009-02-27 Thread Grant Ingersoll


On Feb 26, 2009, at 11:16 PM, CIF Search wrote:

I believe the query component will generate the query in such a way that I
get the results that I want, but not process the returned results; is that
correct? Is there a way in which I can group the returned results, rank
each group separately, and return the results together? In other words,
which component do I need to write to reorder the returned results as per
my requirements?


I'd have a look at what I did for the Clustering patch, i.e.  
SOLR-769.  It may even be the case that you can simply plug in your own
SolrClusterer or whatever it's called.  Or, if it doesn't quite fit  
your needs, give me feedback/patch and we can update it.  I'm  
definitely open to ideas on it.






Also, the deduplication patch seems interesting, but it doesn't
appear to be

expected to work across multiple shards.



Yeah, that does seem a bit tricky.  Since Solr doesn't support  
distributed indexing, it would be tricky to support just yet.




Regards,
CI

On Thu, Feb 26, 2009 at 8:03 PM, Grant Ingersoll  
wrote:




On Feb 26, 2009, at 6:04 AM, CIF Search wrote:

We have a distributed index consisting of several shards. There  
could be
some documents repeated across shards. We want to remove the  
duplicate
records from the documents returned from the shards, and re-order  
the

results by grouping them on the basis of a clustering algorithm and
reranking the documents within a cluster on the basis of log of a
particular
returned field value.




I think you would have to implement your own QueryComponent.   
However, you

may be able to get away with implementing/using Solr's FunctionQuery
capabilities.

FieldCollapsing is also a likely source of inspiration/help
(http://www.lucidimagination.com/search/?q=Field+Collapsing#/s:email,issues)

As a side note, have you looked at
http://issues.apache.org/jira/browse/SOLR-769 ?

You might also have a look at the de-duplication patch that is working its
way through dev: http://wiki.apache.org/solr/Deduplication




How do we go about achieving this? Should we write this logic by
implementing QueryResponseWriter? Also, if we remove duplicate records, the
total number of records that are actually returned is less than what was
asked for in the query.

Regards,
CI



--
Grant Ingersoll
http://www.lucidimagination.com/

Search the Lucene ecosystem (Lucene/Solr/Nutch/Mahout/Tika/Droids)  
using

Solr/Lucene:
http://www.lucidimagination.com/search




--
Grant Ingersoll
http://www.lucidimagination.com/

Search the Lucene ecosystem (Lucene/Solr/Nutch/Mahout/Tika/Droids)  
using Solr/Lucene:

http://www.lucidimagination.com/search



ApacheCon Lucene Meetup

2009-02-27 Thread Grant Ingersoll
If you're in or around Amsterdam during the week of ApacheCon (Mar  
23-27), check out the Lucene Meetup we are organizing: http://wiki.apache.org/lucene-java/LuceneMeetupMarch2009



--
Grant Ingersoll
http://www.lucidimagination.com/

Search the Lucene ecosystem (Lucene/Solr/Nutch/Mahout/Tika/Droids)  
using Solr/Lucene:

http://www.lucidimagination.com/search



RE: Use of scanned documents for text extraction and indexing

2009-02-27 Thread Sudarsan, Sithu D.
 

Thanks to all who have responded (Hannes, Shashi, Vikram, Bastian,
Renaud and the rest).

Using OCRopus might provide the flexibility to use multi-column
documents and formatted ones.

Regarding literature on OCR, a few follow-ups to the paper link Renaud
provided do exist, but I could not locate anything significant.

I'll update if I can find something useful to report.



Sincerely,
Sithu 
sithu.sudar...@fda.hhs.gov
sdsudar...@ualr.edu

-Original Message-
From: Vikram Kumar [mailto:vikrambku...@gmail.com] 
Sent: Friday, February 27, 2009 5:44 AM
To: solr-user@lucene.apache.org; Shashi Kant
Subject: Re: Use of scanned documents for text extraction and indexing

Check this:
http://code.google.com/p/ocropus/wiki/FrequentlyAskedQuestions

> How well does it work?
>
> The character recognition accuracy of OCRopus right now (04/2007) is about
> like Tesseract. That's because the only character recognition plug-in in
> OCRopus is, in fact, Tesseract. In the future, there will be additional
> character recognition plug-ins, both for Latin and for other character sets.
>
> The big area of improvement relative to other open source OCR systems right
> now is in the area of layout analysis; in our benchmarks, OCRopus greatly
> reduces layout errors compared to other open source systems.

OCR is only a part of the solution with scanned documents, i.e., it
recognizes text.

For structural/semantic understanding of documents, you need engines
like
OCRopus that can do layout analysis and provide meaningful data for
document
analysis and understanding.

From their own Wiki:

> Should I use OCRopus or Tesseract?
>
> You might consider using OCRopus right now if you require layout analysis,
> if you want to contribute to it, if you find its output format more
> convenient (HTML with embedded OCR information), and/or if you anticipate
> requiring some of its other capabilities in the future (pluggability,
> multiple scripts, statistical language models, etc.).
>
> In terms of character error rates, OCRopus performs similar to Tesseract. In
> terms of layout analysis, OCRopus is significantly better than Tesseract.
>
> The main reasons not to use OCRopus yet is that it hasn't been packaged yet,
> that it has limited multi-platform support, and that it runs somewhat
> slower. We hope to address all those issues by the beta release.


On Thu, Feb 26, 2009 at 11:35 PM, Shashi Kant 
wrote:

> Can anyone back that up?
>
> IMHO Tesseract is the state-of-the-art in OCR, but not sure that
"Ocropus
> builds on Tesseract".
> Can you confirm that Vikram has a point?
>
> Shashi
>
>
>
>
> - Original Message 
> From: Vikram Kumar 
> To: solr-user@lucene.apache.org; Shashi Kant 
> Sent: Thursday, February 26, 2009 9:21:07 PM
> Subject: Re: Use of scanned documents for text extraction and indexing
>
> Tesseract is pure OCR. Ocropus builds on Tesseract.
> Vikram
>
> On Thu, Feb 26, 2009 at 12:11 PM, Shashi Kant 
> wrote:
>
> > Another project worth investigating is Tesseract.
> >
> > http://code.google.com/p/tesseract-ocr/
> >
> >
> >
> >
> > - Original Message 
> > From: Hannes Carl Meyer 
> > To: solr-user@lucene.apache.org
> > Sent: Thursday, February 26, 2009 11:35:14 AM
> > Subject: Re: Use of scanned documents for text extraction and
indexing
> >
> > Hi Sithu,
> >
> > there is a project called ocropus done by the DFKI, check the online
demo
> > here: http://demo.iupr.org/cgi-bin/main.cgi
> >
> > And also http://sites.google.com/site/ocropus/
> >
> > Regards
> >
> > Hannes
> >
> > m...@hcmeyer.com
> > http://mimblog.de
> >
> > On Thu, Feb 26, 2009 at 5:29 PM, Sudarsan, Sithu D. <
> > sithu.sudar...@fda.hhs.gov> wrote:
> >
> > >
> > > Hi All:
> > >
> > > Is there any study / research done on using scanned paper
documents as
> > > images (may be PDF), and then use some OCR or other technique for
> > > extracting text, and the resultant index quality?
> > >
> > >
> > > Thanks in advance,
> > > Sithu D Sudarsan
> > >
> > > sithu.sudar...@fda.hhs.gov
> > > sdsudar...@ualr.edu
> > >
> > >
> > >
> >
> >
>
>


Re: Custom Sorting

2009-02-27 Thread psyron

I was successful with your hint and just need to solve another problem:

The problem is that I have implemented a custom sort by following your
advice to code a QParserPlugin and to create a custom comparator as
described in your book, and it really works.
But now I also would like to return those computed sort values by adding
them to the SolrQueryResponse.
I am calculating distances and would like to return the distance from the
origin for each search result.

In your book you describe that it is possible by using this lucene search
function:
TopFieldDocs docs = searcher.search(query, null, 3, sort);

and then to read the sort values:
FieldDoc fieldDoc = (FieldDoc) docs.scoreDocs[0];
return -> fieldDoc.fields[0]

But how can I do this inside Solr?
I am using the default QueryComponent and of course I don't want to make
too many changes, because I don't understand the internals of Solr very
well; it's quite big and complicated, and I didn't find much documentation
explaining it.

Is there maybe a workaround? Can I just store all my sort values and add
them to the SolrQueryResponse
at the end?

Thanks,
Markus
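
One workaround sketch (untested; the component name and response key are
made up): register an extra SearchComponent in the handler's
last-components list, recompute the distance for each returned doc there,
and attach the values to the response, instead of digging the FieldDoc sort
values out of QueryComponent:

    import java.io.IOException;

    import org.apache.solr.common.util.NamedList;
    import org.apache.solr.handler.component.ResponseBuilder;
    import org.apache.solr.handler.component.SearchComponent;

    public class SortValuesComponent extends SearchComponent {

        public void prepare(ResponseBuilder rb) throws IOException {}

        public void process(ResponseBuilder rb) throws IOException {
            // walk the returned DocList here and recompute the distance per
            // docid (cheap if the comparator's inputs are cached); a stub
            // value stands in for the real computation
            NamedList distances = new NamedList();
            distances.add("someDocId", 12.3f);
            rb.rsp.add("distances", distances);
        }

        public String getDescription() { return "adds computed sort values"; }
        public String getVersion() { return "1.0"; }
        public String getSourceId() { return ""; }
        public String getSource() { return ""; }
    }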


Erik Hatcher wrote:
> 
> Markus,
> 
> A couple of code pointers for you:
> 
>* QueryComponent - this is where results are generated, it uses a  
> SortSpec from the QParser.
> 
>* QParser#getSort - creating a custom QParser you'll be able to  
> wire in your own custom sort
> 
> You can write your own QParserPlugin and QParser, and configure it  
> into solrconfig.xml and should be good to go.  Subclassing existing  
> classes, this should only be a handful of lines of code to do.
> 
>   Erik
> 
> 
> On Dec 16, 2008, at 3:54 AM, psyron wrote:
> 
>>
>> I have the same problem, also need to plug in my "customComparator",
>> but as
>> there is no explanation of the framework, how a RequestHandler is  
>> working,
>> what comes in, what comes out ... just impossible!
>>
>> Can someone explain where i have to add which code, to just have the  
>> same
>> functionality as the StandardRequestHandler, but also adding a custom
>> sorting?
>>
>> Thanks,
>> Markus
>>
>>
>> hossman wrote:
>>>
>>>
>>> : Sort sort = new Sort(new SortField[]
>>> : { SortField.FIELD_SCORE, new SortField(customValue,
>>> SortField.FLOAT,
>>> : true) });
>>> : indexSearcher.search(q, sort)
>>>
>>> that appears to just be a sort on score with a secondary reversed
>>> float sort on whatever field name is in the variable "customValue" ...
>>> assuming the field name is "FIELD" that's the same thing as...
>>>   sort=score+asc,+FIELD+desc
>>>
>>> : Sort sort = new Sort(new SortField(customValue, customComparator))
>>> : indexSearcher.search(q, sort)
>>>
>>> this is using a custom SortComparatorSource -- code you (or someone  
>>> else)
>>> has written which is not part of Lucene and which tells lucene how to
>>> order the documents using whatever crazy logic it wants ... for  
>>> obvious
>>> reasons Solr can't do that same logic (since it doesn't know what  
>>> it is)
>>>
>>> although many things in Solr are easily customizable, just by writing a
>>> little factory and configuring it by class name, I'm afraid
>>> SortComparatorSources aren't one of them.  You could write a custom
>>> RequestHandler which used your SortComparatorSource, or you could write a
>>> custom FieldType that used it anytime someone sorted on that field ...
>>> but those are the best options I can think of.
>>>
>>>
>>>
>>> -Hoss
>>>
>>>
>>>
>>
>> -- 
>> View this message in context:
>> http://www.nabble.com/Custom-Sorting-tp1659p21029370.html
>> Sent from the Solr - User mailing list archive at Nabble.com.
> 
> 
> 

-- 
View this message in context: 
http://www.nabble.com/Custom-Sorting-tp1659p22248512.html
Sent from the Solr - User mailing list archive at Nabble.com.



Re: Lucene sync bottleneck?

2009-02-27 Thread Yonik Seeley
I'm using trunk, but I set a breakpoint on SegmentReader.isDeleted()
on an index with deletions, and I couldn't get it to be called.

numDocs : 26
maxDoc : 130
reader:SolrIndexReader{this=1935e6f,r=readonlymultisegmentrea...@1935e6f,segments=5}


-Yonik
http://www.lucidimagination.com


On Thu, Feb 26, 2009 at 4:55 PM, Matthew Runo  wrote:
> I see a ReadOnlySegmentReader now - we're on an optimized index now which
> gets around the isDeleted() check.
>
> (solr4, optimized)
> searcherName : searc...@260f8e27 main
> caching : true
> numDocs : 139583
> maxDoc : 139583
> readerImpl : ReadOnlySegmentReader
> readerDir :
> org.apache.lucene.store.NIOFSDirectory@/opt/solr-data/zeta-main/index
> indexVersion : 1233423823917
> openedAt : Thu Feb 26 13:29:25 PST 2009
> registeredAt : Thu Feb 26 13:29:42 PST 2009
> warmupTime : 16910
>
> (solr1, non optimized)
> searcherName : searc...@36be11a1 main
> caching : true
> numDocs : 139561
> maxDoc : 139591
> readerImpl : ReadOnlyMultiSegmentReader
> readerDir :
> org.apache.lucene.store.NIOFSDirectory@/opt/solr-data/zeta-main/index
> indexVersion : 1233423823924
> openedAt : Thu Feb 26 13:48:16 PST 2009
> registeredAt : Thu Feb 26 13:49:11 PST 2009
> warmupTime : 54785
>
> I did a thread dump against the optimized server just now, but didn't find
> anything blocked to check which reader was actually in use this time.
>
> Thanks for your time!
>
> Matthew Runo
> Software Engineer, Zappos.com
> mr...@zappos.com - 702-943-7833
>
> On Feb 26, 2009, at 1:39 PM, Yonik Seeley wrote:
>
>> That's interesting.
>> We should be using read-only readers, which should not synchronize on
>> the deleted docs check.  But as your stack trace shows, you're using
>> SegmentReader and MultiSegmentReader.
>>
>> Right now, if I look at the admin/statistics page at the searcher, it
>> shows the following for the reader:
>>
>>
>> reader:SolrIndexReader{this=42f352,r=readonlymultisegmentrea...@42f352,segments=6}
>>
>> Hopefully the fact that it's a ReadOnlyMultiSegmentReader means that
>> it contains ReadOnlySegmentReader instances, which don't synchronize
>> on isDeleted.
>>
>> What do you see?
>>
>> -Yonik
>>
>> On Thu, Feb 26, 2009 at 4:09 PM, Matthew Runo  wrote:
>>>
>>> Hello folks!
>>>
>>> I was under the impression that this sync bottleneck was fixed in recent
>>> versions of Solr/Lucene, but we're seeing it with 1.4-dev right now. When
>>> we
>>> load test a server with >100 threads (using jmeter), we see several
>>> threads
>>> all blocked at the same spot:
>>>
>>> "http-8080-exec-505" - Thread t...@594
>>>  java.lang.Thread.State: BLOCKED on
>>> org.apache.lucene.index.segmentrea...@2b6f5d18 owned by:
>>> http-8080-exec-434
>>>       at
>>> org.apache.lucene.index.SegmentReader.isDeleted(SegmentReader.java:737)
>>>       at
>>>
>>> org.apache.lucene.index.MultiSegmentReader.isDeleted(MultiSegmentReader.java:266)
>>>       at
>>>
>>> org.apache.solr.search.function.FunctionQuery$AllScorer.next(FunctionQuery.java:118)
>>>       at
>>>
>>> org.apache.solr.search.function.FunctionQuery$AllScorer.skipTo(FunctionQuery.java:137)
>>>       at
>>>
>>> org.apache.lucene.search.BooleanScorer2$SingleMatchScorer.skipTo(BooleanScorer2.java:170)
>>>       at
>>> org.apache.lucene.search.ReqOptSumScorer.score(ReqOptSumScorer.java:76)
>>>       at
>>> org.apache.lucene.search.BooleanScorer2.score(BooleanScorer2.java:357)
>>>       at
>>> org.apache.lucene.search.BooleanScorer2.score(BooleanScorer2.java:320)
>>>       at
>>> org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:136)
>>>       at org.apache.lucene.search.Searcher.search(Searcher.java:126)
>>>       at org.apache.lucene.search.Searcher.search(Searcher.java:105)
>>>       at
>>>
>>> org.apache.solr.search.SolrIndexSearcher.getDocListAndSetNC(SolrIndexSearcher.java:1231)
>>>       at
>>>
>>> org.apache.solr.search.SolrIndexSearcher.getDocListC(SolrIndexSearcher.java:917)
>>>       at
>>>
>>> org.apache.solr.search.SolrIndexSearcher.search(SolrIndexSearcher.java:338)
>>>       at
>>>
>>> org.apache.solr.handler.component.QueryComponent.process(QueryComponent.java:164)
>>>       at
>>>
>>> org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:171)
>>>       at
>>>
>>> org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:131)
>>>       at org.apache.solr.core.SolrCore.execute(SolrCore.java:1333)
>>>       at
>>>
>>> org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:303)
>>>       at
>>>
>>> org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:232)
>>>       at
>>>
>>> org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:235)
>>>       at
>>>
>>> org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206)
>>>       at
>>>
>>> org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:233)
>>>       at
>>>
>>> org.apache.catalina.core.Standard

Re: Lucene sync bottleneck?

2009-02-27 Thread Matthew Runo

We're using:

Solr Specification Version: 1.3.0.2009.01.23.10.46.02
Solr Implementation Version: 1.4-dev 737141M - root - 2009-01-23  
10:46:02

Lucene Specification Version: 2.9-dev
Lucene Implementation Version: 2.9-dev 724059 - 2008-12-06 20:08:54

We'll see about getting up to trunk and firing off our load test and  
seeing if we can get it to happen with that.


Thanks for your time!

Matthew Runo
Software Engineer, Zappos.com
mr...@zappos.com - 702-943-7833

On Feb 27, 2009, at 7:44 AM, Yonik Seeley wrote:


I'm using trunk, but I set a breakpoint on SegmentReader.isDeleted()
on an index with deletions, and I couldn't get it to be called.

numDocs : 26
maxDoc : 130
reader:SolrIndexReader 
{this=1935e6f,r=readonlymultisegmentrea...@1935e6f,segments=5}



-Yonik
http://www.lucidimagination.com


On Thu, Feb 26, 2009 at 4:55 PM, Matthew Runo   
wrote:
I see a ReadOnlySegmentReader now - we're on an optimized index now  
which

gets around the isDeleted() check.

(solr4, optimized)
searcherName : searc...@260f8e27 main
caching : true
numDocs : 139583
maxDoc : 139583
readerImpl : ReadOnlySegmentReader
readerDir :
org.apache.lucene.store.NIOFSDirectory@/opt/solr-data/zeta-main/index
indexVersion : 1233423823917
openedAt : Thu Feb 26 13:29:25 PST 2009
registeredAt : Thu Feb 26 13:29:42 PST 2009
warmupTime : 16910

(solr1, non optimized)
searcherName : searc...@36be11a1 main
caching : true
numDocs : 139561
maxDoc : 139591
readerImpl : ReadOnlyMultiSegmentReader
readerDir :
org.apache.lucene.store.NIOFSDirectory@/opt/solr-data/zeta-main/index
indexVersion : 1233423823924
openedAt : Thu Feb 26 13:48:16 PST 2009
registeredAt : Thu Feb 26 13:49:11 PST 2009
warmupTime : 54785

I did a thread dump against the optimized server just now, but  
didn't find

anything blocked to check which reader was actually in use this time.

Thanks for your time!

Matthew Runo
Software Engineer, Zappos.com
mr...@zappos.com - 702-943-7833

On Feb 26, 2009, at 1:39 PM, Yonik Seeley wrote:


That's interesting.
We should be using read-only readers, which should not synchronize  
on

the deleted docs check.  But as your stack trace shows, you're using
SegmentReader and MultiSegmentReader.

Right now, if I look at the admin/statistics page at the searcher,  
it

shows the following for the reader:


reader:SolrIndexReader 
{this=42f352,r=readonlymultisegmentrea...@42f352,segments=6}


Hopefully the fact that it's a ReadOnlyMultiSegmentReader means that
it contains ReadOnlySegmentReader instances, which don't synchronize
on isDeleted.

What do you see?

-Yonik

On Thu, Feb 26, 2009 at 4:09 PM, Matthew Runo   
wrote:


Hello folks!

I was under the impression that this sync bottleneck was fixed in  
recent
versions of Solr/Lucene, but we're seeing it with 1.4-dev right  
now. When

we
load test a server with >100 threads (using jmeter), we see several
threads
all blocked at the same spot:

"http-8080-exec-505" - Thread t...@594
 java.lang.Thread.State: BLOCKED on
org.apache.lucene.index.segmentrea...@2b6f5d18 owned by:
http-8080-exec-434
  at org.apache.lucene.index.SegmentReader.isDeleted(SegmentReader.java:737)
  at org.apache.lucene.index.MultiSegmentReader.isDeleted(MultiSegmentReader.java:266)
  at org.apache.solr.search.function.FunctionQuery$AllScorer.next(FunctionQuery.java:118)
  at org.apache.solr.search.function.FunctionQuery$AllScorer.skipTo(FunctionQuery.java:137)
  at org.apache.lucene.search.BooleanScorer2$SingleMatchScorer.skipTo(BooleanScorer2.java:170)
  at org.apache.lucene.search.ReqOptSumScorer.score(ReqOptSumScorer.java:76)
  at org.apache.lucene.search.BooleanScorer2.score(BooleanScorer2.java:357)
  at org.apache.lucene.search.BooleanScorer2.score(BooleanScorer2.java:320)
  at org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:136)
  at org.apache.lucene.search.Searcher.search(Searcher.java:126)
  at org.apache.lucene.search.Searcher.search(Searcher.java:105)
  at org.apache.solr.search.SolrIndexSearcher.getDocListAndSetNC(SolrIndexSearcher.java:1231)
  at org.apache.solr.search.SolrIndexSearcher.getDocListC(SolrIndexSearcher.java:917)
  at org.apache.solr.search.SolrIndexSearcher.search(SolrIndexSearcher.java:338)
  at org.apache.solr.handler.component.QueryComponent.process(QueryComponent.java:164)
  at org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:171)
  at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:131)
  at org.apache.solr.core.SolrCore.execute(SolrCore.java:1333)
  at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:303)
  at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:232)
  at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationF

Re: Lucene sync bottleneck?

2009-02-27 Thread Chris Hostetter

: Solr Implementation Version: 1.4-dev 737141M - root - 2009-01-23 10:46:02

that M indicates there were local modifications (relative svn version 
#737141) at the time of compilation.

Do you have some local patches?
anything that would have affected the way IndexReaders get opened?



-Hoss



Re: Lucene sync bottleneck?

2009-02-27 Thread Matthew Runo
We're just using an SVN up, with no local modifications. It's probably  
a formatting difference from having opened solr in an IDE.


We're building from lucene and solr trunk right now, and I'll let you  
all know how that goes. We'll test it as best we can with JMeter. The  
build we had up there was breaking with between 100 and 200  
simultaneous threads (due to blocking on isDeleted).


Thanks for your time!

Matthew Runo
Software Engineer, Zappos.com
mr...@zappos.com - 702-943-7833

On Feb 27, 2009, at 8:01 AM, Chris Hostetter wrote:



: Solr Implementation Version: 1.4-dev 737141M - root - 2009-01-23  
10:46:02


that M indicates there were local modifications (relative svn version
#737141) at the time of compilation.

Do you have some local patches?
anything that would have affected the way IndexReaders get opened?



-Hoss





Re: Lucene sync bottleneck?

2009-02-27 Thread Matthew Runo

OK. Call me chicken little.

We must have had bad class files or something hanging out in our build  
that had the issues. Having built from trunk, we're seeing perfectly  
fine response times even at 500 requests a second.


Thank you for your help, and sorry to bring it up without testing trunk.

Thanks again for your time!

Matthew Runo
Software Engineer, Zappos.com
mr...@zappos.com - 702-943-7833

On Feb 27, 2009, at 8:11 AM, Matthew Runo wrote:

We're just using an SVN up, with no local modifications. It's  
probably a formatting difference from having opened solr in an IDE.


We're building from lucene and solr trunk right now, and I'll let  
you all know how that goes. We'll test it as best we can with  
JMeter. The build we had up there was breaking with between 100 and  
200 simultaneous threads (due to blocking on isDeleted).


Thanks for your time!

Matthew Runo
Software Engineer, Zappos.com
mr...@zappos.com - 702-943-7833

On Feb 27, 2009, at 8:01 AM, Chris Hostetter wrote:



: Solr Implementation Version: 1.4-dev 737141M - root - 2009-01-23  
10:46:02


that M indicates there were local modifications (relative svn version
#737141) at the time of compilation.

Do you have some local patches?
anything that would have affected the way IndexReaders get opened?



-Hoss







Trunk Replication Page Issue

2009-02-27 Thread Jeff Newburn
In trying trunk to fix the Lucene sync issue, we have now encountered a
severe Java exception making the replication page non-functional.  Am I
missing something or doing something wrong?

Info:
Slave server on the replication page.  Just a stack dump, as follows.

Feb 27, 2009 8:44:37 AM org.apache.solr.common.SolrException log
SEVERE: org.apache.jasper.JasperException: java.lang.NullPointerException
at org.apache.jasper.servlet.JspServletWrapper.service(JspServletWrapper.java:418)
at org.apache.jasper.servlet.JspServlet.serviceJspFile(JspServlet.java:337)
at org.apache.jasper.servlet.JspServlet.service(JspServlet.java:266)
at javax.servlet.http.HttpServlet.service(HttpServlet.java:803)
at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:290)
at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206)
at org.apache.catalina.core.ApplicationDispatcher.invoke(ApplicationDispatcher.java:630)
at org.apache.catalina.core.ApplicationDispatcher.processRequest(ApplicationDispatcher.java:436)
at org.apache.catalina.core.ApplicationDispatcher.doForward(ApplicationDispatcher.java:374)
at org.apache.catalina.core.ApplicationDispatcher.forward(ApplicationDispatcher.java:302)
at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:273)
at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:235)
at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206)
at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:233)
at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:175)
at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:128)
at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:102)
at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:109)
at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:286)
at org.apache.coyote.http11.Http11NioProcessor.process(Http11NioProcessor.java:879)
at org.apache.coyote.http11.Http11NioProtocol$Http11ConnectionHandler.process(Http11NioProtocol.java:719)
at org.apache.tomcat.util.net.NioEndpoint$SocketProcessor.run(NioEndpoint.java:2080)
at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:885)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:907)
at java.lang.Thread.run(Thread.java:619)
Caused by: java.lang.NullPointerException
at org.apache.jsp.admin.replication.index_jsp._jspService(index_jsp.java:294)
at org.apache.jasper.runtime.HttpJspBase.service(HttpJspBase.java:70)
at javax.servlet.http.HttpServlet.service(HttpServlet.java:803)
at org.apache.jasper.servlet.JspServletWrapper.service(JspServletWrapper.java:374)
... 24 more


-- 
Jeff Newburn
Software Engineer, Zappos.com
jnewb...@zappos.com - 702-943-7562



Re: warming question

2009-02-27 Thread Jonathan Haddad
I'm using the latest stable - I'm brand new to solr and I don't know
where to find all the docs yet.  I'm guessing I should be looking at
this page: 
http://wiki.apache.org/solr/SolrCaching#head-34647c63c38782b2fc93c919bb34f8c795a1ee65

I have an index of 1.5 million documents.  It's updated every few minutes.
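
A skeleton of the dynamic approach Amit describes further down the thread
(all names are hypothetical, and the actual query execution is only
sketched in a comment):

    import java.util.Arrays;
    import java.util.List;

    import org.apache.solr.common.util.NamedList;
    import org.apache.solr.core.SolrEventListener;
    import org.apache.solr.search.SolrIndexSearcher;

    public class ExternalWarmingListener implements SolrEventListener {

        public void init(NamedList args) {}

        public void postCommit() {}

        public void newSearcher(SolrIndexSearcher newSearcher,
                                SolrIndexSearcher currentSearcher) {
            for (String q : loadQueries()) {
                // execute q against newSearcher to populate its caches,
                // the way QuerySenderListener does via the request handlers
            }
        }

        // stub: a real version would read a database, log file, etc.
        private List<String> loadQueries() {
            return Arrays.asList("laptop", "camera");
        }
    }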

On Fri, Feb 27, 2009 at 1:54 AM, Marc Sturlese  wrote:
>
> Hey,
> I am working with a nightly and just had to apply the modifications in the
> code and add a couple of lines in solrconfig.xml (as it's shown in the
> patch). Didn't it work for you?
>
> Jonathan Haddad wrote:
>>
>> Does anyone have any good documentation that explains how to set up
>> the warming feature within the config?
>>
>> On Wed, Feb 25, 2009 at 11:58 AM, Marc Sturlese 
>> wrote:
>>>
>>> Shalin your patch worked perfect for my use case.
>>> Thank's both for the information!
>>>
>>>
>>>
>>> Amit Nithian wrote:

>>>> I'm actually working on one for my company which parses our tomcat log
>>>> files
>>>> to obtain queries to feed as warming queries (since GET queries are the
>>>> dominant source of queries) to the firstSearcher. I am not sure what the
>>>> interface is in Solr 1.3, but in 1.2, I implemented the
>>>> SolrEventListener
>>>> interface and overrode the newSearcher method. If you look at the source
>>>> for
>>>> the default warmer, you should be able to construct a list of queries
>>>> from
>>>> a
>>>> different source without much trouble.
>>>> I might be able to send you some code if you need it.
>>>>
>>>> - Amit
>>>>
>>>> On Tue, Feb 24, 2009 at 10:15 AM, Marc Sturlese
>>>> wrote:

>>>>>
>>>>> Hey there,
>>>>> Is there any dynamic way to specify the queries to do the warming? I
>>>>> mean,
>>>>> not writing them hardcoded in solrconfig.xml but getting them from a
>>>>> database or from another file??
>>>>> Thanks in advance
>>>>> --
>>>>> View this message in context:
>>>>> http://www.nabble.com/warming-question-tp22187322p22187322.html
>>>>> Sent from the Solr - User mailing list archive at Nabble.com.
>>>>>
>>>>>


>>>
>>> --
>>> View this message in context:
>>> http://www.nabble.com/warming-question-tp22187322p22210458.html
>>> Sent from the Solr - User mailing list archive at Nabble.com.
>>>
>>>
>>
>>
>>
>> --
>> Jonathan Haddad
>> http://www.rustyrazorblade.com
>>
>>
>
> --
> View this message in context: 
> http://www.nabble.com/warming-question-tp22187322p22242609.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>
>



-- 
Jonathan Haddad
http://www.rustyrazorblade.com


Redhat vs FreeBSD vs other unix flavors

2009-02-27 Thread wojtekpia

Is there a recommended unix flavor for deploying Solr on? I've benchmarked my
deployment on Red Hat. Our operations team asked if we can use FreeBSD
instead. Assuming that my benchmark numbers are consistent on FreeBSD, is
there anything else I should watch out for? 

Thanks.

Wojtek
-- 
View this message in context: 
http://www.nabble.com/Redhat-vs-FreeBSD-vs-other-unix-flavors-tp22251134p22251134.html
Sent from the Solr - User mailing list archive at Nabble.com.



Re: Redhat vs FreeBSD vs other unix flavors

2009-02-27 Thread Otis Gospodnetic

You should be fine on either Linux or FreeBSD (or any other UNIX flavour).  
Running on Solaris would probably give you access to goodness like dtrace, but 
you can live without it.

Otis
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch



- Original Message 
> From: wojtekpia 
> To: solr-user@lucene.apache.org
> Sent: Friday, February 27, 2009 1:03:13 PM
> Subject: Redhat vs FreeBSD vs other unix flavors
> 
> 
> Is there a recommended unix flavor for deploying Solr on? I've benchmarked my
> deployment on Red Hat. Our operations team asked if we can use FreeBSD
> instead. Assuming that my benchmark numbers are consistent on FreeBSD, is
> there anything else I should watch out for? 
> 
> Thanks.
> 
> Wojtek
> -- 
> View this message in context: 
> http://www.nabble.com/Redhat-vs-FreeBSD-vs-other-unix-flavors-tp22251134p22251134.html
> Sent from the Solr - User mailing list archive at Nabble.com.



Re: Redhat vs FreeBSD vs other unix flavors

2009-02-27 Thread wojtekpia

Thanks Otis. Do you know what the most common deployment OS is? I couldn't
find much on the mailing list or http://wiki.apache.org/solr/PublicServers


Otis Gospodnetic wrote:
> 
> 
> You should be fine on either Linux or FreeBSD (or any other UNIX flavour). 
> Running on Solaris would probably give you access to goodness like dtrace,
> but you can live without it.
> 
> Otis
> --
> Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch
> 
> 

-- 
View this message in context: 
http://www.nabble.com/Redhat-vs-FreeBSD-vs-other-unix-flavors-tp22251134p22251260.html
Sent from the Solr - User mailing list archive at Nabble.com.



Re: Redhat vs FreeBSD vs other unix flavors

2009-02-27 Thread Matthew Runo
I'm willing to bet it'd be some flavor of Linux. We run on Gentoo. When
it comes down to it, I'd think your application server (Tomcat, Resin,  
etc) would have more impact on Solr performance than the OS.


On that front, I'd bet that Tomcat 5 or 6 is the most commonly deployed.

Thanks for your time!

Matthew Runo
Software Engineer, Zappos.com
mr...@zappos.com - 702-943-7833

On Feb 27, 2009, at 10:08 AM, wojtekpia wrote:



Thanks Otis. Do you know what the most common deployment OS is? I  
couldn't

find much on the mailing list or http://wiki.apache.org/solr/PublicServers


Otis Gospodnetic wrote:



You should be fine on either Linux or FreeBSD (or any other UNIX  
flavour).
Running on Solaris would probably give you access to goodness like  
dtrace,

but you can live without it.

Otis
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch




--
View this message in context: 
http://www.nabble.com/Redhat-vs-FreeBSD-vs-other-unix-flavors-tp22251134p22251260.html
Sent from the Solr - User mailing list archive at Nabble.com.





Re: warming question

2009-02-27 Thread Otis Gospodnetic

That, plus:
http://wiki.apache.org/solr/SolrCaching#head-7d0ea6f02cb1d068bf6469201e013ce8e23e175b

Otis
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch



- Original Message 
> From: Jonathan Haddad 
> To: solr-user@lucene.apache.org
> Sent: Friday, February 27, 2009 12:42:09 PM
> Subject: Re: warming question
> 
> I'm using the latest stable - I'm brand new to solr and I don't know
> where to find all the docs yet.  I'm guessing I should be looking at
> this page: 
> http://wiki.apache.org/solr/SolrCaching#head-34647c63c38782b2fc93c919bb34f8c795a1ee65
> 
> I have an index of 1.5 million documents.  It's updated every few minutes.
> 
> On Fri, Feb 27, 2009 at 1:54 AM, Marc Sturlese wrote:
> >
> > Hey,
> > I am working with a nighlty and just had to apply the modifications in the
> > code and add a couple of lines in solrconfig.xml (as it's shown in the
> > patch). Didn't it work for you?
> >
> > Jonathan Haddad wrote:
> >>
> >> Does anyone have any good documentation that explains how to set up
> >> the warming feature within the config?
> >>
> >> On Wed, Feb 25, 2009 at 11:58 AM, Marc Sturlese 
> >> wrote:
> >>>
> >>> Shalin your patch worked perfect for my use case.
> >>> Thank's both for the information!
> >>>
> >>>
> >>>
> >>> Amit Nithian wrote:
> 
>  I'm actually working on one for my company which parses our tomcat log
>  files
>  to obtain queries to feed as warming queries (since GET queries are the
>  dominant source of queries) to the firstSearcher. I am not sure what the
>  interface is in Solr 1.3, but in 1.2, I implemented the
>  SolrEventListener
>  interface and overrode the newSearcher method. If you look at the source
>  for
>  the default warmer, you should be able to construct a list of queries
>  from
>  a
>  different source without much trouble.
>  I might be able to send you some code if you need it.
> 
>  - Amit
> 
>  On Tue, Feb 24, 2009 at 10:15 AM, Marc Sturlese
>  wrote:
> 
> >
> > Hey there,
> > Is there any dynamic way to specify the queries to do the warming? I
> > mean,
> > not writing them hardcoded in solrconfig.xml but getting them from a
> > database or from another file??
> > Thanks in advance
> > --
> > View this message in context:
> > http://www.nabble.com/warming-question-tp22187322p22187322.html
> > Sent from the Solr - User mailing list archive at Nabble.com.
> >
> >
> 
> 
> >>>
> >>> --
> >>> View this message in context:
> >>> http://www.nabble.com/warming-question-tp22187322p22210458.html
> >>> Sent from the Solr - User mailing list archive at Nabble.com.
> >>>
> >>>
> >>
> >>
> >>
> >> --
> >> Jonathan Haddad
> >> http://www.rustyrazorblade.com
> >>
> >>
> >
> > --
> > View this message in context: 
> http://www.nabble.com/warming-question-tp22187322p22242609.html
> > Sent from the Solr - User mailing list archive at Nabble.com.
> >
> >
> 
> 
> 
> -- 
> Jonathan Haddad
> http://www.rustyrazorblade.com



Re: Redhat vs FreeBSD vs other unix flavors

2009-02-27 Thread Andrzej Bialecki

Otis Gospodnetic wrote:

You should be fine on either Linux or FreeBSD (or any other UNIX
flavour).  Running on Solaris would probably give you access to
goodness like dtrace, but you can live without it.


There's dtrace on FreeBSD, too.


--
Best regards,
Andrzej Bialecki <><
 ___. ___ ___ ___ _ _   __
[__ || __|__/|__||\/|  Information Retrieval, Semantic Web
___|||__||  \|  ||  |  Embedded Unix, System Integration
http://www.sigram.com  Contact: info at sigram dot com



Re: Redhat vs FreeBSD vs other unix flavors

2009-02-27 Thread Yonik Seeley
On Fri, Feb 27, 2009 at 1:08 PM, wojtekpia  wrote:
> Thanks Otis. Do you know what the most common deployment OS is? I couldn't
> find much on the mailing list or http://wiki.apache.org/solr/PublicServers

I would guess RHEL (red hat enterprise linux, or CentOS for the free version).
Ubuntu looks like it might come on strong, but RHEL has been in the
server space for ages.

-Yonik
http://www.lucidimagination.com


Re: Redhat vs FreeBSD vs other unix flavors

2009-02-27 Thread Otis Gospodnetic

Same observations here.

Otis
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch



- Original Message 
> From: Yonik Seeley 
> To: solr-user@lucene.apache.org
> Sent: Friday, February 27, 2009 1:20:53 PM
> Subject: Re: Redhat vs FreeBSD vs other unix flavors
> 
> On Fri, Feb 27, 2009 at 1:08 PM, wojtekpia wrote:
> > Thanks Otis. Do you know what the most common deployment OS is? I couldn't
> > find much on the mailing list or http://wiki.apache.org/solr/PublicServers
> 
> I would guess RHEL (red hat enterprise linux, or CentOS for the free version).
> Ubuntu looks like it might come on strong, but RHEL has been in the
> server space for ages.
> 
> -Yonik
> http://www.lucidimagination.com



Integrating Solr and Nutch

2009-02-27 Thread ahammad

Hello,

I'm wondering if it's possible to make Solr use a Nutch index. I used Nutch
to crawl some pages and I now have an index with about 2000 documents. I
want to explore the features of Solr, and since both Nutch and Solr are
based off Lucene, I assume that there is some way to integrate them with one
another.

Has this been implemented?

I am using the latest release versions of Nutch and Solr.

Cheers
-- 
View this message in context: 
http://www.nabble.com/Integrating-Solr-and-Nutch-tp22252531p22252531.html
Sent from the Solr - User mailing list archive at Nabble.com.



Re: Integrating Solr and Nutch

2009-02-27 Thread Tony Wang
I heard Nutch 1.0 will have an easy way to integrate with Solr, but I
haven't found any documentation on that yet. Anyone?

On Fri, Feb 27, 2009 at 12:14 PM, ahammad  wrote:

>
> Hello,
>
> I'm wondering if it's possible to make Solr use a Nutch index. I used Nutch
> to crawl some pages and I now have an index with about 2000 documents. I
> want to explore the features of Solr, and since both Nutch and Solr are
> based off Lucene, I assume that there is some way to integrate them with
> one
> another.
>
> Has this been implemented?
>
> I am using the latest release versions of Nutch and Solr.
>
> Cheers
> --
> View this message in context:
> http://www.nabble.com/Integrating-Solr-and-Nutch-tp22252531p22252531.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>
>


-- 
Are you RCholic? www.RCholic.com
温 良 恭 俭 让 仁 义 礼 智 信


Re: Integrating Solr and Nutch

2009-02-27 Thread Andrzej Bialecki

Tony Wang wrote:

I heard Nutch 1.0 will have an easy way to integrate with Solr, but I
haven't found any documentation on that yet. anyone?


Indeed, this integration is already supported in Nutch trunk (soon to be 
released). Please download a nightly package and test it.


You will need to reindex your segments using the solrindex command, and 
change the searcher configuration. See nutch-default.xml for details.
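
For the impatient, the reindex step looks roughly like this (paths and the
Solr URL are placeholders; see nutch-default.xml and the wiki for the exact
options):

    bin/nutch solrindex http://localhost:8983/solr/ crawl/crawldb crawl/linkdb crawl/segments/*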


--
Best regards,
Andrzej Bialecki <><
 ___. ___ ___ ___ _ _   __
[__ || __|__/|__||\/|  Information Retrieval, Semantic Web
___|||__||  \|  ||  |  Embedded Unix, System Integration
http://www.sigram.com  Contact: info at sigram dot com



Re: Integrating Solr and Nutch

2009-02-27 Thread Tony Wang
Hi Andrzej:

Could you please tell us how to do the Nutch 1.0/Solr integration in a
little more detail? I'm very interested in implementing it. Thanks.

tony

On Fri, Feb 27, 2009 at 1:27 PM, Andrzej Bialecki  wrote:

> Tony Wang wrote:
>
>> I heard Nutch 1.0 will have an easy way to integrate with Solr, but I
>> haven't found any documentation on that yet. anyone?
>>
>
> Indeed, this integration is already supported in Nutch trunk (soon to be
> released). Please download a nightly package and test it.
>
> You will need to reindex your segments using the solrindex command, and
> change the searcher configuration. See nutch-default.xml for details.
>
> --
> Best regards,
> Andrzej Bialecki <><
>  ___. ___ ___ ___ _ _   __
> [__ || __|__/|__||\/|  Information Retrieval, Semantic Web
> ___|||__||  \|  ||  |  Embedded Unix, System Integration
> http://www.sigram.com  Contact: info at sigram dot com
>
>


-- 
Are you RCholic? www.RCholic.com
温 良 恭 俭 让 仁 义 礼 智 信


indexing while optimizing

2009-02-27 Thread Laimonas Simutis
Hey,

my SOLR setup looks like the following:

server running apache-tomcat with solr1.2, index size is about 1G (a
bit more than 4 million documents).

I have another machine that basically every minute or so sends some
documents to be indexed. I have autocommit turned on with maxDocs:
5000, maxTime: 30ms.

Also, on the server a cron job runs twice per day to optimize the index,
and sometimes the indexing messages arrive while the optimize is running.
I know that running optimize on an index that is about to have documents
added or deleted is not that useful, but it did help eliminate the "too
many file handles open" problem.

Is it bad that I try to index when the optimize is running? I do see
failures on the client side from time to time, but the messages get
resent and indexed eventually.

One recurring problem is that once per 36 hours or so the SOLR server
becomes really unresponsive, just spinning crazy on CPU, and it is all
in java (solr) process. When I try to shut down apache, apache goes
down but the java process is left running. I am trying to pin point
where the problem is, and wonder if my indexing-commit is not right.
The box is solely dedicated for solr, so there is really nothing else
running on it.

Any pointers or observations appreciated.

thanks,

L


Re: solr 1.3 - did something with deleting documents change?

2009-02-27 Thread Chris Hostetter

: <delete><id>image.1</id></delete><delete><id>image.2</id></delete> etc... (one
: delete node for each image we wanted to delete)
: 
: And that worked in 1.2.

that is really surprising ... it's not a legal XML doc (multiple root
nodes) so it should have been an error.

Support was added in Solr 1.3 to support multiple <id> elements in a 
single <delete> element...

<delete><id>image.1</id><id>image.2</id>...</delete>
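
For example, posted to the update handler (URL illustrative), followed by
a <commit/>:

  curl http://localhost:8983/solr/update -H 'Content-type: text/xml' --data-binary '<delete><id>image.1</id><id>image.2</id></delete>'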

: Also, we could get a statistic called "deletesPending" from the stats.jsp page
: in the admin console.
: 
: Now, when we upgraded to 1.3, I noticed right away that deletesPending was
: gone, but since it wasn't critical to the application we kinda let it go.
: Now, first of all, I'm wondering, is there any way to get this statistic back?

Not really.  Solr kept that stat before because, in order to delete by 
query, it had to execute searches and maintain a queue of docs to delete 
and then process them explicitly when the commit happened.  Now the 
underlying Lucene IndexWriter can handle the deleteByQuery directly.

: And that's fine.  Just wondering if that was actually a change or what's going
: on there or if the fact that it ever worked was a fluke (which seems to be the
: case since I find people out on message boards essentially saying the delete
: by query method is the only way to delete multiple documents at once).  I'm

As I mentioned, that was only true before 1.3, but the syntax you were 
using was definitely a fluke.

switching to the multiple <id> nodes inside a single <delete> node 
should be a lot faster than the <query> using "OR" syntax you are using 
right now, BTW.


-Hoss



Re: warming question

2009-02-27 Thread Jonathan Haddad
I think this is exactly what I was looking for.  Thanks!

On Fri, Feb 27, 2009 at 10:06 AM, Otis Gospodnetic
 wrote:
>
> That, plus:
> http://wiki.apache.org/solr/SolrCaching#head-7d0ea6f02cb1d068bf6469201e013ce8e23e175b
>
> Otis
> --
> Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch
>
>
>
> - Original Message 
>> From: Jonathan Haddad 
>> To: solr-user@lucene.apache.org
>> Sent: Friday, February 27, 2009 12:42:09 PM
>> Subject: Re: warming question
>>
>> I'm using the latest stable - I'm brand new to solr and I don't know
>> where to find all the docs yet.  I'm guessing I should be looking at
>> this page:
>> http://wiki.apache.org/solr/SolrCaching#head-34647c63c38782b2fc93c919bb34f8c795a1ee65
>>
>> I have an index of 1.5 million documents.  It's updated every few minutes.
>>
>> On Fri, Feb 27, 2009 at 1:54 AM, Marc Sturlese wrote:
>> >
>> > Hey,
>> > I am working with a nightly and just had to apply the modifications in the
>> > code and add a couple of lines in solrconfig.xml (as it's shown in the
>> > patch). Didn't it work for you?
>> >
>> > Jonathan Haddad wrote:
>> >>
>> >> Does anyone have any good documentation that explains how to set up
>> >> the warming feature within the config?
>> >>
>> >> On Wed, Feb 25, 2009 at 11:58 AM, Marc Sturlese
>> >> wrote:
>> >>>
>> >>> Shalin your patch worked perfect for my use case.
>> >>> Thank's both for the information!
>> >>>
>> >>>
>> >>>
>> >>> Amit Nithian wrote:
>> 
>>  I'm actually working on one for my company which parses our Tomcat
>>  log files to obtain queries to feed as warming queries to the
>>  firstSearcher (since GET queries are the dominant source of queries).
>>  I am not sure what the interface is in Solr 1.3, but in 1.2 I
>>  implemented the SolrEventListener interface and overrode the
>>  newSearcher method. If you look at the source for the default warmer,
>>  you should be able to construct a list of queries from a different
>>  source without much trouble.
>>  I might be able to send you some code if you need it.
>> 
>>  - Amit
>> 
>>  On Tue, Feb 24, 2009 at 10:15 AM, Marc Sturlese
>>  wrote:
>> 
>> >
>> > Hey there,
>> > Is there any dynamic way to specify the queries to do the warming? I
>> > mean, not writing them hardcoded in solrconfig.xml, but getting them
>> > from a database or from another file?
>> > Thanks in advance
>> > --
>> > View this message in context:
>> > http://www.nabble.com/warming-question-tp22187322p22187322.html
>> > Sent from the Solr - User mailing list archive at Nabble.com.
>> >
>> >
>> 
>> 
>> >>>
>> >>> --
>> >>> View this message in context:
>> >>> http://www.nabble.com/warming-question-tp22187322p22210458.html
>> >>> Sent from the Solr - User mailing list archive at Nabble.com.
>> >>>
>> >>>
>> >>
>> >>
>> >>
>> >> --
>> >> Jonathan Haddad
>> >> http://www.rustyrazorblade.com
>> >>
>> >>
>> >
>> > --
>> > View this message in context:
>> http://www.nabble.com/warming-question-tp22187322p22242609.html
>> > Sent from the Solr - User mailing list archive at Nabble.com.
>> >
>> >
>>
>>
>>
>> --
>> Jonathan Haddad
>> http://www.rustyrazorblade.com
>
>



-- 
Jonathan Haddad
http://www.rustyrazorblade.com
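
For reference, a minimal sketch of the listener approach Amit describes
above, against the Solr 1.3 API (the class name, the "queryFile" arg, the
default field and the analyzer are illustrative assumptions, not his
actual code):

  package com.example;

  import java.io.BufferedReader;
  import java.io.FileReader;

  import org.apache.lucene.analysis.WhitespaceAnalyzer;
  import org.apache.lucene.queryParser.QueryParser;
  import org.apache.lucene.search.Query;
  import org.apache.solr.common.util.NamedList;
  import org.apache.solr.core.SolrEventListener;
  import org.apache.solr.search.SolrIndexSearcher;

  public class LogFileWarmListener implements SolrEventListener {

    private String queryFile;

    public void init(NamedList args) {
      // file with one query string per line, e.g. extracted from access logs
      queryFile = (String) args.get("queryFile");
    }

    public void newSearcher(SolrIndexSearcher newSearcher,
                            SolrIndexSearcher currentSearcher) {
      try {
        BufferedReader in = new BufferedReader(new FileReader(queryFile));
        // parse against a default field; use whatever matches your schema
        QueryParser parser = new QueryParser("text", new WhitespaceAnalyzer());
        String line;
        while ((line = in.readLine()) != null) {
          Query q = parser.parse(line.trim());
          // run the query against the new searcher so its caches warm up
          newSearcher.search(q, null, 10);
        }
        in.close();
      } catch (Exception e) {
        // warming is best-effort; don't block the searcher swap on errors
      }
    }

    public void postCommit() {
      // nothing to do on commit
    }
  }

and registered in solrconfig.xml with something like:

  <listener event="newSearcher" class="com.example.LogFileWarmListener">
    <str name="queryFile">/var/log/solr/warm-queries.txt</str>
  </listener>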


Tomcat causing problem

2009-02-27 Thread Tony Wang
This appears to be a new problem for me. Whenever I try to stop Tomcat, I
always get this error:

Using CATALINA_BASE:   /opt/tomcat6
Using CATALINA_HOME:   /opt/tomcat6
Using CATALINA_TMPDIR: /opt/tomcat6/temp
Using JRE_HOME:   /usr/lib/jvm/java-6-sun
Exception in thread "main" java.lang.NoClassDefFoundError:
org/apache/catalina/startup/Bootstrap
Caused by: java.lang.ClassNotFoundException:
org.apache.catalina.startup.Bootstrap
at java.net.URLClassLoader$1.run(URLClassLoader.java:200)
at java.security.AccessController.doPrivileged(Native Method)
at java.net.URLClassLoader.findClass(URLClassLoader.java:188)
at java.lang.ClassLoader.loadClass(ClassLoader.java:307)
at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:301)
at java.lang.ClassLoader.loadClass(ClassLoader.java:252)
at java.lang.ClassLoader.loadClassInternal(ClassLoader.java:320)
Could not find the main class: org.apache.catalina.startup.Bootstrap.
Program will exit.

Could someone help me figure out what might go wrong? Thanks!

Tony

-- 
Are you RCholic? www.RCholic.com
温 良 恭 俭 让 仁 义 礼 智 信


Re: concurrency problem with delta-import (indexing various cores simultaneously)

2009-02-27 Thread Ryuuichi KUMAI
Hello Marc,

I faced a similar problem, and I found a workaround.
If the performance degradation in your application is caused by GC,
this information might help you:

https://issues.apache.org/jira/browse/SOLR-1042
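
If you end up experimenting with collectors yourself, a common starting
point on Java 6 is the concurrent collector plus GC logging so you can
see the pauses (flags illustrative; tune the heap for your index):

  JAVA_OPTS="-Xmx2g -XX:+UseConcMarkSweepGC -verbose:gc -XX:+PrintGCDetails"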

Regards,
Ryuuichi Kumai.

2009/2/21 Marc Sturlese :
>
> I am working with 3 indexes of 1 gig each. I am using the standard
> settings of the GC, haven't changed anything, and am using java version
> "1.6.0_07". I don't know much about GC configuration... just read this
>
> http://marcus.net/blog/2007/11/10/solr-search-and-java-gc-tuning/
>
> a month ago when I experienced another problem with Solr (in the end it
> was not GC's fault). So, any advice about which GC I should try or what
> I should tune?
>
> Thank you very much!
>
>
>
> Shalin Shekhar Mangar wrote:
>>
>> On Fri, Feb 20, 2009 at 11:23 PM, Marc Sturlese
>> wrote:
>>
>>>
>>> Yes,
>>> It's now almost three days non-stop that I have been running updates
>>> on the 3 cores with cron jobs. If there are updates of 1 docs
>>> everything is alright. When I start doing updates of 30, that core
>>> runs really slow. I have to abort the import in that core and keep
>>> updating with fewer rows each time.
>>> Another thing to point out is that Tomcat reaches the maximum memory I
>>> allow (2Gig) and never goes down (but at least it doesn't run out of
>>> memory). Is that normal? Shouldn't the memory go down a lot after an
>>> update is completed?
>>>
>>
>> I guess you are being hit by garbage collection. Memory utilization should
>> go down once an import completes. Which GC are you using? There have been
>> a
>> few recent threads on GC settings. Perhaps you can try out a few of those
>> settings. I don't know how big your documents/index are but if possible
>> give
>> it more memory.
>>
>> --
>> Regards,
>> Shalin Shekhar Mangar.
>>
>>
>
> --
> View this message in context: 
> http://www.nabble.com/concurrency-problem-with-delta-import-%28indexing-various-cores-simultaniously%29-tp22120430p22125716.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>
>


Re: Tomcat causing problem

2009-02-27 Thread Otis Gospodnetic

It seems to be a classpath problem, but I can't tell exactly what's wrong.  It 
looks like a pure Tomcat issue, so you may get more help on a Tomcat list.
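
One guess: org.apache.catalina.startup.Bootstrap normally ships in
bootstrap.jar under Tomcat's bin directory, so it may be worth checking
that /opt/tomcat6/bin/bootstrap.jar is present and readable, and that
nothing in your environment overrides CLASSPATH when the shutdown script
runs.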


Otis
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch



- Original Message 
> From: Tony Wang 
> To: solr-user@lucene.apache.org
> Sent: Saturday, February 28, 2009 12:39:05 AM
> Subject: Tomcat causing problem
> 
> This appears to be a new problem for me. Whenever I try to stop Tomcat, I
> always get this error:
> 
> Using CATALINA_BASE:   /opt/tomcat6
> Using CATALINA_HOME:   /opt/tomcat6
> Using CATALINA_TMPDIR: /opt/tomcat6/temp
> Using JRE_HOME:   /usr/lib/jvm/java-6-sun
> Exception in thread "main" java.lang.NoClassDefFoundError:
> org/apache/catalina/startup/Bootstrap
> Caused by: java.lang.ClassNotFoundException:
> org.apache.catalina.startup.Bootstrap
> at java.net.URLClassLoader$1.run(URLClassLoader.java:200)
> at java.security.AccessController.doPrivileged(Native Method)
> at java.net.URLClassLoader.findClass(URLClassLoader.java:188)
> at java.lang.ClassLoader.loadClass(ClassLoader.java:307)
> at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:301)
> at java.lang.ClassLoader.loadClass(ClassLoader.java:252)
> at java.lang.ClassLoader.loadClassInternal(ClassLoader.java:320)
> Could not find the main class: org.apache.catalina.startup.Bootstrap.
> Program will exit.
> 
> Could someone help me figure out what might go wrong? Thanks!
> 
> Tony
> 
> -- 
> Are you RCholic? www.RCholic.com
> 温 良 恭 俭 让 仁 义 礼 智 信