solr and hibernate integration

2009-11-09 Thread Kiwi de coder
hi,

I have a project that requires indexing POJOs and searching them from a
database.

However, the current support for POJOs is limited to field values; it
still lacks support for complex domain object models (composite
elements, collections, etc.).

Hibernate Search has done a great job of indexing complex POJOs. I am
wondering whether someone has written a plug-in that can handle complex
POJOs (like what Hibernate Search does for indexing)?


kiwi
--
happy hacking !


Re: How to import multiple RSS-feeds with DIH

2009-11-09 Thread Noble Paul നോബിള്‍ नोब्ळ्
On Mon, Nov 9, 2009 at 1:26 PM, Michael Lackhoff  wrote:
> [A new thread for this particular problem]
>
> On 09.11.2009 08:44 Noble Paul നോബിള്‍ नोब्ळ् wrote:
>
>> The tried and tested strategy is to post the question in this mailing
>> list w/ your data-config.xml.
>
> See my data-config.xml below. The first is the usual slashdot example
> with my 'id' addition, the second a very simple additional feed. The
> second example works if I delete the slashdot feed but, as I said, I
> would like to have them both.
When you say the second example does not work, what does it mean?
Some exception? (If yes, please post the stack trace.)
>
> -Michael
>
> <dataConfig>
>   <dataSource type="HttpDataSource" />
>   <document>
>     <entity name="..."
>             pk="link"
>             url="http://rss.slashdot.org/Slashdot/slashdot"
>             processor="XPathEntityProcessor"
>             forEach="/RDF/channel | /RDF/item"
>             transformer="TemplateTransformer,DateFormatTransformer">
>
>       <!-- the column/xpath attributes of the field definitions were
>            stripped by the mail archive; three commonField="true" fields,
>            several plain fields and a date field survived in outline -->
>       <field column="..." xpath="..." commonField="true" />
>       <field column="..." xpath="..." commonField="true" />
>       <field column="..." xpath="..." commonField="true" />
>       ...
>       <field column="..." xpath="..."
>              dateTimeFormat="yyyy-MM-dd'T'hh:mm:ss" />
>     </entity>
>
>     <entity name="..."
>             pk="link"
>             url="http://www.heise.de/newsticker/heise.rdf"
>             processor="XPathEntityProcessor"
>             forEach="/RDF/channel | /RDF/item"
>             transformer="TemplateTransformer">
>       <field column="..." xpath="..." commonField="true" />
>       <field column="..." xpath="..." commonField="true" />
>       ...
>     </entity>
>   </document>
> </dataConfig>
>
>



-- 
-
Noble Paul | Principal Engineer| AOL | http://aol.com


Re: solr and hibernate integration

2009-11-09 Thread Noble Paul നോബിള്‍ नोब्ळ्
The point is that the usual complex POJO mapping does not work in
Solr. For all the supported cases, SolrJ mapping works well.

To answer your question, I am not aware of anybody making it work w/
Hibernate.
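To illustrate, a rough sketch of what the SolrJ @Field annotation covers
today (the class and field names below are made up for illustration):

    import org.apache.solr.client.solrj.beans.Field;

    public class Item {
        @Field
        String id;

        @Field("cat")
        String[] categories;   // flat multi-valued field: supported

        // Vendor vendor;      // nested/composite object: no way to map this
    }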

On Mon, Nov 9, 2009 at 1:54 PM, Kiwi de coder  wrote:
> hi,
>
> I have a project that requires indexing POJOs and searching them from a
> database.
>
> However, the current support for POJOs is limited to field values; it
> still lacks support for complex domain object models (composite
> elements, collections, etc.).
>
> Hibernate Search has done a great job of indexing complex POJOs. I am
> wondering whether someone has written a plug-in that can handle complex
> POJOs (like what Hibernate Search does for indexing)?
>
>
> kiwi
> --
> happy hacking !
>



-- 
-
Noble Paul | Principal Engineer| AOL | http://aol.com


Re: solr and hibernate integration

2009-11-09 Thread Kiwi de coder
so is there any plan to support this in a future version of solr ?

or is anyone interested in writing one :)

as I see it, such a plug-in could take advantage of solr features like
faceting, spell check etc.

2009/11/9 Noble Paul നോബിള്‍ नोब्ळ् 

> The point is that the usual complex POJO mapping does not work in
> Solr. For all the supported cases, SolrJ mapping works well.
>
> To answer your question, I am not aware of anybody making it work w/
> Hibernate.
>
> On Mon, Nov 9, 2009 at 1:54 PM, Kiwi de coder  wrote:
> > hi,
> >
> > I have a project that requires indexing POJOs and searching them from
> > a database.
> >
> > However, the current support for POJOs is limited to field values; it
> > still lacks support for complex domain object models (composite
> > elements, collections, etc.).
> >
> > Hibernate Search has done a great job of indexing complex POJOs. I am
> > wondering whether someone has written a plug-in that can handle
> > complex POJOs (like what Hibernate Search does for indexing)?
> >
> >
> > kiwi
> > --
> > happy hacking !
> >
>
>
>
> --
> -
> Noble Paul | Principal Engineer| AOL | http://aol.com
>


Re: synonym payload boosting

2009-11-09 Thread David Ginzburg
I have found this patch:
https://issues.apache.org/jira/browse/SOLR-1485?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
But I don't want to use any function, just the normal scoring and the
similarity class I have written.
Can you point me to the modifications I need (if any)?



On Sun, Nov 8, 2009 at 16:33, AHMET ARSLAN  wrote:

> Additionally you need to modify your query parser to return
> BoostingTermQuery, PayloadTermQuery, PayloadNearQuery etc.
>
> With these types of Queries the scorePayload method is invoked.
>
> Hope this helps.
>
> --- On Sun, 11/8/09, David Ginzburg  wrote:
>
> > From: David Ginzburg 
> > Subject: synonym payload boosting
> > To: solr-user@lucene.apache.org
> > Date: Sunday, November 8, 2009, 4:06 PM
> > Hi,
> > I have a field and a weighted synonym map.
> > I have indexed the synonyms with the weight as payload.
> > My code snippet from my filter:
> >
> > public Token next(final Token reusableToken) throws IOException {
> >     ...
> >     Payload boostPayload;
> >
> >     for (Synonym synonym : syns) {
> >         Token newTok = new Token(nToken.startOffset(),
> >                 nToken.endOffset(), "SYNONYM");
> >         newTok.setTermBuffer(synonym.getToken().toCharArray(), 0,
> >                 synonym.getToken().length());
> >         // set the position increment to zero
> >         // this tells lucene the synonym is
> >         // in the exact same location as the originating word
> >         newTok.setPositionIncrement(0);
> >         boostPayload = new Payload(
> >                 PayloadHelper.encodeFloat(synonym.getWieght()));
> >         newTok.setPayload(boostPayload);
> >         ...
> > I have put it in the index-time analyzer; this is my field definition
> > (tag names reconstructed; the archive stripped them, and the field
> > type name is lost):
> >
> > <fieldType name="..." class="solr.TextField" positionIncrementGap="100">
> >   <analyzer type="index">
> >     <tokenizer class="solr.WhitespaceTokenizerFactory"/>
> >     <filter class="solr.StopFilterFactory" ignoreCase="true"
> >             words="stopwords.txt"/>
> >     <filter class="solr.LowerCaseFilterFactory"/>
> >     <filter class="com.digitaltrowel.solr.DTSynonymFactory"
> >             FreskoFunction="names_with_scoresPipe23Columns.txt"
> >             ignoreCase="true" expand="false"/>
> >   </analyzer>
> >   <analyzer type="query">
> >     <tokenizer class="solr.WhitespaceTokenizerFactory"/>
> >     <filter class="solr.LowerCaseFilterFactory"/>
> >     <filter class="solr.StopFilterFactory" ignoreCase="true"
> >             words="stopwords.txt"/>
> >   </analyzer>
> > </fieldType>
> >
> >
> > my similarity class is
> > public class BoostingSymilarity extends DefaultSimilarity {
> >
> >     public BoostingSymilarity() {
> >         super();
> >     }
> >
> >     @Override
> >     public float scorePayload(String field, byte[] payload,
> >                               int offset, int length) {
> >         double weight = PayloadHelper.decodeFloat(payload, 0);
> >         return (float) weight;
> >     }
> >
> >     @Override public float coord(int overlap, int maxoverlap) {
> >         return 1.0f;
> >     }
> >
> >     @Override public float idf(int docFreq, int numDocs) {
> >         return 1.0f;
> >     }
> >
> >     @Override public float lengthNorm(String fieldName, int numTerms) {
> >         return 1.0f;
> >     }
> >
> >     @Override public float tf(float freq) {
> >         return 1.0f;
> >     }
> > }
> >
> > My problem is that the scorePayload method does not get called at
> > search time like the other methods in my similarity class.
> > I tested and verified it with breakpoints.
> > What am I doing wrong?
> > I am using Solr 1.3 and thinking of the payload boost support in
> > Solr 1.4.
> >
>
> __
> Do You Yahoo!?
> Tired of spam?  Yahoo! Mail has the best spam protection around
> http://mail.yahoo.com
>



-- 
Regards

_
David Ginzburg
Developer, Digital Trowel
1 Hayarden St., Airport City
[POB 169, NATBAG]
Lod, 70151, Israel
http://www.digitaltrowel.com/
Office: +972 73 240 522
Mobile: +972 50 496 0595

CHECK OUT OUR NEW TEXT MINING BLOG:
http://mineyourbusiness.wordpress.com/


[DIH] SqlEntityProcessor does not recognize onError attribute

2009-11-09 Thread Sascha Szott

Hi all,

as stated in the Solr wiki, Solr 1.4 allows you to specify an onError
attribute for *each* entity listed in the data config file (it is
considered one of the default attributes).
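For reference, onError is set per entity in data-config.xml, roughly like
this sketch (the entity name and query are made up; the documented values
are abort, skip and continue):

    <entity name="item" processor="SqlEntityProcessor"
            query="select id, name from item"
            onError="continue">
      ...
    </entity>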


Unfortunately, the SqlEntityProcessor does not recognize the attribute's
value -- i.e., in case an SQL exception is thrown somewhere inside the
constructor of ResultSetIterator (an inner class of JdbcDataSource),
Solr's import exits immediately, even though onError is set to continue
or skip.


Why are database-related exceptions (e.g., a table does not exist, or an
error in query syntax) not covered by the onError attribute? In my
opinion, there are use cases that would profit from such exception
handling inside Solr (for example, where the existence of certain
database tables or views is not predictable).


Should I raise a JIRA issue about this?

-Sascha




Re: How to import multiple RSS-feeds with DIH

2009-11-09 Thread Michael Lackhoff
On 09.11.2009 09:46 Noble Paul നോബിള്‍ नोब्ळ् wrote:

> When you say , the second example does not work , what does it mean?
> some exception?(if yes, please post the stacktrace)

Very mysterious. Now it works but I am sure I got an exception before.
All I remember is something like "java.io.IOException: FULL". In the
right frame of the DIH debugging screen I got an error message from
firefox: "the connection was reset while displaying the page".

But I don't think it is reproducible now; perhaps some unrelated problem
like low memory or such. Thanks anyway and sorry for the noise.

-Michael


Re: synonym payload boosting

2009-11-09 Thread Grant Ingersoll


On Nov 9, 2009, at 4:41 AM, David Ginzburg wrote:


I have found this patch:
https://issues.apache.org/jira/browse/SOLR-1485?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
But I don't want to use any function, just the normal scoring and the
similarity class I have written.
Can you point me to the modifications I need (if any)?




Ahmet's point is that you need some query that will actually invoke
the payload in scoring.  PayloadTermQuery and PayloadNearQuery are the  
two that do this in Lucene.  You can certainly write your own, as well.
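A minimal sketch of building one (against the Lucene 2.9 payloads package
that Solr 1.4 bundles; double-check the exact signatures in the javadoc):

    import org.apache.lucene.index.Term;
    import org.apache.lucene.search.payloads.MaxPayloadFunction;
    import org.apache.lucene.search.payloads.PayloadTermQuery;

    // scores each matching term using its payload, which is what makes
    // scorePayload() in a custom Similarity actually get invoked
    PayloadTermQuery query = new PayloadTermQuery(
            new Term("content", "tree"), new MaxPayloadFunction());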


-Grant



On Sun, Nov 8, 2009 at 16:33, AHMET ARSLAN  wrote:

[snip -- rest of the quoted thread trimmed; see the original messages above]


--
Grant Ingersoll
http://www.lucidimagination.com/

Search the Lucene ecosystem (Lucene/Solr/Nutch/Mahout/Tika/Droids)  
using Solr/Lucene:

http://www.lucidimagination.com/search



Re: using different field for search and boosting

2009-11-09 Thread Erick Erickson
I'm not sure I understand this. Unless you're including model in
the search, how can you expect model^4 to mean anything? And
if you are including model, would just including it as an OR
clause and boosting it work?

Perhaps a couple of examples of queries you'd like to run and how
you'd like boosting to influence the results would help

Best
Erick

On Sun, Nov 8, 2009 at 9:37 PM, darniz  wrote:

>
> hello
> I wanted to know if it's possible to search on one field and provide
> boosting relevancy on other fields.
>
> For example, if I have fields like make, model, description etc. and all
> are copied to a text field.
> So can I define a handler where I do a search on the text field but can
> define relevancy boosts on make, model and description, i.e. make^4
> model^2?
>
> Any advice.
> --
> View this message in context:
> http://old.nabble.com/using-different-field-for-search-and-boosting-tp26260479p26260479.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>
>


RE: using different field for search and boosting

2009-11-09 Thread Birger Lie
See the DisMax search handler.

This handler supports search on multiple fields and relevancy biasing :)
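For darniz's example, a solrconfig.xml sketch might look like this (the
handler name and field list are only illustrative):

    <requestHandler name="/carsearch" class="solr.SearchHandler">
      <lst name="defaults">
        <str name="defType">dismax</str>
        <str name="qf">text make^4 model^2 description</str>
      </lst>
    </requestHandler>

Queries then run against all the qf fields at once, with matches in make
and model weighted higher.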


-Birger

-Original Message-
From: Erick Erickson [mailto:erickerick...@gmail.com] 
Sent: 9. november 2009 14:35
To: solr-user@lucene.apache.org
Subject: Re: using different field for search and boosting

I'm not sure I understand this. Unless you're including model in
the search, how can you expect model^4 to mean anything? And
if you are including model, would just including it as an OR
clause and boosting it work?

Perhaps a couple of examples of queries you'd like to run and how
you'd like boosting to influence the results would help

Best
Erick

On Sun, Nov 8, 2009 at 9:37 PM, darniz  wrote:

>
> hello
> I wanted to know if it's possible to search on one field and provide
> boosting relevancy on other fields.
>
> For example, if I have fields like make, model, description etc. and all
> are copied to a text field.
> So can I define a handler where I do a search on the text field but can
> define relevancy boosts on make, model and description, i.e. make^4
> model^2?
>
> Any advice.
> --
> View this message in context:
> http://old.nabble.com/using-different-field-for-search-and-boosting-tp26260479p26260479.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>
>


Re: Highlighting is very slow

2009-11-09 Thread Nicolas Dessaigne
Hi Andrew,

Alternatively, you could use a copyfield with a maxChars limit as your
highlighting field. Works well in my case.

See https://issues.apache.org/jira/browse/SOLR-538
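A schema.xml sketch of the idea (field and type names are placeholders):

    <field name="body_hl" type="text" indexed="true" stored="true"/>
    <copyField source="body" dest="body_hl" maxChars="50000"/>

Highlight on body_hl and only the first 50k characters are ever analyzed.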

Nicolas

2009/11/5 Andrew Clegg 

>
>
> Indeed -- it actually went slightly slower but only by a few seconds, I
> suspect that's within normal variance.
>
> I'll hold out for the new version then -- it's certainly not mission
> critical.
>
> Thanks,
>
> Andrew.
>
>
> markrmiller wrote:
> >
> > It should be the same speed either way for a term query. The
> > highlighter is going to be slow in general for a 1 MB+ doc. It
> > processes a token at a time. The fast vector highlighter is much
> > faster in those cases and should be in the next release. It handles
> > fewer query types though.
> >
> > - Mark
> >
> > http://www.lucidimagination.com (mobile)
> >
> > On Nov 4, 2009, at 1:26 PM, Chris Hostetter 
> > wrote:
> >
> >>
> >> : Has anyone else seen this sort of behaviour before? This is with a
> >> nightly
> >> : from 2009-10-26.
> >>
> >> have you tried hl.usePhraseHighlighter=false ? ...
> >>
> >>
> http://old.nabble.com/Highlighting-performance-between-1.3-and-1.4rc-to26190790.html
> >>
> >> ...it doesn't seem like it should be affecting you for a simple term
> >> query, but i'm not sure.
> >>
> >>
> >>
> >> -Hoss
> >>
> >
> >
>
> --
> View this message in context:
> http://old.nabble.com/Highlighting-is-very-slow-tp26160216p26211697.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>
>


Re: [DIH] SqlEntityProcessor does not recognize onError attribute

2009-11-09 Thread Noble Paul നോബിള്‍ नोब्ळ्
On Mon, Nov 9, 2009 at 4:24 PM, Sascha Szott  wrote:
> Hi all,
>
> as stated in the Solr wiki, Solr 1.4 allows you to specify an onError
> attribute for *each* entity listed in the data config file (it is
> considered one of the default attributes).
>
> Unfortunately, the SqlEntityProcessor does not recognize the attribute's
> value -- i.e., in case an SQL exception is thrown somewhere inside the
> constructor of ResultSetIterator (an inner class of JdbcDataSource),
> Solr's import exits immediately, even though onError is set to continue
> or skip.
>
> Why are database-related exceptions (e.g., a table does not exist, or an
> error in query syntax) not covered by the onError attribute? In my
> opinion, there are use cases that would profit from such exception
> handling inside Solr (for example, where the existence of certain
> database tables or views is not predictable).
We thought DB errors were not to be ignored, because errors such as a
missing table can be really serious.

>
> Should I raise a JIRA issue about this?
Raise an issue; it can be fixed.
>
> -Sascha
>
>
>



-- 
-
Noble Paul | Principal Engineer| AOL | http://aol.com


Re: Highlighting is very slow

2009-11-09 Thread Andrew Clegg


Nicolas Dessaigne wrote:
> 
> Alternatively, you could use a copyfield with a maxChars limit as your
> highlighting field. Works well in my case.
> 

Thanks for the tip. We did think about doing something similar (only
enabling highlighting for certain shorter fields) but we decided that
perhaps users would be confused if search terms were sometimes
snippeted+highlighted and sometimes not. (A brief run through with a single
user suggested this, although that's not statistically significant...) So we
decided to avoid highlighting altogether until we can do it across the
board.

Cheers,

Andrew.
-- 
View this message in context: 
http://old.nabble.com/Highlighting-is-very-slow-tp26160216p26267441.html
Sent from the Solr - User mailing list archive at Nabble.com.



Overwriting of column data from DataImportHandler

2009-11-09 Thread Mark Ellul
Hi,

I am using solr 1.3 with DataImportHandler from a postgres db.

I have a select statement similar to the below

Select id, id as pk, name, description from my_table;

and a data-config.xml

[data-config.xml stripped by the mail archive]

Anyway, my issue is that the data that's getting imported into the
documents seems to be overwriting itself.

Basically, the pk and site_id fields are getting api__tweeter__[[id]]
where [[id]] is the id that's returned from the query.

Is there something that I am missing?

Regards

Mark


Solr Training in Europe

2009-11-09 Thread Uri Boness

Hi All,

For those who are interested, the official Lucid Solr trainings are now
available in Europe. The first training - "Introduction to Solr" - is a
3-day training covering the basics and some of the more advanced features
of Solr. It is scheduled for 30th November (till 2nd December) and will
take place in Amsterdam, The Netherlands. For more information please
visit:
http://www.lucidimagination.com/How-We-Can-Help/Training/Classroom-Training-Schedule


cheers,
Uri


Re: [DIH] SqlEntityProcessor does not recognize onError attribute

2009-11-09 Thread Sascha Szott

Hi,

Noble Paul നോബിള്‍ नोब्ळ् wrote:

On Mon, Nov 9, 2009 at 4:24 PM, Sascha Szott  wrote:

Hi all,

as stated in the Solr wiki, Solr 1.4 allows you to specify an onError
attribute for *each* entity listed in the data config file (it is
considered one of the default attributes).

Unfortunately, the SqlEntityProcessor does not recognize the attribute's
value -- i.e., in case an SQL exception is thrown somewhere inside the
constructor of ResultSetIterator (an inner class of JdbcDataSource),
Solr's import exits immediately, even though onError is set to continue
or skip.

Why are database-related exceptions (e.g., a table does not exist, or an
error in query syntax) not covered by the onError attribute? In my
opinion, there are use cases that would profit from such exception
handling inside Solr (for example, where the existence of certain
database tables or views is not predictable).

We thought DB errors were not to be ignored, because errors such as a
missing table can be really serious.

In principle, I agree with you, though I would consider it a
programmer's responsibility to be aware of that (in case he/she sets
onError to skip or continue).



Should I raise a JIRA issue about this?

Raise an issue; it can be fixed.

I've created issue SOLR-1549.

Best,
Sascha



Re: Why doesn't highlighting work on this document?

2009-11-09 Thread Paul Rosen

That was exactly the problem, thanks!

Jake Brownell wrote:

By default the highlighter only considers the first 50k of text. See 
http://wiki.apache.org/solr/HighlightingParameters#hl.maxAnalyzedChars

Obviously the larger the size, the longer highlighting will take. When I get no 
highlights, I find that the text matched was later in the doc.
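If you do need matches beyond that, the limit can be raised per request,
e.g. (the field name here is just an example):

    &hl=true&hl.fl=text&hl.maxAnalyzedChars=500000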

Jake

-Original Message-
From: Paul Rosen [mailto:p...@performantsoftware.com] 
Sent: Friday, November 06, 2009 4:07 PM

To: solr-user@lucene.apache.org
Subject: Why doesn't highlighting work on this document?

I have a puzzling case that I don't know how to begin to debug. I 
present results with snippets highlighted, but it is not consistent, and 
it would be nice to know why some documents are returned without any 
highlighted text.


If you go to:

http://www.nines.org/search/saved?user=paul&name=tree

And look at the last entry on the page, (it should be titled "Nashe's 
Red Herring: Epistemologies of the Commodity in Lenten Stuffe (1599)")


you'll see that there is no text returned for that object. I looked in 
the solr index, and there is a fairly long text field, and in the middle 
of that field is:


"which the tree, for example, maintains its form as a tree (wood 
maintains itself in the specific form of the tree because this form is a 
form of"


That is probably what was matched, but why wasn't that text returned?

(I'm using solr 1.4 nightly build from Sept 25)




Similar documents from multiple cores with different schemas

2009-11-09 Thread Chantal Ackermann

Hi all,

my search for any postings answering the following question hasn't
produced any helpful hints so far. Maybe someone can point me in the
right direction?


Situation:
I have two cores with slightly different schemas. Slightly means that 
some fields appear on both cores but there are some that are required in 
one core but optional in the other. Then there are fields that appear 
only in one core.
(I don't want to put them in one index, right now, because of the fields 
that might be required for only one type but not the other. But it's 
certainly an option.)


Question:
Is there a way to get similar contents from core B when the input (seed) 
to the comparison is a document from core A?


MoreLikeThis:
I was searching for MoreLikeThis, multiple schemas etc. As these are 
cores with different schemas, the posts on distributed search/sharding 
in combination with MoreLikeThis are not helpful. But maybe there is 
some other functionality that I am not aware of? Some similarity search? 
Or maybe it's possible to tweak MoreLikeThis just to return the fields 
and terms that could be used for a search on the other core?


Thanks for any input!
Chantal


Solr/lucene/java/etc. person sought near Washington, DC/USA

2009-11-09 Thread Rich Kulawiec
Skills:
Required: solr, lucene, java
Desirable: nutch, tika, maven, python and/or perl

Experience:
2+ years, including some exposure to general
search engine concepts

Term:
permanent preferred, but contract available depending on the individual

Location:
greater Washington DC area

Please send along a resume and a cover letter.  I'm not the employer, I'm
just doing a favor for them by publishing this and collecting responses.

(Incidentally, I tried to reach the owner of this list to enquire whether
or not this is appropriate, but didn't receive a response... so I took my
best guess.  If that's wrong, I apologize to everyone.)


Re: dismax + wildcard

2009-11-09 Thread Peter Wolanin
There are some open issues (not for 1.4 at this point) to make dismax
more flexible or add wildcard handling, e.g:

https://issues.apache.org/jira/browse/SOLR-756
https://issues.apache.org/jira/browse/SOLR-758

You might participate in those to try to get this in a future version
and/or get a working patch for 1.4

-Peter

On Wed, Nov 4, 2009 at 7:04 PM, Koji Sekiguchi  wrote:
> Jan Kammer wrote:
>>
>> Hi there,
>>
>> what is the best way to search all fields AND use wildcards?
>> Somewhere I read that there are problems with this combination... (dismax
>> + wildcard)
>>
> It's a feature of dismax. WildcardQuery cannot be used in dismax q
> parameter.
>
> You can copy the "all fields" to a destination field by using
> copyField, then search the destination field with wildcards
> (without using dismax).
>
> Koji
>
> --
> http://www.rondhuit.com/en/
>
>



-- 
Peter M. Wolanin, Ph.D.
Momentum Specialist,  Acquia. Inc.
peter.wola...@acquia.com


Re: Question about the message "Indexing failed. Rolled back all changes."

2009-11-09 Thread Shalin Shekhar Mangar
On Sat, Nov 7, 2009 at 1:10 PM, Bertie Shen  wrote:

>
> When I use
> http://localhost:8180/solr/admin/dataimport.jsp?handler=/dataimport to
> debug the indexing config file, I always see the status message
> "Indexing failed. Rolled back all changes." on the right, even when the
> indexing process looks to be successful. I am not sure whether you guys
> have seen the same phenomenon or not.  BTW, I usually check the Clean
> checkbox and sometimes the Commit box, and then click the Debug Now
> button.
>
>
Do you see any exceptions in the logs?

-- 
Regards,
Shalin Shekhar Mangar.


Re: Wildcard searches within phrases to use proximity

2009-11-09 Thread Shalin Shekhar Mangar
On Sun, Nov 8, 2009 at 1:29 AM, AHMET ARSLAN  wrote:

>
> > You can do it with the
> > ComplexPhraseQueryParser in Lucene contrib (I think that's
> > the name). You have to plug it in to Solr though - someone
> > has already done this, but I'm not sure if it was contributed
> > back.
>
> I would be happy to contribute it, what should i do?
>
>
That'd be great. Please open an issue in Jira and attach a patch. See
http://wiki.apache.org/solr/HowToContribute


-- 
Regards,
Shalin Shekhar Mangar.


Re: Similar documents from multiple cores with different schemas

2009-11-09 Thread Alexey Serba
> Or maybe it's
> possible to tweak MoreLikeThis just to return the fields and terms that
> could be used for a search on the other core?
Exactly

See parameter mlt.interestingTerms in MoreLikeThisHandler
http://wiki.apache.org/solr/MoreLikeThisHandler

You can get the interesting terms and build a query (with N optional
clauses + boosts) against the second core yourself.
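E.g. a request along these lines returns just the terms plus boosts,
without any documents (core and field names are placeholders):

    http://localhost:8983/solr/coreA/mlt?q=id:123&mlt.fl=title,body
        &mlt.interestingTerms=details&mlt.mintf=1&mlt.mindf=1&rows=0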

HTH,
Alex


On Mon, Nov 9, 2009 at 6:25 PM, Chantal Ackermann
 wrote:
> Hi all,
>
> my search for any postings answering the following question hasn't
> produced any helpful hints so far. Maybe someone can point me in the
> right direction?
>
> Situation:
> I have two cores with slightly different schemas. Slightly means that some
> fields appear on both cores but there are some that are required in one core
> but optional in the other. Then there are fields that appear only in one
> core.
> (I don't want to put them in one index, right now, because of the fields
> that might be required for only one type but not the other. But it's
> certainly an option.)
>
> Question:
> Is there a way to get similar contents from core B when the input (seed) to
> the comparison is a document from core A?
>
> MoreLikeThis:
> I was searching for MoreLikeThis, multiple schemas etc. As these are cores
> with different schemas, the posts on distributed search/sharding in
> combination with MoreLikeThis are not helpful. But maybe there is some other
> functionality that I am not aware of? Some similarity search? Or maybe it's
> possible to tweak MoreLikeThis just to return the fields and terms that
> could be used for a search on the other core?
>
> Thanks for any input!
> Chantal
>


Re: Solr/lucene/java/etc. person sought near Washington, DC/USA

2009-11-09 Thread Grant Ingersoll


On Nov 9, 2009, at 10:36 AM, Rich Kulawiec wrote:


(Incidentally, I tried reach the owner of this list to enquire  
whether or not
this is appropriate, but didn't receive a response...so I took my  
best guess.

If that's wrong, I apologize to everyone.)


I personally think it's fine to post job openings on solr-user, but  
not on solr-dev


-Grant


[DIH] blocking import operation

2009-11-09 Thread Sascha Szott

Hi all,

currently, DIH's import operations only work asynchronously. Therefore,
after submitting an import request, DIH returns immediately, while the
import process (in case a large amount of data needs to be indexed)
continues asynchronously behind the scenes.


So, what is the recommended way to check whether the import process has
already finished? Or better still, is there any method / workaround that
blocks the import operation's caller until the operation has finished?


In my application, the DIH receives some URL parameters which are used 
for determining the database name that is used within data-config.xml, e.g.


http://localhost:8983/solr/dataimport?command=full-import&dbname=foo

Since only one DIH, /dataimport, is defined, but several databases need
to be indexed, it is required to issue this command several times, e.g.


http://localhost:8983/solr/dataimport?command=full-import&dbname=foo

... wait until /dataimport?command=status says "Indexing completed" (but 
without using a loop that checks it again and again) ...


http://localhost:8983/solr/dataimport?command=full-import&dbname=bar&clean=false


A suitable solution, at least IMHO, would be to have an additional DIH
parameter which determines whether the import call is blocking or
non-blocking, the latter being the default. As far as I can see, this
could be accomplished since Solr can execute more than one import
operation at a time (it starts a new thread for each). Perhaps my
question is somehow related to the discussion [1] on
ParallelDataImportHandler.
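(For completeness, the polling workaround I would like to get rid of looks
roughly like this sketch -- plain HTTP against the status command, treating
the "busy" marker in the XML response as "still running":)

    import java.io.BufferedReader;
    import java.io.InputStreamReader;
    import java.net.URL;

    static void waitForImport(String solrUrl) throws Exception {
        URL status = new URL(solrUrl + "/dataimport?command=status");
        while (true) {
            // fetch the current status response
            BufferedReader in = new BufferedReader(
                    new InputStreamReader(status.openStream(), "UTF-8"));
            StringBuilder body = new StringBuilder();
            for (String line; (line = in.readLine()) != null; ) {
                body.append(line);
            }
            in.close();
            // DIH reports status "busy" until the import has finished
            if (body.indexOf("busy") < 0) {
                return;
            }
            Thread.sleep(2000);  // poll every two seconds
        }
    }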


Best,
Sascha

[1] http://www.lucidimagination.com/search/document/a9b26ade46466ee



Re: Are subqueries possible in Solr? If so, are they performant?

2009-11-09 Thread Vicky_Dev


Hi Team,
Is it possible to write subqueries in dismaxrequest handler?

~Vikrant


Edoardo Marcora wrote:
> 
> Does Solr have the ability to do subqueries, like this one (in SQL):
> 
> SELECT id, first_name
> FROM student_details
> WHERE first_name IN (SELECT first_name
> FROM student_details
> WHERE subject= 'Science'); 
> 
> If so, how performant is this kind of queries?
> 

-- 
View this message in context: 
http://old.nabble.com/Are-subqueries-possible-in-Solr--If-so%2C-are-they-performant--tp24467023p26271600.html
Sent from the Solr - User mailing list archive at Nabble.com.



Re: sanitizing/filtering query string for security

2009-11-09 Thread michael8

Hi Julian,

Saw your post on exactly the question I have.  I'm curious if you got any
response directly, or figured out a way to do this by now that you could
share?  I'm in the same situation trying to 'sanitize' the query string
coming in before handing it to solr.  I do see that characters like ":"
could break the query, but am curious if anyone has come up with a general
solution as I think this must be a fairly common problem for any solr
deployment to tackle.

Thanks,
Michael


Julian Davchev wrote:
> 
> Hi,
> Is there anything special that can be done for sanitizing user input
> before passed as query to solr.
> Not allowing * and ? as first char is only thing I can thing of right
> now. Anything else it should somehow handle.
> 
> I am not able to find any relevant document.
> 
> 

-- 
View this message in context: 
http://old.nabble.com/sanizing-filtering-query-string-for-security-tp21516844p26271891.html
Sent from the Solr - User mailing list archive at Nabble.com.



Solr on OOM

2009-11-09 Thread Vauthrin, Laurent
Hello,

 

One of our deployed Solr (1.3) setups is having out of memory issues and
I'm not sure how to troubleshoot it.  I've read a few posts (including
http://old.nabble.com/Debugging-Solr-memory-usage-heap-problems-ts8832794.html#a8832794)
but I think this situation is slightly different.

 

Here's the setup:

1 master and 1 slave are located on the same VM (using a 64-bit JVM)

1 slave running on its own VM (using a 64-bit JVM)

From what I've been told, nothing else is running on those VMs.

Index size is about 100-200 MB.

 

Solrconfig.xml cache settings (element names reconstructed; the size
values were stripped by the mail archive):

<filterCache class="solr.LRUCache" size="..." initialSize="..."
             autowarmCount="5000"/>
<queryResultCache class="solr.LRUCache" size="..." initialSize="5000"
                  autowarmCount="5000"/>
<documentCache class="solr.LRUCache" size="..." initialSize="5000"/>

Both slaves at some point have gone out of memory (though not both at
the same time) when receiving a moderate load of queries (a few queries
per second - don't have an exact stat here).  We started with a heap
size of 1GB and ended up having to bump it up to 3.5GB.  It seems really
odd that we'd have to have a heap size that large when the index itself
is not really big.  Any thoughts on what could be really off here?   Is
there a way to determine the cache sizes in bytes?  I noticed that there
was a thread about other issues running Solr on VMs, has anyone else had
problems using VMWare?  From what I'm told, it seems like moving to
physical servers won't be a fast/easy change so I'm looking for any help
I can get for this configuration.

 

Thanks,
Laurent Vauthrin



DocumentObjectBinder.getBean in solrj

2009-11-09 Thread Christian López Espínola
Hi,

I need to create different beans from a search result, according to a
stored field with the classname.
What I'd find useful would be a getBean method in
DocumentObjectBinder, instead of the getBeans method (my list would
have different class instances).
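i.e. something along the lines of this hypothetical signature (it does
not exist in SolrJ today):

    // pick the target class per document, based on the stored classname field
    public <T> T getBean(Class<T> clazz, SolrDocument doc);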

Is there any workaround for this? If not, would anyone be interested in this?
I can contribute a patch for this.

TIA.

-- 
Cheers,

Christian López Espínola 


default a parameter to a core's name

2009-11-09 Thread Michael
Hi,

Is there a way for me to set up my solr.xml so that slave cores
replicate from the master's identically-named cores by default, but I
can override the core to replicate from if I wish? Something like
this, where core1 and core2 on the slave default to replicating from
foo/solr/core1 and foo/solr/core2, but core3 replicates from
foo/solr/core15.

# slave's solrconfig.xml (replication handler; tags reconstructed --
# the mail archive stripped them)

<requestHandler name="/replication" class="solr.ReplicationHandler">
  <lst name="slave">
    <str name="masterUrl">http://foo/solr/${replicateCore}/replication</str>
  </lst>
</requestHandler>

# slave's solr.xml

<solr>
  <cores adminPath="/admin/cores">
    <core name="core1" instanceDir="core1" />
    <core name="core2" instanceDir="core2" />
    <core name="core3" instanceDir="core3">
      <property name="replicateCore" value="core15" />
    </core>
  </cores>
</solr>

This doesn't quite work because solr.core.name is not a valid variable
outside the cores section of solr.xml.  I also tried putting
"${replicateCore:${solr.core.name}}" in the solrconfig.xml, but the
default in that case is literally "${solr.core.name}" -- the variable
expansion isn't recursive.

Thanks in advance for any pointers.

Michael


Re: DocumentObjectBinder.getBean in solrj

2009-11-09 Thread Christian López Espínola
On Mon, Nov 9, 2009 at 8:26 PM, Christian López Espínola
 wrote:
> Hi,
>
> I need to create different beans from a search result, according to a
> stored field with the classname.
> What I'd find useful would be a getBean method in
> DocumentObjectBinder, instead of the getBeans method (my list would
> have different class instances).
>
> Is there any workaround for this? If not, would anyone be interested in this?
> I can contribute a patch for this.
>
> TIA.

Attached is the patch I would like to have in Solrj.
If there is any problem with it please let me know. I followed the
HowToContribute wiki page and I hope that I didn't miss any steps.

>
> --
> Cheers,
>
> Christian López Espínola 
>



-- 
Cheers,

Christian López Espínola 


Solr Internal exception on startup...

2009-11-09 Thread William Pierce
Folks:

I am encountering an internal exception running solr on an Ubuntu 9.04 box,  
running tomcat 6.  I have deposited the solr nightly bits (as of October 7) 
into the folder: /usr/share/tomcat6/lib

The exception from the log says:

Nov 9, 2009 8:26:13 PM org.apache.catalina.core.StandardContext filterStart
SEVERE: Exception starting filter SolrRequestFilter
org.apache.solr.common.SolrException: java.security.AccessControlException: 
access denied (java.io.FilePermission /home/ubuntu/apps/solr/tomcatweb/prod/lib 
read)
at 
org.apache.solr.servlet.SolrDispatchFilter.<init>(SolrDispatchFilter.java:68)
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at 
sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:39)
at 
sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:27)
at java.lang.reflect.Constructor.newInstance(Constructor.java:513)
at java.lang.Class.newInstance0(Class.java:355)
at java.lang.Class.newInstance(Class.java:308)
at 
org.apache.catalina.core.ApplicationFilterConfig.getFilter(ApplicationFilterConfig.java:255)
at 
org.apache.catalina.core.ApplicationFilterConfig.setFilterDef(ApplicationFilterConfig.java:397)
at 
org.apache.catalina.core.ApplicationFilterConfig.<init>(ApplicationFilterConfig.java:108)
at 
org.apache.catalina.core.StandardContext.filterStart(StandardContext.java:3800)


This is strange because the documentation says that the "lib" folder is
optional.  (As a point of reference, I don't have a lib folder in my
Windows installation.)  In any event, I created an empty "lib" folder and
I am still getting the same exception.  (I gave the lib folder 777
permission.)

<Context docBase="..." debug="0" crossContext="true">
  <Environment name="solr/home" type="java.lang.String"
      value="/home/ubuntu/apps/solr/tomcatweb/prod" override="true" />
</Context>

Under the folder /home/ubuntu/apps/solr/tomcatweb/prod are all solr folders
(conf, data).

Can anybody help me here with what looks like a basic configuration error on my 
part?

Thanks,

- Bill





Re: Solr Internal exception on startup...

2009-11-09 Thread William Pierce
Sorry, folks... I saw that there were two copies sent out. Been having
some email snafus at my end... so I apologize in advance for the duplicate
email.


- Bill

--
From: "William Pierce" 
Sent: Monday, November 09, 2009 12:49 PM
To: 
Subject: Solr Internal exception on startup...


Folks:

I am encountering an internal exception running solr on an Ubuntu 9.04 
box,  running tomcat 6.  I have deposited the solr nightly bits (as of 
October 7) into the folder: /usr/share/tomcat6/lib


The exception from the log says:

Nov 9, 2009 8:26:13 PM org.apache.catalina.core.StandardContext 
filterStart

SEVERE: Exception starting filter SolrRequestFilter
org.apache.solr.common.SolrException: 
java.security.AccessControlException: access denied 
(java.io.FilePermission /home/ubuntu/apps/solr/tomcatweb/prod/lib read)
   at 
org.apache.solr.servlet.SolrDispatchFilter.<init>(SolrDispatchFilter.java:68)
   at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native 
Method)
   at 
sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:39)
   at 
sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:27)

   at java.lang.reflect.Constructor.newInstance(Constructor.java:513)
   at java.lang.Class.newInstance0(Class.java:355)
   at java.lang.Class.newInstance(Class.java:308)
   at 
org.apache.catalina.core.ApplicationFilterConfig.getFilter(ApplicationFilterConfig.java:255)
   at 
org.apache.catalina.core.ApplicationFilterConfig.setFilterDef(ApplicationFilterConfig.java:397)
   at 
org.apache.catalina.core.ApplicationFilterConfig.<init>(ApplicationFilterConfig.java:108)
   at 
org.apache.catalina.core.StandardContext.filterStart(StandardContext.java:3800)



This is strange because the documentation says that the "lib" folder is
optional.  (As a point of reference, I don't have a lib folder in my
Windows installation.)  In any event, I created an empty "lib" folder and
I am still getting the same exception.  (I gave the lib folder 777
permission.)


<Context docBase="..." debug="0" crossContext="true">
  <Environment name="solr/home" type="java.lang.String"
      value="/home/ubuntu/apps/solr/tomcatweb/prod" override="true" />
</Context>



Under the folder /home/ubuntu/apps/solr/tomcatweb/prod are all solr 
folders (conf, data).


Can anybody help me here with what looks like a basic configuration error 
on my part?


Thanks,

- Bill




Re: Solr Internal exception on startup...

2009-11-09 Thread William Pierce

All,

I realized that the stack trace I had sent in my previous email was
truncated and did not include the solr portions. Here is the fuller stack
trace:


SEVERE: Exception starting filter SolrRequestFilter
org.apache.solr.common.SolrException: java.security.AccessControlException: 
access denied (java.io.FilePermission 
/home/ubuntu/apps/solr/tomcatweb/resumes/lib read)
   at 
org.apache.solr.servlet.SolrDispatchFilter.<init>(SolrDispatchFilter.java:68)
   at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native 
Method)
   at 
sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:39)
   at 
sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:27)

   at java.lang.reflect.Constructor.newInstance(Constructor.java:513)
   at java.lang.Class.newInstance0(Class.java:355)
   at java.lang.Class.newInstance(Class.java:308)
   at 
org.apache.catalina.core.ApplicationFilterConfig.getFilter(ApplicationFilterConfig.java:255)
   at 
org.apache.catalina.core.ApplicationFilterConfig.setFilterDef(ApplicationFilterConfig.java:397)
   at 
org.apache.catalina.core.ApplicationFilterConfig.<init>(ApplicationFilterConfig.java:108)
   at 
org.apache.catalina.core.StandardContext.filterStart(StandardContext.java:3800)
   at 
org.apache.catalina.core.StandardContext.start(StandardContext.java:4450)
   at 
org.apache.catalina.core.ContainerBase.addChildInternal(ContainerBase.java:791)
   at 
org.apache.catalina.core.ContainerBase.access$000(ContainerBase.java:123)
   at 
org.apache.catalina.core.ContainerBase$PrivilegedAddChild.run(ContainerBase.java:145)

   at java.security.AccessController.doPrivileged(Native Method)
   at 
org.apache.catalina.core.ContainerBase.addChild(ContainerBase.java:769)
   at 
org.apache.catalina.core.StandardHost.addChild(StandardHost.java:526)
   at 
org.apache.catalina.startup.HostConfig.deployDescriptor(HostConfig.java:630)
   at 
org.apache.catalina.startup.HostConfig.deployDescriptors(HostConfig.java:556)
   at 
org.apache.catalina.startup.HostConfig.deployApps(HostConfig.java:491)
   at 
org.apache.catalina.startup.HostConfig.start(HostConfig.java:1206)
   at 
org.apache.catalina.startup.HostConfig.lifecycleEvent(HostConfig.java:314)
   at 
org.apache.catalina.util.LifecycleSupport.fireLifecycleEvent(LifecycleSupport.java:119)

Nov 9, 2009 9:08:57 PM org.apache.catalina.core.StandardContext filterStart
SEVERE: Exception starting filter SolrRequestFilter
org.apache.solr.common.SolrException: java.security.AccessControlException: 
access denied (java.io.FilePermission 
/home/ubuntu/apps/solr/tomcatweb/resumes/lib read)
   at 
org.apache.solr.servlet.SolrDispatchFilter.<init>(SolrDispatchFilter.java:68)
   at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native 
Method)
   at 
sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:39)
   at 
sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:27)

   at java.lang.reflect.Constructor.newInstance(Constructor.java:513)
   at java.lang.Class.newInstance0(Class.java:355)
   at java.lang.Class.newInstance(Class.java:308)
   at 
org.apache.catalina.core.ApplicationFilterConfig.getFilter(ApplicationFilterConfig.java:255)
   at 
org.apache.catalina.core.ApplicationFilterConfig.setFilterDef(ApplicationFilterConfig.java:397)
   at 
org.apache.catalina.core.ApplicationFilterConfig.<init>(ApplicationFilterConfig.java:108)
   at 
org.apache.catalina.core.StandardContext.filterStart(StandardContext.java:3800)


Cheers,

- Bill

--
From: "William Pierce" 
Sent: Monday, November 09, 2009 12:49 PM
To: 
Subject: Solr Internal exception on startup...


Folks:

I am encountering an internal exception running solr on an Ubuntu 9.04 
box,  running tomcat 6.  I have deposited the solr nightly bits (as of 
October 7) into the folder: /usr/share/tomcat6/lib


The exception from the log says:

Nov 9, 2009 8:26:13 PM org.apache.catalina.core.StandardContext 
filterStart

SEVERE: Exception starting filter SolrRequestFilter
org.apache.solr.common.SolrException: 
java.security.AccessControlException: access denied 
(java.io.FilePermission /home/ubuntu/apps/solr/tomcatweb/prod/lib read)
   at 
org.apache.solr.servlet.SolrDispatchFilter.<init>(SolrDispatchFilter.java:68)
   at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native 
Method)
   at 
sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:39)
   at 
sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:27)

   at java.lang.reflect.Constructor.newInstance(Constructor.java:513)
   at java.lang.Class.newInstance0(Class.java:355)
   at java.lang.Class.newInstance(Class.java:308)

Re: sanitizing/filtering query string for security

2009-11-09 Thread michael8

Sounds like a nice approach you've taken.  BTW, I have not used the DisMax
handler yet, but does it handle *:* properly?  IOW, do you care if users
issue this query, or does DisMax treat this query string differently than
the standard request handler?  Basically, given my UI, I'm trying to *hide*
the total count from users searching for *everything*, though this syntax
has helped me debug/monitor the size of my search doc pool.

Thanks,
Michael


Alexey-34 wrote:
> 
> I added some kind of pre- and post-processing of Solr results for this,
> i.e.:
>
> If I find a fieldname specified in the query string in the form
> "fieldname:term" then I pass the query string to the standard request
> handler, otherwise I use the DisMaxRequestHandler (the
> DisMaxRequestHandler doesn't break on the query, at least I haven't seen
> it yet). If the standard request handler throws an error (invalid field,
> too many clauses, etc.) then I pass the original query to the DisMax
> request handler.
> 
> Alex
> 
> On Mon, Nov 9, 2009 at 10:05 PM, michael8  wrote:
>>
>> Hi Julian,
>>
>> Saw your post on exactly the question I have.  I'm curious if you got any
>> response directly, or figured out a way to do this by now that you could
>> share?  I'm in the same situation trying to 'sanitize' the query string
>> coming in before handing it to solr.  I do see that characters like ":"
>> could break the query, but am curious if anyone has come up with a
>> general
>> solution as I think this must be a fairly common problem for any solr
>> deployment to tackle.
>>
>> Thanks,
>> Michael
>>
>>
>> Julian Davchev wrote:
>>>
>>> Hi,
>>> Is there anything special that can be done for sanitizing user input
>>> before passed as query to solr.
>>> Not allowing * and ? as first char is only thing I can thing of right
>>> now. Anything else it should somehow handle.
>>>
>>> I am not able to find any relevant document.
>>>
>>>
>>
>> --
>> View this message in context:
>> http://old.nabble.com/sanizing-filtering-query-string-for-security-tp21516844p26271891.html
>> Sent from the Solr - User mailing list archive at Nabble.com.
>>
>>
> 
> 

-- 
View this message in context: 
http://old.nabble.com/sanizing-filtering-query-string-for-security-tp21516844p26274459.html
Sent from the Solr - User mailing list archive at Nabble.com.



deployment questions

2009-11-09 Thread Joel Nylund

Hi,

I have a java app that is deployed in a jboss/tomcat container. I would
like to add my solr index to it. I have read about this and it seems
fairly straightforward, but I'm curious about the best way to secure it.


I require my users to log in to my app to use it, so I want the search
functions to behave the same way. Ideally I would like to do the solr
queries from the client using ajax/json calls.


So given this, my thinking was I should wrap the solr servlet and do a
local proxy type interface to ensure security. Is there any easier way
to do this, or an example of a good way to do this? Or does the solr
servlet support an "interceptor" type pattern where I can have it call a
piece of code before executing the call? (This application is old and
not using standard j2ee security, so I don't think I can use that.)
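Something like this rough sketch is what I have in mind for the
interceptor idea -- a plain servlet Filter mapped in front of /solr/* that
rejects requests without an authenticated session (the "user" attribute is
a placeholder for whatever the app stores at login):

    import java.io.IOException;
    import javax.servlet.*;
    import javax.servlet.http.HttpServletRequest;
    import javax.servlet.http.HttpServletResponse;
    import javax.servlet.http.HttpSession;

    public class SolrAuthFilter implements Filter {
        public void init(FilterConfig cfg) {}
        public void destroy() {}

        public void doFilter(ServletRequest req, ServletResponse res,
                             FilterChain chain)
                throws IOException, ServletException {
            HttpSession session = ((HttpServletRequest) req).getSession(false);
            if (session == null || session.getAttribute("user") == null) {
                // not logged in: refuse to touch Solr
                ((HttpServletResponse) res).sendError(
                        HttpServletResponse.SC_FORBIDDEN);
                return;
            }
            chain.doFilter(req, res);  // authenticated: pass through to Solr
        }
    }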



Another option is to do solrj on the server and not do the client-side
calls; in this case I think I could lock down the solr servlet interface
to only allow local calls.


thanks
Joel



Re: sanitizing/filtering query string for security

2009-11-09 Thread Otis Gospodnetic
Yes, DisMax does handle the match-all *:* query.

 Otis
--
Sematext is hiring -- http://sematext.com/about/jobs.html?mls
Lucene, Solr, Nutch, Katta, Hadoop, HBase, UIMA, NLP, NER, IR



- Original Message 
> From: michael8 
> To: solr-user@lucene.apache.org
> Sent: Mon, November 9, 2009 4:59:33 PM
> Subject: Re: sanizing/filtering query string for security
> 
> 
> Sounds like a nice approach you've taken.  BTW, I have not used the DisMax
> handler yet, but does it handle *:* properly?  IOW, do you care if users
> issue this query, or does DisMax treat this query string differently than
> the standard request handler?  Basically, given my UI, I'm trying to *hide*
> the total count from users searching for *everything*, though this syntax
> has helped me debug/monitor the size of my search doc pool.
> 
> Thanks,
> Michael
> 
> 
> Alexey-34 wrote:
> > 
> > I added some kind of pre- and post-processing of Solr results for this,
> > i.e.:
> >
> > If I find a fieldname specified in the query string in the form
> > "fieldname:term" then I pass the query string to the standard request
> > handler, otherwise I use the DisMaxRequestHandler (the
> > DisMaxRequestHandler doesn't break on the query, at least I haven't seen
> > it yet). If the standard request handler throws an error (invalid field,
> > too many clauses, etc.) then I pass the original query to the DisMax
> > request handler.
> > 
> > Alex
> > 
> > On Mon, Nov 9, 2009 at 10:05 PM, michael8 wrote:
> >>
> >> Hi Julian,
> >>
> >> Saw your post on exactly the question I have.  I'm curious if you got any
> >> response directly, or figured out a way to do this by now that you could
> >> share?  I'm in the same situation trying to 'sanitize' the query string
> >> coming in before handing it to solr.  I do see that characters like ":"
> >> could break the query, but am curious if anyone has come up with a
> >> general
> >> solution as I think this must be a fairly common problem for any solr
> >> deployment to tackle.
> >>
> >> Thanks,
> >> Michael
> >>
> >>
> >> Julian Davchev wrote:
> >>>
> >>> Hi,
> >>> Is there anything special that can be done for sanitizing user input
> >>> before passed as query to solr.
> >>> Not allowing * and ? as first char is only thing I can thing of right
> >>> now. Anything else it should somehow handle.
> >>>
> >>> I am not able to find any relevant document.
> >>>
> >>>
> >>
> >> --
> >> View this message in context:
> >> 
> http://old.nabble.com/sanizing-filtering-query-string-for-security-tp21516844p26271891.html
> >> Sent from the Solr - User mailing list archive at Nabble.com.
> >>
> >>
> > 
> > 
> 
> -- 
> View this message in context: 
> http://old.nabble.com/sanizing-filtering-query-string-for-security-tp21516844p26274459.html
> Sent from the Solr - User mailing list archive at Nabble.com.



Re: CPU Max Utilization

2009-11-09 Thread ba ba
After doing some more testing, I've seen the performance decrease yet again.
It happens after solr has been run for about 1/2 hour. I left my test
running over the weekend and saw the CPU usage go down to a reasonable level
at the end of the weekend. It is the same problem where the CPU has maximum
usage. I attached a profiler to the solr instance and found that 99% of the
CPU time is spent in the doFilter method of the SolrDispatchFilter class.

Does anyone know why all of the CPU would be hogged on this particular
method?

I'm requesting by relevance without sorting. I'm requesting 500 results per
query. There are no repetitions in the query set.

As for the fields: I'm using String and SortableInt fields. There are 3
string fields and 3 SortableInt fields in my schema. One of the String
fields is multivalued. The fields are quite small, given it's 18 GB for a
100 million document index.
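(For reference, the schema.xml difference at play -- a sketch; the
sortable variant is only needed when you sort or range-query on the field:)

    <!-- plain int: cheaper; fine for storing and returning values -->
    <fieldType name="integer" class="solr.IntField" omitNorms="true"/>

    <!-- sortable int: string-encoded so range queries order numerically -->
    <fieldType name="sint" class="solr.SortableIntField" omitNorms="true"/>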

Thanks,
Brad

2009/11/6 ba ba 

> After looking at the question about the sorting, it seems that the schema
> was using the SortableIntField class. When I did not return these fields in
> the queries, I got reasonable CPU usage. If I search only on one of these
> SortableIntFields, I get the bad query performance. I think the problem is
> the schema is using a Sortable field when I don't need a sortable field.
>
> Thanks for the help.
>
> -Brad
>
> 2009/11/5 Otis Gospodnetic 
>
> You may also want to share some sample queries, your fields definitions,
>> and tell us how long a core remains 100% utilized.
>>
>>  Otis
>> --
>> Sematext is hiring -- http://sematext.com/about/jobs.html?mls
>> Lucene, Solr, Nutch, Katta, Hadoop, HBase, UIMA, NLP, NER, IR
>>
>>
>>
>> - Original Message 
>> > From: ba ba 
>> > To: solr-user@lucene.apache.org
>> > Sent: Thu, November 5, 2009 9:20:13 PM
>> > Subject: CPU Max Utilization
>> >
>> > Greetings,
>> >
>> > I'm running a solr instance with 100 million documents in it. The index
>> is
>> > 18 GB.
>> >
>> > The strange behavior I'm seeing is CPU utilization gets maxed out. I'm
>> > running on an 8 core machine with 32 GB or ram. Every concurrent query I
>> run
>> > on it uses up one of the cores. So, if I am running 1 concurrent query
>> I'm
>> > using up the cpu of one of the cores. If I have 8 concurrent queries I'm
>> > using up all of the cores.
>> >
>> > Is this normal to have such a high CPU utilization. If not, what am I
>> doing
>> > wrong here. The only thing I have modified is the schema.xml file to
>> > correspond to the documents I want to store. Everything else is just
>> using
>> > the default values for all the config files.
>> >
>> > Thanks.
>>
>>
>


Re: Solr on OOM

2009-11-09 Thread Otis Gospodnetic
Laurent,

The autowarmCounts look biggest, but they are probably not causing OOMs.  Maybe 
you can see how big the caches are right before you OOM.
Or you can also start the JVM with -XX:+HeapDumpOnOutOfMemoryError and even 
specify the file where the heap should be dumped.  You can then analyze it and 
see what's eating the memory.
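
For example (heap size and dump path here are just placeholders):

  java -Xmx3500m -XX:+HeapDumpOnOutOfMemoryError \
       -XX:HeapDumpPath=/var/tmp/solr-oom.hprof \
       -jar start.jar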

Otis
--
Sematext is hiring -- http://sematext.com/about/jobs.html?mls
Lucene, Solr, Nutch, Katta, Hadoop, HBase, UIMA, NLP, NER, IR



- Original Message 
> From: "Vauthrin, Laurent" 
> To: solr-user@lucene.apache.org
> Sent: Mon, November 9, 2009 2:22:26 PM
> Subject: Solr on OOM
> 
> Hello,
> 
> 
> 
> One of our deployed Solr (1.3) setup is having out of memory issues and
> I'm not sure how to troubleshoot it.  I've read a few posts (including
> http://old.nabble.com/Debugging-Solr-memory-usage-heap-problems-ts883279
> 4.html#a8832794) but I think this situation is slightly different.
> 
> 
> 
> Here's the setup:
> 
> 1 master and 1 slave are located on the same VM (using a 64-bit JVM)
> 
> 1 slave running on its own VM (using a 64-bit JVM)
> 
> From what I've been told, nothing else is running on those VMs.
> 
> Index size is about 100-200 MB.
> 
> 
> 
> Solrconfig.xml cache settings:
> 
>   <filterCache class="solr.LRUCache" size="5000"
>     initialSize="5000" autowarmCount="5000"/>
> 
>   <queryResultCache class="solr.LRUCache" size="5000"
>     initialSize="5000" autowarmCount="5000"/>
> 
>   <documentCache class="solr.LRUCache" size="5000"
>     initialSize="5000"/>
> 
> 
> Both slaves at some point have gone out of memory (though not both at
> the same time) when receiving a moderate load of queries (a few queries
> per second - don't have an exact stat here).  We started with a heap
> size of 1GB and ended up having to bump it up to 3.5GB.  It seems really
> odd that we'd have to have a heap size that large when the index itself
> is not really big.  Any thoughts on what could be really off here?   Is
> there a way to determine the cache sizes in bytes?  I noticed that there
> was a thread about other issues running Solr on VMs, has anyone else had
> problems using VMWare?  From what I'm told, it seems like moving to
> physical servers won't be a fast/easy change so I'm looking for any help
> I can get for this configuration.
> 
> 
> 
> Thanks,
> Laurent Vauthrin



Re: sanizing/filtering query string for security

2009-11-09 Thread Alexey Serba
I added some kind of pre- and post-processing of Solr queries for this, i.e.

If I find a fieldname specified in the query string in the form
"fieldname:term" then I pass the query string to the standard request
handler; otherwise I use the DisMaxRequestHandler ( the DisMaxRequestHandler
doesn't break on the query, at least I haven't seen it do so yet ). If the
standard request handler throws an error ( invalid field, too many clauses,
etc. ) then I pass the original query to the DisMax request handler.
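
A minimal sketch of that logic in SolrJ terms (the class and the crude
fielded-query check below are simplified placeholders, not production code):

  import java.util.regex.Pattern;
  import org.apache.solr.client.solrj.SolrQuery;
  import org.apache.solr.client.solrj.SolrServer;
  import org.apache.solr.client.solrj.SolrServerException;
  import org.apache.solr.client.solrj.response.QueryResponse;

  public class FallbackSearcher {
      // crude check for an explicit "fieldname:term" in the user input
      private static final Pattern FIELDED = Pattern.compile("\\w+:");
      private final SolrServer solr;   // e.g. a CommonsHttpSolrServer

      public FallbackSearcher(SolrServer solr) { this.solr = solr; }

      public QueryResponse search(String userInput) throws SolrServerException {
          SolrQuery q = new SolrQuery(userInput);
          if (FIELDED.matcher(userInput).find()) {
              q.setQueryType("standard");    // try the standard handler first
              try {
                  return solr.query(q);
              } catch (SolrServerException e) {
                  // invalid field, too many clauses, syntax error ...
                  // fall through to dismax below
              }
          }
          q.setQueryType("dismax");          // dismax tolerates raw user input
          return solr.query(q);
      }
  }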

Alex

On Mon, Nov 9, 2009 at 10:05 PM, michael8  wrote:
>
> Hi Julian,
>
> Saw your post on exactly the question I have.  I'm curious if you got any
> response directly, or figured out a way to do this by now that you could
> share?  I'm in the same situation trying to 'sanitize' the query string
> coming in before handing it to solr.  I do see that characters like ":"
> could break the query, but am curious if anyone has come up with a general
> solution as I think this must be a fairly common problem for any solr
> deployment to tackle.
>
> Thanks,
> Michael
>
>
> Julian Davchev wrote:
>>
>> Hi,
>> Is there anything special that can be done for sanitizing user input
>> before it is passed as a query to solr.
>> Not allowing * and ? as the first char is the only thing I can think of
>> right now. Anything else it should somehow handle.
>>
>> I am not able to find any relevant documentation.
>>
>>
>


Re: Are subqueries possible in Solr? If so, are they performant?

2009-11-09 Thread Otis Gospodnetic
You can mimic them by combining 2 clauses with an AND.
e.g.
cookies
vs.
cookies AND vanilla
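
If the "subquery" really has to stay a separate step (e.g. the inner and
outer conditions live on different documents), the usual workaround is two
round trips, roughly like this (URLs and values are only illustrative):

  # step 1: collect the inner result set
  http://localhost:8983/solr/select?q=subject:Science&fl=first_name&rows=1000

  # step 2: feed the collected values back as one OR'ed query
  http://localhost:8983/solr/select?q=first_name:(Anna OR Ben OR Carol)

Performance then depends mostly on how many distinct values step 1 returns,
since each becomes a clause in step 2 (watch maxBooleanClauses).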

Otis
--
Sematext is hiring -- http://sematext.com/about/jobs.html?mls
Lucene, Solr, Nutch, Katta, Hadoop, HBase, UIMA, NLP, NER, IR



- Original Message 
> From: Vicky_Dev 
> To: solr-user@lucene.apache.org
> Sent: Mon, November 9, 2009 1:48:03 PM
> Subject: Re: Are subqueries possible in Solr? If so, are they performant?
> 
> 
> 
> Hi Team,
> Is it possible to write subqueries in dismaxrequest handler?
> 
> ~Vikrant
> 
> 
> Edoardo Marcora wrote:
> > 
> > Does Solr have the ability to do subqueries, like this one (in SQL):
> > 
> > SELECT id, first_name
> > FROM student_details
> > WHERE first_name IN (SELECT first_name
> > FROM student_details
> > WHERE subject= 'Science'); 
> > 
> > If so, how performant is this kind of queries?
> > 
> 



Re: Similar documents from multiple cores with different schemas

2009-11-09 Thread Otis Gospodnetic
Chantal,

What you described in the last sentence should work.  You can search by example 
by using the whole or some portion of doc from core A as the query against core 
B.  That is, more or less, what MLT does under the hood anyway.
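
For example, something roughly like this (core and field names are made up,
and it assumes the shared fields exist in both schemas):

  # 1) fetch the interesting fields of the seed document from core A
  http://localhost:8983/solr/coreA/select?q=id:12345&fl=title,description

  # 2) turn its terms into an ordinary query against core B
  http://localhost:8983/solr/coreB/select?rows=10
      &q=title:(seed title terms) OR description:(salient description terms)

Picking only the most distinctive terms for step 2 (rather than the whole
text) usually keeps the query cheap and the results sane.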

Otis
--
Sematext is hiring -- http://sematext.com/about/jobs.html?mls
Lucene, Solr, Nutch, Katta, Hadoop, HBase, UIMA, NLP, NER, IR



- Original Message 
> From: Chantal Ackermann 
> To: "solr-user@lucene.apache.org" 
> Sent: Mon, November 9, 2009 10:25:44 AM
> Subject: Similar documents from multiple cores with different schemas
> 
> Hi all,
> 
> my search for any postings answering the following question haven't produced 
> any 
> helpful hints so far. Maybe someone can point me into the right direction?
> 
> Situation:
> I have two cores with slightly different schemas. Slightly means that some 
> fields appear on both cores but there are some that are required in one core 
> but 
> optional in the other. Then there are fields that appear only in one core.
> (I don't want to put them in one index, right now, because of the fields that 
> might be required for only one type but not the other. But it's certainly an 
> option.)
> 
> Question:
> Is there a way to get similar contents from core B when the input (seed) to 
> the 
> comparison is a document from core A?
> 
> MoreLikeThis:
> I was searching for MoreLikeThis, multiple schemas etc. As these are cores 
> with 
> different schemas, the posts on distributed search/sharding in combination 
> with 
> MoreLikeThis are not helpful. But maybe there is some other functionality 
> that I 
> am not aware of? Some similarity search? Or maybe it's possible to tweak 
> MoreLikeThis just to return the fields and terms that could be used for a 
> search 
> on the other core?
> 
> Thanks for any input!
> Chantal



Re: sanizing/filtering query string for security

2009-11-09 Thread Alexey Serba
> BTW, I have not used DisMax handler yet, but does it handle *:* properly?
See q.alt DisMax parameter
http://wiki.apache.org/solr/DisMaxRequestHandler#q.alt

You can specify q.alt=*:* and q as empty string to get all results.
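
For example, a dismax handler configured along these lines (this assumes
Solr 1.4's solr.SearchHandler; the qf value is just a placeholder):

  <requestHandler name="dismax" class="solr.SearchHandler">
    <lst name="defaults">
      <str name="defType">dismax</str>
      <str name="qf">title^2.0 body^0.8</str>
      <!-- used when q is missing or empty: match all documents -->
      <str name="q.alt">*:*</str>
    </lst>
  </requestHandler>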

> do you care if users issue this query
I allow users to issue an empty search and get all results with all
facets / etc. It's a nice navigation UI btw.

> Basically given my UI, I'm trying to *hide* the total count from users 
> searching for *everything*
If you don't specify q.alt parameter then Solr returns zero results
for empty search. *:* won't work either.

> though this syntax has helped me debug/monitor the state of my search doc 
> pool size.
see q.alt

Alex

On Tue, Nov 10, 2009 at 12:59 AM, michael8  wrote:
>
> Sounds like a nice approach you have taken.  BTW, I have not used the DisMax
> handler yet, but does it handle *:* properly?  IOW, do you care if users
> issue this query, or does DisMax treat this query string differently than
> standard request handler?  Basically given my UI, I'm trying to *hide* the
> total count from users searching for *everything*, though this syntax has
> helped me debug/monitor the state of my search doc pool size.
>
> Thanks,
> Michael
>
>
> Alexey-34 wrote:
>>
>> I added some kind of pre and post processing of Solr results for this,
>> i.e.
>>
>> If I find fieldname specified in query string in form of
>> "fieldname:term" then I pass this query string to standard request
>> handler, otherwise use DisMaxRequestHandler ( DisMaxRequestHandler
>> doesn't break the query, at least I haven't seen yet ). If standard
>> request handler throws error ( invalid field, too many clauses, etc )
>> then I pass original query to DisMax request handler.
>>
>> Alex
>>
>> On Mon, Nov 9, 2009 at 10:05 PM, michael8  wrote:
>>>
>>> Hi Julian,
>>>
> >>> Saw your post on exactly the question I have.  I'm curious if you got any
>>> response directly, or figured out a way to do this by now that you could
>>> share?  I'm in the same situation trying to 'sanitize' the query string
>>> coming in before handing it to solr.  I do see that characters like ":"
>>> could break the query, but am curious if anyone has come up with a
>>> general
>>> solution as I think this must be a fairly common problem for any solr
>>> deployment to tackle.
>>>
>>> Thanks,
>>> Michael
>>>
>>>
>>> Julian Davchev wrote:

 Hi,
 Is there anything special that can be done for sanitizing user input
 before it is passed as a query to solr.
 Not allowing * and ? as the first char is the only thing I can think of
 right now. Anything else it should somehow handle.

 I am not able to find any relevant documentation.


>>>


Re: Segment file not found error - after replicating

2009-11-09 Thread Otis Gospodnetic
It's hard to troubleshoot blindly like this, but have you tried manually 
comparing the contents of the index dir on the master and on the slave(s)?
If they are out of sync, have you tried forcing of replication to see if one of 
the subsequent replication attempts gets the dirs in sync?
Do you have more than 1 slave and do they all start having this problem at the 
same time?
Any errors in the logs for any of the scripts involved in replication in 1.3?

Otis
--
Sematext is hiring -- http://sematext.com/about/jobs.html?mls
Lucene, Solr, Nutch, Katta, Hadoop, HBase, UIMA, NLP, NER, IR



- Original Message 
> From: Maduranga Kannangara 
> To: "solr-user@lucene.apache.org" 
> Sent: Sun, November 8, 2009 10:30:44 PM
> Subject: Segment file not found error - after replicating
> 
> Hi guys,
> 
> We use Solr 1.3 for indexing large amounts of data (50G avg) in a Linux
> environment and use the replication scripts to make replicas that live in
> load-balancing slaves.
> 
> The issue we face quite often (only on Linux servers) is that they tend to
> not be able to find the segment file (segments_x etc.) after the replication
> has completed. As this has become quite common, we have started hitting a
> serious issue.
> 
> Below is a stack trace, if that helps and any help on this matter is greatly 
> appreciated.
> 
> 
> 
> Nov 5, 2009 11:34:46 PM org.apache.solr.util.plugin.AbstractPluginLoader load
> INFO: created /admin/: org.apache.solr.handler.admin.AdminHandlers
> Nov 5, 2009 11:34:46 PM org.apache.solr.util.plugin.AbstractPluginLoader load
> INFO: created /admin/ping: org.apache.solr.handler.PingRequestHandler
> Nov 5, 2009 11:34:46 PM org.apache.solr.util.plugin.AbstractPluginLoader load
> INFO: created /debug/dump: org.apache.solr.handler.DumpRequestHandler
> Nov 5, 2009 11:34:46 PM org.apache.solr.util.plugin.AbstractPluginLoader load
> INFO: created gap: org.apache.solr.highlight.GapFragmenter
> Nov 5, 2009 11:34:46 PM org.apache.solr.util.plugin.AbstractPluginLoader load
> INFO: created regex: org.apache.solr.highlight.RegexFragmenter
> Nov 5, 2009 11:34:46 PM org.apache.solr.util.plugin.AbstractPluginLoader load
> INFO: created html: org.apache.solr.highlight.HtmlFormatter
> Nov 5, 2009 11:34:46 PM org.apache.solr.servlet.SolrDispatchFilter init
> SEVERE: Could not start SOLR. Check solr/home property
> java.lang.RuntimeException: java.io.FileNotFoundException: 
> /solrinstances/solrhome01/data/index/segments_v (No such file or directory)
> at org.apache.solr.core.SolrCore.getSearcher(SolrCore.java:960)
> at org.apache.solr.core.SolrCore.<init>(SolrCore.java:470)
> at 
> org.apache.solr.core.CoreContainer$Initializer.initialize(CoreContainer.java:119)
> at 
> org.apache.solr.servlet.SolrDispatchFilter.init(SolrDispatchFilter.java:69)
> at 
> org.apache.catalina.core.ApplicationFilterConfig.getFilter(ApplicationFilterConfig.java:275)
> at 
> org.apache.catalina.core.ApplicationFilterConfig.setFilterDef(ApplicationFilterConfig.java:397)
> at 
> org.apache.catalina.core.ApplicationFilterConfig.<init>(ApplicationFilterConfig.java:108)
> at 
> org.apache.catalina.core.StandardContext.filterStart(StandardContext.java:3709)
> at 
> org.apache.catalina.core.StandardContext.start(StandardContext.java:4363)
> at 
> org.apache.catalina.core.StandardContext.reload(StandardContext.java:3099)
> at 
> org.apache.catalina.manager.ManagerServlet.reload(ManagerServlet.java:916)
> at 
> org.apache.catalina.manager.HTMLManagerServlet.reload(HTMLManagerServlet.java:536)
> at 
> org.apache.catalina.manager.HTMLManagerServlet.doGet(HTMLManagerServlet.java:114)
> at javax.servlet.http.HttpServlet.service(HttpServlet.java:617)
> at javax.servlet.http.HttpServlet.service(HttpServlet.java:717)
> at 
> org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:290)
> at 
> org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206)
> at com.jamonapi.JAMonFilter.doFilter(JAMonFilter.java:57)
> at 
> org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:235)
> at 
> org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206)
> at 
> org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:233)
> at 
> org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:191)
> at 
> org.apache.catalina.authenticator.AuthenticatorBase.invoke(AuthenticatorBase.java:525)
> at 
> org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:128)
> at 
> org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:102)
> at 
> org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:109)
> at 

Re: tracking solr response time

2009-11-09 Thread Otis Gospodnetic
Bharat,

No, you should not give the JVM so much memory.  Give it enough to avoid overly 
frequent GC, but don't steal memory from the OS cache.
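
E.g. for a 16 GB index on a 32 GB box, something in this ballpark is a more
typical starting point (the numbers are only illustrative -- watch GC
behaviour and adjust):

  # modest heap for Solr; the remaining RAM stays available to the
  # OS page cache, which is what actually keeps the index hot
  java -Xms2g -Xmx4g -jar start.jar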

Otis
--
Sematext is hiring -- http://sematext.com/about/jobs.html?mls
Lucene, Solr, Nutch, Katta, Hadoop, HBase, UIMA, NLP, NER, IR



- Original Message 
> From: bharath venkatesh 
> To: solr-user@lucene.apache.org
> Sent: Sun, November 8, 2009 2:15:00 PM
> Subject: Re: tracking solr response time
> 
> Thanks Lance for the clear explanation .. are you saying we should give the
> solr JVM enough memory so that the OS cache can optimize disk I/O
> efficiently .. that means in our case we have a 16 GB index, so would it be
> enough to allocate the solr JVM 20GB of memory and rely on the OS cache to
> optimize disk I/O, i.e. cache the index in memory ??
> 
> 
> below are the stats related to the caches
> 
> 
> name: queryResultCache
> class: org.apache.solr.search.LRUCache
> version: 1.0
> description: LRU Cache(maxSize=512, initialSize=512, autowarmCount=256,
>   regenerator=org.apache.solr.search.solrindexsearche...@67e112b3)
> stats:
> lookups : 0
> hits : 0
> hitratio : 0.00
> inserts : 8
> evictions : 0
> size : 8
> cumulative_lookups : 15
> cumulative_hits : 7
> cumulative_hitratio : 0.46
> cumulative_inserts : 8
> cumulative_evictions : 0
> 
> 
> name: documentCache
> class: org.apache.solr.search.LRUCache
> version: 1.0
> description: LRU Cache(maxSize=512, initialSize=512)
> stats:
> lookups : 0
> hits : 0
> hitratio : 0.00
> inserts : 0
> evictions : 0
> size : 0
> cumulative_lookups : 744
> cumulative_hits : 639
> cumulative_hitratio : 0.85
> cumulative_inserts : 105
> cumulative_evictions : 0
> 
> 
> name: filterCache
> class: org.apache.solr.search.LRUCache
> version: 1.0
> description: LRU Cache(maxSize=512, initialSize=512, autowarmCount=256,
>   regenerator=org.apache.solr.search.solrindexsearche...@1e3dbf67)
> stats:
> lookups : 0
> hits : 0
> hitratio : 0.00
> inserts : 20
> evictions : 0
> size : 12
> cumulative_lookups : 64
> cumulative_hits : 60
> cumulative_hitratio : 0.93
> cumulative_inserts : 12
> cumulative_evictions : 0
> 
> 
> hits and hit ratio are zero for document cache, filter cache and query
> cache .. only cumulative hits and hit ratio have non-zero numbers .. is
> this how it is supposed to be .. or do we need to configure it properly ?
> 
> Thanks,
> Bharath
> 
> 
> 
> 
> 
> On Sat, Nov 7, 2009 at 5:47 AM, Lance Norskog wrote:
> 
> > The OS cache is the memory used by the operating system (Linux or
> > Windows) to store a cache of the data stored on the disk. The cache is
> > usually by block numbers and are not correlated to files. Disk blocks
> > that are not used by programs are slowly pruned from the cache.
> >
> > The operating systems are very good at maintaining this cache. It
> > usually better to give the Solr JVM enough memory to run comfortably
> > and rely on the OS cache to optimize disk I/O, instead of giving it
> > all available ram.
> >
> > Solr has its own caches for certain data structures, and there are no
> > solid guidelines for tuning those. The solr/admin/stats.jsp page shows
> > the number of hits & deletes for the caches and most people just
> > reload that over & over.
> >
> > On Fri, Nov 6, 2009 at 3:09 AM, bharath venkatesh
> > wrote:
> > >>I have to state the obvious: you may really want to upgrade to 1.4 when
> > > it's out
> > >
> > > when would solr 1.4 be released .. is there any beta version available ?
> > >
> > >>We don't have the details, but a machine with 32 GB RAM and 16 GB index
> > > should have the whole index cached by >the OS
> > >
> > > do we have to configure solr for the index to be cached by the OS in an
> > > optimised way? how does this caching of the index in memory happen? are
> > > there any docs or links which give details regarding the same
> > >
> > >>unless something else is consuming the memory or unless something is
> > > constantly throwing data out of the OS >cache (e.g. frequent index
> > > optimization).
> > >
> > > what are the factors which would cause constantly throwing data out of
> > the
> > > OS cache  (we are doing  index optimization only once in a day during
> > > midnight )
> > >
> > >
> > > Thanks,
> > > Bharath
> > >
> >
> >
> >
> > --
> > Lance Norskog
> > goks...@gmail.com
> >



Re: sanizing/filtering query string for security

2009-11-09 Thread Otis Gospodnetic
Word of warning:
Careful with q.alt=*:* if you are dealing with large indices! :)

Otis
--
Sematext is hiring -- http://sematext.com/about/jobs.html?mls
Lucene, Solr, Nutch, Katta, Hadoop, HBase, UIMA, NLP, NER, IR



- Original Message 
> From: Alexey Serba 
> To: solr-user@lucene.apache.org
> Sent: Mon, November 9, 2009 5:23:52 PM
> Subject: Re: sanizing/filtering query string for security
> 
> > BTW, I have not used DisMax handler yet, but does it handle *:* properly?
> See q.alt DisMax parameter
> http://wiki.apache.org/solr/DisMaxRequestHandler#q.alt
> 
> You can specify q.alt=*:* and q as empty string to get all results.
> 
> > do you care if users issue this query
> I allow users to issue an empty search and get all results with all
> facets / etc. It's a nice navigation UI btw.
> 
> > Basically given my UI, I'm trying to *hide* the total count from users 
> searching for *everything*
> If you don't specify q.alt parameter then Solr returns zero results
> for empty search. *:* won't work either.
> 
> > though this syntax has helped me debug/monitor the state of my search doc 
> > pool 
> size.
> see q.alt
> 
> Alex
> 
> On Tue, Nov 10, 2009 at 12:59 AM, michael8 wrote:
> >
> > Sounds like a nice approach you have taken.  BTW, I have not used the DisMax
> > handler yet, but does it handle *:* properly?  IOW, do you care if users
> > issue this query, or does DisMax treat this query string differently than
> > standard request handler?  Basically given my UI, I'm trying to *hide* the
> > total count from users searching for *everything*, though this syntax has
> > helped me debug/monitor the state of my search doc pool size.
> >
> > Thanks,
> > Michael
> >
> >
> > Alexey-34 wrote:
> >>
> >> I added some kind of pre and post processing of Solr results for this,
> >> i.e.
> >>
> >> If I find fieldname specified in query string in form of
> >> "fieldname:term" then I pass this query string to standard request
> >> handler, otherwise use DisMaxRequestHandler ( DisMaxRequestHandler
> >> doesn't break the query, at least I haven't seen yet ). If standard
> >> request handler throws error ( invalid field, too many clauses, etc )
> >> then I pass original query to DisMax request handler.
> >>
> >> Alex
> >>
> >> On Mon, Nov 9, 2009 at 10:05 PM, michael8 wrote:
> >>>
> >>> Hi Julian,
> >>>
> >>> Saw your post on exactly the question I have.  I'm curious if you got any
> >>> response directly, or figured out a way to do this by now that you could
> >>> share?  I'm in the same situation trying to 'sanitize' the query string
> >>> coming in before handing it to solr.  I do see that characters like ":"
> >>> could break the query, but am curious if anyone has come up with a
> >>> general
> >>> solution as I think this must be a fairly common problem for any solr
> >>> deployment to tackle.
> >>>
> >>> Thanks,
> >>> Michael
> >>>
> >>>
> >>> Julian Davchev wrote:
> 
>  Hi,
>  Is there anything special that can be done for sanitizing user input
>  before it is passed as a query to solr.
>  Not allowing * and ? as the first char is the only thing I can think of
>  right now. Anything else it should somehow handle.
> 
>  I am not able to find any relevant documentation.
> 
> 
> >>>



Re: CPU Max Utilization

2009-11-09 Thread Otis Gospodnetic
doFilter is a Servlet Filter method -- SolrDispatchFilter is a Servlet Filter
and all requests go through that method.  You need to dig deeper in your
profiler.

 Otis
--
Sematext is hiring -- http://sematext.com/about/jobs.html?mls
Lucene, Solr, Nutch, Katta, Hadoop, HBase, UIMA, NLP, NER, IR



- Original Message 
> From: ba ba 
> To: solr-user@lucene.apache.org
> Sent: Mon, November 9, 2009 5:15:17 PM
> Subject: Re: CPU Max Utilization
> 
> After doing some more testing, I've seen the performance decrease yet again.
> It happens after solr has been run for about 1/2 hour. I left my test
> running over the weekend and saw the CPU usage go down to a reasonable level
> at the end of the weekend. It is the same problem where the CPU has maximum
> usage. I attached a profiler to the solr instance and found that 99% of the
> CPU time is spent in the doFilter method of the SolrDispatchFilter class.
> 
> Does anyone know why all of the CPU would be hogged on this particular
> method?
> 
> I'm requesting by relevance without sorting. I'm requesting 500 results per
> query. There are no repetitions in the query set.
> 
> As for the fields: I'm using String and SortableInt fields. There are 3
> String fields and 3 SortableInt fields in my schema. One of the String
> fields is multivalued. The fields are quite small, since it's 18 GB for a
> 100-million-document index.
> 
> Thanks,
> Brad
> 
> 2009/11/6 ba ba 
> 
> > After looking at the question about the sorting. It seems that the schema
> > was using the SortableIntField class. When I did not return these fields in
> > the queries, I got reasonable CPU usage. If I search only on one of these
> > SortableIntFields, I get the bad query performance. I think the problem is
> > the schema is using a Sortable field when I don't need a sortable field.
> >
> > Thanks for the help.
> >
> > -Brad
> >
> > 2009/11/5 Otis Gospodnetic 
> >
> > You may also want to share some sample queries, your fields definitions,
> >> and tell us how long a core remains 100% utilized.
> >>
> >>  Otis
> >> --
> >> Sematext is hiring -- http://sematext.com/about/jobs.html?mls
> >> Lucene, Solr, Nutch, Katta, Hadoop, HBase, UIMA, NLP, NER, IR
> >>
> >>
> >>
> >> - Original Message 
> >> > From: ba ba 
> >> > To: solr-user@lucene.apache.org
> >> > Sent: Thu, November 5, 2009 9:20:13 PM
> >> > Subject: CPU Max Utilization
> >> >
> >> > Greetings,
> >> >
> >> > I'm running a solr instance with 100 million documents in it. The index
> >> is
> >> > 18 GB.
> >> >
> >> > The strange behavior I'm seeing is CPU utilization gets maxed out. I'm
> >> > running on an 8 core machine with 32 GB or ram. Every concurrent query I
> >> run
> >> > on it uses up one of the cores. So, if I am running 1 concurrent query
> >> I'm
> >> > using up the cpu of one of the cores. If I have 8 concurrent queries I'm
> >> > using up all of the cores.
> >> >
> >> > Is this normal to have such a high CPU utilization. If not, what am I
> >> doing
> >> > wrong here. The only thing I have modified is the schema.xml file to
> >> > correspond to the documents I want to store. Everything else is just
> >> using
> >> > the default values for all the config files.
> >> >
> >> > Thanks.
> >>
> >>
> >



Embedded Solr, creating new cores programatically

2009-11-09 Thread Jay Shollenberger
Hi folks,

I am working on a project in which we would like to create new cores
programatically within an embedded solr instance. My question is:  How much
of the directory/configuration file gruntwork do I have to do myself?  I
figure I have to create the solr.home directory myself, but do I have to
create solr.xml, as well as directories for each core I would like to run?
Do I have to create the configuration directory for each core manually, as
well as populate the required configuration files manually?
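
For reference, here is the rough sketch I have pieced together from the
1.3/1.4 API (it assumes the instance directory with conf/solrconfig.xml and
conf/schema.xml already exists on disk -- as far as I can tell, CoreContainer
will not generate those for you):

  import org.apache.solr.client.solrj.embedded.EmbeddedSolrServer;
  import org.apache.solr.core.CoreContainer;
  import org.apache.solr.core.CoreDescriptor;
  import org.apache.solr.core.SolrCore;

  public class CoreCreator {
      // returns a server bound to the freshly created core
      public static EmbeddedSolrServer createCore(CoreContainer container,
              String name, String instanceDir) throws Exception {
          CoreDescriptor descriptor =
              new CoreDescriptor(container, name, instanceDir);
          SolrCore core = container.create(descriptor);
          container.register(name, core, false); // false: no previous core expected
          return new EmbeddedSolrServer(container, name);
      }
  }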

Thank you all,

Jay Shollenberger


Re: de-boosting certain facets during search

2009-11-09 Thread Paul Rosen
I'm still going around in a circle on this. I'm not sure why it's not 
sinking in...


If I could just create the desired URL, I can probably work backwards 
and construct the correct ruby call.


Here is the URL that I'm currently creating (I've added newlines here 
for readability):


http://localhost:8983/solr/resources/select?hl.fragsize=600
&hl=true
&facet.field=genre
&facet.field=archive
&facet.limit=-1
&qt=standard
&start=0
&fq=archive%3A%22blake%22
&hl.fl=text
&fl=uri%2Carchive%2Cdate_label%2Cgenre
&facet=true
&q=%28history%29
&rows=60
&facet.missing=true
&facet.mincount=1

What this search returns from my index is 53 hits. The first 43 contain 
the genre field value "Citation" and the last 10 do not (they contain 
other values in that field.)


Note: the genre field is multivalued, if that matters.

I'd like the search to put all of the objects that contain genre 
"Citation" below the 10 objects that do not contain that genre.


I've read the various pages on boosting, but since I'm not actively 
searching on the field that I want to put a boost value on, I'm not sure 
how to go about this.


Thanks for any hints.
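
One direction I have been looking at (not yet verified) is the "very low
boost" trick from the SolrRelevancyFAQ: keep qt=standard, but make the main
clause required and add an optional clause that boosts everything *except*
"Citation", so the non-Citation objects sort first on score. Unencoded, the
q parameter would be roughly:

  q=+(history) (*:* -genre:"Citation")^999

i.e. in solr-ruby, :query => '+(history) (*:* -genre:"Citation")^999'
(URL-encoded as usual), with the rest of the request unchanged.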

Paul Rosen wrote:

Hi,

I'm using solr-ruby-0.0.8 and solr 1.4.

My data contains a faceted field called "genre". We would like one 
particular genre (the one named "Citation") to show up last in the 
results.


I'm having trouble figuring out how to add the boost parameter to the 
solr-ruby call. Here is my code:


req = Solr::Request::Standard.new(:start => start,
  :rows => max,
  :sort => sort_param,
  :query => query,
  :filter_queries => filter_queries,
  :field_list => @field_list,
  :facets => {:fields => @facet_fields,
:mincount => 1,
:missing => true,
:limit => -1},
  :highlighting => {:field_list => ['text'],
:fragment_size => 600},
:shards => @cores)

response = @solr.send(req)

Do I just format it inside my query, like this:

query = query + " AND genre:Citation^.01"

or in filter_query, like this:

filter_queries.push("genre:Citation^.01")

or is there a hash parameter that I set?

(Note that the user can select Citation explicitly. I'll probably 
special case that.)


I've tried variations of the above, but I've had no luck so far.

Thanks,
Paul




Oddness with Phrase Query

2009-11-09 Thread Simon Wistow
I have a document with the title "Here, there be dragons" and a body.

When I search for 

q  = Here, there be dragons
qf = title^2.0 body^0.8
qt = dismax

Which is parsed as 

+DisjunctionMaxQuery((content:"here dragon"^0.8 | title:"here 
dragon"^2.0)~0.01) ()

I get the document as the first hit which is what I'd suspect.

However, if change the query to 

q  = "Here, there be dragons"

(with quotes)

which is parsed as

+DisjunctionMaxQuery((content:"here dragon"^0.8 | title:"here 
dragon"^2.0)~0.01) ()

then I don't get the document at all. Which is not what I'd suspect.

I've tried modifying the phrase slop but still don't get any results 
back.

Am I doing something wrong - do I have to have an untokenized copy of 
fields lying around?

Thanks,

Simon




Solr 1.4 - Pre release post (portuguese)

2009-11-09 Thread Lucas F. A. Teixeira
Hello all,

I've blogged about Solr 1.4 and some of its features on my blog.
It's written in Brazilian Portuguese; hope you all can enjoy it (thanks,
Google Translate).

The link is: http://lucastex.com.br/2009/11/09/solr-1-4-mais-que-pronto/


[]s,


Lucas Frare Teixeira .·.
- lucas...@gmail.com
- lucastex.com.br
- blog.lucastex.com
- twitter.com/lucastex


RE: Segment file not found error - after replicating

2009-11-09 Thread Maduranga Kannangara
Thanks Otis!

Yes, I checked the index directories and they are 100% the same, both
timestamp- and size-wise.

Not all the slaves face this issue. I would say roughly 50% have this trouble.

Logs do not have any errors too :-(

Any other things I should do/look at?

Cheers
Madu


-Original Message-
From: Otis Gospodnetic [mailto:otis_gospodne...@yahoo.com] 
Sent: Tuesday, 10 November 2009 9:26 AM
To: solr-user@lucene.apache.org
Subject: Re: Segment file not found error - after replicating

It's hard to troubleshoot blindly like this, but have you tried manually 
comparing the contents of the index dir on the master and on the slave(s)?
If they are out of sync, have you tried forcing of replication to see if one of 
the subsequent replication attempts gets the dirs in sync?
Do you have more than 1 slave and do they all start having this problem at the 
same time?
Any errors in the logs for any of the scripts involved in replication in 1.3?

Otis
--
Sematext is hiring -- http://sematext.com/about/jobs.html?mls
Lucene, Solr, Nutch, Katta, Hadoop, HBase, UIMA, NLP, NER, IR



- Original Message 
> From: Maduranga Kannangara 
> To: "solr-user@lucene.apache.org" 
> Sent: Sun, November 8, 2009 10:30:44 PM
> Subject: Segment file not found error - after replicating
> 
> Hi guys,
> 
> We use Solr 1.3 for indexing large amounts of data (50G avg) in a Linux
> environment and use the replication scripts to make replicas that live in
> load-balancing slaves.
> 
> The issue we face quite often (only on Linux servers) is that they tend to
> not be able to find the segment file (segments_x etc.) after the replication
> has completed. As this has become quite common, we have started hitting a
> serious issue.
> 
> Below is a stack trace, if that helps and any help on this matter is greatly 
> appreciated.
> 
> 
> 
> Nov 5, 2009 11:34:46 PM org.apache.solr.util.plugin.AbstractPluginLoader load
> INFO: created /admin/: org.apache.solr.handler.admin.AdminHandlers
> Nov 5, 2009 11:34:46 PM org.apache.solr.util.plugin.AbstractPluginLoader load
> INFO: created /admin/ping: org.apache.solr.handler.PingRequestHandler
> Nov 5, 2009 11:34:46 PM org.apache.solr.util.plugin.AbstractPluginLoader load
> INFO: created /debug/dump: org.apache.solr.handler.DumpRequestHandler
> Nov 5, 2009 11:34:46 PM org.apache.solr.util.plugin.AbstractPluginLoader load
> INFO: created gap: org.apache.solr.highlight.GapFragmenter
> Nov 5, 2009 11:34:46 PM org.apache.solr.util.plugin.AbstractPluginLoader load
> INFO: created regex: org.apache.solr.highlight.RegexFragmenter
> Nov 5, 2009 11:34:46 PM org.apache.solr.util.plugin.AbstractPluginLoader load
> INFO: created html: org.apache.solr.highlight.HtmlFormatter
> Nov 5, 2009 11:34:46 PM org.apache.solr.servlet.SolrDispatchFilter init
> SEVERE: Could not start SOLR. Check solr/home property
> java.lang.RuntimeException: java.io.FileNotFoundException: 
> /solrinstances/solrhome01/data/index/segments_v (No such file or directory)
> at org.apache.solr.core.SolrCore.getSearcher(SolrCore.java:960)
> at org.apache.solr.core.SolrCore.<init>(SolrCore.java:470)
> at 
> org.apache.solr.core.CoreContainer$Initializer.initialize(CoreContainer.java:119)
> at 
> org.apache.solr.servlet.SolrDispatchFilter.init(SolrDispatchFilter.java:69)
> at 
> org.apache.catalina.core.ApplicationFilterConfig.getFilter(ApplicationFilterConfig.java:275)
> at 
> org.apache.catalina.core.ApplicationFilterConfig.setFilterDef(ApplicationFilterConfig.java:397)
> at 
> org.apache.catalina.core.ApplicationFilterConfig.<init>(ApplicationFilterConfig.java:108)
> at 
> org.apache.catalina.core.StandardContext.filterStart(StandardContext.java:3709)
> at 
> org.apache.catalina.core.StandardContext.start(StandardContext.java:4363)
> at 
> org.apache.catalina.core.StandardContext.reload(StandardContext.java:3099)
> at 
> org.apache.catalina.manager.ManagerServlet.reload(ManagerServlet.java:916)
> at 
> org.apache.catalina.manager.HTMLManagerServlet.reload(HTMLManagerServlet.java:536)
> at 
> org.apache.catalina.manager.HTMLManagerServlet.doGet(HTMLManagerServlet.java:114)
> at javax.servlet.http.HttpServlet.service(HttpServlet.java:617)
> at javax.servlet.http.HttpServlet.service(HttpServlet.java:717)
> at 
> org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:290)
> at 
> org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206)
> at com.jamonapi.JAMonFilter.doFilter(JAMonFilter.java:57)
> at 
> org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:235)
> at 
> org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206)
> at 
> org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:233)
>

RE: Solr on OOM

2009-11-09 Thread Vauthrin, Laurent
Ok, I'll see if I can get their heap dump.  Thanks.

-Original Message-
From:
solr-user-return-29005-laurent.vauthrin=disney@lucene.apache.org
[mailto:solr-user-return-29005-laurent.vauthrin=disney@lucene.apache
.org] On Behalf Of Otis Gospodnetic
Sent: Monday, November 09, 2009 2:17 PM
To: solr-user@lucene.apache.org
Subject: Re: Solr on OOM

Laurent,

The autowarmCounts look biggest, but they are probably not causing OOMs.
Maybe you can see how big the caches are right before you OOM.
Or you can also start the JVM with -XX:+HeapDumpOnOutOfMemoryError and
even specify the file where the heap should be dumped.  You can then
analyze it and see what's eating the memory.

Otis
--
Sematext is hiring -- http://sematext.com/about/jobs.html?mls
Lucene, Solr, Nutch, Katta, Hadoop, HBase, UIMA, NLP, NER, IR



- Original Message 
> From: "Vauthrin, Laurent" 
> To: solr-user@lucene.apache.org
> Sent: Mon, November 9, 2009 2:22:26 PM
> Subject: Solr on OOM
> 
> Hello,
> 
> 
> 
> One of our deployed Solr (1.3) setup is having out of memory issues
and
> I'm not sure how to troubleshoot it.  I've read a few posts (including
>
http://old.nabble.com/Debugging-Solr-memory-usage-heap-problems-ts883279
> 4.html#a8832794) but I think this situation is slightly different.
> 
> 
> 
> Here's the setup:
> 
> 1 master and 1 slave are located on the same VM (using a 64-bit JVM)
> 
> 1 slave running on its own VM (using a 64-bit JVM)
> 
> From what I've been told, nothing else is running on those VMs.
> 
> Index size is about 100-200 MB.
> 
> 
> 
> Solrconfig.xml cache settings:
> 
>   <filterCache class="solr.LRUCache" size="5000"
>     initialSize="5000" autowarmCount="5000"/>
> 
>   <queryResultCache class="solr.LRUCache" size="5000"
>     initialSize="5000" autowarmCount="5000"/>
> 
>   <documentCache class="solr.LRUCache" size="5000"
>     initialSize="5000"/>
> 
> 
> 
> Both slaves at some point have gone out of memory (though not both at
> the same time) when receiving a moderate load of queries (a few
queries
> per second - don't have an exact stat here).  We started with a heap
> size of 1GB and ended up having to bump it up to 3.5GB.  It seems
really
> odd that we'd have to have a heap size that large when the index
itself
> is not really big.  Any thoughts on what could be really off here?
Is
> there a way to determine the cache sizes in bytes?  I noticed that
there
> was a thread about other issues running Solr on VMs, has anyone else
had
> problems using VMWare?  From what I'm told, it seems like moving to
> physical servers won't be a fast/easy change so I'm looking for any
help
> I can get for this configuration.
> 
> 
> 
> Thanks,
> Laurent Vauthrin



Re: Lowering ranking of certain documents while search in Solr

2009-11-09 Thread Lance Norskog
The ExternalFileField is the tool for this.

It is a field type which reads an array of floating point values from
a file, one per document. You can boost on it. (I don't know if sort
works.) There is no direct documentation that I found. Here are some
hints.

http://www.lucidimagination.com/search/?q=ExternalFileField

http://lucene.apache.org/solr/api/org/apache/solr/schema/ExternalFileField.html
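
From the javadoc, the setup looks roughly like this -- treat it as a sketch
rather than tested config; all names below are made up. The type goes in
schema.xml, the per-document values live in a plain text file named
external_<fieldname> (read, as far as I can tell, from the data directory),
and the values are only usable through function queries:

  <!-- schema.xml -->
  <fieldType name="boostFile" class="solr.ExternalFileField"
             keyField="id" defVal="1" valType="float"
             indexed="false" stored="false"/>
  <field name="source_boost" type="boostFile"/>

  # data/external_source_boost: one "key=value" line per document
  doc1=1.0
  doc2=0.1

  # folded into scoring via a function query, e.g.
  q=hello _val_:"source_boost"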


On Fri, Nov 6, 2009 at 9:44 PM, Bhavnik Gajjar
 wrote:
> Thanks again,
>
> This solution requires me to re-index 2 million documents again in Solr!!!
>
> I would prefer something which can be done in an ad-hoc way on my existing
> working/live application.
>
> Also, regarding the solution [q=hello +(source:source1^100 source:source2^0.1
> ...)], I think if sorting is being done on some other field (rather than score
> desc) then it will mess up the ordering!
>
> Is there a way to use function query or so then?
>
> Regards,
> Bhavnik Gajjar
>
> - Original Message -
> From: "Avlesh Singh" 
> To: 
> Sent: Friday, November 06, 2009 10:25 PM
> Subject: Re: Lowering ranking of certain documents while search in Solr
>
>
>> How about adding an extra sint field, say "source_boost". For your 199
>> sources the indexed value can be 0. For the "underprivileged" source it can
>> be -1. Adding "sort=source_boost desc,score desc" would do the needful.
>>
>> Works?
>>
>> Cheers
>> Avlesh
>>
>> On Fri, Nov 6, 2009 at 4:55 PM, Bhavnik Gajjar <
>> bhavnik.gaj...@gatewaynintec.com> wrote:
>>
>>> Thanks for your inputs,
>>>
>>> This solution works in general!!
>>>
>>> In my system, there are around 200 different sources, and out of those only
>>> one source is needed with a lower rank. Now, if I go with the solution you
>>> provided, then I need 199 sources with individual boosts and then one source
>>> with a lower boost value. Now, I'm afraid that this would cause a performance
>>> issue in Solr. Is there any other solution to work with this then?
>>>
>>> Note: our massive application is already built with standard request
>>> handler (I can't use dismax at this stage!!)
>>>
>>> Seeking for positive reply!
>>>
>>> Thanks again,
>>> Bhavnik Gajjar
>>>
>>>
>>> - Original Message -
>>> From: "Avlesh Singh" 
>>> To: 
>>> Sent: Thursday, November 05, 2009 12:00 PM
>>> Subject: Re: Lowering ranking of certain documents while search in Solr
>>>
>>>
>>> > The query "q=hello&start=0&rows=20&fl=Source&bq=(*:*
>>> -Source:Source2^1)"
>>> > should work as expected. There can be two reasons for it not to work -
>>> >
>>> >   1. You might be using a StandardRequestHandler. Try Dismax.
>>> >   2. Your field "Source" might not be indexed.
>>> >
>>> > With the StandardRequestHandler, if your "sources" are known (and
>>> limited),
>>> > you can try this too -
>>> > q=hello +(source:source1^100 source:source2^0.1 ...)
>>> >
>>> > Cheers
>>> > Avlesh
>>> >
>>> > On Thu, Nov 5, 2009 at 11:42 AM, Bhavnik Gajjar <
>>> > bhavnik.gaj...@gatewaynintec.com> wrote:
>>> >
>>> >> Hi,
>>> >>
>>> >> I have tried with
>>> >> q=hello&start=0&rows=20&fl=Source&bq=(*:* -Source:Source2^1)
>>> >>
>>> >> But it doesn't change any ordering of results
>>> >>
>>> >> Am I doing wrong? Please advise.
>>> >>
>>> >> Note: We are also using _val_ hook in Solr query to get recent documents
>>> on
>>> >> top of results. (_val_:"recip(rord(DateAdded),1,1000,1000)")
>>> >>
>>> >> It would be really good if you can give me an exact example of Solr
>>> query
>>> >> that meets my need
>>> >>
>>> >> Thanks,
>>> >> Bhavnik
>>> >>
>>> >> - Original Message -
>>> >> From: "Avlesh Singh" 
>>> >> To: 
>>> >> Sent: Thursday, November 05, 2009 11:19 AM
>>> >> Subject: Re: Lowering ranking of certain documents while search in Solr
>>> >>
>>> >>
>>> >> > This is what I meant -
>>> >> >
>>> >>
>>> http://wiki.apache.org/solr/SolrRelevancyFAQ#How_do_I_give_a_very_low_boost_to_documents_that_match_my_query
>>> >> >
>>> >> > Cheers
>>> >> > Avlesh
>>> >> >
>>> >> > On Thu, Nov 5, 2009 at 11:13 AM, Bhavnik Gajjar <
>>> >> > bhavnik.gaj...@gatewaynintec.com> wrote:
>>> >> >
>>> >> >> Hi Avlesh,
>>> >> >>
>>> >> >> Thanks for your inputs.
>>> >> >>
>>> >> >> What exactly I mean is.
>>> >> >>
>>> >> >> Solr has two fields named, Text and Source. Search is performed in
>>> >> [Text]
>>> >> >> field. Now suppose 1000 results come for a search. By default, this
>>> >> search
>>> >> >> returns records according to boost factor specified in Solr query
>>> >> already.
>>> >> >> Result would look somewhat like..
>>> >> >> Documents with value in [Source] Solr field
>>> >> >> Source1
>>> >> >> Source2
>>> >> >> Source3
>>> >> >> Source1
>>> >> >> Source3
>>> >> >> Source2
>>> >> >> etc..
>>> >> >>
>>> >> >> But, what we want to achieve is.. this search should return documents
>>> >> which
>>> >> >> has value [Source2] in [Source] field should come with the lowest
>>> >> ranking
>>> >> >> like,
>>> >> >> Source1
>>> >> >> Source3
>>> >> >> Source1
>>> >> >> Source3
>>> >> >> Source2
>>> >> >>

Re: Lowering ranking of certain documents while search in Solr

2009-11-09 Thread Lance Norskog
Some hard-won knowledge: don't be afraid to re-index. Changing the
schema without re-indexing can lead to mountains of trouble.

On Mon, Nov 9, 2009 at 5:36 PM, Lance Norskog  wrote:
> The ExternalFileField is the tool for this.
>
> It is a field type which reads an array of floating point values from
> a file, one per document. You can boost on it. (I don't know if sort
> works.) There is no direct documentation that I found. Here are some
> hints.
>
> http://www.lucidimagination.com/search/?q=ExternalFileField
>
> http://lucene.apache.org/solr/api/org/apache/solr/schema/ExternalFileField.html
>
>
> On Fri, Nov 6, 2009 at 9:44 PM, Bhavnik Gajjar
>  wrote:
>> Thanks again,
>>
>> This solution requires me to re-index 2 million documents again in Solr!!!
>>
>> I would prefer something which can be done in an ad-hoc way on my existing
>> working/live application.
>>
>> Also, regarding the solution [q=hello +(source:source1^100
>> source:source2^0.1 ...)], I think if sorting is being done on some other
>> field (rather than score desc) then it will mess up the ordering!
>>
>> Is there a way to use function query or so then?
>>
>> Regards,
>> Bhavnik Gajjar
>>
>> - Original Message -
>> From: "Avlesh Singh" 
>> To: 
>> Sent: Friday, November 06, 2009 10:25 PM
>> Subject: Re: Lowering ranking of certain documents while search in Solr
>>
>>
>>> How about adding an extra sint field, say "source_boost". For your 199
>>> sources the indexed value can be 0. For the "underprivileged" source it can
>>> be -1. Adding "sort=source_boost desc,score desc" would do the needful.
>>>
>>> Works?
>>>
>>> Cheers
>>> Avlesh
>>>
>>> On Fri, Nov 6, 2009 at 4:55 PM, Bhavnik Gajjar <
>>> bhavnik.gaj...@gatewaynintec.com> wrote:
>>>
 Thanks for your inputs,

 This solution works in general!!

 In my system, there are around 200 different sources, and out of those only
 one source is needed with a lower rank. Now, if I go with the solution you
 provided, then I need 199 sources with individual boosts and then one source
 with a lower boost value. Now, I'm afraid that this would cause a performance
 issue in Solr. Is there any other solution to work with this then?

 Note: our massive application is already built with standard request
 handler (I can't use dismax at this stage!!)

 Seeking for positive reply!

 Thanks again,
 Bhavnik Gajjar


 - Original Message -
 From: "Avlesh Singh" 
 To: 
 Sent: Thursday, November 05, 2009 12:00 PM
 Subject: Re: Lowering ranking of certain documents while search in Solr


 > The query "q=hello&start=0&rows=20&fl=Source&bq=(*:*
 -Source:Source2^1)"
 > should work as expected. There can be two reasons for it not to work -
 >
 >   1. You might be using a StandardRequestHandler. Try Dismax.
 >   2. Your field "Source" might not be indexed.
 >
 > With the StandardRequestHandler, if your "sources" are known (and
 limited),
 > you can try this too -
 > q=hello +(source:source1^100 source:source2^0.1 ...)
 >
 > Cheers
 > Avlesh
 >
 > On Thu, Nov 5, 2009 at 11:42 AM, Bhavnik Gajjar <
 > bhavnik.gaj...@gatewaynintec.com> wrote:
 >
 >> Hi,
 >>
 >> I have tried with
 >> q=hello&start=0&rows=20&fl=Source&bq=(*:* -Source:Source2^1)
 >>
 >> But it doesn't change any ordering of results
 >>
 >> Am I doing wrong? Please advise.
 >>
 >> Note: We are also using _val_ hook in Solr query to get recent documents
 on
 >> top of results. (_val_:"recip(rord(DateAdded),1,1000,1000)")
 >>
 >> It would be really good if you can give me an exact example of Solr
 query
 >> that meets my need
 >>
 >> Thanks,
 >> Bhavnik
 >>
 >> - Original Message -
 >> From: "Avlesh Singh" 
 >> To: 
 >> Sent: Thursday, November 05, 2009 11:19 AM
 >> Subject: Re: Lowering ranking of certain documents while search in Solr
 >>
 >>
 >> > This is what I meant -
 >> >
 >>
 http://wiki.apache.org/solr/SolrRelevancyFAQ#How_do_I_give_a_very_low_boost_to_documents_that_match_my_query
 >> >
 >> > Cheers
 >> > Avlesh
 >> >
 >> > On Thu, Nov 5, 2009 at 11:13 AM, Bhavnik Gajjar <
 >> > bhavnik.gaj...@gatewaynintec.com> wrote:
 >> >
 >> >> Hi Avlesh,
 >> >>
 >> >> Thanks for your inputs.
 >> >>
 >> >> What exactly I mean is.
 >> >>
 >> >> Solr has two fields named, Text and Source. Search is performed in
 >> [Text]
 >> >> field. Now suppose 1000 results come for a search. By default, this
 >> search
 >> >> returns records according to boost factor specified in Solr query
 >> already.
 >> >> Result would look somewhat like..
 >> >> Documents with value in [Source] Solr field
 >> >> Source1
 >> >> Source2
 >> >> Source3
 >> >> Source1
 >> >> Source3
>

Re: keep index in production and snapshots in separate phisical disks

2009-11-09 Thread Chris Hostetter

: Is there any way to make snapinstaller install the index in
: spanpshot20091023124543 (for example) from another disk? I am asking this

you're talking about the old script based replication correct? ... i don't 
think that's possible since it relied on hardlinks and atomic move 
operations.  i don't think those work across physical disks.

: because I would like not to optimize the index in the master (if I do that
: it takes a long time to send it via rsync if it is so big). This way I would
: just have to send the new segments.

in Solr 1.4 the disadvantages of having a non-optimized index are fading 
away ... before you worry too much about this you might want to run some 
tests and verify that you really need to optimize to meet your performance 
targets.

: In the slave I would have 2 phisical disks. Snappuller would send the
: snapshot to a disk (here the index would not be optimized). Snapinstaller
: would install the snapshot in the other disk, optimize it and open the
: newIndexReader. The optimization should be done in the disk wich contains
: the "not in production index" to not affect the search request speed.
: Any idea what should I hack to reach this goal in case it is possible?

I'm not really familiar with the new java based replication code, but i 
suspect this could be setup fairly easily by running two solr instances on 
your slave boxes ... one serving as a repeater (ie: both a slave and a 
master) to the other.  on the repeater port you would run the optimize, 
and then the leaf level slave (serving queries to end users) would 
replicate that optimized index over the loopback address (no network 
overhead, should be ~fast as a file copy from a different disk)

...this is all just theory mind you, i've never tried this.
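
fwiw, the repeater's solrconfig.xml would contain something along these
lines (host name and poll interval are placeholders; i haven't run this
config either):

  <requestHandler name="/replication" class="solr.ReplicationHandler">
    <lst name="master">
      <!-- expose the optimized index to the leaf slave -->
      <str name="replicateAfter">optimize</str>
    </lst>
    <lst name="slave">
      <!-- pull the raw index from the real master -->
      <str name="masterUrl">http://master-host:8983/solr/replication</str>
      <str name="pollInterval">00:05:00</str>
    </lst>
  </requestHandler>

...and the leaf slave would point its own masterUrl at the repeater over
127.0.0.1.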


-Hoss



Re: Specifying multiple documents in DataImportHandler dataConfig

2009-11-09 Thread Lance Norskog
There is a more fundamental problem here: a Solr/Lucene index only
implements one table. If you have data spread across multiple normalized
tables, you have to denormalize that multi-table DB schema to make a
single-table Solr/Lucene index.

Your indexing will probably be faster if you use a join in SQL to supply
your entire set of fields per database request.
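
A minimal data-config sketch of that single denormalized entity (the driver,
table, and column names here are invented for illustration):

  <dataConfig>
    <dataSource driver="com.mysql.jdbc.Driver"
                url="jdbc:mysql://localhost/blog" user="user" password="pass"/>
    <document>
      <entity name="entry"
              query="SELECT e.id, e.title, e.body, c.name AS category
                     FROM blog_entries e
                     JOIN mesh_categories c ON c.id = e.category_id">
        <field column="id" name="id"/>
        <field column="title" name="title"/>
        <field column="body" name="body"/>
        <field column="category" name="category"/>
      </entity>
    </document>
  </dataConfig>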

2009/11/7 Noble Paul നോബിള്‍  नोब्ळ् :
> On Sun, Nov 8, 2009 at 8:25 AM, Bertie Shen  wrote:
>> I have figured out a way to solve this problem: just specify a
>> single <document> blah blah blah </document>. Under <document>, specify
>> multiple top-level entity entries, each of which corresponds to one table's
>> data.
>>
>> So each top-level entity will map one row in it to a document in the Lucene
>> index. <document> in DIH is *NOT* mapped to a document in the Lucene index
>> while a top-level entity is. I feel the <document> tag is redundant and
>> misleading in data config and thus should be removed.
>
> There are some common attributes specified at the <document> level.
> It still acts as a container tag.
>>
>> Cheers.
>>
>> On Sat, Nov 7, 2009 at 9:43 AM, Bertie Shen  wrote:
>>
>>> I have the same problem. I had thought we could specify multiple <document>
>>> blah blah blahs, each of which is mapping one table in the RDBMS.
>>> But I found it was not the case. It only picks the first <document> blah blah
>>> blah to do indexing.
>>>
>>> I think Rupert's  and my request are pretty common. Basically there are
>>> multiple tables in RDBMS, and we want each row in each table become a
>>> document in Lucene index. How can we write one data config.xml file to let
>>> DataImportHandler import multiple tables at the same time?
>>>
>>> Rupert, have you figured out a way to do it?
>>>
>>> Thanks.
>>>
>>>
>>>
>>> On Tue, Sep 8, 2009 at 3:42 PM, Rupert Fiasco  wrote:
>>>
 Maybe I should be more clear: I have multiple tables in my DB that I
 need to save to my Solr index. In my app code I have logic to persist
 each table, which maps to an application model to Solr. This is fine.
 I am just trying to speed up indexing time by using DIH instead of
 going through my application. From what I understand of DIH I can
 specify one dataSource element and then a series of document/entity
 sets, for each of my models. But like I said before, DIH only appears
 to want to index the first document declared under the dataSource tag.

 -Rupert

 On Tue, Sep 8, 2009 at 4:05 PM, Rupert Fiasco wrote:
 > I am using the DataImportHandler with a JDBC datasource. From my
 > understanding of DIH, for each of my "content types" e.g. Blog posts,
 > Mesh Categories, etc I would construct a series of document/entity
 > sets, like
 >
 > <dataConfig>
 >   <dataSource driver="..." url="..." user="..." password="..."/>
 >
 >   <document name="blog_entries">
 >     <entity name="blog_entries" query="...">
 >       <field column="..." name="..."/>
 >       <field column="..." name="..."/>
 >       <field column="..." name="..."/>
 >       <field column="..." name="..."/>
 >     </entity>
 >   </document>
 >
 >   <document name="mesh_categories">
 >     <entity name="mesh_categories" query="...">
 >       <field column="..." name="..."/>
 >       <field column="..." name="..."/>
 >       <field column="..." name="..."/>
 >       <field column="..." name="..."/>
 >       <field column="..." name="..."/>
 >       <field column="..." name="..."/>
 >     </entity>
 >   </document>
 > </dataConfig>
 >
 >
 > Solr parses this just fine and allows me to issue a
 > /dataimport?command=full-import and it runs, but it only runs against
 > the "first" document (blog_entries). It doesn't run against the 2nd
 > document (mesh_categories).
 >
 > If I remove the 2 document elements and wrap both entity sets in just
 > one document tag, then both sets get indexed, which seemingly achieves
 > my goal. This just doesn't make sense from my understanding of how DIH
 > works. My 2 content types are indeed separate so they logically
 > represent two document types, not one.
 >
 > Is this correct? What am I missing here?
 >
 > Thanks
 > -Rupert
 >

>>>
>>>
>>
>
>
>
> --
> -
> Noble Paul | Principal Engineer| AOL | http://aol.com
>



-- 
Lance Norskog
goks...@gmail.com


Re: synonym payload boosting

2009-11-09 Thread Lance Norskog
David, when you get this working would you consider writing a case
study on the wiki? Nothing complex, just something that describes how
you did several customizations to create a new feature.
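
(For context, below is a minimal sketch of the payload-aware query Grant
mentions further down, against the Lucene 2.9 payload API that ships with
Solr 1.4; the field, term and searcher are placeholders, and it reuses the
BoostingSymilarity class quoted later in this thread:)

  import java.io.IOException;
  import org.apache.lucene.index.Term;
  import org.apache.lucene.search.IndexSearcher;
  import org.apache.lucene.search.TopDocs;
  import org.apache.lucene.search.payloads.AveragePayloadFunction;
  import org.apache.lucene.search.payloads.PayloadTermQuery;

  public class PayloadQueryExample {
      public static TopDocs search(IndexSearcher searcher) throws IOException {
          // the custom similarity supplies scorePayload()
          searcher.setSimilarity(new BoostingSymilarity());
          // scores the matching term through its payload; the function
          // decides how multiple payload occurrences are combined
          PayloadTermQuery query = new PayloadTermQuery(
                  new Term("text", "hello"),
                  new AveragePayloadFunction());
          return searcher.search(query, 10);
      }
  }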

On Mon, Nov 9, 2009 at 4:10 AM, Grant Ingersoll  wrote:
>
> On Nov 9, 2009, at 4:41 AM, David Ginzburg wrote:
>
>> I have found this
>>
>> https://issues.apache.org/jira/browse/SOLR-1485?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
>> patch
>> But I don't want to use any function, just the normal scoring and the
>> similarity class I have written.
>> Can you point me to the modifications I need (if any)?
>>
>>
>
> Ahmet's point is that you need some query that will actually invoke the
> payload in scoring.  PayloadTermQuery and PayloadNearQuery are the two that
> do this in Lucene.  You can certainly write your own, as well.
>
> -Grant
>
>>
>> On Sun, Nov 8, 2009 at 16:33, AHMET ARSLAN  wrote:
>>
>>> Additionaly you need to modify your queryparser to return
>>> BoostingTermQuery, PayloadTermQuery, PayloadNearQuery etc.
>>>
>>> With these types of Queries scorePayload method invoked.
>>>
>>> Hope this helps.
>>>
>>> --- On Sun, 11/8/09, David Ginzburg  wrote:
>>>
 From: David Ginzburg 
 Subject: synonym payload boosting
 To: solr-user@lucene.apache.org
 Date: Sunday, November 8, 2009, 4:06 PM
 Hi,
 I have a field and a weighted synonym map.
 I have indexed the synonyms with the weight as payload.
 my code snippet from my filter

 *public Token next(final Token reusableToken) throws
 IOException *
 *        . *
 *        . *
 *        .*
      * Payload boostPayload;*
 *
 *
 *        for (Synonym synonym : syns)
 {*
 *            *
 *            Token newTok =
 new Token(nToken.startOffset(),
 nToken.endOffset(), "SYNONYM");*
 *
 newTok.setTermBuffer(synonym.getToken().toCharArray(), 0,
 synonym.getToken().length());*
 *            // set the
 position increment to zero*
 *            // this tells
 lucene the synonym is*
 *            // in the exact
 same location as the originating word*
 *
 newTok.setPositionIncrement(0);*
 *            boostPayload =
 new
 Payload(PayloadHelper.encodeFloat(synonym.getWieght()));*
 *
 newTok.setPayload(boostPayload);*
 *
 *
 I have put it in the index time analyzer : this is my field
 definition:

 *
 >>> positionIncrementGap="100" >
     
       >>> class="solr.WhitespaceTokenizerFactory"/>
       >>> class="solr.StopFilterFactory" ignoreCase="true"
 words="stopwords.txt"/>
       >>> class="solr.LowerCaseFilterFactory"/>
       >>> class="com.digitaltrowel.solr.DTSynonymFactory"
 FreskoFunction="names_with_scoresPipe23Columns.txt"
 ignoreCase="true"
 expand="false"/>

       
       
     
     
       >>> class="solr.WhitespaceTokenizerFactory"/>
       >>> class="solr.LowerCaseFilterFactory"/>
       
       >>> class="solr.StopFilterFactory" ignoreCase="true"
 words="stopwords.txt"/>
       
       

     
   


My similarity class is:

public class BoostingSymilarity extends DefaultSimilarity {

    public BoostingSymilarity() {
        super();
    }

    @Override
    public float scorePayload(String field, byte[] payload, int offset, int length) {
        double weight = PayloadHelper.decodeFloat(payload, 0);
        return (float) weight;
    }

    @Override public float coord(int overlap, int maxoverlap) {
        return 1.0f;
    }

    @Override public float idf(int docFreq, int numDocs) {
        return 1.0f;
    }

    @Override public float lengthNorm(String fieldName, int numTerms) {
        return 1.0f;
    }

    @Override public float tf(float freq) {
        return 1.0f;
    }
}

My problem is that the scorePayload method does not get called at search time
like the other methods in my similarity class.
I tested and verified this with breakpoints.
What am I doing wrong?
I am using Solr 1.3 and thinking of the payload boost support in Solr 1.4.

>>>
>>>
>>
>>
>>
>> --
>> Regards
>>
>> _
>> David Ginzburg
>> Developer, Digital Trowel
>> 1 Hayarden St., Airport City
>> [POB 169, NATBAG]
>> Lod, 70151, Israel
>> http://www.digitaltrowel.com/
>> Office: +972 73 240 522
>> Mobile: +972 50 496 0595
>>
>> CHECK OUT OUR NEW TEXT MINING BLOG:
>> http://mineyourbusiness.wordpress.com/
>
> --
> Grant Ingersoll
> http://www.lucidimagination.com/
>
> Search the Lucene ecosystem (Lucene/Solr/Nutch/Mahout/Tika/Droids) using
> 

Re: Segment file not found error - after replicating

2009-11-09 Thread Otis Gospodnetic
Madu,

So are you saying that all slaves have the exact same index, and that index is 
exactly the same as the one on the master, yet only some of those slaves 
exhibit this error, while others do not?  Mind listing index directories of 1) 
master 2) slave without errors, 3) slave with errors and doing:
du -s /path/to/index/on/master
du -s /path/to/index/on/slave/without/errors
du -s /path/to/index/on/slave/with/errors


Otis 
--
Sematext is hiring -- http://sematext.com/about/jobs.html?mls
Lucene, Solr, Nutch, Katta, Hadoop, HBase, UIMA, NLP, NER, IR



- Original Message 
> From: Maduranga Kannangara 
> To: "solr-user@lucene.apache.org" 
> Sent: Mon, November 9, 2009 7:47:04 PM
> Subject: RE: Segment file not found error - after replicating
> 
> Thanks Otis!
> 
> Yes, I checked the index directories and they are 100% same, both timestamp 
> and 
> size wise.
> 
> Not all the slaves face this issue. I would say roughly 50% has this trouble.
> 
> Logs do not have any errors too :-(
> 
> Any other things I should do/look at?
> 
> Cheers
> Madu
> 
> 
> -Original Message-
> From: Otis Gospodnetic [mailto:otis_gospodne...@yahoo.com] 
> Sent: Tuesday, 10 November 2009 9:26 AM
> To: solr-user@lucene.apache.org
> Subject: Re: Segment file not found error - after replicating
> 
> It's hard to troubleshoot blindly like this, but have you tried manually 
> comparing the contents of the index dir on the master and on the slave(s)?
> If they are out of sync, have you tried forcing of replication to see if one 
> of 
> the subsequent replication attempts gets the dirs in sync?
> Do you have more than 1 slave and do they all start having this problem at 
> the 
> same time?
> Any errors in the logs for any of the scripts involved in replication in 1.3?
> 
> Otis
> --
> Sematext is hiring -- http://sematext.com/about/jobs.html?mls
> Lucene, Solr, Nutch, Katta, Hadoop, HBase, UIMA, NLP, NER, IR
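
(With the 1.3 script-based replication, forcing a sync on a suspect slave
usually means running snappuller followed by snapinstaller there and checking
those scripts' logs; the script names come from the standard Solr 1.3
distribution, but the exact invocation depends on how the scripts were
configured.)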
> 
> 
> 
> - Original Message 
> > From: Maduranga Kannangara 
> > To: "solr-user@lucene.apache.org" 
> > Sent: Sun, November 8, 2009 10:30:44 PM
> > Subject: Segment file not found error - after replicating
> > 
> > Hi guys,
> > 
> > We use Solr 1.3 for indexing large amounts of data (50G avg) on a Linux
> > environment and use the replication scripts to make replicas that live on
> > load-balancing slaves.
> >
> > The issue we face quite often (only on Linux servers) is that they tend to
> > not be able to find the segment file (segments_x etc.) after the replication
> > completes. As this has become quite common, we have started hitting a serious
> > issue.
> > 
> > Below is a stack trace, if that helps and any help on this matter is 
> > greatly 
> > appreciated.
> > 
> > 
> > 
> > Nov 5, 2009 11:34:46 PM org.apache.solr.util.plugin.AbstractPluginLoader 
> > load
> > INFO: created /admin/: org.apache.solr.handler.admin.AdminHandlers
> > Nov 5, 2009 11:34:46 PM org.apache.solr.util.plugin.AbstractPluginLoader 
> > load
> > INFO: created /admin/ping: org.apache.solr.handler.PingRequestHandler
> > Nov 5, 2009 11:34:46 PM org.apache.solr.util.plugin.AbstractPluginLoader 
> > load
> > INFO: created /debug/dump: org.apache.solr.handler.DumpRequestHandler
> > Nov 5, 2009 11:34:46 PM org.apache.solr.util.plugin.AbstractPluginLoader 
> > load
> > INFO: created gap: org.apache.solr.highlight.GapFragmenter
> > Nov 5, 2009 11:34:46 PM org.apache.solr.util.plugin.AbstractPluginLoader 
> > load
> > INFO: created regex: org.apache.solr.highlight.RegexFragmenter
> > Nov 5, 2009 11:34:46 PM org.apache.solr.util.plugin.AbstractPluginLoader 
> > load
> > INFO: created html: org.apache.solr.highlight.HtmlFormatter
> > Nov 5, 2009 11:34:46 PM org.apache.solr.servlet.SolrDispatchFilter init
> > SEVERE: Could not start SOLR. Check solr/home property
> > java.lang.RuntimeException: java.io.FileNotFoundException: 
> > /solrinstances/solrhome01/data/index/segments_v (No such file or directory)
> > at org.apache.solr.core.SolrCore.getSearcher(SolrCore.java:960)
> > at org.apache.solr.core.SolrCore.<init>(SolrCore.java:470)
> > at 
> > 
> org.apache.solr.core.CoreContainer$Initializer.initialize(CoreContainer.java:119)
> > at 
> > org.apache.solr.servlet.SolrDispatchFilter.init(SolrDispatchFilter.java:69)
> > at 
> > 
> org.apache.catalina.core.ApplicationFilterConfig.getFilter(ApplicationFilterConfig.java:275)
> > at 
> > 
> org.apache.catalina.core.ApplicationFilterConfig.setFilterDef(ApplicationFilterConfig.java:397)
> > at 
> > 
> org.apache.catalina.core.ApplicationFilterConfig.<init>(ApplicationFilterConfig.java:108)
> > at 
> > 
> org.apache.catalina.core.StandardContext.filterStart(StandardContext.java:3709)
> > at 
> > org.apache.catalina.core.StandardContext.start(StandardContext.java:4363)
> > at 
> > org.apache.catalina.core.StandardContext.reload(StandardContext.java:3099)
> > at 
> > org.apache.catalina

Re: de-boosting certain facets during search

2009-11-09 Thread Erik Hatcher

Paul,

Inline below...

On Nov 9, 2009, at 6:28 PM, Paul Rosen wrote:

> If I could just create the desired URL, I can probably work backwards
> and construct the correct ruby call.


Right, this list will always serve you best if you take the Ruby out  
of the equation.  solr-ruby, while cool and all, isn't very well known  
by many, but Solr URLs are universal lingo here.



> http://localhost:8983/solr/resources/select?hl.fragsize=600
> &hl=true
> &facet.field=genre
> &facet.field=archive
> &facet.limit=-1
> &qt=standard
> &start=0
> &fq=archive%3A%22blake%22
> &hl.fl=text
> &fl=uri%2Carchive%2Cdate_label%2Cgenre
> &facet=true
> &q=%28history%29
> &rows=60
> &facet.missing=true
> &facet.mincount=1
>
> What this search returns from my index is 53 hits. The first 43 contain
> the genre field value "Citation" and the last 10 do not (they contain
> other values in that field.)
>
> Note: the genre field is multivalued, if that matters.


It matters if you want to sort by genre.  It doesn't make sense to  
sort by a multivalued field though.


> I'd like the search to put all of the objects that contain genre
> "Citation" below the 10 objects that do not contain that genre.


Are you dogmatic about them _all_ appearing below?  Or might it be ok  
if a Citation that has substantially better term matching than another  
type of object appear ahead in the results?


> I've read the various pages on boosting, but since I'm not actively
> searching on the field that I want to put a boost value on, I'm not
> sure how to go about this.


How this is done is dependent on the query parser.  You're using the  
Lucene query parser.  Something like this might work for you:


http://localhost:8983/solr/select?q=ipod%20OR%20%28ipod%20-manu:Belkin%29^5&debugQuery=true

URL-decoded, that is q=ipod OR (ipod -manu:Belkin)^5, where the user's
query is repeated in a second clause that boosts up all documents that
are not of a particular manufacturer, using the example docs that Solr
ships with.
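
The same trick applied to the genre case from this thread, sketched in SolrJ
for anyone preferring code over raw URLs; the boost of 5 and the core URL are
assumptions taken from the examples above, not recommendations:

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;
import org.apache.solr.client.solrj.response.QueryResponse;

public class DeboostCitations {
  public static void main(String[] args) throws Exception {
    CommonsHttpSolrServer solr =
        new CommonsHttpSolrServer("http://localhost:8983/solr/resources");
    // repeat the user's query in a boosted clause that excludes the genre
    SolrQuery q = new SolrQuery("(history) OR ((history) -genre:\"Citation\")^5");
    q.set("debugQuery", "true"); // inspect the score explanations
    QueryResponse rsp = solr.query(q);
    System.out.println(rsp.getResults().getNumFound() + " hits");
  }
}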


Be sure to use debugQuery=true to look at the score explanations (try  
looking at the output in the wt=ruby&indent=on format for best  
readability).


Additionally...




> Thanks for any hints.

> Paul Rosen wrote:
>> Hi,
>> I'm using solr-ruby-0.0.8 and solr 1.4.
>> My data contains a faceted field called "genre". We would like one
>> particular genre (the one named "Citation") to show up last in the
>> results.
>> I'm having trouble figuring out how to add the boost parameter to
>> the solr-ruby call. Here is my code:
>>
>> req = Solr::Request::Standard.new(:start => start,
>>   :rows => max,
>>   :sort => sort_param,
>>   :query => query,
>>   :filter_queries => filter_queries,
>>   :field_list => @field_list,
>>   :facets => {:fields => @facet_fields,
>>     :mincount => 1,
>>     :missing => true,
>>     :limit => -1},
>>   :highlighting => {:field_list => ['text'],
>>     :fragment_size => 600},
>>   :shards => @cores)
>> response = @solr.send(req)
>>
>> Do I just format it inside my query, like this:
>>   query = query + " AND genre:Citation^.01"
>> or in filter_query, like this:
>>   filter_queries.push("genre:Citation^.01")
>> or is there a hash parameter that I set?


filter queries (fq) do not contribute to the score, so boosting them  
makes no score difference at all.


> (Note that the user can select Citation explicitly. I'll probably
> special-case that.)
>
> I've tried variations of the above, but I've had no luck so far.
> Thanks,
> Paul




Erik




Re: Specifying multiple documents in DataImportHandler dataConfig

2009-11-09 Thread Bertie Shen
Hi Lance,

 I think you are discussing a different issue here. We are talking about
each row of each table becoming a separate document in the index. You seem
to be discussing documents that have multi-valued fields stored in a
separate table in the RDBMS because of normalization.



On Mon, Nov 9, 2009 at 6:01 PM, Lance Norskog  wrote:

> There is a more fundamental problem here: a Solr/Lucene index only
> implements one table. If you have data from multiple tables in a
> normalized database, you have to denormalize the multi-table DB schema to
> make a single-table Solr/Lucene index.
>
> Your indexing will probably be faster if you use a join in SQL to supply
> your entire set of fields per database request.
>
> 2009/11/7 Noble Paul നോബിള്‍  नोब्ळ् :
> > On Sun, Nov 8, 2009 at 8:25 AM, Bertie Shen  wrote:
> >> I have figured out a way to solve this problem: just specify a
> >> single <document> blah blah blah </document>. Under <document>, specify
> >> multiple top-level entity entries, each of which corresponds to one
> >> table's data.
> >>
> >> So each top-level entity will map one row to a document in the Lucene
> >> index. An <entity> in DIH is *NOT* mapped to a document in the Lucene
> >> index while a top-level entity is. I feel the <document> tag is
> >> redundant and misleading in data config and thus should be removed.
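
A sketch of that working layout; the two entity names come from Rupert's
content types quoted below, while the driver, columns and queries are
invented placeholders:

<dataConfig>
  <dataSource driver="com.mysql.jdbc.Driver" url="jdbc:mysql://localhost/mydb"
              user="user" password="pass"/>
  <document>
    <!-- one Solr document per row of each top-level entity -->
    <entity name="blog_entries" query="SELECT id, title, body FROM blog_entries">
      <field column="id" name="id"/>
      <field column="title" name="title"/>
      <field column="body" name="text"/>
    </entity>
    <entity name="mesh_categories" query="SELECT id, label FROM mesh_categories">
      <field column="id" name="id"/>
      <field column="label" name="title"/>
    </entity>
  </document>
</dataConfig>

One caveat: the uniqueKey values have to stay unique across both tables,
e.g. by prefixing the id with the table name in the SELECT.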
> >
> > There are some common attributes specified at the <document> level.
> > It still acts as a container tag.
> >>
> >> Cheers.
> >>
> >> On Sat, Nov 7, 2009 at 9:43 AM, Bertie Shen 
> wrote:
> >>
> >>> I have the same problem. I had thought we could specify multiple
> >>> <document> blah blah blahs, each of which maps one table in the RDBMS.
> >>> But I found it was not the case. It only picks the first <document>
> >>> blah blah blah to do indexing.
> >>>
> >>> I think Rupert's  and my request are pretty common. Basically there are
> >>> multiple tables in RDBMS, and we want each row in each table become a
> >>> document in Lucene index. How can we write one data config.xml file to
> let
> >>> DataImportHandler import multiple tables at the same time?
> >>>
> >>> Rupert, have you figured out a way to do it?
> >>>
> >>> Thanks.
> >>>
> >>>
> >>>
> >>> On Tue, Sep 8, 2009 at 3:42 PM, Rupert Fiasco 
> wrote:
> >>>
>  Maybe I should be more clear: I have multiple tables in my DB that I
>  need to save to my Solr index. In my app code I have logic to persist
>  each table, each of which maps an application model to Solr. This is fine.
>  I am just trying to speed up indexing time by using DIH instead of
>  going through my application. From what I understand of DIH I can
>  specify one dataSource element and then a series of document/entity
>  sets, for each of my models. But like I said before, DIH only appears
>  to want to index the first document declared under the dataSource tag.
> 
>  -Rupert
> 
>  On Tue, Sep 8, 2009 at 4:05 PM, Rupert Fiasco
> wrote:
>  > I am using the DataImportHandler with a JDBC datasource. From my
>  > understanding of DIH, for each of my "content types" e.g. Blog
> posts,
>  > Mesh Categories, etc I would construct a series of document/entity
>  > sets, like
>  >
>  > [data-config.xml snippet mangled by the mail archive's tag stripping;
>  > per the surrounding text it contained one dataSource element followed by
>  > two document elements, each wrapping the entity and field definitions
>  > for one content type]
>  > Solr parses this just fine and allows me to issue a
>  > /dataimport?command=full-import and it runs, but it only runs
> against
>  > the "first" document (blog_entries). It doesn't run against the 2nd
>  > document (mesh_categories).
>  >
>  > If I remove the 2 document elements and wrap both entity sets in
> just
>  > one document tag, then both sets get indexed, which seemingly
> achieves
>  > my goal. This just doesn't make sense from my understanding of how
> DIH
>  > works. My 2 content types are indeed separate so they logically
>  > represent two document types, not one.
>  >
>  > Is this correct? What am I missing here?
>  >
>  > Thanks
>  > -Rupert
>  >
> 
> >>>
> >>>
> >>
> >
> >
> >
> > --
> > -
> > Noble Paul | Principal Engineer| AOL | http://aol.com
> >
>
>
>
> --
> Lance Norskog
> goks...@gmail.com
>


Re: [DIH] blocking import operation

2009-11-09 Thread Noble Paul നോബിള്‍ नोब्ळ्
DIH imports are really long-running. There is a good chance that the
connection times out or breaks in between.

How about a callback?
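
A sketch of what such a callback can look like, assuming the
onImportStart/onImportEnd listener hooks in DIH (check that your build
supports them); the package and class names are placeholders:

package com.example.dih;

import org.apache.solr.handler.dataimport.Context;
import org.apache.solr.handler.dataimport.EventListener;

// registered in data-config.xml via
//   <document onImportEnd="com.example.dih.ImportDoneListener"> ... </document>
public class ImportDoneListener implements EventListener {
  public void onEvent(Context ctx) {
    // invoked when the import finishes: notify the waiting caller here,
    // e.g. release a latch, write a status record, or trigger the next import
  }
}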

On Tue, Nov 10, 2009 at 12:12 AM, Sascha Szott  wrote:
> Hi all,
>
> currently, DIH's import operation(s) only works asynchronously. Therefore,
> after submitting an import request, DIH returns immediately, while the
> import process (in case a large amount of data needs to be indexed)
> continues asynchronously behind the scenes.
>
> So, what is the recommended way to check if the import process has already
> finished? Or better still, is there any method / workaround that will block
> the import operation's caller until the operation has finished?
>
> In my application, the DIH receives some URL parameters which are used for
> determining the database name that is used within data-config.xml, e.g.
>
> http://localhost:8983/solr/dataimport?command=full-import&dbname=foo
>
> Since only one DIH, /dataimport, is defined, but several databases need to
> be indexed, it is required to issue this command several times, e.g.
>
> http://localhost:8983/solr/dataimport?command=full-import&dbname=foo
>
> ... wait until /dataimport?command=status says "Indexing completed" (but
> without using a loop that checks it again and again) ...
>
> http://localhost:8983/solr/dataimport?command=full-import&dbname=bar&clean=false
>
>
> A suitable solution, at least IMHO, would be to have an additional DIH
> parameter which determines whether the import call is blocking or
> non-blocking (the default). As far as I see, this could be accomplished since
> Solr can execute more than one import operation at a time (it starts a new
> thread for each). Perhaps, my question is somehow related to the discussion
> [1] on ParallelDataImportHandler.
>
> Best,
> Sascha
>
> [1] http://www.lucidimagination.com/search/document/a9b26ade46466ee
>
>



-- 
-
Noble Paul | Principal Engineer| AOL | http://aol.com


Re: Solr Internal exception on startup...

2009-11-09 Thread Lance Norskog
This looks like a file-system access control configuration problem, not a
Solr, Lucene, or even Java coding problem: the AccessControlException means
the JVM's security manager denied Solr read access to that path.

You could disable the various security layers (the Tomcat security manager,
SELinux/AppArmor) and test it again.
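
If it turns out to be the Tomcat security manager, a policy grant along these
lines may help; a sketch only, since the policy file location varies by
distribution (e.g. /etc/tomcat6/policy.d/ on Ubuntu) and the path below is
simply taken from the stack trace:

grant {
  // let Solr read its home directory and everything below it
  permission java.io.FilePermission "/home/ubuntu/apps/solr/tomcatweb/-", "read";
};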

On Mon, Nov 9, 2009 at 1:14 PM, William Pierce  wrote:
> All,
>
> I realized that the stack trace I had sent in my previous email was
> truncated to not include the solr portions.here is the fuller stack
> trace:
>
> SEVERE: Exception starting filter SolrRequestFilter
> org.apache.solr.common.SolrException: java.security.AccessControlException:
> access denied (java.io.FilePermission
> /home/ubuntu/apps/solr/tomcatweb/resumes/lib read)
>       at
> org.apache.solr.servlet.SolrDispatchFilter.<init>(SolrDispatchFilter.java:68)
>       at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native
> Method)
>       at
> sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:39)
>       at
> sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:27)
>       at java.lang.reflect.Constructor.newInstance(Constructor.java:513)
>       at java.lang.Class.newInstance0(Class.java:355)
>       at java.lang.Class.newInstance(Class.java:308)
>       at
> org.apache.catalina.core.ApplicationFilterConfig.getFilter(ApplicationFilterConfig.java:255)
>       at
> org.apache.catalina.core.ApplicationFilterConfig.setFilterDef(ApplicationFilterConfig.java:397)
>       at
> org.apache.catalina.core.ApplicationFilterConfig.<init>(ApplicationFilterConfig.java:108)
>       at
> org.apache.catalina.core.StandardContext.filterStart(StandardContext.java:3800)
>       at
> org.apache.catalina.core.StandardContext.start(StandardContext.java:4450)
>       at
> org.apache.catalina.core.ContainerBase.addChildInternal(ContainerBase.java:791)
>       at
> org.apache.catalina.core.ContainerBase.access$000(ContainerBase.java:123)
>       at
> org.apache.catalina.core.ContainerBase$PrivilegedAddChild.run(ContainerBase.java:145)
>       at java.security.AccessController.doPrivileged(Native Method)
>       at
> org.apache.catalina.core.ContainerBase.addChild(ContainerBase.java:769)
>       at
> org.apache.catalina.core.StandardHost.addChild(StandardHost.java:526)
>       at
> org.apache.catalina.startup.HostConfig.deployDescriptor(HostConfig.java:630)
>       at
> org.apache.catalina.startup.HostConfig.deployDescriptors(HostConfig.java:556)
>       at
> org.apache.catalina.startup.HostConfig.deployApps(HostConfig.java:491)
>       at org.apache.catalina.startup.HostConfig.start(HostConfig.java:1206)
>       at
> org.apache.catalina.startup.HostConfig.lifecycleEvent(HostConfig.java:314)
>       at
> org.apache.catalina.util.LifecycleSupport.fireLifecycleEvent(LifecycleSupport.java:119)
> Nov 9, 2009 9:08:57 PM org.apache.catalina.core.StandardContext filterStart
> SEVERE: Exception starting filter SolrRequestFilter
> org.apache.solr.common.SolrException: java.security.AccessControlException:
> access denied (java.io.FilePermission
> /home/ubuntu/apps/solr/tomcatweb/resumes/lib read)
>       at
> org.apache.solr.servlet.SolrDispatchFilter.<init>(SolrDispatchFilter.java:68)
>       at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native
> Method)
>       at
> sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:39)
>       at
> sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:27)
>       at java.lang.reflect.Constructor.newInstance(Constructor.java:513)
>       at java.lang.Class.newInstance0(Class.java:355)
>       at java.lang.Class.newInstance(Class.java:308)
>       at
> org.apache.catalina.core.ApplicationFilterConfig.getFilter(ApplicationFilterConfig.java:255)
>       at
> org.apache.catalina.core.ApplicationFilterConfig.setFilterDef(ApplicationFilterConfig.java:397)
>       at
> org.apache.catalina.core.ApplicationFilterConfig.<init>(ApplicationFilterConfig.java:108)
>       at
> org.apache.catalina.core.StandardContext.filterStart(StandardContext.java:3800)
>
> Cheers,
>
> - Bill
>
> --
> From: "William Pierce" 
> Sent: Monday, November 09, 2009 12:49 PM
> To: 
> Subject: Solr Internal exception on startup...
>
>> Folks:
>>
>> I am encountering an internal exception running solr on an Ubuntu 9.04
>> box,  running tomcat 6.  I have deposited the solr nightly bits (as of
>> October 7) into the folder: /usr/share/tomcat6/lib
>>
>> The exception from the log says:
>>
>> Nov 9, 2009 8:26:13 PM org.apache.catalina.core.StandardContext
>> filterStart
>> SEVERE: Exception starting filter SolrRequestFilter
>> org.apache.solr.common.SolrException:
>> java.security.AccessControlException: access denied (java.io.FilePermission
>> /home/ubuntu/apps/solr/tomcatweb/prod/lib read)
>>       at
>> org.apache.solr.servlet.SolrDispatchFilter.<init>(SolrDispatchFilter.java:68)
>>       at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native
>> Method)
>>       at
>> sun.reflect

How TEXT field make sortable?

2009-11-09 Thread deepak agrawal
Can someone help me with how we can sort a text field?



-- 
DEEPAK AGRAWAL
+91-9379433455
GOOD LUCK.


Re: Are subqueries possible in Solr? If so, are they performant?

2009-11-09 Thread Vicky_Dev

Thanks Otis for your response.

Is it possible to feed the results of one Solr query into another Solr query?

The issue I am facing right now is:
I am getting results from one query and I just need 2 index attribute
values. These index attribute values are used to form a new query to Solr.

Since Solr gives results only for GET requests, there is a restriction on
forming a query with all the values.

Please do send your views on above problem

Thanks
~Vikrant
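
A two-pass sketch with SolrJ, using the field names from the SQL example
quoted below; it assumes the collected values need no extra escaping, and it
POSTs the second query, which sidesteps the GET URL-length restriction:

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.SolrRequest;
import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;
import org.apache.solr.client.solrj.response.QueryResponse;
import org.apache.solr.common.SolrDocument;

public class TwoPassQuery {
  public static void main(String[] args) throws Exception {
    CommonsHttpSolrServer solr =
        new CommonsHttpSolrServer("http://localhost:8983/solr");

    // pass 1: fetch only the attribute we need from the first query
    QueryResponse first =
        solr.query(new SolrQuery("subject:Science").setFields("first_name"));

    // pass 2: OR the collected values into a new query
    StringBuilder q = new StringBuilder("first_name:(");
    for (SolrDocument doc : first.getResults()) {
      q.append('"').append(doc.getFieldValue("first_name")).append("\" ");
    }
    q.append(')');

    // POST the potentially long second query instead of GET
    QueryResponse second =
        solr.query(new SolrQuery(q.toString()), SolrRequest.METHOD.POST);
    System.out.println(second.getResults().getNumFound() + " hits");
  }
}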




Otis Gospodnetic wrote:
> 
> You can mimic them by combining 2 clauses with an AND.
> e.g.
> cookies
> vs.
> cookies AND vanilla
> 
> Otis
> --
> Sematext is hiring -- http://sematext.com/about/jobs.html?mls
> Lucene, Solr, Nutch, Katta, Hadoop, HBase, UIMA, NLP, NER, IR
> 
> 
> 
> - Original Message 
>> From: Vicky_Dev 
>> To: solr-user@lucene.apache.org
>> Sent: Mon, November 9, 2009 1:48:03 PM
>> Subject: Re: Are subqueries possible in Solr? If so, are they performant?
>> 
>> 
>> 
>> Hi Team,
>> Is it possible to write subqueries in dismaxrequest handler?
>> 
>> ~Vikrant
>> 
>> 
>> Edoardo Marcora wrote:
>> > 
>> > Does Solr have the ability to do subqueries, like this one (in SQL):
>> > 
>> > SELECT id, first_name
>> > FROM student_details
>> > WHERE first_name IN (SELECT first_name
>> > FROM student_details
>> > WHERE subject= 'Science'); 
>> > 
>> > If so, how performant is this kind of queries?
>> > 
>> 
>> -- 
>> View this message in context: 
>> http://old.nabble.com/Are-subqueries-possible-in-Solr--If-so%2C-are-they-performant--tp24467023p26271600.html
>> Sent from the Solr - User mailing list archive at Nabble.com.
> 
> 
> 

-- 
View this message in context: 
http://old.nabble.com/Are-subqueries-possible-in-Solr--If-so%2C-are-they-performant--tp24467023p26278872.html
Sent from the Solr - User mailing list archive at Nabble.com.


A question about how to make schema.xml change take effect

2009-11-09 Thread Bertie Shen
Hey folks,

  When I update schema.xml, I found that most of the time I do not need to
restart Tomcat for the change to take effect. But sometimes I have to restart
the Tomcat server to make the change take effect.

   For example, when I changed a field data type from sint to tlong, I called
http://host:port/solr/dataimport?command=full-import&commit=true&clean=true.
I clicked the [Schema] link from the admin page and found the data type is
tlong; but clicking [Schema Browser] and that field's link, I found the data
type is still sint. When I make a search, the result also shows the field is
still sint. The only way I found to make the change effective is to restart
Tomcat.

   I want to confirm whether this is intended or a bug.

   Thanks.


Re: A question about how to make schema.xml change take effect

2009-11-09 Thread Ritesh Gurung
Well, every time you make a change in the schema.xml file you need to restart
the Tomcat server.

On Tue, Nov 10, 2009 at 11:59 AM, Bertie Shen  wrote:
> Hey folks,
>
>  When I update schema.xml, I found most of time I do not need to restart
> tomcat in order to make change take effect. But sometimes, I have to restart
> tomcat server to make change take effect.
>
>   For example, when I change a field data type from sint to tlong, I called
> http://host:port/solr/dataimport?command=full-import&commit=true&clean=true.
> I clicked [Schema] link from admin page and found data type is tlong; but
> click [Schema Browser] and that field link, I found the data type is still
> sint.  When I make a search, the result also shows the field is still sint.
> The only way to make the change effective I found is to restart tomcat.
>
>   I want to confirm whether it is intended or it is a bug.
>
>   Thanks.
>


Re: A question about how to make schema.xml change take effect

2009-11-09 Thread Noble Paul നോബിള്‍ नोब्ळ्
If you are using a multicore instance you can just reload the core.
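
For example, with the CoreAdmin handler enabled in solr.xml, reloading is a
single HTTP call (the core name here is a placeholder):

http://host:port/solr/admin/cores?action=RELOAD&core=core0

Note that reloading (or restarting) only makes Solr re-read schema.xml;
documents indexed under the old field type still have to be reindexed
afterwards, which is why running a full-import alone did not make the
sint-to-tlong change visible.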

On Tue, Nov 10, 2009 at 12:07 PM, Ritesh Gurung  wrote:
> Well everytime you make change in schema.xml file you need restart the
> tomcat server.
>
> On Tue, Nov 10, 2009 at 11:59 AM, Bertie Shen  wrote:
>> Hey folks,
>>
>>  When I update schema.xml, I found most of time I do not need to restart
>> tomcat in order to make change take effect. But sometimes, I have to restart
>> tomcat server to make change take effect.
>>
>>   For example, when I change a field data type from sint to tlong, I called
>> http://host:port/solr/dataimport?command=full-import&commit=true&clean=true.
>> I clicked [Schema] link from admin page and found data type is tlong; but
>> click [Schema Browser] and that field link, I found the data type is still
>> sint.  When I make a search, the result also shows the field is still sint.
>> The only way to make the change effective I found is to restart tomcat.
>>
>>   I want to confirm whether it is intended or it is a bug.
>>
>>   Thanks.
>>
>



-- 
-
Noble Paul | Principal Engineer| AOL | http://aol.com


Re: A question about how to make schema.xml change take effect

2009-11-09 Thread Bertie Shen
Oh, sorry, I take back what I said. Most of my config changes are to
data-config.xml, not schema.xml.

I just made a change to a field's data type in schema.xml and noticed that I
have to restart Tomcat.



On Mon, Nov 9, 2009 at 10:37 PM, Ritesh Gurung  wrote:

> Well everytime you make change in schema.xml file you need restart the
> tomcat server.
>
> On Tue, Nov 10, 2009 at 11:59 AM, Bertie Shen 
> wrote:
> > Hey folks,
> >
> >  When I update schema.xml, I found most of time I do not need to restart
> > tomcat in order to make change take effect. But sometimes, I have to
> restart
> > tomcat server to make change take effect.
> >
> >   For example, when I change a field data type from sint to tlong, I
> called
> > http://host:port
> /solr/dataimport?command=full-import&commit=true&clean=true.
> > I clicked [Schema] link from admin page and found data type is tlong; but
> > click [Schema Browser] and that field link, I found the data type is
> still
> > sint.  When I make a search, the result also shows the field is still
> sint.
> > The only way to make the change effective I found is to restart tomcat.
> >
> >   I want to confirm whether it is intended or it is a bug.
> >
> >   Thanks.
> >
>


Re: How TEXT field make sortable?

2009-11-09 Thread Avlesh Singh
>
> Can some one help me how we can sort the text field.
>
You CANNOT sort on a "text" field. Sorting can only be done on an
untokenized field (e.g. string, sint, sfloat fields).
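
The usual pattern is to copy the text into an untokenized sibling field in
schema.xml and sort on that; a sketch, with placeholder field names:

<field name="title" type="text" indexed="true" stored="true"/>
<field name="title_sort" type="string" indexed="true" stored="false"/>
<copyField source="title" dest="title_sort"/>

and then query with &sort=title_sort+asc.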

Cheers
Avlesh

On Tue, Nov 10, 2009 at 11:44 AM, deepak agrawal  wrote:

> Can some one help me how we can sort the text field.
>
> 
>
> --
> DEEPAK AGRAWAL
> +91-9379433455
> GOOD LUCK.
>