Re: Re: Re: Solr and Nutch/Droids - to use or not to use?

2010-06-17 Thread MitchK



> Solr doesn't know anything about OPIC, but I suppose you can feed the OPIC
> score computed by Nutch into a Solr field and use it during scoring, if
> you want, say with a function query. 
> 
Oh! Yes, that makes more sense than using the OPIC as doc-boost-value. :-)
Somewhere on the Lucene mailing lists I read that in the future it will be
possible to change a field's contents without reindexing the whole document.
If one stores the OPIC score (which is independent of the page's content)
in a field and uses a function query to influence the score of a document, one
saves the effort of reindexing the whole doc when the content has not changed.

Regards
- Mitch
-- 
View this message in context: 
http://lucene.472066.n3.nabble.com/Solr-and-Nutch-Droids-to-use-or-not-to-use-tp900069p902158.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Field Collapsing SOLR-236

2010-06-17 Thread Rakhi Khatwani
Hi Moazzam,
   Yup, I have encountered the same thing.
Build errors after applying the patch.

Rakhi

On Thu, Jun 17, 2010 at 3:33 AM, Moazzam Khan  wrote:

> I got the code from trunk again and now I get this error:
>
>    [javac] symbol  : class StringIndex
>    [javac] location: interface org.apache.lucene.search.FieldCache
>    [javac]     private final Map fieldCaches = new HashMap();
>    [javac]                                         ^
>    [javac] C:\svn\solr\src\java\org\apache\solr\search\SolrIndexSearcher.java:674: cannot find symbol
>    [javac] symbol  : class DocSetScoreCollector
>    [javac] location: class org.apache.solr.search.SolrIndexSearcher
>    [javac]         if (query instanceof TermQuery && !(collector instanceof DocSetScoreCollector)) {
>    [javac]                                                                  ^
>    [javac] C:\svn\solr\src\java\org\apache\solr\search\fieldcollapse\AbstractDocumentCollapser.java:257: cannot find symbol
>    [javac] symbol  : method getStringIndex(org.apache.solr.search.SolrIndexReader,java.lang.String)
>    [javac] location: interface org.apache.lucene.search.FieldCache
>    [javac]     fieldValues = FieldCache.DEFAULT.getStringIndex(searcher.getReader(), collapseField);
>    [javac]                                     ^
>    [javac] C:\svn\solr\src\java\org\apache\solr\search\fieldcollapse\collector\AggregateCollapseCollectorFactory.java:163: cannot find symbol
>    [javac] symbol  : class StringIndex
>    [javac] location: interface org.apache.lucene.search.FieldCache
>    [javac]     private final Map fieldCaches = new HashMap();
>    [javac]                              ^
>    [javac] C:\svn\solr\src\java\org\apache\solr\search\fieldcollapse\collector\AggregateCollapseCollectorFactory.java:173: cannot find symbol
>    [javac] symbol  : method getStringIndex(org.apache.solr.search.SolrIndexReader,java.lang.String)
>    [javac] location: interface org.apache.lucene.search.FieldCache
>    [javac]           fieldCaches.put(fieldName, FieldCache.DEFAULT.getStringIndex(searcher.getReader(), fieldName));
>    [javac]                                                        ^
>    [javac] C:\svn\solr\src\java\org\apache\solr\search\fieldcollapse\collector\AggregateCollapseCollectorFactory.java:183: cannot find symbol
>    [javac] symbol  : class StringIndex
>    [javac] location: interface org.apache.lucene.search.FieldCache
>    [javac]         FieldCache.StringIndex stringIndex = fieldCaches.get(aggregateField.getFieldName());
>    [javac]                   ^
>    [javac] Note: Some input files use or override a deprecated API.
>    [javac] Note: Recompile with -Xlint:deprecation for details.
>    [javac] Note: Some input files use unchecked or unsafe operations.
>    [javac] Note: Recompile with -Xlint:unchecked for details.
>    [javac] 10 errors
>
>
> I am compiling using jdk 1.5 update 22. Does that have anything to do
> with the errors?
>
> -Moazzam
>
>
> On Wed, Jun 16, 2010 at 4:34 PM, Moazzam Khan  wrote:
> > I did the same thing. And, the code compiles without the patch but
> > when I apply the patch I get these errors:
> >
> >    [javac] C:\svn\solr\src\java\org\apache\solr\search\fieldcollapse\collector\FieldValueCountCollapseCollectorFactory.java:127: class, interface, or enum expected
> >    [javac] import java.util.HashMap;
> >    [javac] ^
> >    [javac] C:\svn\solr\src\java\org\apache\solr\search\fieldcollapse\collector\FieldValueCountCollapseCollectorFactory.java:128: class, interface, or enum expected
> >    [javac] import java.util.Map;
> >    [javac] ^
> >    [javac] C:\svn\solr\src\java\org\apache\solr\search\fieldcollapse\collector\aggregate\AggregateFunction.java:70: class, interface, or enum expected
> >    [javac] package org.apache.solr.search.fieldcollapse.collector.aggregate;
> >    [javac] ^
> >    [javac] C:\svn\solr\src\java\org\apache\solr\search\fieldcollapse\collector\aggregate\AggregateFunction.java:72: class, interface, or enum expected
> >    [javac] import org.apache.solr.search.fieldcollapse.CollapseGroup;
> >    [javac] ^
> >    [javac] 52 errors
> >
> >
> > I got the source from :
> >
> > http://svn.apache.org/repos/asf/lucene/dev/trunk
> >
> > and got the patch from :
> >
> > https://issues.apache.org/jira/secure/attachment/12440108/SOLR-236-trunk.patch
> >
> >
> > Any ideas what's going wrong?
> >
> >
> >
> >
> > On Wed, Jun 16, 2010 at 11:40 AM, Eric Caron  wrote:
> >> I've had the best luck checking out the newest Solr/Lucene (so the 1.5-line)
> >> from SVN, then just doing "patch -p0 < SOLR-236-trunk.patch" from inside the
> >> trunk directory. I just did it against the newest checkout and it works fine
> >> still.
> >>
> >> On Wed, Jun 16, 2010 at 11:35 AM, Moazzam Khan  wrote:
> >>
> >>> Actually I take that back. I am just as lost as you. I wish there was
> >>> a tutorial on how to do this (although I get the feeling that once I

Get total number of results when field collapsing is enabled

2010-06-17 Thread Adrian Pemsel
Hi Folks,

Is there any way to get or estimate the total number of results when using
field collapsing (SOLR-236) without using faceting or a second query?

Kind Regards,
Adrian Pemsel
-- 
http://www.jusmeum.de


RejectedExecutionException when shutting down CoreContainer

2010-06-17 Thread NarasimhaRaju
Hi,

I am using Solr 1.3, and when indexing I get a RejectedExecutionException
after processing the last batch of update records from the database.
It happens when coreContainer.shutdown() is called after processing the last
record.
I have autocommits enabled based on maxTime, which is 10 minutes.

From the exception below I see it's happening from the commitTracker of
DirectUpdateHandler2.

Looking at the SolrCore.close method, searchExecutor.shutdown() is called before
updateHandler.close().

I still don't understand why updateHandler is closed after searchExecutor when
updateHandler has the possibility of adding/submitting to searchExecutor.

Is this a bug, or am I doing something wrong with my autocommit?




SEVERE: java.util.concurrent.RejectedExecutionException
at 
java.util.concurrent.ThreadPoolExecutor$AbortPolicy.rejectedExecution(ThreadPoolExecutor.java:1760)
at 
java.util.concurrent.ThreadPoolExecutor.reject(ThreadPoolExecutor.java:767)
at 
java.util.concurrent.ThreadPoolExecutor.execute(ThreadPoolExecutor.java:658)
at 
java.util.concurrent.AbstractExecutorService.submit(AbstractExecutorService.java:92)
at 
java.util.concurrent.Executors$DelegatedExecutorService.submit(Executors.java:603)
at org.apache.solr.core.SolrCore.getSearcher(SolrCore.java:1029)
at 
org.apache.solr.update.DirectUpdateHandler2.commit(DirectUpdateHandler2.java:368)
at 
org.apache.solr.update.DirectUpdateHandler2$CommitTracker.run(DirectUpdateHandler2.java:515)
at 
java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441)
at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
at java.util.concurrent.FutureTask.run(FutureTask.java:138)
at 
java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:98)
at 
java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:207)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
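The failure mode in the trace is the generic executor contract: once an ExecutorService has been shut down, any later submit is rejected by the default AbortPolicy, which mirrors the commitTracker submitting a commit to searchExecutor after SolrCore.close() has already shut it down. A minimal standalone illustration (not Solr code):

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.RejectedExecutionException;

public class RejectAfterShutdown {

    // Returns true if a submit after shutdown() is rejected, mirroring
    // commitTracker submitting to an already-closed searchExecutor.
    static boolean submitAfterShutdown() {
        ExecutorService executor = Executors.newSingleThreadExecutor();
        executor.shutdown(); // like searchExecutor.shutdown() in SolrCore.close()
        try {
            executor.submit(new Runnable() {
                public void run() { }
            });
            return false; // submit was accepted (will not happen after shutdown)
        } catch (RejectedExecutionException e) {
            return true;  // the default AbortPolicy rejects the task
        }
    }

    public static void main(String[] args) {
        System.out.println("rejected after shutdown: " + submitAfterShutdown());
    }
}
```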



 
Regards,
Narasimha



  

Re: how to apply patch SOLR-1316

2010-06-17 Thread Koji Sekiguchi



> As you can see both versions don't appear to be working. I tried building
> each but neither would compile. Which version/tag should be used when
> applying this patch?

In general, a patch is written against the latest trunk as of when it
was posted. SOLR-1316.patch was posted 2010-05-31, so you should check
out the dated source from trunk:

$ svn co -r {2010-05-31} http://svn.apache.org/repos/asf/lucene/dev/trunk

Koji

--
http://www.rondhuit.com/en/



Re: Re: Re: Solr and Nutch/Droids - to use or not to use?

2010-06-17 Thread Otis Gospodnetic
Mitch,

Yes, one day.  But it sounds like you are not aware of ExternalFileField, which 
you can use today:

http://search-lucene.com/?q=ExternalFileField&fc_project=Solr
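For reference, a sketch of how ExternalFileField is typically wired up (the field and file names below are illustrative assumptions, not from this thread):

```xml
<!-- schema.xml sketch: the field's values come from a flat file next to
     the index, not from indexed terms. Names here are illustrative. -->
<fieldType name="externalScore" class="solr.ExternalFileField"
           keyField="id" defVal="0" valType="pfloat"/>
<field name="opic" type="externalScore"/>
```

The values then live in a file such as external_opic in Solr's data directory, one docId=value line per document; the file is re-read when a new searcher opens, so the scores can be refreshed without reindexing, and the field can be used in function queries (e.g. as a bf term).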

Otis

Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
Lucene ecosystem search :: http://search-lucene.com/



- Original Message 
> From: MitchK 
> To: solr-user@lucene.apache.org
> Sent: Thu, June 17, 2010 4:15:27 AM
> Subject: Re: Re: Re: Solr and Nutch/Droids - to use or not to use?
> 
> 


> > Solr doesn't know anything about OPIC, but I suppose you can feed the OPIC
> > score computed by Nutch into a Solr field and use it during scoring, if
> > you want, say with a function query.
> 
> Oh! Yes, that makes more sense than using the OPIC as doc-boost-value. :-)
> Somewhere on the Lucene mailing lists I read that in the future it will be
> possible to change a field's contents without reindexing the whole document.
> If one stores the OPIC score (which is independent of the page's content)
> in a field and uses a function query to influence the score of a document,
> one saves the effort of reindexing the whole doc when the content has not
> changed.
> 
> Regards
> - Mitch
> -- 
> View this message in context:
> http://lucene.472066.n3.nabble.com/Solr-and-Nutch-Droids-to-use-or-not-to-use-tp900069p902158.html
> Sent from the Solr - User mailing list archive at Nabble.com.


Re: About the example in the wiki page of FunctionQuery

2010-06-17 Thread Otis Gospodnetic
Hi,

I think that "+" there is just a "space" (like %20).
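Concretely, the wiki's example query is URL-encoded; decoded, the "+" is just the space separating the two query clauses:

```
q=boxname:findbox+_val_:"product(product(x,y),z)"

    decodes to

q=boxname:findbox _val_:"product(product(x,y),z)"
```

So it is not an "order by" operator; the _val_ clause simply contributes the function's value to each matching document's score.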
 Otis

Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
Lucene ecosystem search :: http://search-lucene.com/



- Original Message 
> From: Chia Hao Lo 
> To: solr-user@lucene.apache.org
> Sent: Thu, June 17, 2010 2:44:30 AM
> Subject: About the example in the wiki page of FunctionQuery
> 
> ( I've sent this mail two days ago, but I cannot find it in the mail
> archive. So I guess the mail was not sent successfully.
> Sorry for sending this mail twice in case it did send. )
> 
> Hi,
> 
> I'm a newbie to Solr and have a question about the example in FunctionQuery.
> 
> I've read these documents:
> 
> http://wiki.apache.org/solr/FunctionQuery
> http://wiki.apache.org/solr/SolrQuerySyntax
> 
> and don't understand the example in FunctionQuery.
> 
> It says that the query below searches the field boxname for the value
> "findbox", then orders the results by x * y * z.
> 
> q=boxname:findbox+_val_:"product(product(x,y),z)"
> 
> What does the "+" between findbox and _val_ stand for? Does it mean
> "order by"?
> 
> According to the document, the score is replaced by "+_val_:...". I tried
> it, but the result is not like that. The scores change but I cannot tell
> how it works.
> 
> Thanks.
> 
> chlo


Re: Indexing HTML files in SOLR

2010-06-17 Thread seesiddharth

Thank you so much for the reply. The link you suggested is helpful, but it
explains everything using the curl command, which I don't want to use.
I was more interested in uploading the .html documents using an HTTP web
request.
So I have stored all the .html files in one location and created an HTML parser
which fetches the content from these HTML files and builds an XML string
(like <add><doc>...</doc></add>). Then I send this XML string via an HTTP web
request (in .NET) to the Solr server to add/update the document.
Now I am able to search the data of all uploaded documents in Solr.
It would be great if you could answer my question:
Is there a better approach to achieve the same functionality?

Regards,
Siddharth
-- 
View this message in context: 
http://lucene.472066.n3.nabble.com/Indexing-HTML-files-in-SOLR-tp896530p902644.html
Sent from the Solr - User mailing list archive at Nabble.com.


Autsuggest/autocomplete/spellcheck phrases

2010-06-17 Thread Blargy

How can I preserve phrases for either autosuggest/autocomplete/spellcheck?

For example, we have a bunch of product listings, and I want it so that if
someone types "louis" it comes up with "Louis Vuitton", or "World" ... "World Cup".

Would I need n-grams? Shingling? Thanks
-- 
View this message in context: 
http://lucene.472066.n3.nabble.com/Autsuggest-autocomplete-spellcheck-phrases-tp902951p902951.html
Sent from the Solr - User mailing list archive at Nabble.com.


Document boosting troubles

2010-06-17 Thread dbashford

Brand new to this sort of thing so bear with me.

For the sake of simplicity, I've got a two-field document: title and rank.
Title gets searched on; rank has values from 1 to 10, 1 being highest.
What I'd like to do is boost results of searches on title based on the
document's rank.

Because it's fairly cut and dried, I was hoping to do it during indexing.  I
have this in my DIH transformer...

var docBoostVal = 0;
switch (rank) {
    case '1':
        docBoostVal = 3.0;
        break;
    case '2':
        docBoostVal = 2.6;
        break;
    case '3':
        docBoostVal = 2.2;
        break;
    case '4':
        docBoostVal = 1.8;
        break;
    case '5':
        docBoostVal = 1.5;
        break;
    case '6':
        docBoostVal = 1.2;
        break;
    case '7':
        docBoostVal = 0.9;
        break;
    case '8':
        docBoostVal = 0.7;
        break;
    case '9':
        docBoostVal = 0.5;
        break;
}
row.put('$docBoost', docBoostVal);

It's my understanding that with this, I can simply do the same /select
queries I've been doing and expect documents to be boosted, but that doesn't
seem to be happening because I'm seeing things like this in the results...

{"title":"Some title 1",
"rank":10,
 "score":0.11726039},
{"title":"Some title 2",
 "rank":7,
 "score":0.11726039},

Pretty much everything with the same score.  Whatever I'm doing isn't making
its way through. (To cover my bases I did try the case statement with
integers rather than strings, same result)





With that not working I started looking at other options, and started
playing with dismax.

I'm able to add this to a query string and get results I'm somewhat
expecting...

bq=rank:1^3.0 rank:2^2.6 rank:3^2.2 rank:4^1.8 rank:5^1.5 rank:6^1.2
rank:7^0.9 rank:8^0.7 rank:9^0.5

...but I guess I wasn't expecting it to ONLY rank based on those factors. 
That essentially gives me a sort by rank.  

Trying to be super inclusive with the search, so while I'm fiddling my
mm is 1<1.  As expected, a q like q=red door returns everything that
contains "red" and "door".  But I was hoping that items matching "red door"
exactly would sort closer to the top, and that if that exact match was a
rank 7, its score wouldn't be exactly the same as all the other rank 7s.
Ditto if I searched for q=The Tales Of: anything possessing all 3 terms
would sort closer to the top, then items possessing two terms behind them,
then items possessing 1 term, and within those groups weighted heavily by
rank.

I think I understand that the score is based entirely on the boosts I
provide...so how do I get something more like what I'm looking for?




Along those lines, I initially had put something like this in my defaults...

<str name="bf">
rank:1^10.0 rank:2^9.0 rank:3^8.0 rank:4^7.0 rank:5^6.0 rank:6^5.0
rank:7^4.0 rank:8^3.0 rank:9^2.0
</str>

...but that was not working, queries fail with a syntax exception.  Guessing
this won't work?



Thanks in advance for any help you can provide.

-- 
View this message in context: 
http://lucene.472066.n3.nabble.com/Document-boosting-troubles-tp902982p902982.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Autsuggest/autocomplete/spellcheck phrases

2010-06-17 Thread Michael
Blargy,

I've been experimenting with this myself for a work project. What I
did was use a combination of the two, running the indexed terms through
the shingle factory and then through the edge n-gram filter. I did
this in order to be able to match terms like:

.net asp c#
asp .net c#
c# asp .net
c# asp.net

for a word query like

asp c# .net

The edge n-grams are good, but they can also fail to match on queries
when the words in the index are in a different order than those in the
query.

My setup in schema.xml looks like this:

[the schema.xml snippet was stripped by the mail archive]
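A sketch of an analysis chain along those lines (all names and parameter values below are assumptions for illustration, not necessarily Michael's actual config):

```xml
<!-- Sketch: shingles build multi-word tokens, then edge n-grams index
     their prefixes for as-you-type matching. Values are illustrative. -->
<fieldType name="autosuggest" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.ShingleFilterFactory" maxShingleSize="4"
            outputUnigrams="true"/>
    <filter class="solr.EdgeNGramFilterFactory" minGramSize="1"
            maxGramSize="25"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
```

The query analyzer deliberately skips the shingle and n-gram filters, so the user's partial input is matched as-is against the indexed prefixes.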

Let me know how this works for you.

On Thu, Jun 17, 2010 at 11:05 AM, Blargy  wrote:
>
> How can I preserve phrases for either autosuggest/autocomplete/spellcheck?
>
> For example we have a bunch of product listings and I want if someone types:
> "louis" for it to common up with "Louis Vuitton". "World" ... "World cup".
>
> Would I need n-grams? Shingling? Thanks
> --
> View this message in context: 
> http://lucene.472066.n3.nabble.com/Autsuggest-autocomplete-spellcheck-phrases-tp902951p902951.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>


Re: Re: Re: Solr and Nutch/Droids - to use or not to use?

2010-06-17 Thread MitchK

Otis,

you are right. I wasn't aware of this. At least not with such a large
data list (think of an index with 4 million docs; that would mean an
external file with 4 million records). But from what I've read at
search-lucene.com it seems to perform very well. Thanks for the idea!

Btw: Otis, did you open a JIRA Issue for the distributed indexing ability of
Solr?
I would like to follow the issue, if it is open. 

Regards
- Mitch


Otis Gospodnetic-2 wrote:
> 
> Mitch,
> 
> Yes, one day.  But it sounds like you are not aware of ExternalFileField,
> which you can use today:
> 
> http://search-lucene.com/?q=ExternalFileField&fc_project=Solr
> 
> Otis
> 
> Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
> Lucene ecosystem search :: http://search-lucene.com/
> 
> 
> 
> - Original Message 
>> From: MitchK 
>> To: solr-user@lucene.apache.org
>> Sent: Thu, June 17, 2010 4:15:27 AM
>> Subject: Re: Re: Re: Solr and Nutch/Droids - to use or not to use?
>> 
>> 
> 
> 
>> Solr doesn't know anything about OPIC, but I suppose you can 
>> feed the OPIC
>> score computed by Nutch into a Solr field and use it 
>> during scoring, if
>> you want, say with a function query. 
>> 
> Oh! 
>> Yes, that makes more sense than using the OPIC as doc-boost-value. 
>> :-)
> Anywhere at the Lucene Mailing lists I read that in future it will 
>> be
> possible to change field's contents without reindexing the whole 
>> document.
> If one stores the OPIC-Score (which is independent from the page's 
>> content)
> in a field and uses functionQuery to influence the score of a 
>> document, one
> saves the effort of reindexing the whole doc, if the content 
>> did not change.
> 
> Regards
> - Mitch
> -- 
> View this message in 
>> context: 
>> href="http://lucene.472066.n3.nabble.com/Solr-and-Nutch-Droids-to-use-or-not-to-use-tp900069p902158.html";
>>  
>> target=_blank 
>> >http://lucene.472066.n3.nabble.com/Solr-and-Nutch-Droids-to-use-or-not-to-use-tp900069p902158.html
> Sent 
>> from the Solr - User mailing list archive at Nabble.com.
> 
> 
-- 
View this message in context: 
http://lucene.472066.n3.nabble.com/Solr-and-Nutch-Droids-to-use-or-not-to-use-tp900069p903148.html
Sent from the Solr - User mailing list archive at Nabble.com.


solr multi-node

2010-06-17 Thread Antonello Mangone
Hi everyone, I have a question and I hope someone can help me.
I know that mission-critical reliability can be implemented with Lucene/Solr
by using multi-node configurations and redundant architectures, but I
haven't found documentation on how to do it.
Can someone point me to a link that explains how to do it?
Thank you all in advance ...


Re: Document boosting troubles

2010-06-17 Thread MitchK

Hi,

first of all, are you sure that row.put('$docBoost',docBoostVal) is correct?

I think it should be row.put($docBoost,docBoostVal); - unfortunately I am
not sure.

Hm, I think until you can solve the problem with the docBoost itself, you
should use a function query.

Use "div(1,rank)" as a boost function (bf).

The higher the rank value, the smaller the result.
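For example, a dismax request using that boost function might look like this (the URL and field names are illustrative):

```
http://localhost:8983/solr/select?q=red+door&defType=dismax&qf=title&bf=div(1,rank)
```

With div(1,rank), a rank-1 document contributes a boost of 1.0 and a rank-10 document only 0.1, so the rank influences the score without simply replacing the text relevance ordering.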

Hope this helps!
- Mitch

 
dbashford wrote:
> 
> Brand new to this sort of thing so bear with me.
> 
> For sake of simplicity, I've got a two field document, title and rank. 
> Title gets searched on, rank has values from 1 to 10.  1 being highest. 
> What I'd like to do is boost results of searches on title based on the
> documents rank.
> 
> Because it's fairly cut and dry, I was hoping to do it during indexing.  I
> have this in my DIH transformer..
> 
> var docBoostVal = 0;
> switch (rank) {
>   case '1': 
>   docBoostVal = 3.0;
>   break;
>   case '2': 
>   docBoostVal = 2.6;
>   break;
>   case '3': 
>   docBoostVal = 2.2;
>   break;
>   case '4': 
>   docBoostVal = 1.8;
>   break;
>   case '5': 
>   docBoostVal = 1.5;
>   break;
>   case '6': 
>   docBoostVal = 1.2;
>   break;
>   case '7':
>   docBoostVal = 0.9;
>   break;
>   case '8': 
>   docBoostVal = 0.7;
>   break;
>   case '9': 
>   docBoostVal = 0.5;  
>   break;
> } 
> row.put('$docBoost',docBoostVal); 
> 
> It's my understanding that with this, I can simply do the same /select
> queries I've been doing and expect documents to be boosted, but that
> doesn't seem to be happening because I'm seeing things like this in the
> results...
> 
> {"title":"Some title 1",
> "rank":10,
>  "score":0.11726039},
> {"title":"Some title 2",
>  "rank":7,
>  "score":0.11726039},
> 
> Pretty much everything with the same score.  Whatever I'm doing isn't
> making its way through. (To cover my bases I did try the case statement
> with integers rather than strings, same result)
> 
> 
> 
> 
> 
> With that not working I started looking at other options.  Starting
> playing with dismax.  
> 
> I'm able to add this to a query string a get results I'm somewhat
> expecting...
> 
> bq=rank:1^3.0 rank:2^2.6 rank:3^2.2 rank:4^1.8 rank:5^1.5 rank:6^1.2
> rank:7^0.9 rank:8^0.7 rank:9^0.5
> 
> ...but I guess I wasn't expecting it to ONLY rank based on those factors. 
> That essentially gives me a sort by rank.  
> 
> Trying to be super inclusive with the search, so while I'm fiddling my
> mm=1<1.  As expected, a q= like q=red door is returning everything that
> contains Red and door.  But I was hoping that items that matched "red
> door" exactly would sort closer to the top.  And if that exact match was a
> rank 7 that it's score wouldn't be exactly the same as all the other rank
> 7s?  Ditto if I searched for "q=The Tales Of", anything possessing all 3
> terms would sort closer to the top...and possessing two terms behind
> them...and possessing 1 term behind them, and within those groups weight
> heavily on by rank.
> 
> I think I understand that the score is based entirely on the boosts I
> provide...so how do I get something more like what I'm looking for?
> 
> 
> 
> 
> Along those lines, I initially had put something like this in my
> defaults...
> 
> <str name="bf">
> rank:1^10.0 rank:2^9.0 rank:3^8.0 rank:4^7.0 rank:5^6.0 rank:6^5.0
> rank:7^4.0 rank:8^3.0 rank:9^2.0
> </str>
> 
> ...but that was not working, queries fail with a syntax exception. 
> Guessing this won't work?
> 
> 
> 
> Thanks in advance for any help you can provide.
> 
> 
-- 
View this message in context: 
http://lucene.472066.n3.nabble.com/Document-boosting-troubles-tp902982p903190.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Document boosting troubles

2010-06-17 Thread MitchK

Sorry, I've overlooked your other question.



> <str name="bf">
> rank:1^10.0 rank:2^9.0 rank:3^8.0 rank:4^7.0 rank:5^6.0 rank:6^5.0
> rank:7^4.0 rank:8^3.0 rank:9^2.0
> </str>
> 

This is wrong.
You need to change "bf" to "bq".
bf -> boost function (takes function query syntax like div(1,rank))
bq -> boost query (takes regular query syntax like rank:1^10.0)
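So a corrected defaults entry would look something like this (a sketch; the enclosing dismax requestHandler defaults list is assumed):

```xml
<lst name="defaults">
  <str name="defType">dismax</str>
  <!-- bq takes regular query syntax, so the per-rank boosts belong here -->
  <str name="bq">rank:1^10.0 rank:2^9.0 rank:3^8.0 rank:4^7.0 rank:5^6.0
                 rank:6^5.0 rank:7^4.0 rank:8^3.0 rank:9^2.0</str>
</lst>
```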
-- 
View this message in context: 
http://lucene.472066.n3.nabble.com/Document-boosting-troubles-tp902982p903208.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Autsuggest/autocomplete/spellcheck phrases

2010-06-17 Thread Blargy

Thanks for the reply, Michael. I'll definitely try that out and let you know
how it goes. Your solution sounds similar to the one I've read here:
http://www.lucidimagination.com/blog/2009/09/08/auto-suggest-from-popular-queries-using-edgengrams/
 

There are some good comments in there too.

I think I am having the biggest trouble distinguishing what needs to be done
for autocomplete/autosuggestion (Google-like behavior) from the separate issue
of spellchecking ("Did you mean..."). I originally thought those two distinct
features would involve the same solution, but it appears they are completely
different. Your solution sounds like it works best for autocomplete, and I
will be using it for that exact purpose ;) One question though: how do you
handle more popular words/documents over others?

Now my next question is: how would I get the spellchecker to work with phrases?
So if I typed "vitton" it would come back with something like: "Did you
mean: 'Louis Vuitton'?" Will this also require a combination of n-grams and
shingles?

Thanks
-- 
View this message in context: 
http://lucene.472066.n3.nabble.com/Autsuggest-autocomplete-spellcheck-phrases-tp902951p903225.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: solr multi-node

2010-06-17 Thread MitchK

Antonello,

here are a few links to the Solr Wiki:
Solr Replication: http://wiki.apache.org/solr/SolrReplication
Distributed Search Design: http://wiki.apache.org/solr/DistributedSearchDesign
Distributed Search: http://wiki.apache.org/solr/DistributedSearch
SolrCloud: http://wiki.apache.org/solr/SolrCloud

Hope this helps.
- Mitch

-- 
View this message in context: 
http://lucene.472066.n3.nabble.com/solr-multi-node-tp903159p903228.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Master master?

2010-06-17 Thread MitchK

What is the usecase for such an architecture?
Do you send requests to two different masters for indexing and that's why
they need to be synchronized?

Kind regards
- Mitch
-- 
View this message in context: 
http://lucene.472066.n3.nabble.com/Master-master-tp884253p903233.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Autsuggest/autocomplete/spellcheck phrases

2010-06-17 Thread Michael
We base the auto-suggest on popular searches. Our site logs the search
terms in a database and a simple query can give us a summary counting
the number of times the search was entered and the number of results
it returned, similar to the criteria used in the lucid imagination
article you cite. Each record includes the search terms, the total
number of times it was entered and the maximum number of hits
returned. Each record is fed in as a document. On a regular interval,
older documents are deleted and newer ones are added.

On Thu, Jun 17, 2010 at 12:29 PM, Blargy  wrote:
>
> Thanks for the reply Michael. Ill definitely try that out and let you know
> how it goes. Your solution sounds similar to the one I've read here:
> http://www.lucidimagination.com/blog/2009/09/08/auto-suggest-from-popular-queries-using-edgengrams/
>
> There are some good comments in there too.
>
> I think I am having the biggest trouble distinguishing what needs to be done
> for autocomplete/autosuggestion (google like behavior) and a separate issue
> involving spellchecking (Did you mean...). I guess I originally thought
> those 2 distinct features would involve the same solution but it appears
> that they are completely different. Your solution sounds like its works best
> for autocomplete and I will be using it for that exact purpose ;) One
> question though... how do you handle more popular words/documents over
> others?
>
> Now my next question is, how would I get spellchecker to work with phrases.
> So if I typed "vitton" it would come back with something like: "Did you
> mean: 'Louis Vuitton'?" Will this also require a combination of ngrams and
> shingles?
>
> Thanks
> --
> View this message in context: 
> http://lucene.472066.n3.nabble.com/Autsuggest-autocomplete-spellcheck-phrases-tp902951p903225.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>


Re: Field Collapsing SOLR-236

2010-06-17 Thread Moazzam Khan
Hi Mark,

Thanks for posting those links. I know this is probably a dumb
question, but how do I build Solr from your repository? I ask
because I don't see a build.xml file and the folder structure is
a bit different (I'm guessing I am not supposed to use ant on that :D)

Thanks,

Moazzam




On Thu, Jun 17, 2010 at 4:23 AM, Rakhi Khatwani  wrote:
> Hi Moazzam,
>               Yup i hv encountered the same thing.
> Build errors after applying the patch.
>
> Rakhi
>
> On Thu, Jun 17, 2010 at 3:33 AM, Moazzam Khan  wrote:
>> [compile errors quoted in full earlier in the thread snipped]

Plural only stemmer

2010-06-17 Thread Rachel Arbit
Hi all,
I'm having trouble finding a stemmer that's less aggressive than the
porter-stemmer, ideally, one that does only plural stemming.
I've been trying to get KStem to work by copying the lucid-kstem and
lucid-solr-kstem jars from the lucid distribution into solr/lib, but I get a
ClassNotFoundException for CharArraySet when I do that.

Does anyone know where I can get a stemmer that fits my needs, or have tips
on how to make it work with the KStem jars?

Thanks!


Re: Autsuggest/autocomplete/spellcheck phrases

2010-06-17 Thread Blargy

Ok that makes perfect sense.

"What I did was use a combination of the two running the indexed terms
through " - I initially read this as: you used your current index and used
the terms from that to build up your dictionary.
-- 
View this message in context: 
http://lucene.472066.n3.nabble.com/Autsuggest-autocomplete-spellcheck-phrases-tp902951p903299.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Plural only stemmer

2010-06-17 Thread Ahmet Arslan
> I'm having trouble finding a stemmer that's less aggressive
> than the
> porter-stemmer, ideally, one that does only plural
> stemming.

Looks like PlingStemmer does this.

http://www.mpi-inf.mpg.de/yago-naga/javatools/doc/javatools/parsers/PlingStemmer.html
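To make the "plural only" idea concrete, here is a toy Java sketch of suffix-only stemming. This is emphatically not PlingStemmer's rule set (which handles irregular nouns and many exceptions); it only illustrates how narrow such a stemmer is compared to Porter:

```java
// Toy plural-only stemmer: strips a few common English plural endings and
// leaves everything else (tense, -ing forms, derivations) untouched.
public class PluralStemmer {

    public static String stem(String word) {
        if (word.endsWith("ies") && word.length() > 4) {
            return word.substring(0, word.length() - 3) + "y";   // queries -> query
        }
        if (word.endsWith("sses") || word.endsWith("shes")
                || word.endsWith("ches") || word.endsWith("xes")) {
            return word.substring(0, word.length() - 2);         // boxes -> box
        }
        if (word.endsWith("s") && !word.endsWith("ss") && !word.endsWith("us")) {
            return word.substring(0, word.length() - 1);         // documents -> document
        }
        return word;                                             // running stays running
    }

    public static void main(String[] args) {
        System.out.println(stem("queries"));   // query
        System.out.println(stem("classes"));   // class
        System.out.println(stem("running"));   // running
    }
}
```

A real plural-only stemmer needs a large exception list (e.g. "corpora", "men"), which is exactly what PlingStemmer provides.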


  


federated / meta search

2010-06-17 Thread Sascha Szott

Hi folks,

If I'm seeing it right, Solr currently does not provide any support for 
federated / meta searching. Therefore, I'd like to know if anyone has 
already put effort into this direction. Moreover, is federated / meta 
search considered a scenario Solr should be able to deal with at all, or 
is it (far) beyond the scope of Solr?


To be more precise, I'll give you a short explanation of my 
requirements. Assume there are a couple of Solr instances running at 
different places. The documents stored within those instances are all 
from the same domain (bibliographic records), but it cannot be ensured 
that the schema definitions conform 100%. But let's say there are at 
least some index fields that are present in all instances (fields with 
the same name and type definition). Now, I'd like to perform a search on 
all instances at the same time (with the restriction that the query 
contains only those fields that overlap among the different schemas) and 
combine the results in a reasonable way by utilizing the score 
information associated with each hit. Please note that due to legal 
issues it is not feasible to build a single index that integrates the 
documents of all Solr instances under consideration.


Thanks in advance,
Sascha



Re: Field Collapsing SOLR-236

2010-06-17 Thread Mark Diggory
Correct, it uses Maven and just constructs the war file; it's up to you to 
configure the location of your Solr home directory still.

svn co https://scm.dspace.org/svn/repo/modules/dspace-solr/trunk solr
cd solr
mvn package

then you can go into the webapp/target directory and get the generated war file 
there and/or manipulate your solr home settings prior to deploying it.

Note: if you just want to use it, you don't have to build it; you can just get the 
precompiled binary from the Maven central repo

http://repo2.maven.org/maven2/org/dspace/dependencies/solr/dspace-solr-webapp/1.4.0.1/dspace-solr-webapp-1.4.0.1.war

likewise for the modified solrj

http://repo2.maven.org/maven2/org/dspace/dependencies/solr/dspace-solr-solrj/1.4.0.1/dspace-solr-solrj-1.4.0.1.jar

Cheers,
Mark

On Jun 17, 2010, at 9:50 AM, Moazzam Khan wrote:

> Hi Mark,
> 
> Thanks for posting those links. I know this is probably a dumb
> question, but how do I make Solr work through your repository? I ask
> this because I don't see a build xml file and the folder structure is
> a bit different (I'm guessing I am not supposed to use ant on that :D)
> 
> Thanks,
> 
> Moazzam
> 
> 
> 
> 
> On Thu, Jun 17, 2010 at 4:23 AM, Rakhi Khatwani  wrote:
>> Hi Moazzam,
>>   Yup i hv encountered the same thing.
>> Build errors after applying the patch.
>> 
>> Rakhi
>> 
>> On Thu, Jun 17, 2010 at 3:33 AM, Moazzam Khan  wrote:
>> 
>>> I got the code from trunk again and now I get this error:
>>> 
>>>[javac] symbol  : class StringIndex
>>>[javac] location: interface org.apache.lucene.search.FieldCache
>>>[javac] private final Map
>>> fieldCaches =
>>> new HashMap();
>>>[javac] ^
>>>[javac]
>>> C:\svn\solr\src\java\org\apache\solr\search\SolrIndexSearcher.java:6
>>> 74: cannot find symbol
>>>[javac] symbol  : class DocSetScoreCollector
>>>[javac] location: class org.apache.solr.search.SolrIndexSearcher
>>>[javac] if (query instanceof TermQuery && !(collector instanceof
>>> Doc
>>> SetScoreCollector)) {
>>>[javac]
>>>  ^
>>>[javac]
>>> C:\svn\solr\src\java\org\apache\solr\search\fieldcollapse\AbstractDo
>>> cumentCollapser.java:257: cannot find symbol
>>>[javac] symbol  : method
>>> getStringIndex(org.apache.solr.search.SolrIndexRead
>>> er,java.lang.String)
>>>[javac] location: interface org.apache.lucene.search.FieldCache
>>>[javac] fieldValues =
>>> FieldCache.DEFAULT.getStringIndex(searcher.getRead
>>> er(), collapseField);
>>> [javac] ^
>>>[javac]
>>> C:\svn\solr\src\java\org\apache\solr\search\fieldcollapse\collector\
>>> AggregateCollapseCollectorFactory.java:163: cannot find symbol
>>>[javac] symbol  : class StringIndex
>>>[javac] location: interface org.apache.lucene.search.FieldCache
>>>[javac] private final Map
>>> fieldCaches =
>>> new HashMap();
>>> [javac]
>>>  ^
>>>[javac]
>>> C:\svn\solr\src\java\org\apache\solr\search\fieldcollapse\collector\
>>> AggregateCollapseCollectorFactory.java:173: cannot find symbol
>>>[javac] symbol  : method
>>> getStringIndex(org.apache.solr.search.SolrIndexRead
>>> er,java.lang.String)
>>>[javac] location: interface org.apache.lucene.search.FieldCache
>>>[javac]   fieldCaches.put(fieldName,
>>> FieldCache.DEFAULT.getStringInd
>>> ex(searcher.getReader(), fieldName));
>>> [javac]^
>>>[javac]
>>> C:\svn\solr\src\java\org\apache\solr\search\fieldcollapse\collector\
>>> AggregateCollapseCollectorFactory.java:183: cannot find symbol
>>>[javac] symbol  : class StringIndex
>>>[javac] location: interface org.apache.lucene.search.FieldCache
>>>[javac] FieldCache.StringIndex stringIndex =
>>> fieldCaches.get(aggrega
>>> teField.getFieldName());
>>>[javac]   ^
>>>[javac] Note: Some input files use or override a deprecated API.
>>>[javac] Note: Recompile with -Xlint:deprecation for details.
>>>[javac] Note: Some input files use unchecked or unsafe operations.
>>>[javac] Note: Recompile with -Xlint:unchecked for details.
>>>[javac] 10 errors
>>> 
>>> 
>>> I am compiling using jdk 1.5 update 22. Does that have anything to do
>>> with the errors?
>>> 
>>> -Moazzam
>>> 
>>> On Wed, Jun 16, 2010 at 4:34 PM, Moazzam Khan  wrote:
 I did the same thing. And, the code compiles without the patch but
 when I apply the patch I get these errors:
 
[javac]
>>> C:\svn\solr\src\java\org\apache\solr\search\fieldcollapse\collector\
 FieldValueCountCollapseCollectorFactory.java:127: class, interface, or
>>> enum expe
 cted
[javac] import java.util.HashMap;
[javac] ^
[javac]
>>> C:\svn\solr\src\java\org\apache\solr\search\fieldcollapse\collector\
 FieldValueCountCollapseCollectorFactory.java:128: class, interface, or
>>> enum 

Re: Field Collapsing SOLR-236

2010-06-17 Thread Erik Hatcher


On Jun 16, 2010, at 7:31 PM, Mark Diggory wrote:
p.s. I'd be glad to contribute our Maven build re-organization back  
to the community to get Solr properly Mavenized so that it can be  
distributed and released more often.  For us the benefit of this  
structure is that we will be able to overlay addons such as  
RequestHandlers and other third party support without having to  
rebuild Solr from scratch.


But you don't have to rebuild Solr from scratch to add a new request  
handler or other plugins - simply compile your custom stuff into a JAR  
and put it in /lib (or point to it with <lib> in 
solrconfig.xml).
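In solrconfig.xml terms that is about two lines; the class name here is a made-up placeholder:

```xml
<!-- jars in this directory are added to the core's classpath -->
<lib dir="./lib" />
<!-- register the handler class packed into one of those jars -->
<requestHandler name="/myhandler" class="com.example.MyRequestHandler" />
```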


 Ideally, a Maven Archetype could be created that would allow one to  
rapidly produce a Solr webapp and fire it up in Jetty in mere seconds.


How's that any different than cd example; java -jar start.jar?  Or do  
you mean a Solr client webapp?


Finally, with projects such as Bobo, integration with Spring would  
make configuration more consistent and request significantly less  
java coding just to add new capabilities everytime someone authors a  
new RequestHandler.


It's one line of config to add a new request handler.  How many  
ridiculously ugly confusing lines of Spring XML would it take?


 The biggest thing I learned about Solr in my work thusfar is that  
patches like these could be standalone modules in separate projects  
if it weren't for having to hack the configuration and solrj methods  
up to adopt them.  Which brings me to SolrJ, great API if it would  
stay generic and have less concern for adding method each time some  
custom collections and query support for morelikethis or  
collapseddocs needs to be added.


I personally find it silly that we customize SolrJ for all these  
request handlers anyway.  You get a decent navigable data structure  
back from general SolrJ query requests as it is, there's no need to  
build in all these convenience methods specific to all the Solr  
componentry.  Sure, it's "convenient", but it's a maintenance headache  
and as you say, not generic.


But hacking configuration is reasonable, I think, for adding in  
plugins.  I guess you're aiming for some kind of Spring-like auto- 
discovery of plugins?  Yeah, maybe, but I'm pretty -1 on Spring coming  
into Solr.  It's overkill and ugly, IMO.  But you like it :)  And  
that's cool by me, to each their own.


Oh, and Hi Mark! :)

Erik



Re: Document boosting troubles

2010-06-17 Thread dbashford

One problem down, two left!  =)  bf ==> bq did the trick, thanks.  Now at
least if I can't get the DIH solution working I don't have to tack that on
every query string.

Taking the quotes away from $docBoost results in a syntax error.  Needs to
be quoted.

Changed it up to this and still no luck

var rank = row.get('rank');
switch (rank) {
    case 1: row.put("$docBoost", 3.0); break;
    case 2: row.put("$docBoost", 2.6); break;
    case 3: row.put("$docBoost", 2.2); break;
    case 4: row.put("$docBoost", 1.8); break;
    case 5: row.put("$docBoost", 1.5); break;
    case 6: row.put("$docBoost", 1.2); break;
    case 7: row.put("$docBoost", 0.9); break;
    case 8: row.put("$docBoost", 0.7); break;
    case 9: row.put("$docBoost", 0.5); break;
    default: row.put("$docBoost", 0.1);
}



And still can't figure out what I need to do with my dismax querying to get
scores for quality of match.  Thoughts?


-- 
View this message in context: 
http://lucene.472066.n3.nabble.com/Document-boosting-troubles-tp902982p903638.html
Sent from the Solr - User mailing list archive at Nabble.com.


DismaxRequestHandler

2010-06-17 Thread Blargy

I have a title field and a description field. I am searching across both
fields, but I don't want description matches unless the terms are within some
slop of each other. How can I query for this? It seems that I'm getting back
crazy results when there are matches that are nowhere near each other.

-- 
View this message in context: 
http://lucene.472066.n3.nabble.com/DismaxRequestHandler-tp903641p903641.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Field Collapsing SOLR-236

2010-06-17 Thread Martijn v Groningen
I've added a new patch to the issue, so building the trunk (rev
955615) with the latest patch should not be a problem. Due to recent
changes in the Lucene trunk the patch was not compatible.

On 17 June 2010 20:20, Erik Hatcher  wrote:
>
> On Jun 16, 2010, at 7:31 PM, Mark Diggory wrote:
>>
>> p.s. I'd be glad to contribute our Maven build re-organization back to the
>> community to get Solr properly Mavenized so that it can be distributed and
>> released more often.  For us the benefit of this structure is that we will
>> be able to overlay addons such as RequestHandlers and other third party
>> support without having to rebuild Solr from scratch.
>
> But you don't have to rebuild Solr from scratch to add a new request handler
> or other plugins - simply compile your custom stuff into a JAR and put it in
> /lib (or point to it with <lib> in solrconfig.xml).
>
>>  Ideally, a Maven Archetype could be created that would allow one to rapidly
>> produce a Solr webapp and fire it up in Jetty in mere seconds.
>
> How's that any different than cd example; java -jar start.jar?  Or do you
> mean a Solr client webapp?
>
>> Finally, with projects such as Bobo, integration with Spring would make
>> configuration more consistent and request significantly less java coding
>> just to add new capabilities everytime someone authors a new RequestHandler.
>
> It's one line of config to add a new request handler.  How many ridiculously
> ugly confusing lines of Spring XML would it take?
>
>>  The biggest thing I learned about Solr in my work thusfar is that patches
>> like these could be standalone modules in separate projects if it weren't
>> for having to hack the configuration and solrj methods up to adopt them.
>>  Which brings me to SolrJ, great API if it would stay generic and have less
>> concern for adding method each time some custom collections and query
>> support for morelikethis or collapseddocs needs to be added.
>
> I personally find it silly that we customize SolrJ for all these request
> handlers anyway.  You get a decent navigable data structure back from
> general SolrJ query requests as it is, there's no need to build in all these
> convenience methods specific to all the Solr componentry.  Sure, it's
> "convenient", but it's a maintenance headache and as you say, not generic.
>
> But hacking configuration is reasonable, I think, for adding in plugins.  I
> guess you're aiming for some kind of Spring-like auto-discovery of plugins?
>  Yeah, maybe, but I'm pretty -1 on Spring coming into Solr.  It's overkill
> and ugly, IMO.  But you like it :)  And that's cool by me, to each their
> own.
>
> Oh, and Hi Mark! :)
>
>        Erik
>
>



-- 
Met vriendelijke groet,

Martijn van Groningen


Re: DismaxRequestHandler

2010-06-17 Thread Joe Calderon
the qs parameter affects matching, but you have to wrap your query in
double quotes, e.g.

q="oil spill"&qf=title description&qs=4&defType=dismax

I'm not sure how to formulate such a query to apply that rule just to
description, maybe with nested queries ...
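To make the slop idea concrete with the standard query parser (field names from the thread, slop value arbitrary), a sloppy phrase on just the description field looks like:

```text
q=title:(oil spill) OR description:"oil spill"~4
```

The ~4 clause only matches when the two terms occur within four positions of each other in description; the title clause stays unrestricted.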

On Thu, Jun 17, 2010 at 12:01 PM, Blargy  wrote:
>
> I have a title field and a description filed. I am searching across both
> fields but I don't want description matches unless they are within some slop
> of each other. How can I query for this? It seems that im getting back crazy
> results when there are matches that are nowhere each other
>
> --
> View this message in context: 
> http://lucene.472066.n3.nabble.com/DismaxRequestHandler-tp903641p903641.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>


Re: Field Collapsing SOLR-236

2010-06-17 Thread Moazzam Khan
I knew it wasn't me! :)

I found the patch just before I read this and applied it to the trunk
and it works!

Thanks Mark and martijn for all your help!

- Moazzam

On Thu, Jun 17, 2010 at 2:16 PM, Martijn v Groningen
 wrote:
> I've added a new patch to the issue, so building the trunk (rev
> 955615) with the latest patch should not be a problem. Due to recent
> changes in the Lucene trunk the patch was not compatible.
>
> On 17 June 2010 20:20, Erik Hatcher  wrote:
>>
>> On Jun 16, 2010, at 7:31 PM, Mark Diggory wrote:
>>>
>>> p.s. I'd be glad to contribute our Maven build re-organization back to the
>>> community to get Solr properly Mavenized so that it can be distributed and
>>> released more often.  For us the benefit of this structure is that we will
>>> be able to overlay addons such as RequestHandlers and other third party
>>> support without having to rebuild Solr from scratch.
>>
>> But you don't have to rebuild Solr from scratch to add a new request handler
>> or other plugins - simply compile your custom stuff into a JAR and put it in
>> /lib (or point to it with <lib> in solrconfig.xml).
>>
>>>  Ideally, a Maven Archetype could be created that would allow one to rapidly
>>> produce a Solr webapp and fire it up in Jetty in mere seconds.
>>
>> How's that any different than cd example; java -jar start.jar?  Or do you
>> mean a Solr client webapp?
>>
>>> Finally, with projects such as Bobo, integration with Spring would make
>>> configuration more consistent and request significantly less java coding
>>> just to add new capabilities everytime someone authors a new RequestHandler.
>>
>> It's one line of config to add a new request handler.  How many ridiculously
>> ugly confusing lines of Spring XML would it take?
>>
>>>  The biggest thing I learned about Solr in my work thusfar is that patches
>>> like these could be standalone modules in separate projects if it weren't
>>> for having to hack the configuration and solrj methods up to adopt them.
>>>  Which brings me to SolrJ, great API if it would stay generic and have less
>>> concern for adding method each time some custom collections and query
>>> support for morelikethis or collapseddocs needs to be added.
>>
>> I personally find it silly that we customize SolrJ for all these request
>> handlers anyway.  You get a decent navigable data structure back from
>> general SolrJ query requests as it is, there's no need to build in all these
>> convenience methods specific to all the Solr componentry.  Sure, it's
>> "convenient", but it's a maintenance headache and as you say, not generic.
>>
>> But hacking configuration is reasonable, I think, for adding in plugins.  I
>> guess you're aiming for some kind of Spring-like auto-discovery of plugins?
>>  Yeah, maybe, but I'm pretty -1 on Spring coming into Solr.  It's overkill
>> and ugly, IMO.  But you like it :)  And that's cool by me, to each their
>> own.
>>
>> Oh, and Hi Mark! :)
>>
>>        Erik
>>
>>
>
>
>
> --
> Met vriendelijke groet,
>
> Martijn van Groningen
>


RE: federated / meta search

2010-06-17 Thread Markus Jelsma
Hi,

 

Check out Solr's sharding [1] capabilities. I never tested it with different 
schemas, but if each node is queried with fields that it supports, it should 
return useful results.

 

[1]: http://wiki.apache.org/solr/DistributedSearch
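A distributed request against such nodes is a single query that lists every shard explicitly (host names below are placeholders; all shards must share the uniqueKey field):

```text
http://host1:8983/solr/select?q=title:records&shards=host1:8983/solr,host2:8983/solr
```

Solr fans the query out to each shard and merges the per-shard hits by score, which matches the "combine by score" requirement above.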

 

Cheers.
 
-Original message-
From: Sascha Szott 
Sent: Thu 17-06-2010 19:44
To: solr-user@lucene.apache.org; 
Subject: federated / meta search

Hi folks,

If I'm seeing it right, Solr currently does not provide any support for 
federated / meta searching. Therefore, I'd like to know if anyone has 
already put effort into this direction. Moreover, is federated / meta 
search considered a scenario Solr should be able to deal with at all, or 
is it (far) beyond the scope of Solr?

To be more precise, I'll give you a short explanation of my 
requirements. Assume there are a couple of Solr instances running at 
different places. The documents stored within those instances are all 
from the same domain (bibliographic records), but it cannot be ensured 
that the schema definitions conform 100%. But let's say there are at 
least some index fields that are present in all instances (fields with 
the same name and type definition). Now, I'd like to perform a search on 
all instances at the same time (with the restriction that the query 
contains only those fields that overlap among the different schemas) and 
combine the results in a reasonable way by utilizing the score 
information associated with each hit. Please note that due to legal 
issues it is not feasible to build a single index that integrates the 
documents of all Solr instances under consideration.

Thanks in advance,
Sascha



Re: solr multi-node

2010-06-17 Thread Antonello Mangone
Mitch, thank you very much for your help, I'll read all the links you gave
me.


2010/6/17 MitchK 

>
> Antonello,
>
> here are a few links to the Solr Wiki:
> http://wiki.apache.org/solr/SolrReplication Solr Replication
> http://wiki.apache.org/solr/DistributedSearchDesign Distributed Search
> Design
> http://wiki.apache.org/solr/DistributedSearch Distributed Search
> http://wiki.apache.org/solr/SolrCloud Solr Cloud
>
> Hope this helps.
> - Mitch
>
> --
> View this message in context:
> http://lucene.472066.n3.nabble.com/solr-multi-node-tp903159p903228.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>


Re: Plural only stemmer

2010-06-17 Thread Robert Muir
I created LUCENE-2503 to address this.

On Thu, Jun 17, 2010 at 12:56 PM, Rachel Arbit  wrote:

> Hi all,
> I'm having trouble finding a stemmer that's less aggressive than the
> porter-stemmer, ideally, one that does only plural stemming.
> I've been trying to get KStem to work by copying the lucid-kstem and
> lucid-solr-kstem jars from the lucid distribution into solr/lib, but I get
> a
> classNotFound Exception for CharArraySet when I do that.
>
> Does anyone know where I can get a stemmer that fits my needs, or have tips
> on how to make it work with the KStem jars?
>
> Thanks!
>



-- 
Robert Muir
rcm...@gmail.com


Re: Re: Re: Solr and Nutch/Droids - to use or not to use?

2010-06-17 Thread Otis Gospodnetic
I didn't open the issue, Mitch, but feel free to do it.

 Otis

Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
Lucene ecosystem search :: http://search-lucene.com/



- Original Message 
> From: MitchK 
> To: solr-user@lucene.apache.org
> Sent: Thu, June 17, 2010 12:07:13 PM
> Subject: Re: Re: Re: Solr and Nutch/Droids - to use or not to use?
> 
> 
> Otis,
> 
> you are right. I wasn't aware of this. At least not with such a large
> dataList (let's think of an index with 4mio docs, this would mean we got an
> ExternalFile with 4mio records). But from what I've read at
> search-lucene.com it seems to perform very well. Thanks for the idea!
> 
> Btw: Otis, did you open a JIRA Issue for the distributed indexing ability
> of Solr?
> I would like to follow the issue, if it is open.
> 
> Regards
> - Mitch
> 
> 
> Otis Gospodnetic-2 wrote:
>> 
>> Mitch,
>> 
>> Yes, one day.  But it sounds like you are not aware of ExternalFieldFile,
>> which you can use today:
>> 
>> http://search-lucene.com/?q=ExternalFileField&fc_project=Solr
>> 
>> Otis
>> 
>> Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
>> Lucene ecosystem search :: http://search-lucene.com/
>> 
>> - Original Message 
>>> From: MitchK <mitc...@web.de>
>>> To: solr-user@lucene.apache.org
>>> Sent: Thu, June 17, 2010 4:15:27 AM
>>> Subject: Re: Re: Re: Solr and Nutch/Droids - to use or not to use?
>>> 
>>>> Solr doesn't know anything about OPIC, but I suppose you can feed the
>>>> OPIC score computed by Nutch into a Solr field and use it during
>>>> scoring, if you want, say with a function query.
>>> 
>>> Oh! Yes, that makes more sense than using the OPIC as doc-boost-value. :-)
>>> Anywhere at the Lucene Mailing lists I read that in future it will be
>>> possible to change field's contents without reindexing the whole document.
>>> If one stores the OPIC-Score (which is independent from the page's
>>> content) in a field and uses functionQuery to influence the score of a
>>> document, one saves the effort of reindexing the whole doc, if the
>>> content did not change.
>>> 
>>> Regards
>>> - Mitch
>>> -- 
>>> View this message in context: 
>>> http://lucene.472066.n3.nabble.com/Solr-and-Nutch-Droids-to-use-or-not-to-use-tp900069p902158.html
>>> Sent from the Solr - User mailing list archive at Nabble.com.
> 
> -- 
> View this message in context: 
> http://lucene.472066.n3.nabble.com/Solr-and-Nutch-Droids-to-use-or-not-to-use-tp900069p903148.html
> Sent from the Solr - User mailing list archive at Nabble.com.
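For readers landing here from the archives: the ExternalFileField mentioned above is wired up in schema.xml roughly like this (a sketch from memory, so verify the attribute names against the wiki; per-document values then live in a file named external_<fieldname> in the index data directory, one docid=value line each, reread on commit):

```xml
<!-- sketch: keep the OPIC score outside the index so it can change
     without reindexing the document -->
<fieldType name="opicScore" class="solr.ExternalFileField"
           keyField="id" defVal="0" valType="pfloat"/>
<field name="opic" type="opicScore"/>
```

The field can then drive scoring via a function query, e.g. something like bf=opic in a dismax request, which is exactly the "change scores without reindexing" setup discussed in the thread.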


Question on dynamic fields

2010-06-17 Thread bbarani

Hi,

I am facing an issue with dynamic fields. I have 2 fields (UID and ID) on
which I want to do whole-word search only, so I made those 2 fields of
type 'string'.

 

I also have a dynamic field with textgen field type as below

 

This dynamic field seems to capture all the data, including the data from the
UID field. Now my issue is that I have a copyField called Text which I am
using to copy all the necessary static fields plus all dynamic fields (it
seems UID is also getting copied, since I have used * for dynamic fields)
into that Text field. This Text field is of field type textgen and hence it
has all kinds of analyzers applied to it...


Now my question is: is there a way for me to avoid UID and ID being
copied into this Text field? I want to copy all other fields (including
dynamic fields) but not ID and UID.



My schema file looks like below,

[schema.xml markup stripped by the archive; recoverable bits: UID and ID are
declared with type "string", there is a catch-all dynamicField of type
"textgen", the uniqueKey is "uid", the defaultSearchField is "text", and
copyField directives feed the static and dynamic fields into the Text field]

Thanks,
BB
-- 
View this message in context: 
http://lucene.472066.n3.nabble.com/Question-on-dynamic-fields-tp904053p904053.html
Sent from the Solr - User mailing list archive at Nabble.com.


defType=Dismax questions

2010-06-17 Thread Blargy

Sorry for the repost, but I posted under DismaxRequestHandler when I should
have listed it as DismaxQueryParser, i.e. I'm using defType=dismax.

I have a title field and a description field. I am searching across both
fields, but I don't want description matches unless the terms are within some
slop of each other. How can I query for this? It seems that I'm getting back
crazy results when there are matches that are nowhere near each other.
-- 
View this message in context: 
http://lucene.472066.n3.nabble.com/defType-Dismax-questions-tp904087p904087.html
Sent from the Solr - User mailing list archive at Nabble.com.


DataImportHandler + docBoost

2010-06-17 Thread dbashford

Pulled this out of another thread of mine as it's the only bit left that I
haven't been able to figure out.

Can someone show me briefly how one would include a docBoost inside a DIH?

I've got something like this...

var rank = row.get('rank');
switch (rank) {
    case '1': row.put("$docBoost", 3.0); break;
    case '2': row.put("$docBoost", 2.6); break;
    case '3': row.put("$docBoost", 2.2); break;
    case '4': row.put("$docBoost", 1.8);
}

...and no effect.  I've tried rank as an int just to cover my bases...

switch (rank) {
    case 1: row.put("$docBoost", 3.0); break;
    case 2: row.put("$docBoost", 2.6); break;
    case 3: row.put("$docBoost", 2.2); break;
    case 4: row.put("$docBoost", 1.8);
}

...still no effect.

And I've tried adding and removing this from my entity:



...again to no effect.  Not sure if it should be there or not, but gave both
a shot.

The results I see are a lot like this
(/select?fl=score,rank,title&q=title:red):

{"title":"red",
 "rank":10,
 "score":0.22583205},
{"title":"red",
 "rank":8,
 "score":0.22583205},

I would expect the rank 8 to have a higher score.  Not happening though.

What am I missing?
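For comparison, here is a minimal data-config.xml of the shape the DIH wiki describes for the ScriptTransformer. Column, entity, and function names here are placeholders; the two easy-to-miss bits are the transformer="script:..." attribute on the entity and the fact that the function must return the row (the script engine also requires Java 6):

```xml
<dataConfig>
  <script><![CDATA[
      function addBoost(row) {
          var rank = row.get('rank');        /* column from the entity query */
          if (rank != null && rank <= 4) {
              row.put('$docBoost', 2.0);     /* DIH special command */
          }
          return row;                        /* forgetting this drops the change */
      }
  ]]></script>
  <document>
    <entity name="item" transformer="script:addBoost"
            query="select id, title, rank from items"/>
  </document>
</dataConfig>
```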




-- 
View this message in context: 
http://lucene.472066.n3.nabble.com/DataImportHandler-docBoost-tp904116p904116.html
Sent from the Solr - User mailing list archive at Nabble.com.


[ANN] Free Webinar: June 24: How Cisco uses Lucene/Solr w/ Social Networks

2010-06-17 Thread Chris Hostetter


(cross posted announcement, please keep any replies to gene...@lucene)

On behalf of Lucid Imagination, I'd like to invite folks to a free Webinar 
we're hosting on June 24th...


How Cisco’s Pulse uses Lucene/Solr to put Social Networks to Work
Thursday, June 24, 2010
9am PDT / 12pm noon EDT / 18:00 CET

To attend, you can sign up here:
http://www.eventsvc.com/lucidimagination/event/604c3606-5d13-40b1-be67-5867f547a0e4?trk=WR-JUN2010C-AP

Details...

Cisco’s new Pulse(TM) is a powerful platform that uses embedded 
Lucene/Solr search technology to tag and index key terms and topics from 
a broad range of media — from email to video — in real time. Tapping into 
internal communications traffic, it helps find expertise within the 
enterprise's internal social network. Its cutting-edge enterprise search 
techniques were developed at Cisco with the help of Lucid Imagination.


Presenters: Sonali Sambhus, senior search architect at Cisco; Stephen 
Bochniski, Software Engineer at Cisco; Thangam Arumugam, Senior Search 
Architect at Cisco; In-depth technical workshop covers how the Cisco team 
designed and optimized Pulse with Lucene and Solr, on topics including:


- Optimizing stored-field retrieval performance for real-time search
- Operational optimization with full index hot backups
- Performance-efficient methods for highlighting text




Re: Document boosting troubles

2010-06-17 Thread MitchK

Hi,



> One problem down, two left!  =)  bf ==> bq did the trick, thanks.  Now at
> least if I can't get the DIH solution working I don't have to tack that on
> every query string. 
> 
I would really recommend using a boost function. If your rank changes in
future implementations, you do not need to redefine the bq. Besides that,
I think this is not only more comfortable, but also scales better.
The bq param is more for things like "boost this category" or "boost docs of
an advertisement campaign" or something like that.
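As a hedged sketch of such a boost function (assuming a numeric rank field where 1 is best; recip(x,m,a,b) computes a/(m*x+b), so small ranks score higher):

```text
q=red&defType=dismax&qf=title&bf=recip(rank,1,10,10)
```

With those arguments, rank 1 contributes 10/11 and rank 8 only 10/18, so the ordering falls out of the function with no per-rank case statement.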

I am not sure, since I never worked with the DIH this way, but my guess is
that the problem could be that you do not return the row.
If you don't, try again after adding "return row;" to your source code.

Otherwise, I can't help you, since there are no more code examples available
on the mailing list (from what I have seen).

Maybe this mailing-list topic helps you: 
http://lucene.472066.n3.nabble.com/Using-DIH-s-special-commands-Help-needed-td475695.html#a475695
"Using DIH's special commands - Help needed".
There are some suggestions,... however, it seems like he wasn't able to
solve the problem.



> And still can't figure out what I need to do with my dismax querying to
> get scores for quality of match. 
> 
I don't really understand what you mean. Can you explain it a little bit
more?
What, except the $docBoost, does not work as it should do?

Kind regards
- Mitch
-- 
View this message in context: 
http://lucene.472066.n3.nabble.com/Document-boosting-troubles-tp902982p904129.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: DismaxRequestHandler

2010-06-17 Thread MitchK

Joe, 

please, can you provide an example of what you are thinking of?

Subqueries with Solr... I've never seen something like that before.

Thank you!

Kind regards
- Mitch
-- 
View this message in context: 
http://lucene.472066.n3.nabble.com/DismaxRequestHandler-tp903641p904142.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Re: Re: Solr and Nutch/Droids - to use or not to use?

2010-06-17 Thread MitchK

Otis,

And again I wish I were registered.

I will check the JIRA and when I feel comfortable with it, I will open it.

Kind regards
- Mitch
-- 
View this message in context: 
http://lucene.472066.n3.nabble.com/Solr-and-Nutch-Droids-to-use-or-not-to-use-tp900069p904145.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Question on dynamic fields

2010-06-17 Thread MitchK

Barani,

without more background on your dynamic fields, I would say that the easiest
way would be to define a suffix for each of the fields you want to index
into the mentioned dynamic field and to redefine your dynamic-field
condition.

If suffix does not work, because of other dynamic-field declarations, use a
prefix.

Instead of "*_bla" to match "myField_bla", you can use "bla_*" to match
"bla_myField".
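The pattern matching described above can be sketched with ordinary glob matching (a rough illustration only, not Solr's internal implementation; Solr's dynamicField patterns allow a single leading or trailing "*", and the field names here are made up):

```python
from fnmatch import fnmatch

def matches_dynamic_field(pattern: str, field_name: str) -> bool:
    # Glob-style match, standing in for Solr's dynamicField matching.
    return fnmatch(field_name, pattern)

# A suffix pattern matches a suffixed field name...
print(matches_dynamic_field("*_bla", "myField_bla"))   # True
# ...while a prefix pattern avoids clashing with other suffix declarations.
print(matches_dynamic_field("bla_*", "bla_myField"))   # True
print(matches_dynamic_field("*_bla", "bla_myField"))   # False
```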

Hope this helps,
- Mitch
-- 
View this message in context: 
http://lucene.472066.n3.nabble.com/Question-on-dynamic-fields-tp904053p904159.html
Sent from the Solr - User mailing list archive at Nabble.com.


Exact match on a filter

2010-06-17 Thread Pete Chudykowski
Hi,

I'm trying, with no luck, to filter on the exact-match value of a field.
Specifically:
  fq=brand:apple
returns documents whose 'brand' field contains values like "apple bottoms".

Is there a way to formulate the fq expression to match precisely and only 
"apple" ?

Thanks in advance for your help.
Pete.


Re: Exact match on a filter

2010-06-17 Thread Joe Calderon
use a copyField and index the copy as type string; exact matches on
that field should then work, since the text won't be tokenized

On Thu, Jun 17, 2010 at 3:13 PM, Pete Chudykowski
 wrote:
> Hi,
>
> I'm trying with no luck to filter on the exact-match value of a field.
> Speciffically:
>  fq=brand:apple
> returns document's whose 'brand' field contains values like "apple bottoms".
>
> Is there a way to formulate the fq expression to match precisely and only 
> "apple" ?
>
> Thanks in advance for your help.
> Pete.
>


Re: DismaxRequestHandler

2010-06-17 Thread Joe Calderon
see yonik's post on nested queries
http://www.lucidimagination.com/blog/2009/03/31/nested-queries-in-solr/

so for example i thought you could possibly do a dismax query across
the main fields (in this case just title) and OR that with
_query_:"{!lucene}description:\"oil spill\"~4"
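A sketch of what building such a combined request could look like from client code (the field names title/description and the host are only examples; the exact local-params syntax should be checked against Yonik's post):

```python
from urllib.parse import urlencode

# Hypothetical example: a main query OR'd with a nested sub-query that
# uses a different parser for a sloppy phrase on another field.  Inner
# double quotes are escaped inside the _query_ string.
q = 'title:spill OR _query_:"{!lucene}description:\\"oil spill\\"~4"'
params = urlencode({"q": q})
print("http://localhost:8983/solr/select?" + params)
```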

On Thu, Jun 17, 2010 at 3:01 PM, MitchK  wrote:
>
> Joe,
>
> please, can you provide an example of what you are thinking of?
>
> Subqueries with Solr... I've never seen something like that before.
>
> Thank you!
>
> Kind regards
> - Mitch
> --
> View this message in context: 
> http://lucene.472066.n3.nabble.com/DismaxRequestHandler-tp903641p904142.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>


Re: Exact match on a filter

2010-06-17 Thread Erik Hatcher

And when you do that, a best practice for fq'ing on a string field is:

   fq={!raw f=field_name}value

That avoids query parsing and the hassles associated with escaping  
special characters.
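A sketch of the resulting request as built from client code (assuming the string-typed copy field is named brand; host and core path are placeholders). Because {!raw} skips query parsing, the value needs no special-character escaping beyond normal URL encoding:

```python
from urllib.parse import urlencode

# Build the filter query with the raw query parser: match documents
# whose "brand" string field is exactly "apple".
params = urlencode({
    "q": "*:*",
    "fq": "{!raw f=brand}apple",
})
url = "http://localhost:8983/solr/select?" + params
print(url)
```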


Erik


On Jun 17, 2010, at 6:22 PM, Joe Calderon wrote:


use a copyField and index the copy as type string, exact matches on
that field should then work as the text wont be tokenized

On Thu, Jun 17, 2010 at 3:13 PM, Pete Chudykowski
 wrote:

Hi,

I'm trying with no luck to filter on the exact-match value of a  
field.

Speciffically:
 fq=brand:apple
returns document's whose 'brand' field contains values like "apple  
bottoms".


Is there a way to formulate the fq expression to match precisely  
and only "apple" ?


Thanks in advance for your help.
Pete.





RE: Exact match on a filter

2010-06-17 Thread Pete Chudykowski
Wonderful,
Thank you both.

Pete.

-Original Message-
From: Erik Hatcher [mailto:erik.hatc...@gmail.com] 
Sent: Thursday, June 17, 2010 3:33 PM
To: solr-user@lucene.apache.org
Subject: Re: Exact match on a filter

And when you do that, a best practice for fq'ing on a string field is:

fq={!raw f=field_name}value

That avoids query parsing and the hassles associated with escaping  
special characters.

Erik


On Jun 17, 2010, at 6:22 PM, Joe Calderon wrote:

> use a copyField and index the copy as type string, exact matches on
> that field should then work as the text wont be tokenized
>
> On Thu, Jun 17, 2010 at 3:13 PM, Pete Chudykowski
>  wrote:
>> Hi,
>>
>> I'm trying with no luck to filter on the exact-match value of a  
>> field.
>> Speciffically:
>>  fq=brand:apple
>> returns document's whose 'brand' field contains values like "apple  
>> bottoms".
>>
>> Is there a way to formulate the fq expression to match precisely  
>> and only "apple" ?
>>
>> Thanks in advance for your help.
>> Pete.
>>




Re: Spellcheck and Solrconfig

2010-06-17 Thread Chris Hostetter

: We use Solr along with Drupal for our content management needs. The
: solrconfig.xml that we have from Drupal mentions that "we do not
: spellcheck by default" and here is our request handler from
: solrconfig.xml. 
: 
: First question - why is it recommended that we do not spellcheck by
: default

"recommended" is a misleading word -- what that configuration says is 
that it won't bother to spell check unless the URL used to query 
solr includes a "spellcheck=true" param ... it's possible that the 
Drupal UI has its own means of deciding when to send that param.

: Secondly  - if we add spellcheck in  tag - will
: spellcheck be enabled?

in the config you pasted, it's already there -- so you shouldn't need to 
add that (but since you don't show us what the searchComponent declaration 
for your spellcheck component looks like, we have no way of guessing if 
it's entirely configured properly -- in particular there may not be 
anything building your spellcheck micro-index)

: We are using basic Solr and Drupal configurations - only now - we are
: looking at tweaking solrconfig and schema files. Any help is greatly

since Drupal packages the solr configs, you should probably consult a 
drupal list about how to enable the spell check options "the 
drupal way" before modifying the configs (i have no idea what the 
drupal/solr feature expects as far as modifications when upgrading)


-Hoss



Re: federated / meta search

2010-06-17 Thread Joe Calderon
yes, you can use distributed search across shards with different
schemas as long as the query only references overlapping fields. i
usually test adding new fields or tokenizers on one shard and deploy
only after i've verified it's working properly
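For the archive, a sketch of what such a distributed request looks like (hostnames are placeholders; note that entries in the shards param omit the http:// prefix):

```python
from urllib.parse import urlencode

# Query two shards at once; per the discussion above, q should only
# reference fields present in both schemas.
params = urlencode({
    "q": "title:lucene",
    "shards": "host1:8983/solr,host2:8983/solr",
})
print("http://host1:8983/solr/select?" + params)
```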

On Thu, Jun 17, 2010 at 1:10 PM, Markus Jelsma  wrote:
> Hi,
>
>
>
> Check out Solr sharding [1] capabilities. I never tested it with different 
> schema's but if each node is queried with fields that it supports, it should 
> return useful results.
>
>
>
> [1]: http://wiki.apache.org/solr/DistributedSearch
>
>
>
> Cheers.
>
> -Original message-
> From: Sascha Szott 
> Sent: Thu 17-06-2010 19:44
> To: solr-user@lucene.apache.org;
> Subject: federated / meta search
>
> Hi folks,
>
> if I'm seeing it right Solr currently does not provide any support for
> federated / meta searching. Therefore, I'd like to know if anyone has
> already put efforts into this direction? Moreover, is federated / meta
> search considered a scenario Solr should be able to deal with at all or
> is it (far) beyond the scope of Solr?
>
> To be more precise, I'll give you a short explanation of my
> requirements. Assume, there are a couple of Solr instances running at
> different places. The documents stored within those instances are all
> from the same domain (bibliographic records), but it can not be ensured
> that the schema definitions conform to 100%. But lets say, there are at
> least some index fields that are present in all instances (fields with
> the same name and type definition). Now, I'd like to perform a search on
> all instances at the same time (with the restriction that the query
> contains only those fields that overlap among the different schemas) and
> combine the results in a reasonable way by utilizing the score
> information associated with each hit. Please note, that due to legal
> issues it is not feasible to build a single index that integrates the
> documents of all Solr instances under consideration.
>
> Thanks in advance,
> Sascha
>
>


dismax and AND as the default operator

2010-06-17 Thread Tommy Chheng
 I'm using the dismax request handler and want to set the default 
operator to AND.
Using the standard handler, i could just use the q.op or defaultOperator 
in the schema, but this doesn't work using the dismax request handler.


For example, if I call "solr/select/?q=fuel+cell", I want solr to handle 
it as a "solr/select/?q=fuel+AND+cell"


--
@tommychheng
Programmer and UC Irvine Graduate Student
Find a great grad school based on research interests: http://gradschoolnow.com



Re: dismax and AND as the default operator

2010-06-17 Thread Chris Hostetter

:  I'm using the dismax request handler and want to set the default operator to
: AND.
: Using the standard handler, i could just use the q.op or defaultOperator in
: the schema, but this doesn't work using the dismax request handler.
: 
: For example, if I call "solr/select/?q=fuel+cell", I want solr to handle it as
: a "solr/select/?q=fuel+AND+cell"

Please consult the dismax docs...
http://wiki.apache.org/solr/DisMaxRequestHandler#mm_.28Minimum_.27Should.27_Match.29

dismax uses the "mm" param to decide how clauses that don't have an 
explicit operator will be dealt with -- the default is to require 100% of 
the terms, so if you aren't seeing that behavior then you have a 
solrconfig.xml that sets the default mm value to something else.

Starting with Solr 4.0 (and maybe 3.1 if it's backported) the default mm 
will be based on the value of q.op (see SOLR-1889 for more details)
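To illustrate the mm values discussed in this thread, here is a tiny sketch of the clause arithmetic (covering only the bare-integer and percentage forms; the real mm spec also supports conditional combinations):

```python
def min_should_match(mm: str, num_clauses: int) -> int:
    # Percentage form rounds down, as Solr's mm does.
    if mm.endswith("%"):
        return num_clauses * int(mm[:-1]) // 100
    # Bare integer form, capped at the clause count.
    return min(int(mm), num_clauses)

# mm=100% over the two terms of "fuel cell" requires both
# (behaving like "fuel AND cell"); mm=1 requires only one.
print(min_should_match("100%", 2))  # 2
print(min_should_match("1", 2))     # 1
```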


-Hoss



Re: Optimize with waitFlush="false" and waitSearcher="false" takes a long time

2010-06-17 Thread Chris Hostetter

: Because waitFlush doesn't work currently, your client

i didn't realize waitFlush is currently ignored ... is that an open bug 
in Jira, or was it a necessary change because of something else?  do we 
at least log a warning if someone tries to use waitFlush=false?


-Hoss



Re: ranking question

2010-06-17 Thread Chris Hostetter

: I want to reorder the results as per function like
: sum(w0*score, w1*field1, w2*field2, w3*filed3,..)
: 
: I am using solr1.4 and it seems it does not support sort by function.
: 
: How can this be achieved
: 
: I tried using
:  q=(query)^w0 (_val_:field1)^w1 (_val_:field2...)^w2

try fq=(query)&q={!func}sum(...)

...if you can't express the entire query as a pure function, and need to 
resort to a BooleanQuery consisting of many individual function queries 
(like in your example) then consider writing a custom Similarity class 
that eliminates the querynorm.
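A sketch of the suggested request, with placeholder weights and field names (sum and product are real Solr function-query names; everything else here is illustrative):

```python
from urllib.parse import urlencode

# Match with fq; score purely by the weighted-sum function query.
params = urlencode({
    "fq": "title:spill",
    "q": "{!func}sum(product(0.5,field1),product(0.3,field2))",
})
print("http://localhost:8983/solr/select?" + params)
```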


-Hoss



Solr Project Structure (was Re: Field Collapsing SOLR-236)

2010-06-17 Thread Mark Diggory
Erik,

I try not to be exclusionary of others' development tool choices in the 
selection of my own.  However, just to surely stir up a nest of hornets in true 
Apache fashion... when I saw what was done with the "templating" of the Maven 
pom work that was originally donated to Solr, I just cringed.  The point 
of using Maven as a build tool is to avoid the complexity that was introduced 
by "one-off'ing" in the manner that was finally committed. 

https://issues.apache.org/jira/browse/SOLR-19
https://issues.apache.org/jira/browse/SOLR-586

I choose to do the inverse of the solr build process as a means to manage our 
own dependency on solr in a Maven way, conventional to our current build and 
modularity practices.  I don't think Solr needs to adopt maven if they prefer 
ant, just draw clearer lines through the project about how to separate code for 
functional areas and clearly document the interfaces that should be 
customizable/changeable. JIRA Tasks should tackle core code changes separately 
from addon functionality that can be swapped out or left behind such as to 
avoid the risks of producing "spaghetti" interdependencies in the codebase. And 
if using Ant, efforts should be made to not do highly complex transformations 
of the sourcecode and or generated artifacts.  Ideally, source directories 
should have a 1 to 1 relationship to artifacts that are produced.

This SOLR-236 is a poster child of an unclear practice or convention for how to 
package customizations to Solr.   Really, isn't SOLR-236 "wanted" enough to 
warrant that it actually reside in the svn, where it could be developed properly, 
rather than as a task that's been open for "how many years"?!  I'd highly 
recommend that the Field Collapsing prototype cease to be managed as patches in a 
JIRA task and actually get some revision control behind it and interim release 
builds available.  

I'll even confess that my "patch cludge" in my maven project to apply SOLR-236 
to the solr source is not at all a best practice in terms of supporting addons 
to solr.  It was simply an attempt to compensate.  Ideally, Field Collapsing 
should have been a separately maintained codebase in a separate maven project 
that did not interfere with the solr, solr core request handler or 
configuration implementations and simply just depended on them.  Then it could 
be dropped into a lib dir of any solr 1.4.0. (and conversely just added to my 
webapp poms as a maven dependency when they are assembled in our own build 
processes).

further comments below...

On Jun 17, 2010, at 11:20 AM, Erik Hatcher wrote:
> On Jun 16, 2010, at 7:31 PM, Mark Diggory wrote:
>> p.s. I'd be glad to contribute our Maven build re-organization back to the 
>> community to get Solr properly Mavenized so that it can be distributed and 
>> released more often.  For us the benefit of this structure is that we will 
>> be able to overlay addons such as RequestHandlers and other third party 
>> support without having to rebuild Solr from scratch.
> 
> But you don't have to rebuild Solr from scratch to add a new request handler 
> or other plugins - simply compile your custom stuff into a JAR and put it in 
> /lib (or point to it with  in solrconfig.xml).

Touché! 

>> Ideally, a Maven Archetype could be created that would allow one rapidly 
>> produce a Solr webapp and fire it up in Jetty in mere seconds.
> How's that any different than cd example; java -jar start.jar?  Or do you 
> mean a Solr client webapp?

mvn package jetty:run

It's not much different, but it is different in that it's web-application and 
development-tool centric; there's no special startup code, that's just using 
maven+jetty or your debugging environment to fire up the war for testing. 

>> Finally, with projects such as Bobo, integration with Spring would make 
>> configuration more consistent and request significantly less java coding 
>> just to add new capabilities everytime someone authors a new RequestHandler.
> 
> It's one line of config to add a new request handler.  How many ridiculously 
> ugly confusing lines of Spring XML would it take?

But if I have my own configuration for that request handler, how many lines of 
Java do I need to add/alter to get that configuration to parse in solr config 
and be available? Even if it's just a few, IMO it's still the wrong way to 
be cutting the cake.

> 
>> The biggest thing I learned about Solr in my work thusfar is that patches 
>> like these could be standalone modules in separate projects if it weren't 
>> for having to hack the configuration and solrj methods up to adopt them.  
>> Which brings me to SolrJ, great API if it would stay generic and have less 
>> concern for adding method each time some custom collections and query 
>> support for morelikethis or collapseddocs needs to be added.
> I personally find it silly that we customize SolrJ for all these request 
> handlers anyway.  You get a decent navigable data structure back from general 
> SolrJ query requests as it is, there's no need to build in all these 
> convenience methods specific to all the Solr componetry.  Sure, it's 
> "convenient", but it's a maintenance headache and as you say, not generic.

Autocompletion with Solritas

2010-06-17 Thread Ken Krugler

I don't believe Solritas supports autocompletion out of the box.

So I'm wondering if anybody has experience using the LucidWorks distro  
& Solritas, plus the AJAX Solr auto-complete widget.


I realize that AJAX Solr's autocomplete support is mostly just  
leveraging the jQuery Autocomplete plugin, and hooking it up to Solr  
facets, but I was curious if there were any tricks or traps in getting  
it all to work.


Thanks,

-- Ken


Ken Krugler
+1 530-210-6378
http://bixolabs.com
e l a s t i c   w e b   m i n i n g






Re: dismax and AND as the default operator

2010-06-17 Thread Tommy Chheng

 I don't think setting the mm helps.
I have mm set to 1, which means the query terms should be in at least one 
field. Both query strings satisfy this condition.


The query "solr/select?q=fuel+cell" is parsed as

  "querystring":"fuel cell",
  "parsedquery":"+((DisjunctionMaxQuery((text:fuel | 
organization_name_ws_lc:fuel^5.0)) DisjunctionMaxQuery((text:cell | 
organization_name_ws_lc:cell^5.0)))~1) ()",
  "parsedquery_toString":"+(((text:fuel | organization_name_ws_lc:fuel^5.0) 
(text:cell | organization_name_ws_lc:cell^5.0))~1) ()",

returns ~900 results

The query "solr/select?q=fuel+AND+cell" is parsed as

  "querystring":"fuel AND cell",
  "parsedquery":"+(+DisjunctionMaxQuery((text:fuel | 
organization_name_ws_lc:fuel^5.0)) +DisjunctionMaxQuery((text:cell | 
organization_name_ws_lc:cell^5.0))) ()",
  "parsedquery_toString":"+(+(text:fuel | organization_name_ws_lc:fuel^5.0) 
+(text:cell | organization_name_ws_lc:cell^5.0)) ()",
returns ~80 results

(this is the behavior i want for query "fuel cell" because it adds the extra 
+). I want to do this without adding the AND for every query.



@tommychheng
Programmer and UC Irvine Graduate Student
Find a great grad school based on research interests: http://gradschoolnow.com


On 6/17/10 4:19 PM, Chris Hostetter wrote:

:  I'm using the dismax request handler and want to set the default operator to
: AND.
: Using the standard handler, i could just use the q.op or defaultOperator in
: the schema, but this doesn't work using the dismax request handler.
:
: For example, if I call "solr/select/?q=fuel+cell", I want solr to handle it as
: a "solr/select/?q=fuel+AND+cell"

Please consult the dismax docs...
http://wiki.apache.org/solr/DisMaxRequestHandler#mm_.28Minimum_.27Should.27_Match.29

dismax uses the "mm" param to decide how clauses that don't have an
explicit operator will be dealt with -- the default is to require 100% of
the terms, so if you aren't seeing that behavior then you have a
solrconfig.xml that that sets the default mm value to something else.

Starting with Solr 4.0 (and mybe 3.1 if it's backported) the default mm
will be based on the value of q.op (see SOLR-1889 for more details)


-Hoss



Re: Solr Project Structure (was Re: Field Collapsing SOLR-236)

2010-06-17 Thread Erik Hatcher


On Jun 17, 2010, at 7:44 PM, Mark Diggory wrote:
 when I saw what was done with the "templating" of the Maven pom  
work that was originally donated to solr, I just cringed at it.


Most of us Solr committers are fairly anti-Maven or ambivalent about  
it at best, so it hasn't gotten much TLC, admittedly.  But rather than  
just cringe, help us fix it if it is still broken.  I know Ryan, and  
sometimes Grant, care about the POM stuff being right, so I'm sure you  
can count on some committer eyes on whatever you have to contribute.


Ideally, source directories should have a 1 to 1 relationship to  
artifacts that are produced.


You get complete agreement from me on that.  Directory structure is  
very important.  I haven't spent much if any time on Solr's build  
myself, alas, and I hear some complaints about it.  It certainly could  
use a bit more attention.


This SOLR-236 is a posterchild of an unclear practice or convention  
for how to package customizations to Solr.   Really, isn't SOLR-236  
"wanted" enough to warrant that it actually reside in the svn where  
it could be developed properly rather than as a task thats been open  
for "how many years"?!  I'd highly recommend the Field Collapsing  
prototype ceased to be managed as patches in a JIRA task and  
actually got some code re-visioning behind it and interim release  
building available.


This is where the new craze of git comes in, I think.   These types of  
big feature additions to Solr or any Apache project can be developed  
in a personal branch, maintained there, version controlled, etc.  And  
then patched and committed to Solr when ready.


Because of Apache's tighter control on committers, it's not really  
feasible to have Apache svn branches for these sorts of things where  
non-committers are collaborating.


I cringe at working with patches in JIRA myself - it's difficult and  
clunky, for me at least.


I'll even confess that my "patch cludge" in my maven project to  
apply SOLR-236 to the solr source is not at all a best practice in  
terms of supporting addons to solr.  It was simply an attempt to  
compensate.


Pragmatism at its finest.  +1

Ideally, Field Collapsing should have been a separately maintained  
codebase in a separate maven project that did not interfere with the  
solr, solr core request handler or configuration implementations and  
simply just depended on them.  Then it could be dropped into a lib  
dir of any solr 1.4.0. (and conversely just added to my webapp poms  
as a maven dependency when they are assembled in our own build  
processes).


I don't know the details of SOLR-236 myself, but I believe it includes  
necessary core changes too, so it can't simply be a drop in lib.


Ideally, a Maven Archetype could be created that would allow one  
rapidly produce a Solr webapp and fire it up in Jetty in mere  
seconds.
How's that any different than cd example; java -jar start.jar?  Or  
do you mean a Solr client webapp?


mvn package jetty:run


Oh, and Solr's build has this too:

  ant run-example

with optional switches to -D set: example.solr.home, example.data.dir,  
example.jetty.port, and some others like running the JVM with  
debugging enabled.


Finally, with projects such as Bobo, integration with Spring would  
make configuration more consistent and request significantly less  
java coding just to add new capabilities everytime someone authors  
a new RequestHandler.


It's one line of config to add a new request handler.  How many  
ridiculously ugly confusing lines of Spring XML would it take?


But if I have my own configuration for that Request Handler, how  
many lines of java to I need to add/alter to get that configuration  
to parse in solr config and be available? Even if its just a few,  
its IMO, its still the wrong way to be cutting the cake.


Zero lines of additional Java code.  Make your configuration available  
as a separate file pointed to by the args available in Solr's config,  
or however you want to wire your own configuration in.  Maybe I'm  
misunderstanding exactly what you want, but request handlers have init  
params.


Though, to be technical, most extensions these days are going to be  
search components, not request handlers - but the same discussion  
applies.


I personally find it silly that we customize SolrJ for all these  
request handlers anyway.  You get a decent navigable data structure  
back from general SolrJ query requests as it is, there's no need to  
build in all these convenience methods specific to all the Solr  
componetry.  Sure, it's "convenient", but it's a maintenance  
headache and as you say, not generic.


It's an example of something I'd coin a "policing bottleneck": the  
core code introduces a pattern for convenience that restricts  
the ability to add features to the application without "approval",  
i.e. consensus that the code contribution be part of the central  
API. Thus as long as the patch alters core code, you can't ma

Re: dismax and AND as the default operator

2010-06-17 Thread Erik Hatcher

dismax does not support the operator AND.  It uses +/- only.

set mm=100% (not 1), as Hoss said, and try your query again.

Erik

On Jun 17, 2010, at 8:08 PM, Tommy Chheng wrote:


I don't think setting the mm helps.
I have mm to 1 which means the query terms should be in at least one  
field. Both query strings satisfy this condition.


The query "solr/select?q=fuel+cell" is parsed as

 "querystring":"fuel cell",
 "parsedquery":"+((DisjunctionMaxQuery((text:fuel |  
organization_name_ws_lc:fuel^5.0)) DisjunctionMaxQuery((text:cell |  
organization_name_ws_lc:cell^5.0)))~1) ()",
 "parsedquery_toString":"+(((text:fuel |  
organization_name_ws_lc:fuel^5.0) (text:cell |  
organization_name_ws_lc:cell^5.0))~1) ()",


returns ~900 results

The query "solr/select?q=fuel+AND+cell" is parsed as

 "querystring":"fuel AND cell",
 "parsedquery":"+(+DisjunctionMaxQuery((text:fuel |  
organization_name_ws_lc:fuel^5.0)) +DisjunctionMaxQuery((text:cell |  
organization_name_ws_lc:cell^5.0))) ()",
 "parsedquery_toString":"+(+(text:fuel |  
organization_name_ws_lc:fuel^5.0) +(text:cell |  
organization_name_ws_lc:cell^5.0)) ()",

returns ~80 results

(this is the behavior i want for query "fuel cell" because it adds  
the extra +). I want to do this without adding the AND for every  
query.




@tommychheng
Programmer and UC Irvine Graduate Student
Find a great grad school based on research interests: http://gradschoolnow.com


On 6/17/10 4:19 PM, Chris Hostetter wrote:
:  I'm using the dismax request handler and want to set the default  
operator to

: AND.
: Using the standard handler, i could just use the q.op or  
defaultOperator in

: the schema, but this doesn't work using the dismax request handler.
:
: For example, if I call "solr/select/?q=fuel+cell", I want solr to  
handle it as

: a "solr/select/?q=fuel+AND+cell"

Please consult the dismax docs...
http://wiki.apache.org/solr/DisMaxRequestHandler#mm_.28Minimum_.27Should.27_Match.29

dismax uses the "mm" param to decide how clauses that don't have an
explicit operator will be dealt with -- the default is to require  
100% of

the terms, so if you aren't seeing that behavior then you have a
solrconfig.xml that that sets the default mm value to something else.

Starting with Solr 4.0 (and mybe 3.1 if it's backported) the  
default mm

will be based on the value of q.op (see SOLR-1889 for more details)


-Hoss





Re: dismax and AND as the default operator

2010-06-17 Thread Tommy Chheng
 Thanks, Erik, that does work. I misunderstood the documentation; I 
thought "clause" meant "field" rather than the terms in the query.


If dismax doesn't support the operator AND,  why would the query 
"solr/select?q=fuel+cell" and "solr/select?q=fuel+AND+cell" get parsed 
differently(it adds the + for the AND query) and have different result 
count?


@tommychheng
Programmer and UC Irvine Graduate Student
Find a great grad school based on research interests: http://gradschoolnow.com


On 6/17/10 5:17 PM, Erik Hatcher wrote:

dismax does not support the operator AND.  It uses +/- only.

set mm=100% (not 1), as Hoss said, and try your query again.

Erik

On Jun 17, 2010, at 8:08 PM, Tommy Chheng wrote:


I don't think setting the mm helps.
I have mm to 1 which means the query terms should be in at least one 
field. Both query strings satisfy this condition.


The query "solr/select?q=fuel+cell" is parsed as

 "querystring":"fuel cell",
 "parsedquery":"+((DisjunctionMaxQuery((text:fuel | 
organization_name_ws_lc:fuel^5.0)) DisjunctionMaxQuery((text:cell | 
organization_name_ws_lc:cell^5.0)))~1) ()",
 "parsedquery_toString":"+(((text:fuel | 
organization_name_ws_lc:fuel^5.0) (text:cell | 
organization_name_ws_lc:cell^5.0))~1) ()",


returns ~900 results

The query "solr/select?q=fuel+AND+cell" is parsed as

 "querystring":"fuel AND cell",
 "parsedquery":"+(+DisjunctionMaxQuery((text:fuel | 
organization_name_ws_lc:fuel^5.0)) +DisjunctionMaxQuery((text:cell | 
organization_name_ws_lc:cell^5.0))) ()",
 "parsedquery_toString":"+(+(text:fuel | 
organization_name_ws_lc:fuel^5.0) +(text:cell | 
organization_name_ws_lc:cell^5.0)) ()",

returns ~80 results

(this is the behavior i want for query "fuel cell" because it adds 
the extra +). I want to do this without adding the AND for every query.




@tommychheng
Programmer and UC Irvine Graduate Student
Find a great grad school based on research interests: 
http://gradschoolnow.com



On 6/17/10 4:19 PM, Chris Hostetter wrote:
:  I'm using the dismax request handler and want to set the default 
operator to

: AND.
: Using the standard handler, i could just use the q.op or 
defaultOperator in

: the schema, but this doesn't work using the dismax request handler.
:
: For example, if I call "solr/select/?q=fuel+cell", I want solr to 
handle it as

: a "solr/select/?q=fuel+AND+cell"

Please consult the dismax docs...
http://wiki.apache.org/solr/DisMaxRequestHandler#mm_.28Minimum_.27Should.27_Match.29 



dismax uses the "mm" param to decide how clauses that don't have an
explicit operator will be dealt with -- the default is to require 
100% of

the terms, so if you aren't seeing that behavior then you have a
solrconfig.xml that that sets the default mm value to something else.

Starting with Solr 4.0 (and mybe 3.1 if it's backported) the default mm
will be based on the value of q.op (see SOLR-1889 for more details)


-Hoss





Re: dismax and AND as the default operator

2010-06-17 Thread Erik Hatcher
Hmmm, maybe I'm wrong and it does support AND.  Looking at the code I  
don't see why it wouldn't, actually.  Though I believe I've seen it  
documented that it isn't supported (or at least not advertised to  
support).  Ok, from the dismax wiki page it says: "This query handler  
supports an extremely simplified subset of the Lucene QueryParser  
syntax. Quotes can be used to group phrases, and +/- can be used to  
denote mandatory and optional clauses".  Only special single  
characters are escaped.  So AND/OR must work.  Learn something new  
every day!


Erik



On Jun 17, 2010, at 8:28 PM, Tommy Chheng wrote:

Thanks, Erik. that does work. I misunderstood the documentation, i  
thought "clause" meant "field" rather than the terms in the query.


If dismax doesn't support the operator AND,  why would the query  
"solr/select?q=fuel+cell" and "solr/select?q=fuel+AND+cell" get  
parsed differently(it adds the + for the AND query) and have  
different result count?


@tommychheng
Programmer and UC Irvine Graduate Student
Find a great grad school based on research interests: http://gradschoolnow.com


On 6/17/10 5:17 PM, Erik Hatcher wrote:

dismax does not support the operator AND.  It uses +/- only.

set mm=100% (not 1), as Hoss said, and try your query again.

   Erik

On Jun 17, 2010, at 8:08 PM, Tommy Chheng wrote:


I don't think setting the mm helps.
I have mm set to 1, which means the query terms should be in at least
one field. Both query strings satisfy this condition.


The query "solr/select?q=fuel+cell" is parsed as

"querystring":"fuel cell",
"parsedquery":"+((DisjunctionMaxQuery((text:fuel |  
organization_name_ws_lc:fuel^5.0)) DisjunctionMaxQuery((text:cell  
| organization_name_ws_lc:cell^5.0)))~1) ()",
"parsedquery_toString":"+(((text:fuel |  
organization_name_ws_lc:fuel^5.0) (text:cell |  
organization_name_ws_lc:cell^5.0))~1) ()",


returns ~900 results

The query "solr/select?q=fuel+AND+cell" is parsed as

"querystring":"fuel AND cell",
"parsedquery":"+(+DisjunctionMaxQuery((text:fuel |  
organization_name_ws_lc:fuel^5.0)) +DisjunctionMaxQuery((text:cell  
| organization_name_ws_lc:cell^5.0))) ()",
"parsedquery_toString":"+(+(text:fuel |  
organization_name_ws_lc:fuel^5.0) +(text:cell |  
organization_name_ws_lc:cell^5.0)) ()",

returns ~80 results

(This is the behavior I want for the query "fuel cell" because it adds
the extra +.) I want to do this without adding the AND for every
query.




@tommychheng
Programmer and UC Irvine Graduate Student
Find a great grad school based on research interests: http://gradschoolnow.com


On 6/17/10 4:19 PM, Chris Hostetter wrote:
: I'm using the dismax request handler and want to set the default operator to
: AND.
: Using the standard handler, i could just use the q.op or defaultOperator in
: the schema, but this doesn't work using the dismax request handler.
:
: For example, if I call "solr/select/?q=fuel+cell", I want solr to handle it as
: a "solr/select/?q=fuel+AND+cell"

Please consult the dismax docs...
http://wiki.apache.org/solr/DisMaxRequestHandler#mm_.28Minimum_.27Should.27_Match.29

dismax uses the "mm" param to decide how clauses that don't have an
explicit operator will be dealt with -- the default is to require 100% of
the terms, so if you aren't seeing that behavior then you have a
solrconfig.xml that sets the default mm value to something else.

Starting with Solr 4.0 (and maybe 3.1 if it's backported) the default mm
will be based on the value of q.op (see SOLR-1889 for more details)


-Hoss







Peformance tuning

2010-06-17 Thread Blargy

After indexing our item descriptions our index grew from around 3 gigs to
17.5, and I can see our search has deteriorated from sub-50ms searches to
over 500ms now. The sick thing is I'm not even searching across that field
at the moment, but I plan to in the near future, as well as include
highlighting.

What size is considered to be "too big" for one index? When should one
look into sharding/federation etc.?

What are some generic performance tuning options that could possibly help?
We are currently hosting 4 slaves. Would increasing the number of slaves
help?
-- 
View this message in context: 
http://lucene.472066.n3.nabble.com/Peformance-tuning-tp904540p904540.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Peformance tuning

2010-06-17 Thread Erik Hatcher
first step is to do an &debugQuery=true and see where the time is  
going on the server-side.  If you're doing highlighting of a stored  
field, that can be a biggie.   The timings will be in the debug output  
- be sure to look at both sections of the timings.


Erik

On Jun 17, 2010, at 9:37 PM, Blargy wrote:



After indexing our item descriptions our index grew from around  
3gigs to now
17.5 and I can see our search has deteriorated from sub 50ms  
searches to
over 500ms now. The sick thing is I'm not even searching across that  
field

at the moment but I plan to in the near future as well as include
highlighting.

What size is considered to be "too big" for one index? When should one
looking into sharding/federation etc?

What are some generic performance tuning options that could possible  
help?
We are currently hosting 4 slaves. Would increasing the number of  
slaves

help?
--
View this message in context: 
http://lucene.472066.n3.nabble.com/Peformance-tuning-tp904540p904540.html
Sent from the Solr - User mailing list archive at Nabble.com.
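As a sketch of what Erik suggests, the debug request looks like this (host, port, query, and the highlighted field name are placeholders, not values from this thread):

```
http://localhost:8983/solr/select?q=fuel+cell&debugQuery=true&hl=true&hl.fl=description
```

The debug section of the response includes a timing breakdown with a "prepare" phase and a "process" phase, each listing the time spent per search component (query, facet, highlight, debug), so you can see whether the highlighter is the expensive part.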




Re: Peformance tuning

2010-06-17 Thread Blargy

Is there an alternative for highlighting on a large stored field? I thought
for highlighting you needed the field stored? I really just need the
excerpting feature for highlighting relevant portions of our item
descriptions.

Not sure if this is because of the index size (17.5G) or because of
highlighting, but our slave servers are experiencing high loads... possibly
due to replication. That actually leads me to my next question: I thought
replication would only download new segments without the need to always
re-download the whole index. This doesn't appear to be the case from what
I'm seeing. Am I wrong?

Thanks again

-- 
View this message in context: 
http://lucene.472066.n3.nabble.com/Peformance-tuning-tp904540p904610.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Peformance tuning

2010-06-17 Thread Erik Hatcher
Blargy - Please try to quote the mail you're responding to, at least  
the relevant piece.  It's nice to see some context to the discussion.


On Jun 17, 2010, at 10:23 PM, Blargy wrote:

Is there an alternative for highlighting on a large stored field?


Not currently.


I thought
for highlighting you needed the field stored?


You do.


Not sure if this is because of the index size (17.5G) or because of
highlighting but our slave servers are experiencing high loads...  
possibly
due to replication That actually leads me to my next question, I  
thought
replication would only download new segments without the need to  
always
re-download the whole index. This doesn't appear to be the case from  
what

I'm seeing. Am I wrong?


Depends - if you optimize the index on the master, then the entire  
index is replicated.  If you simply commit and let Lucene take care of  
adding segments you'll generally reduce what is replicated.


Erik
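A sketch of the master-side config this behavior flows from, using Solr's ReplicationHandler (the confFiles list is an illustrative assumption):

```xml
<!-- Sketch: replicate after plain commits rather than optimize,
     so slaves usually pull only the newly created segments -->
<requestHandler name="/replication" class="solr.ReplicationHandler">
  <lst name="master">
    <str name="replicateAfter">commit</str>
    <str name="confFiles">schema.xml,stopwords.txt</str>
  </lst>
</requestHandler>
```

If replicateAfter is "optimize", or the master is optimized on every cycle, each replication ships a full copy of the index to every slave.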



Re: Autocompletion with Solritas

2010-06-17 Thread Erik Hatcher
Your wish is my command.  Check out trunk, fire up Solr (ant run-example),
index example data, hit http://localhost:8983/solr/browse - type in the
search box.


Just used jQuery's autocomplete plugin and the terms component for  
now, on the name field.  Quite simple to plug in, actually.  Check the  
commit diff.  The main magic is doing this:


   


Stupidly, though, jQuery's autocomplete seems to be hardcoded to send  
a q parameter, but I coded it to also send the same value as  
terms.prefix - but this could be an issue if hitting a different  
request handler where q is used for the actual query for filtering  
terms on.


Cool?!   I think so!  :)

Erik


On Jun 17, 2010, at 8:03 PM, Ken Krugler wrote:


I don't believe Solritas supports autocompletion out of the box.

So I'm wondering if anybody has experience using the LucidWorks  
distro & Solritas, plus the AJAX Solr auto-complete widget.


I realize that AJAX Solr's autocomplete support is mostly just  
leveraging the jQuery Autocomplete plugin, and hooking it up to Solr  
facets, but I was curious if there were any tricks or traps in  
getting it all to work.


Thanks,

-- Ken


Ken Krugler
+1 530-210-6378
http://bixolabs.com
e l a s t i c   w e b   m i n i n g
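For anyone following along, the TermsComponent request behind an autocomplete like the one Erik describes looks roughly as follows; the /terms handler and the "name" field come from the example configuration he mentions, while the prefix value and limit are placeholders:

```
http://localhost:8983/solr/terms?terms.fl=name&terms.prefix=ip&terms.limit=10&wt=json&omitHeader=true
```

The response lists indexed terms from the "name" field that start with the typed prefix, which the jQuery autocomplete plugin then renders as suggestions.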








Re: Autocompletion with Solritas

2010-06-17 Thread Ken Krugler

You, sir, are on my Christmas card list.

I'll fire it up tomorrow morning & let you know how it goes.

-- Ken


On Jun 17, 2010, at 8:34pm, Erik Hatcher wrote:

Your wish is my command.  Check out trunk, fire up Solr (ant run-example),
index example data, hit http://localhost:8983/solr/browse - type in the
search box.


Just used jQuery's autocomplete plugin and the terms component for  
now, on the name field.  Quite simple to plug in, actually.  Check  
the commit diff.  The main magic is doing this:


  


Stupidly, though, jQuery's autocomplete seems to be hardcoded to  
send a q parameter, but I coded it to also send the same value as  
terms.prefix - but this could be an issue if hitting a different  
request handler where q is used for the actual query for filtering  
terms on.


Cool?!   I think so!  :)

Erik


On Jun 17, 2010, at 8:03 PM, Ken Krugler wrote:


I don't believe Solritas supports autocompletion out of the box.

So I'm wondering if anybody has experience using the LucidWorks  
distro & Solritas, plus the AJAX Solr auto-complete widget.


I realize that AJAX Solr's autocomplete support is mostly just  
leveraging the jQuery Autocomplete plugin, and hooking it up to  
Solr facets, but I was curious if there were any tricks or traps in  
getting it all to work.


Thanks,

-- Ken


Ken Krugler
+1 530-210-6378
http://bixolabs.com
e l a s t i c   w e b   m i n i n g









Ken Krugler
+1 530-210-6378
http://bixolabs.com
e l a s t i c   w e b   m i n i n g






Re: Peformance tuning

2010-06-17 Thread Blargy


> Blargy - Please try to quote the mail you're responding to, at least
> the relevant piece.  It's nice to see some context to the discussion.

No problem ;)


> Depends - if you optimize the index on the master, then the entire index is
> replicated.  If you simply commit and let Lucene take care of adding
> segments you'll generally reduce what is replicated.

As a side question... would reducing the mergeFactor help at all? This is
currently what I am using...


<useCompoundFile>false</useCompoundFile>
<ramBufferSizeMB>64</ramBufferSizeMB>
<mergeFactor>5</mergeFactor>
<unlockOnStartup>false</unlockOnStartup>
<reopenReaders>true</reopenReaders>

<deletionPolicy class="solr.SolrDeletionPolicy">
  <str name="maxCommitsToKeep">1</str>
  <str name="maxOptimizedCommitsToKeep">0</str>
</deletionPolicy>

<infoStream file="INFOSTREAM.txt">false</infoStream>
-- 
View this message in context: 
http://lucene.472066.n3.nabble.com/Peformance-tuning-tp904540p904810.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Peformance tuning

2010-06-17 Thread Blargy



> first step is to do an &debugQuery=true and see where the time is  
> going on the server-side.  If you're doing highlighting of a stored  
> field, that can be a biggie.   The timings will be in the debug output  
> - be sure to look at both sections of the timings. 
> 

Looks like the majority of the time is spent on the QueryComponent in the
Process section. Any suggestions on how I can improve this? Thanks!

-- 
View this message in context: 
http://lucene.472066.n3.nabble.com/Peformance-tuning-tp904540p904861.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Peformance tuning

2010-06-17 Thread Otis Gospodnetic
You may want to try the RPM tool; it will show you what inside that
QueryComponent is really slow.

http://blog.sematext.com/2010/05/11/solr-performance-monitoring-announcement/

Or you can run Solr under your own profiler.

Otis

Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
Lucene ecosystem search :: http://search-lucene.com/



----- Original Message ----
> From: Blargy
> To: solr-user@lucene.apache.org
> Sent: Fri, June 18, 2010 1:22:57 AM
> Subject: Re: Peformance tuning
>
> > first step is to do an &debugQuery=true and see where the time is
> > going on the server-side.  If you're doing highlighting of a stored
> > field, that can be a biggie.   The timings will be in the debug output
> > - be sure to look at both sections of the timings.
>
> Looks like the majority of the time is spend on the QueryComponent in the
> Process section. Any suggestions on how I can improve this? Thanks!
>
> --
> View this message in context:
> http://lucene.472066.n3.nabble.com/Peformance-tuning-tp904540p904861.html
> Sent from the Solr - User mailing list archive at Nabble.com.


Re: Peformance tuning

2010-06-17 Thread Otis Gospodnetic
Hi,

Smaller merge factor will make things worse - it will cause Lucene to merge
index segments more often (than the default merge factor of 10), thus resulting
in more new files being created on the master, thus resulting in more network
IO, more disk IO on the slaves, more OS cache evicted on the slaves, longer
warmup times on slaves, higher CPU usage and higher query latency on slaves
during warmup.  That's the sequence to remember.

Otis

Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
Lucene ecosystem search :: http://search-lucene.com/



----- Original Message ----
> From: Blargy
> To: solr-user@lucene.apache.org
> Sent: Fri, June 18, 2010 12:41:32 AM
> Subject: Re: Peformance tuning
>
> > Blargy - Please try to quote the mail you're responding to, at least
> > the relevant piece.  It's nice to see some context to the discussion.
>
> No problem ;)
>
> > Depends - if you optimize the index on the master, then the entire index is
> > replicated.  If you simply commit and let Lucene take care of adding
> > segments you'll generally reduce what is replicated.
>
> As a side question... would reducing the mergeFactor help at all? This is
> currently what I am using...
>
> <useCompoundFile>false</useCompoundFile>
> <ramBufferSizeMB>64</ramBufferSizeMB>
> <mergeFactor>5</mergeFactor>
> <unlockOnStartup>false</unlockOnStartup>
> <reopenReaders>true</reopenReaders>
>
> <deletionPolicy class="solr.SolrDeletionPolicy">
>   <str name="maxCommitsToKeep">1</str>
>   <str name="maxOptimizedCommitsToKeep">0</str>
> </deletionPolicy>
>
> <infoStream file="INFOSTREAM.txt">false</infoStream>
>
> --
> View this message in context:
> http://lucene.472066.n3.nabble.com/Peformance-tuning-tp904540p904810.html
> Sent from the Solr - User mailing list archive at Nabble.com.