Re: pk vs. uniqueKey with DIH delta-import

2009-06-18 Thread Noble Paul നോബിള്‍ नोब्ळ्
Apparently the row returns a null 'board_id'.

Your stack trace suggests this. Even if it is fixed, I guess it may not
work, because you are storing the id as


board-${test.board_id}

and unless your query returns something like board- it may
not work for you.

Anyway, I shall put in a fix in DIH to avoid this NPE.







On Thu, Jun 18, 2009 at 2:17 AM, Erik Hatcher wrote:
> First - DIH has worked pretty well in a new customer engagement of ours.
>  We've easily imported tens of millions of records with no problem.  Kudos
> to the developers/contributors to DIH - it got us up and running quickly.
>  But now we're delving into more complexities and having some issues.
>
> Now on to my current issue, doing a delta-import such that records marked as
> "deleted" in the database are removed from Solr using deletedPkQuery.
>
> Here's a config I'm using against a mocked test database:
>
> <dataConfig>
>   <dataSource ... url="jdbc:mysql://localhost/db"/>
>   <document>
>     <entity name="..." pk="board_id"
>             transformer="TemplateTransformer"
>             deletedPkQuery="select board_id from boards where deleted = 'Y'"
>             query="select * from boards where deleted = 'N'"
>             deltaImportQuery="select * from boards where deleted = 'N'"
>             deltaQuery="select * from boards where deleted = 'N'"
>             preImportDeleteQuery="datasource:board">
>       <field ... />  <!-- three field mappings (including the templated id) stripped by the archive -->
>       <field ... />
>       <field ... />
>     </entity>
>   </document>
> </dataConfig>
>
> Note that the uniqueKey in Solr is the "id" field.  And its value is a
> template board-.
>
> I noticed the javadoc comments in DocBuilder#collectDelta it says "Note: In
> our definition, unique key of Solr document is the primary key of the top
> level entity".  This of course isn't really an appropriate assumption.
>
> I also tried a deletedPkQuery of "select concat('board-',board_id) from
> boards where deleted = 'Y'", but got an NPE (relevant stack trace below).
>
> It seems that deletedPkQuery only works if the pk and Solr's uniqueKey field
> use the same value.  Is that the case?  If this is the case we'll need to
> fix this somehow.  Any suggestions?
>
> Thanks,
>        Erik
>
> stack trace from scenario mentioned above:
> SEVERE: Delta Import Failed
> java.lang.NullPointerException
>        at
> org.apache.solr.handler.dataimport.SolrWriter.deleteDoc(SolrWriter.java:83)
>        at
> org.apache.solr.handler.dataimport.DocBuilder.deleteAll(DocBuilder.java:275)
>        at
> org.apache.solr.handler.dataimport.DocBuilder.doDelta(DocBuilder.java:247)
>        at
> org.apache.solr.handler.dataimport.DocBuilder.execute(DocBuilder.java:159)
>        at
> org.apache.solr.handler.dataimport.DataImporter.doDeltaImport(DataImporter.java:337)
>
>



-- 
-
Noble Paul | Principal Engineer| AOL | http://aol.com


Re: pk vs. uniqueKey with DIH delta-import

2009-06-18 Thread Noble Paul നോബിള്‍ नोब्ळ्
I have raised an issue and fixed it:
https://issues.apache.org/jira/browse/SOLR-1228

2009/6/18 Noble Paul നോബിള്‍  नोब्ळ् :
> Apparently the row returns a null 'board_id'.
>
> Your stack trace suggests this. Even if it is fixed, I guess it may not
> work, because you are storing the id as
>
>
> board-${test.board_id}
>
> and unless your query returns something like board- it may
> not work for you.
>
> Anyway, I shall put in a fix in DIH to avoid this NPE.
>
>
>
>
>
>
>
> On Thu, Jun 18, 2009 at 2:17 AM, Erik Hatcher 
> wrote:
>> First - DIH has worked pretty well in a new customer engagement of ours.
>>  We've easily imported tens of millions of records with no problem.  Kudos
>> to the developers/contributors to DIH - it got us up and running quickly.
>>  But now we're delving into more complexities and having some issues.
>>
>> Now on to my current issue, doing a delta-import such that records marked as
>> "deleted" in the database are removed from Solr using deletedPkQuery.
>>
>> Here's a config I'm using against a mocked test database:
>>
>> <dataConfig>
>>   <dataSource ... url="jdbc:mysql://localhost/db"/>
>>   <document>
>>     <entity name="..." pk="board_id"
>>             transformer="TemplateTransformer"
>>             deletedPkQuery="select board_id from boards where deleted = 'Y'"
>>             query="select * from boards where deleted = 'N'"
>>             deltaImportQuery="select * from boards where deleted = 'N'"
>>             deltaQuery="select * from boards where deleted = 'N'"
>>             preImportDeleteQuery="datasource:board">
>>       <field ... />  <!-- three field mappings (including the templated id) stripped by the archive -->
>>       <field ... />
>>       <field ... />
>>     </entity>
>>   </document>
>> </dataConfig>
>>
>> Note that the uniqueKey in Solr is the "id" field.  And its value is a
>> template board-.
>>
>> I noticed the javadoc comments in DocBuilder#collectDelta it says "Note: In
>> our definition, unique key of Solr document is the primary key of the top
>> level entity".  This of course isn't really an appropriate assumption.
>>
>> I also tried a deletedPkQuery of "select concat('board-',board_id) from
>> boards where deleted = 'Y'", but got an NPE (relevant stack trace below).
>>
>> It seems that deletedPkQuery only works if the pk and Solr's uniqueKey field
>> use the same value.  Is that the case?  If this is the case we'll need to
>> fix this somehow.  Any suggestions?
>>
>> Thanks,
>>        Erik
>>
>> stack trace from scenario mentioned above:
>> SEVERE: Delta Import Failed
>> java.lang.NullPointerException
>>        at
>> org.apache.solr.handler.dataimport.SolrWriter.deleteDoc(SolrWriter.java:83)
>>        at
>> org.apache.solr.handler.dataimport.DocBuilder.deleteAll(DocBuilder.java:275)
>>        at
>> org.apache.solr.handler.dataimport.DocBuilder.doDelta(DocBuilder.java:247)
>>        at
>> org.apache.solr.handler.dataimport.DocBuilder.execute(DocBuilder.java:159)
>>        at
>> org.apache.solr.handler.dataimport.DataImporter.doDeltaImport(DataImporter.java:337)
>>
>>
>
>
>
> --
> -
> Noble Paul | Principal Engineer| AOL | http://aol.com
>



-- 
-
Noble Paul | Principal Engineer| AOL | http://aol.com


Re: Searching across multivalued fields

2009-06-18 Thread Michael Ludwig

MilkDud schrieb:

OK, so let's suppose I did index across just the album. Using that
index, how would I be able to handle searches of the form "artist name
track name"?


What does the user interface look like? Do you have separate fields for
artists and tracks? Or just one field?


If I do the search using a phrase query, this won't match anything
because the artist and track are not in one field (hence my idea of
creating a third concatenated field).


What do you expect the user to enter?

* "dream theater innocence faded" - certainly wrong
* dream theater "innocence faded" - much better

Use the DisMax query parser to read the query, as I suggested in my
first reply. You need to become more familiar with the various search
facilities, that will probably steer your ideas in more promising
directions. Read up about DisMax.
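For illustration only, here is a minimal SolrJ sketch of such a DisMax
request; the field names (interpret, track) follow the album schema I
suggest further down, the boosts are arbitrary, and "server" is assumed
to be an already initialized SolrServer:

  SolrQuery q = new SolrQuery();
  q.setQueryType("dismax");                         // use the dismax request handler
  q.setQuery("dream theater \"innocence faded\""); // free text plus a quoted track phrase
  q.set("qf", "interpret^2.0 track");               // search artist and track fields
  q.set("debugQuery", "true");                      // see how the query is rewritten
  QueryResponse rsp = server.query(q);

The debugQuery output is also the quickest way to see what a given
query boils down to.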


If I make it a non-phrase query, it'll return albums that have those
words across all the tracks, which is not ideal. I.e., if you search
for a track titled "love me" you will get back albums with the words
"love" and "me" in different tracks.


That doesn't make sense to me. Did you inspect your query using
debugQuery=true as I suggested? What did it boil down to?


Basically, I'd like it to look at each track individually


That tells me you're thinking database and table scan.


and if the artist + just one track match all the search terms, then
that counts as a match. Does that make sense? If I index at the
track level, that should work, but then I have to store album/artist
info on each track.


I think the following makes much more sense:


An album should be a document and have the following fields (and
maybe more, if you have more data attached to it):

id - unique, an identifier
title - album title
interpret - the musician, possibly multi-valued
track - every song or whatever, definitely multi-valued

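To make the proposed fields concrete, a rough SolrJ sketch of indexing
one album as a single document with a multi-valued track field; the
values are made-up examples and "server" is assumed to be an
initialized SolrServer:

  SolrInputDocument album = new SolrInputDocument();
  album.addField("id", "album-4711");
  album.addField("title", "Awake");
  album.addField("interpret", "Dream Theater");
  // multi-valued: simply call addField once per track
  album.addField("track", "Innocence Faded");
  album.addField("track", "Voices");
  server.add(album);
  server.commit();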

Read up about multi-valued fields (sample schema.xml, for example, or
Google) if you're unsure what this is; your posting subject, however,
suggests you aren't.

Regards,

Michael Ludwig


Re: Few Queries regarding indexes in Solr

2009-06-18 Thread Michael Ludwig

Otis Gospodnetic schrieb:

[...] nothing prevents the indexing client from sending the same doc
to multiple shards.  In some scenarios that's exactly what you want
to do.

What kind of scenario would that be?


One scenario is making use of a small and a large core to provide near
real-time search - you index to both - to the smaller one so you can
flip/drop/purge+reopen it frequently and quickly, and to the large one to
persist. You search across both of them and remove dupes.


This makes sense. Thanks for taking the time to answer this.


Q: What is the most annoying thing in e-mail?


A: it never stops!


Imagine it did one day!

Michael Ludwig


Re: FilterCache issue

2009-06-18 Thread Michael Ludwig

Manepalli, Kalyan schrieb:

I am seeing an issue with the filtercache setting on my solr app
which is causing slower faceting.

Here is the configuration.

<filterCache size="512" initialSize="512" autowarmCount="256"/>

hitratio : 0.00
inserts : 973531
evictions : 972978
size : 512



cumulative_hitratio : 0.00
cumulative_inserts : 61170111
cumulative_evictions : 61153787

As we can see the cache hit ratio is almost zero. How do I improve the
filter cache.


Maybe these pages add some ideas to the mix:

http://wiki.apache.org/solr/FilterQueryGuidance
https://issues.apache.org/jira/browse/SOLR-475

Michael Ludwig


Re: Distributed querying using solr multicore.

2009-06-18 Thread Michael Ludwig

Rakhi Khatwani schrieb:

[...] how do we do a distributed search across multicores??  is it
just like how we query using multiple shards?


I don't know how we're supposed to use it. I did the following:

http://flunder:8983/solr/xpg/select?q=bla&shards=flunder:8983/solr/xpg,flunder:8983/solr/kk

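For what it's worth, the SolrJ equivalent seems to be just a matter of
setting the shards parameter on the query; the host and core names
below are the same placeholders as in the URL above:

  SolrServer solr = new CommonsHttpSolrServer("http://flunder:8983/solr/xpg");
  SolrQuery q = new SolrQuery("bla");
  q.set("shards", "flunder:8983/solr/xpg,flunder:8983/solr/kk");
  QueryResponse rsp = solr.query(q);
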
For SolrJ, see this thread:

Using SolrJ with multicore/shards - ahammad
http://markmail.org/thread/qnytfrk4dytmgjis


if so, isn't there a better way to do that?


No idea.

Michael Ludwig


Re: Distributed querying using solr multicore.

2009-06-18 Thread Rakhi Khatwani
On Thu, Jun 18, 2009 at 3:51 PM, Michael Ludwig  wrote:

> Rakhi Khatwani schrieb:
>
>> [...] how do we do a distributed search across multicores??  is it
>> just like how we query using multiple shards?
>>
>
> I don't know how we're supposed to use it. I did the following:
>
>
> http://flunder:8983/solr/xpg/select?q=bla&shards=flunder:8983/solr/xpg,flunder:8983/solr/kk


I am getting a page load error... "cannot find server"


>
> For SolrJ, see this thread:
>
> Using SolrJ with multicore/shards - ahammad
> http://markmail.org/thread/qnytfrk4dytmgjis
>
>  if so, isn't there a better way to do that?
>>
>
> No idea.
>
> Michael Ludwig
>


Re: Distributed querying using solr multicore.

2009-06-18 Thread Michael Ludwig

Rakhi Khatwani schrieb:

On Thu, Jun 18, 2009 at 3:51 PM, Michael Ludwig 
wrote:



I don't know how we're supposed to use it. I did the following:

http://flunder:8983/solr/xpg/select?q=bla&shards=flunder:8983/solr/xpg,flunder:8983/solr/kk


I am getting a page load error... "cannot find server"


This is not a public server, just an example for the syntax I found by
trial and error.

Michael Ludwig


Re: Distributed querying using solr multicore.

2009-06-18 Thread Rakhi Khatwani
Hi Michael,
Sorry for the misinterpretation.

In that case, it's the same as querying multiple shards. :)

Thanks,
Raakhi

On Thu, Jun 18, 2009 at 4:09 PM, Michael Ludwig  wrote:

> Rakhi Khatwani schrieb:
>
>> On Thu, Jun 18, 2009 at 3:51 PM, Michael Ludwig 
>> wrote:
>>
>
>  I don't know how we're supposed to use it. I did the following:
>>>
>>>
>>> http://flunder:8983/solr/xpg/select?q=bla&shards=flunder:8983/solr/xpg,flunder:8983/solr/kk
>>>
>>
>> I am getting a page load error... "cannot find server"
>>
>
> This is not a public server, just an example for the syntax I found by
> trial and error.
>
> Michael Ludwig
>


Re: FilterCache issue

2009-06-18 Thread Grant Ingersoll


On Jun 17, 2009, at 10:32 PM, Mark Miller wrote:

Right, so if you are on 1.3 or early 1.4 dev, with so many uniques,
you should be using the FieldCache method of faceting. The RAM
depends on the number of documents and the number of unique terms mostly.


With 1.4 you may be using an UninvertedField though (are your facet  
fields multivalued or tokenized?), and I know much less about that.


I'd try a cache size of 10,000 and see how it goes.


I'm not so sure about that; my guess is you're going to get hammered on
garbage collection when you do commits with something that big.


Let's take a step back. These are LRU caches; the fact that you have
a zero hit ratio does not mean caching isn't working or that you
necessarily need a bigger cache. It suggests to me that your
application is not the type that can benefit from caching of
filters. My understanding is that in certain cases with the new 1.4
faceting, it ends up using the filterCache as well. I believe the
admin will give stats on the number of big terms, etc.


Perhaps you can give a bit more detail about your application and why  
you think that cache ratio is causing slower faceting.


Have you actually done some profiling/timings on the faceting?




- Mark

Manepalli, Kalyan wrote:
Got that - if it's the number of cache entries, it's definitely very low.
I have around 10,000 unique items to facet on. Does the RAM size
depend on document size?


Thanks,
Kalyan Manepalli
-Original Message-
From: Mark Miller [mailto:markrmil...@gmail.com]
Sent: Wednesday, June 17, 2009 7:13 PM
To: solr-user@lucene.apache.org
Subject: Re: FilterCache issue

It's been a while since I've thought about this sort of thing, but it
looks like your cache is way too small and things get evicted before
being used. How many uniques are you faceting on? 512 is the number of
cache entries, not the size in KB/MB.

Try raising it - perhaps a lot ;) But consider that you have to have the
RAM to accommodate as well ...

What version of Solr are you using?

--
- Mark

http://www.lucidimagination.com



Manepalli, Kalyan wrote:


Hi,
   I am seeing an issue with the filtercache setting on my  
solr app which is causing slower faceting.


Here is the configuration:
<filterCache size="512" initialSize="512" autowarmCount="256"/>


Statistics:
description:  LRU Cache(maxSize=512, initialSize=512,  
autowarmCount=256,  
regenerator=org.apache.solr.search.solrindexsearche...@8d41f2)

stats: lookups : 979692
hits : 6904
hitratio : 0.00
inserts : 973531
evictions : 972978
size : 512
warmupTime : 1479
cumulative_lookups : 61660491
cumulative_hits : 516057
cumulative_hitratio : 0.00
cumulative_inserts : 61170111
cumulative_evictions : 61153787

As we can see, the cache hit ratio is almost zero. How do I improve
the filter cache?
Also, I wanted to know what the size means. Is it the number of
documents or the memory size (KB/MB)?


Any suggestions in this regard will be very helpful.

Thanks,
Kalyan Manepalli


--
- Mark

http://www.lucidimagination.com





--
Grant Ingersoll
http://www.lucidimagination.com/

Search the Lucene ecosystem (Lucene/Solr/Nutch/Mahout/Tika/Droids)  
using Solr/Lucene:

http://www.lucidimagination.com/search



Re: FilterCache issue

2009-06-18 Thread Mark Miller
That's why I asked about multi-valued terms. If he's not using the enum
faceting method (which only makes sense with fewer uniques), and the
fields are not multi-valued, then it is using the FieldCache method,
which of course does use the filterCache, and works best when the
filterCache size is the size of all the unique terms in all the fields
you are faceting on.


Perhaps his machine can't handle that, but certainly he would benefit
heavily from the cache. And if you had tons of uniques and a small
cache, you would see exactly what he is seeing. Probably best to see how
it goes before seeing if you have to optimize based on garbage
collection (and I believe that he could run into resource issues, that's
why I said give it a shot :) ). He may just need more resources than he
can get, but I'm fairly sure he needs the resources. Unless it's
multivalued fields and it's using the UninvertedField - not sure how the
filterCache plays into that.


- Mark

Grant Ingersoll wrote:


On Jun 17, 2009, at 10:32 PM, Mark Miller wrote:

Right, so if you are on 1.3 or early 1.4 dev, with so many uniques, 
you should be using the FieldCache method of faceting. The RAM 
depends on the number of documents and number of uniques terms mostly.


With 1.4 you may be using an UninvertedField though (are your facet 
fields multivalued or tokenized?), and I know much less about that.


I'd try a cache size of 10,000 and see how it goes.


I'm not so sure about that, my guess is your going to get hammered on 
garbage collection when you do commits with something that big.


Let's take a step back.  These are LRU cache's, the fact that you have 
a zero hit ratio does not mean caching isn't working or that you 
necessarily need a bigger cache.  It suggests to me that your 
application is not the type that can benefit from caching of 
filters.   My understanding is that in certain cases with the new 1.4 
faceting, it ends up using the filterCache as well.  I believe the 
admin will give stats on the number of big terms, etc.


Perhaps you can give a bit more detail about your application and why 
you think that cache ratio is causing slower faceting.


Have you actually done some profiling/timings on the faceting?




- Mark

Manepalli, Kalyan wrote:
Got that, if its number of cache entries, definitely its very low. I 
have around 10,000 unique items to facet on. Does the RAM size 
depend on Document size.


Thanks,
Kalyan Manepalli
-Original Message-
From: Mark Miller [mailto:markrmil...@gmail.com]
Sent: Wednesday, June 17, 2009 7:13 PM
To: solr-user@lucene.apache.org
Subject: Re: FilterCache issue

Its been a while since I've thought about this sort of thing, but it
looks like your cache is way too small and things get evicted before
being used. How many uniques are you faceting on? 512 is the number of
cache entries, not the size in kb/mb.

Try raising it - perhaps a lot ;) But consider that you have to have 
the

RAM to accommodate as well ...

What version of Solr are you using?

--
- Mark

http://www.lucidimagination.com



Manepalli, Kalyan wrote:


Hi,
   I am seeing an issue with the filtercache setting on my 
solr app which is causing slower faceting.


Here is the configuration:
<filterCache size="512" initialSize="512" autowarmCount="256"/>


Statistics:
description:  LRU Cache(maxSize=512, initialSize=512, 
autowarmCount=256, 
regenerator=org.apache.solr.search.solrindexsearche...@8d41f2)

stats: lookups : 979692
hits : 6904
hitratio : 0.00
inserts : 973531
evictions : 972978
size : 512
warmupTime : 1479
cumulative_lookups : 61660491
cumulative_hits : 516057
cumulative_hitratio : 0.00
cumulative_inserts : 61170111
cumulative_evictions : 61153787

As we can see the cache hit ratio is almost zero. How do I improve 
the filter cache.
Also wanted to know what does the size mean. Is it number of 
documents or the memory size (kb/mb)


Any suggestions in this regard will be very helpful.

Thanks,
Kalyan Manepalli


--
- Mark

http://www.lucidimagination.com





--
Grant Ingersoll
http://www.lucidimagination.com/

Search the Lucene ecosystem (Lucene/Solr/Nutch/Mahout/Tika/Droids) 
using Solr/Lucene:

http://www.lucidimagination.com/search




--
- Mark

http://www.lucidimagination.com





Does Solr 1.4 really work nicely on Jboss 4?

2009-06-18 Thread Giovanni De Stefano
Hello all,

I have a simple question :-)

In my project it is mandatory to use Jboss 4.0.1 SP3 and Java 1.5.0_06/08.
The software relies on Solr 1.4.

Now, I am aware that some JSP Admin pages will not be displayed due to some
Java5/6 dependency but this is not a problem because rewriting some of the
JSPs it is possible to have everything up and running.

The real question is: is anybody aware of any feature that might not work
when deploying the solr based software in Jboss 4?

I look forward to hearing your experience.

Cheers,
Giovanni


Re: pk vs. uniqueKey with DIH delta-import

2009-06-18 Thread Erik Hatcher


On Jun 18, 2009, at 4:51 AM, Noble Paul നോബിള്‍  
नोब्ळ् wrote:

Apparently the row returns a null 'board_id'.


No.  I'm working with a test database situation with a single record,  
and I simply do a full-import, then change the deleted column to 'Y'  
and try a delta-import.  The deletedPkQuery returns a single result in  
that case, and the NPE came from when I made the query return board-1  
instead of just 1.



Your stack trace suggests this. Even if it is fixed, I guess it may not
work, because you are storing the id as

board-${test.board_id}

and unless your query returns something like board- it may
not work for you.

Anyway, I shall put in a fix in DIH to avoid this NPE.


That fix didn't solve the NPE.  I still get the following stacktrace  
when I have deletedPkQuery="select concat('board-',board_id) from  
boards where deleted = 'Y'".   I presume it's looking for the pk  
column (board_id in my case) in the results of the deletedPkQuery.


SEVERE: Delta Import Failed
java.lang.NullPointerException
        at org.apache.solr.handler.dataimport.SolrWriter.deleteDoc(SolrWriter.java:83)
        at org.apache.solr.handler.dataimport.DocBuilder.deleteAll(DocBuilder.java:289)
        at org.apache.solr.handler.dataimport.DocBuilder.doDelta(DocBuilder.java:247)
        at org.apache.solr.handler.dataimport.DocBuilder.execute(DocBuilder.java:159)
        at org.apache.solr.handler.dataimport.DataImporter.doDeltaImport(DataImporter.java:337)


I changed to deletedPkQuery="select concat('board-',board_id) as  
board_id from boards where deleted = 'Y'" and got no NPE, but I also  
still haven't been able to get DIH to properly remove Solr documents  
that have been flagged as deleted in the database.


Erik




Re: pk vs. uniqueKey with DIH delta-import

2009-06-18 Thread Erik Hatcher


On Jun 18, 2009, at 4:51 AM, Noble Paul നോബിള്‍  
नोब्ळ् wrote:



Apparently the row returns a null 'board_id'.


I replied "No" earlier, but of course you're right here.  The  
deletedPkQuery I originally used was not returning a board_id column.   
And even if it did, that isn't the uniqueKey (id field) value.


What about having the results of the deletedPkQuery run through the
same transformation process that indexing does? Only the field that
matches Solr's uniqueKey setting would be necessary.


Erik



Re: FilterCache issue

2009-06-18 Thread Yonik Seeley
On Thu, Jun 18, 2009 at 8:35 AM, Mark Miller wrote:
> Thats why I asked about multi-valued terms. If hes not using the enum
> faceting method (which only makes sense with fewer uniques), and the fields
> are not multi-valued, than it is using the FieldCache method. Which of
> course does use the filterCache,

The FieldCache method for single-valued fields does not use the
filterCache... that's only for big terms on multi-valued fields.

-Yonik
http://www.lucidimagination.com


Re: FilterCache issue

2009-06-18 Thread Mark Miller

Yonik Seeley wrote:

On Thu, Jun 18, 2009 at 8:35 AM, Mark Miller wrote:
  

Thats why I asked about multi-valued terms. If hes not using the enum
faceting method (which only makes sense with fewer uniques), and the fields
are not multi-valued, than it is using the FieldCache method. Which of
course does use the filterCache,



The FieldCache method for single-valued fields does not use the
filterCache... that's only for big terms on multi-valued fields.

-Yonik
http://www.lucidimagination.com
  

Ah, I think the wiki is incorrect then.

SolrCaching

If you use faceting with the fieldCache method (see SolrFacetingOverview 
), it is recommended 
that you set the filterCache size to be greater than the number of 
unique values in all of your faceted fields.


--
- Mark

http://www.lucidimagination.com





Re: FilterCache issue

2009-06-18 Thread Mark Miller

Mark Miller wrote:

Yonik Seeley wrote:
On Thu, Jun 18, 2009 at 8:35 AM, Mark Miller 
wrote:
 

Thats why I asked about multi-valued terms. If hes not using the enum
faceting method (which only makes sense with fewer uniques), and the 
fields

are not multi-valued, than it is using the FieldCache method. Which of
course does use the filterCache,



The FieldCache method for single-valued fields does not use the
filterCache... that's only for big terms on multi-valued fields.

-Yonik
http://www.lucidimagination.com
  

Ah, I think the wiki is incorrect then.

SolrCaching

If you use faceting with the fieldCache method (see 
SolrFacetingOverview 
), it is recommended 
that you set the filterCache size to be greater than the number of 
unique values in all of your faceted fields.




That's some pretty misleading info. I was wondering how the heck the
filterCache played into counting off a FieldCache.



--
- Mark

http://www.lucidimagination.com





Re: Searching across multivalued fields

2009-06-18 Thread Vicky_Dev

Hi Michael,

We are also facing the same problem mentioned in the post (we are using
the DisMaxRequestHandler):

Ex: There is a product title field with these possible values:
1) in the document with unique key ID = 1000,
the prdTitle_s field contains the value "ladybird classic"

2) in the document with unique key ID = 1001,
the prdTitle_s field contains the value "ladybird"

When we search with q=prdTitle_s:"ladybird"&qt=dismax, we are
getting 2 results -- unique key ID = 1000 and unique key ID = 1001.

Is it possible to get just the exact match, which is nothing but unique key ID = 1001?

Note: by default the mm value is 100%, per the Solr documentation.

~Vikrant





Michael Ludwig-4 wrote:
> 
> MilkDud schrieb:
>> Ok, so lets suppose i did index across just the album.  Using that
>> index, how would I be able to handle searches of the form "artist name
>> track name".
> 
> What does the user interface look like? Do you have separate fields for
> artists and tracks? Or just one field?
> 
>> If i do the search using a phrase query, this won't match anything
>> because the artist and track are not in one field (hence my idea of
>> creating a third concatenated field).
> 
> What do you expect the user to enter?
> 
> * "dream theater innocence faded" - certainly wrong
> * dream theater "innocence faded" - much better
> 
> Use the DisMax query parser to read the query, as I suggested in my
> first reply. You need to become more familiar with the various search
> facilities, that will probably steer your ideas in more promising
> directions. Read up about DisMax.
> 
>> If i make it a non phrase query, itll return albums that have those
>> words across all the tracks, which is not ideal.  I.e. if you search
>> for a track titled "love me" you will get back albums with the words
>> love and me in different tracks.
> 
> That doesn't make sense me to me. Did you inspect your query using
> debugQuery=true as I suggested? What did it boil down to?
> 
>> Basically, i'd like it to look at each track individually
> 
> That tells me you're thinking database and table scan.
> 
>> and if the artist + just one track match all the search terms, then
>> that counts as a match.  Does that make sense?  If i index on the
>> track level, that should work, but then i have to store album/artist
>> info on each track.
> 
> I think the following makes much more sense:
> 
>>> An album should be a document and have the following fields (and
>>> maybe more, if you have more data attached to it):
>>>
>>> id - unique, an identifier
>>> title - album title
>>> interpret - the musician, possibly multi-valued
>>> track - every song or whatever, definitely multi-valued
> 
> Read up about multi-valued fields (sample schema.xml, for example, or
> Google) if you're unsure what this is; your posting subject, however,
> suggests you aren't.
> 
> Regards,
> 
> Michael Ludwig
> 
> 

-- 
View this message in context: 
http://www.nabble.com/Searching-across-multivalued-fields-tp24056297p24093897.html
Sent from the Solr - User mailing list archive at Nabble.com.



Re: Does Solr 1.4 really work nicely on Jboss 4?

2009-06-18 Thread Development Team
Hi Giovanni,

 Solr 1.4 does work fine in JBoss (all of the features, including all of
the admin pages). For example, I am running it in JBoss 4.0.5.GA on JDK
1.5.0_18 without problems. I am also using Jetty instead of Tomcat, however
instructions for getting it to work in JBoss with Tomcat can be found here:
 http://wiki.apache.org/solr/SolrJBoss  It should work fine on JBoss 4.0.1.

- Daryl.


On Thu, Jun 18, 2009 at 8:57 AM, Giovanni De Stefano <
giovanni.destef...@gmail.com> wrote:

> Hello all,
>
> I have a simple question :-)
>
> In my project it is mandatory to use Jboss 4.0.1 SP3 and Java 1.5.0_06/08.
> The software relies on Solr 1.4.
>
> Now, I am aware that some JSP Admin pages will not be displayed due to some
> Java5/6 dependency but this is not a problem because rewriting some of the
> JSPs it is possible to have everything up and running.
>
> The real question is: is anybody aware of any feature that might not work
> when deploying the solr based software in Jboss 4?
>
> I look forward to hearing your experience.
>
> Cheers,
> Giovanni
>


Re: Boost Query effect with Standard Request Handler

2009-06-18 Thread Vicky_Dev

Hi Hossman,

We are also facing a similar issue:

Is there any way to boost fields in the standard query parser itself?

~Vikrant




hossman wrote:
> 
> 
> : The reason I brought the question back up is that hossman said:
>   ...
> : I tried it and it didn't work, so I was curious if I was still doing
> : something wrong.
> 
> no ... i'm just a foolish foolish man who says things with a lot of 
> authority even though i clearly don't know what i'm talking about.
> 
> bq isn't supported directly by SearchHandler, and i need to learn to 
> double check the code before i answer questions about changes i wasn't 
> intimately involved with :)  ... sorry about that.
> 
> 
> -Hoss
> 
> 
> 

-- 
View this message in context: 
http://www.nabble.com/Boost-Query-effect-with-Standard-Request-Handler-tp20042301p24094722.html
Sent from the Solr - User mailing list archive at Nabble.com.



RE: FilterCache issue

2009-06-18 Thread Manepalli, Kalyan
I am faceting on single values only. I ran a load test against the Solr app and
found that under increased load the faceting just gets slower and slower. That
is why I wanted to investigate the filterCache and any other features to tweak the
performance.
As suggested by Mark in the earlier email, I increased the size of the filterCache
and the performance has improved. I need to test further to see the impact on
other areas.

Thanks,
Kalyan Manepalli

-Original Message-
From: Mark Miller [mailto:markrmil...@gmail.com] 
Sent: Thursday, June 18, 2009 9:15 AM
To: solr-user@lucene.apache.org
Subject: Re: FilterCache issue

Mark Miller wrote:
> Yonik Seeley wrote:
>> On Thu, Jun 18, 2009 at 8:35 AM, Mark Miller 
>> wrote:
>>  
>>> Thats why I asked about multi-valued terms. If hes not using the enum
>>> faceting method (which only makes sense with fewer uniques), and the 
>>> fields
>>> are not multi-valued, than it is using the FieldCache method. Which of
>>> course does use the filterCache,
>>> 
>>
>> The FieldCache method for single-valued fields does not use the
>> filterCache... that's only for big terms on multi-valued fields.
>>
>> -Yonik
>> http://www.lucidimagination.com
>>   
> Ah, I think the wiki is incorrect then.
>
> SolrCaching
>
> If you use faceting with the fieldCache method (see 
> SolrFacetingOverview 
> ), it is recommended 
> that you set the filterCache size to be greater than the number of 
> unique values in all of your faceted fields.
>

Thats some pretty misleading info. I was wondering how the heck the 
filterCache played into counting off a FieldCache.


-- 
- Mark

http://www.lucidimagination.com





Re: Searching across multivalued fields

2009-06-18 Thread Michael Ludwig

Hi Vicky,

Vicky_Dev schrieb:

We are also facing same problem mentioned in the post (we are using
dismaxrequesthandler)::



When we are searching for --q=prdTitle_s:"ladybird"&qt=dismax , we are
getting 2 results --  unique key ID =1000 and  unique key ID =1001


(1) Append debugQuery=true to your query and see how the DisMax query
parser rewrites your query, interpreting what you think is a field name
as just another query term.

(2) Proceed immediately to read the whole Wiki page explaining DisMax:

http://wiki.apache.org/solr/DisMaxRequestHandler


Is it possible to just exact match which is nothing but unique key =
1001?


Yes, it is:  q=id:1001

(1) Don't use DisMax here, that will not interpret field names.
(2) Replace "id" by whatever name you gave to your unique key field.
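In SolrJ that would be a minimal sketch along these lines (assuming
your unique key field really is called "id" and "server" is an
initialized SolrServer):

  SolrQuery q = new SolrQuery("id:1001");   // standard query parser, exact key lookup
  QueryResponse rsp = server.query(q);
  SolrDocumentList docs = rsp.getResults(); // at most one document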

Michael Ludwig


Re: Boost Query effect with Standard Request Handler

2009-06-18 Thread Erik Hatcher


On Jun 18, 2009, at 10:54 AM, Vicky_Dev wrote:

is there any way to boost fields in standard query parser itself?


You can boost terms using field:term^2.0 syntax

See http://wiki.apache.org/solr/SolrQuerySyntax and down into
http://lucene.apache.org/java/2_4_0/queryparsersyntax.html for more details.
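
For example (the field names are only placeholders for whatever is in
your schema, and "server" an initialized SolrServer), with the standard
parser you can weight individual clauses:

  // matches on the title field count twice as much as matches on the text field
  SolrQuery q = new SolrQuery("prdTitle_s:ladybird^2.0 text:ladybird");
  QueryResponse rsp = server.query(q);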


Erik



Re: FilterCache issue

2009-06-18 Thread Yonik Seeley
On Thu, Jun 18, 2009 at 10:59 AM, Manepalli,
Kalyan wrote:
> I am faceting on the single values only.

You may have only added a single value to each field, but is the field
defined to be single valued or multi valued?

Also, what version of Solr are you using?

-Yonik
http://www.lucidimagination.com


Re: Solr Jetty confusion

2009-06-18 Thread Development Team
Hey,
 So... I'm assuming your problem is that you're having trouble deploying
Solr in Jetty? Or is your problem that it's deploying just fine but your
code throws an exception when you try to run it?
 I am running Solr in Jetty, and I just copied the war into the webapps
directory and it worked. It was accessible under /solr, and it was
accessible under the port that Jetty has as its HTTP listener (which is
probably 8080 by default, but probably won't be 8983). To specify the
solr-home I use a Java system property (instead of the JNDI way) since I
already have other necessary system properties for my apps. So if your
problem turns out to be with the JNDI, sorry I won't be of much help.
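 If you do want to stay with embedded Jetty rather than a jetty.xml, a
rough, untested sketch against the Jetty 6 (org.mortbay) API would look
something like this - the paths, port and context path are placeholders:

  System.setProperty("solr.solr.home", "/path/to/solr/home");
  Server server = new Server(8080);          // org.mortbay.jetty.Server
  WebAppContext solr = new WebAppContext();  // org.mortbay.jetty.webapp.WebAppContext
  solr.setContextPath("/solr");
  solr.setWar("/path/to/solr.war");
  server.setHandler(solr);
  server.start();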
 Hope that helps...

- Daryl.


On Thu, Jun 18, 2009 at 2:44 AM, pof  wrote:

>
> Hi, I am currently trying to write a Jetty-embedded Java app that
> implements
> Solr and uses SolrJ by accepting posts telling it to do a batch index, or a
> deletion or what have you. At this point I am completely lost trying to
> follow http://wiki.apache.org/solr/SolrJetty . In my constructor I am
> doing
> the following call:
>
> Server server = new Server();
> XmlConfiguration configuration = new XmlConfiguration(new
> FileInputStream("solrjetty.xml"));
>
> My xml has two calls, an addConnector to configure the port etc. and the
> addWebApplication as specified on the solr wiki. When running the app I get
> this:
>
> Exception in thread "main" java.lang.IllegalStateException: No Method:
>  name="addWebApplication">/solr/*/webapps/solr.war name="extractWAR">true
> name="defaultsDescriptor">org/mortbay/jetty/servlet/webdefault.xml name="addEnvEntry">/solr/home type="String">/solr/home on class
> org.mortbay.jetty.Server
>
> Can anyone point me in the right direction? Thanks.
> --
> View this message in context:
> http://www.nabble.com/Solr-Jetty-confusion-tp24087264p24087264.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>
>


PlainTextEntitiyProcessor not putting any text into a field in index

2009-06-18 Thread Jay Hill
I'm having some trouble getting the PlainTextEntityProcessor to populate a
field in an index. I'm using the TemplateTransformer to fill 2 fields, and
have a timestamp field in schema.xml, and these fields make it into the
index. Only the plainText data is missing. Here is my configuration:

<dataConfig>
  <dataSource ... />   <!-- element stripped by the archive -->
  <document>
    <entity name="f"
        processor="FileListEntityProcessor"
        baseDir="/Users/jayhill/test/dir"
        fileName=".*txt"
        recursive="true"
        rootEntity="true">

      <entity name="pt"
          processor="PlainTextEntityProcessor"
          url="${f.fileAbsolutePath}"
          transformer="RegexTransformer,TemplateTransformer">
        <field ... />   <!-- two field mappings stripped by the archive -->
        <field ... />
      </entity>

    </entity>
  </document>
</dataConfig>
I've tried adding "plainText" as a field in schema.xml, but that didn't work
either.

When I look at what the PlainTextEntityProcessor class is doing I see that
it has correctly parsed the file and has the text in a StringWriter:
row.put(PLAIN_TEXT, sw.toString());
I just don't know how to get that text into a field in the index

Any pointers appreciated.

-Jay


Numerical range faceting

2009-06-18 Thread gwk

Hi,

I'm currently using facet.query to do my numerical range faceting. I
basically use a fixed price range of €0 to €10,000 in steps of €500, which
means 20 facet.queries, plus an extra facet.query for anything above
€10,000. I use the inclusive/exclusive query as per my question two days
ago so the facets add up to the total number of products. This is done
so that the javascript on my search page can accurately show the number
of products returned for a specified range before submitting it to the
server, by adding up the facet counts for the selected range.
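
If it helps, here is a rough SolrJ sketch of how such a fixed set of
range facet.queries can be generated; the field name "price" and the
"server" variable are assumptions, and the exact inclusive/exclusive
bounds from my earlier question are left out:

  SolrQuery q = new SolrQuery("*:*");
  q.setFacet(true);
  // 20 buckets of 500 covering 0 - 10,000
  for (int lower = 0; lower < 10000; lower += 500) {
      q.addFacetQuery("price:[" + lower + " TO " + (lower + 500) + "]");
  }
  q.addFacetQuery("price:[10000 TO *]");  // everything above the last bucket
  QueryResponse rsp = server.query(q);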


I'm a bit concerned about the number and size of my requests to the
server, especially because there are other numerical values which might
be interesting to facet on, and I've noticed the server won't respond
correctly if I add (many) more facet.queries by decreasing the step
size. I was really hoping for faceting options for numerical ranges
similar to the date faceting options. The functionality would be
practically identical as far as I can tell (which isn't very far, as I
know very little about the internals of Solr), so I was wondering if such
options are planned or if I'm overlooking something.


Regards,

gwk


Can I use the same index from 1.2.0 to 1.3.0?

2009-06-18 Thread Francis Yakin


Can I transport the index from Solr 1.2 to Solr 1.3 without
resubmitting/reloading everything again from the database?

Francis




RE: FilterCache issue

2009-06-18 Thread Manepalli, Kalyan
The fields are defined as single-valued and they are non-tokenized.
I am using Solr 1.3, waiting for the release of Solr 1.4.

Thanks,
Kalyan Manepalli
-Original Message-
From: ysee...@gmail.com [mailto:ysee...@gmail.com] On Behalf Of Yonik Seeley
Sent: Thursday, June 18, 2009 10:15 AM
To: solr-user@lucene.apache.org
Subject: Re: FilterCache issue

On Thu, Jun 18, 2009 at 10:59 AM, Manepalli,
Kalyan wrote:
> I am faceting on the single values only.

You may have only added a single value to each field, but is the field
defined to be single valued or multi valued?

Also, what version of Solr are you using?

-Yonik
http://www.lucidimagination.com


Re: FilterCache issue

2009-06-18 Thread Yonik Seeley
On Thu, Jun 18, 2009 at 12:19 PM, Manepalli,
Kalyan wrote:
> The fields are defined as single valued and they are non tokenized for.
> I am using solr 1.3 waiting for release of solr 1.4.

Then the filterCache won't be used for faceting, just for filters.
You should be able to verify this by looking at how the cache stats
change for a single faceting request.

-Yonik
http://www.lucidimagination.com



> Thanks,
> Kalyan Manepalli
> -Original Message-
> From: ysee...@gmail.com [mailto:ysee...@gmail.com] On Behalf Of Yonik Seeley
> Sent: Thursday, June 18, 2009 10:15 AM
> To: solr-user@lucene.apache.org
> Subject: Re: FilterCache issue
>
> On Thu, Jun 18, 2009 at 10:59 AM, Manepalli,
> Kalyan wrote:
>> I am faceting on the single values only.
>
> You may have only added a single value to each field, but is the field
> defined to be single valued or multi valued?
>
> Also, what version of Solr are you using?
>
>


Re: PlainTextEntitiyProcessor not putting any text into a field in index

2009-06-18 Thread Noble Paul നോബിള്‍ नोब्ळ्
Can you just log it and see what is contained in the plainText field
(using LogTransformer)?

On Thu, Jun 18, 2009 at 8:54 PM, Jay Hill wrote:
> I'm having some trouble getting the PlainTextEntityProcessor to populate a
> field in an index. I'm using the TemplateTransformer to fill 2 fields, and
> have a timestamp field in schema.xml, and these fields make it into the
> index. Only the plaintText data is missing. Here is my configuration:
>
> <dataConfig>
>   <dataSource ... />   <!-- element stripped by the archive -->
>   <document>
>     <entity name="f"
>         processor="FileListEntityProcessor"
>         baseDir="/Users/jayhill/test/dir"
>         fileName=".*txt"
>         recursive="true"
>         rootEntity="true">
>
>       <entity name="pt"
>           processor="PlainTextEntityProcessor"
>           url="${f.fileAbsolutePath}"
>           transformer="RegexTransformer,TemplateTransformer">
>         <field ... />   <!-- two field mappings stripped by the archive -->
>         <field ... />
>       </entity>
>
>     </entity>
>   </document>
> </dataConfig>
>
> I've tried adding "plainText" as a field in schema.xml, but that didn't work
> either.
>
> When I look at what the PlainTextEntityProcessor class is doing I see that
> it has correctly parsed the file and has the text in a StringWriter:
>    row.put(PLAIN_TEXT, sw.toString());
> I just don't know how to get that text into a field in the index
>
> Any pointers appreciated.
>
> -Jay
>



-- 
-
Noble Paul | Principal Engineer| AOL | http://aol.com


Re: FilterCache issue

2009-06-18 Thread Mark Miller

Maybe he is not using the FieldCache method?

Yonik Seeley wrote:

On Thu, Jun 18, 2009 at 12:19 PM, Manepalli,
Kalyan wrote:
  

The fields are defined as single valued and they are non tokenized for.
I am using solr 1.3 waiting for release of solr 1.4.



Then the filterCache won't be used for faceting, just for filters.
You should be able to verify this by looking at how the cache stats
change for a single faceting request.

-Yonik
http://www.lucidimagination.com



  

Thanks,
Kalyan Manepalli
-Original Message-
From: ysee...@gmail.com [mailto:ysee...@gmail.com] On Behalf Of Yonik Seeley
Sent: Thursday, June 18, 2009 10:15 AM
To: solr-user@lucene.apache.org
Subject: Re: FilterCache issue

On Thu, Jun 18, 2009 at 10:59 AM, Manepalli,
Kalyan wrote:


I am faceting on the single values only.
  

You may have only added a single value to each field, but is the field
defined to be single valued or multi valued?

Also, what version of Solr are you using?






--
- Mark

http://www.lucidimagination.com





Re: SolrJ: Highlighting not Working

2009-06-18 Thread Mark Miller

Why do you have:
query.set("hl.maxAnalyzedChars", -1);

Have you tried using the default? Unless -1 is an undoc'd feature, this 
means you wouldn't get anything back! This should normally be a fairly
hefty value and defaults to 51200, according to the wiki.


And why:
query.set("hl.fragsize", 1);

That means a fragment could only be 1 char - again, I'd try the default 
(take out the param), and adjust from there.

(wiki says the default is 100).
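
As a starting point, something like the following - just the wiki
defaults, not values tuned for your data - is usually safer than -1 and 1:

  query.setHighlight(true);
  query.addHighlightField(LOG_FIELD);
  query.set("hl.snippets", 100);
  query.set("hl.fragsize", 100);            // default fragment size
  query.set("hl.maxAnalyzedChars", 51200);  // default per the wiki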

Let us know how it goes.

--
- Mark

http://www.lucidimagination.com



Bruno wrote:

Hi guys.
 
I'm new at using highlighting, so probably I'm making some stupid
mistake; however, I'm not finding anything wrong.

I use highlighting from a query within an EmbeddedSolrServer, and
within the query I've set the parameters necessary for enabling
highlighting. Attached are my schema and solrconfig.xml, and
down below follows the Java code. Content from the SolrDocumentList is
not highlighted.
 


EmbeddedSolrServer server = SolrServerManager.getServerEv();
String queryString = filter;
SolrQuery query = new SolrQuery();

query.setQuery(queryString);
query.setHighlight(true);
query.addHighlightField(LOG_FIELD);
query.setHighlightSimplePost("");
query.setHighlightSimplePre("");
query.set("hl.usePhraseHighlighter", true);
query.set("hl.highlightMultiTerm", true);
query.set("hl.snippets", 100);
query.set("hl.fragsize", 1);
query.set("hl.mergeContiguous", false);
query.set("hl.requireFieldMatch", false);
query.set("hl.maxAnalyzedChars", -1);

query.addSortField(DATE_FIELD, SolrQuery.ORDER.asc);
query.setFacetLimit(LogUtilProperties.getInstance().getProperty(LogUtilProperties.LOGEVENT_SEARCH_RESULT_SIZE, 1));
query.setRows(LogUtilProperties.getInstance().getProperty(LogUtilProperties.LOGEVENT_SEARCH_RESULT_SIZE, 1));

query.setIncludeScore(true);
QueryResponse rsp = server.query(query);
SolrDocumentList docs = rsp.getResults();

--
Bruno Morelli Vargas  
Mail: brun...@gmail.com 

Msn: brun...@hotmail.com 
Icq: 165055101
Skype: morellibmv







Re: SolrJ: Highlighting not Working

2009-06-18 Thread Bruno
I've tried with the default values and it didn't work either.


On Thu, Jun 18, 2009 at 2:31 PM, Mark Miller  wrote:

> Why do you have:
> query.set("hl.maxAnalyzedChars", -1);
>
> Have you tried using the default? Unless -1 is an undoc'd feature, this
> means you wouldnt get anything back! This should normally be a fairly hefty
> value and defaults to 51200, according to the wiki.
>
> And why:
> query.set("hl.fragsize", 1);
>
> That means a fragment could only be 1 char - again, I'd try the default
> (take out the param), and adjust from there.
> (wiki says the default is 100).
>
> Let us know how it goes.
>
> --
> - Mark
>
> http://www.lucidimagination.com
>
>
>
> Bruno wrote:
>
>>  Hi guys.
>>  I new at using highlighting, so probably I'm making some stupid mistake,
>> however I'm not founding anything wrong.
>>  I use highlighting from a query withing a EmbeddedSolrServer, and within
>> the query I've set parameters necessary for enabling highlighting. Attached,
>> follows my schema and solrconfig.xml , and down below follows the Java code.
>> Content from the SolrDocumentList is not highlighted.
>>
>> EmbeddedSolrServer server = SolrServerManager./getServerEv/();
>> String queryString = filter;
>> SolrQuery query =
>>
>> *new* SolrQuery();
>>
>> query.setQuery(queryString);
>> query.setHighlight(*true*);
>> query.addHighlightField(/LOG_FIELD/);
>> query.setHighlightSimplePost("");
>> query.setHighlightSimplePre("");
>> query.set("hl.usePhraseHighlighter", *true*);
>> query.set("hl.highlightMultiTerm", *true*);
>> query.set("hl.snippets", 100);
>> query.set("hl.fragsize", 1);
>> query.set("hl.mergeContiguous", *false*);
>> query.set("hl.requireFieldMatch", *false*);
>> query.set("hl.maxAnalyzedChars", -1);
>>
>> query.addSortField(/DATE_FIELD/, SolrQuery.ORDER./asc/);
>> query.setFacetLimit(LogUtilProperties./getInstance/().getProperty(LogUtilProperties./LOGEVENT_SEARCH_RESULT_SIZE/,
>> 1));
>> query.setRows(LogUtilProperties./getInstance/().getProperty(LogUtilProperties./LOGEVENT_SEARCH_RESULT_SIZE/,
>> 1));
>> query.setIncludeScore(*true*);
>> QueryResponse rsp = server.query(query);
>> SolrDocumentList docs = rsp.getResults();
>>
>> --
>> Bruno Morelli Vargas  Mail: brun...@gmail.com > brun...@gmail.com>
>> Msn: brun...@hotmail.com 
>> Icq: 165055101
>> Skype: morellibmv
>>
>>
>
>
>


-- 
Bruno Morelli Vargas
Mail: brun...@gmail.com
Msn: brun...@hotmail.com
Icq: 165055101
Skype: morellibmv


Re: FilterCache issue

2009-06-18 Thread Yonik Seeley
On Thu, Jun 18, 2009 at 1:22 PM, Mark Miller wrote:
> Maybe he is not using the FieldCache method?

It occurs to me that this might be nice info to add to debugging info
(the exact method used + perhaps some other info).

-Yonik
http://www.lucidimagination.com


Re: SolrJ: Highlighting not Working

2009-06-18 Thread Bruno
A couple of things I forgot to mention:

Solr Version: 1.3
Enviroment: Websphere

On Thu, Jun 18, 2009 at 2:34 PM, Bruno  wrote:

> I've tried with default values and didn't work either.
>
>
> On Thu, Jun 18, 2009 at 2:31 PM, Mark Miller wrote:
>
>> Why do you have:
>> query.set("hl.maxAnalyzedChars", -1);
>>
>> Have you tried using the default? Unless -1 is an undoc'd feature, this
>> means you wouldnt get anything back! This should normally be a fairly hefty
>> value and defaults to 51200, according to the wiki.
>>
>> And why:
>> query.set("hl.fragsize", 1);
>>
>> That means a fragment could only be 1 char - again, I'd try the default
>> (take out the param), and adjust from there.
>> (wiki says the default is 100).
>>
>> Let us know how it goes.
>>
>> --
>> - Mark
>>
>> http://www.lucidimagination.com
>>
>>
>>
>> Bruno wrote:
>>
>>>  Hi guys.
>>>  I new at using highlighting, so probably I'm making some stupid mistake,
>>> however I'm not founding anything wrong.
>>>  I use highlighting from a query withing a EmbeddedSolrServer, and within
>>> the query I've set parameters necessary for enabling highlighting. Attached,
>>> follows my schema and solrconfig.xml , and down below follows the Java code.
>>> Content from the SolrDocumentList is not highlighted.
>>>
>>> EmbeddedSolrServer server = SolrServerManager./getServerEv/();
>>> String queryString = filter;
>>> SolrQuery query =
>>>
>>> *new* SolrQuery();
>>>
>>> query.setQuery(queryString);
>>> query.setHighlight(*true*);
>>> query.addHighlightField(/LOG_FIELD/);
>>> query.setHighlightSimplePost("");
>>> query.setHighlightSimplePre("");
>>> query.set("hl.usePhraseHighlighter", *true*);
>>> query.set("hl.highlightMultiTerm", *true*);
>>> query.set("hl.snippets", 100);
>>> query.set("hl.fragsize", 1);
>>> query.set("hl.mergeContiguous", *false*);
>>> query.set("hl.requireFieldMatch", *false*);
>>> query.set("hl.maxAnalyzedChars", -1);
>>>
>>> query.addSortField(/DATE_FIELD/, SolrQuery.ORDER./asc/);
>>> query.setFacetLimit(LogUtilProperties./getInstance/().getProperty(LogUtilProperties./LOGEVENT_SEARCH_RESULT_SIZE/,
>>> 1));
>>> query.setRows(LogUtilProperties./getInstance/().getProperty(LogUtilProperties./LOGEVENT_SEARCH_RESULT_SIZE/,
>>> 1));
>>> query.setIncludeScore(*true*);
>>> QueryResponse rsp = server.query(query);
>>> SolrDocumentList docs = rsp.getResults();
>>>
>>> --
>>> Bruno Morelli Vargas  Mail: brun...@gmail.com >> brun...@gmail.com>
>>> Msn: brun...@hotmail.com 
>>> Icq: 165055101
>>> Skype: morellibmv
>>>
>>>
>>
>>
>>
>
>
> --
> Bruno Morelli Vargas
> Mail: brun...@gmail.com
> Msn: brun...@hotmail.com
> Icq: 165055101
> Skype: morellibmv
>
>


-- 
Bruno Morelli Vargas
Mail: brun...@gmail.com
Msn: brun...@hotmail.com
Icq: 165055101
Skype: morellibmv


Re: SolrJ: Highlighting not Working

2009-06-18 Thread Mark Miller

Nothing off the top of my head ...

I can play around with some of the solrj unit tests a bit later and 
perhaps see if I can dig anything up.


Note:
if you expect wildcard/prefix/etc queries to highlight, they will not 
with Solr 1.3.


query.set("hl.highlightMultiTerm", *true*);

The above only applies to solr 1.4.
So if your query is just a wildcard ...

What is your query, by the way?


--
- Mark

http://www.lucidimagination.com



Bruno wrote:

Couple of things I've forgot to mention:

Solr Version: 1.3
Enviroment: Websphere

On Thu, Jun 18, 2009 at 2:34 PM, Bruno  wrote:

  

I've tried with default values and didn't work either.


On Thu, Jun 18, 2009 at 2:31 PM, Mark Miller wrote:



Why do you have:
query.set("hl.maxAnalyzedChars", -1);

Have you tried using the default? Unless -1 is an undoc'd feature, this
means you wouldnt get anything back! This should normally be a fairly hefty
value and defaults to 51200, according to the wiki.

And why:
query.set("hl.fragsize", 1);

That means a fragment could only be 1 char - again, I'd try the default
(take out the param), and adjust from there.
(wiki says the default is 100).

Let us know how it goes.

--
- Mark

http://www.lucidimagination.com



Bruno wrote:

  

 Hi guys.
 I new at using highlighting, so probably I'm making some stupid mistake,
however I'm not founding anything wrong.
 I use highlighting from a query withing a EmbeddedSolrServer, and within
the query I've set parameters necessary for enabling highlighting. Attached,
follows my schema and solrconfig.xml , and down below follows the Java code.
Content from the SolrDocumentList is not highlighted.

EmbeddedSolrServer server = SolrServerManager./getServerEv/();
String queryString = filter;
SolrQuery query =

*new* SolrQuery();

query.setQuery(queryString);
query.setHighlight(*true*);
query.addHighlightField(/LOG_FIELD/);
query.setHighlightSimplePost("");
query.setHighlightSimplePre("");
query.set("hl.usePhraseHighlighter", *true*);
query.set("hl.highlightMultiTerm", *true*);
query.set("hl.snippets", 100);
query.set("hl.fragsize", 1);
query.set("hl.mergeContiguous", *false*);
query.set("hl.requireFieldMatch", *false*);
query.set("hl.maxAnalyzedChars", -1);

query.addSortField(/DATE_FIELD/, SolrQuery.ORDER./asc/);
query.setFacetLimit(LogUtilProperties./getInstance/().getProperty(LogUtilProperties./LOGEVENT_SEARCH_RESULT_SIZE/,
1));
query.setRows(LogUtilProperties./getInstance/().getProperty(LogUtilProperties./LOGEVENT_SEARCH_RESULT_SIZE/,
1));
query.setIncludeScore(*true*);
QueryResponse rsp = server.query(query);
SolrDocumentList docs = rsp.getResults();

--
Bruno Morelli Vargas  Mail: brun...@gmail.com 
Msn: brun...@hotmail.com 
Icq: 165055101
Skype: morellibmv





  

--
Bruno Morelli Vargas
Mail: brun...@gmail.com
Msn: brun...@hotmail.com
Icq: 165055101
Skype: morellibmv






  






Re: SolrJ: Highlighting not Working

2009-06-18 Thread Erik Hatcher
Note that highlighting is NOT part of the document list returned.   
It's in an additional NamedList section of the response (with  
name="highlighting")

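With SolrJ that section is exposed through QueryResponse.getHighlighting().
A minimal sketch - assuming your uniqueKey field is called "id", the
highlighted field is named "log", and rsp is the QueryResponse from your
existing code:

  Map<String, Map<String, List<String>>> hl = rsp.getHighlighting();
  for (SolrDocument doc : rsp.getResults()) {
      String id = String.valueOf(doc.getFieldValue("id"));
      Map<String, List<String>> perField = hl.get(id);
      List<String> snippets = (perField == null) ? null : perField.get("log");
      System.out.println(id + " -> " + snippets);
  }
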

Erik

On Jun 18, 2009, at 1:22 PM, Bruno wrote:


Hi guys.

I new at using highlighting, so probably I'm making some stupid  
mistake, however I'm not founding anything wrong.


I use highlighting from a query withing a EmbeddedSolrServer, and  
within the query I've set parameters necessary for enabling  
highlighting. Attached, follows my schema and solrconfig.xml , and  
down below follows the Java code. Content from the SolrDocumentList  
is not highlighted.


EmbeddedSolrServer server = SolrServerManager.getServerEv();
String queryString = filter;
SolrQuery query =

new SolrQuery();

query.setQuery(queryString);
query.setHighlight(true);
query.addHighlightField(LOG_FIELD);
query.setHighlightSimplePost("");
query.setHighlightSimplePre("");
query.set("hl.usePhraseHighlighter", true);
query.set("hl.highlightMultiTerm", true);
query.set("hl.snippets", 100);
query.set("hl.fragsize", 1);
query.set("hl.mergeContiguous", false);
query.set("hl.requireFieldMatch", false);
query.set("hl.maxAnalyzedChars", -1);

query.addSortField(DATE_FIELD, SolrQuery.ORDER.asc);
query 
.setFacetLimit 
(LogUtilProperties 
.getInstance 
().getProperty(LogUtilProperties.LOGEVENT_SEARCH_RESULT_SIZE, 1));
query 
.setRows 
(LogUtilProperties 
.getInstance 
().getProperty(LogUtilProperties.LOGEVENT_SEARCH_RESULT_SIZE, 1));

query.setIncludeScore(true);
QueryResponse rsp = server.query(query);
SolrDocumentList docs = rsp.getResults();

--
Bruno Morelli Vargas
Mail: brun...@gmail.com
Msn: brun...@hotmail.com
Icq: 165055101
Skype: morellibmv






Re: SolrJ: Highlighting not Working

2009-06-18 Thread Bruno
Here is the query, searching for the term "ipod" in the "log" field:
q=log%3Aipod+AND+requestid%3A1029+AND+logfilename%3Apayxdev-1245272062125-USS.log.zip&hl=true&hl.fl=log&hl.fl=message&hl.simple.post=%3Ci%3E&hl.simple.pre=%3C%2Fi%3E&hl.usePhraseHighlighter=true&hl.highlightMultiTerm=true&hl.snippets=100&hl.fragsize=100&hl.mergeContiguous=false&hl.requireFieldMatch=false&hl.maxAnalyzedChars=-1&sort=timestamp+asc&facet.limit=6000&rows=6000&fl=score

On Thu, Jun 18, 2009 at 2:51 PM, Mark Miller  wrote:

> Nothing off the top of my head ...
>
> I can play around with some of the solrj unit tests a bit later and perhaps
> see if I can dig anything up.
>
> Note:
> if you expect wildcard/prefix/etc queries to highlight, they will not with
> Solr 1.3.
>
> query.set("hl.highlightMultiTerm", *true*);
>
> The above only applies to solr 1.4.
> So if your query is just a wildcard ...
>
> What is your query, by the way?
>
>
>
> --
> - Mark
>
> http://www.lucidimagination.com
>
>
>
> Bruno wrote:
>
>> Couple of things I've forgot to mention:
>>
>> Solr Version: 1.3
>> Enviroment: Websphere
>>
>> On Thu, Jun 18, 2009 at 2:34 PM, Bruno  wrote:
>>
>>
>>
>>> I've tried with default values and didn't work either.
>>>
>>>
>>> On Thu, Jun 18, 2009 at 2:31 PM, Mark Miller >> >wrote:
>>>
>>>
>>>
 Why do you have:
 query.set("hl.maxAnalyzedChars", -1);

 Have you tried using the default? Unless -1 is an undoc'd feature, this
 means you wouldnt get anything back! This should normally be a fairly
 hefty
 value and defaults to 51200, according to the wiki.

 And why:
 query.set("hl.fragsize", 1);

 That means a fragment could only be 1 char - again, I'd try the default
 (take out the param), and adjust from there.
 (wiki says the default is 100).

 Let us know how it goes.

 --
 - Mark

 http://www.lucidimagination.com



 Bruno wrote:



>  Hi guys.
>  I new at using highlighting, so probably I'm making some stupid
> mistake,
> however I'm not founding anything wrong.
>  I use highlighting from a query withing a EmbeddedSolrServer, and
> within
> the query I've set parameters necessary for enabling highlighting.
> Attached,
> follows my schema and solrconfig.xml , and down below follows the Java
> code.
> Content from the SolrDocumentList is not highlighted.
>
> EmbeddedSolrServer server = SolrServerManager.getServerEv();
> String queryString = filter;
> SolrQuery query = new SolrQuery();
>
> query.setQuery(queryString);
> query.setHighlight(true);
> query.addHighlightField(LOG_FIELD);
> query.setHighlightSimplePost("");
> query.setHighlightSimplePre("");
> query.set("hl.usePhraseHighlighter", true);
> query.set("hl.highlightMultiTerm", true);
> query.set("hl.snippets", 100);
> query.set("hl.fragsize", 1);
> query.set("hl.mergeContiguous", false);
> query.set("hl.requireFieldMatch", false);
> query.set("hl.maxAnalyzedChars", -1);
>
> query.addSortField(DATE_FIELD, SolrQuery.ORDER.asc);
>
> query.setFacetLimit(LogUtilProperties.getInstance().getProperty(LogUtilProperties.LOGEVENT_SEARCH_RESULT_SIZE, 1));
> query.setRows(LogUtilProperties.getInstance().getProperty(LogUtilProperties.LOGEVENT_SEARCH_RESULT_SIZE, 1));
> query.setIncludeScore(true);
> QueryResponse rsp = server.query(query);
> SolrDocumentList docs = rsp.getResults();
>
> --
> Bruno Morelli Vargas  Mail: brun...@gmail.com  brun...@gmail.com>
> Msn: brun...@hotmail.com 
> Icq: 165055101
> Skype: morellibmv
>
>
>
>



>>> --
>>> Bruno Morelli Vargas
>>> Mail: brun...@gmail.com
>>> Msn: brun...@hotmail.com
>>> Icq: 165055101
>>> Skype: morellibmv
>>>
>>>
>>>
>>>
>>
>>
>>
>>
>
>
>
>


-- 
Bruno Morelli Vargas
Mail: brun...@gmail.com
Msn: brun...@hotmail.com
Icq: 165055101
Skype: morellibmv


Re: SolrJ: Highlighting not Working

2009-06-18 Thread Bruno
I've checked the NamedList you told me about, but it contains only one
highlighted doc, when there are more docs that should be highlighted.

On Thu, Jun 18, 2009 at 3:03 PM, Erik Hatcher wrote:

> Note that highlighting is NOT part of the document list returned.  It's in
> an additional NamedList section of the response (with name="highlighting")
>
>Erik
>
>
> On Jun 18, 2009, at 1:22 PM, Bruno wrote:
>
>  Hi guys.
>>
>> I new at using highlighting, so probably I'm making some stupid mistake,
>> however I'm not founding anything wrong.
>>
>> I use highlighting from a query withing a EmbeddedSolrServer, and within
>> the query I've set parameters necessary for enabling highlighting. Attached,
>> follows my schema and solrconfig.xml , and down below follows the Java code.
>> Content from the SolrDocumentList is not highlighted.
>>
>> EmbeddedSolrServer server = SolrServerManager.getServerEv();
>> String queryString = filter;
>> SolrQuery query = new SolrQuery();
>>
>> query.setQuery(queryString);
>> query.setHighlight(true);
>> query.addHighlightField(LOG_FIELD);
>> query.setHighlightSimplePost("");
>> query.setHighlightSimplePre("");
>> query.set("hl.usePhraseHighlighter", true);
>> query.set("hl.highlightMultiTerm", true);
>> query.set("hl.snippets", 100);
>> query.set("hl.fragsize", 1);
>> query.set("hl.mergeContiguous", false);
>> query.set("hl.requireFieldMatch", false);
>> query.set("hl.maxAnalyzedChars", -1);
>>
>> query.addSortField(DATE_FIELD, SolrQuery.ORDER.asc);
>> query.setFacetLimit(LogUtilProperties.getInstance().getProperty(LogUtilProperties.LOGEVENT_SEARCH_RESULT_SIZE,
>> 1));
>> query.setRows(LogUtilProperties.getInstance().getProperty(LogUtilProperties.LOGEVENT_SEARCH_RESULT_SIZE,
>> 1));
>> query.setIncludeScore(true);
>> QueryResponse rsp = server.query(query);
>> SolrDocumentList docs = rsp.getResults();
>>
>> --
>> Bruno Morelli Vargas
>> Mail: brun...@gmail.com
>> Msn: brun...@hotmail.com
>> Icq: 165055101
>> Skype: morellibmv
>>
>> 
>>
>
>


-- 
Bruno Morelli Vargas
Mail: brun...@gmail.com
Msn: brun...@hotmail.com
Icq: 165055101
Skype: morellibmv


RE: FilterCache issue

2009-06-18 Thread Manepalli, Kalyan
Mark,
Where do we specify the method? fieldCache or otherwise

Thanks,
Kalyan Manepalli

-Original Message-
From: Mark Miller [mailto:markrmil...@gmail.com]
Sent: Thursday, June 18, 2009 12:22 PM
To: solr-user@lucene.apache.org
Subject: Re: FilterCache issue

Maybe he is not using the FieldCache method?

Yonik Seeley wrote:
> On Thu, Jun 18, 2009 at 12:19 PM, Manepalli,
> Kalyan wrote:
>
>> The fields are defined as single valued and they are non tokenized for.
>> I am using solr 1.3 waiting for release of solr 1.4.
>>
>
> Then the filterCache won't be used for faceting, just for filters.
> You should be able to verify this by looking at how the cache stats
> change for a single faceting request.
>
> -Yonik
> http://www.lucidimagination.com
>
>
>
>
>> Thanks,
>> Kalyan Manepalli
>> -Original Message-
>> From: ysee...@gmail.com [mailto:ysee...@gmail.com] On Behalf Of Yonik Seeley
>> Sent: Thursday, June 18, 2009 10:15 AM
>> To: solr-user@lucene.apache.org
>> Subject: Re: FilterCache issue
>>
>> On Thu, Jun 18, 2009 at 10:59 AM, Manepalli,
>> Kalyan wrote:
>>
>>> I am faceting on the single values only.
>>>
>> You may have only added a single value to each field, but is the field
>> defined to be single valued or multi valued?
>>
>> Also, what version of Solr are you using?
>>
>>
>>


--
- Mark

http://www.lucidimagination.com





Re: SolrJ: Highlighting not Working

2009-06-18 Thread Bruno
Just figured out what happened... It's necessary for the schema to have a
uniqueKey set, otherwise highlighting will have at most one entry, as the map's
key is the doc's uniqueKey. While debugging I saw that the QueryResponse tries
to put all highlighted results into a map with a null key... in the end,
putting tons of entries all with the null key results in a one-entry-only map.

Thanks for the help guys.
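
For anyone else hitting this: the highlights come back keyed by that uniqueKey,
so reading them with SolrJ looks roughly like the snippet below. It continues
from the rsp in the code earlier in the thread; the "id" and "log" names are
just whatever your uniqueKey and highlighted field happen to be called.

    Map<String, Map<String, List<String>>> hl = rsp.getHighlighting();
    for (SolrDocument doc : rsp.getResults()) {
        String id = (String) doc.getFieldValue("id");        // the uniqueKey field
        Map<String, List<String>> perField = hl.get(id);      // may be null if nothing matched
        if (perField != null && perField.get("log") != null) {
            for (String snippet : perField.get("log")) {
                System.out.println(id + ": " + snippet);      // snippet carries the simple.pre/post markup
            }
        }
    }

(needs java.util.List, java.util.Map and org.apache.solr.common.SolrDocument)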

On Thu, Jun 18, 2009 at 3:17 PM, Bruno  wrote:

> I've checked the NamedList you told me about, but it contains only one
> highlighted doc, when there I have more docs that sould be highlighted.
>
>
> On Thu, Jun 18, 2009 at 3:03 PM, Erik Hatcher 
> wrote:
>
>> Note that highlighting is NOT part of the document list returned.  It's in
>> an additional NamedList section of the response (with name="highlighting")
>>
>>Erik
>>
>>
>> On Jun 18, 2009, at 1:22 PM, Bruno wrote:
>>
>>  Hi guys.
>>>
>>> I new at using highlighting, so probably I'm making some stupid mistake,
>>> however I'm not founding anything wrong.
>>>
>>> I use highlighting from a query withing a EmbeddedSolrServer, and within
>>> the query I've set parameters necessary for enabling highlighting. Attached,
>>> follows my schema and solrconfig.xml , and down below follows the Java code.
>>> Content from the SolrDocumentList is not highlighted.
>>>
>>> EmbeddedSolrServer server = SolrServerManager.getServerEv();
>>> String queryString = filter;
>>> SolrQuery query = new SolrQuery();
>>>
>>> query.setQuery(queryString);
>>> query.setHighlight(true);
>>> query.addHighlightField(LOG_FIELD);
>>> query.setHighlightSimplePost("");
>>> query.setHighlightSimplePre("");
>>> query.set("hl.usePhraseHighlighter", true);
>>> query.set("hl.highlightMultiTerm", true);
>>> query.set("hl.snippets", 100);
>>> query.set("hl.fragsize", 1);
>>> query.set("hl.mergeContiguous", false);
>>> query.set("hl.requireFieldMatch", false);
>>> query.set("hl.maxAnalyzedChars", -1);
>>>
>>> query.addSortField(DATE_FIELD, SolrQuery.ORDER.asc);
>>> query.setFacetLimit(LogUtilProperties.getInstance().getProperty(LogUtilProperties.LOGEVENT_SEARCH_RESULT_SIZE,
>>> 1));
>>> query.setRows(LogUtilProperties.getInstance().getProperty(LogUtilProperties.LOGEVENT_SEARCH_RESULT_SIZE,
>>> 1));
>>> query.setIncludeScore(true);
>>> QueryResponse rsp = server.query(query);
>>> SolrDocumentList docs = rsp.getResults();
>>>
>>> --
>>> Bruno Morelli Vargas
>>> Mail: brun...@gmail.com
>>> Msn: brun...@hotmail.com
>>> Icq: 165055101
>>> Skype: morellibmv
>>>
>>> 
>>>
>>
>>
>
>
> --
> Bruno Morelli Vargas
> Mail: brun...@gmail.com
> Msn: brun...@hotmail.com
> Icq: 165055101
> Skype: morellibmv
>
>


-- 
Bruno Morelli Vargas
Mail: brun...@gmail.com
Msn: brun...@hotmail.com
Icq: 165055101
Skype: morellibmv


Re: SolrJ: Highlighting not Working

2009-06-18 Thread Erik Hatcher
And unfortunately, that isn't the best approach for highlighting to  
take - a uniqueKey shouldn't be required for highlighting.  I've yet  
to see a real-world deployment of Solr that did not have a uniqueKey  
field, but there's no reason Solr should make that assumption.


Erik

On Jun 18, 2009, at 3:24 PM, Bruno wrote:

Just figured out what happened... It's necessary for the schema to  
have a
uniqueKey set, otherwise, highlighting will have one or less  
entries, as the

map's key is the doc uniqueKey, so on debuggin I figured out that the
QueryResponse tries to put all highlighted results in a map with  
null key...

at end, putting tons of entries all with null key will result on a
one-entry-only map.

Thanks for the help guys.

On Thu, Jun 18, 2009 at 3:17 PM, Bruno  wrote:

I've checked the NamedList you told me about, but it contains only  
one
highlighted doc, when there I have more docs that sould be  
highlighted.



On Thu, Jun 18, 2009 at 3:03 PM, Erik Hatcher wrote:


Note that highlighting is NOT part of the document list returned.   
It's in
an additional NamedList section of the response (with  
name="highlighting")


  Erik


On Jun 18, 2009, at 1:22 PM, Bruno wrote:

Hi guys.


I new at using highlighting, so probably I'm making some stupid  
mistake,

however I'm not founding anything wrong.

I use highlighting from a query withing a EmbeddedSolrServer, and  
within
the query I've set parameters necessary for enabling  
highlighting. Attached,
follows my schema and solrconfig.xml , and down below follows the  
Java code.

Content from the SolrDocumentList is not highlighted.

EmbeddedSolrServer server = SolrServerManager.getServerEv();
String queryString = filter;
SolrQuery query = new SolrQuery();

query.setQuery(queryString);
query.setHighlight(true);
query.addHighlightField(LOG_FIELD);
query.setHighlightSimplePost("");
query.setHighlightSimplePre("");
query.set("hl.usePhraseHighlighter", true);
query.set("hl.highlightMultiTerm", true);
query.set("hl.snippets", 100);
query.set("hl.fragsize", 1);
query.set("hl.mergeContiguous", false);
query.set("hl.requireFieldMatch", false);
query.set("hl.maxAnalyzedChars", -1);

query.addSortField(DATE_FIELD, SolrQuery.ORDER.asc);
query.setFacetLimit(LogUtilProperties.getInstance().getProperty(LogUtilProperties.LOGEVENT_SEARCH_RESULT_SIZE, 1));
query.setRows(LogUtilProperties.getInstance().getProperty(LogUtilProperties.LOGEVENT_SEARCH_RESULT_SIZE, 1));
query.setIncludeScore(true);
QueryResponse rsp = server.query(query);
SolrDocumentList docs = rsp.getResults();

--
Bruno Morelli Vargas
Mail: brun...@gmail.com
Msn: brun...@hotmail.com
Icq: 165055101
Skype: morellibmv









--
Bruno Morelli Vargas
Mail: brun...@gmail.com
Msn: brun...@hotmail.com
Icq: 165055101
Skype: morellibmv





--
Bruno Morelli Vargas
Mail: brun...@gmail.com
Msn: brun...@hotmail.com
Icq: 165055101
Skype: morellibmv




Re: FilterCache issue

2009-06-18 Thread Mark Miller

Its the facet.method param:

http://wiki.apache.org/solr/SimpleFacetParameters#head-7574cb658563f6de3ad54cd99a793cd73d593caa
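
For the archives: facet.method is a plain request parameter in Solr 1.4 (it does
not exist in 1.3), so from SolrJ it can be set like any other raw param. The
field name below is only an example:

    SolrQuery q = new SolrQuery("*:*");
    q.setFacet(true);
    q.addFacetField("category");
    q.set("facet.method", "enum");   // "enum" walks terms via the filterCache, "fc" uses the FieldCache
    q.setRows(0);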

--
- Mark

http://www.lucidimagination.com



Manepalli, Kalyan wrote:

Mark,
Where do we specify the method? fieldCache or otherwise

Thanks,
Kalyan Manepalli

-Original Message-
From: Mark Miller [mailto:markrmil...@gmail.com]
Sent: Thursday, June 18, 2009 12:22 PM
To: solr-user@lucene.apache.org
Subject: Re: FilterCache issue

Maybe he is not using the FieldCache method?

Yonik Seeley wrote:
  

On Thu, Jun 18, 2009 at 12:19 PM, Manepalli,
Kalyan wrote:



The fields are defined as single valued and they are non tokenized for.
I am using solr 1.3 waiting for release of solr 1.4.

  

Then the filterCache won't be used for faceting, just for filters.
You should be able to verify this by looking at how the cache stats
change for a single faceting request.

-Yonik
http://www.lucidimagination.com






Thanks,
Kalyan Manepalli
-Original Message-
From: ysee...@gmail.com [mailto:ysee...@gmail.com] On Behalf Of Yonik Seeley
Sent: Thursday, June 18, 2009 10:15 AM
To: solr-user@lucene.apache.org
Subject: Re: FilterCache issue

On Thu, Jun 18, 2009 at 10:59 AM, Manepalli,
Kalyan wrote:

  

I am faceting on the single values only.



You may have only added a single value to each field, but is the field
defined to be single valued or multi valued?

Also, what version of Solr are you using?



  



--
- Mark

http://www.lucidimagination.com



  







RE: FilterCache issue

2009-06-18 Thread Manepalli, Kalyan
Got that. Since I am still using Solr 1.3, the defaults should work fine: field 
cache for single-valued fields and enum for multi-valued fields.
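
As Yonik suggested, the quickest sanity check is to watch the filterCache stats
on the admin page before and after a single faceting request, for example
(host/port and field name are just placeholders):

    http://localhost:8983/solr/admin/stats.jsp
    http://localhost:8983/solr/select?q=*:*&rows=0&facet=true&facet.field=yourField

If the filterCache lookups/inserts jump by roughly the number of unique terms in
the field, the enum (filterCache) method is in use; if they barely move, it is
the FieldCache method.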

Thanks,
Kalyan Manepalli

-Original Message-
From: Mark Miller [mailto:markrmil...@gmail.com]
Sent: Thursday, June 18, 2009 3:01 PM
To: solr-user@lucene.apache.org
Subject: Re: FilterCache issue

Its the facet.method param:

http://wiki.apache.org/solr/SimpleFacetParameters#head-7574cb658563f6de3ad54cd99a793cd73d593caa

--
- Mark

http://www.lucidimagination.com



Manepalli, Kalyan wrote:
> Mark,
> Where do we specify the method? fieldCache or otherwise
>
> Thanks,
> Kalyan Manepalli
>
> -Original Message-
> From: Mark Miller [mailto:markrmil...@gmail.com]
> Sent: Thursday, June 18, 2009 12:22 PM
> To: solr-user@lucene.apache.org
> Subject: Re: FilterCache issue
>
> Maybe he is not using the FieldCache method?
>
> Yonik Seeley wrote:
>
>> On Thu, Jun 18, 2009 at 12:19 PM, Manepalli,
>> Kalyan wrote:
>>
>>
>>> The fields are defined as single valued and they are non tokenized for.
>>> I am using solr 1.3 waiting for release of solr 1.4.
>>>
>>>
>> Then the filterCache won't be used for faceting, just for filters.
>> You should be able to verify this by looking at how the cache stats
>> change for a single faceting request.
>>
>> -Yonik
>> http://www.lucidimagination.com
>>
>>
>>
>>
>>
>>> Thanks,
>>> Kalyan Manepalli
>>> -Original Message-
>>> From: ysee...@gmail.com [mailto:ysee...@gmail.com] On Behalf Of Yonik Seeley
>>> Sent: Thursday, June 18, 2009 10:15 AM
>>> To: solr-user@lucene.apache.org
>>> Subject: Re: FilterCache issue
>>>
>>> On Thu, Jun 18, 2009 at 10:59 AM, Manepalli,
>>> Kalyan wrote:
>>>
>>>
 I am faceting on the single values only.


>>> You may have only added a single value to each field, but is the field
>>> defined to be single valued or multi valued?
>>>
>>> Also, what version of Solr are you using?
>>>
>>>
>>>
>>>
>
>
> --
> - Mark
>
> http://www.lucidimagination.com
>
>
>
>






multi-core, autocommit and resource use

2009-06-18 Thread Peter Wolanin
A question for anyone familiar with the details of the time-based
autocommit mechanism in Solr:

if I am running several cores on the same server and send updates to
each core at the same time, what happens?   If all the cores have
their autocommit time run out at the same time, will every core try to
conduct operations (e.g. opening new searchers, merges, other things?)
at the same time and thus cause resource issues?  I think I understand
that all the pending changes are on disk already, so the "commit" that
happens when the time is up is really just opening new searchers that
include the added documents.
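
For reference, the time-based autocommit being discussed is the autoCommit block
in solrconfig.xml; the values below are only an example:

    <updateHandler class="solr.DirectUpdateHandler2">
      <autoCommit>
        <maxDocs>10000</maxDocs>
        <maxTime>60000</maxTime> <!-- milliseconds -->
      </autoCommit>
    </updateHandler>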

Thanks,

Peter

-- 
Peter M. Wolanin, Ph.D.
Momentum Specialist,  Acquia. Inc.
peter.wola...@acquia.com


Destemming snafu

2009-06-18 Thread Stephen Weiss

Hi,

I've hit a bit of a problem with destemming and could use some advice.

Right now there is a word in the index called "Stylesight" and another  
word "Stylesightings", which was just added.  When users search for  
"Stylesightings", the client really only wants them to get results  
that match "Stylesightings" and not "Stylesight", as they are two  
[relatively] unrelated things.  However, I'm guessing because of the  
destemmer, "Stylesightings" becomes "Stylesight" internally... which  
results in the "wrong" behavior.


I really don't want to turn off the destemmer, that's like killing an  
ant with a nuke.  I was thinking, perhaps, since we use both index-  
and query-time synonyms, I could make a synonym like this:


"Stylesightings" =>  "xlkje0r923jjfsdf"

or some other random string of un-destemmable junk, that might work,  
but I'm not sure and reindexing all the affected documents will take  
quite some time so it would be good to know in advance if this is even  
a good idea.


Of course, if there's another, better idea, I'd be very open to that  
too.


Thanks for any suggestions!

--
Steve


Re: Destemming snafu

2009-06-18 Thread Brendan Grainger
Are you using Porter Stemming? If so I think you can just specify your  
word in the protwords.txt file (or whatever you've called it).


Check out http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters  
and the example config for the Porter Stemmer:


	
<filter class="solr.EnglishPorterFilterFactory" protected="protwords.txt" />
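
protwords.txt itself is just one term per line, so for the case in this thread it
would contain something like the entry below (lowercased on the assumption that a
LowerCaseFilterFactory runs before the stemmer in the analyzer chain):

    stylesightings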

 

HTH
Brendan

On Jun 18, 2009, at 4:38 PM, Stephen Weiss wrote:


Hi,

I've hit a bit of a problem with destemming and could use some advice.

Right now there is a word in the index called "Stylesight" and  
another word "Stylesightings", which was just added.  When users  
search for "Stylesightings", the client really only wants them to  
get results that match "Stylesightings" and not "Stylesight", as  
they are two [relatively] unrelated things.  However, I'm guessing  
because of the destemmer, "Stylesightings" becomes "Stylesight"  
internally... which results in the "wrong" behavior.


I really don't want to turn off the destemmer, that's like killing  
an ant with a nuke.  I was thinking, perhaps, since we use both  
index- and query-time synonyms, I could make a synonym like this:


"Stylesightings" =>  "xlkje0r923jjfsdf"

or some other random string of un-destemmable junk, that might work,  
but I'm not sure and reindexing all the affected documents will take  
quite some time so it would be good to know in advance if this is  
even a good idea.


Of course, if there's another, better idea, I'd be very open to that  
too.


Thanks for any suggestions!

--
Steve




Re: Destemming snafu

2009-06-18 Thread Stephen Weiss
Yes, that's exactly what I needed.  I don't know how I missed that.   
Thank you!


--
Steve

On Jun 18, 2009, at 4:49 PM, Brendan Grainger wrote:

Are you using Porter Stemming? If so I think you can just specify  
your word in the protwords.txt file (or whatever you've called it).


Check out http://wiki.apache.org/solr/ 
AnalyzersTokenizersTokenFilters and the example config for the  
Porter Stemmer:


	
<filter class="solr.EnglishPorterFilterFactory" protected="protwords.txt" />



HTH
Brendan

On Jun 18, 2009, at 4:38 PM, Stephen Weiss wrote:


Hi,

I've hit a bit of a problem with destemming and could use some  
advice.


Right now there is a word in the index called "Stylesight" and  
another word "Stylesightings", which was just added.  When users  
search for "Stylesightings", the client really only wants them to  
get results that match "Stylesightings" and not "Stylesight", as  
they are two [relatively] unrelated things.  However, I'm guessing  
because of the destemmer, "Stylesightings" becomes "Stylesight"  
internally... which results in the "wrong" behavior.


I really don't want to turn off the destemmer, that's like killing  
an ant with a nuke.  I was thinking, perhaps, since we use both  
index- and query-time synonyms, I could make a synonym like this:


"Stylesightings" =>  "xlkje0r923jjfsdf"

or some other random string of un-destemmable junk, that might  
work, but I'm not sure and reindexing all the affected documents  
will take quite some time so it would be good to know in advance if  
this is even a good idea.


Of course, if there's another, better idea, I'd be very open to  
that too.


Thanks for any suggestions!

--
Steve






Re: Searching across multivalued fields

2009-06-18 Thread MilkDud


Michael Ludwig-4 wrote:
> 
> MilkDud schrieb:
> What do you expect the user to enter?
> 
> * "dream theater innocence faded" - certainly wrong
> * dream theater "innocence faded" - much better
> 
> Most likely they would just enter dream theater innocence faded, no
> quotes.  Without any quotes around any fields, which is a large cause of
> the problem.  Now if I index on the track level, then all those words
> would have to show up in just one track (including the album, artist, and
> track name), which is expected.  If i index on the album level however,
> now, those words just need to show up anywhere throughout the entire
> album.
> 
> So, while it will match dream theater - innocence faded, it will also
> match an album that has all the words dream theater innocence faded
> mentioned across all tracks, which for small queries can be very common.
> 
> Basically, I'm looking for a way to say match all the words in the search
> query across the artist, album, and track name, but only looking at one
> track (a multivalued field) at a time given a query without any quotes. 
> Does that make sense at all?
> 
> That is why I was leaning towards the track level index, such as:
> id, artist, album, track (all single valued)
> 
> as it does solve that problem, but then I have to deal with duplicate data
> being put in the artist/album fields (and a bunch of other fields).  Also,
> indexing on the album level poses further complications given that I also
> store the location to a track preview clip next to each track and keeping
> track of sets of data like that in solr is not really feasible.
> 
> 

-- 
View this message in context: 
http://www.nabble.com/Searching-across-multivalued-fields-tp24056297p24099668.html
Sent from the Solr - User mailing list archive at Nabble.com.



Re: Does Solr 1.4 really work nicely on Jboss 4?

2009-06-18 Thread Giovanni De Stefano
Hello Daryl,

thank you very much for sharing your experience with me :-)

My software architect reported some exceptions thrown when accessing some
Admin JSPs using Solr 1.4, JBoss 4.0.1 SP3 and Tomcat, with Java JDK 1.5.0_06.

I will forward the info you gave me.

Thank you very much.

Giovanni



On Thu, Jun 18, 2009 at 4:44 PM, Development Team wrote:

> Hi Giovanni,
>
> Solr 1.4 does work fine in JBoss (all of the features, including all of
> the admin pages). For example, I am running it in JBoss 4.0.5.GA on
> JDK
> 1.5.0_18 without problems. I am also using Jetty instead of Tomcat, however
> instructions for getting it to work in JBoss with Tomcat can be found here:
>  http://wiki.apache.org/solr/SolrJBoss  It should work fine on JBoss
> 4.0.1.
>
> - Daryl.
>
>
> On Thu, Jun 18, 2009 at 8:57 AM, Giovanni De Stefano <
> giovanni.destef...@gmail.com> wrote:
>
> > Hello all,
> >
> > I have a simple question :-)
> >
> > In my project it is mandatory to use Jboss 4.0.1 SP3 and Java
> 1.5.0_06/08.
> > The software relies on Solr 1.4.
> >
> > Now, I am aware that some JSP Admin pages will not be displayed due to
> some
> > Java5/6 dependency but this is not a problem because rewriting some of
> the
> > JSPs it is possible to have everything up and running.
> >
> > The real question is: is anybody aware of any feature that might not work
> > when deploying the solr based software in Jboss 4?
> >
> > I look forward to hearing your experience.
> >
> > Cheers,
> > Giovanni
> >
>


Re: Indexing tempary data on lucene

2009-06-18 Thread Chris Hostetter

building a temporary index would certainly work, but it's a question of 
how efficient it would be (ie: how many users do you have, how often do 
they log in, how long does it take to build a typical index, how many 
concurrent users will you have, etc...)

one solution i've seen to a problem like this was to have a custom 
SearchComponent that did an external lookup to get the list of friends, 
then for each friend did a search to get the DocSet of all their documents 
(letting them get cached in the filterCache) and computed the union of all 
those DocSets, and added that union as a filter for use by the 
QueryComponent.

not sure if that type of approach will scale well to the number of users 
you are dealing with.
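
A much simpler (if less scalable) variant of the same idea can be done entirely on
the client with SolrJ: build a single filter query over the friend ids so the whole
friend set is cached as one filterCache entry. This is only a sketch, not the custom
SearchComponent described above; the "user_id" field name and the assumption that
the ids need no escaping are illustrative.

    import java.util.List;
    import org.apache.solr.client.solrj.SolrQuery;

    public class FriendScopedSearch {

        // Restrict the user's query to documents owned by the given friend ids.
        public static SolrQuery build(String userQuery, List<String> friendIds) {
            StringBuilder fq = new StringBuilder("user_id:(");
            for (int i = 0; i < friendIds.size(); i++) {
                if (i > 0) {
                    fq.append(" OR ");
                }
                fq.append(friendIds.get(i));
            }
            fq.append(")");

            SolrQuery q = new SolrQuery(userQuery);
            q.addFilterQuery(fq.toString()); // sent as fq, cached as one filterCache entry
            return q;
        }
    }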

: Date: Fri, 24 Apr 2009 02:22:37 -0700 (PDT)
: Subject: Indexing tempary data on lucene
: 
: 
: I have a list of public profiles of my site user's on solr index. There is
: also a community around them, which is currently not their in Index.
: 
: While searching, I have to give an option to search only my community
: (friends and friends of friends). I could do it from data base query or
: storing connection graph in memory but here I loose power of Solr Analyzers,
: tokenizers and filters. 
: 
: Alternatively, I am thinking to store this relation temp in some other Solr
: instance (running on a separate machine) and use it for search. I.e create
: this index async when user logs in and destroy when user logs out.
: 
: So when user searches for a profile the application will merge the results
: from two indexes and returns unique users.
: 
: Is this a practical/scalable solution? If yes, what performance
: consideration, I should look for this new solr instance? For merging should
: I built an application over solr or solr provides any way of merging results
: from multiple indexes?
: 
: Thanks,
: Amit
: 
: -- 
: View this message in context: 
http://www.nabble.com/Indexing-tempary-data-on-lucene-tp23212838p23212838.html
: Sent from the Solr - User mailing list archive at Nabble.com.
: 



-Hoss



no .war with ubuntu release ?

2009-06-18 Thread Jonathan Vanasco

i'm a bit confused.  hoping someone can help.

solr is awesome on my macbook for development.

i've been fighting with getting solr-jetty running on my ubuntu box  
all day.


after countless searching, it seems that there is no .war file in the  
distro


should this be the case?

the actual files are :

http://packages.ubuntu.com/hardy/all/solr-common/filelist
http://packages.ubuntu.com/hardy/all/solr-jetty/filelist

as you can see, there is no .war

there are several .jars in this directory
/usr/share/solr/WEB-INF/lib/

can anyone give me a suggestion ? i haven't touched java / jetty /  
tomcat / whatever in at least a good 8 years and am lost.


Re: multi-core, autocommit and resource use

2009-06-18 Thread Yonik Seeley
On Thu, Jun 18, 2009 at 4:27 PM, Peter Wolanin wrote:
>  I think I understand
> that all the pending changes are on disk already, so the "commit" that
> happens when the time is up is really just opening new searchers that
> include the added documents.

Only some of the pending changes may be on disk - a solr level commit
involves closing the IndexWriter which flushes everything to disk, and
then a new IndexReader is opened to read those changes.

This will be improved in future versions such that an IndexReader can
be opened *before* all of the changes have been flushed to disk (work
on near-real-time indexing/searching in Lucene is progressing).

-Yonik
http://www.lucidimagination.com


Re: Solr Jetty confusion

2009-06-18 Thread pof

My problem is that my project doesn't compile and I have no way of knowing
if I'm on the right track code-wise. There just isn't any comprehensive
guide out there for having a solr/jetty app.


Development Team wrote:
> 
> Hey,
>  So... I'm assuming your problem is that you're having trouble
> deploying
> Solr in Jetty? Or is your problem that it's deploying just fine but your
> code throws an exception when you try to run it?
>  I am running Solr in Jetty, and I just copied the war into the
> webapps
> directory and it worked. It was accessible under /solr, and it was
> accessible under the port that Jetty has as its HTTP listener (which is
> probably 8080 by default, but probably won't be 8983). To specify the
> solr-home I use a Java system property (instead of the JNDI way) since I
> already have other necessary system properties for my apps. So if your
> problem turns out to be with the JNDI, sorry I won't be of much help.
>  Hope that helps...
> 
> - Daryl.
> 
> 
> On Thu, Jun 18, 2009 at 2:44 AM, pof  wrote:
> 
>>
>> Hi, I am currently trying to write a Jetty embedded java app that
>> implements
>> SOLR and uses SOLRJ by excepting posts telling it to do a batch index, or
>> a
>> deletion or what have you. At this point I am completely lost trying to
>> follow http://wiki.apache.org/solr/SolrJetty . In my constructor I am
>> doing
>> the following call:
>>
>> Server server = new Server();
>> XmlConfiguration configuration = new XmlConfiguration(new
>> FileInputStream("solrjetty.xml"));
>>
>> My xml has two calls, an addConnector to configure the port etc. and the
>> addWebApplication as specified on the solr wiki. When running the app I
>> get
>> this:
>>
>> Exception in thread "main" java.lang.IllegalStateException: No Method:
>> > name="addWebApplication">/solr/*/webapps/solr.war> name="extractWAR">true>
>> name="defaultsDescriptor">org/mortbay/jetty/servlet/webdefault.xml> name="addEnvEntry">/solr/home> type="String">/solr/home on class
>> org.mortbay.jetty.Server
>>
>> Can anyone point me in the right direction? Thanks.
>> --
>> View this message in context:
>> http://www.nabble.com/Solr-Jetty-confusion-tp24087264p24087264.html
>> Sent from the Solr - User mailing list archive at Nabble.com.
>>
>>
> 
> 

-- 
View this message in context: 
http://www.nabble.com/Solr-Jetty-confusion-tp24087264p24099696.html
Sent from the Solr - User mailing list archive at Nabble.com.



Re: no .war with ubuntu release ?

2009-06-18 Thread Phil Hagelberg
On Thu, Jun 18, 2009 at 4:00 PM, Jonathan Vanasco wrote:
> can anyone give me a suggestion ? i haven't touched java / jetty / tomcat /
> whatever in at least a good 8 years and am lost.

I spent a lot of time trying to get this working too. My conclusion
was simply that the .deb packages for Solr are unmaintained and have
fallen victim to bitrot. You'll have a much easier time getting it
from a maven repository or just downloading a binary release.

I wish that it would be removed from the Ubuntu repositories though if
it isn't fixed as its presence there seems to cause more harm than
good.

-Phil


Re: multi-core, autocommit and resource use

2009-06-18 Thread Peter Wolanin
So for now would it make sense to spread out the autocommit times for
the different cores?

Thanks.

-Peter

On Thu, Jun 18, 2009 at 7:07 PM, Yonik Seeley wrote:
> On Thu, Jun 18, 2009 at 4:27 PM, Peter Wolanin 
> wrote:
>>  I think I understand
>> that all the pending changes are on disk already, so the "commit" that
>> happens when the time is up is really just opening new searchers that
>> include the added documents.
>
> Only some of the pending changes may be on disk - a solr level commit
> involves closing the IndexWriter which flushes everything to disk, and
> then a new IndexReader is opened to read those changes.
>
> This will be improved in future versions such that an IndexReader can
> be opened *before* all of the changes have been flushed to disk (work
> on near-real-time indexing/searching in Lucene is progressing).
>
> -Yonik
> http://www.lucidimagination.com
>



-- 
Peter M. Wolanin, Ph.D.
Momentum Specialist,  Acquia. Inc.
peter.wola...@acquia.com


Re: multi-core, autocommit and resource use

2009-06-18 Thread Yonik Seeley
On Thu, Jun 18, 2009 at 8:30 PM, Peter Wolanin wrote:
> So for now would it make sense to spread out the autocommit times for
> the different cores?

Sure.
You might also consider using commitWithin (solr 1.4) when updating
the index - then you could either send the updates at slightly
different times, or add a random amount of time to the commitWithin
for each update.
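
If memory serves, in 1.4 commitWithin is just an attribute on the add command in
the XML update message; the 60000 ms value and the doc below are only examples:

    <add commitWithin="60000">
      <doc>
        <field name="id">example-doc-1</field>
      </doc>
    </add>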

-Yonik
http://www.lucidimagination.com

> Thanks.
>
> -Peter
>
> On Thu, Jun 18, 2009 at 7:07 PM, Yonik Seeley 
> wrote:
>> On Thu, Jun 18, 2009 at 4:27 PM, Peter Wolanin 
>> wrote:
>>>  I think I understand
>>> that all the pending changes are on disk already, so the "commit" that
>>> happens when the time is up is really just opening new searchers that
>>> include the added documents.
>>
>> Only some of the pending changes may be on disk - a solr level commit
>> involves closing the IndexWriter which flushes everything to disk, and
>> then a new IndexReader is opened to read those changes.
>>
>> This will be improved in future versions such that an IndexReader can
>> be opened *before* all of the changes have been flushed to disk (work
>> on near-real-time indexing/searching in Lucene is progressing).
>>
>> -Yonik
>> http://www.lucidimagination.com
>>
>
>
>
> --
> Peter M. Wolanin, Ph.D.
> Momentum Specialist,  Acquia. Inc.
> peter.wola...@acquia.com
>


Re: Solr spring application context error

2009-06-18 Thread Chris Hostetter
: Date: Fri, 08 May 2009 08:27:58 -0400
: From: Mark Miller
: Subject: Re: Solr spring application context error
: 
: I've run into this in the past as well. Its fairly annoying. Anyone know why
: the limitation? Why aren't we passing the ClassLoader thats loading Solr
: classes as the parent to the lib dir plugin classloader?

FWIW: We do.  

I'm not sure what exactly the problem might have been in this thread 
(class loaders are nightmares, and spring doesn't make life any easier 
unless *everything* involved is using spring) but SolrResourceLoader uses 
Thread.currentThread().getContextClassLoader() to set the parent class 
loader unless an explicit parent class loader was specified. (the webapp 
loader is the parent for the solr sharedLib loader, which is the parent for 
the individual core loaders)

at least ... that's the way it used to work, and skimming the code it 
doesn't look like it's been broken (in an obvious way)

-Hoss



Re: Solr spring application context error

2009-06-18 Thread Mark Miller

Chris Hostetter wrote:

: Date: Fri, 08 May 2009 08:27:58 -0400
: From: Mark Miller
: Subject: Re: Solr spring application context error
: 
: I've run into this in the past as well. Its fairly annoying. Anyone know why

: the limitation? Why aren't we passing the ClassLoader thats loading Solr
: classes as the parent to the lib dir plugin classloader?

FWIW: We do.  

I'm not sure what exactly the problem might have been in this thread 
(class loaders are nightmares, and spring doesn't make life any easier 
unless *everything* involved is using spring) but SolrResourceLoader uses 
Thread.currentThread().getContextClassLoader() to set the parent class 
loader unless an explict parent class loader was specified. (the webapp 
loader is the parent for the solr shardLib loader, which is the parent for 
the individual core loaders)


at least ... that's the way it use to work, and skimming the code it 
doesn't look like it's been broken (in an obvious way)


-Hoss

  
Yeah, I actually looked at the code and saw that later. I was forgetting 
the issue that bugged me (and confusing it with the trouble this guy was 
having) - which is that plugins in the solr/lib folder cannot load from 
other jars in that folder. I think that was the actual issue.


- Mark

--
- Mark

http://www.lucidimagination.com





Re: Solr spring application context error

2009-06-18 Thread Chris Hostetter

: Yeah, I actually looked at the code and saw that later. I was forgetting the
: issue that bugged me (and confusing it with the trouble this guy was having) -
: which is that plugins in the solr/lib folder cannot load from other jars in
: that folder. I think that was the actual issue.

WTF?!? ... seriously?  

I don't think i've ever tried it, but if that's really true then it seems 
like it must be a bug in URLClassLoader ... we're iterating over the file 
list to generate a URL[] before calling URLClassLoader.newInstance.


-Hoss



Re: about boosting queries...

2009-06-18 Thread Chris Hostetter

Marc: I know it's been a while since you asked this question, but i didn't 
see any reply ... in general the problem is that a "low" boost is still a 
boost, it can only improve the score of documents that match.

one way to fake a "negative boost" is to give a high boost to everything 
that does *not* match.  This should do what you want...

bq=(*:* -field_a:54^1)

...I can't remember if "bq" supports pure negative queries yet, if it does 
then you can simplify that to bq=-field_a:54^1
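
Spelled out as a full dismax request that would look something along these lines
(the qf fields and the boost of 10 are only illustrative):

    q=some+user+query&qt=dismax&qf=title+description&bq=(*:* -field_a:54)^10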


On Mon, 11 May 2009, Marc Sturlese wrote:

: Hey there,
: I would like to give very low boost to the docs that match field_a = 54.
: I have tried
: 
: field_a:54^0.1
: 
: but it's not working. In the opposite case, I mean to give hight boost
: doing:
: 
: field_a:54^1
: 
: it works perfect. I supose it is because I do the search in 6 fields and a
: summation is happening so.. even if I am seting boost to 0.1 the sum
: with other fields boost makes the bq to almost not take effect (and negative
: boost is not allowed). Is that the reason? Any clue how could I reach my
: goal?
: 
: Thanks in advance
: -- 
: View this message in context: 
http://www.nabble.com/about-boosting-queries...-tp23484208p23484208.html
: Sent from the Solr - User mailing list archive at Nabble.com.
: 



-Hoss



Re: Solr spring application context error

2009-06-18 Thread Mark Miller

Chris Hostetter wrote:

: Yeah, I actually looked at the code and saw that later. I was forgetting the
: issue that bugged me (and confusing it with the trouble this guy was having) -
: which is that plugins in the solr/lib folder cannot load from other jars in
: that folder. I think that was the actual issue.

WTF?!? ... seriously?  

I don't think i've ever tried it, but if that's really true then it seems 
like it must be a bug in URLClassLoader ... we're iterating over the file 
list to generate a URL[] before calling URLClassLoader.newInstance.



-Hoss

  
I have a fairly strong memory that it's true - but it's been almost a year 
now. I'll check when I get a chance. I also seem to remember mentioning 
it to Erik and him already knowing of it...


Or my mind is playing tricks on me. I'll check.

--
- Mark

http://www.lucidimagination.com





Re: Solr Jetty confusion

2009-06-18 Thread pof


Development Team wrote:
> 
> To specify the
> solr-home I use a Java system property (instead of the JNDI way) since I
> already have other necessary system properties for my apps.
> 

Could you please give me a concrete example of how you did this? There is no
example code or commandline examples to be found.

Cheers, Brett.
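
For what it's worth, the system property Daryl refers to is presumably
solr.solr.home, which can either be passed to the JVM on startup or set in code
before the Solr webapp is deployed (paths below are just examples):

    java -Dsolr.solr.home=/path/to/solrhome -jar start.jar

    // or, from inside an embedded-Jetty launcher, before deploying the webapp:
    System.setProperty("solr.solr.home", "/path/to/solrhome");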

-- 
View this message in context: 
http://www.nabble.com/Solr-Jetty-confusion-tp24087264p24104378.html
Sent from the Solr - User mailing list archive at Nabble.com.



Default AND operator | search on multiple fields

2009-06-18 Thread prerna07

Hi,

I want to perform search with default AND operator on multiple fields.

Below mentioned query works fine with AND operator and one field :
?q={!lucene q.op=AND df=prdMainTitle_product_s}Ladybird Shrinkwrap

However, as soon as I add two fields, it starts giving me records which have
the two terms "Ladybird" and "Shrinkwrap" spread across multiple fields; it
should search only in prdMainTitle_product_s and prdMainSubTitle_product_s.

?q={!lucene q.op=AND qf=prdMainTitle_product_s
qf=prdMainSubTitle_product_s}Ladybird Shrinkwrap&qt=dismaxrequest


Please let me know if there are issues with the query above.

Thanks,
Prerna



-- 
View this message in context: 
http://www.nabble.com/Default-AND-operator-%7C-search-on-multiple-fields-tp24106288p24106288.html
Sent from the Solr - User mailing list archive at Nabble.com.



Multi Field AND Search

2009-06-18 Thread saurabhs_iitk

Hi
I have indexed 8 fields with different boosts. Now I have a search string
which consists of words and phrases. I want to do an AND search of that
search string on four fields and show the results based on boost. The search
string should occur completely in one of the fields, and only then should the
boosts come into the picture. I know I can index a combination of the four
fields in one field and then search in that, but then the boosts will not work.

Regards
Saurabh
-- 
View this message in context: 
http://www.nabble.com/Multi-Field-AND-Search-tp24106434p24106434.html
Sent from the Solr - User mailing list archive at Nabble.com.