RE: custom scorer in Solr

2010-06-14 Thread Fornoville, Tom
I've been investigating this further and I might have found another path
to consider.

Would it be possible to create a custom implementation of a SortField,
comparable to the RandomSortField, to tackle the problem?


I know it is not your standard question, but I would really appreciate any
feedback and suggestions, because this issue will make or break the
acceptance of Solr for this client.

Thanks,
Tom

-Original Message-
From: Fornoville, Tom 
Sent: woensdag 9 juni 2010 15:35
To: solr-user@lucene.apache.org
Subject: custom scorer in Solr

Hi all,

 

We are currently working on a proof-of-concept for a client using Solr
and have been able to configure all the features they want except the
scoring.

 

Problem is that they want scores that make results fall in buckets:

*   Bucket 1: exact match on category (score = 4)
*   Bucket 2: exact match on name (score = 3)
*   Bucket 3: partial match on category (score = 2)
*   Bucket 4: partial match on name (score = 1)

 

First thing we did was develop a custom similarity class that would
return the correct score depending on the field and an exact or partial
match.

 

The only problem now is that when a document matches on both the
category and name the scores are added together.

Example: searching for "restaurant" returns documents in the category
restaurant that also have the word restaurant in their name and thus get
a score of 5 (4+1) but they should only get 4.

 

I assume for this to work we would need to develop a custom Scorer class,
but we have no clue how to incorporate this in Solr.

Maybe there is even a simpler solution that we don't know about.

 

All suggestions welcome!

 

Thanks,

Tom



Re: custom scorer in Solr

2010-06-14 Thread Geert-Jan Brits
First of all,

Do you expect every query to return results for all 4 buckets?
In other words: say you make a SortField that sorts score 4 first, then 3, 2, 1.
When displaying the first 10 results, is it ok that these documents
potentially all have score 4, and thus only bucket 1 is filled?

If so, the following out-of-the-box option could work (I'm not sure it
performs well enough, but you can easily test it on your data):

following your example, create 4 fields:
1. categoryExact - configure analyzers so that only full matches score
2. categoryPartial - configure so that full and partial matches score (likely
you have already configured this)
3. nameExact - like 1
4. namePartial - like 2

configure copyfields: 1 --> 2 and 3 --> 4
this way your indexing client can stay the same as it likely is at the
moment.
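In schema.xml, such copyFields might look like this (a sketch; the field names are the ones from the list above):

```xml
<copyField source="categoryExact" dest="categoryPartial"/>
<copyField source="nameExact" dest="namePartial"/>
```

so content indexed into the *Exact fields is automatically analyzed into the *Partial fields as well.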


Now you have 4 fields whose scores you have to combine at search time so
that the eventual scores are in [1,4].
Out-of-the-box you can do this with function queries.

http://wiki.apache.org/solr/FunctionQuery

I don't have time to write it down exactly, but for each field:
- calc the score of each field (use the Query function query, nr 16 in the
wiki). If the score > 0, use the map function to map it to 4, 3, 2 or 1
respectively.

now for each document you potentially have multiple scores, for instance 4
and 2 if your doc matches exact and partial on category.
- use the max function query to return only the highest score --> 4 in this
case.
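Put together, such a request might look roughly like this (only a sketch — the q1..q4 parameter names and the exact map/query arguments are my assumptions and untested):

```
q={!func}max(map(query($q1,0),0.001,1000,4),
         max(map(query($q2,0),0.001,1000,3),
         max(map(query($q3,0),0.001,1000,2),
             map(query($q4,0),0.001,1000,1))))
&q1=categoryExact:restaurant
&q2=nameExact:restaurant
&q3=categoryPartial:restaurant
&q4=namePartial:restaurant
```

here map(x,0.001,1000,target) turns any positive score into the fixed bucket value, and the nested max keeps only the highest bucket per document.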

You have to find out for yourself if this performs though.

Hope that helps,
Geert-Jan




Re: custom scorer in Solr

2010-06-14 Thread Geert-Jan Brits
Just to be clear,
this is for the use-case in which it is ok that potentially only 1 bucket
gets filled.



RE: custom scorer in Solr

2010-06-14 Thread Fornoville, Tom
Hello Geert-Jan,

This seems like a very promising idea, I will test it out later today.
It is not expected that we have results in all buckets, we have many
use-cases where only 1 or 2 buckets are filled.
It is also not a problem that the first 10 results (or 20 in our case)
all fall in the same bucket.

I'll keep you updated on how this works out.



Re: diff logging for each solr-core?

2010-06-14 Thread Alexander Rothenberg
After some more research, i found an even older thread on the list where it
was discussed a little more, but still no separate logfiles:
http://search.lucidimagination.com/search/document/a5cdc596b2c76a7c/setting_a_log_file_per_core_with_slf4


Anyway i will use this in my custom code to add a prefix for each line.

Regards, Alex

-- 
Alexander Rothenberg
Fotofinder GmbH         USt-IdNr. DE812854514
Software Entwicklung    Web: http://www.fotofinder.net/
Potsdamer Str. 96       Tel: +49 30 25792890
10785 Berlin            Fax: +49 30 257928999

Geschäftsführer:    Ali Paczensky
Amtsgericht:        Berlin Charlottenburg (HRB 73099)
Sitz:               Berlin


Re: diff logging for each solr-core?

2010-06-14 Thread Peter Karich
Hi Alex,

as I understand the thread, you will have to change the solr src then,
right? The logPath is not available, or did I understand something wrong?

If you are okay with touching solr, I would rather suggest repackaging
the solr.war with a different logging configuration (so that the cores
do not fall back to the tomcat one).

Regards,
Peter.



-- 
http://karussell.wordpress.com/



Re: diff logging for each solr-core?

2010-06-14 Thread Alexander Rothenberg
On Monday 14 June 2010 13:21:31 Peter Karich wrote:
> as I understand the thread you will have to change the solr src then,
> right? The logPath is not available or did I understand something wrong?

For me, i will only change my own custom code, not the original src from solr.
I had to write a custom dataSource plugin and a custom dataImportHandler to
index over 300 db's (the database design of our customers is special... :s) and
badly need to see which log-msg comes from which solrcore.
It will not be possible to really influence the logpath from inside the
solrcore.

The only way i know is altering the log4j.xml, for example:

<appender name="indexer" class="org.apache.log4j.FileAppender">
  <param name="File" value="/var/log/indexer_log"/>
  <layout class="org.apache.log4j.PatternLayout">
    <param name="ConversionPattern" value="%d %-5p [%c] %m%n"/>
  </layout>
</appender>

<category name="org.apache.solr.handler.dataimport" additivity="false">
  <priority value="info"/>
  <appender-ref ref="indexer"/>
</category>

this would write all log-msgs from the
java-package "org.apache.solr.handler.dataimport" to /var/log/indexer_log
(and my custom classes are in org.apache.solr.handler.dataimport)





dataimporthandler and javascript transformer and default values

2010-06-14 Thread Markus.Rietzler
hi,

i have two questions:

1) how can i set a default value on an imported field if the
field/column is missing from a SQL query
2) i had a problem with the dataimporthandler. in one database column
(WebDst) i have a string with a comma/semicolon seperated numbers, like

100,200; 300;400, 500

there can be a space or not. i want to have a multivalued field in the
end like


<arr>
  <str>100</str>
  <str>200</str>
  <str>300</str>
  <str>400</str>
  <str>500</str>
</arr>


i thought that the javascript/script-transformer could do the trick. i
have a script like

<script><![CDATA[
function dst2intern(row) {
    var webdst = '';
    var count = 0;
    webdst = row.get('WebDst');
    var arr = new java.util.ArrayList();
    if (webdst) {
        // var dst = webdst.split(/[,; ] */);
        var dst = webdst.split(';');
        for (var i = 0; i < dst.length; i++) {
            arr.add(dst[i]);
            count++;
        }
        if (!count) {
            arr.add('0');
        }
        row.put('intern', arr);
    } else {
        arr.add('0');
        row.put('intern', arr);
    }
    return row;
}
]]></script>

in my entity-definition i have
transformer="RegexTransformer,script:dst2intern,TemplateTransformer"

and then i have a field __intern

 

i thought that this would work perfectly. it seems the split can only
split on ";" when comparing a single char.
the regex with

webdst.split(/[,; ] */);

doesn't work. i have checked it in a simple html-page; there the
javascript split works with the regex.
the solution which works for me is to first use a regex transformer on
WebDst

and use a simple ";" split in the javascript.

i am using solr 1.4, java 1.6...

does anyone know or can tell me why the javascript split with a regex
doesn't work?

thank you

markus


 


Re: dataimporthandler and javascript transformer and default values

2010-06-14 Thread Geek Gamer
hi,

check Regex Transformer
http://wiki.apache.org/solr/DataImportHandler#RegexTransformer

umar
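One possible explanation for the failing regex split (an assumption on my part, not verified against the DIH script transformer): row.get('WebDst') hands the script a java.lang.String, so webdst.split(...) resolves to Java's String.split(String regex) rather than JavaScript's split, and a JavaScript RegExp literal is not what that method expects. Java's split with a plain regex string handles the delimiters directly:

```java
public class SplitDemo {
    public static void main(String[] args) {
        // the column value from the example in the mail
        String webDst = "100,200; 300;400, 500";
        // Java's String.split takes a regex *string*: split on "," or ";"
        // followed by any number of spaces
        String[] parts = webDst.split("[,;] *");
        for (String p : parts) {
            System.out.println(p); // prints 100, 200, 300, 400, 500 on separate lines
        }
    }
}
```

so inside the transformer, webdst.split("[,;] *") may be worth trying instead of a RegExp literal.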

On Mon, Jun 14, 2010 at 5:44 PM,  wrote:

> hi,
>
> i have two questions:
>
> 1) how can i set a default value on an imported field if the
> field/column is missing from a SQL query
> 2) i had a problem with the dataimporthandler. in one database column
> (WebDst) i have a string with a comma/semicolon seperated numbers, like
>
>100,200; 300;400, 500
>
> there can be a space or not. i want to have a multivalued field in the
> end like
>
> 
>100
>200
>300
>400
>500
> 
>
> i thought that the javascript/script-transformer could do the trick. i
> have a script like
>
>  function dst2intern(row) {
>      var webdst = '';
>      var count = 0;
>      webdst = row.get('WebDst');
>      var arr = new java.util.ArrayList();
>      if (webdst) {
>          // var dst = webdst.split(/[,; ] */);
>          var dst = webdst.split(';');
>          for (var i = 0; i < dst.length; i++) {
>              arr.add(dst[i]);
>              count++;
>          }
>          if (!count) {
>              arr.add('0');
>          }
>          row.put('intern', arr);
>      } else {
>          arr.add('0');
>          row.put('intern', arr);
>      }
>      return row;
>  }
> ]]>
>
> in my entity-definition i have
> transformer="RegexTransformer,script:dst2intern,TemplateTransformer"
>
> and then i have a field __intern
>
>  
>
> i thought that this would work perfect. it seems the split only can
> split on ; when comparing a single char.
> the regex with
>
>webdst.split(/[,; ] */);
>
> doesn't work. i have check it in a simple html-page, there the
> javascript split works with the regex.
> the solution which works for me is to first use a regex transformer on
> WebDst
>
> 
> 
>
> and use a simple ";" split in the javascript.
>
> i am using solr 1.4, java 1.6...
>
> does anyone know or can tell my, why the javascript split with a regex
> doesn't work?
>
> thank you
>
> markus
>
>
>
>


VelocityResponseWriter in Solr Core ?! configuration

2010-06-14 Thread stockii

Hello.

I want to use the VelocityResponseWriter. I did all the steps from this
site: http://wiki.apache.org/solr/VelocityResponseWriter

Built a war file with "ant dist" and used it, but solr cannot find the
VelocityResponseWriter class:

java.lang.NoClassDefFoundError: org/apache/solr/response/QueryResponseWriter
at java.lang.ClassLoader.defineClass1(Native Method) at
java.lang.ClassLoader.defineClassCond(Unknown Source)

i think i made a mistake when building the war file, because after building
there are no velocity classes in my build folder. how can i
build a solr.war with these classes?

thx =)
-- 
View this message in context: 
http://lucene.472066.n3.nabble.com/VelocityResponseWriter-in-Solr-Core-configuration-tp894262p894262.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: VelocityResponseWriter in Solr Core ?! configuration

2010-06-14 Thread Erik Hatcher

What version of Solr are you using?

If you're using trunk, the VelocityResponseWriter is built in to the  
example.


If you're using a previous version, try specifying
"solr.VelocityResponseWriter" as the class name, as it switched from
the request to the response package, and the "solr." shortcut will
find it in either one. The additional JAR files can go into your
<solr-home>/lib subdirectory and don't need to be built into the WAR
at all.
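A registration using that shortcut might look like this in solrconfig.xml (a sketch):

```xml
<queryResponseWriter name="velocity" class="solr.VelocityResponseWriter"/>
```

and requests then select it with wt=velocity.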


Erik






Re: VelocityResponseWriter in Solr Core ?! configuration

2010-06-14 Thread stockii

ah okay.

i tried it with 1.4 and put the jars into the lib dir of solr.home but it won't
work. i get the same error ...

i use 2 cores, and my solr.home is ...path/cores. in this folder i put
another folder with the name "lib" and put all these jars into it:
apache-solr-velocity-1.4-dev.jar
velocity-1.6.1.jar
velocity-tools-2.0-beta3.jar
commons-beanutils-1.7.0.jar
commons-collections-3.2.1.jar
commons-lang-2.1.jar

and then in solrconfig.xml this line:

<queryResponseWriter name="velocity" class="org.apache.solr.response.VelocityResponseWriter"/>

solr cannot find the jars =(

-- 
View this message in context: 
http://lucene.472066.n3.nabble.com/VelocityResponseWriter-in-Solr-Core-configuration-tp894262p894354.html
Sent from the Solr - User mailing list archive at Nabble.com.


Using solr with woodstox 4.0.8

2010-06-14 Thread Weber, Alexander
Hi all,

we are using woodstox-4.0 and solr-1.4 in our project.
As solr is using woodstox-3.2.7, there is a version clash.

So I tried to check if solr would run with woodstox-4.0.

I downloaded a clean solr-1.4.0 and replaced wstx-asl-3.2.7.jar with
stax2-api-3.0.2.jar and woodstox-core-lgpl-4.0.8.jar in the lib directory.
Then I called "ant clean test" and it succeeded with no failures.

Am I missing something? Anything more to test?

Cheers,
Alex  
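One extra check that might help (my suggestion, not part of the original test run): print which StAX implementation actually gets picked up from the classpath, to confirm Woodstox 4.0 is the one being loaded:

```java
import javax.xml.stream.XMLInputFactory;

public class StaxCheck {
    public static void main(String[] args) {
        // Prints the concrete StAX factory found on the classpath; with
        // woodstox-core present this should be a com.ctc.wstx class.
        XMLInputFactory factory = XMLInputFactory.newInstance();
        System.out.println(factory.getClass().getName());
    }
}
```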



Re: Solr and Nutch/Droids - to use or not to use?

2010-06-14 Thread MitchK

Just wanted to push the topic a little bit, because these questions come up
quite often and it's very interesting for me.

Thank you!

- Mitch


MitchK wrote:
> 
> Hello community and a nice Saturday,
> 
> from several discussions about Solr and Nutch, I got some questions for a
> virtual web-search-engine.
> 
> The requirements:
> I. I need a scalable solution for a growing index that becomes larger than
> one machine can handle. If I add more hardware, I want to linearly improve
> the performance.
> 
> II. I want to use technologies like the OPIC-algorithm (default algorithm
> in Nutch) or PageRank or... whatever is out there to improve the ranking
> of the webpages. 
> 
> III. I want to be able to easily add more fields to my documents. Imagine
> one retrieves information from a webpage's content; then I want to make it
> searchable.
> 
> IV. While fetching my data, I want to make special searches possible. For
> example, I want to retrieve pictures from a webpage and want to index
> picture-related content into another search index, plus I want to save a
> small thumbnail of the picture itself. Btw: this is (as far as I know) not
> possible with solr, because solr was not intended to do such special
> indexing-logic.
> 
> V. I want to use filter queries (i.e. the main query "christopher lee" returns
> 1.5 million results; with subquery "action", the main query would become a
> filter query and "action" the actual query. So a search within
> search results would be easily made available).
> 
> VI. I want to be able to use different logics for different pages. Maybe I
> got a pool of 100 domains that I know better than others, and I got special
> scripts that retrieve more special information from those 100 domains. Then
> I want to apply my special logic to those 100 domains, but every other
> domain should use the default logic.
> 
> -
> 
> The project is only virtual. So why am I asking?
> I want to learn more about websearch and I would like to gain some new
> experiences.
> 
> What do I know about Solr + Nutch:
> As it is said on lucidimagination.com, Solr + Nutch does not scale if the
> index is too large.
> The article was a little bit older and I don't know whether this problem
> is fixed by the new distributed abilities of Solr.
> 
> Furthermore I don't want to index the pages with nutch and reindex them
> with solr. 
> The only exception would be: if the content of a webpage gets indexed by
> nutch, I want to use the already tokenized content of the body with some
> Solr copyfield operations to extend the search (i.e. making fuzzy search
> possible). At the moment I don't think this is possible.
> 
> I don't know much about the droids project and how well it is documented.
> But from what I can read in some posts of Otis, it seems to be usable as a
> crawler-framework.
> 
> 
> Pros for Nutch are: It is very scalable! Thanks to hadoop and MapReduce it
> is a scaling-monster (from what I've read).
> 
> Cons: The search is not as rich as it is possible with Solr. Extending
> Nutch's search-abilities *seems* to be more complicated than with Solr.
> Furthermore, if I want to use Solr to search nutch's index, looking at my
> requirements I would need to reindex the whole thing - without the
> benefits of Hadoop.
> 
> What I don't know at the moment is how it is possible to use algorithms
> like those mentioned in II. with Solr.
> 
> I hope you understand the problem here - Solr *seems* to me like it would
> not be the best solution for a web-search-engine, because of scaling
> reasons in indexing.
> 
> 
> Where should I dive deeper? 
> Solr + Droids?
> Solr + Nutch?
> Nutch + howToExtendNutchToMakeSearchBetter?
> 
> 
> Thanks for the discussion!
> - Mitch
> 
-- 
View this message in context: 
http://lucene.472066.n3.nabble.com/Solr-and-Nutch-Droids-to-use-or-not-to-use-tp890640p894391.html
Sent from the Solr - User mailing list archive at Nabble.com.


FW: Tika in Action

2010-06-14 Thread Mattmann, Chris A (388J)
All, FYI, as SolrCell is built on top of Tika, some folks might be interested 
in this message I posted to the Tika lists.

Thanks!

Cheers,
Chris

-- Forwarded Message
From: "Mattmann, Chris A (388J)" 
Reply-To: 
Date: Fri, 11 Jun 2010 19:07:24 -0700
To: 
Cc: 
Subject: Tika in Action

Hi Folks,

Just wanted to give you an FYI that the book that Jukka Zitting and I are
writing on Tika titled "Tika in Action" is now available through Manning's
Early Access Program [1].

Feedback, comments welcome.

Thanks!

Cheers,
Chris

[1] http://www.manning.com/mattmann/

++
Chris Mattmann, Ph.D.
Senior Computer Scientist
NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA
Office: 171-266B, Mailstop: 171-246
Email: chris.mattm...@jpl.nasa.gov
WWW:   http://sunset.usc.edu/~mattmann/
++
Adjunct Assistant Professor, Computer Science Department
University of Southern California, Los Angeles, CA 90089 USA
++




-- End of Forwarded Message


Re: Using solr with woodstox 4.0.8

2010-06-14 Thread Peter Karich
Hi Alex!

> Am I missing something? Anything more to test?
>   

Are you using solrj too? If so, beware of:
https://issues.apache.org/jira/browse/SOLR-1950

Regards,
Peter.


Re: VelocityResponseWriter in Solr Core ?! configuration

2010-06-14 Thread Erik Hatcher


On Jun 14, 2010, at 9:12 AM, stockii wrote:
i tried it with 1.4 and put the jars into lib of solr.home but it won't
work. i get the same error ...

i use 2 cores. and my solr.home is ...path/cores in this folder i put
another folder with the name: "lib" and put all these Jars into it:
apache-solr-velocity-1.4-dev.jar
velocity-1.6.1.jar
velocity-tools-2.0-beta3.jar
commons-beanutils-1.7.0.jar
commons-collections-3.2.1.jar
commons-lang-2.1.jar


With multicore, you either have to put the JARs in each core's lib/
directory, or use the multicore sharedLib feature to point to the
proper lib directory.




and then in solrconfig.xml this line:
<queryResponseWriter name="velocity"
class="org.apache.solr.response.VelocityResponseWriter"/>


Again, I strongly recommend you use class="solr.VelocityResponseWriter".

Erik



Re: Using solr with woodstox 4.0.8

2010-06-14 Thread Weber, Alexander
Hi Peter!

Yes, we do.
Thanks for the hint!

Cheers,
Alex





Need help on Solr Cell usage with specific Tika parser

2010-06-14 Thread olivier sallou
Hi,
I use Solr Cell to send specific content files. I developped a dedicated
Parser for specific mime types.
However I cannot get Solr accepting my new mime types.

In solrconfig, in update/extract requesthandler I specified ./tika-config.xml , where tika-config.xml is in
conf directory (same as solrconfig).

In tika-config I added my mimetypes:


biosequence/document
biosequence/embl
biosequence/genbank


I do not know for:
  

whereas path to tika mimetypes should be absolute or relative... and even if
this file needs to be redefined if "magic" is not used.


When I run my update/extract, I have an error that "biosequence/document"
does not match any known parser.

Thanks

Olivier


Re: Need help on Solr Cell usage with specific Tika parser

2010-06-14 Thread Ken Krugler

Hi Olivier,

Are you setting the mime type explicitly via the stream.type parameter?

-- Ken
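For reference, explicitly setting the type on an extract request might look like this (a sketch — URL, id and file name are made up):

```
curl "http://localhost:8983/solr/update/extract?literal.id=seq1&stream.type=biosequence/embl" \
     -F "myfile=@sequence.embl"
```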



Ken Krugler
+1 530-210-6378
http://bixolabs.com
e l a s t i c   w e b   m i n i n g






Re: Need help on Solr Cell usage with specific Tika parser

2010-06-14 Thread olivier sallou
Yeap, I do.
As magic is not set, this is the reason why it looks for this specific
mime type. Unfortunately, it seems it either does not read my specific
tika-config file or the mime-type file. But there is no error log concerning
those files... (is it not trying to load them?)




Solr 1.4 and Nutch 1.0 Integration

2010-06-14 Thread Dean Del Ponte
I'm new to Solr, but I'm interested in setting it up to act like a google
search appliance to crawl and index my website.

It's my understanding that nutch provides the web crawling but needs to be
integrated with Solr in order to get a google search appliance type
experience.

Two questions:

1.  Is the scenario I'm outlining above possible?
2.  If it is possible, where may I find documentation describing how to set
up a Solr/Nutch instance?

Thanks for your help,

Dean Del Ponte


need help with multicore dataimport

2010-06-14 Thread Moazzam Khan
Hi,

Does anyone know how to access the dataimport handler on a multicore setup?

This is my solr.xml









I've tried http://localhost:8080/solr/advisors/dataimport but that
doesn't work. My solrconfig.xml for advisors looks like this:

  <requestHandler name="/advisor/dataimport"
      class="org.apache.solr.handler.dataimport.DataImportHandler">
    <lst name="defaults">
      <str name="config">C:\solr\example\solr\advisors\conf\dih-advisors-jdbc.xml</str>
    </lst>
  </requestHandler>

Thanks,

Moazzam


Re: need help with multicore dataimport

2010-06-14 Thread Erik Hatcher
This issue is your request handler path: <requestHandler name="/advisor/dataimport"...>, use name="/dataimport" instead.  Implicitly  
all access to a core is /solr/<core name>/ and all paths in solrconfig  
go after that.
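
A hedged sketch of the corrected declaration (the path is copied from the thread; only the handler name changes):

```xml
<!-- the name is relative to the core, so /dataimport answers at /solr/advisors/dataimport -->
<requestHandler name="/dataimport"
    class="org.apache.solr.handler.dataimport.DataImportHandler">
  <lst name="defaults">
    <str name="config">C:\solr\example\solr\advisors\conf\dih-advisors-jdbc.xml</str>
  </lst>
</requestHandler>
```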


Erik

On Jun 14, 2010, at 1:44 PM, Moazzam Khan wrote:


Hi,

Does anyone know how to access the dataimport handler on a multicore  
setup?


This is my solr.xml









I've tried http://localhost:8080/solr/advisors/dataimport but that
doesn't work. My solrconfig.xml for advisors looks like this:

 <requestHandler name="/advisor/dataimport"
     class="org.apache.solr.handler.dataimport.DataImportHandler">
   <lst name="defaults">
     <str name="config">C:\solr\example\solr\advisors\conf\dih-advisors-jdbc.xml</str>
   </lst>
 </requestHandler>


Thanks,

Moazzam




Re: need help with multicore dataimport

2010-06-14 Thread Moazzam Khan
Thanks! It worked.

- Moazzam

On Mon, Jun 14, 2010 at 12:48 PM, Erik Hatcher  wrote:
> This issue is your request handler path: <requestHandler
> name="/advisor/dataimport"...>, use name="/dataimport" instead.  Implicitly
> all access to a core is /solr/<core name>/ and all paths in solrconfig go
> after that.
>
>        Erik
>
> On Jun 14, 2010, at 1:44 PM, Moazzam Khan wrote:
>
>> Hi,
>>
>> Does anyone know how to access the dataimport handler on a multicore
>> setup?
>>
>> This is my solr.xml
>>
>> 
>>        
>>                
>>                
>>        
>> 
>>
>>
>> I've tried http://localhost:8080/solr/advisors/dataimport but that
>> doesn't work. My solrconfig.xml for advisors looks like this:
>>
>>  <requestHandler name="/advisor/dataimport"
>>      class="org.apache.solr.handler.dataimport.DataImportHandler">
>>    <lst name="defaults">
>>      <str name="config">C:\solr\example\solr\advisors\conf\dih-advisors-jdbc.xml</str>
>>    </lst>
>>  </requestHandler>
>>
>> Thanks,
>>
>> Moazzam
>
>


Re: AW: XSLT for JSON

2010-06-14 Thread Chris Hostetter

: i only want the response format of the StandardSearchHandler for the
: TermsComponent. how can i do this in a simple way  ? :D

I still don't understand what you are asking ... TermsComponent returns 
data about terms.  The SearchHandler runs multiple components, and returns 
whatever data those components want to return.

If you are using TermsComponent in SearchHandler, you will get one type of 
data back in the terms section, and it will be in the "terms structure" 
(either as XML or as JSON depending on the writer you use) ... if you use 
some other components in your SearchHandler they will return *different* 
data in the data structure that makes sense for that component, which 
will either be formatted as JSON or XML depending on the response writer 
you use.

But all of this seems orthogonal to the question you seem adamant about, 
which is translating the XML response (from some component) into some 
JSON structure your clients are expecting.  In short: sure, you can 
probably use XSLT to generate JSON from the XML response -- if that's what 
you really want to do, then go right ahead and try it, but since I don't 
know anyone else who has ever done that I can't offer you any specific 
tips or assistance.


-Hoss



Re: Questions about hsin and dist

2010-06-14 Thread Chris Hostetter

I'm not very knowledgeable on spatial search, but...

: for example, if I were to use a filter query such as
: 
: {!frange l=0 u=75}dist(2,latitude,longitude,44.0,73.0)
: 
: I would expect it to return all results within 75 mi of the given
: latitude and longitude. however, the values being returned are far
: outside of that range:

nothing in the wiki for the dist function suggests that the returned value 
is in miles -- it's notably devoid of mention of units of measurement.  I 
believe (but am not certain) based on skimming the JUnit 
test that it's returning a number between 0 and 1 (as noted in the docs, 
it's finding the distance between two *vectors*)

: {!frange}hsin(1,44.0,73.0,latitude,longitude,true)
: 
: expecting that it would return a filtered set of queries in a radius
: of 1 mi within 44.0lat and 73.0 long, where true tells the hsin
: function to convert to radians. However, whether or not the filter is

That doesn't match my reading of the docs at all -- as I understand it, 
the "radius" argument to the hsin function is the radius of the sphere, in 
whatever units you want, and then it computes the distance between two 
points on that sphere using the same units.  so if you want to filter to 
only points within 1 mile of some specific point (where all points are 
specified in degrees) you would use something like...

fq={!frange l=0 u=1}hsin(XXX,44.0,73.0,latitude,longitude,true)

...where XXX is the radius of the earth in miles (I didn't bother to look 
it up)
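
For reference, the earth's mean radius is roughly 3,959 miles, so (assuming the wiki's description of hsin is accurate) the filter would read something like:

```
fq={!frange l=0 u=1}hsin(3959, 44.0, 73.0, latitude, longitude, true)
```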


-Hoss



Re: Questions about hsin and dist

2010-06-14 Thread Yonik Seeley
On Mon, Jun 14, 2010 at 3:35 PM, Chris Hostetter
 wrote:
> fq={!frange l=0 u=1}hsin(XXX,44.0,73.0,latitude,longitude,true)
>
> ...where XXX is the radius of hte earth in miles (i didn't bother to look
> it up)

That's what the docs say, but it doesn't really work in my experience.
IMO, the spatial stuff is still in development and not ready for
public consumption.

-Yonik
http://www.lucidimagination.com


Re: Solr Architecture discussion

2010-06-14 Thread Chris Hostetter

: B- A backup of the current index would be created
: C- Re-Indexing will happen on Master-core2 
: D- When Indexing is done, we'll trigger a swap between Master-core1 and
: core2
...
: But how can B,C, and D. I'll do it manually. Wait! I'm not sure my boss will
: pay for that.

: 1/Can I leverage on some solr mechanisms (that is, by configuration only) in
: order to reach that goal?
: I haven't found how to do it!

your best bet is some external scheduler -- depending on how your build 
process works, you can fairly easily integrate it into external 
publishing tools.

: 2/ Is there any issue while replicating master "swapped" index files? I've
: seen in the literature that there might be some issues.

As long as the "new" version of the index is truly "newer" than the old 
version, there shouldn't be any problem.

Frankly though: I'm not sure you need core swapping on the master either 
-- it depends largely on how much "churn" will happen each time you do one 
of these full rebuilds.  You could just as easily do incremental 
reindexing on your master, with occasional commits (or even autocommits) 
and your slaves picking up those new segments -- either gradually, or all 
at once when you do a monolithic commit. 

if you're ok with the slaves pulling over the *entire* index after you do 
the core swap, then you should be fine with the slaves pulling over the 
*entire* index (or maybe just most of it) after a rebuild directly to the 
existing core.

all you really need to do explicitly on the master is trigger a backup 
just before you rebuild the world, and if (and only if) something goes 
terribly wrong, then restore from your backup.




-Hoss



Re: Default filter in solr config (+filter document by now for near time index feeling)

2010-06-14 Thread Chris Hostetter

: 10 minutes. Sure, but the idea now is to index all documents with an index 
: date, set this index date 10 min to the future and create a filter 
: "INDEX_DATE:[* TO NOW]".
: 
: Question 1: is it possible to set this as part of solr-config, so every 
: implementation against the server will regard this.

yes.

: Question 2: From a caching point of view this sounds a little ugly, is it? - 
: has anybody tried this?

it is very ugly, and I don't recommend it if you even remotely care about 
caching -- at a minimum you should do something like "INDEX_DATE:[* TO 
NOW/MINUTE+1MINUTE]" so you at least get reusable queries for 1 minute at 
a time.
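
One hedged way to wire that into solr-config is an "appends" section on the search handler (a sketch; the handler and field names are illustrative):

```xml
<requestHandler name="/select" class="solr.SearchHandler">
  <lst name="appends">
    <!-- rounded to the minute so the filter query stays cacheable for a minute at a time -->
    <str name="fq">INDEX_DATE:[* TO NOW/MINUTE+1MINUTE]</str>
  </lst>
</requestHandler>
```

Every request through this handler then gets the date filter without the client having to send it.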




-Hoss



Re: how to use "q=string" in solrconfig.xml `?

2010-06-14 Thread Chris Hostetter

: this is my request to solr, and i cannot change this:
: http://host/solr/select/?q=string
: 
: i cannot change this =( so i have a new termsComponent. i want to use
: q=string as default for terms.prefix=string.
: 
: can i do something like this: ?
: 
: 
:  true   
:  suggest
:  index
:  ${???}
: 

in general: no.  for things that are QParsers there is a "local var" 
feature that can be used -- but terms.prefix isn't parsed as a query, 
so it doesn't work that way.

your best bet is to add a server side rule (using something like 
mod_rewrite) that adds a terms.prefix param using the value of the q param
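
A hedged sketch of such a rule for Apache httpd's mod_rewrite sitting in front of the servlet container (untested; the paths are illustrative):

```
RewriteEngine On
# capture the value of the q parameter...
RewriteCond %{QUERY_STRING} (?:^|&)q=([^&]+)
# ...and pass it along as terms.prefix, keeping the original query string (QSA)
RewriteRule ^/solr/select/?$ /solr/select?terms.prefix=%1 [QSA,PT]
```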



-Hoss



Re: SolrException: No such core

2010-06-14 Thread Chris Hostetter

: Here the wrappers to use ...solrj.SolrServer
: [code]
: public class SolrCoreServer
: {
:private static Logger log = LoggerFactory.getLogger(SolrCoreServer.class);
:   
:private SolrServer server=null;
:
:public SolrCoreServer(CoreContainer container, String coreName)
:{
:   server = new EmbeddedSolrServer( container, coreName );
:}

showing the code for your SolrCoreServer isn't any use if you don't show 
us how you construct an instance of it ... how are you initializing that 
CoreContainer?

In general, you've provided a lot of info, but you haven't answered most 
of my very specific questions...

: * what does your code for initializing solr look like?

  ...need details on the CoreContainer (and what coreName you are 
passing) for that to be of any use.

: * what does your solr home dir look like (ie: what files are in it)

...you showed us the files, but not the directory structure

: * what is the full stack trace of these exceptions, and what does your 
: code look like around the lines where these stack traces indicate your 
: code is interacting with solr?

...no mention whatsoever in your response.




-Hoss



Re: Some basics

2010-06-14 Thread Chris Hostetter

: - I want my search to "auto" spell check - that is if someone types
: "restarant" I'd like the system to automatically search for restaurant.
: I've seen the SpellCheckComponent but that doesn't seem to have a simple way
: to automatically do the "near" type comparison.  Is the SpellCheckComponent
: the wrong one or do I just need to manually handle the situation in my
: client code?

at the moment you need to handle this in your client -- if you get no 
results back (or too few results based on some expectation you have) 
but the spellcheck component returned a suggestion then trigger a 
subsequent search using that suggestion.

: - Also, what is the proper analyzer if I want to search a search for "thai
: food" or "thai restaurant" to actually match on Thai?  I can't totally
: ignore words like food and restaurant but I want to ignore more general
: terms and look for specific first (or I should say score them higher).

the issue isn't so much your analyzer as how you structure your query -- I 
would suggest using the dismax query parser with a very low value for the 
'mm' param (ie: '1' or something like '10%' if you expect a lot of queries 
with many many words) and a useful "pf" param -- that way two word queries 
will return matches for either word, but docs that match both words will 
score higher, and docs that match the full phrase will score the highest.
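
A hedged sketch of that configuration (the field names are illustrative):

```xml
<requestHandler name="/search" class="solr.SearchHandler">
  <lst name="defaults">
    <str name="defType">dismax</str>
    <str name="qf">name description</str>
    <!-- only one clause has to match... -->
    <str name="mm">1</str>
    <!-- ...but full-phrase matches on name score highest -->
    <str name="pf">name^10</str>
  </lst>
</requestHandler>
```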




-Hoss



Re: Indexing stops after exception

2010-06-14 Thread Chris Hostetter

: on one of the PDF documents and this causes indexing to stop (the
: TikaEntityProcessor) throws a Severe exception. Is it possible to ignore
: this exception and continue indexing by some kind of solr configuration ?

I'm not really a power user of DIH, but have you tried adjusting the value 
of the 'onError' param?
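
For reference, onError is an attribute on the DIH entity (accepted values are abort, skip, and continue); a sketch with illustrative attributes:

```xml
<!-- skip documents whose parsing fails instead of aborting the whole import -->
<entity name="docs" processor="TikaEntityProcessor"
        url="${files.fileAbsolutePath}" format="text"
        onError="skip"/>
```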

: TikaEntityProcessor to return null in this case. BTW shouldn't the
: inputstream close be in a finally block?

Almost certainly -- can you please open a Jira issue and either 
attach a patch with your suggested "finally" changes or just
cite the files/lines you think look suspicious.


-Hoss



Re: general debugging techniques?

2010-06-14 Thread Chris Hostetter

: > if you are only seeing one log line per request, then you are just looking
: > at the "request" log ... there should be more logs with messages from all
: > over the code base with various levels of severity -- and using standard
: > java log level controls you can turn these up/down for various components.
: 
: Unfortunately, I'm not very familiar with java deploys so I don't know
: where the standard controls are yet.  As a concrete example, I do see
: INFO level logs, but haven't found a way to move up DEBUG level in
: either solr or tomcat.  I was hopeful debug statements would point to
: where extraction/indexing hangs were occurring.  I will keep poking
: around, thanks for the tips.

Hmm ... it sounds like maybe you haven't seen this wiki page...

  http://wiki.apache.org/solr/SolrLogging

..as mentioned there, for quick debugging, there is an admin page to 
adjust the log levels on the fly...

  http://localhost:8983/solr/admin/logging.jsp

...but for more long term changes to the logging configuration, it depends 
greatly on whether your servlet container customizes the Java LogManager.  
There are links there to general info about Java logging, and about 
tweaking this in the example Jetty setup.


-Hoss



Re: Master master?

2010-06-14 Thread Chris Hostetter

: Does Solr handling having two masters that are also slaves to each other (ie
: in a cycle)?

no.



-Hoss



Re: Help with Shingled queries

2010-06-14 Thread Chris Hostetter

: the queryparser first splits on whitespace.

FWIW: robert is referring to the LuceneQParser, and it also applies to the 
DismaxQParser ... whitespace is considered markup in those parsers unless 
it's escaped or quoted.

The FieldQParser may make more sense for your usecase - or you may need a 
custom QParser (hard to tell)

To answer your specific question...

: > the debug output, for example with the term "short red evil fox" I would
: > expect
: > to see the shingles
: > 'short_red' 'red_evil' 'evil_fox'
: >
: > but instead I get the following
: >
: > "debug":{
: >  "rawquerystring":"short red evil fox",
: >  "querystring":"short red evil fox",
: >  "parsedquery":"+() ()",
: >  "parsedquery_toString":"+() ()",
: >  "explain":{},
: >  "QParser":"DisMaxQParser",

...you are using the DisMaxQParser, but evidently you haven't configured 
the qf or pf fields, so you are getting a query that is completely empty.



-Hoss



Re: Problem in solr reponse time when Query size is big

2010-06-14 Thread Chris Hostetter

You'll have to give us some specific details of what your code/queries 
look like, and the exact error messages you are getting back, if you expect 
anyone to be able to come up with a meaningful guess as to what might be 
going wrong for you.

Off the top of my head, there is no reason I can think of why a "large" 
query would cause something that might be called an "HTTP Version not 
supported" error unless there was a bug in your servlet container, or a 
bug in your client code, or both.

: Hi All,
: 
: I have configured Apache Solr 1.4 with JBoss 5.1.0GA and Working fine when I
: send some small query strings but my requirement is different and I have to
: build query string in the fly and pass to solr and execute to get response.
: It's working fine with small query of data but when passing big query then
: not responding anything on page and in JBoss console I got message HTTP
: Version not supported. Can anyone help me where I am wrong? If any other way
: to overcome this problem then please reply me.
: 
: 
: Thanks & Regards,
: Dhirendra
: -- 
: View this message in context: 
http://lucene.472066.n3.nabble.com/Problem-in-solr-reponse-time-when-Query-size-is-big-tp876221p876221.html
: Sent from the Solr - User mailing list archive at Nabble.com.
: 



-Hoss



Re: custom scorer in Solr

2010-06-14 Thread Chris Hostetter

: Problem is that they want scores that make results fall in buckets:
: 
: * Bucket 1: exact match on category (score = 4)
: * Bucket 2: exact match on name (score = 3)
: * Bucket 3: partial match on category (score = 2)
: * Bucket 4: partial match on name (score = 1)
...
: First thing we did was develop a custom similarity class that would
: return the correct score depending on the field and an exact or partial
: match.
...
: The only problem now is that when a document matches on both the
: category and name the scores are added together.

what QParser are you using?  what does the resulting Query data structure 
look like?

I think with your custom Similarity class you might be able to achieve 
your goal using the DisMaxQParser w/o any other custom code -- just set 
your "qf=category name" (I'm assuming your Similarity already handles the 
relative weighting) and set "tie=0" ... that will ensure that the 
final score only comes from the "Max" scoring field (ie: no tie-breaking 
values from the other fields)

if that doesn't do what you want -- then your best bet is probably to 
write a custom QParser that generates *exactly* the query structure you 
want (likely using a DisjunctionMaxQuery) that will give you the scores 
you want in conjunction with your similarity class.
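
A hedged sketch of the resulting request (the handler path is illustrative; qf and tie as described above):

```
http://localhost:8983/solr/select?q=restaurant&defType=dismax&qf=category+name&tie=0.0
```

With tie=0 a document's score is the max of its per-field scores, so a doc matching on both category and name lands in bucket 1 with a score of 4 rather than 4+1.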


-Hoss



Re: Request log does not show QTime

2010-06-14 Thread Chris Hostetter
: How do you customize the RequestLog to include the query time, hits, and

the "RequestLog" is a jetty specific log file -- it's only going to know 
about the concepts that Jetty specifically knows about.

: Note, I do see this information in log.solr.0, but it also includes the full
: query parameters which are too verbose, so I need to turn that logging off.
: Jun 10, 2010 1:35:03 PM org.apache.solr.core.SolrCore execute
: INFO: [] webapp=/solr path=/select/ params={...} hits=4587 status=0 QTime=19

that's the format Solr uses for logging individual requests.  if you 
want to change it you can either write a custom LogHandler or a custom 
LogFormatter, or you can post-process...

http://java.sun.com/j2se/1.5.0/docs/guide/logging/overview.html



-Hoss



Re: AW: how to get multicore to work?

2010-06-14 Thread Chris Hostetter
: As it stands, solr works fine, and sites like
: http://localhost:8983/solr/admin also work.
: 
: As soon as I put a solr.xml in the solr directory, and restart the tomcat
: service. It all stops working.
: 
:   
: 
:   
: 

You need to elaborate on "It all stops working" ... what does that mean? 
what are you trying to do? and what errors are you getting?

When I take an existing (functional) Solr 1.4 SolrHome dir, and drop that 
solr.xml file into it, everything works as expected for me:

  1. Solr starts up 
  2. This URL lists a link to the admin page for a single core named 
 "core0"...
 http://localhost:8983/solr/
  3. This URL lets me use core0...
 http://localhost:8983/solr/core0/admin/
  4. This URL (specified in your solr.xml) lets me admin the cores 
 (ie: view-status/add/remove/reload) ...
 http://localhost:8983/solr/admin/cores


-Hoss



Re: Copyfield multi valued to single value

2010-06-14 Thread Chris Hostetter

: Is there a way to copy a multivalued field to a single value by taking 
: for example the first index of the multivalued field?

Unfortunately no.  This would either need to be done with an 
UpdateProcessor, or on the client constructing the doc (either the remote 
client, or in your DIH config if that's how you are using Tika)



-Hoss



Re: Need help on Solr Cell usage with specific Tika parser

2010-06-14 Thread Chris Hostetter

: In solrconfig, in update/extract requesthandler I specified <str name="tika.config">./tika-config.xml</str>, where tika-config.xml is in
: conf directory (same as solrconfig).

can you show us the full requestHandler declaration? ... tika.config needs 
to be a direct child of the requestHandler (not in the defaults)

I also don't know if using a "local" path like that will work -- it depends 
on how that file is loaded (if solr loads it, then you might want to 
remove the "./"; if solr just gives the path to tika, then you probably 
need an absolute path).


-Hoss



Re: Custom faceting question

2010-06-14 Thread Chris Hostetter

: I believe I'll need to write some custom code to accomplish what I want
: (efficiently that is) but I'm unsure of what would be the best route to
: take. Will this require a custom request handler? Search component? 

You'll need a customized version of the FacetComponent if you want to do 
this all on the server side.

: We have a similar category structure whereas we have top level-categories
: and then sub-categories. I want to be able to perform a search and then only
: return the top 3 top-level categories with their sub-categories also
: faceted. The problem is I don't know what those top 3 top-level categories
: are until after I search.

the main complexity with a situation like this is how you model it.  
Regardless of whether you do it server side or client side, the straight 
forward approach is to do basic faceting on a "top level" category field, 
and then given the top three responses do secondary faceting on a field 
that contains the full category "breadcrumb" -- either using something 
like facet.prefix or by walking some other in-memory data structure 
representing your category graph that lets you access the children of a 
particular category (depends whether you need complex rules to identify 
what documents are in a category)
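
As plain requests, the two passes might look like this (a sketch; the field names top_cat and cat_path are illustrative):

```
# pass 1: find the top 3 top-level categories
/solr/select?q=pizza&rows=0&facet=true&facet.field=top_cat&facet.limit=3

# pass 2, once per winning category: facet the breadcrumb field under that prefix
/solr/select?q=pizza&rows=0&facet=true&facet.field=cat_path&facet.prefix=Restaurants/
```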

: Second way. Have the client send multiple requests on the backend. First to
: determine the top 3 categories, then another for all the subcategories. This
: involves more client side coding and I would prefer not to perform 2x the
: requests. If at all possible I would like to do this on the Solr side.

...you've already got the conceptual model of how to do it, all you need 
now is to implement it as a Component that does the secondary-faceting in 
the same request (which should definitely be more efficient since you can 
reuse the DocSets) instead of issuing secondary requests from your client



-Hoss



Re: Indexing Problem with SOLR multicore

2010-06-14 Thread Chris Hostetter

I can't think of any way this could happen -- can you provide some more 
details on what exactly you are doing, and what you are doing to observe 
the problem?

In particular:
  * what do each of your DIH config files look like?
  * what URLs are you using to trigger DIH imports?
  * how are you checking your document counts?
  * what URLs are you querying to see the results? 
- what results do you get from these URLs before you stop/start the 
  server that look correct?  
- what results do you get after the stop/start that are incorrect? 


: Hi,
:   I am using SOLR with Tomcat server. I have configured two
: multicore inside the SOLR home directory. The solr.xml file looks like 
: 
: 
:   
: 
: 
:   
:  
: 
: I am also using DIH to upload the data in these two cores separately &
: document count in these two cores is different. However whenever I restart
: the tomcat server my document count in these two cores shows the same. Also
: both the cores exist but whenever I try to search the data in any core it
: returns me data from a different core.
: 
: E.g. If I try to search the data in MyTestCore1 core then solr returns the
: result from MyTestCore2 core (this is a problem) & if I try to search the
: data in MyTestCore2 core then solr returns the data from MyTestCore2 core
: (which is fine) OR sometimes vice-versa happens...
: 
: Now if I reindex the data in MyTestCore1 core using "Full data-import with
: cleanup" then the problem gets sorted out, but comes back again if I restart
: my tomcat server.
: 
: Is there any issue with my core configuration? Please help
: 
: 
: Thanks,
: Siddharth
: 
: 
: 
: -- 
: View this message in context: 
http://lucene.472066.n3.nabble.com/Indexing-Problem-with-SOLR-multicore-tp884745p884745.html
: Sent from the Solr - User mailing list archive at Nabble.com.
: 



-Hoss



Re: Custom faceting question

2010-06-14 Thread Blargy

: ...you've already got the conceptual model of how to do it, all you need
: now is to implement it as a Component that does the secondary-faceting in
: the same requests (which should definitely be more efficient since you can
: reuse the DocSets) instead of issuing secondary requests from your client

Couldn't I just create a custom search handler to do this so that all the
logic resides on the server side? I'm guessing I would need to subclass
SearchHandler and override handleRequestBody.
-- 
View this message in context: 
http://lucene.472066.n3.nabble.com/Custom-faceting-question-tp868015p895990.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Multiple location filters per search

2010-06-14 Thread Chris Hostetter
: I am currently working with the following:
: 
: {code}
: {!frange l=0 u=1 unit=mi}dist(2,32.6126, -86.3950, latitude, longitude)
: {/code}
...
: {code}
: {!frange l=0 u=1 unit=mi}dist(2,32.6126, -86.3950, latitude,
: longitude) OR {!frange l=0 u=1 unit=mi}dist(2,44.1457, -73.8152,
: latitude, longitude)
: {/code}
...
: I get an error. Hoping someone has an idea of how to work with
: multiple locations in a single search.

I think you are confused about how that query is getting parsed ... when 
Solr sees the "{!frange" at the beginning of the param, that tells it that 
the *entire* param value should be parsed by the frange parser.  The 
frange parser doesn't know anything about keywords like "OR"

What you probably want is to utilize the "_query_" hack of the 
LuceneQParser so that you can parse some "Lucene" syntax (ie: A OR B) 
where the clauses are then generated by using another parser...

http://wiki.apache.org/solr/SolrQuerySyntax

fq=_query_:"{!frange l=0 u=1 unit=mi}dist(2,32.6126, -86.3950, latitude, 
longitude)" OR _query_:"{!frange l=0 u=1 unit=mi}dist(2,44.1457, -73.8152, 
latitude, longitude)"

   ...or a little more readable...

fq=_query_:"{!frange l=0 u=1 unit=mi v=$qa}" OR _query_:"{!frange l=0 u=1 
unit=mi v=$qb}"
qa=dist(2,32.6126, -86.3950, latitude, longitude)
qb=dist(2,44.1457, -73.8152, latitude, longitude)




-Hoss



Re: Custom faceting question

2010-06-14 Thread Chris Hostetter

: : ...you've already got the conceptual model of how to do it, all you need
: : now is to implement it as a Component that does the secondary-faceting in
: : the same requests (which should definitely be more efficient since you can
: : reuse the DocSets) instead of issuing secondary requests from your client
: 
: Couldn't I just create a custom search handler to do this so it all the
: logic resides on the server side? I'm guessing I would need to subclass
: SearchHandler and override handleRequestBody.

I think you're misunderstanding me -- I'm agreeing with you that you can 
do it on the server side, and that it will make sense to do it on the 
server side -- I'm saying that instead of implementing a SearchHandler, 
you should just implement a SearchComponent that you would use in place of 
(or in addition to) FacetComponent ...

  http://wiki.apache.org/solr/SearchComponent
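
A hedged sketch of wiring such a component into solrconfig.xml (the class name is illustrative):

```xml
<!-- register the custom component, then use it in place of the stock FacetComponent -->
<searchComponent name="topCatFacet" class="com.example.TopCategoryFacetComponent"/>

<requestHandler name="/catsearch" class="solr.SearchHandler">
  <arr name="components">
    <str>query</str>
    <str>topCatFacet</str>
    <str>debug</str>
  </arr>
</requestHandler>
```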


-Hoss



CFP for Surge Scalability Conference 2010

2010-06-14 Thread Jason Dixon
We're excited to announce Surge, the Scalability and Performance
Conference, to be held in Baltimore on Sept 30 and Oct 1, 2010.  The
event focuses on case studies that demonstrate successes (and failures)
in Web applications and Internet architectures.

Our Keynote speakers include John Allspaw and Theo Schlossnagle.  We are
currently accepting submissions for the Call For Papers through July
9th.  You can find more information, including our current list of
speakers, online:

http://omniti.com/surge/2010

If you've been to Velocity, or wanted to but couldn't afford it, then
Surge is just what you've been waiting for.  For more information,
including CFP, sponsorship of the event, or participating as an
exhibitor, please contact us at su...@omniti.com.

Thanks,

-- 
Jason Dixon
OmniTI Computer Consulting, Inc.
jdi...@omniti.com
443.325.1357 x.241


Re: Indexing Problem with SOLR multicore

2010-06-14 Thread seesiddharth

Hi Chris,
Thank you so much for the help & reply to my query. However, my
problem got resolved. There was a configuration problem in my solrconfig.xml
file. The <dataDir> tag was not configured properly, which is why both cores
were pointing to the same directory for indexing. 

Regards,
Siddharth
-- 
View this message in context: 
http://lucene.472066.n3.nabble.com/Indexing-Problem-with-SOLR-multicore-tp884745p896347.html
Sent from the Solr - User mailing list archive at Nabble.com.


Spellchecker index cannot be optimized

2010-06-14 Thread Pumpenmeier, Lutz SZ/HZA-ZSB3
Hello,
when I rebuild the spellchecker index (by optimizing the data index or
by calling cmd=rebuild) the spellchecker index is not optimized. I cannot
even delete the old index files on the filesystem, because they are
locked by the solr server. I have to stop the solr server (resin) to
optimize the spellchecker index with Luke or by deleting the old files.
How can I optimize the index without stopping the solr server?

Thanks
 Lutz Pumpenmeier