Re: Field Collapsing (was Re: Schema for group/child entity setup)

2009-09-05 Thread Uri Boness
Work is under way on the patch that will let you ask for specific field
values of the collapsed documents using a dedicated request parameter.
This work is not yet part of the latest patch, but it will be very soon.
There is of course a drawback to that as well: the set of collapsed
documents can be very large (depending on your data), in which case a
response that includes their field values can also be rather large,
which will hurt performance. This is why the feature will be enabled
only if you specify this extra parameter; by default no field values
will be returned.


AFAIK, the latest patch should work fine with the latest build. Martijn
(who is the main maintainer of this patch) tries to keep it up to date
with the latest builds. But I guess the safest way is to work with the
nightly build of the same date as the latest patch (though I would first
give it a try with the latest build).


BTW, it's not an official suggestion from the Solr development team, but
if you ask me, if you have to choose now whether to use 1.3 or 1.4-dev,
I would go for the latter. 1.4 is supposed to be released in the upcoming
week or two and it brings loads of bug fixes, enhancements and extra
functionality. But again, this is my personal suggestion.


cheers,
Uri

R. Tan wrote:

Okay. Thanks for giving an insight on how it works in general. Without
trying it myself, are the field values for the collapsed ones also part of
the results data?
What is the latest build that is safe to use on a production environment?
I'd probably go for that and use field collapsing.

Thank you very much.


On Fri, Sep 4, 2009 at 4:49 AM, Uri Boness  wrote:

  

The collapsed documents are represented by one "master" document which can
be part of the normal search result (the doc list), so pagination just works
as expected: it takes only the returned documents into account (ignoring
the collapsed ones). As for the scoring, the "master" document is
the document with the highest score in the collapsed group.
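
To make the master-document idea concrete, here is a minimal Python sketch.
It is not the patch's code; the function names, the "business" field and the
sample scores are all made up for illustration. It keeps the highest-scoring
document of each group as the master and pages over the masters only.

```python
def collapse(docs, field):
    """Return (master, collapsed_count) pairs, best-scoring master first."""
    best = {}    # field value -> highest-scoring document seen so far
    counts = {}  # field value -> number of documents sharing that value
    for doc in docs:
        key = doc[field]
        counts[key] = counts.get(key, 0) + 1
        if key not in best or doc["score"] > best[key]["score"]:
            best[key] = doc
    masters = sorted(best.values(), key=lambda d: d["score"], reverse=True)
    # Every document in a group except its master counts as collapsed.
    return [(m, counts[m[field]] - 1) for m in masters]


def page(collapsed, start, rows):
    """Paginate over masters only; collapsed documents take up no slots."""
    return collapsed[start:start + rows]


# Hypothetical sample data, not taken from any real index.
docs = [
    {"id": 1, "business": "7-eleven", "score": 2.0},
    {"id": 2, "business": "7-eleven", "score": 3.5},
    {"id": 3, "business": "corner shop", "score": 3.0},
    {"id": 4, "business": "7-eleven", "score": 1.0},
    {"id": 5, "business": "bakery", "score": 0.5},
]
result = collapse(docs, "business")
first_page = page(result, 0, 2)
```

In this sample the first page holds document 2 (with two collapsed
siblings) followed by document 3, so a page of two rows always shows two
distinct businesses.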

As for Solr 1.3 compatibility... well... it's very hard to tell. The latest
patches are certainly *not* 1.3 compatible (I think they also depend on
some changes in Lucene which are not available for Solr 1.3). I guess you'll
have to try some of the old patches, but I'm not sure about their stability.

cheers,
Uri


R. Tan wrote:



Thanks Uri. How does paging and scoring work when using field collapsing?
What patch works with 1.3? Is it production ready?

R


On Thu, Sep 3, 2009 at 3:54 PM, Uri Boness  wrote:



  

The development on this patch is quite active. It works well for a single
Solr instance, but distributed search (i.e. shards) is not yet supported.
Using this patch you can group search results based on a specific field.
There are two flavors of field collapsing, adjacent and non-adjacent. The
former collapses only documents which happen to be located next to each
other in the otherwise-non-collapsed result set. The latter (non-adjacent)
collapses all documents with the same field value (regardless of their
position in the otherwise-non-collapsed result set). Note that
non-adjacent performs better than adjacent. There is currently a
discussion about extending this support so that, in addition to collapsing
the documents, extra information will be returned for the collapsed
documents (see the discussion on the issue page).
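
The difference between the two flavors can be sketched in a few lines of
Python. This is only an illustration of the behaviour described above, not
the patch's implementation; the function names and the "brand" field are
invented for the example.

```python
def collapse_adjacent(docs, field):
    """Drop a document only if the document right before it has the same value."""
    kept = []
    previous = object()  # sentinel that equals no real field value
    for doc in docs:
        if doc[field] != previous:
            kept.append(doc)
        previous = doc[field]
    return kept


def collapse_non_adjacent(docs, field):
    """Keep only the first document for each field value, wherever it appears."""
    seen = set()
    kept = []
    for doc in docs:
        if doc[field] not in seen:
            seen.add(doc[field])
            kept.append(doc)
    return kept


# Hypothetical ranked result list.
ranked = [
    {"id": 1, "brand": "A"},
    {"id": 2, "brand": "A"},
    {"id": 3, "brand": "B"},
    {"id": 4, "brand": "A"},
]
adjacent_ids = [d["id"] for d in collapse_adjacent(ranked, "brand")]
non_adjacent_ids = [d["id"] for d in collapse_non_adjacent(ranked, "brand")]
```

With this input the adjacent flavor keeps documents 1, 3 and 4 (document 4
is not next to another "A"), while the non-adjacent flavor keeps only 1
and 3.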

Uri


R. Tan wrote:





I think this is what I'm looking for. What is the status of this patch?

On Thu, Sep 3, 2009 at 12:00 PM, R. Tan  wrote:





  

Hi Solrers,
I would like to get your opinion on how best to approach a search
requirement that I have. The scenario is that I have a set of business
listings that may be grouped into one parent business (such as 7-eleven
having several locations). On the results page, I only want 7-eleven to
show up once, but also show how many locations matched the query (facet
filtered by state, for example) and maybe a preview of some of the
locations.

Searching for the business name is straightforward, but handling the
locations within a result is quite tricky. I can do the opposite,
searching for the locations and faceting on business names, but it will
still basically be the same thing and repeat results with the same
business name.

Any advice?

Thanks,
R








  
  


  


Re: Solr, JNDI config, dataDir, and solr home problem

2009-09-05 Thread Noble Paul നോബിള്‍ नोब्ळ्
ideally you should be able to use the variable

${solr.core.instanceDir}

but as I checked the code that is not being set for a single core
deployment. are you using single core?
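
For readers unfamiliar with this kind of placeholder, the following Python
sketch shows the general idea of property substitution and why an unset
variable is a problem. It is a generic illustration, not Solr's code. The
`${name:default}` form is how I recall the example solrconfig.xml writing
its `solr.data.dir` setting, so treat that syntax as an assumption to check
against your own version.

```python
import re

# Matches ${name} and ${name:default}.
_PLACEHOLDER = re.compile(r"\$\{([^}:]+)(?::([^}]*))?\}")


def substitute(text, properties):
    """Replace each placeholder with its property value or its default."""
    def replace(match):
        name, default = match.group(1), match.group(2)
        if name in properties:
            return properties[name]
        if default is not None:
            return default
        # No value and no default: nothing sensible can be substituted.
        raise KeyError("no value for property " + name)
    return _PLACEHOLDER.sub(replace, text)


resolved = substitute("${solr.data.dir:./solr/data}", {})
overridden = substitute("${solr.data.dir:./solr/data}",
                        {"solr.data.dir": "/var/solr/data"})
```

A placeholder such as `${solr.home}/data` with no matching property and no
default raises an error in this sketch, which mirrors the situation in this
thread where the variable is simply never set.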

On Fri, Sep 4, 2009 at 9:16 PM, Archon810 wrote:
>
> OK, so I can't access it by ${solr.home}, but is there a way to access it?
> After all, it's a variable defined in JNDI, shouldn't there be a way to
> refer to it?
>
> Also, what about the INFO message that says it can't find /solr/home, while
> the instructions refer to solr/home ?
>
>
>
> Noble Paul നോബിള്‍  नोब्ळ्-2 wrote:
>>
>> ${solr.home} is used for documentation purpose. It is not set as a
>> variable.
>>
>> On Fri, Sep 4, 2009 at 3:58 PM, Archon810 wrote:
>>>
>>> I saw it being used in the default solrconfig.xml in this phrase:
>>> If you wish to hide files under ${solr.home}/conf, explicitly register
>>> the
>>> ShowFileRequestHandler using...
>>>
>>> It was only natural to assume it would work for something as trivial as
>>> dataDir.
>>>
>>> So, there's no way to refer to the solr/home value defined in JNDI?
>>>
>>>
>>> Noble Paul നോബിള്‍  नोब्ळ्-2 wrote:

 it is nowhere mentioned that you can use a variable ${solr.home} in
 your solrconfig.xml. There is a bug related to this issue
 https://issues.apache.org/jira/browse/SOLR-1267

 On Fri, Sep 4, 2009 at 5:47 AM, Archon810 wrote:
>
> Here's my problem.
>
> I'm trying to follow a multi Solr setup, straight from the Solr wiki -
> http://wiki.apache.org/solr/SolrTomcat#head-024d7e11209030f1dbcac9974e55106abae837ac.
>
> Here's the relevant code:
> <Context [...] crossContext="true" >
>   <Environment [...] value="/some/path/solr1home" override="true" />
> </Context>
>
> Now I want to set the Solr <dataDir> in solrconfig.xml, relative to
> the solr home property. The instructions
> http://wiki.apache.org/solr/SolrConfigXml#head-e8fbf2d748d90c5900aac712d0e3385ced5bd128
> say <dataDir> is used to specify an alternate directory to hold all
> index data other than the default ./data under the Solr home. If
> replication is in use, this should match the replication configuration.
> If this directory is not absolute, then it is relative to the current
> working directory of the servlet container.
>
> However, no matter how I try to set the dataDir property, solr home is
> not being found. For example,
> <dataDir>${solr.home}/data</dataDir>
>
> What's even more confusing are these INFO notices in the log:
> INFO: No /solr/home in JNDI
> Sep 3, 2009 4:33:26 PM org.apache.solr.core.SolrResourceLoader
> locateSolrHome
> INFO: solr home defaulted to 'solr/' (could not find system property or
> JNDI)
>
> The JNDI instructions instruct to specify "solr/home", the log
> complains
> about "/solr/home" (extra slash), the solrconfig.xml file seems to
> expect
> ${solr.home} - how more confusing can it get?
>
> This person is having the same issue:
> http://mysolr.com/tips/setting-solr-home-solrhome-in-jndi-on-tomcat-55/
>
> So, how does one refer to solr home from solrconfig.xml in a JNDI
> configuration scenario? Also, is there a way to debug/see variables
> that
> are
> defined in a specific context, such as solrconfig.xml? I feel like I'm
> completely blind here.
>
> Thank you!
> --
> View this message in context:
> http://www.nabble.com/Solr%2C-JNDI-config%2C-dataDir%2C-and-solr-home-problem-tp25286277p25286277.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>
>



 --
 -
 Noble Paul | Principal Engineer| AOL | http://aol.com



-- 
-
Noble Paul | Principal Engineer| AOL | http://aol.com


Re: TermsComponent

2009-09-05 Thread Israel Ekpo
Hi Todd,

I have not tried this yet.

But try setting the terms.raw parameter to true.

Maybe that will include the whitespace that is missing from the response.
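
A request with that parameter could be put together as in the Python sketch
below. The base URL (the example server's `/terms` handler on
localhost:8983) and the field name `name` are assumptions for illustration;
`terms.fl`, `terms.prefix` and `terms.raw` are TermsComponent parameters,
but I have not verified that `terms.raw` changes the whitespace behaviour.

```python
from urllib.parse import urlencode


def terms_url(base, field, prefix=None, raw=False):
    """Build a TermsComponent request URL for one field."""
    params = [("terms.fl", field)]
    if prefix is not None:
        # Restrict the returned terms to those starting with the typed text.
        params.append(("terms.prefix", prefix))
    if raw:
        params.append(("terms.raw", "true"))
    return base + "?" + urlencode(params)


# Assumed handler location and a hypothetical field name.
url = terms_url("http://localhost:8983/solr/terms", "name",
                prefix="so", raw=True)
```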

On Fri, Sep 4, 2009 at 5:46 PM, Todd Benge  wrote:

> Hi,
>
> I was looking at TermsComponent in Solr 1.4 as a way of building an
> autocomplete function.  I have a prototype working but noticed that terms
> that have whitespace in them when indexed are missing the whitespace when
> returned from the TermsComponent.
>
> Any ideas on why that may be happening?  Am I just missing a configuration
> option?
>
> Thanks,
>
> Todd
>



-- 
"Good Enough" is not good enough.
To give anything less than your best is to sacrifice the gift.
Quality First. Measure Twice. Cut Once.


Re: Field Collapsing (was Re: Schema for group/child entity setup)

2009-09-05 Thread R. Tan
Thanks Uri. Your personal suggestion is appreciated and I think I'll follow
your advice. We're still early in development and 1.4 would be a good
choice. I hope I can get field collapsing to work with my requirements. Do
you know of any live site that already uses field collapsing?


Re: TermsComponent

2009-09-05 Thread Yonik Seeley
On Fri, Sep 4, 2009 at 5:46 PM, Todd Benge wrote:
> I was looking at TermsComponent in Solr 1.4 as a way of building a
> autocomplete function.  I have a prototype working but noticed that terms
> that have whitespace in them when indexed are absent the whitespace when
> returned from the TermsComponent.

It works for me with the example data:
http://localhost:8983/solr/terms?terms.fl=manu_exact

-Yonik
http://www.lucidimagination.com


Re: TermsComponent

2009-09-05 Thread Todd Benge
Thanks - I'll give it a try



Re: Solr, JNDI config, dataDir, and solr home problem

2009-09-05 Thread Archon810

Yeah, I'm using single core (solrconfig.xml).




Re: Field Collapsing (was Re: Schema for group/child entity setup)

2009-09-05 Thread Uri Boness
You can check out http://www.ilocal.nl. If you search for a bank in 
Amsterdam then you'll see that a lot of the results are collapsed. For 
this we used an older version of this patch (which works on 1.3) but a 
lot has changed since then. We're currently using this patch on another 
project, but it's not live yet.


Uri


Concept Expansion

2009-09-05 Thread Villemos, Gert
We would like to support concept expansion in searches, i.e. when a user
searches for 'software' then the system should also search for keywords /
phrases such as program, computer, system, package and class.

I imagine that the right way of doing this is a request handler, which expands
a query into its conceptually similar entries and aggregates the results. A
simple change in the filter from;
 
q:software => q:software OR program OR computer OR system OR package 
 
would most likely do the job.
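
The rewrite above can be sketched in Python as follows. This only
illustrates the string transformation, not a Solr request handler; the
function names and the concept table are hypothetical.

```python
def quote(term):
    """Wrap multi-word phrases in double quotes so they stay one clause."""
    return '"' + term + '"' if " " in term else term


def expand_query(term, concepts):
    """Return the term OR-ed with its related concepts, if it has any."""
    related = concepts.get(term, [])
    return " OR ".join(quote(t) for t in [term] + related)


# Hypothetical concept table; in practice this would come from a thesaurus.
CONCEPTS = {"software": ["program", "computer", "system", "package"]}

expanded = expand_query("software", CONCEPTS)
unchanged = expand_query("hardware", CONCEPTS)
```

Terms without an entry in the table pass through unchanged, so the
expansion can be applied to every query without special cases.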
 
Does such a request handler already exist (... looking at the list on the wiki
and in the javadocs the answer seems to be no, but maybe it's maintained
externally)?
 
And is this the right way to go at all?
 
Thanks,
Gert.


Please help Logica to respect the environment by not printing this email.



This e-mail and any attachment is for authorised use by the intended 
recipient(s) only. It may contain proprietary material, confidential 
information and/or be subject to legal privilege. It should not be copied, 
disclosed to, retained or used by, any other party. If you are not an intended 
recipient then please promptly delete this e-mail and any attachment and all 
copies and inform the sender. Thank you.



AW: Concept Expansion

2009-09-05 Thread Villemos, Gert
[Sorry, post submitted as HTML. Proper format below;]
 
 
We would like to support concept expansion in searches, i.e. when a user
searches for 'software' then the system should also search for keywords /
phrases such as program, computer, system, package and class.

I imagine that the right way of doing this is a request handler, which expands
a query into its conceptually similar entries and aggregates the results. A
simple change in the filter from;

q:software => q:software OR program OR computer OR system OR package

would most likely do the job.

Does such a request handler already exist (... looking at the list on the wiki
and in the javadocs the answer seems to be no, but maybe it's maintained
externally)?

And is this the right way to go at all?

Thanks,
Gert.

 






Re: AW: Concept Expansion

2009-09-05 Thread Paul Libbrecht

Gert,

we're doing a similar process on i2geo search, including simple
language expansion (one word is queried in several fields of each
language). I haven't done it yet but will soon, and it has been
suggested to me to do it as a QParser plugin.


paul




Re: Concept Expansion

2009-09-05 Thread Shalin Shekhar Mangar
On Sun, Sep 6, 2009 at 2:17 AM, Villemos, Gert wrote:

>
> We would like to support concept expansion in searches, i.e. when a user
> searches for 'software' then the system should also search for keywords /
> phrases such as program, computer , system, package and class.
>
> I imagine that the right way of doing this is a request handler, which
> expands a query into its conceptual similar entries and aggregates the
> results.
>

Have you looked at SynonymFilterFactory?

http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#head-2c461ac74b4ddd82e453dc68fcfc92da77358d46
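
As a rough picture of what such a filter does, the Python sketch below
parses rules in the two forms a synonyms file uses (a comma-separated list
of equivalents, and an explicit `a => b, c` mapping) and applies them to a
token list. It is a simplified illustration, not Solr's implementation; it
handles single-word tokens only, and the sample rules are made up.

```python
def parse_rules(lines):
    """Build a token -> replacement-tokens mapping from synonym rules."""
    mapping = {}
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        if "=>" in line:
            # Explicit mapping: tokens on the left are replaced by the right.
            left, right = line.split("=>", 1)
            sources = [t.strip() for t in left.split(",")]
            targets = [t.strip() for t in right.split(",")]
        else:
            # Equivalents: every token expands to the whole list.
            sources = targets = [t.strip() for t in line.split(",")]
        for source in sources:
            mapping.setdefault(source, [])
            for target in targets:
                if target not in mapping[source]:
                    mapping[source].append(target)
    return mapping


def expand_tokens(tokens, mapping):
    """Replace each token by its mapped tokens, or keep it if unmapped."""
    out = []
    for token in tokens:
        out.extend(mapping.get(token, [token]))
    return out


# Hypothetical rules for illustration.
rules = parse_rules(["software, program, package", "pc => computer"])
tokens = expand_tokens(["software", "pc", "search"], rules)
```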

-- 
Regards,
Shalin Shekhar Mangar.


AW: AW: Concept Expansion

2009-09-05 Thread Villemos, Gert
Paul,

Thanks for the answer. Documentation on the QParserPlugin concept seems to be
limited (well, at least my search didn't find any, and the Javadoc doesn't
provide much of an explanation).

Do I understand the concepts / your suggestion correctly?

- The QParserPlugin is a factory for the actual QParser, i.e. based on
the query string and other parameters a parser is instantiated and set up.
- As part of the construction, the plugin parses the q string and extracts the
parameters, adding them as TermQuery(s) to the parser.
- A 'concept expansion' extension could simply be a QParserPlugin
specialization which, as part of the 'createParser' method, expands the terms in
the q string, i.e. 'replaces' the input 'q=software' with 'q=software OR program
OR computer OR system OR package'.
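
To make the factory / parser split in the list above concrete, here is a small sketch in plain Python. It only mirrors the shape of the idea; it is not the Solr API. The class names, the create_parser method and the 'expand' request parameter are all hypothetical and exist only for this example.

```python
class ConceptExpansionParser:
    """Plays the role of the parser: holds the query string and produces the query."""

    def __init__(self, qstr, concepts, enabled=True):
        self.qstr = qstr
        self.concepts = concepts
        self.enabled = enabled

    def parse(self):
        if not self.enabled:
            return self.qstr
        parts = []
        for term in self.qstr.split():
            related = self.concepts.get(term, [])
            if related:
                parts.append("(" + " OR ".join([term] + related) + ")")
            else:
                parts.append(term)
        return " ".join(parts)


class ConceptExpansionPlugin:
    """Plays the role of the plugin: a factory that sets up one parser per request."""

    def __init__(self, concepts):
        self.concepts = concepts

    def create_parser(self, qstr, params):
        # 'expand' is a made-up request parameter used to switch expansion off.
        enabled = params.get("expand", "true") != "false"
        return ConceptExpansionParser(qstr, self.concepts, enabled)


plugin = ConceptExpansionPlugin({"software": ["program", "computer", "system", "package"]})
print(plugin.create_parser("software", {}).parse())
print(plugin.create_parser("software", {"expand": "false"}).parse())
```

Because the factory sees the request parameters, it is also a natural place to decide per request whether expansion happens at all.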
 
Cheers,
Gert.
 
 
 



From: Paul Libbrecht [mailto:p...@activemath.org]
Sent: Sat 05.09.2009 23:03
To: solr-user@lucene.apache.org
Subject: Re: AW: Concept Expansion



Gert,

we're doing a similar process on i2geo search, including simple
language expansion (one word is queried in several fields of each
language). Although I haven't done it yet (but will soon), it has been
suggested to me that I do it as a QParser plugin.

paul


On 05-Sep-09 at 22:47, Villemos, Gert wrote:

> [Sorry, post submitted as HTML. Proper format below;]
>
>
> We would like to support concept expansion in searches, i.e. when a 
> user searches for 'software' then the system should also search for 
> keywords / phrases such as program, computer , system, package and 
> class.
>
> I imagine that the right way of doing this is a request handler, 
> which expands a query into its conceptual similar entries and 
> aggregates the results. A simple change in the filter from;
>
> q:software => 
>
> would most likely do the job.
>
> Does such a request handler already exist (... looking at the list 
> on the wiki and in the javadocs the answer seems to be no, but maybe 
> its maintained externally)?
>
> And is this the right way to go at all?
>
> Thanks,
> Gert.
>
>





Please help Logica to respect the environment by not printing this email.



This e-mail and any attachment is for authorised use by the intended 
recipient(s) only. It may contain proprietary material, confidential 
information and/or be subject to legal privilege. It should not be copied, 
disclosed to, retained or used by, any other party. If you are not an intended 
recipient then please promptly delete this e-mail and any attachment and all 
copies and inform the sender. Thank you.



Re: AW: AW: Concept Expansion

2009-09-05 Thread Paul Libbrecht


On 05-Sep-09 at 23:26, Villemos, Gert wrote:

> - The QParserPlugin is a factory for the actual QParser, i.e.
> based on the query string and other parameters a parser is
> instantiated and set up.

right.

> - As part of the construction the plugin parses the q string and
> extracts the parameters, adding them as TermQuery(s) to the parser.

I think that's correct.

> - A 'concept expansion' extension could simply be a QParserPlugin
> specialization, which as part of the 'createParser' method expands
> the terms in the q string, i.e. 'replace' the input 'q=software'
> with 'q=software OR program OR computer OR system OR package'.

Exactly.
Having control over all the query classes is also a nice luxury,
e.g. you can build fine-grained queries without worrying about
escaping, instead of running a query parser once again down the chain.
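
A small sketch of that point, in plain Python rather than Lucene classes: if the expansion is kept as objects, the raw term text never has to be escaped, and escaping only matters if the query is turned back into a string for another parser. Term, AnyOf and expand are hypothetical names for this example, and SPECIAL is an assumed, partial list of query-syntax characters.

```python
from dataclasses import dataclass
from typing import List

# Assumed (partial) set of characters with special meaning in the query syntax.
SPECIAL = set('+-&|!(){}[]^"~*?:\\')


@dataclass
class Term:
    field: str
    text: str  # stored raw, exactly as the user or the concept table supplied it

    def to_string(self):
        # Escaping is needed only when serialising for a downstream parser.
        escaped = "".join("\\" + c if c in SPECIAL else c for c in self.text)
        return f"{self.field}:{escaped}"


@dataclass
class AnyOf:
    clauses: List[Term]

    def to_string(self):
        return "(" + " OR ".join(c.to_string() for c in self.clauses) + ")"


def expand(field, word, related):
    """Build the OR group as objects instead of concatenating query text."""
    return AnyOf([Term(field, w) for w in [word] + related])


query = expand("text", "c++", ["cpp"])
print(query.clauses[0].text)
print(query.to_string())
```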


paul



AW: Concept Expansion

2009-09-05 Thread Villemos, Gert
Well, this is very interesting.
 
Looking at the documentation provided in the link, it seems that the synonym
definitions must be in a file. We would define the concept expansions in
another format. My question is thus: is it possible to perform synonym
replacement based not on the file but on another mechanism?

I guess not. The answer would thus be to create a new TokenFilter and
corresponding factory, and implement them to access our format. Right?

Would there be a way to enable / disable the expansion filter at runtime,
for example through special parameters in the query string?
 
Cheers,
Gert.
 
 
 
 



From: Shalin Shekhar Mangar [mailto:shalinman...@gmail.com]
Sent: Sat 05.09.2009 23:23
To: solr-user@lucene.apache.org
Subject: Re: Concept Expansion



On Sun, Sep 6, 2009 at 2:17 AM, Villemos, Gert wrote:

>
> We would like to support concept expansion in searches, i.e. when a user
> searches for 'software' then the system should also search for keywords /
> phrases such as program, computer , system, package and class.
>
> I imagine that the right way of doing this is a request handler, which
> expands a query into its conceptual similar entries and aggregates the
> results.
>

Have you looked at SynonymFilterFactory?

http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#head-2c461ac74b4ddd82e453dc68fcfc92da77358d46

--
Regards,
Shalin Shekhar Mangar.







Re: AW: Concept Expansion

2009-09-05 Thread Koji Sekiguchi

Villemos, Gert wrote:

> Well, this is very interesting.
>
> Looking at the documentation provided in the link, it seems that the synonym
> definitions must be in a file. We would define the concept expansions in
> another format. My question is thus: is it possible to perform synonym
> replacement based not on the file but on another mechanism?
>
> I guess not. The answer would thus be to create a new TokenFilter and
> corresponding factory, and implement them to access our format. Right?

I've never tried, but I think you can implement a TokenFilterFactory that
accesses your format, creates a SynonymMap and passes it to SynonymFilter.
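
The flow described above (read your own format, build a map, hand it to the filter) can be sketched in plain Python as follows. This is not the Lucene/Solr API: the JSON layout is only an assumed stand-in for "your format", and load_concept_map and expand_tokens are hypothetical names for this example.

```python
import json


def load_concept_map(text):
    """Parse the (assumed) JSON format into a term -> related-terms map."""
    raw = json.loads(text)
    return {term.lower(): [r.lower() for r in related] for term, related in raw.items()}


def expand_tokens(tokens, concept_map):
    """Yield each token followed by its related terms, like an expanding synonym filter."""
    for token in tokens:
        yield token
        for related in concept_map.get(token.lower(), []):
            yield related


concepts = load_concept_map('{"Software": ["program", "package"]}')
print(list(expand_tokens(["free", "software"], concepts)))
```

Only the loading step depends on the custom format; the filtering step works on the map and does not care where it came from.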

 
> Would there be a way to enable / disable the expansion filter at runtime,
> for example through special parameters in the query string?

No. SynonymFilter works on the specific fields you define in schema.xml.
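
Since the filter is tied to a field, one possible workaround (not something proposed in this thread, so treat it as an assumption) is to index the text into two fields, one analyzed with the synonym filter and one without, and let the client pick the field per request. A minimal client-side sketch in Python, where the field names text_syn and text and the function name are hypothetical:

```python
from urllib.parse import urlencode


def build_query_string(user_query, expand):
    # Hypothetical field names: 'text_syn' has the synonym filter in its
    # analyzer chain, 'text' does not.
    field = "text_syn" if expand else "text"
    return urlencode({"q": f"{field}:{user_query}"})


print(build_query_string("software", expand=True))
print(build_query_string("software", expand=False))
```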

Koji




Scandinavia Apache Lucene/Solr September Meetup: 9 September

2009-09-05 Thread Erik Hatcher
Excuse the cross-posted announcement.  Next week we'll be having a  
Lucene/Solr meetup around the JavaZone conference in Oslo, Norway.


Before, during, AND after - now that's my kind of meetup!

Erik


Details here and below: 
http://www.meetup.com/Scandinavia-Apache-Lucene-Solr-Meetup/

September 9th, next Wednesday, 6pm
Radisson SAS Plaza Hotel
Sonja Henies Plass 3
Oslo

Presentations and discussions on Lucene/Solr, the Apache Open Source  
Search Engine/Platform, alongside JavaZone 2009 of Oslo, Scandinavia's  
biggest meeting place for software developers:


Agenda:


	• "Solr at the Speed of Light": Erik Hatcher, Lucene/Solr PMC Member
and Committer, co-author of Lucene In Action, Lucid Imagination
	• "Migrating from commercial search engines to Solr": Tobias Larsson
Hult and Eskil Andreen, Findwise SE
	• Presentations followed by Lightning Talks from community
members: talks are 7-10 minute presentations, electronic, demo, or on
whiteboard; sign-ups at the event.

We'll have beer and food and socializing before, during and after. 

Re: Field Collapsing (was Re: Schema for group/child entity setup)

2009-09-05 Thread R. Tan
Great. Nice site and very similar to my requirements.

> There's work on the patch that is being done now which will enable you to
> ask for specific field values of the collapsed documents using a dedicated
> request parameter.


So, right now, you get all field values by default?


On Sun, Sep 6, 2009 at 3:58 AM, Uri Boness wrote:

> You can check out http://www.ilocal.nl. If you search for a bank in
> Amsterdam then you'll see that a lot of the results are collapsed. For this
> we used an older version of this patch (which works on 1.3) but a lot has
> changed since then. We're currently using this patch on another project, but
> it's not live yet.
>
>
> Uri
>
> R. Tan wrote:
>
>> Thanks Uri. Your personal suggestion is appreciated and I think I'll follow
>> your advice. We're still early in development and 1.4 would be a good
>> choice. I hope I can get field collapsing to work with my requirements. Do
>> you know any live site using field collapsing already?
>>
>> On Sat, Sep 5, 2009 at 5:57 PM, Uri Boness wrote:
>>
>>> There's work on the patch that is being done now which will enable you to
>>> ask for specific field values of the collapsed documents using a dedicated
>>> request parameter. This work is not committed yet to the latest patch, but
>>> will be very soon. There is of course a drawback to that as well: the
>>> collapsed documents set can be very large (depends on your data of course),
>>> in which case the returned result, which includes the field values, can be
>>> rather large, which will impact performance. This is why this feature will
>>> be enabled only if you specify this extra parameter - by default no field
>>> values will be returned.
>>>
>>> AFAIK, the latest patch should work fine with the latest build. Martijn
>>> (who is the main maintainer of this patch) tries to keep it up to date
>>> with the latest builds. But I guess the safest way is to work with the
>>> nightly build of the same date as the latest patch (though I would give it
>>> a try first with the latest build).
>>>
>>> BTW, it's not an official suggestion from the Solr development team, but if
>>> you ask me, if you have to choose now whether to use 1.3 or 1.4-dev, I
>>> would go for the latter. 1.4 is supposed to be released in the upcoming
>>> week or two and it brings loads of bug fixes, enhancements and extra
>>> functionality. But again, this is my personal suggestion.
>>>
>>>
>>> cheers,
>>> Uri
>>>
>>> R. Tan wrote:
>>>
>>>
>>>
>>>> Okay. Thanks for giving an insight on how it works in general. Without
>>>> trying it myself, are the field values for the collapsed ones also part of
>>>> the results data?
>>>> What is the latest build that is safe to use on a production environment?
>>>> I'd probably go for that and use field collapsing.
>>>>
>>>> Thank you very much.
>>>>
>>>> On Fri, Sep 4, 2009 at 4:49 AM, Uri Boness wrote:
>>>>
>>>>> The collapsed documents are represented by one "master" document which
>>>>> can be part of the normal search result (the doc list), so pagination
>>>>> just works as expected, meaning taking only the returned documents into
>>>>> account (ignoring the collapsed ones). As for the scoring, the "master"
>>>>> document is actually the document with the highest score in the
>>>>> collapsed group.
>>>>>
>>>>> As for Solr 1.3 compatibility... well... it's very hard to tell. All the
>>>>> latest patches are certainly *not* 1.3 compatible (I think they also
>>>>> depend on some changes in Lucene which are not available for Solr 1.3).
>>>>> I guess you'll have to try some of the old patches, but I'm not sure
>>>>> about their stability.
>>>>>
>>>>> cheers,
>>>>> Uri
>>>>>
>>>>> R. Tan wrote:
>>>>>
>>>>>> Thanks Uri. How does paging and scoring work when using field
>>>>>> collapsing?
>>>>>> What patch works with 1.3? Is it production ready?
>>>>>>
>>>>>> R
>>>>>>
>>>>>> On Thu, Sep 3, 2009 at 3:54 PM, Uri Boness wrote:
>>>>>>
>>>>>>> The development on this patch is quite active. It works well for a
>>>>>>> single Solr instance, but distributed search (i.e. shards) is not yet
>>>>>>> supported. Using this patch you can group search results based on a
>>>>>>> specific field. There are two flavors of field collapsing - adjacent
>>>>>>> and non-adjacent. The former collapses only documents which happen to
>>>>>>> be located next to each other in the otherwise-non-collapsed results
>>>>>>> set. The latter (the non-adjacent) one collapses all documents with
>>>>>>> the same field value (regardless of their position in the
>>>>>>> otherwise-non-collapsed results set). Note that non-adjacent performs
>>>>>>> better than the adjacent one. There's currently discussion
>>>>>>> to extend this support so in addition to collapsi
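
The two collapsing flavors and the "master" document rule described in the quoted messages above can be illustrated with a small sketch in plain Python. This is not the patch's code; the function names, the result layout and the sample documents (with a made-up 'chain' field) are assumptions for the example only.

```python
from itertools import groupby


def collapse_adjacent(docs, field):
    """Collapse only runs of neighbouring docs that share the same field value."""
    result = []
    for _, run in groupby(docs, key=lambda d: d[field]):
        run = list(run)
        # The master is the highest-scoring document of the group.
        master = max(run, key=lambda d: d["score"])
        result.append({"master": master, "collapsed": len(run) - 1})
    return result


def collapse_non_adjacent(docs, field):
    """Collapse all docs sharing a field value, wherever they appear in the list."""
    groups = {}
    for doc in docs:
        groups.setdefault(doc[field], []).append(doc)
    # dicts keep insertion order, so groups appear in order of first occurrence.
    return [
        {"master": max(g, key=lambda d: d["score"]), "collapsed": len(g) - 1}
        for g in groups.values()
    ]


docs = [
    {"id": 1, "chain": "abn", "score": 0.9},
    {"id": 2, "chain": "abn", "score": 0.8},
    {"id": 3, "chain": "ing", "score": 0.7},
    {"id": 4, "chain": "abn", "score": 0.6},
]
print([g["master"]["id"] for g in collapse_adjacent(docs, "chain")])
print([g["master"]["id"] for g in collapse_non_adjacent(docs, "chain")])

# Pagination counts master documents only, e.g. the first page of two results:
page = collapse_non_adjacent(docs, "chain")[0:2]
```

With the sample data, document 4 survives adjacent collapsing because document 3 sits between it and the other 'abn' documents, while non-adjacent collapsing folds it into the first group.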

Re: Solr, JNDI config, dataDir, and solr home problem

2009-09-05 Thread Noble Paul നോബിള്‍ नोब्ळ्
I have raised an issue https://issues.apache.org/jira/browse/SOLR-1414

On Sun, Sep 6, 2009 at 12:32 AM, Archon810 wrote:
>
> Yeah, I'm using single core (solrconfig.xml).
>
>
>
> Noble Paul നോബിള്‍  नोब्ळ्-2 wrote:
>>
>> ideally you should be able to use the variable
>>
>> ${solr.core.instanceDir}
>>
>> but as I checked the code that is not being set for a single core
>> deployment. are you using single core?
>>
>> On Fri, Sep 4, 2009 at 9:16 PM, Archon810 wrote:
>>>
>>> OK, so I can't access it by ${solr.home}, but is there a way to access
>>> it?
>>> After all, it's a variable defined in JNDI, shouldn't there be a way to
>>> refer to it?
>>>
>>> Also, what about the INFO message that says it can't find /solr/home,
>>> while
>>> the instructions refer to solr/home ?
>>>
>>>
>>>
>>> Noble Paul നോബിള്‍  नोब्ळ्-2 wrote:

>>>> ${solr.home} is used for documentation purpose. It is not set as a
>>>> variable.
>>>>
>>>> On Fri, Sep 4, 2009 at 3:58 PM, Archon810 wrote:
>>>>>
>>>>> I saw it being used in the default solrconfig.xml in this phrase:
>>>>> If you wish to hide files under ${solr.home}/conf, explicitly register
>>>>> the ShowFileRequestHandler using...
>>>>>
>>>>> It was only natural to assume it would work for something as trivial as
>>>>> dataDir.
>>>>>
>>>>> So, there's no way to refer to the solr/home value defined in JNDI?
>>>>>
>>>>>
>>>>> Noble Paul നോബിള്‍  नोब्ळ्-2 wrote:
>>>>>>
>>>>>> it is nowhere mentioned that you can use a variable ${solr.home} in
>>>>>> your solrconfig.xml. There is a bug related to this issue
>>>>>> https://issues.apache.org/jira/browse/SOLR-1267
>>>>>>
>>>>>> On Fri, Sep 4, 2009 at 5:47 AM, Archon810 wrote:
>>>>>>>
>>>>>>> Here's my problem.
>>>>>>>
>>>>>>> I'm trying to follow a multi Solr setup, straight from the Solr wiki -
>>>>>>> http://wiki.apache.org/solr/SolrTomcat#head-024d7e11209030f1dbcac9974e55106abae837ac.
>>>>>>>
>>>>>>> Here's the relevant code:
>>>>>>> <Context ... crossContext="true" >
>>>>>>>   <Environment name="solr/home" ... value="/some/path/solr1home" override="true" />
>>>>>>> </Context>
>>>>>>>
>>>>>>> Now I want to set the Solr <dataDir> in solrconfig.xml, relative to
>>>>>>> the solr home property. The instructions
>>>>>>> http://wiki.apache.org/solr/SolrConfigXml#head-e8fbf2d748d90c5900aac712d0e3385ced5bd128
>>>>>>> say <dataDir> is used to specify an alternate directory to hold all
>>>>>>> index data other than the default ./data under the Solr home. If
>>>>>>> replication is in use, this should match the replication
>>>>>>> configuration. If this directory is not absolute, then it is relative
>>>>>>> to the current working directory of the servlet container.
>>>>>>>
>>>>>>> However, no matter how I try to set the dataDir property, solr home is
>>>>>>> not being found. For example,
>>>>>>> <dataDir>${solr.home}/data</dataDir>
>>>>>>>
>>>>>>> What's even more confusing are these INFO notices in the log:
>>>>>>> INFO: No /solr/home in JNDI
>>>>>>> Sep 3, 2009 4:33:26 PM org.apache.solr.core.SolrResourceLoader locateSolrHome
>>>>>>> INFO: solr home defaulted to 'solr/' (could not find system property or JNDI)
>>>>>>>
>>>>>>> The JNDI instructions instruct to specify "solr/home", the log
>>>>>>> complains about "/solr/home" (extra slash), the solrconfig.xml file
>>>>>>> seems to expect ${solr.home} - how more confusing can it get?
>>>>>>>
>>>>>>> This person is having the same issue:
>>>>>>> http://mysolr.com/tips/setting-solr-home-solrhome-in-jndi-on-tomcat-55/
>>>>>>>
>>>>>>> So, how does one refer to solr home from solrconfig.xml in a JNDI
>>>>>>> configuration scenario? Also, is there a way to debug/see variables
>>>>>>> that are defined in a specific context, such as solrconfig.xml? I feel
>>>>>>> like I'm completely blind here.
>>>>>>>
>>>>>>> Thank you!
>>>>>>> --
>>>>>>> View this message in context:
>>>>>>> http://www.nabble.com/Solr%2C-JNDI-config%2C-dataDir%2C-and-solr-home-problem-tp25286277p25286277.html
>>>>>>> Sent from the Solr - User mailing list archive at Nabble.com.
>>>>>>>
>>>>>>
>>>>>> --
>>>>>> -
>>>>>> Noble Paul | Principal Engineer| AOL | http://aol.com
>>>>>>
>>>>>
>>>>> --
>>>>> View this message in context:
>>>>> http://www.nabble.com/Solr%2C-JNDI-config%2C-dataDir%2C-and-solr-home-problem-tp25286277p25292025.html
>>>>> Sent from the Solr - User mailing list archive at Nabble.com.
>>>>>
>>>>
>>>> --
>>>> -
>>>> Noble Paul | Principal Engineer| AOL | http://aol.com


>>>
>>> --
>>> View this message in context:
>>> http://www.nabble.com/Solr%2C-JNDI-config%2C-dataDir%2C-and-solr-home-problem-tp25286277p25296862.html
>>> Sent from the Solr - User mailing