Re: Your valuable suggestion on autocomplete
Hi Rantjil Bould,

I would suggest you give some thought to the Trie data structure, which is commonly used for auto-complete. Hitting Solr for every prefix looks like a time-consuming job, but I might be wrong. I have a Trie implementation and it works very fast (of course it is an in-memory data structure, unlike the Solr index, which lives on disk).

--Thanks and Regards
Vaijanath

Rantjil Bould wrote:
> Hi Group,
> I have already got some valuable suggestions from the group. Based on that, I have come up with the following process to finally implement an autocomplete-like feature in my system:
> 1- Index the whole documents
> 2- Extract all terms using IndexReader's terms() method
>
> I am getting terms like vl, vla, vlan, vlana, vlanan, vlanand. But I would like to get absolute terms, i.e. vlanand. The field definition in solr is [...]. Would appreciate your input on how to get absolute terms?
>
> 3- For each term, extract the documents containing that term using the termDocs() method
> 4- Create one more index with the fields term, frequency and docNo. This index would be used for the autocomplete feature.
> 5- For any letter typed by the user in the search field, use an Ajax script (like Scriptaculous or jQuery) to extract all terms using a prefix query.
> 6- Based on the search term selected by the user, keep track of the document nos in which this term appears.
> 7- For the next search term selection, use the document nos to select all terms excluding the currently selected term.
>
> This somehow works. As someone new to Solr and also to Lucene, I would like to know whether it can be improved?
>
> - RB
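For readers who haven't built one, here is a minimal in-memory trie sketch in Java. It is illustrative only, not Vaijanath's actual implementation: insert every indexed term once, then answer each keystroke from memory instead of hitting Solr per prefix.

    import java.util.ArrayList;
    import java.util.List;
    import java.util.Map;
    import java.util.TreeMap;

    // Minimal trie: insert complete terms, then list all terms under a prefix.
    public class Trie {
        private final TreeMap<Character, Trie> children = new TreeMap<Character, Trie>();
        private boolean isTerm;

        public void insert(String term) {
            Trie node = this;
            for (int i = 0; i < term.length(); i++) {
                Trie child = node.children.get(term.charAt(i));
                if (child == null) {
                    child = new Trie();
                    node.children.put(term.charAt(i), child);
                }
                node = child;
            }
            node.isTerm = true;
        }

        public List<String> complete(String prefix) {
            Trie node = this;
            for (int i = 0; i < prefix.length(); i++) {
                node = node.children.get(prefix.charAt(i));
                if (node == null) return new ArrayList<String>();  // nothing under this prefix
            }
            List<String> out = new ArrayList<String>();
            node.collect(prefix, out);
            return out;
        }

        private void collect(String path, List<String> out) {
            if (isTerm) out.add(path);
            // TreeMap iteration keeps the suggestions in sorted order
            for (Map.Entry<Character, Trie> e : children.entrySet()) {
                e.getValue().collect(path + e.getKey(), out);
            }
        }
    }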
Re: Your valuable suggestion on autocomplete
Just FYI, we have also implemented a Trie approach (outside of Solr, even though our mail search uses Solr) at the link in the signature. You can try the auto-completion in the comparison tool on the home page.

- nishant
www.reviewgist.com

----- Original Message -----
From: Vaijanath N. Rao <[EMAIL PROTECTED]>
To: solr-user@lucene.apache.org
Sent: Tuesday, May 6, 2008 12:43:25 PM
Subject: Re: Your valuable suggestion on autocomplete

> I would suggest you give some thought to the Trie data structure, which is commonly used for auto-complete. Hitting Solr for every prefix looks like a time-consuming job, but I might be wrong. [...]
RE: Deletes increase while adding new documents
Hi all,

it seems that we get errors during the auto-commit:

  java.io.FileNotFoundException: /opt/solr/upload/nl/archive/data/index/_4x.fnm (No such file or directory)
          at java.io.RandomAccessFile.open(Native Method)
          at java.io.RandomAccessFile.<init>(RandomAccessFile.java:212)
          at org.apache.lucene.store.FSDirectory$FSIndexInput$Descriptor.<init>(FSDirectory.java:501)
          at org.apache.lucene.store.FSDirectory$FSIndexInput.<init>(FSDirectory.java:526)

The _4x.fnm file is not on the file system. When we switch from autocommit to manual commits through XML messages we get the same kind of errors. Any idea what could be wrong in our configuration to cause these exceptions?

Greetings,
Tim

From: Tim Mahy [EMAIL PROTECTED]
Sent: Monday 28 April 2008 12:11
To: solr-user@lucene.apache.org
Subject: RE: Deletes increase while adding new documents

Hi all,

thank you for your reply. The ids that we send are unique, so we still have no clue what is happening :)

greetings,
Tim

-----Original Message-----
From: Mike Klaas [mailto:[EMAIL PROTECTED]
Sent: Sat 26-4-2008 1:52
To: solr-user@lucene.apache.org
Subject: Re: Deletes increase while adding new documents

On 25-Apr-08, at 4:27 AM, Tim Mahy wrote:

> Hi all,
>
> we send XML add-document messages to Solr and we notice something very strange. We autocommit at 10 documents, starting from a totally clean index (we removed the data folder). When we start uploading, we notice that docsPending goes up but also that deletesPending goes up very fast. After reaching the first 10 we queried Solr to return everything, and the total result count was not 10 but somewhere around 77000, which is exactly 10 - docsDeleted from the stats page.
>
> We used that Solr instance before, so my question is: is it possible that Solr remembers the unique identities somewhere else than in the data folder? Btw, we stopped Solr, removed the data folder and restarted Solr, and then this behavior began...

Are you sure that all the documents you added were unique? (Btw, deletesPending doesn't necessarily mean that an old version of the doc was in the index, I think.)

-Mike
Re: Help optimizing
On May 3, 2008, at 1:06 PM, Daniel Andersson wrote:

> Hi (again) people
>
> We've now invested in a server with 8 GB of RAM after too many OutOfMemory errors.
>
> Our database/index is 3.5 GB and contains 4,352,471 documents. Most documents are less than 1 kB. When performing a search, the results vary between 1.5 seconds and 60 seconds.
>
> I don't have a big problem with 1.5 seconds (even though below 1 would be nice), but 60 seconds is just... well, scary.

Is this pure Solr time or overall application time? I ask because it is often the case that people are measuring application time and the problem lies in the application, so I just want to clarify.

Also, have you done any profiling to see where the hotspots are?

-Grant
[poll] Change logging to SLF4J?
Hello-

There has been a long-running thread on solr-dev proposing switching the logging system to something other than JDK logging.

http://www.nabble.com/Solr-Logging-td16836646.html
http://www.nabble.com/logging-through-log4j-td13747253.html

We are considering using http://www.slf4j.org/. Check:
https://issues.apache.org/jira/browse/SOLR-560

The "pro" argument is that:
* SLF4J allows more flexibility for people using Solr outside the canned .war to configure logging without touching JDK logging.

The "con" arguments go something like:
* JDK logging is already the standard logging framework.
* JDK logging is already in use.
* SLF4J adds another dependency (for something that already works).

On the dev list there are strong opinions on either side, but we would like to get a larger sampling of opinion and validation before making this change.

[ ] Keep Solr logging as it is. (JDK Logging)
[ ] Use SLF4J.

As a bonus question (this time fill in the blank): I have tried SOLR-560 with my logging system and ___.

thanks
ryan
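For anyone who hasn't used the facade, a minimal sketch of what calling code looks like against SLF4J; the binding jar chosen at deploy time (JDK logging, log4j, logback, ...) decides where the output actually goes. Class and method names here are illustrative.

    import org.slf4j.Logger;
    import org.slf4j.LoggerFactory;

    public class ExampleComponent {
        // Code compiles against the facade only; no commitment to a backend.
        private static final Logger log = LoggerFactory.getLogger(ExampleComponent.class);

        public void commitFinished(int docs, long millis) {
            // parameterized messages skip string concatenation when the level is off
            log.info("committed {} documents in {} ms", docs, millis);
        }
    }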
Re: Your valuable suggestion on autocomplete
I wrote a prefix map (ternary search tree) in Java and load it with queries to Solr every two hours. That keeps the autocomplete and the search index in sync.

Our autocomplete gets over 25M hits per day, so we don't really want to send all that traffic to Solr.

wunder

On 5/6/08 2:37 AM, "Nishant Soni" <[EMAIL PROTECTED]> wrote:

> Just FYI, we have also implemented a Trie approach (outside of solr, even though our mail search uses solr) at the link in the signature. [...]
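A sketch of the rebuild-and-swap pattern Walter describes, reusing the Trie class sketched earlier in this thread. fetchTerms() is a hypothetical placeholder for the part that queries Solr over HTTP and collects the completable values; the real plumbing is left out.

    import java.util.Collections;
    import java.util.List;
    import java.util.concurrent.Executors;
    import java.util.concurrent.ScheduledExecutorService;
    import java.util.concurrent.TimeUnit;

    public class AutocompleteRefresher {
        // volatile so lookups always see either the old complete tree or the new one
        private volatile Trie current = new Trie();

        public void start() {
            ScheduledExecutorService timer = Executors.newSingleThreadScheduledExecutor();
            timer.scheduleAtFixedRate(new Runnable() {
                public void run() {
                    Trie fresh = new Trie();
                    for (String term : fetchTerms()) {
                        fresh.insert(term.toLowerCase());
                    }
                    current = fresh;  // atomic swap: autocomplete now matches the index
                }
            }, 0, 2, TimeUnit.HOURS);
        }

        public List<String> complete(String prefix) {
            return current.complete(prefix);
        }

        // Hypothetical: page through a Solr query and collect the field
        // being completed (wunder's later message mentions q=type:movie&fl=title).
        private List<String> fetchTerms() {
            return Collections.emptyList();
        }
    }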
Re: multi-language searching with Solr
Peter,

Thanks for your help, I will prototype your solution and see if it makes sense for me.

Eli

On Mon, May 5, 2008 at 5:38 PM, Binkley, Peter <[EMAIL PROTECTED]> wrote:
> It won't make much difference to the index size, since you'll only be populating one of the language fields for each document, and empty fields cost nothing. The performance may suffer a bit, but Lucene may surprise you with how good it is with that kind of boolean query.
>
> I agree that as the number of fields and languages increases, this is going to become a lot to manage. But you're up against some basic problems when you try to model this in Solr: for each token, you care about not just its value (which is all Lucene cares about) but also its language and its stem; and the stem for a given token depends on the language (different stemming rules); and at query time you may not know the language. I don't think you're going to get a solution without some redundancy; but solving problems by adding redundant fields is a common method in Solr.
>
> Peter
>
> -----Original Message-----
> From: Eli K [mailto:[EMAIL PROTECTED]
> Sent: Monday, May 05, 2008 2:28 PM
> To: solr-user@lucene.apache.org
> Subject: Re: multi-language searching with Solr
>
> Wouldn't this impact both indexing and search performance and the size of the index? It is also probable that I will have more than one free-text field later on, and with at least 20 languages this approach does not seem very manageable. Are there other options for making this work with stemming?
>
> Thanks,
> Eli
>
> On Mon, May 5, 2008 at 3:41 PM, Binkley, Peter <[EMAIL PROTECTED]> wrote:
> > I think you would have to declare a separate field for each language (freetext_en, freetext_fr, etc.), each with its own appropriate stemming. Your ingestion process would have to assign the free-text content for each document to the appropriate field; so, for each document, only one of the freetext fields would be populated. At search time, you would either search against the appropriate field if you know the search language, or search across them with "freetext_fr:query OR freetext_en:query OR ...". That way your query will be interpreted by each language field using that language's stemming rules.
> >
> > Other options for combining indexes, such as copyField or dynamic fields (see http://wiki.apache.org/solr/SchemaXml), would lead to a single field type and therefore a single type of stemming. You could always use copyField to create an unstemmed common index, if you don't care about stemming when you search across languages (since you're likely to get odd results when a query in one language is stemmed according to the rules of another language).
> >
> > Peter
> >
> > -----Original Message-----
> > From: Eli K [mailto:[EMAIL PROTECTED]
> > Sent: Monday, May 05, 2008 8:27 AM
> > To: solr-user@lucene.apache.org
> > Subject: multi-language searching with Solr
> >
> > Hello folks,
> >
> > Let me start by saying that I am new to Lucene and Solr.
> >
> > I am in the process of designing a search back-end for a system that receives 20k documents a day and needs to keep them available for 30 days. The documents should be searchable on a free-text field and on about 8 other fields.
> >
> > One of my requirements is to index and search documents in multiple languages. I would like to have the ability to stem and provide the advanced search features that are based on it. This will only affect the free-text field, because the rest of the fields are in English.
> >
> > I can find out the language of the document before indexing, and I might be able to provide the language to search on. I also need to have the ability to search across all indexed languages (there will be 20 in total).
> >
> > Given these requirements, do you think this is doable with Solr? A major limiting factor is that I need to stick to the 1.2 GA version and I cannot utilize the multi-core features in the 1.3 trunk.
> >
> > I considered writing my own analyzer that will call the appropriate Lucene analyzer for the given language, but I did not see any way for it to access the field that specifies the language of the document.
> >
> > Thanks,
> >
> > Eli
> >
> > p.s. I am looking for an experienced Lucene/Solr consultant to help with the design of this system.
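To make Peter's suggestion concrete, a sketch of what the per-language schema.xml entries could look like. Field and type names are illustrative; each language-specific type wires in its own stemmer (the Snowball filter shown here ships with Solr 1.2).

    <!-- one analyzed type per language, each wired to its own stemmer -->
    <fieldType name="text_en" class="solr.TextField">
      <analyzer>
        <tokenizer class="solr.WhitespaceTokenizerFactory"/>
        <filter class="solr.LowerCaseFilterFactory"/>
        <filter class="solr.SnowballPorterFilterFactory" language="English"/>
      </analyzer>
    </fieldType>
    <fieldType name="text_fr" class="solr.TextField">
      <analyzer>
        <tokenizer class="solr.WhitespaceTokenizerFactory"/>
        <filter class="solr.LowerCaseFilterFactory"/>
        <filter class="solr.SnowballPorterFilterFactory" language="French"/>
      </analyzer>
    </fieldType>

    <!-- only the field matching the document's detected language gets populated -->
    <field name="freetext_en" type="text_en" indexed="true" stored="true"/>
    <field name="freetext_fr" type="text_fr" indexed="true" stored="true"/>

A cross-language search then becomes freetext_en:(query) OR freetext_fr:(query) OR ..., each clause stemmed by its own field's rules.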
Re: Your valuable suggestion on autocomplete
Hi Wunder,

----- Original Message -----
> From: Walter Underwood <[EMAIL PROTECTED]>
> To: solr-user@lucene.apache.org
> Sent: Tuesday, May 6, 2008 11:21:31 AM
> Subject: Re: Your valuable suggestion on autocomplete
>
> I wrote a prefix map (ternary search tree) in Java and load it with queries to Solr every two hours. That keeps the autocomplete and search index in sync.

What do you mean by the two staying in sync? If you fill the TST with info from query logs, how does that make it stay in sync with the index? Or do you mean you look for queries with >N hits (maybe even N=1) and only feed those into the TST, thus ensuring autocomplete always suggests queries that yield hits?

Thanks,
Otis

> Our autocomplete gets over 25M hits per day, so we don't really want to send all that traffic to Solr.
>
> wunder
Welcome, Koji
A warm welcome to our newest Solr committer, Koji Sekiguchi! He's been providing solid patches and improvements to Solr and the Ruby (solr-ruby/Flare) integration for a while now. Erik
RE: Help optimizing
One cause of out-of-memory errors is multiple simultaneous requests. If you limit the query stream to one or two simultaneous requests, you might fix this. No, Solr does not have an option for this. The servlet containers have controls for this, but you have to dig very deep to find them.

Lance Norskog

-----Original Message-----
From: Grant Ingersoll [mailto:[EMAIL PROTECTED]
Sent: Tuesday, May 06, 2008 5:19 AM
To: solr-user@lucene.apache.org
Subject: Re: Help optimizing

> Is this pure Solr time or overall application time? I ask because it is often the case that people are measuring application time and the problem lies in the application, so I just want to clarify.
>
> Also, have you done any profiling to see where the hotspots are?
>
> -Grant
Re: Multiple SpellCheckRequestHandlers
And how do I specify in the query which request handler to use?

Otis Gospodnetic wrote:
> Yes, just define two instances (with two distinct names) in solrconfig.xml and point each of them to a different index.
>
> Otis
> --
> Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch
>
> ----- Original Message -----
>> From: solr_user <[EMAIL PROTECTED]>
>> To: solr-user@lucene.apache.org
>> Sent: Tuesday, May 6, 2008 12:16:07 AM
>> Subject: Multiple SpellCheckRequestHandlers
>>
>> Hi all,
>>
>> Is it possible in Solr to have multiple SpellCheckRequestHandlers? In my application I have got two different spell check indexes. I want the spell checker to check for a spelling suggestion in the first index, and only if it fails to get any suggestion from the first index should it try to get a suggestion from the second index.
>>
>> Is it possible to have a separate SpellCheckRequestHandler, one for each index?
>>
>> Solr-User
Re: Deletes increase while adding new documents
On 6-May-08, at 4:56 AM, Tim Mahy wrote:

> it seems that we get errors during the auto-commit:
>
>   java.io.FileNotFoundException: /opt/solr/upload/nl/archive/data/index/_4x.fnm (No such file or directory)
>
> The _4x.fnm file is not on the file system. When we switch from autocommit to manual commits through XML messages we get the same kind of errors. Any idea what could be wrong in our configuration to cause these exceptions?

I have only heard of that error appearing in two cases: either the index is corrupt, or something else deleted the file. Are you sure that there is only one Solr instance that accesses the directory, and that nothing else ever touches it?

Can you reproduce the deletion issue with a small number of documents (something that could be tested by one of us)?

-Mike
RE: multi-language searching with Solr
Hi,

you could also use multiple Solr instances, each with language-specific settings (stopwords etc.) for the same field, upload your documents to the correct instance, and then merge the indexes into one searchable index...

greetings,
Tim

From: Eli K [EMAIL PROTECTED]
Sent: Tuesday 6 May 2008 18:26
To: solr-user@lucene.apache.org
Subject: Re: multi-language searching with Solr

> Peter,
>
> Thanks for your help, I will prototype your solution and see if it makes sense for me.
>
> Eli
[...]
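The merge step Tim mentions can be done with stock Lucene. A sketch against the Lucene 2.x API, with placeholder paths; note that all source indexes must share a compatible schema for the merged index to search sensibly.

    import org.apache.lucene.analysis.standard.StandardAnalyzer;
    import org.apache.lucene.index.IndexWriter;
    import org.apache.lucene.store.Directory;
    import org.apache.lucene.store.FSDirectory;

    public class MergeLanguageIndexes {
        public static void main(String[] args) throws Exception {
            // Destination index that the single search instance will serve;
            // 'true' creates it from scratch.
            IndexWriter writer = new IndexWriter(
                    FSDirectory.getDirectory("/data/merged/index"),
                    new StandardAnalyzer(), true);
            // One source per language-specific Solr instance.
            Directory[] sources = {
                    FSDirectory.getDirectory("/data/solr-en/data/index"),
                    FSDirectory.getDirectory("/data/solr-fr/data/index") };
            writer.addIndexes(sources);  // copies all segments into the new index
            writer.optimize();
            writer.close();
        }
    }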
Composition of multiple smaller fields into another larger field?
I am interested in using the suggest feature against a composition of other, more granular facets. Let me provide an example to help explain my problem and proposed approaches.

Say I have a set of facets for these artifacts: [...]

So far things work OK. Now I want my suggest feature to work on a composition equivalent to:

    {city}, {state} {zipcode}

I have these fields defined per the suggestions on adding suggest capabilities; I'm experimenting, so I am trying both options: [...]

I would like to 'compose' the value for these two suggest fields based on the existing 'atomic' fields. The copyField feature doesn't get me the whole way there, but I am interested in a similar mechanism.

1) Is there an existing feature, approach, mechanism, ... to get this done that I'm just not aware of?

2) Assuming that #1 is 'no', would this be a generally useful feature to add in? If so, how would people like this to be done?

Obviously I can push this down into the document preparation myself, outside of Solr. I would prefer to have a mechanism to handle this in schema.xml, since I don't want to do any real manipulation/transformation of the data elements at this point.

Here was an initial thought on what it might look like: [...] Here the source is formatted similar to java.text.MessageFormat, but with named rather than indexed substitutions. [...] Here the source is formatted similar to Velocity templates.

I am not interested in creating a new template language or pulling in a new dependency (velocity, freemarker, ...) to get this done, per se. I just want to do some simple composition. If folks think this is a good idea though, it could be set up like this instead: [...]

template_filename.vm file contains the following line:

    $city, $state $zipcode

Any feedback would be appreciated.

Thanks,
Brian
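A purely hypothetical rendering of what such declarations might look like; composeField is the proposal here, not an existing Solr feature, and the attribute names are guesses.

    <!-- hypothetical syntax: compose one suggest field from atomic fields -->
    <composeField dest="suggest_address" source="{city}, {state} {zipcode}"/>

    <!-- or the template-file flavor; address.vm would contain: $city, $state $zipcode -->
    <composeField dest="suggest_address" template="address.vm"/>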
Re: Multiple SpellCheckRequestHandlers
Hello,

If you configured "/sc1" and "/sc2", then use something like http://../sc1?. for the first one and http://./sc2? for the second one.

Otis
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch

----- Original Message -----
> From: solr_user <[EMAIL PROTECTED]>
> To: solr-user@lucene.apache.org
> Sent: Tuesday, May 6, 2008 1:57:17 PM
> Subject: Re: Multiple SpellCheckRequestHandlers
>
> And how do I specify in the query which request handler to use?
Re: Help optimizing
Hello,

If you are using Jetty, you don't have to dig very deep - just look for the section about threads. Here is a snippet from Jetty 6.1.9's jetty.xml:

    <Set name="minThreads">10</Set>
    <Set name="maxThreads">50</Set>
    <Set name="lowThreads">25</Set>

Otis
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch

----- Original Message -----
> From: Lance Norskog <[EMAIL PROTECTED]>
> To: solr-user@lucene.apache.org
> Sent: Tuesday, May 6, 2008 1:26:28 PM
> Subject: RE: Help optimizing
>
> One cause of out-of-memory errors is multiple simultaneous requests. If you limit the query stream to one or two simultaneous requests, you might fix this. No, Solr does not have an option for this. The servlet containers have controls for this, but you have to dig very deep to find them.
>
> Lance Norskog
Re: Multiple SpellCheckRequestHandlers
Thanks Otis,

Actually, I am planning to make use of the qt parameter to specify which handler should be used for the query. Would there be any downside to that?

Otis Gospodnetic wrote:
> Hello,
>
> If you configured "/sc1" and "/sc2", then use something like http://../sc1?. for the first one and http://./sc2? for the second one.
>
> Otis
> --
> Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch
Re: Multiple SpellCheckRequestHandlers
I don't think so. I just prefer shorter (cleaner?) URLs.

Otis
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch

----- Original Message -----
> From: solr_user <[EMAIL PROTECTED]>
> To: solr-user@lucene.apache.org
> Sent: Tuesday, May 6, 2008 3:35:43 PM
> Subject: Re: Multiple SpellCheckRequestHandlers
>
> Actually, I am planning to make use of the qt parameter to specify which handler should be used for the query. Would there be any downside to that?
Re: Help optimizing
Thanks Otis!

On May 4, 2008, at 4:32 AM, Otis Gospodnetic wrote:

> You have a lot of fields of type text, but a number of fields sound like they really need not be tokenized and should thus be of type string.

I've changed quite a few of them over to string. Still not sure about the difference between 'string' and 'text' :-/

> Do you really need 6 warming searchers?

That I have no idea about. Currently it's a very small site, well, visitor-wise anyway.

> I think the "date" type is pretty granular. Do you really need that type of precision?

Probably not; I have changed it to sint and will index the date in the format 20070310, which should do the trick.

> I don't have a shell handy here to check, but is that 'M' in -Xmx... recognized, or should it be lowercase 'm'?

"Append the letter k or K to indicate kilobytes or the letter m or M to indicate megabytes", so yeah, it should be recognized.

> Have you noticed anything weird while looking at the Solr Java process with jConsole?

I'm not very familiar with Java, so no idea what jConsole is :-/

Will be re-indexing tomorrow with the date->sint and text->string changes, will report back after it's done.

Cheers,
Daniel
Re: Help optimizing
On May 6, 2008, at 4:00 AM, Mike Klaas wrote:

> On 3-May-08, at 10:06 AM, Daniel Andersson wrote:
>
>> How do I optimize Solr to better use all the RAM? I'm using java6, 64-bit version, and start Solr using:
>> java -Xmx7500M -Xms4096M -jar start.jar
>> But according to top it only seems to be using 7.7% of the memory (around 600 MB).
>
> Don't try to give Solr _all_ the memory on the system. Solr depends on the index existing in the OS's disk cache (this is "cached" in top). You should have at least 2 GB of memory free for a 3.5 GB index, depending on how much of the index is stored (best is of course to have 3.5 GB available so it can be cached completely).
>
> Solr will require a wide distribution of queries to "warm up" (get the index into the OS disk cache). This automatically prioritizes the "hot spots" in the index. If you want to load the whole thing, 'cd datadir; cat * > /dev/null' works, but I don't recommend relying on that.

Ah. Have given it 4 GB of RAM now (Xmx=4 GB, Xms=2 GB).

> How many documents match, typically? How many documents are returned, typically? How often do you commit() [I suspect frequently, based on the problems you are having]?

Average documents matched/found: 6427. Only return 10 documents per page.

Commit every 10,000 documents. Tried it at 100,000 with 2 GB of RAM (1 GB dedicated to Solr) and it just gave me OutOfMemory every time. Haven't tried increasing it since moving to this new server.

Cheers,
Daniel
Re: Help optimizing
On May 6, 2008, at 2:19 PM, Grant Ingersoll wrote:

> On May 3, 2008, at 1:06 PM, Daniel Andersson wrote:
>
>> When performing a search, the results vary between 1.5 seconds and 60 seconds.
>
> Is this pure Solr time or overall application time? I ask, b/c it is often the case that people are measuring application time and the problem lies in the application, so I just want to clarify.

It's 1.5 seconds to send the command to Solr, wait for it to search, and get the data back.

The web server is located in the US and the Solr machine is in Sweden (don't ask), so I can see it taking a while to send data back and forth; getting the searches below 1.5s is not something I'm expecting. I "just" want to get away from the >5s searches.

Is there a way of getting Solr to output the total time spent on any command? Just so I can eliminate some odd network problem/error.

> Also, have you done any profiling to see where the hotspots are?

I have not. Not a Java person, so not sure how to do this. Is there something in the Solr admin that will allow me to do this? I have looked around and read what I could find in the wiki, but didn't find anything that looked like profiling.

Cheers,
Daniel
Re: Help optimizing
On May 6, 2008, at 7:26 PM, Lance Norskog wrote:

> One cause of out-of-memory errors is multiple simultaneous requests. If you limit the query stream to one or two simultaneous requests, you might fix this. No, Solr does not have an option for this. The servlet containers have controls for this, but you have to dig very deep to find them.

Unfortunately the website is still very small in terms of visitors. We were running MySQL, Apache and Solr on the same machine, which only had 2 GB of RAM, so it's understandable if Solr throws an error or two at me.

Cheers,
Daniel
Searching for empty fields
Hi (again)

One of the fields in my database is color. It can either contain a value (blue, red, etc.) or be blank.

When I perform a search with facet counts on, I get a count for "_empty_". How do I go about searching for this? I've tried color:"" which gives me an error. Same with color:. And color:_empty_ returns nothing at all.

Thanks in advance!
/ d
Re: multi-language searching with Solr
On 5-May-08, at 1:28 PM, Eli K wrote:

> Wouldn't this impact both indexing and search performance and the size of the index? It is also probable that I will have more than one free-text field later on, and with at least 20 languages this approach does not seem very manageable. Are there other options for making this work with stemming?

If you want stemming, then you have to execute one query per language anyway, since the stemming will be different in every language. This is a fundamental requirement: you somehow need to track the language of every token if you want correct multi-language stemming.

The easiest way to do this would be to split each language into its own field. But there are other options: you could prefix every indexed token with the language:

    en:The en:quick en:brown en:fox en:jumped ...
    fr:Le fr:brun fr:renard fr:vite fr:a fr:sauté ...

Separate fields seems easier to me, though.

-Mike
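A sketch of that prefixing idea as a Lucene 2.x TokenFilter; illustrative, not an existing Solr filter. It would sit after the per-language stemmer in the analysis chain, and the same prefix has to be applied to query terms once the query language is known.

    import java.io.IOException;
    import org.apache.lucene.analysis.Token;
    import org.apache.lucene.analysis.TokenFilter;
    import org.apache.lucene.analysis.TokenStream;

    // Prefixes every token with a language tag, e.g. "renard" -> "fr:renard",
    // so one field can hold stemmed tokens from many languages without collisions.
    public class LanguagePrefixFilter extends TokenFilter {
        private final String prefix;

        public LanguagePrefixFilter(TokenStream input, String lang) {
            super(input);
            this.prefix = lang + ":";
        }

        public Token next() throws IOException {
            Token t = input.next();
            if (t == null) return null;  // end of stream
            return new Token(prefix + t.termText(), t.startOffset(), t.endOffset());
        }
    }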
Re: Welcome, Koji
Hi Erik and everyone!

I'm looking forward to working with you. :)

Cheers,
Koji

Erik Hatcher wrote:
> A warm welcome to our newest Solr committer, Koji Sekiguchi! He's been providing solid patches and improvements to Solr and the Ruby (solr-ruby/Flare) integration for a while now.
>
> Erik
Solr (text) <> RDBMS (dynamic data) - best practices?
We're investigating migrating from an RDBMS to Solr to add text-search support, as well as to offload the text storage from our RDBMS (which is arguably not designed for this kind of stuff). While whiteboarding the basic requirements, we realized that we have some 'special' requirements.

Basic setup:
- A subset of data is immutable and is perfectly suited to be stored in Solr
- A subset of data is dynamic and changes frequently (should still be stored in an RDBMS)

Questions:
1) We need access to dynamic data stored in our RDBMS to perform filtering
2) When Solr returns its result set, we need to augment the results with meta-data from our RDBMS

For (1), based on my research we're in fairly standard territory: implement a custom Filter, or a ChainedFilter, to return a bitmask based on an RDBMS query. However, can this step be somehow coupled with (2), where the data we retrieved in (1) is also appended to the result set?

With proper caching policies, I don't think this implementation will be all that painful. (Sanity check?)

So having said that, are there any features or mechanisms in Solr/Lucene that you would recommend, or any best practices we should be aware of, to help us with the migration?

Appreciate the help.

ig
complex queries
I don't think this is possible, but I figured that I would ask.

So, I want to find documents that match a search term and where a field in those documents is also in the results of a subquery. Basically, I am looking for the Solr equivalent of a SQL IN clause.

As I said, I don't think it is possible, and I would be highly surprised if it was.
Re: complex queries
On May 6, 2008, at 8:57 PM, Kevin Osborn wrote:

> I don't think this is possible, but I figured that I would ask.
>
> So, I want to find documents that match a search term and where a field in those documents is also in the results of a subquery. Basically, I am looking for the Solr equivalent of a SQL IN clause.

    "search clause" AND field:(value1 OR value2 OR value3)

Does that do the trick for you? If not, could you elaborate with an example?

Erik
Re: Searching for empty fields
Hi,

Not sure if this is what you want, but to search for 'empty' fields we use something like this:

    (*:* AND -color:[* TO *])

Hope that helps.

Brendan

On May 6, 2008, at 6:43 PM, Daniel Andersson wrote:

> One of the fields in my database is color. It can either contain a value (blue, red, etc.) or be blank. When I perform a search with facet counts on, I get a count for "_empty_". How do I go about searching for this?
RE: Help optimizing
There are two integer types, 'sint' and 'integer'. On a plain 'integer' you cannot do a range check (that makes sense). But! For sorting, Lucene builds an array with one slot per document. On an 'integer' field that is a plain int array; on any other kind of field each slot holds a lot more. So, if you want fast sorts with a small memory footprint, you want 'integer' = 20070310, not 'sint' = 20070310. We did exactly this for exactly this reason.

-----Original Message-----
From: Daniel Andersson [mailto:[EMAIL PROTECTED]
Sent: Tuesday, May 06, 2008 2:44 PM
To: solr-user@lucene.apache.org
Subject: Re: Help optimizing

> Probably not; I have changed it to sint and will index the date in the format 20070310, which should do the trick.
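In schema.xml terms, the two types Lance is contrasting are declared like this in the stock Solr 1.2 schema (the datefound field below is illustrative):

    <fieldType name="integer" class="solr.IntField" omitNorms="true"/>
    <fieldType name="sint" class="solr.SortableIntField" sortMissingLast="true" omitNorms="true"/>

    <!-- a yyyymmdd date that only needs sorting, not range queries -->
    <field name="datefound" type="integer" indexed="true" stored="true"/>

The trade-off: pick 'sint' if you need range queries like datefound:[20070101 TO 20071231], and plain 'integer' if you only sort on the field and want the smaller sort cache.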
Re: Help optimizing
Daniel - regarding query time - yes, look at the response (assuming you are using XML responses) and look for "QTime" in the top part of the response. That's the number of milliseconds it took to execute the query. This time does not include the network time (request to Solr + time to send the whole response back to the client).

US <--> Sweden nice ;)

Otis
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch

----- Original Message -----
> From: Daniel Andersson <[EMAIL PROTECTED]>
> To: solr-user@lucene.apache.org
> Sent: Tuesday, May 6, 2008 6:01:01 PM
> Subject: Re: Help optimizing
>
> Is there a way of getting Solr to output the total time spent on any command? Just so I can eliminate some odd network problem/error.
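For reference, the header sits at the top of every XML response and looks something like this (status 0 means success; QTime is in milliseconds; the result counts shown are illustrative):

    <response>
      <lst name="responseHeader">
        <int name="status">0</int>
        <int name="QTime">37</int>
      </lst>
      <result name="response" numFound="6427" start="0">
        ...
      </result>
    </response>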
Re: Help optimizing
Daniel,

The main difference is that string-type fields are not tokenized, while text-type fields are.

Example input text: milk with honey is god

A string field will end up with a single token: "milk with honey is god"
A text field will end up with 5 tokens (assuming no stop-word filtering): "milk", "with", "honey", "is", "god"

Otis
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch

----- Original Message -----
> From: Daniel Andersson <[EMAIL PROTECTED]>
> To: solr-user@lucene.apache.org
> Sent: Tuesday, May 6, 2008 5:43:44 PM
> Subject: Re: Help optimizing
>
> I've changed quite a few of them over to string. Still not sure about the difference between 'string' and 'text' :-/
Re: Composition of multiple smaller fields into another larger field?
Brian,

I think most people would just manipulate the data prior to sending it to Solr for indexing, but you don't want that. Your composeField proposal looks fine to me - I can't think of a problem there. It sounds like you are asking about the language/syntax for field specification. Could/should you not use the ${fifi} syntax? We already use that in solrconfig.xml, for example.

Otis
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch

----- Original Message -----
> From: Brian Johnson <[EMAIL PROTECTED]>
> To: solr-user@lucene.apache.org
> Sent: Tuesday, May 6, 2008 2:53:13 PM
> Subject: Composition of multiple smaller fields into another larger field?
>
> I would like to 'compose' the value for these two suggest fields based on the existing 'atomic' fields. The copyField feature doesn't get me the whole way there, but I am interested in a similar mechanism.
Re: Solr (text) <> RDBMS (dynamic data) - best practices?
AideRSS, eh, nice, welcome :)

Since for 1) you will have to go to your DB anyway, why not just store the retrieved data somewhere (JVM, memcached...) and simply re-use it for 2)?

* get query
* get data from DB for filtering
* store data from DB in cache
* run query
* write response using a custom response writer (this may not be right, I'd have to check) that grabs the extra data from the cache and includes it with each hit

Maybe I'm over-simplifying something...

Otis
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch

----- Original Message -----
> From: igrigorik <[EMAIL PROTECTED]>
> To: solr-user@lucene.apache.org
> Sent: Tuesday, May 6, 2008 8:26:17 PM
> Subject: Solr (text) <> RDBMS (dynamic data) - best practices?
>
> 1) We need access to dynamic data stored in our RDBMS to perform filtering
> 2) When Solr returns its result set, we need to augment the results with meta-data from our RDBMS
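A rough sketch of the decoration step in plain JDBC; table, column, and class names are placeholders, and the Solr query itself is elided:

    import java.sql.Connection;
    import java.sql.PreparedStatement;
    import java.sql.ResultSet;
    import java.util.HashMap;
    import java.util.List;
    import java.util.Map;

    public class ResultDecorator {
        // Fetch the dynamic metadata for one page of Solr hits; the returned map
        // is what you would cache (JVM map, memcached) and merge into the response.
        public Map<String, String> loadMetadata(Connection db, List<String> ids)
                throws Exception {
            Map<String, String> meta = new HashMap<String, String>();
            PreparedStatement ps = db.prepareStatement(
                    "SELECT doc_id, score_today FROM doc_stats WHERE doc_id = ?");
            for (String id : ids) {
                ps.setString(1, id);
                ResultSet rs = ps.executeQuery();
                if (rs.next()) {
                    meta.put(id, rs.getString("score_today"));
                }
                rs.close();
            }
            ps.close();
            return meta;
        }
    }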
Re: complex queries
Unfortunately, I don't know value1, value2, value3, etc. This goes back to my question about access control lists. So, I have all my documents, which are products. And then someone suggested that I have a separate user document type with a multi-valued field of productIds. In SQL, this would be the equivalent of: "SELECT * FROM product WHERE ... AND productId IN (SELECT productId FROM user WHERE userId = ?)" So, my main search clause is a normal search. But I want to filter the results by the values in a completely different document where they match on some field. - Original Message From: Erik Hatcher <[EMAIL PROTECTED]> To: solr-user@lucene.apache.org Sent: Tuesday, May 6, 2008 6:03:34 PM Subject: Re: complex queries On May 6, 2008, at 8:57 PM, Kevin Osborn wrote: > I don't think this is possible, but I figure that I would ask. > > So, I want to find documents that match a search term and where a > field in those documents is also in the results of a subquery. > Basically, I am looking for the Solr equivalent of doing a SQL IN > clause. "search clause" AND field:(value1 OR value2 OR value3) Does that do the trick for you? If not, could you elaborate with an example? Erik
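One way to express that SQL IN clause is to run the "subquery" first - fetch the user's productIds from the user document - and turn them into a Solr filter query. A small sketch, with the field name taken from Kevin's example and the id values invented:

    import java.util.Arrays;
    import java.util.List;

    public class AclFilterBuilder {
        /** Builds a filter query like productId:(3 OR 17 OR 42) for use as fq. */
        public static String buildFilter(String field, List<String> ids) {
            StringBuilder fq = new StringBuilder(field).append(":(");
            for (int i = 0; i < ids.size(); i++) {
                if (i > 0) fq.append(" OR ");
                fq.append(ids.get(i));
            }
            return fq.append(')').toString();
        }

        public static void main(String[] args) {
            List<String> allowed = Arrays.asList("3", "17", "42");
            // Appended to the request as &fq=..., leaving q as the normal search
            System.out.println(buildFilter("productId", allowed));
        }
    }

The caveat from the access-control discussion below applies: with thousands of ids per user this clause gets unwieldy, at which point a custom filter becomes the better tool.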
Multiple Index creation
Hi All, I tried to search the Solr archive but could not find an answer on how to create multiple indexes within Solr. With Lucene I can create an IndexWriter on a new index, and hence have multiple indexes and allow searching across them. How can I create multiple indexes in Solr? --Thanks and Regards Vaijanath
Re: Solr (text) <> RDBMS (dynamic data) - best practices?
Otis Gospodnetic wrote: > AideRSS, eh, nice, welcome :) ;-) > > Since for 1) you will have to go to your DB, why not just store the > retrieved data somewhere (JVM, memcached...) and simply re-use it for 2? > * get query > * get data from DB for filtering > * store data from DB in cache > * run query > * write response using custom response writer? (this may not be right, > I'd have to check) that grabs the extra data from cache and includes > it with each hit Right, that's what we figured on a first attempt as well. I'm curious if there is a 'cleaner' way to do this. Essentially: retrieve a set of results, decorate it with a bunch of dynamic fields, and then filter via those fields.
Re: Your valuable suggestion on autocomplete
Query logs are full of junk. We fill from the correct values in the search index. We used to fill directly from the DB, but there were updates in the DB that weren't in Solr. Every two hours, it does a search for "type:movie" and retrieves the title field for every match. Those are loaded into the ternary search tree. The search box completes movie titles. Very helpful for Ratatouille or Koyaanisqatsi. You can try it on the non-member pages at www.netflix.com; click the "Browse" tab instead of signing up. It would be OK if you signed up, of course. The number of hits per request is sized to match the max cached request in our middle-tier HTTP server. We have over twenty front-end webapps and five back-end Solr servers. wunder On 5/6/08 9:50 AM, "Otis Gospodnetic" <[EMAIL PROTECTED]> wrote: > Hi Wunder, > > - Original Message >> From: Walter Underwood <[EMAIL PROTECTED]> >> To: solr-user@lucene.apache.org >> Sent: Tuesday, May 6, 2008 11:21:31 AM >> Subject: Re: Your valuable suggestion on autocomplete >> >> I wrote a prefix map (ternary search tree) in Java and load it with >> queries to Solr every two hours. That keeps the autocomplete and >> search index in sync. > > What do you mean by the two staying in sync? If you fill the TST with info > from query logs, how does that make it stay in sync with the index? Or do you > mean you look for queries with >N hits (maybe even N=1) and only feed those > into TST, thus ensuring autocomplete always suggests queries that yield hits? > > Thanks, > Otis > >> Our autocomplete gets over 25M hits per day, so we don't really >> want to send all that traffic to Solr. >> >> wunder >> >> [snip - earlier messages in the thread trimmed]
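Walter's prefix map is a ternary search tree; as a simpler stand-in that shows the same lookup pattern, a sorted set works too. A sketch, reloaded periodically from a Solr query for the title field (the normalization and reload wiring are left to the caller):

    import java.util.ArrayList;
    import java.util.List;
    import java.util.TreeSet;

    public class PrefixCompleter {
        // volatile so a background reload swaps in the new set atomically
        private volatile TreeSet<String> titles = new TreeSet<String>();

        /** Reload from the search index (e.g. every two hours) to stay in sync. */
        public void reload(List<String> titlesFromSolr) {
            titles = new TreeSet<String>(titlesFromSolr);
        }

        /**
         * All titles starting with the prefix, capped at maxHits. Assumes the
         * titles and the prefix are normalized (e.g. lowercased) the same way.
         */
        public List<String> complete(String prefix, int maxHits) {
            List<String> out = new ArrayList<String>();
            for (String t : titles.tailSet(prefix)) {
                if (!t.startsWith(prefix) || out.size() >= maxHits) break;
                out.add(t);
            }
            return out;
        }
    }

A real TST trades this for character-by-character traversal, which is cheaper per lookup at the 25M-hits-per-day scale described above, but the fill-from-index and cap-the-hits structure is the same.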
Re: access control list
: I thought of that method. The problem I was thinking of is that if a new : customer is added, that could potentially cause an update of about : 2,000,000 records or so. Fortunately, this does not happen every day. It FWIW: at some point in the future, LUCENE-1231 might make this type of thing much easier ... the particularly adventurous might want to experiment with trying to integrate that patch into Solr. -Hoss
Re: Multiple Index creation
Hi Vaijanath, I believe you want multiple schemas. Take a look at http://wiki.apache.org/solr/MultiCore Note that this feature is available only with the Solr 1.3 trunk code. With Solr 1.2, you can run two instances of Tomcat or deploy two Solr webapps in one Tomcat instance. You can also think about creating one schema which can accommodate everything. On Wed, May 7, 2008 at 9:40 AM, Vaijanath N. Rao <[EMAIL PROTECTED]> wrote: > Hi All, > > I tried to search the Solr archive but could not find an answer on how to create multiple indexes within Solr. With Lucene I can create an IndexWriter on a new index, and hence have multiple indexes and allow searching across them. How can I create multiple indexes in Solr? > > --Thanks and Regards > Vaijanath > > -- Regards, Shalin Shekhar Mangar.
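For reference, a multi-core setup on the 1.3 trunk is declared in a small XML file at the Solr home. The exact file name and attributes have been shifting on trunk, so treat this as a sketch and check the wiki page above; the core names here are placeholders:

    <solr persistent="false">
      <cores adminPath="/admin/cores">
        <!-- each core gets its own schema.xml and solrconfig.xml under instanceDir -->
        <core name="core0" instanceDir="core0" />
        <core name="core1" instanceDir="core1" />
      </cores>
    </solr>

Each core then behaves as its own index, addressable as /solr/core0/select, /solr/core1/select, and so on.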
Re: SOLR-470 & default value in schema with NOW (update)
: Second Try: : * same date column setup : * 2 files uploaded into the index. Updated the file with the timestamps : to be 3 digit millis to 'match' what NOW was supposed to be doing. I : left the other file alone. : --> got the exception.. checked data in Luke to confirm it was all 3 digit : millis and it was. The two exceptions you cited both indicate there was at least one date instance with no millis included -- NOW can't do that; it always includes millis (even though it shouldn't). Are you certain you didn't miss an instance in the data you indexed (or didn't purge all previous values from the index before rebuilding)? -Hoss
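For reference, Solr date fields take full ISO-8601 in UTC with a trailing 'Z'. A quick way to emit the 3-digit-millis form from Java, mirroring what NOW produces:

    import java.text.SimpleDateFormat;
    import java.util.Date;
    import java.util.TimeZone;

    public class SolrDate {
        public static String format(Date d) {
            // Solr dates are UTC; the .SSS yields the 3-digit millis discussed above
            SimpleDateFormat f = new SimpleDateFormat("yyyy-MM-dd'T'HH:mm:ss.SSS'Z'");
            f.setTimeZone(TimeZone.getTimeZone("UTC"));
            return f.format(d);
        }

        public static void main(String[] args) {
            System.out.println(format(new Date())); // e.g. 2008-05-07T04:41:20.123Z
        }
    }

Any document whose timestamp omits the .SSS part will not match values generated by NOW, which is consistent with the mismatch Hoss describes.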
Re: stemming the synonyms
: things related to vacation. However, when I enter travelling it does : not find anything related to vacation; I assume it's because I'm not : explicitly putting travelling in the synonyms file. Is there a way to : activate stemming for all of the synonym terms in the file without having to : manually put 'travel' and 'travelling' and 'travelers' in the synonym file? : Thanks. Stemming and synonyms are both part of analysis -- you can pick any ordering you want for analysis by changing the order of the TokenFilterFactories for your field types. Just put the synonym filter before the stemming filter. -Hoss
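Concretely, in schema.xml that ordering looks something like the sketch below (filter classes as in the stock example schema; the field type name is made up). The synonym filter runs first, so both the synonym entries and the user's terms pass through the stemmer:

    <fieldType name="text_syn" class="solr.TextField" positionIncrementGap="100">
      <analyzer>
        <tokenizer class="solr.WhitespaceTokenizerFactory"/>
        <!-- synonyms are injected first... -->
        <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt"
                ignoreCase="true" expand="true"/>
        <filter class="solr.LowerCaseFilterFactory"/>
        <!-- ...so the stemmer then reduces travel/travelling/travelers alike -->
        <filter class="solr.EnglishPorterFilterFactory" protected="protwords.txt"/>
      </analyzer>
    </fieldType>

With this ordering, 'travelling' stems to the same token as the 'travel' synonym entry, so a single line in synonyms.txt covers all the inflected forms.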
Re: Sorting results
: I perform a search like Matahari. The returned results may include "A big : life: Matahari", "War and Matahari", "Matahari" (in that order). How can I : return results sorted so that matches at the beginning of the string come : first? I want to score results that start with the search string higher : than the other matches. What you're describing is mainly a function of scoring (sorting may be by score, or by a concrete field value). Scoring documents higher when the term appears closer to the beginning of the field can be done using a SpanFirstQuery, but Solr doesn't use SpanFirstQueries by default -- you'd need to write a plugin. FWIW: if what you really want is to score the documents higher because the titles are *shorter* and the word is a bigger percentage of the title, then that should already be happening ... I'm surprised by the ordering that you said you're getting. What does the debugQuery output look like for those 3 docs? -Hoss
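At the Lucene level, the begins-with boost Hoss mentions looks roughly like this; the field name, term, and boost value are from the example and invented for illustration, and wiring it into Solr is the part that needs the plugin:

    import org.apache.lucene.index.Term;
    import org.apache.lucene.search.BooleanClause.Occur;
    import org.apache.lucene.search.BooleanQuery;
    import org.apache.lucene.search.Query;
    import org.apache.lucene.search.TermQuery;
    import org.apache.lucene.search.spans.SpanFirstQuery;
    import org.apache.lucene.search.spans.SpanTermQuery;

    public class BeginsWithBoost {
        public static Query build(String field, String word) {
            // Matches any document containing the word...
            Query base = new TermQuery(new Term(field, word));
            // ...and additionally rewards those where it is the first token
            SpanFirstQuery first =
                    new SpanFirstQuery(new SpanTermQuery(new Term(field, word)), 1);
            first.setBoost(5.0f); // arbitrary boost, tune against debugQuery output
            BooleanQuery bq = new BooleanQuery();
            bq.add(base, Occur.MUST);
            bq.add(first, Occur.SHOULD);
            return bq;
        }
    }

So "Matahari" would match all three titles, but the SHOULD clause fires only for the one that starts with the term, pushing it to the top.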
Re: top document in faceted query?
: I could then get the top document for each value by issuing a sequence of : queries : q=x&fq=f:a&rows=1 : q=x&fq=f:b&rows=1 : q=x&fq=f:c&rows=1 : ... : : Is there a way to do this in one query? Only if you write your own plugin ... Solr doesn't have anything that does it for you. -Hoss
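Absent such a plugin, the per-value loop can at least be automated on the client side. A sketch that just builds the request URLs, with the host, query, field, and facet values all placeholders from the example:

    import java.io.UnsupportedEncodingException;
    import java.net.URLEncoder;

    public class TopDocPerFacet {
        public static void main(String[] args) throws UnsupportedEncodingException {
            String base = "http://localhost:8983/solr/select"; // placeholder host
            String q = URLEncoder.encode("x", "UTF-8");
            for (String value : new String[] {"a", "b", "c"}) {
                String fq = URLEncoder.encode("f:" + value, "UTF-8");
                // one request per facet value, top-scoring document only
                System.out.println(base + "?q=" + q + "&fq=" + fq + "&rows=1");
            }
        }
    }

Since each fq is cached in Solr's filter cache, repeated runs of the loop are cheaper than the N-requests shape suggests, though still N round trips.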
Re: Solr (text) <> RDBMS (dynamic data) - best practices?
* write response using custom response writer? (this may not be right, I'd have to check) that grabs the extra data from cache and includes it with each hit Not a custom response writer... use a custom QueryComponent to augment the document. Localsolr has a good example of this. ryan
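A bare-bones outline of the kind of component Ryan means, written against the 1.3 trunk API. The trunk is in flux, so the exact SearchComponent method set may differ, and the decoration itself is stubbed here; LocalSolr's component is the real-world reference:

    import java.io.IOException;
    import org.apache.solr.common.util.NamedList;
    import org.apache.solr.handler.component.ResponseBuilder;
    import org.apache.solr.handler.component.SearchComponent;
    import org.apache.solr.search.DocIterator;

    public class DecorateComponent extends SearchComponent {

        public void prepare(ResponseBuilder rb) throws IOException {
            // nothing to set up for this sketch
        }

        public void process(ResponseBuilder rb) throws IOException {
            NamedList extra = new NamedList();
            DocIterator it = rb.getResults().docList.iterator();
            while (it.hasNext()) {
                int docId = it.nextDoc();
                // stub: fetch this doc's dynamic RDBMS fields from the cache here
                extra.add(String.valueOf(docId), "dynamic-fields-go-here");
            }
            rb.rsp.add("dynamic", extra); // rides along with the normal response
        }

        public String getDescription() { return "decorates hits with external data"; }
        public String getSource() { return "$Source$"; }
        public String getSourceId() { return "$Id$"; }
        public String getVersion() { return "1.0"; }
    }

It would then be registered as a searchComponent in solrconfig.xml and appended to a handler's component list, so the augmentation happens inside Solr rather than in a response writer.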
Re: SOLR-470 & default value in schema with NOW (update)
Unfortunately that data set is long gone, but I can say that I am quite sure the data was consistently sent to Solr with 3 digits of millis when I provided the data in the documents. I confirmed this using Luke and the data was consistent, but the exception persisted. I looked into the associated classes and didn't see anything obviously wrong. The process I was using to ensure each iteration was isolated was to move the Lucene index folder to a new name and let the "java -jar start.jar" invocation create a new empty Lucene index and index folder. The problem appeared for me any time I tried to mix using the default value NOW with any documents that had this data, so if that is the case, a two-document set should be enough to recreate the problem. I didn't try that hard to isolate the problem; I just changed my data and removed the default from the schema. Thanks, Brian - Original Message From: Chris Hostetter <[EMAIL PROTECTED]> To: solr-user@lucene.apache.org Sent: Tuesday, May 6, 2008 9:41:20 PM Subject: Re: SOLR-470 & default value in schema with NOW (update) [snip - quoted message appears earlier in the thread]
Re: Composition of multiple smaller fields into another larger field?
Thank you for the reference to the ${foo} format. I am looking at trying to minimize the redundant data in my document feed since I have lots of records with an overall small footprint per record. This simple change can save me maybe 20% of my data set size. It also provides a mechanism to isolate one (probably small) class of schema changes from the documents. I don't know how unique my situation is among the community. Brian - Original Message From: Otis Gospodnetic <[EMAIL PROTECTED]> To: solr-user@lucene.apache.org Sent: Tuesday, May 6, 2008 8:03:33 PM Subject: Re: Composition of multiple smaller fields into another larger field? [snip - quoted messages appear earlier in the thread]