Re: Your valuable suggestion on autocomplete
Hi Rantjil Bould,

I would suggest you give some thought to the Trie data structure, which is commonly used for auto-complete. Hitting Solr for every prefix looks like a time-consuming job, but I might be wrong. I have a Trie implementation and it works very fast (of course it is an in-memory data structure, unlike the Solr index, which lives on disk).

--Thanks and Regards
Vaijanath

Rantjil Bould wrote:
> Hi Group,
> I have already got some valuable suggestions from the group. Based on that, I have come up with the following process to finally implement an autocomplete-like feature in my system:
> 1- Index the whole documents
> 2- Extract all terms using IndexReader's terms() method
>
> I am getting terms like vl, vla, vlan, vlana, vlanan, vlanand. But I would like to get absolute terms, i.e. vlanand. The field definition in solr is [...]. Would appreciate your input on how to get absolute terms?
>
> 3- For each term, extract the documents containing that term using the termDocs() method
> 4- Create one more index with the fields term, frequency and docNo. This index would be used for the autocomplete feature.
> 5- For any letter typed by the user in the search field, use an Ajax script (like Scriptaculous or jQuery) to extract all terms using a prefix query.
> 6- Based on the search term selected by the user, keep track of the document nos in which this term appears.
> 7- For the next search term selection, use the document nos to select all terms excluding the currently selected term.
>
> This somehow works. As someone new to Solr and also to Lucene, I would like to know whether it can be improved?
>
> - RB
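For readers who haven't built one, here is a minimal in-memory trie sketch in Java. It is illustrative only, not Vaijanath's actual implementation: insert every indexed term once, then answer each keystroke from memory instead of hitting Solr per prefix.

    import java.util.ArrayList;
    import java.util.List;
    import java.util.Map;
    import java.util.TreeMap;

    // Minimal trie: insert complete terms, then list all terms under a prefix.
    public class Trie {
        private final TreeMap<Character, Trie> children = new TreeMap<Character, Trie>();
        private boolean isTerm;

        public void insert(String term) {
            Trie node = this;
            for (int i = 0; i < term.length(); i++) {
                Trie child = node.children.get(term.charAt(i));
                if (child == null) {
                    child = new Trie();
                    node.children.put(term.charAt(i), child);
                }
                node = child;
            }
            node.isTerm = true;
        }

        public List<String> complete(String prefix) {
            Trie node = this;
            for (int i = 0; i < prefix.length(); i++) {
                node = node.children.get(prefix.charAt(i));
                if (node == null) return new ArrayList<String>();  // nothing under this prefix
            }
            List<String> out = new ArrayList<String>();
            node.collect(prefix, out);
            return out;
        }

        private void collect(String path, List<String> out) {
            if (isTerm) out.add(path);
            // TreeMap iteration keeps the suggestions in sorted order
            for (Map.Entry<Character, Trie> e : children.entrySet()) {
                e.getValue().collect(path + e.getKey(), out);
            }
        }
    }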
Re: Your valuable suggestion on autocomplete
Just FYI, we have also implemented a Trie approach (outside of Solr, even though our mail search uses Solr) at the link in the signature. You can try the auto-completion in the comparison tool on the home page.

- nishant
www.reviewgist.com

----- Original Message -----
From: Vaijanath N. Rao <[EMAIL PROTECTED]>
To: solr-user@lucene.apache.org
Sent: Tuesday, May 6, 2008 12:43:25 PM
Subject: Re: Your valuable suggestion on autocomplete

> I would suggest you give some thought to the Trie data structure, which is commonly used for auto-complete. Hitting Solr for every prefix looks like a time-consuming job, but I might be wrong. [...]
RE: Deletes increase while adding new documents
Hi all,

it seems that we get errors during the auto-commit:

  java.io.FileNotFoundException: /opt/solr/upload/nl/archive/data/index/_4x.fnm (No such file or directory)
          at java.io.RandomAccessFile.open(Native Method)
          at java.io.RandomAccessFile.<init>(RandomAccessFile.java:212)
          at org.apache.lucene.store.FSDirectory$FSIndexInput$Descriptor.<init>(FSDirectory.java:501)
          at org.apache.lucene.store.FSDirectory$FSIndexInput.<init>(FSDirectory.java:526)

The _4x.fnm file is not on the file system. When we switch from autocommit to manual commits through XML messages we get the same kind of errors. Any idea what could be wrong in our configuration to cause these exceptions?

Greetings,
Tim

From: Tim Mahy [EMAIL PROTECTED]
Sent: Monday 28 April 2008 12:11
To: solr-user@lucene.apache.org
Subject: RE: Deletes increase while adding new documents

Hi all,

thank you for your reply. The ids that we send are unique, so we still have no clue what is happening :)

greetings,
Tim

-----Original Message-----
From: Mike Klaas [mailto:[EMAIL PROTECTED]
Sent: Sat 26-4-2008 1:52
To: solr-user@lucene.apache.org
Subject: Re: Deletes increase while adding new documents

On 25-Apr-08, at 4:27 AM, Tim Mahy wrote:

> Hi all,
>
> we send XML add-document messages to Solr and we notice something very strange. We autocommit at 10 documents, starting from a totally clean index (we removed the data folder). When we start uploading, we notice that docsPending goes up but also that deletesPending goes up very fast. After reaching the first 10 we queried Solr to return everything, and the total result count was not 10 but somewhere around 77000, which is exactly 10 - docsDeleted from the stats page.
>
> We used that Solr instance before, so my question is: is it possible that Solr remembers the unique identities somewhere else than in the data folder? Btw, we stopped Solr, removed the data folder and restarted Solr, and then this behavior began...

Are you sure that all the documents you added were unique? (Btw, deletesPending doesn't necessarily mean that an old version of the doc was in the index, I think.)

-Mike
Re: Help optimizing
On May 3, 2008, at 1:06 PM, Daniel Andersson wrote:

> Hi (again) people
>
> We've now invested in a server with 8 GB of RAM after too many OutOfMemory errors.
>
> Our database/index is 3.5 GB and contains 4,352,471 documents. Most documents are less than 1 kB. When performing a search, the results vary between 1.5 seconds and 60 seconds.
>
> I don't have a big problem with 1.5 seconds (even though below 1 would be nice), but 60 seconds is just... well, scary.

Is this pure Solr time or overall application time? I ask because it is often the case that people are measuring application time and the problem lies in the application, so I just want to clarify.

Also, have you done any profiling to see where the hotspots are?

-Grant
[poll] Change logging to SLF4J?
Hello-

There has been a long-running thread on solr-dev proposing switching the logging system to something other than JDK logging.

http://www.nabble.com/Solr-Logging-td16836646.html
http://www.nabble.com/logging-through-log4j-td13747253.html

We are considering using http://www.slf4j.org/. Check:
https://issues.apache.org/jira/browse/SOLR-560

The "pro" argument is that:
* SLF4J allows more flexibility for people using Solr outside the canned .war to configure logging without touching JDK logging.

The "con" arguments go something like:
* JDK logging is already the standard logging framework.
* JDK logging is already in use.
* SLF4J adds another dependency (for something that already works).

On the dev list there are strong opinions on either side, but we would like to get a larger sampling of opinion and validation before making this change.

[ ] Keep Solr logging as it is. (JDK Logging)
[ ] Use SLF4J.

As a bonus question (this time fill in the blank): I have tried SOLR-560 with my logging system and ___.

thanks
ryan
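For anyone who hasn't used the facade, a minimal sketch of what calling code looks like against SLF4J; the binding jar chosen at deploy time (JDK logging, log4j, logback, ...) decides where the output actually goes. Class and method names here are illustrative.

    import org.slf4j.Logger;
    import org.slf4j.LoggerFactory;

    public class ExampleComponent {
        // Code compiles against the facade only; no commitment to a backend.
        private static final Logger log = LoggerFactory.getLogger(ExampleComponent.class);

        public void commitFinished(int docs, long millis) {
            // parameterized messages skip string concatenation when the level is off
            log.info("committed {} documents in {} ms", docs, millis);
        }
    }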
Re: Your valuable suggestion on autocomplete
I wrote a prefix map (ternary search tree) in Java and load it with queries to Solr every two hours. That keeps the autocomplete and the search index in sync.

Our autocomplete gets over 25M hits per day, so we don't really want to send all that traffic to Solr.

wunder

On 5/6/08 2:37 AM, "Nishant Soni" <[EMAIL PROTECTED]> wrote:

> Just FYI, we have also implemented a Trie approach (outside of solr, even though our mail search uses solr) at the link in the signature. [...]
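A sketch of the rebuild-and-swap pattern Walter describes, reusing the Trie class sketched earlier in this thread. fetchTerms() is a hypothetical placeholder for the part that queries Solr over HTTP and collects the completable values; the real plumbing is left out.

    import java.util.Collections;
    import java.util.List;
    import java.util.concurrent.Executors;
    import java.util.concurrent.ScheduledExecutorService;
    import java.util.concurrent.TimeUnit;

    public class AutocompleteRefresher {
        // volatile so lookups always see either the old complete tree or the new one
        private volatile Trie current = new Trie();

        public void start() {
            ScheduledExecutorService timer = Executors.newSingleThreadScheduledExecutor();
            timer.scheduleAtFixedRate(new Runnable() {
                public void run() {
                    Trie fresh = new Trie();
                    for (String term : fetchTerms()) {
                        fresh.insert(term.toLowerCase());
                    }
                    current = fresh;  // atomic swap: autocomplete now matches the index
                }
            }, 0, 2, TimeUnit.HOURS);
        }

        public List<String> complete(String prefix) {
            return current.complete(prefix);
        }

        // Hypothetical: page through a Solr query and collect the field
        // being completed (wunder's later message mentions q=type:movie&fl=title).
        private List<String> fetchTerms() {
            return Collections.emptyList();
        }
    }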
Re: multi-language searching with Solr
Peter,

Thanks for your help, I will prototype your solution and see if it makes sense for me.

Eli

On Mon, May 5, 2008 at 5:38 PM, Binkley, Peter <[EMAIL PROTECTED]> wrote:
> It won't make much difference to the index size, since you'll only be populating one of the language fields for each document, and empty fields cost nothing. The performance may suffer a bit, but Lucene may surprise you with how good it is with that kind of boolean query.
>
> I agree that as the number of fields and languages increases, this is going to become a lot to manage. But you're up against some basic problems when you try to model this in Solr: for each token, you care about not just its value (which is all Lucene cares about) but also its language and its stem; and the stem for a given token depends on the language (different stemming rules); and at query time you may not know the language. I don't think you're going to get a solution without some redundancy; but solving problems by adding redundant fields is a common method in Solr.
>
> Peter
>
> -----Original Message-----
> From: Eli K [mailto:[EMAIL PROTECTED]
> Sent: Monday, May 05, 2008 2:28 PM
> To: solr-user@lucene.apache.org
> Subject: Re: multi-language searching with Solr
>
> Wouldn't this impact both indexing and search performance and the size of the index? It is also probable that I will have more than one free-text field later on, and with at least 20 languages this approach does not seem very manageable. Are there other options for making this work with stemming?
>
> Thanks,
> Eli
>
> On Mon, May 5, 2008 at 3:41 PM, Binkley, Peter <[EMAIL PROTECTED]> wrote:
> > I think you would have to declare a separate field for each language (freetext_en, freetext_fr, etc.), each with its own appropriate stemming. Your ingestion process would have to assign the free-text content for each document to the appropriate field; so, for each document, only one of the freetext fields would be populated. At search time, you would either search against the appropriate field if you know the search language, or search across them with "freetext_fr:query OR freetext_en:query OR ...". That way your query will be interpreted by each language field using that language's stemming rules.
> >
> > Other options for combining indexes, such as copyField or dynamic fields (see http://wiki.apache.org/solr/SchemaXml), would lead to a single field type and therefore a single type of stemming. You could always use copyField to create an unstemmed common index, if you don't care about stemming when you search across languages (since you're likely to get odd results when a query in one language is stemmed according to the rules of another language).
> >
> > Peter
> >
> > -----Original Message-----
> > From: Eli K [mailto:[EMAIL PROTECTED]
> > Sent: Monday, May 05, 2008 8:27 AM
> > To: solr-user@lucene.apache.org
> > Subject: multi-language searching with Solr
> >
> > Hello folks,
> >
> > Let me start by saying that I am new to Lucene and Solr.
> >
> > I am in the process of designing a search back-end for a system that receives 20k documents a day and needs to keep them available for 30 days. The documents should be searchable on a free-text field and on about 8 other fields.
> >
> > One of my requirements is to index and search documents in multiple languages. I would like to have the ability to stem and provide the advanced search features that are based on it. This will only affect the free-text field, because the rest of the fields are in English.
> >
> > I can find out the language of the document before indexing, and I might be able to provide the language to search on. I also need to have the ability to search across all indexed languages (there will be 20 in total).
> >
> > Given these requirements, do you think this is doable with Solr? A major limiting factor is that I need to stick to the 1.2 GA version and I cannot utilize the multi-core features in the 1.3 trunk.
> >
> > I considered writing my own analyzer that will call the appropriate Lucene analyzer for the given language, but I did not see any way for it to access the field that specifies the language of the document.
> >
> > Thanks,
> >
> > Eli
> >
> > p.s. I am looking for an experienced Lucene/Solr consultant to help with the design of this system.
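To make Peter's suggestion concrete, a sketch of what the per-language schema.xml entries could look like. Field and type names are illustrative; each language-specific type wires in its own stemmer (the Snowball filter shown here ships with Solr 1.2).

    <!-- one analyzed type per language, each wired to its own stemmer -->
    <fieldType name="text_en" class="solr.TextField">
      <analyzer>
        <tokenizer class="solr.WhitespaceTokenizerFactory"/>
        <filter class="solr.LowerCaseFilterFactory"/>
        <filter class="solr.SnowballPorterFilterFactory" language="English"/>
      </analyzer>
    </fieldType>
    <fieldType name="text_fr" class="solr.TextField">
      <analyzer>
        <tokenizer class="solr.WhitespaceTokenizerFactory"/>
        <filter class="solr.LowerCaseFilterFactory"/>
        <filter class="solr.SnowballPorterFilterFactory" language="French"/>
      </analyzer>
    </fieldType>

    <!-- only the field matching the document's detected language gets populated -->
    <field name="freetext_en" type="text_en" indexed="true" stored="true"/>
    <field name="freetext_fr" type="text_fr" indexed="true" stored="true"/>

A cross-language search then becomes freetext_en:(query) OR freetext_fr:(query) OR ..., each clause stemmed by its own field's rules.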
Re: Your valuable suggestion on autocomplete
Hi Wunder,

----- Original Message -----
> From: Walter Underwood <[EMAIL PROTECTED]>
> To: solr-user@lucene.apache.org
> Sent: Tuesday, May 6, 2008 11:21:31 AM
> Subject: Re: Your valuable suggestion on autocomplete
>
> I wrote a prefix map (ternary search tree) in Java and load it with queries to Solr every two hours. That keeps the autocomplete and search index in sync.

What do you mean by the two staying in sync? If you fill the TST with info from query logs, how does that make it stay in sync with the index? Or do you mean you look for queries with >N hits (maybe even N=1) and only feed those into the TST, thus ensuring autocomplete always suggests queries that yield hits?

Thanks,
Otis

> Our autocomplete gets over 25M hits per day, so we don't really want to send all that traffic to Solr.
>
> wunder
Welcome, Koji
A warm welcome to our newest Solr committer, Koji Sekiguchi! He's been providing solid patches and improvements to Solr and the Ruby (solr-ruby/Flare) integration for a while now. Erik
RE: Help optimizing
One cause of out-of-memory errors is multiple simultaneous requests. If you limit the query stream to one or two simultaneous requests, you might fix this. No, Solr does not have an option for this. The servlet containers have controls for this, but you have to dig very deep to find them.

Lance Norskog

-----Original Message-----
From: Grant Ingersoll [mailto:[EMAIL PROTECTED]
Sent: Tuesday, May 06, 2008 5:19 AM
To: solr-user@lucene.apache.org
Subject: Re: Help optimizing

> Is this pure Solr time or overall application time? I ask because it is often the case that people are measuring application time and the problem lies in the application, so I just want to clarify.
>
> Also, have you done any profiling to see where the hotspots are?
>
> -Grant
Re: Multiple SpellCheckRequestHandlers
And how do I specify in the query which request handler to use?

Otis Gospodnetic wrote:
> Yes, just define two instances (with two distinct names) in solrconfig.xml and point each of them to a different index.
>
> Otis
> --
> Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch
>
> ----- Original Message -----
>> From: solr_user <[EMAIL PROTECTED]>
>> To: solr-user@lucene.apache.org
>> Sent: Tuesday, May 6, 2008 12:16:07 AM
>> Subject: Multiple SpellCheckRequestHandlers
>>
>> Hi all,
>>
>> Is it possible in Solr to have multiple SpellCheckRequestHandlers? In my application I have got two different spell check indexes. I want the spell checker to check for a spelling suggestion in the first index, and only if it fails to get any suggestion from the first index should it try to get a suggestion from the second index.
>>
>> Is it possible to have a separate SpellCheckRequestHandler, one for each index?
>>
>> Solr-User
Re: Deletes increase while adding new documents
On 6-May-08, at 4:56 AM, Tim Mahy wrote:

> it seems that we get errors during the auto-commit:
>
>   java.io.FileNotFoundException: /opt/solr/upload/nl/archive/data/index/_4x.fnm (No such file or directory)
>
> The _4x.fnm file is not on the file system. When we switch from autocommit to manual commits through XML messages we get the same kind of errors. Any idea what could be wrong in our configuration to cause these exceptions?

I have only heard of that error appearing in two cases: either the index is corrupt, or something else deleted the file. Are you sure that there is only one Solr instance that accesses the directory, and that nothing else ever touches it?

Can you reproduce the deletion issue with a small number of documents (something that could be tested by one of us)?

-Mike
RE: multi-language searching with Solr
Hi,

you could also use multiple Solr instances, each with language-specific settings (stopwords etc.) for the same field, upload your documents to the correct instance, and then merge the indexes into one searchable index...

greetings,
Tim

From: Eli K [EMAIL PROTECTED]
Sent: Tuesday 6 May 2008 18:26
To: solr-user@lucene.apache.org
Subject: Re: multi-language searching with Solr

> Peter,
>
> Thanks for your help, I will prototype your solution and see if it makes sense for me.
>
> Eli
[...]
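The merge step Tim mentions can be done with stock Lucene. A sketch against the Lucene 2.x API, with placeholder paths; note that all source indexes must share a compatible schema for the merged index to search sensibly.

    import org.apache.lucene.analysis.standard.StandardAnalyzer;
    import org.apache.lucene.index.IndexWriter;
    import org.apache.lucene.store.Directory;
    import org.apache.lucene.store.FSDirectory;

    public class MergeLanguageIndexes {
        public static void main(String[] args) throws Exception {
            // Destination index that the single search instance will serve;
            // 'true' creates it from scratch.
            IndexWriter writer = new IndexWriter(
                    FSDirectory.getDirectory("/data/merged/index"),
                    new StandardAnalyzer(), true);
            // One source per language-specific Solr instance.
            Directory[] sources = {
                    FSDirectory.getDirectory("/data/solr-en/data/index"),
                    FSDirectory.getDirectory("/data/solr-fr/data/index") };
            writer.addIndexes(sources);  // copies all segments into the new index
            writer.optimize();
            writer.close();
        }
    }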
Composition of multiple smaller fields into another larger field?
I am interested in using the suggest feature against a composition of other, more granular facets. Let me provide an example to help explain my problem and proposed approaches.

Say I have a set of facets for these artifacts: [...]

So far things work OK. Now I want my suggest feature to work on a composition equivalent to:

    {city}, {state} {zipcode}

I have these fields defined per the suggestions on adding suggest capabilities; I'm experimenting, so I am trying both options: [...]

I would like to 'compose' the value for these two suggest fields based on the existing 'atomic' fields. The copyField feature doesn't get me the whole way there, but I am interested in a similar mechanism.

1) Is there an existing feature, approach, mechanism, ... to get this done that I'm just not aware of?

2) Assuming that #1 is 'no', would this be a generally useful feature to add in? If so, how would people like this to be done?

Obviously I can push this down into the document preparation myself, outside of Solr. I would prefer to have a mechanism to handle this in schema.xml, since I don't want to do any real manipulation/transformation of the data elements at this point.

Here was an initial thought on what it might look like: [...] Here the source is formatted similar to java.text.MessageFormat, but with named rather than indexed substitutions. [...] Here the source is formatted similar to Velocity templates.

I am not interested in creating a new template language or pulling in a new dependency (velocity, freemarker, ...) to get this done, per se. I just want to do some simple composition. If folks think this is a good idea though, it could be set up like this instead: [...]

template_filename.vm file contains the following line:

    $city, $state $zipcode

Any feedback would be appreciated.

Thanks,
Brian
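A purely hypothetical rendering of what such declarations might look like; composeField is the proposal here, not an existing Solr feature, and the attribute names are guesses.

    <!-- hypothetical syntax: compose one suggest field from atomic fields -->
    <composeField dest="suggest_address" source="{city}, {state} {zipcode}"/>

    <!-- or the template-file flavor; address.vm would contain: $city, $state $zipcode -->
    <composeField dest="suggest_address" template="address.vm"/>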
Re: Multiple SpellCheckRequestHandlers
Hello,

If you configured "/sc1" and "/sc2", then use something like http://../sc1?. for the first one and http://./sc2? for the second one.

Otis
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch

----- Original Message -----
> From: solr_user <[EMAIL PROTECTED]>
> To: solr-user@lucene.apache.org
> Sent: Tuesday, May 6, 2008 1:57:17 PM
> Subject: Re: Multiple SpellCheckRequestHandlers
>
> And how do I specify in the query which request handler to use?
Re: Help optimizing
Hello,

If you are using Jetty, you don't have to dig very deep - just look for the section about threads. Here is a snippet from Jetty 6.1.9's jetty.xml:

    <Set name="minThreads">10</Set>
    <Set name="maxThreads">50</Set>
    <Set name="lowThreads">25</Set>

Otis
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch

----- Original Message -----
> From: Lance Norskog <[EMAIL PROTECTED]>
> To: solr-user@lucene.apache.org
> Sent: Tuesday, May 6, 2008 1:26:28 PM
> Subject: RE: Help optimizing
>
> One cause of out-of-memory errors is multiple simultaneous requests. If you limit the query stream to one or two simultaneous requests, you might fix this. No, Solr does not have an option for this. The servlet containers have controls for this, but you have to dig very deep to find them.
>
> Lance Norskog
Re: Multiple SpellCheckRequestHandlers
Thanks Otis,

Actually, I am planning to make use of the qt parameter to specify which handler should be used for the query. Would there be any downside to that?

Otis Gospodnetic wrote:
> Hello,
>
> If you configured "/sc1" and "/sc2", then use something like http://../sc1?. for the first one and http://./sc2? for the second one.
>
> Otis
> --
> Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch
Re: Multiple SpellCheckRequestHandlers
I don't think so. I just prefer shorter (cleaner?) URLs.

Otis
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch

----- Original Message -----
> From: solr_user <[EMAIL PROTECTED]>
> To: solr-user@lucene.apache.org
> Sent: Tuesday, May 6, 2008 3:35:43 PM
> Subject: Re: Multiple SpellCheckRequestHandlers
>
> Actually, I am planning to make use of the qt parameter to specify which handler should be used for the query. Would there be any downside to that?
Re: Help optimizing
Thanks Otis!

On May 4, 2008, at 4:32 AM, Otis Gospodnetic wrote:

> You have a lot of fields of type text, but a number of fields sound like they really need not be tokenized and should thus be of type string.

I've changed quite a few of them over to string. Still not sure about the difference between 'string' and 'text' :-/

> Do you really need 6 warming searchers?

That I have no idea about. Currently it's a very small site, well, visitor-wise anyway.

> I think the "date" type is pretty granular. Do you really need that type of precision?

Probably not; I have changed it to sint and will index the date in the format 20070310, which should do the trick.

> I don't have a shell handy here to check, but is that 'M' in -Xmx... recognized, or should it be lowercase 'm'?

"Append the letter k or K to indicate kilobytes or the letter m or M to indicate megabytes", so yeah, it should be recognized.

> Have you noticed anything weird while looking at the Solr Java process with jConsole?

I'm not very familiar with Java, so no idea what jConsole is :-/

Will be re-indexing tomorrow with the date->sint and text->string changes, will report back after it's done.

Cheers,
Daniel
Re: Help optimizing
On May 6, 2008, at 4:00 AM, Mike Klaas wrote:

> On 3-May-08, at 10:06 AM, Daniel Andersson wrote:
>
>> How do I optimize Solr to better use all the RAM? I'm using java6, 64-bit version, and start Solr using:
>> java -Xmx7500M -Xms4096M -jar start.jar
>> But according to top it only seems to be using 7.7% of the memory (around 600 MB).
>
> Don't try to give Solr _all_ the memory on the system. Solr depends on the index existing in the OS's disk cache (this is "cached" in top). You should have at least 2 GB of memory free for a 3.5 GB index, depending on how much of the index is stored (best is of course to have 3.5 GB available so it can be cached completely).
>
> Solr will require a wide distribution of queries to "warm up" (get the index into the OS disk cache). This automatically prioritizes the "hot spots" in the index. If you want to load the whole thing, 'cd datadir; cat * > /dev/null' works, but I don't recommend relying on that.

Ah. Have given it 4 GB of RAM now (Xmx=4 GB, Xms=2 GB).

> How many documents match, typically? How many documents are returned, typically? How often do you commit() [I suspect frequently, based on the problems you are having]?

Average documents matched/found: 6427. Only return 10 documents per page.

Commit every 10,000 documents. Tried it at 100,000 with 2 GB of RAM (1 GB dedicated to Solr) and it just gave me OutOfMemory every time. Haven't tried increasing it since moving to this new server.

Cheers,
Daniel
Re: Help optimizing
On May 6, 2008, at 2:19 PM, Grant Ingersoll wrote:

> On May 3, 2008, at 1:06 PM, Daniel Andersson wrote:
>
>> When performing a search, the results vary between 1.5 seconds and 60 seconds.
>
> Is this pure Solr time or overall application time? I ask, b/c it is often the case that people are measuring application time and the problem lies in the application, so I just want to clarify.

It's 1.5 seconds to send the command to Solr, wait for it to search, and get the data back.

The web server is located in the US and the Solr machine is in Sweden (don't ask), so I can see it taking a while to send data back and forth; getting the searches below 1.5s is not something I'm expecting. I "just" want to get away from the >5s searches.

Is there a way of getting Solr to output the total time spent on any command? Just so I can eliminate some odd network problem/error.

> Also, have you done any profiling to see where the hotspots are?

I have not. Not a Java person, so not sure how to do this. Is there something in the Solr admin that will allow me to do this? I have looked around and read what I could find in the wiki, but didn't find anything that looked like profiling.

Cheers,
Daniel
Re: Help optimizing
On May 6, 2008, at 7:26 PM, Lance Norskog wrote:

> One cause of out-of-memory errors is multiple simultaneous requests. If you limit the query stream to one or two simultaneous requests, you might fix this. No, Solr does not have an option for this. The servlet containers have controls for this, but you have to dig very deep to find them.

Unfortunately the website is still very small in terms of visitors. We were running MySQL, Apache and Solr on the same machine, which only had 2 GB of RAM, so it's understandable if Solr throws an error or two at me.

Cheers,
Daniel
Searching for empty fields
Hi (again)

One of the fields in my database is color. It can either contain a value (blue, red, etc.) or be blank.

When I perform a search with facet counts on, I get a count for "_empty_". How do I go about searching for this? I've tried color:"" which gives me an error. Same with color:. And color:_empty_ returns nothing at all.

Thanks in advance!
/ d
Re: multi-language searching with Solr
On 5-May-08, at 1:28 PM, Eli K wrote:

> Wouldn't this impact both indexing and search performance and the size of the index? It is also probable that I will have more than one free-text field later on, and with at least 20 languages this approach does not seem very manageable. Are there other options for making this work with stemming?

If you want stemming, then you have to execute one query per language anyway, since the stemming will be different in every language. This is a fundamental requirement: you somehow need to track the language of every token if you want correct multi-language stemming.

The easiest way to do this would be to split each language into its own field. But there are other options: you could prefix every indexed token with the language:

    en:The en:quick en:brown en:fox en:jumped ...
    fr:Le fr:brun fr:renard fr:vite fr:a fr:sauté ...

Separate fields seems easier to me, though.

-Mike
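A sketch of that prefixing idea as a Lucene 2.x TokenFilter; illustrative, not an existing Solr filter. It would sit after the per-language stemmer in the analysis chain, and the same prefix has to be applied to query terms once the query language is known.

    import java.io.IOException;
    import org.apache.lucene.analysis.Token;
    import org.apache.lucene.analysis.TokenFilter;
    import org.apache.lucene.analysis.TokenStream;

    // Prefixes every token with a language tag, e.g. "renard" -> "fr:renard",
    // so one field can hold stemmed tokens from many languages without collisions.
    public class LanguagePrefixFilter extends TokenFilter {
        private final String prefix;

        public LanguagePrefixFilter(TokenStream input, String lang) {
            super(input);
            this.prefix = lang + ":";
        }

        public Token next() throws IOException {
            Token t = input.next();
            if (t == null) return null;  // end of stream
            return new Token(prefix + t.termText(), t.startOffset(), t.endOffset());
        }
    }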
Re: Welcome, Koji
Hi Erik and everyone!

I'm looking forward to working with you. :)

Cheers,
Koji

Erik Hatcher wrote:
> A warm welcome to our newest Solr committer, Koji Sekiguchi! He's been providing solid patches and improvements to Solr and the Ruby (solr-ruby/Flare) integration for a while now.
>
> Erik
Solr (text) <> RDBMS (dynamic data) - best practices?
We're investigating migrating from an RDBMS to Solr to add text-search support, as well as to offload the text storage from our RDBMS (which is arguably not designed for this kind of stuff). While whiteboarding the basic requirements, we realized that we have some 'special' requirements.

Basic setup:
- A subset of data is immutable and is perfectly suited to be stored in Solr
- A subset of data is dynamic and changes frequently (should still be stored in an RDBMS)

Questions:
1) We need access to dynamic data stored in our RDBMS to perform filtering
2) When Solr returns its result set, we need to augment the results with meta-data from our RDBMS

For (1), based on my research we're in fairly standard territory: implement a custom Filter, or a ChainedFilter, to return a bitmask based on an RDBMS query. However, can this step be somehow coupled with (2), where the data we retrieved in (1) is also appended to the result set?

With proper caching policies, I don't think this implementation will be all that painful. (Sanity check?)

So having said that, are there any features or mechanisms in Solr/Lucene that you would recommend, or any best practices we should be aware of, to help us with the migration?

Appreciate the help.

ig
complex queries
I don't think this is possible, but I figured that I would ask.

So, I want to find documents that match a search term and where a field in those documents is also in the results of a subquery. Basically, I am looking for the Solr equivalent of a SQL IN clause.

As I said, I don't think it is possible, and I would be highly surprised if it was.
Re: complex queries
On May 6, 2008, at 8:57 PM, Kevin Osborn wrote:

> I don't think this is possible, but I figured that I would ask.
>
> So, I want to find documents that match a search term and where a field in those documents is also in the results of a subquery. Basically, I am looking for the Solr equivalent of a SQL IN clause.

    "search clause" AND field:(value1 OR value2 OR value3)

Does that do the trick for you? If not, could you elaborate with an example?

Erik
Re: Searching for empty fields
Hi,

Not sure if this is what you want, but to search for 'empty' fields we use something like this:

    (*:* AND -color:[* TO *])

Hope that helps.

Brendan

On May 6, 2008, at 6:43 PM, Daniel Andersson wrote:

> One of the fields in my database is color. It can either contain a value (blue, red, etc.) or be blank. When I perform a search with facet counts on, I get a count for "_empty_". How do I go about searching for this?
RE: Help optimizing
There are two integer types, 'sint' and 'integer'. On a plain 'integer' you cannot do a range check (that makes sense). But! For sorting, Lucene builds an array with one slot per document. On an 'integer' field that is a plain int array; on any other kind of field each slot holds a lot more. So, if you want fast sorts with a small memory footprint, you want 'integer' = 20070310, not 'sint' = 20070310. We did exactly this for exactly this reason.

-----Original Message-----
From: Daniel Andersson [mailto:[EMAIL PROTECTED]
Sent: Tuesday, May 06, 2008 2:44 PM
To: solr-user@lucene.apache.org
Subject: Re: Help optimizing

> Probably not; I have changed it to sint and will index the date in the format 20070310, which should do the trick.
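In schema.xml terms, the two types Lance is contrasting are declared like this in the stock Solr 1.2 schema (the datefound field below is illustrative):

    <fieldType name="integer" class="solr.IntField" omitNorms="true"/>
    <fieldType name="sint" class="solr.SortableIntField" sortMissingLast="true" omitNorms="true"/>

    <!-- a yyyymmdd date that only needs sorting, not range queries -->
    <field name="datefound" type="integer" indexed="true" stored="true"/>

The trade-off: pick 'sint' if you need range queries like datefound:[20070101 TO 20071231], and plain 'integer' if you only sort on the field and want the smaller sort cache.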
Re: Help optimizing
Daniel - regarding query time - yes, look at the response (assuming you are using XML responses) and look for "QTime" in the top part of the response. That's the number of milliseconds it took to execute the query. This time does not include the network time (request to Solr + time to send the whole response back to the client).

US <--> Sweden nice ;)

Otis
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch

----- Original Message -----
> From: Daniel Andersson <[EMAIL PROTECTED]>
> To: solr-user@lucene.apache.org
> Sent: Tuesday, May 6, 2008 6:01:01 PM
> Subject: Re: Help optimizing
>
> Is there a way of getting Solr to output the total time spent on any command? Just so I can eliminate some odd network problem/error.
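For reference, the header sits at the top of every XML response and looks something like this (status 0 means success; QTime is in milliseconds; the result counts shown are illustrative):

    <response>
      <lst name="responseHeader">
        <int name="status">0</int>
        <int name="QTime">37</int>
      </lst>
      <result name="response" numFound="6427" start="0">
        ...
      </result>
    </response>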
Re: Help optimizing
Daniel,

The main difference is that string-type fields are not tokenized, while text-type fields are.

Example input text: milk with honey is god

A string field will end up with a single token: "milk with honey is god"
A text field will end up with 5 tokens (assuming no stop-word filtering): "milk", "with", "honey", "is", "god"

Otis
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch

----- Original Message -----
> From: Daniel Andersson <[EMAIL PROTECTED]>
> To: solr-user@lucene.apache.org
> Sent: Tuesday, May 6, 2008 5:43:44 PM
> Subject: Re: Help optimizing
>
> I've changed quite a few of them over to string. Still not sure about the difference between 'string' and 'text' :-/
Re: Composition of multiple smaller fields into another larger field?
Brian,

I think most people would just manipulate the data prior to sending it to Solr for indexing, but you don't want that. Your composeField proposal looks fine to me - I can't think of a problem there. It sounds like you are asking about the language/syntax for field specification. Could/should you not use the ${fifi} syntax? We already use that in solrconfig.xml, for example.

Otis
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch

----- Original Message -----
> From: Brian Johnson <[EMAIL PROTECTED]>
> To: solr-user@lucene.apache.org
> Sent: Tuesday, May 6, 2008 2:53:13 PM
> Subject: Composition of multiple smaller fields into another larger field?
>
> I would like to 'compose' the value for these two suggest fields based on the existing 'atomic' fields. The copyField feature doesn't get me the whole way there, but I am interested in a similar mechanism.
Re: Solr (text) <> RDBMS (dynamic data) - best practices?
AideRSS, eh, nice, welcome :)

Since for 1) you will have to go to your DB anyway, why not just store the retrieved data somewhere (JVM, memcached...) and simply re-use it for 2)?

* get query
* get data from DB for filtering
* store data from DB in cache
* run query
* write response using a custom response writer (this may not be right, I'd have to check) that grabs the extra data from the cache and includes it with each hit

Maybe I'm over-simplifying something...

Otis
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch

----- Original Message -----
> From: igrigorik <[EMAIL PROTECTED]>
> To: solr-user@lucene.apache.org
> Sent: Tuesday, May 6, 2008 8:26:17 PM
> Subject: Solr (text) <> RDBMS (dynamic data) - best practices?
>
> 1) We need access to dynamic data stored in our RDBMS to perform filtering
> 2) When Solr returns its result set, we need to augment the results with meta-data from our RDBMS
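A rough sketch of the decoration step in plain JDBC; table, column, and class names are placeholders, and the Solr query itself is elided:

    import java.sql.Connection;
    import java.sql.PreparedStatement;
    import java.sql.ResultSet;
    import java.util.HashMap;
    import java.util.List;
    import java.util.Map;

    public class ResultDecorator {
        // Fetch the dynamic metadata for one page of Solr hits; the returned map
        // is what you would cache (JVM map, memcached) and merge into the response.
        public Map<String, String> loadMetadata(Connection db, List<String> ids)
                throws Exception {
            Map<String, String> meta = new HashMap<String, String>();
            PreparedStatement ps = db.prepareStatement(
                    "SELECT doc_id, score_today FROM doc_stats WHERE doc_id = ?");
            for (String id : ids) {
                ps.setString(1, id);
                ResultSet rs = ps.executeQuery();
                if (rs.next()) {
                    meta.put(id, rs.getString("score_today"));
                }
                rs.close();
            }
            ps.close();
            return meta;
        }
    }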
Re: complex queries
Unfortunately, I don't know value1, value2, value3, etc. This goes back to my question about access control lists. So, I have all my documents, which are products. And then someone suggested that I have a separate user document type with a multi-valued field of productIds. In SQL, this would be the equivalent of: "SELECT * FROM product WHERE ... AND productId IN (SELECT productId FROM user WHERE userId = ?)" So, my main search clause is a normal search. But I want to filter the results by the values in a completely different document where they match on some field. - Original Message From: Erik Hatcher <[EMAIL PROTECTED]> To: solr-user@lucene.apache.org Sent: Tuesday, May 6, 2008 6:03:34 PM Subject: Re: complex queries On May 6, 2008, at 8:57 PM, Kevin Osborn wrote: > I don't think this is possible, but I figure that I would ask. > > So, I want to find documents that match a search term and where a > field in those documents is also in the results of a subquery. > Basically, I am looking for the Solr equivalent of doing a SQL IN > clause. "search clause" AND field:(value1 OR value2 OR value3) Does that do the trick for you? If not, could you elaborate with an example? Erik
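One way to express that SQL IN clause is to run the "subquery" first - fetch the user's productIds from the user document - and turn them into a Solr filter query. A small sketch, with the field name taken from Kevin's example and the id values invented:

    import java.util.Arrays;
    import java.util.List;

    public class AclFilterBuilder {
        /** Builds a filter query like productId:(3 OR 17 OR 42) for use as fq. */
        public static String buildFilter(String field, List<String> ids) {
            StringBuilder fq = new StringBuilder(field).append(":(");
            for (int i = 0; i < ids.size(); i++) {
                if (i > 0) fq.append(" OR ");
                fq.append(ids.get(i));
            }
            return fq.append(')').toString();
        }

        public static void main(String[] args) {
            List<String> allowed = Arrays.asList("3", "17", "42");
            // Appended to the request as &fq=..., leaving q as the normal search
            System.out.println(buildFilter("productId", allowed));
        }
    }

The caveat from the access-control discussion below applies: with thousands of ids per user this clause gets unwieldy, at which point a custom filter becomes the better tool.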
Multiple Index creation
Hi All, I tried to search the Solr archive but could not find an answer on how to create multiple indexes within Solr. With Lucene I can create an IndexWriter on a new index, and hence have multiple indexes and allow searching across them. How can I create multiple indexes in Solr? --Thanks and Regards Vaijanath
Re: Solr (text) <> RDBMS (dynamic data) - best practices?
Otis Gospodnetic wrote: > AideRSS, eh, nice, welcome :) ;-) > > Since for 1) you will have to go to your DB, why not just store the > retrieved data somewhere (JVM, memcached...) and simply re-use it for 2? > * get query > * get data from DB for filtering > * store data from DB in cache > * run query > * write response using custom response writer? (this may not be right, > I'd have to check) that grabs the extra data from cache and includes > it with each hit Right, that's what we figured on a first attempt as well. I'm curious if there is a 'cleaner' way to do this. Essentially: retrieve a set of results, decorate it with a bunch of dynamic fields, and then filter via those fields.
Re: Your valuable suggestion on autocomplete
Query logs are full of junk. We fill from the correct values in the search index. We used to fill directly from the DB, but there were updates in the DB that weren't in Solr. Every two hours, it does a search for "type:movie" and retrieves the title field for every match. Those are loaded into the ternary search tree. The search box completes movie titles. Very helpful for Ratatouille or Koyaanisqatsi. You can try it on the non-member pages at www.netflix.com; click the "Browse" tab instead of signing up. It would be OK if you signed up, of course. The number of hits per request is sized to match the max cached request in our middle-tier HTTP server. We have over twenty front-end webapps and five back-end Solr servers. wunder On 5/6/08 9:50 AM, "Otis Gospodnetic" <[EMAIL PROTECTED]> wrote: > Hi Wunder, > > - Original Message >> From: Walter Underwood <[EMAIL PROTECTED]> >> To: solr-user@lucene.apache.org >> Sent: Tuesday, May 6, 2008 11:21:31 AM >> Subject: Re: Your valuable suggestion on autocomplete >> >> I wrote a prefix map (ternary search tree) in Java and load it with >> queries to Solr every two hours. That keeps the autocomplete and >> search index in sync. > > What do you mean by the two staying in sync? If you fill the TST with info > from query logs, how does that make it stay in sync with the index? Or do you > mean you look for queries with >N hits (maybe even N=1) and only feed those > into TST, thus ensuring autocomplete always suggests queries that yield hits? > > Thanks, > Otis > >> Our autocomplete gets over 25M hits per day, so we don't really >> want to send all that traffic to Solr. >> >> wunder >> >> [snip - earlier messages in the thread trimmed]
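Walter's prefix map is a ternary search tree; as a simpler stand-in that shows the same lookup pattern, a sorted set works too. A sketch, reloaded periodically from a Solr query for the title field (the normalization and reload wiring are left to the caller):

    import java.util.ArrayList;
    import java.util.List;
    import java.util.TreeSet;

    public class PrefixCompleter {
        // volatile so a background reload swaps in the new set atomically
        private volatile TreeSet<String> titles = new TreeSet<String>();

        /** Reload from the search index (e.g. every two hours) to stay in sync. */
        public void reload(List<String> titlesFromSolr) {
            titles = new TreeSet<String>(titlesFromSolr);
        }

        /**
         * All titles starting with the prefix, capped at maxHits. Assumes the
         * titles and the prefix are normalized (e.g. lowercased) the same way.
         */
        public List<String> complete(String prefix, int maxHits) {
            List<String> out = new ArrayList<String>();
            for (String t : titles.tailSet(prefix)) {
                if (!t.startsWith(prefix) || out.size() >= maxHits) break;
                out.add(t);
            }
            return out;
        }
    }

A real TST trades this for character-by-character traversal, which is cheaper per lookup at the 25M-hits-per-day scale described above, but the fill-from-index and cap-the-hits structure is the same.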
Re: access control list
: I thought of that method. The problem I was thinking of is that if a new : customer is added, that could potentially cause an update of about : 2,000,000 records or so. Fortunately, this does not happen every day. It FWIW: at some point in the future, LUCENE-1231 might make this type of thing much easier ... the particularly adventurous might want to experiment with trying to integrate that patch into Solr. -Hoss
Re: Multiple Index creation
Hi Vaijanath, I believe you want multiple schemas. Take a look at http://wiki.apache.org/solr/MultiCore Note that this feature is available only with the Solr 1.3 trunk code. With Solr 1.2, you can run two instances of Tomcat or deploy two Solr webapps in one Tomcat instance. You can also think about creating one schema which can accommodate everything. On Wed, May 7, 2008 at 9:40 AM, Vaijanath N. Rao <[EMAIL PROTECTED]> wrote: > Hi All, > > I tried to search the Solr archive but could not find an answer on how to create multiple indexes within Solr. With Lucene I can create an IndexWriter on a new index, and hence have multiple indexes and allow searching across them. How can I create multiple indexes in Solr? > > --Thanks and Regards > Vaijanath > > -- Regards, Shalin Shekhar Mangar.
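For reference, a multi-core setup on the 1.3 trunk is declared in a small XML file at the Solr home. The exact file name and attributes have been shifting on trunk, so treat this as a sketch and check the wiki page above; the core names here are placeholders:

    <solr persistent="false">
      <cores adminPath="/admin/cores">
        <!-- each core gets its own schema.xml and solrconfig.xml under instanceDir -->
        <core name="core0" instanceDir="core0" />
        <core name="core1" instanceDir="core1" />
      </cores>
    </solr>

Each core then behaves as its own index, addressable as /solr/core0/select, /solr/core1/select, and so on.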
Re: SOLR-470 & default value in schema with NOW (update)
: Second Try: : * same date column setup : * 2 files uploaded into the index. Updated the file with the timestamps : to be 3 digit millis to 'match' what NOW was supposed to be doing. I : left the other file alone. : --> got the exception.. checked data in Luke to confirm it was all 3 digit : millis and it was. The two exceptions you cited both indicate there was at least one date instance with no millis included -- NOW can't do that; it always includes millis (even though it shouldn't). Are you certain you didn't miss an instance in the data you indexed (or didn't purge all previous values from the index before rebuilding)? -Hoss
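For reference, Solr date fields take full ISO-8601 in UTC with a trailing 'Z'. A quick way to emit the 3-digit-millis form from Java, mirroring what NOW produces:

    import java.text.SimpleDateFormat;
    import java.util.Date;
    import java.util.TimeZone;

    public class SolrDate {
        public static String format(Date d) {
            // Solr dates are UTC; the .SSS yields the 3-digit millis discussed above
            SimpleDateFormat f = new SimpleDateFormat("yyyy-MM-dd'T'HH:mm:ss.SSS'Z'");
            f.setTimeZone(TimeZone.getTimeZone("UTC"));
            return f.format(d);
        }

        public static void main(String[] args) {
            System.out.println(format(new Date())); // e.g. 2008-05-07T04:41:20.123Z
        }
    }

Any document whose timestamp omits the .SSS part will not match values generated by NOW, which is consistent with the mismatch Hoss describes.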
Re: stemming the synonyms
: things related to vacation. However, when I enter travelling it does : not find anything related to vacation; I assume it's because I'm not : explicitly putting travelling in the synonyms file. Is there a way to : activate stemming for all of the synonym terms in the file without having to : manually put 'travel' and 'travelling' and 'travelers' in the synonym file? : Thanks. Stemming and synonyms are both part of analysis -- you can pick any ordering you want for analysis by changing the order of the TokenFilterFactories for your field types. Just put the synonym filter before the stemming filter. -Hoss
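Concretely, in schema.xml that ordering looks something like the sketch below (filter classes as in the stock example schema; the field type name is made up). The synonym filter runs first, so both the synonym entries and the user's terms pass through the stemmer:

    <fieldType name="text_syn" class="solr.TextField" positionIncrementGap="100">
      <analyzer>
        <tokenizer class="solr.WhitespaceTokenizerFactory"/>
        <!-- synonyms are injected first... -->
        <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt"
                ignoreCase="true" expand="true"/>
        <filter class="solr.LowerCaseFilterFactory"/>
        <!-- ...so the stemmer then reduces travel/travelling/travelers alike -->
        <filter class="solr.EnglishPorterFilterFactory" protected="protwords.txt"/>
      </analyzer>
    </fieldType>

With this ordering, 'travelling' stems to the same token as the 'travel' synonym entry, so a single line in synonyms.txt covers all the inflected forms.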
Re: Sorting results
: I perform a search like Matahari. The returned results may include "A big : life: Matahari", "War and Matahari", "Matahari" (in that order). How can I : return results sorted so that matches at the beginning of the string come : first? I want to score results that start with the search string higher : than the other matches. What you're describing is mainly a function of scoring (sorting may be by score, or by a concrete field value). Scoring documents higher when the term appears closer to the beginning of the field can be done using a SpanFirstQuery, but Solr doesn't use SpanFirstQueries by default -- you'd need to write a plugin. FWIW: if what you really want is to score the documents higher because the titles are *shorter* and the word is a bigger percentage of the title, then that should already be happening ... I'm surprised by the ordering that you said you're getting. What does the debugQuery output look like for those 3 docs? -Hoss
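At the Lucene level, the begins-with boost Hoss mentions looks roughly like this; the field name, term, and boost value are from the example and invented for illustration, and wiring it into Solr is the part that needs the plugin:

    import org.apache.lucene.index.Term;
    import org.apache.lucene.search.BooleanClause.Occur;
    import org.apache.lucene.search.BooleanQuery;
    import org.apache.lucene.search.Query;
    import org.apache.lucene.search.TermQuery;
    import org.apache.lucene.search.spans.SpanFirstQuery;
    import org.apache.lucene.search.spans.SpanTermQuery;

    public class BeginsWithBoost {
        public static Query build(String field, String word) {
            // Matches any document containing the word...
            Query base = new TermQuery(new Term(field, word));
            // ...and additionally rewards those where it is the first token
            SpanFirstQuery first =
                    new SpanFirstQuery(new SpanTermQuery(new Term(field, word)), 1);
            first.setBoost(5.0f); // arbitrary boost, tune against debugQuery output
            BooleanQuery bq = new BooleanQuery();
            bq.add(base, Occur.MUST);
            bq.add(first, Occur.SHOULD);
            return bq;
        }
    }

So "Matahari" would match all three titles, but the SHOULD clause fires only for the one that starts with the term, pushing it to the top.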
Re: top document in faceted query?
: I could then get the top document for each value by issuing a sequence of : queries : q=x&fq=f:a&rows=1 : q=x&fq=f:b&rows=1 : q=x&fq=f:c&rows=1 : ... : : Is there a way to do this in one query? Only if you write your own plugin ... Solr doesn't have anything that does it for you. -Hoss
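Absent such a plugin, the per-value loop can at least be automated on the client side. A sketch that just builds the request URLs, with the host, query, field, and facet values all placeholders from the example:

    import java.io.UnsupportedEncodingException;
    import java.net.URLEncoder;

    public class TopDocPerFacet {
        public static void main(String[] args) throws UnsupportedEncodingException {
            String base = "http://localhost:8983/solr/select"; // placeholder host
            String q = URLEncoder.encode("x", "UTF-8");
            for (String value : new String[] {"a", "b", "c"}) {
                String fq = URLEncoder.encode("f:" + value, "UTF-8");
                // one request per facet value, top-scoring document only
                System.out.println(base + "?q=" + q + "&fq=" + fq + "&rows=1");
            }
        }
    }

Since each fq is cached in Solr's filter cache, repeated runs of the loop are cheaper than the N-requests shape suggests, though still N round trips.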
Re: Solr (text) <> RDBMS (dynamic data) - best practices?
* write response using custom response writer? (this may not be right, I'd have to check) that grabs the extra data from cache and includes it with each hit Not a custom response writer... use a custom QueryComponent to augment the document. Localsolr has a good example of this. ryan
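A bare-bones outline of the kind of component Ryan means, written against the 1.3 trunk API. The trunk is in flux, so the exact SearchComponent method set may differ, and the decoration itself is stubbed here; LocalSolr's component is the real-world reference:

    import java.io.IOException;
    import org.apache.solr.common.util.NamedList;
    import org.apache.solr.handler.component.ResponseBuilder;
    import org.apache.solr.handler.component.SearchComponent;
    import org.apache.solr.search.DocIterator;

    public class DecorateComponent extends SearchComponent {

        public void prepare(ResponseBuilder rb) throws IOException {
            // nothing to set up for this sketch
        }

        public void process(ResponseBuilder rb) throws IOException {
            NamedList extra = new NamedList();
            DocIterator it = rb.getResults().docList.iterator();
            while (it.hasNext()) {
                int docId = it.nextDoc();
                // stub: fetch this doc's dynamic RDBMS fields from the cache here
                extra.add(String.valueOf(docId), "dynamic-fields-go-here");
            }
            rb.rsp.add("dynamic", extra); // rides along with the normal response
        }

        public String getDescription() { return "decorates hits with external data"; }
        public String getSource() { return "$Source$"; }
        public String getSourceId() { return "$Id$"; }
        public String getVersion() { return "1.0"; }
    }

It would then be registered as a searchComponent in solrconfig.xml and appended to a handler's component list, so the augmentation happens inside Solr rather than in a response writer.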
Re: SOLR-470 & default value in schema with NOW (update)
Unfortunately that data set is long gone, but I can say that I am quite sure the data was consistently sent to Solr with 3 digits of millis when I provided the data in the documents. I confirmed this using Luke and the data was consistent, but the exception persisted. I looked into the associated classes and didn't see anything obviously wrong. The process I was using to ensure each iteration was isolated was to move the Lucene index folder to a new name and let the "java -jar start.jar" invocation create a new empty Lucene index and index folder. The problem appeared for me any time I tried to mix using the default value NOW with any documents that had this data, so if that is the case, a two-document set should be enough to recreate the problem. I didn't try that hard to isolate the problem; I just changed my data and removed the default from the schema. Thanks, Brian - Original Message From: Chris Hostetter <[EMAIL PROTECTED]> To: solr-user@lucene.apache.org Sent: Tuesday, May 6, 2008 9:41:20 PM Subject: Re: SOLR-470 & default value in schema with NOW (update) [snip - quoted message appears earlier in the thread]
Re: Composition of multiple smaller fields into another larger field?
Thank you for the reference to the ${foo} format. I am looking at trying to minimize the redundant data in my document feed since I have lots of records with an overall small footprint per record. This simple change can save me maybe 20% of my data set size. It also provides a mechanism to isolate one (probably small) class of schema changes from the documents. I don't know how unique my situation is among the community. Brian - Original Message From: Otis Gospodnetic <[EMAIL PROTECTED]> To: solr-user@lucene.apache.org Sent: Tuesday, May 6, 2008 8:03:33 PM Subject: Re: Composition of multiple smaller fields into another larger field? [snip - quoted messages appear earlier in the thread]