Re: Re: Re: My First Solr
Hi Brian, I can't find any documentation or how-to for the DisMaxQueryHandler?

Regards, Thomas

-----Original Message-----
From: Brian Carmalt [mailto:[EMAIL PROTECTED]
Sent: Friday, June 13, 2008 08:52
To: solr-user@lucene.apache.org
Subject: Re: Re: Re: My First Solr

The DisMaxQueryHandler is your friend.

On Friday, 13.06.2008 at 08:29 +0200, Thomas Lauer wrote:
> OK, I found my files now. Can I make all fields the default search field?
>
> Regards, Thomas
>
> -----Original Message-----
> From: Brian Carmalt [mailto:[EMAIL PROTECTED]
> Sent: Friday, June 13, 2008 08:03
> To: solr-user@lucene.apache.org
> Subject: Re: Re: My First Solr
>
> Do you see if the document update is successful? When you start Solr with
> java -jar start.jar for the example, Solr will list the document IDs
> of the docs that you are adding and tell you how long the update took.
>
> A simple but brute-force method to find out if a document has been
> committed is to stop the server and then restart it.
>
> You can also use the solr/admin/stats.jsp page to see if the docs are
> there.
>
> After looking at your query in the results you posted, I would bet that
> you are not specifying a search field. Try searching for "anwendung:KIS"
> or "id:[1 TO *]" to see all the docs in your index.
>
> Brian
>
> On Friday, 13.06.2008 at 07:40 +0200, Thomas Lauer wrote:
> > I have tested:
> > SimplePostTool: version 1.2
> > SimplePostTool: WARNING: Make sure your XML documents are encoded in
> > UTF-8, other encodings are not currently supported
> > SimplePostTool: POSTing files to http://localhost:8983/solr/update..
> > SimplePostTool: POSTing file import_sample.xml
> > SimplePostTool: COMMITting Solr index changes..
Re: My First Solr
http://wiki.apache.org/solr/DisMaxRequestHandler

In solrconfig.xml there are example configurations for the DisMax.

Sorry I told you the wrong name, not enough coffee this morning.

Brian.

On Friday, 13.06.2008 at 09:40 +0200, Thomas Lauer wrote:
Re: My First Solr
OK, my dismax:

  <requestHandler name="dismax" class="solr.DisMaxRequestHandler">
    <lst name="defaults">
      <str name="echoParams">explicit</str>
      <float name="tie">0.01</float>
      <str name="qf">
        beschreibung^0.5 ordner^1.0 register^1.2 Benutzer^1.5 guid^10.0 mandant^1.1
      </str>
      <str name="pf">
        beschreibung^0.2 ordner^1.1 register^1.5 manu^1.4 manu_exact^1.9
      </str>
      <str name="bf">
        ord(poplarity)^0.5 recip(rord(price),1,1000,1000)^0.3
      </str>
      <str name="fl">guid,beschreibung,mandant,Benutzer</str>
      <str name="mm">2&lt;-1 5&lt;-2 6&lt;90%</str>
      <int name="ps">100</int>
      <str name="q.alt">*:*</str>
    </lst>
  </requestHandler>

Must I do a reindex?

I search with this URL:
http://localhost:8983/solr/select?indent=on&version=2.2&q=bonow&start=0&rows=10&fl=*%2Cscore&qt=dismax&wt=standard&explainOther=&hl.fl=

The response is:

  HTTP Status 400 - undefined field text

  type: Status report
  message: undefined field text
  description: The request sent by the client was syntactically incorrect
  (undefined field text).

Regards, Thomas

-----Original Message-----
From: Brian Carmalt [mailto:[EMAIL PROTECTED]
Sent: Friday, June 13, 2008 09:50
To: solr-user@lucene.apache.org
Subject: Re: My First Solr

http://wiki.apache.org/solr/DisMaxRequestHandler

In solrconfig.xml there are example configurations for the DisMax.

Sorry I told you the wrong name, not enough coffee this morning.

Brian.
Problems finding solr/home using JNDI on tomcat
Hi,

I'm using Solr 1.2.0 on a Tomcat 5.5 engine and have copied a solr.xml into catalina_home/conf/Catalina/<hostname>/ containing:

  <Environment name="solr/home" type="java.lang.String" value="/solr/example/solr"/>

And Tomcat certainly reads the solr.xml file, because Solr is deployed fine. However it cannot find the environment property, because there is a javax.naming.NoInitialContextException when trying to look up the JNDI name.

  13-06-2008 10:24:46 org.apache.solr.servlet.SolrDispatchFilter init
  INFO: SolrDispatchFilter.init()
  13-06-2008 10:24:46 org.apache.solr.core.Config getInstanceDir
  INFO: JNDI not configured for Solr (NoInitialContextEx)
  13-06-2008 10:24:46 org.apache.solr.core.Config getInstanceDir

Any suggestions for how to solve that?

Regards, Kjeld
Re: Problems finding solr/home using JNDI on tomcat
Hi,

I'm using Tomcat 5.5 too. I believe you need to specify override to be true.

HTH,

Best regards,

Stefan Oestreicher
--
Dr. Maté GmbH
Stefan Oestreicher / Entwicklung
[EMAIL PROTECTED]
http://www.netdoktor.at
Tel Buero: +43 1 405 55 75 24
Fax Buero: +43 1 405 55 75 55
Alser Str. 4 1090 Wien Altes AKH Hof 1 1.6.6

-----Original Message-----
From: Kjeld Froberg [mailto:[EMAIL PROTECTED]
Sent: Friday, June 13, 2008 11:22
To: solr-user@lucene.apache.org
Subject: Problems finding solr/home using JNDI on tomcat
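[Editor's note: a context file along these lines is the usual Solr-on-Tomcat JNDI setup — a sketch based on the standard wiki instructions; the docBase path and the solr home value are examples:]

  <Context docBase="/opt/solr/solr.war" debug="0" crossContext="true">
    <Environment name="solr/home" type="java.lang.String"
                 value="/solr/example/solr" override="true"/>
  </Context>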
Re: Problems finding solr/home using JNDI on tomcat
Hi,

Same problem. Context file:

  <Environment name="solr/home" type="java.lang.String" value="/solr/example/solr" override="true"/>

Output:

  13-06-2008 11:36:20 org.apache.solr.servlet.SolrDispatchFilter init
  INFO: SolrDispatchFilter.init()
  13-06-2008 11:36:20 org.apache.solr.core.Config getInstanceDir
  INFO: JNDI not configured for Solr (NoInitialContextEx)
  13-06-2008 11:36:20 org.apache.solr.core.Config getInstanceDir

Regards, Kjeld

Stefan Oestreicher wrote:
> Hi,
> I'm using Tomcat 5.5 too. I believe you need to specify override to be true.
Re: Re: My First Solr
No, you do not have to reindex. You do have to restart the server.

The bf has fields listed that are not in your document: popularity, price. Delete the bf entry; you do not need it unless you want to use boost functions.

Brian

On Friday, 13.06.2008 at 10:36 +0200, Thomas Lauer wrote:
> OK, my dismax: [...]
>
> Must I do a reindex?
>
> The response is:
> HTTP Status 400 - undefined field text
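[Editor's note: applied to the handler Thomas posted, that just means dropping the <str name="bf"> entry — a sketch, everything else unchanged:]

  <requestHandler name="dismax" class="solr.DisMaxRequestHandler">
    <lst name="defaults">
      <str name="echoParams">explicit</str>
      <float name="tie">0.01</float>
      <str name="qf">
        beschreibung^0.5 ordner^1.0 register^1.2 Benutzer^1.5 guid^10.0 mandant^1.1
      </str>
      <str name="pf">
        beschreibung^0.2 ordner^1.1 register^1.5 manu^1.4 manu_exact^1.9
      </str>
      <str name="fl">guid,beschreibung,mandant,Benutzer</str>
      <str name="mm">2&lt;-1 5&lt;-2 6&lt;90%</str>
      <int name="ps">100</int>
      <str name="q.alt">*:*</str>
    </lst>
  </requestHandler>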
Re: Re: My First Solr
Hi Brian, thank you for your help. Where are you from?

Regards, Thomas

-----Original Message-----
From: Brian Carmalt [mailto:[EMAIL PROTECTED]
Sent: Friday, June 13, 2008 11:43
To: solr-user@lucene.apache.org
Subject: Re: Re: My First Solr

No, you do not have to reindex. You do have to restart the server. [...]
Re: My First Solr
Hi, Thomas!

Please, can you give me instructions on how you installed Solr on Tomcat?

Quoting Thomas Lauer:

> Hi, I have installed my first Solr on Tomcat. I have modified my schema.xml
> for my XMLs and I have imported some XML files with post.jar.
>
> Tomcat runs, solr/admin runs, post.jar imports files, but I can't find my
> files. The response is always numFound=0 (status 0, QTime 0, rows 10,
> version 2.2) for the query KIS.
>
> My files are in the attachment.
>
> Regards, Thomas

Regards,
Mihails
Re: Problems finding solr/home using JNDI on tomcat
Unfortunately I'm neither a Solr nor a Tomcat expert. My setup is as follows:

solr.xml in /etc/tomcat5.5/Catalina/<hostname>/solr.xml

And my solr/home value is /data/java/dev02.

Is your solr.home writable by Tomcat, and outside of catalina_home?

HTH,

Stefan Oestreicher
--
Dr. Maté GmbH
Stefan Oestreicher / Entwicklung
[EMAIL PROTECTED]
http://www.netdoktor.at
Tel Buero: +43 1 405 55 75 24
Fax Buero: +43 1 405 55 75 55
Alser Str. 4 1090 Wien Altes AKH Hof 1 1.6.6

-----Original Message-----
From: Kjeld Froberg [mailto:[EMAIL PROTECTED]
Sent: Friday, June 13, 2008 11:42
To: solr-user@lucene.apache.org
Subject: Re: Problems finding solr/home using JNDI on tomcat

Hi,

Same problem. [...]
Re: Problems finding solr/home using JNDI on tomcat
Hi,

Yes, it is writable for Tomcat.

Thanks for trying to help.

Kjeld

Stefan Oestreicher wrote:
> Unfortunately I'm neither a Solr nor a Tomcat expert. [...]
> Is your solr.home writable by Tomcat, and outside of catalina_home?
Re: Rsyncd start and stop for multiple instances
The rsyncd-start script gets the data_dir path from the command line and creates an rsyncd.conf on the fly, exporting the path as the rsync module named "solr". The slaves need the data_dir path on the master to look for the latest snapshot, but the rsync command used by the slaves relies on the rsync module name "solr" to do the file transfer using rsyncd.

Bill

On Tue, Jun 10, 2008 at 4:24 AM, Jacob Singh <[EMAIL PROTECTED]> wrote:
> Hey folks,
>
> I'm messing around with running multiple indexes on the same server
> using Jetty contexts. I've got them running groovy thanks to the
> tutorial on the wiki, however I'm a little confused how the collection
> distribution stuff will work for replication.
>
> The rsyncd-enable command is simple enough, but the rsyncd-start command
> takes a -d (data dir) as an argument... Since I'm hosting 4 different
> instances, all with their own data dirs, how do I do this?
>
> Also, you have to specify the master data dir when you are connecting
> from the slave anyway, so why does it need to be specified when I start
> the daemon? If I just start it with any old data dir will it work for
> anything the user running it has perms on?
>
> Thanks,
> Jacob
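[Editor's note: the generated file looks roughly like this — a sketch of what rsyncd-start writes, with example paths; exact fields may differ by Solr version:]

  uid = solr
  gid = solr
  use chroot = no
  list = no
  pid file = /opt/solr/logs/rsyncd.pid
  log file = /opt/solr/logs/rsyncd.log
  [solr]
      path = /opt/solr/data
      comment = Solr

[Since one daemon exports a single data_dir as the one "solr" module, this suggests that hosting four instances means either four rsyncd daemons on different ports or a hand-maintained rsyncd.conf with four modules.]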
Memory problems when highlighting with a not-very-big index
Hi users/developers,

I'm new to Solr and I have been reading the list for a few hours, but I didn't find anything to resolve my doubt. I'm using a 5GB index on a 2GB RAM machine, and I'm trying to optimize the Solr configuration for searching. I get good search times, but when I activate highlighting the RAM usage grows a lot; it grows the same as if I wanted to retrieve the content of the files found. I'm not sure if, for highlighting, Solr needs to load all the content of the resulting documents to be able to highlight them. How does it work? Is it possible to load only the first 10 results, make the snippets from only those results, and use less memory?

Thanks in advance.

Rober.
Re: Memory problems when highlighting with a not-very-big index
On Fri, Jun 13, 2008 at 1:07 PM, Roberto Nieto <[EMAIL PROTECTED]> wrote:
> Is it possible to load only the first 10 results, make the snippets from
> only those results, and use less memory?

That's how it currently works.

But there is a document cache to make things more efficient. If you have large documents, you might want to decrease this from its default size (see solrconfig.xml), which is currently 512. Perhaps move it down to 60 (which would allow for 6 concurrent requests of 10 docs each w/o re-fetching the doc between highlighting and response writing).

-Yonik
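[Editor's note: shrinking it to 60 would look like this in solrconfig.xml — a sketch; the class and autowarm values are the stock example's:]

  <documentCache
      class="solr.LRUCache"
      size="60"
      initialSize="60"
      autowarmCount="0"/>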
Seeking Feedback: non back compat for Java API of 3 FilterFactories in 1.3?
The Solr Developers would like some feedback from the user community regarding some changes that have been made to StopFilterFactory, SynonymFilterFactory, and EnglishPorterFilterFactory since Solr 1.2 which break backwards compatibility in situations where client Java code directly constructs and initializes instances of these classes.

These changes do *NOT* affect Solr users who use Solr "out of the box".

The only people who might possibly be impacted by these changes are users who write custom Java code using the Solr APIs and directly construct instances (instead of getting them from an IndexSchema object) using code such as this...

  StopFilterFactory f = new StopFilterFactory();
  f.init(new HashMap<String,String>(...));
  // now do something with f

If this does not apply to you, you can safely ignore this thread.

If this does apply to you, please review SOLR-594 and the mailing list threads linked to from that issue and let us know (either by replying to this thread, or by posting a comment in the Jira issue) what you think about the proposed "solution" -- documenting that when upgrading to Solr 1.3, any custom code like this would need to be changed like so...

  StopFilterFactory f = new StopFilterFactory();
  f.init(new HashMap<String,String>(...));
  f.inform(SolrCore.getSolrCore().getSolrConfig().getResourceLoader());
  // now do something with f

Of the options available, it is our belief that this is: 1) the simplest approach; 2) benefits the majority of users automatically; 3) adversely affects the fewest number of people; 4) affects those people in a relatively small way (requiring one new line of code). But we do want to verify that the number of people affected is in fact relatively small.

https://issues.apache.org/jira/browse/SOLR-594

Thanks.

-Hoss
Re: Marking Elevation
: I'd like to use elevate.xml and the component to simulate a GSA "KeyMatch"
: function (ie type in A, get B back always) but it seems the elements are not
: marked that they have been elevated. Is there any other way to accomplish
: something like that w/o having to plug a Map into my SolrJ code?

I don't know much about the QueryElevationComponent, but trying it out with the example schema, I notice that the debug data does contain a "queryBoosting" section that seems to indicate which documents (by ID) were elevated.

This seems like the kind of information that would be useful in the response even when debugQuery=false ... so I would suggest opening a Jira issue to request that (attaching a patch would increase the likelihood of it being changed ... I assume it should be a fairly trivial patch).

-Hoss
Re: Seeking Feedback: non back compat for Java API of 3 FilterFactories in 1.3?
We use it out of the box. Our extensions are new filters or new request handlers, all configured through the XML files.

wunder

On 6/13/08 11:15 AM, "Chris Hostetter" <[EMAIL PROTECTED]> wrote:

> The Solr Developers would like some feedback from the user community
> regarding some changes that have been made to StopFilterFactory,
> SynonymFilterFactory, and EnglishPorterFilterFactory since Solr 1.2 [...]
QueryElevationComponent and forceElevation=true ?
I don't know much about QueryElevationComponent, but perusing the wiki docs and trying it out with the example configs I noticed that this URL didn't produce the output I expected...

http://localhost:8983/solr/elevate?q=ipod&fl=id,price&sort=price+asc&forceElevation=true&enableElevation=true&debugQuery=true

...as far as I can tell, forceElevation=true should cause "MA147LL/A" to always appear at the top, regardless of the sort -- but that doesn't seem to be the case.

Am I reading the docs wrong, or is this a bug?

-Hoss
Re: Memory problems when highlighting with a not-very-big index
Thanks for your fast answer,

I think I tried setting the default size to 0 and the problem persisted, but I will try it again on Monday. The part that I can't understand very well is why the memory doesn't grow if I deactivate highlighting. Is the document cache only used if highlighting is used or if content retrieval is activated?

Thanks,
Rober.

2008/6/13 Yonik Seeley <[EMAIL PROTECTED]>:
> That's how it currently works.
>
> But there is a document cache to make things more efficient. If you have
> large documents, you might want to decrease this from its default size
> (see solrconfig.xml), which is currently 512. [...]
>
> -Yonik
Re: Memory problems when highlighting with a not-very-big index
On Fri, Jun 13, 2008 at 3:30 PM, Roberto Nieto <[EMAIL PROTECTED]> wrote:
> The part that I can't understand very well is why the memory doesn't grow
> if I deactivate highlighting.
> Is the document cache only used if highlighting is used or if content
> retrieval is activated?

Perhaps you are highlighting some fields that you normally don't return? What is "fl" vs "hl.fl"?

-Yonik
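[Editor's note: the distinction matters because highlighting forces the stored contents of the hl.fl fields to be fetched even when fl doesn't return them. A request like the following — a hypothetical example, field names made up — would pull the large "content" field into memory for snippeting even though only id and title are returned:]

  http://localhost:8983/solr/select?q=foo&fl=id,title&hl=true&hl.fl=content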
Re: Problems using multicore
: I am getting problems running Solr-1.3-trunk with multicores.
:
: My multicore.xml file is:
: [...]
:
: I have solr.home pointing to the directory containing it.

what exactly is that directory? what does "ls -al" in that directory show you? what UID is weblogic running as?

when you first start up your servlet container, you should see an INFO log message that starts with "looking for multicore.xml" -- what does it say? what do the log messages around it (before and after it) say? what messages are logged before you get this exception?

: java.lang.RuntimeException: Can't find resource 'solrconfig.xml' in
: classpath or '/var/opt/subacatalog/core/conf/',

what is "/var/opt/subacatalog/core/" ? (is that your solr.home? the directory that multicore.xml is in?)

-Hoss
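[Editor's note: the poster's multicore.xml did not survive the archive. For readers hitting the same thing, the 1.3-trunk example multicore.xml sits in solr.home and looks roughly like this — a sketch of the example configs of that era, not the poster's file; core names and instanceDirs are the example's:]

  <multicore adminPath="/admin/multicore" persistent="false">
    <core name="core0" instanceDir="core0"/>
    <core name="core1" instanceDir="core1"/>
  </multicore>

[Each instanceDir must contain a conf/ directory with solrconfig.xml and schema.xml, which is what the "Can't find resource 'solrconfig.xml'" error above is complaining about.]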
Re: Seeking Feedback: non back compat for Java API of 3 FilterFactories in 1.3?
FWIW - I have no problem with the change.

Thanks,

Brian

----- Original Message ----
From: Walter Underwood <[EMAIL PROTECTED]>
To: solr-user@lucene.apache.org
Sent: Friday, June 13, 2008 11:38:27 AM
Subject: Re: Seeking Feedback: non back compat for Java API of 3 FilterFactories in 1.3?

We use it out of the box. Our extensions are new filters or new request handlers, all configured through the XML files.

wunder
Re: Seeking Feedback: non back compat for Java API of 3 FilterFactories in 1.3?
Same here. I took a look at the options from the dev list and it seems to me that (3), user education, should be fine.

Thanks for all the great work.

Brendan

On Jun 13, 2008, at 4:37 PM, Brian Johnson wrote:

> FWIW - I have no problem with the change.
Re: how to get the count of returned results
: I am using a servlet filter to alter the query params sent to Solr,
: now I have a problem where I want to take some extra action (like drop some
: query terms or filters) if the current query returns no results.

FWIW: you're probably better off implementing this as a plugin -- either as a SearchComponent or a whole RequestHandler.

: How can I efficiently get the count of the results for a specified query,
: so that I can take an informed decision in the servlet filter.

SolrIndexSearcher.getDocSet(yourQuery).size();

-Hoss
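[Editor's note: fleshed out, that one-liner might be used like this — a rough sketch, not Hoss's code; it assumes you already have a SolrQueryRequest "req" and a parsed Query "query":]

  // count matches without scoring or retrieving any documents
  SolrIndexSearcher searcher = req.getSearcher();
  int hits = searcher.getDocSet(query).size();
  if (hits == 0) {
      // no results: relax the query here, e.g. drop optional
      // terms or filters, then search again
  }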
Re: scaling / sharding questions
Sorry for not keeping this thread alive; let's see what we can do...

One option I've thought of for 'resharding' would be splitting an index into two by just copying it, then deleting 1/2 the documents from one, doing a commit, and deleting the other 1/2 from the other index and committing. That is:

1) Take original index
2) copy to b1 and b2
3) delete docs from b1 that match a particular query A
4) delete docs from b2 that do not match a particular query A
5) commit b1 and b2

Has anyone tried something like that?

As for how to know where each document is stored, generally we're considering unique_document_id % N. If we rebalance we change N and redistribute, but that probably will take too much time. That makes us move more towards a staggered, age-based approach where the most recent docs filter down to "permanent" indexes based upon time.

Another thought we've had recently is to have many many many physical shards on the indexing writer side, but then merge groups of them into logical shards which are snapshotted to reader Solrs on a frequent basis. I haven't done any testing along these lines, but logically it seems like an idea worth pursuing.

enjoy,

-jeremy

On Fri, Jun 06, 2008 at 03:14:10PM +0200, Marcus Herou wrote:
> Cool sharding technique.
>
> We as well are thinking of how to "move" docs from one index to another
> because we need to re-balance the docs when we add new nodes to the cluster.
> We only store IDs in the index, otherwise we could have moved stuff
> around with IndexReader.document(x) or so. Luke (http://www.getopt.org/luke/)
> is able to reconstruct the indexed Document data, so it should be doable.
> However I'm thinking of actually just deleting the docs from the old index
> and adding new Documents to the new node. It would be cool to not waste CPU
> cycles by reindexing already-indexed stuff but...
>
> And we as well will have data amounts in the range you are talking about.
> We could perhaps share ideas?
>
> How do you plan to store where each document is located? I mean you
> probably need to store info about the Document and its location somewhere,
> perhaps in a clustered DB? We will probably go for HBase for this.
>
> I think the number of documents is less important than the actual data size
> (just speculating). We currently search 10M (will get much much larger)
> indexed blog entries on one machine where the JVM has 1G heap, the index
> size is 3G and response times are still quite fast. This is a readonly node
> though and is updated every morning with a freshly optimized index. Someone
> told me that you probably need twice the RAM if you plan to both index and
> search at the same time. If I were you I would just test indexing X entries
> of your data and then start searching the index with lower JVM settings
> each round; when response times get too slow or you hit an OOE then you
> have a rough estimate of the bare minimum RAM needed for Y entries.
>
> I think we will do with something like 2G per 50M docs but I will need to
> test it out.
>
> If you get an answer in this matter please let me know.
>
> Kindly
>
> //Marcus
>
> On Fri, Jun 6, 2008 at 7:21 AM, Jeremy Hinegardner <[EMAIL PROTECTED]> wrote:
>
> > Hi all,
> >
> > This may be a bit rambling, but let's see how it goes. I'm not a Lucene
> > or Solr guru by any means; I have been prototyping with Solr and
> > understanding how all the pieces and parts fit together.
> >
> > We are migrating our current document storage infrastructure to a
> > decent-sized Solr cluster, using 1.3-snapshots right now. Eventually this
> > will be in the billion+ documents, with about 1M new documents added per
> > day.
> >
> > Our main sticking point right now is that a significant number of our
> > documents will be updated, at least once, but possibly more than once.
> > The volatility of a document decreases over time.
> >
> > With this in mind, we've been considering using a cascading series of
> > shard clusters. That is:
> >
> > 1) a cluster of shards holding recent data (most recent week or two):
> >    smaller indexes that take a small amount of time to commit updates and
> >    optimise, since this will hold the most volatile documents.
> >
> > 2) Following that, another cluster of shards that holds some relatively
> >    recent (3-6 months?), but not super volatile, documents; these are
> >    items that could potentially receive updates, but generally not.
> >
> > 3) A final set of 'archive' shards holding the final resting place for
> >    documents. These would not receive updates. These would be online for
> >    searching and analysis "forever".
> >
> > We are not sure if this is the best way to go, but it is the approach we
> > are leaning toward right now. I would like some feedback from the folks
> > here if you think that is a reasonable approach.
> >
> > One of the other things I'm wondering about is how to manipulate indexe
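[Editor's note: steps 3) and 4) above map onto plain Solr update messages posted to each copy's /update handler — a sketch, where "bucket" is a hypothetical indexed field marking each document's target half, i.e. query A is bucket:1:]

  # posted to the b1 copy (delete docs matching A):
  <delete><query>bucket:1</query></delete>
  <commit/>
  # posted to the b2 copy (delete the complement):
  <delete><query>bucket:0</query></delete>
  <commit/>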
Re: what is null value behavior in function queries?
: is there some way to change this default value (preferably from outside via
: a properties file or something similar)

It's based on the FieldCache, which, thinking about it a bit: the defaults come from the default values of int[], float[], etc...

I seem to recall a discussion in the past about the possibility of adding an option to FieldCache to allow people to specify a default -- but it doesn't exist now. FieldCache would need to support that before FieldCache-based ValueSources could support it.

The safe bet is to index your data with "default" options on all the fields you care about for this type of use case. If you want the "default" in function query cases but not in other cases (range queries, etc...) you'll need two fields (copyField should take care of this).

-Hoss
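[Editor's note: in schema.xml that two-field setup might look like this — a sketch with made-up field names, assuming the example schema's sfloat type; the default attribute fills in the value for documents that omit the field, so only the function-query copy gets the default:]

  <field name="price" type="sfloat" indexed="true" stored="true"/>
  <field name="price_func" type="sfloat" indexed="true" stored="false"
         default="1.0"/>
  <copyField source="price" dest="price_func"/>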
Re: scaling / sharding questions
Hi,

I agree, there is definitely no generic answer. The best sources I can find so far relating to performance are:

http://wiki.apache.org/solr/SolrPerformanceData
http://wiki.apache.org/solr/SolrPerformanceFactors
http://wiki.apache.org/lucene-java/ImproveIndexingSpeed
http://wiki.apache.org/lucene-java/ImproveSearchingSpeed
http://lucene.apache.org/java/docs/benchmarks.html

Although most of the items discussed on these pages relate directly to speeding up searching and indexing, the relationship I am looking for is how index size relates to searching and indexing; that particular question doesn't appear to be answered. If no one has any information on that front I guess I'll just have to dive in and figure it out :-).

As for storing the fields, our initial testing is showing that we get better performance overall by storing the data in Solr and returning it with the results, instead of using the results to go look up the original documents elsewhere. Is there something I am missing here?

enjoy,

-jeremy

On Fri, Jun 06, 2008 at 09:01:14AM -0700, Otis Gospodnetic wrote:
> Hola,
>
> That's a pretty big and open question, but here is some info.
>
> Jeremy's sharding approach sounds OK. We did something similar at
> Technorati, where a document/blog timestamp was the main sharding factor.
> You can't really move individual docs without reindexing (i.e. delete docX
> from shard1 and index docX to shard2), unless all your fields are stored,
> which you will not want to do with the data volumes that you are describing.
>
> As for how much can be handled by a single machine, this is a FAQ and we
> really need to put it on the Lucene/Solr FAQ wiki page if it's not there
> already. The answer is that it depends on many factors (size of index, # of
> concurrent searches, complexity of queries, number of searchers, type of
> disk, amount of RAM, cache settings, # of CPUs...)
>
> The questions are right, it's just that there is no single non-generic
> answer.
>
> Otis
> --
> Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch
RE: scaling / sharding questions
Yes, I've done this split-by-delete several times. The halved index still uses as much disk space until you optimize it.

As to splitting policy: we use an MD5 signature as our unique ID. This has the lovely property that we can wildcard: 'contentid:f*' denotes 1/16 of the whole index. This 1/16 is a very random sample of the whole index. We use this for several things. If we use this for shards, we have a query that matches a shard's contents. The Solr/Lucene syntax does not support modular arithmetic, and so it will not let you query a subset that matches one of your shards.

We also found that searching a few smaller indexes via the Solr 1.3 Distributed Search feature is actually faster than searching one large index, YMMV. So for us a large pile of shards will be optimal anyway, and we have no need to "rebalance".

It sounds like you're not storing the data in a backing store, but are storing all data in the index itself. We have found this "challenging".

Cheers,

Lance Norskog

-----Original Message-----
From: Jeremy Hinegardner [mailto:[EMAIL PROTECTED]
Sent: Friday, June 13, 2008 3:36 PM
To: solr-user@lucene.apache.org
Subject: Re: scaling / sharding questions

Sorry for not keeping this thread alive; let's see what we can do...

One option I've thought of for 'resharding' would be splitting an index into two by just copying it, then deleting 1/2 the documents from one, doing a commit, and deleting the other 1/2 from the other index and committing. [...]
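[Editor's note: the hex-prefix trick gives 16 natural slices (or 256 with two hex digits), each selectable and deletable with an ordinary wildcard query — a sketch, where contentid is Lance's MD5 field and the slice choices are examples:]

  contentid:f*     # one sixteenth of the index
  contentid:f0*    # one 256th
  # carving slice "f" out of a copied index by deleting the other slices:
  <delete><query>contentid:0*</query></delete>
  ...
  <delete><query>contentid:e*</query></delete>
  <commit/>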
Geo question: Need distance from a lat/lon from SOLR query
Hi all,

I'm a newbie to this group.

What is the best way to build a Solr service with geo-location? Would that be locallucene?

Also, what are the issues with integrating locallucene with Solr?

Is there a better solution altogether?

Thanks,

Rich
Re: Geo question: Need distance from a lat/lon from SOLR query
I think locallucene is the thing to use and watch (you didn't hear that here first).

Otis
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch

----- Original Message ----
> From: Rich Rein <[EMAIL PROTECTED]>
> To: solr-user@lucene.apache.org
> Sent: Friday, June 13, 2008 8:48:10 PM
> Subject: Geo question: Need distance from a lat/lon from SOLR query
>
> What is the best way to build a Solr service with geo-location? Would that
> be locallucene?
Re: Geo question: Need distance from a lat/lon from SOLR query
There's a Solr integration already available -- LocalSolr:

http://locallucene.wiki.sourceforge.net/
http://locallucene.wiki.sourceforge.net/LocalSolr+(R1.5)

On Sat, Jun 14, 2008 at 6:18 AM, Rich Rein <[EMAIL PROTECTED]> wrote:
> What is the best way to build a Solr service with geo-location? Would that
> be locallucene?

--
Regards,
Shalin Shekhar Mangar.