Text search within facets?
Hello, is it possible to do a text search within facets? Something that will return the words Solr used to gather my results and how many results were found for each. For example, if I have the following field:

<field name="dog" type="string" indexed="true" stored="true"/>

and it has docs that contain values like

english bulldog
french bulldog
bichon frise

If I search for "english bulldog" and facet on "dog", I get the following:

english bulldog (135)
french bulldog (23)
bichon frise (12)

But I really want only the facet values that contain the words "english" and "bulldog", like

english bulldog (135)
french bulldog (23)

Thanks for your help!
Re: How to reindex data without restarting server
Hi, thanks! This is very useful :) :)

On Fri, Feb 12, 2010 at 7:55 AM, Joe Calderon wrote:
> if you use the core model via solr.xml you can reload a core without having
> to restart the servlet container,
> http://wiki.apache.org/solr/CoreAdmin
>
> On 02/11/2010 02:40 PM, Emad Mushtaq wrote:
>> Hi,
>> I would like to know if there is a way of reindexing data without restarting
>> the server. Let's say I make a change in the schema file. That would require
>> me to reindex data. Is there a solution to this?

-- Muhammad Emad Mushtaq http://www.emadmushtaq.com/
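For the archives: with a core defined in solr.xml, the reload is a single HTTP call to the CoreAdmin handler, for example (the core name is whatever you declared in solr.xml):

http://localhost:8983/solr/admin/cores?action=RELOAD&core=core0

Note that a RELOAD only picks up configuration changes; documents already in the index keep their old analysis, so after a schema change you still have to re-post your data — just without restarting the container.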
EmbeddedSolrServer vs CommonsHttpSolrServer
Hi all, I am new to Solr/SolrJ. I started up the example server given in the distribution (apache-solr-1.4.0\example\solr), populated the index with the test data set, and successfully tested with an HTTP query string via browser (e.g. http://localhost:8983/solr/select/?indent=on&q=video&fl=name,id). I am trying to set up SolrJ clients using both CommonsHttpSolrServer and EmbeddedSolrServer. My examples use a single-core configuration.

Below is the method used for CommonsHttpSolrServer initialization:

[code.1]
public SolrServer getCommonsHttpSolrServer() throws IOException, ParserConfigurationException, SAXException, SolrServerException {
    String url = "http://localhost:8983/solr";
    CommonsHttpSolrServer server = new CommonsHttpSolrServer(url);
    server.setSoTimeout(1000); // socket read timeout
    server.setConnectionTimeout(100);
    server.setDefaultMaxConnectionsPerHost(100);
    server.setMaxTotalConnections(100);
    server.setFollowRedirects(false); // defaults to false
    // allowCompression defaults to false.
    // Server side must support gzip or deflate for this to have any effect.
    server.setAllowCompression(true);
    server.setMaxRetries(1); // defaults to 0. > 1 not recommended.
    return server;
}

Below is the method used for EmbeddedSolrServer initialization (as provided in the wiki):

[code.2]
public SolrServer getEmbeddedSolrServer() throws IOException, ParserConfigurationException, SAXException, SolrServerException {
    System.setProperty("solr.solr.home", "/WORKSPACE/bin/apache-solr-1.4.0/example/solr");
    CoreContainer.Initializer initializer = new CoreContainer.Initializer();
    CoreContainer coreContainer = initializer.initialize();
    EmbeddedSolrServer server = new EmbeddedSolrServer(coreContainer, "");
    return server;
}

Below is the common code used to query the server:

[code.3]
SolrServer server = mintIdxMain.getEmbeddedSolrServer();
//SolrServer server = mintIdxMain.getCommonsHttpSolrServer();
SolrQuery query = new SolrQuery("video");
QueryResponse rsp = server.query(query);
SolrDocumentList docs = rsp.getResults();
System.out.println("Found: " + docs.getNumFound());
System.out.println("Start: " + docs.getStart());
System.out.println("Max Score: " + docs.getMaxScore());

CommonsHttpSolrServer gives correct results, whereas EmbeddedSolrServer always returns no results. What's wrong with the initialization and/or the configuration of the EmbeddedSolrServer? CoreContainer.Initializer() does not seem to recognize the single core from solrconfig.xml... If I modify [code.2] with the following code, it seems to work. I only added explicit core registration. Is [code.4] the correct way?

[code.4]
public SolrServer getEmbeddedSolrServer() throws IOException, ParserConfigurationException, SAXException, SolrServerException {
    System.setProperty("solr.solr.home", "/WORKSPACE/bin/apache-solr-1.4.0/example/solr");
    CoreContainer.Initializer initializer = new CoreContainer.Initializer();
    CoreContainer coreContainer = initializer.initialize();
    /* > */
    SolrConfig solrConfig = new SolrConfig("/WORKSPACE/bin/apache-solr-1.4.0/example/solr", "solrconfig.xml", null);
    IndexSchema indexSchema = new IndexSchema(solrConfig, "schema.xml", null);
    CoreDescriptor coreDescriptor = new CoreDescriptor(coreContainer, "", solrConfig.getResourceLoader().getInstanceDir());
    SolrCore core = new SolrCore(null, "/WORKSPACE/bin/apache-solr-1.4.0/example/solr/data", solrConfig, indexSchema, coreDescriptor);
    coreContainer.register("", core, false);
    /* < */
    EmbeddedSolrServer server = new EmbeddedSolrServer(coreContainer, "");
    return server;
}

Many thanks in advance for the support and the great work realized with all the Lucene/Solr projects. Dino.
inconsistency between analysis.jsp and actual search
Hi, I am indexing the name "FC St. Gallen" using an edge n-gram field type, which according to analysis.jsp gets split into:

f | fc | s | st | g | ga | gal | gall | galle | gallen

So far so good. Now if I search for "fc st.gallen", according to analysis.jsp it will search for:

fc | st | gallen

But when I do a dismax search using the following handler:

<requestHandler class="solr.SearchHandler" default="true">
  <lst name="defaults">
    <str name="defType">dismax</str>
    <str name="echoParams">explicit</str>
    <int name="rows">10</int>
    <str name="qf">name firstname email^0.5 telefon^0.5 city^0.6 street^0.6</str>
    <str name="fl">id,type,name,firstname,zipcode,city,street,urlizedname</str>
  </lst>
</requestHandler>

I do not get a match. Looking at the debug output of the query I can see that it is actually splitting the query into "fc" and "st gallen":

rawquerystring: fc st.gallen
querystring: fc st.gallen
parsedquery: +((DisjunctionMaxQuery((telefon:fc^0.5 | firstname:fc | email:fc^0.5 | street:fc^0.6 | city:fc^0.6 | name:fc)) DisjunctionMaxQuery((telefon:"st gallen"^0.5 | firstname:"st gallen" | email:"st gallen"^0.5 | street:"st gallen"^0.6 | city:"st gallen"^0.6 | name:"st gallen")))~2) ()
parsedquery_toString: +(((telefon:fc^0.5 | firstname:fc | email:fc^0.5 | street:fc^0.6 | city:fc^0.6 | name:fc) (telefon:"st gallen"^0.5 | firstname:"st gallen" | email:"st gallen"^0.5 | street:"st gallen"^0.6 | city:"st gallen"^0.6 | name:"st gallen"))~2) ()

What's going on there? regards, Lukas Kahwe Smith m...@pooteeweet.org
Re: inconsistency between analysis.jsp and actual search
> Which according to analysis.jsp gets split into:
> f | fc | s | st | g | ga | gal | gall | galle | gallen
>
> So far so good.
>
> Now if I search for "fc st.gallen", according to analysis.jsp it will search for:
> fc | st | gallen
>
> But when I do a dismax search using the following handler: [...]
> I do not get a match. Looking at the debug of the query I can see that it's actually splitting the query into "fc" and "st gallen".
> [...]
> What's going on there?

analysis.jsp does not do actual query parsing; it just shows the tokens produced step by step in the analysis (charfilter, tokenizer, tokenfilter) phase. "The admin/analysis.jsp page will show you how your field is processed while indexing and while querying, and if a particular query matches." [1]

[1] http://wiki.apache.org/solr/FAQ#My_search_returns_too_many_.2BAC8_too_little_.2BAC8_unexpected_results.2C_how_to_debug.3F
Re: EmbeddedSolrServer vs CommonsHttpSolrServer
I suspect this has something to do with the dataDir setting in the example's solrconfig.xml:

<dataDir>${solr.data.dir:./solr/data}</dataDir>

We use the example's solrconfig.xml as the base for our deployments and always comment this out; the default of having conf and data sitting under the Solr home works well.

----- Original Message ----- From: dcdmailbox-i...@yahoo.it To: solr-user@lucene.apache.org Sent: Friday, 12 February, 2010 8:30:57 AM Subject: EmbeddedSolrServer vs CommonsHttpSolrServer [...]
Local Solr Inconsistent results for radius
Hello, I have a question related to Local Solr. For certain locations (latitude, longitude), the spatial search does not work. Here is the query I make which gives me no results:

q=*&qt=geo&sort=geo_distance asc&lat=33.718151&long=73.060547&radius=450

However if I make the same query with radius=449, it gives me results. Here is the part of my solrconfig.xml containing startTier and endTier:

<processor class="com.pjaol.search.solr.update.LocalUpdateProcessorFactory">
  <str name="latField">latitude</str>
  <str name="lngField">longitude</str>
  <int name="startTier">9</int>
  <int name="endTier">17</int>
</processor>

What do I need to do to fix this problem? -- Muhammad Emad Mushtaq http://www.emadmushtaq.com/
Re: inconsistency between analysis.jsp and actual search
On 12.02.2010, at 11:17, Ahmet Arslan wrote:
> analysis.jsp does not do actual query parsing; it just shows the tokens produced step by step in the analysis (charfilter, tokenizer, tokenfilter) phase.
> "The admin/analysis.jsp page will show you how your field is processed while indexing and while querying, and if a particular query matches." [1]
> [1] http://wiki.apache.org/solr/FAQ#My_search_returns_too_many_.2BAC8_too_little_.2BAC8_unexpected_results.2C_how_to_debug.3F

I see, that's good to know. Maybe that's even something that should be noted on the analysis.jsp page itself. Anyway, how can I get "st.gallen" split into two terms at query time? ... It seems I should probably use solr.StandardTokenizerFactory anyway, but for this case it wouldn't help either. regards, Lukas Kahwe Smith m...@pooteeweet.org
optimize is taking too much time
Hi, in my Solr index I have 1,42,45,223 (about 14.2 million) records taking some 50 GB. Now when I load a new record and Solr tries to optimize the index, it takes too much memory and time. Can anybody please tell me whether there is a property in Solr to get rid of this? Thanks in advance
Re: EmbeddedSolrServer vs CommonsHttpSolrServer
Yes, you are right. [code.2] works fine after commenting out the following line in solrconfig.xml:

<dataDir>${solr.data.dir:./solr/data}</dataDir>

Is this different behaviour of EmbeddedSolrServer correct, or can it be considered a low-priority bug? Thanks for your prompt reply! Dino.

-- From: Ron Chan To: solr-user@lucene.apache.org Sent: Fri, 12 February 2010, 11:14:58 Subject: Re: EmbeddedSolrServer vs CommonsHttpSolrServer

I suspect this has something to do with the dataDir setting in the example's solrconfig.xml: <dataDir>${solr.data.dir:./solr/data}</dataDir> [...]
Re: EmbeddedSolrServer vs CommonsHttpSolrServer
When using EmbeddedSolrServer, you could simply set the solr.data.dir system property, or launch your process from the same working directory where you launch the HTTP version of Solr. Either of those should also alleviate this issue. Erik

On Feb 12, 2010, at 5:36 AM, dcdmailbox-i...@yahoo.it wrote:
> Yes, you are right. [code.2] works fine after commenting out the dataDir setting in solrconfig.xml. Is this different behaviour of EmbeddedSolrServer correct, or can it be considered a low-priority bug? Thanks for your prompt reply! Dino.
> [...]
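For the archives, a minimal sketch of Erik's suggestion (the paths are assumptions):

System.setProperty("solr.solr.home", "/WORKSPACE/bin/apache-solr-1.4.0/example/solr");
// Resolve the ${solr.data.dir:./solr/data} placeholder in solrconfig.xml
// to an absolute path, so it no longer depends on the working directory.
System.setProperty("solr.data.dir", "/WORKSPACE/bin/apache-solr-1.4.0/example/solr/data");
CoreContainer.Initializer initializer = new CoreContainer.Initializer();
CoreContainer coreContainer = initializer.initialize();
EmbeddedSolrServer server = new EmbeddedSolrServer(coreContainer, "");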
Re: EmbeddedSolrServer vs CommonsHttpSolrServer
I don't think this is a bug; the default behaviour is for /data to sit under the Solr home. There should be no need to use this parameter unless it is a special case; not sure why it is like this in the example.

----- Original Message ----- From: dcdmailbox-i...@yahoo.it To: solr-user@lucene.apache.org Sent: Friday, 12 February, 2010 10:36:41 AM Subject: Re: EmbeddedSolrServer vs CommonsHttpSolrServer [...]
Good literature on search basics
Does anyone know good literature (web resources, books, etc.) on the basics of search? I have the Solr 1.4 and Lucene books but want to go into more detail on the basics. Thanks,
persistent cache
Does Solr use some sort of a persistent cache? I do this 10 times in a loop:

* start solr
* create a core
* execute warmup query
* execute query with sort fields
* stop solr

Executing the query with sort fields takes 5-20 times longer in the first iteration than in the other 9 iterations. For instance I have a query 'hockey' with one date sort field. That takes 768 ms in the first iteration of the loop. In the next 9 iterations the query takes 52 ms. The Solr/Jetty server really stops in each iteration, so the RAM must be emptied. So the only explanation I can think of is that there is some persistent cache that survives the Solr restarts. Is this the case? Or why could this be? /Tim
Re: persistent cache
2010/2/12 Tim Terlegård > Does Solr use some sort of a persistent cache? > > I do this 10 times in a loop: > * start solr > * create a core > * execute warmup query > * execute query with sort fields > * stop solr > > Executing the query with sort fields takes 5-20 times longer the first > iteration than the other 9 iterations. For instance I have a query > 'hockey' with one date sort field. That takes 768 ms in the first > iteration of the loop. The next 9 iterations the query takes 52 ms. > The solr and jetty server really stops in each iteration so the RAM > must be emptied. So the only way I can think of why this happens is > because there is some persistent cache that survives the solr > restarts. Is this the case? Or why could this be? > > Solr does not have a persistent cache. That is the operating system's file cache at work. -- Regards, Shalin Shekhar Mangar.
Re: Dismax phrase queries
On Fri, Feb 12, 2010 at 6:06 AM, Jason Rutherglen < jason.rutherg...@gmail.com> wrote: > I'd like to boost an exact phrase match such as q="video poker" over > q=video poker. How would I do this using dismax? > > I tried pre-processing video poker into, video poker "video poker" > however that just gets munged by dismax into "video poker video > poker"... Which is wrong. > > Have you tried the pf parameter? -- Regards, Shalin Shekhar Mangar.
Re: spellcheck
I tried to configure spellcheck, but I still have this problem. Config:

<searchComponent name="spellcheck" class="solr.SpellCheckComponent">
  <lst name="spellchecker">
    <str name="classname">solr.FileBasedSpellChecker</str>
    <str name="name">file</str>
    <str name="sourceLocation">spellings.txt</str>
    <str name="characterEncoding">UTF-8</str>
    <str name="spellcheckIndexDir">./spellcheckerFile</str>
  </lst>
</searchComponent>

and in the request handler:

<str name="spellcheck.onlyMorePopular">false</str>
<str name="spellcheck.extendedResults">false</str>
<str name="spellcheck.count">1</str>
<str name="spellcheck">true</str>
<str name="spellcheck.dictionary">file</str>
<arr name="last-components">
  <str>spellcheck</str>
</arr>

Maybe I get this result because I work with a dictionary? For the request 'popular' I still get 'populars', but the dictionary has both popular and populars!
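For anyone hitting this in the archives: a file-based dictionary must be (re)built before it is consulted. With the config above that is a request like the following (the /select handler path is an assumption):

http://localhost:8983/solr/select?q=popular&spellcheck=true&spellcheck.dictionary=file&spellcheck.build=true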
Re: Local Solr Inconsistent results for radius
Hi Emad, I had the same issue ( http://old.nabble.com/Spatial---Local-Solr-radius-td26943608.html ); it seems this happens only in eastern areas of the world. Try inverting the sign of all your longitudes, or translating all your longitudes to the west. Cheers, Mauricio

On Fri, Feb 12, 2010 at 7:22 AM, Emad Mushtaq wrote:
> Hello,
> I have a question related to Local Solr. For certain locations (latitude, longitude), the spatial search does not work. Here is the query I make which gives me no results:
> q=*&qt=geo&sort=geo_distance asc&lat=33.718151&long=73.060547&radius=450
> However if I make the same query with radius=449, it gives me results.
> Here is the part of my solrconfig.xml containing startTier and endTier: [...]
> What do I need to do to fix this problem?
> -- Muhammad Emad Mushtaq http://www.emadmushtaq.com/
Re: inconsistency between analysis.jsp and actual search
> Anyway, how can I get "st.gallen" split into two terms at query time?

As you mentioned in your first mail, the query st.gallen is already broken into two terms/words by analysis. But the query parser constructs a phrase query from them. There was a discussion about this behaviour earlier:

http://www.lucidimagination.com/search/document/d41bc0ef422b9238/understanding_the_query_parser#85db37e69ef29dba
Fwd: indexing: issue with default values
In schema.xml I have fields of int type with a default value, e.g.:

<field name="postal_code" type="int" indexed="true" stored="true" default="0"/>

but when a document has no value for the field "postal_code", at indexing I get the following error:

Posting file Immo.xml to http://localhost:8983/solr/update/

Error 500 HTTP ERROR: 500 For input string: ""
java.lang.NumberFormatException: For input string: ""
    at java.lang.NumberFormatException.forInputString(NumberFormatException.java:48)
    at java.lang.Integer.parseInt(Integer.java:470)
    at java.lang.Integer.parseInt(Integer.java:499)
    at org.apache.solr.schema.TrieField.createField(TrieField.java:416)
    at org.apache.solr.schema.SchemaField.createField(SchemaField.java:94)
    at org.apache.solr.update.DocumentBuilder.toDocument(DocumentBuilder.java:246)
    at org.apache.solr.update.processor.RunUpdateProcessor.processAdd(RunUpdateProcessorFactory.java:60)
    at org.apache.solr.handler.XMLLoader.processUpdate(XMLLoader.java:139)
    at org.apache.solr.handler.XMLLoader.load(XMLLoader.java:69)
    at org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:54)
    at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:131)
    at org.apache.solr.core.SolrCore.execute(SolrCore.java:1316)
    at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:338)
    at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:241)
    at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1089)
    at org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:365)
    at org.mortbay.jetty.security.SecurityHandler.handle(SecurityHandler.java:216)
    at org.mortbay.jetty.servlet.SessionHandler.handle(SessionHandler.java:181)
    at org.mortbay.jetty.handler.ContextHandler.handle(ContextHandler.java:712)
    at org.mortbay.jetty.webapp.WebAppContext.handle(WebAppContext.java:405)
    at org.mortbay.jetty.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:211)
    at org.mortbay.jetty.handler.HandlerCollection.handle(HandlerCollection.java:114)
    at org.mortbay.jetty.handler.HandlerWrapper.handle(HandlerWrapper.java:139)
    at org.mortbay.jetty.Server.handle(Server.java:285)
    at org.mortbay.jetty.HttpConnection.handleRequest(HttpConnection.java:502)
    at org.mortbay.jetty.HttpConnection$RequestHandler.content(HttpConnection.java:835)
    at org.mortbay.jetty.HttpParser.parseNext(HttpParser.java:641)
    at org.mortbay.jetty.HttpParser.parseAvailable(HttpParser.java:202)
    at org.mortbay.jetty.HttpConnection.handle(HttpConnection.java:378)
    at org.mortbay.jetty.bio.SocketConnector$Connection.run(SocketConnector.java:226)
    at org.mortbay.thread.BoundedThreadPool$PoolThread.run(BoundedThreadPool.java:442)

<int name="status">0</int><int name="QTime">4</int>

Any help? Thanks
Re: persistent cache
2010/2/12 Shalin Shekhar Mangar : > 2010/2/12 Tim Terlegård > >> Does Solr use some sort of a persistent cache? >> > Solr does not have a persistent cache. That is the operating system's file > cache at work. Aha, that's very interesting and seems to make sense. So is the primary goal of warmup queries to allow the operating system to cache all the files in the data/index directory? Because I think the difference (768ms vs 52ms) is pretty big. I just do one warmup query and get 52 ms response on a 40 million documents index. I think that's pretty nice performance without tinkering with the caches at all. The only tinkering that seems to be needed is this operating system file caching. What's the best way to make sure that my warmup queries have cached all the files? And does a file cache have the complete file in memory? I guess it can get tough to get my 100GB index into the 16GB memory. /Tim
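In case it is useful, here is a rough sketch (the index path is an assumption, and this is one possible approach rather than anything Solr provides) of pre-warming the OS file cache by sequentially reading the index files before issuing queries:

import java.io.*;

public class IndexWarmer {
    public static void main(String[] args) throws IOException {
        // Assumption: path to the Lucene index directory of the core.
        File indexDir = new File("/path/to/solr/data/index");
        byte[] buf = new byte[1 << 20]; // 1 MB read buffer
        for (File f : indexDir.listFiles()) {
            if (!f.isFile()) continue;
            InputStream in = new BufferedInputStream(new FileInputStream(f));
            try {
                // Reading and discarding the bytes is enough to pull the
                // file into the operating system's page cache.
                while (in.read(buf) != -1) { }
            } finally {
                in.close();
            }
        }
    }
}

With a 100GB index and 16GB of RAM only part of the index can stay cached at once, so warming with the queries and sorts you actually expect (which touch the frequently used files) is usually more effective than reading everything.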
Re: Good literature on search basics
See http://markmail.org/thread/z5sq2jr2a6eayth4

On 12 February 2010 12:14, javaxmlsoapdev wrote:
> Does anyone know good literature (web resources, books etc) on basics of search? I do have Solr 1.4 and Lucene books but wanted to go in more details on basics.
> Thanks,
Re: indexing: issue with default values
When a document has no value, are you still sending a postal_code field in your post to Solr? Seems like you are. Erik

On Feb 12, 2010, at 8:12 AM, nabil rabhi wrote:
> In schema.xml I have fields of int type with a default value, e.g. <field name="postal_code" type="int" indexed="true" stored="true" default="0"/>, but when a document has no value for the field "postal_code", at indexing I get the following error:
> java.lang.NumberFormatException: For input string: ""
> [...]
Re: Dismax phrase queries
Was going to post that I more or less figured it out. Dismax handles this automatically with the ps parameter, which is different than the bs parameter... On Fri, Feb 12, 2010 at 3:48 AM, Shalin Shekhar Mangar wrote: > On Fri, Feb 12, 2010 at 6:06 AM, Jason Rutherglen < > jason.rutherg...@gmail.com> wrote: > >> I'd like to boost an exact phrase match such as q="video poker" over >> q=video poker. How would I do this using dismax? >> >> I tried pre-processing video poker into, video poker "video poker" >> however that just gets munged by dismax into "video poker video >> poker"... Which is wrong. >> >> > Have you tried the pf parameter? > > -- > Regards, > Shalin Shekhar Mangar. >
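For anyone finding this thread later, a minimal sketch (the field names are made up):

http://localhost:8983/solr/select?defType=dismax&q=video+poker&qf=title+body&pf=title+body&ps=0

pf re-runs the whole query as an implicit phrase query against the listed fields, so documents containing "video poker" as an exact phrase score higher than documents that merely contain both words; ps is the slop applied to that implicit phrase query (ps=0 requires the words to be adjacent).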
Re: indexing: issue with default values
Yes, sometimes the document has postal_code with no value; I still post it to Solr.

2010/2/12 Erik Hatcher
> When a document has no value, are you still sending a postal_code field in your post to Solr? Seems like you are.
> Erik
> [...]
Re: Collating results from multiple indexes
Really? The last time I looked at AIE, I am pretty sure there was Solr core msgs in the logs, so I assumed it used EmbeddedSolr or something. But I may be mistaken. Anyone from Attivio here who can elaborate? Is the join stuff at Lucene level or on top of multiple Solr cores or what? -- Jan Høydahl - search architect Cominvent AS - www.cominvent.com On 11. feb. 2010, at 23.02, Otis Gospodnetic wrote: > Minor correction re Attivio - their stuff runs on top of Lucene, not Solr. I > *think* they are trying to patent this. > > Otis > > Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch > Hadoop ecosystem search :: http://search-hadoop.com/ > > > > - Original Message >> From: Jan Høydahl / Cominvent >> To: solr-user@lucene.apache.org >> Sent: Mon, February 8, 2010 3:33:41 PM >> Subject: Re: Collating results from multiple indexes >> >> Hi, >> >> There is no JOIN functionality in Solr. The common solution is either to >> accept >> the high volume update churn, or to add client side code to build a "join" >> layer >> on top of the two indices. I know that Attivio (www.attivio.com) have built >> some >> kind of JOIN functionality on top of Solr in their AIE product, but do not >> know >> the details or the actual performance. >> >> Why not open a JIRA issue, if there is no such already, to request this as a >> feature? >> >> -- >> Jan Høydahl - search architect >> Cominvent AS - www.cominvent.com >> >> On 25. jan. 2010, at 22.01, Aaron McKee wrote: >> >>> >>> Is there any somewhat convenient way to collate/integrate fields from >>> separate >> indices during result writing, if the indices use the same unique keys? >> Basically, some sort of cross-index JOIN? >>> >>> As a bit of background, I have a rather heavyweight dataset of every US >> business (~25m records, an on-disk index footprint of ~30g, and 5-10 hours >> to >> fully index on a decent box). Given the size and relatively stability of the >> dataset, I generally only update this monthly. However, I have separate >> advertising-related datasets that need to be updated either hourly or daily >> (e.g. today's coupon, click revenue remaining, etc.) . These advertiser >> feeds >> reference the same keyspace that I use in the main index, but are otherwise >> significantly lighter weight. Importing and indexing them discretely only >> takes >> a couple minutes. Given that Solr/Lucene doesn't support field updating, >> without >> having to drop and re-add an entire document, it doesn't seem practical to >> integrate this data into the main index (the system would be under a >> constant >> state of churn, if we did document re-inserts, and the performance impact >> would >> probably be debilitating). It may be nice if this data could participate in >> filtering (e.g. only show advertisers), but it doesn't need to participate >> in >> scoring/ranking. >>> >>> I'm guessing that someone else has had a similar need, at some point? I >>> can >> have our front-end query the smaller indices separately, using the keys >> returned >> by the primary index, but would prefer to avoid the extra sequential >> roundtrips. >> I'm hoping to also avoid a coding solution, if only to avoid the maintenance >> overhead as we drop in new builds of Solr, but that's also feasible. >>> >>> Thank you for your insight, >>> Aaron >>> >
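For what it's worth, a minimal sketch of the client-side join layer described above, assuming SolrJ and two cores that share a key field (core names and field names are invented for illustration):

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.SolrServer;
import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;
import org.apache.solr.client.solrj.response.QueryResponse;
import org.apache.solr.common.SolrDocument;

public class ClientSideJoin {
    public static void main(String[] args) throws Exception {
        SolrServer businesses = new CommonsHttpSolrServer("http://localhost:8983/solr/businesses");
        SolrServer ads = new CommonsHttpSolrServer("http://localhost:8983/solr/ads");

        // 1) Query the heavyweight index as usual.
        QueryResponse main = businesses.query(new SolrQuery("pizza"));

        if (!main.getResults().isEmpty()) {
            // 2) Collect the shared keys from the first page of results.
            StringBuilder keyFilter = new StringBuilder("key:(");
            for (SolrDocument doc : main.getResults()) {
                keyFilter.append(doc.getFieldValue("key")).append(' ');
            }
            keyFilter.append(')');

            // 3) Fetch the matching advertiser records in one extra round trip.
            SolrQuery adQuery = new SolrQuery("*:*");
            adQuery.addFilterQuery(keyFilter.toString());
            QueryResponse adData = ads.query(adQuery);
            System.out.println(adData.getResults().getNumFound() + " advertiser records");
        }
    }
}

This is still the sequential-roundtrip approach Aaron wants to avoid, just batched into one extra query per page rather than one lookup per document.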
Re: indexing: issue with default values
That would be the problem then, I believe. Simply don't post a value to get the default value to work. Erik

On Feb 12, 2010, at 10:18 AM, nabil rabhi wrote:
> Yes, sometimes the document has postal_code with no value; I still post it to Solr.
> [...]
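A quick SolrJ illustration of the same point (the "id" field is an assumption):

SolrInputDocument doc = new SolrInputDocument();
doc.addField("id", "123");
// postal_code is deliberately not added: Solr then applies default="0"
// from the schema. Posting the field with an empty value is what
// triggers the NumberFormatException above.
server.add(doc);
server.commit();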
Re: indexing: issue with default values
Thanks Erik, that was very helpful.

2010/2/12 Erik Hatcher
> That would be the problem then, I believe. Simply don't post a value to get the default value to work.
> Erik
> [...]
Re: persistent cache
One solution is to add a persistent cache with memcached at the application layer.

On 2/12/10 5:19 AM, Tim Terlegård wrote:
> What's the best way to make sure that my warmup queries have cached all the files? And does a file cache have the complete file in memory? I guess it can get tough to get my 100GB index into the 16GB memory.
> [...]

-- Tommy Chheng Programmer and UC Irvine Graduate Student Twitter @tommychheng http://tommy.chheng.com
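A rough sketch of that idea using the spymemcached client (the client library, key scheme, and TTL are all assumptions about the application stack, not anything Solr provides; "server" is a SolrServer as elsewhere in this thread):

import java.net.InetSocketAddress;
import net.spy.memcached.MemcachedClient;
import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.common.SolrDocumentList;

// ...
MemcachedClient cache = new MemcachedClient(new InetSocketAddress("localhost", 11211));
String key = "q=hockey&sort=date"; // derive the key from the query params
SolrDocumentList docs = (SolrDocumentList) cache.get(key);
if (docs == null) {
    docs = server.query(new SolrQuery("hockey")).getResults();
    cache.set(key, 3600, docs); // keep for an hour; assumes the value serializes cleanly
}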
Re: Text search within facets?
> For example, if I have the following field:
> <field name="dog" type="string" indexed="true" stored="true"/>
> and it has docs that contain something like
> english bulldog
> french bulldog
> bichon frise
> If I search for "english bulldog" and facet on "dog", I will get the following:
> english bulldog (135)
> french bulldog (23)
> bichon frise (12)

That's strange. The query "english bulldog" should return only english bulldog, since the type of dog is string, which is not tokenized. What is your default search field defined in schema.xml? Can you try

&q=dog:"english bulldog"&facet=true&facet.field=dog&facet.mincount=1
expire/delete documents
Hi, is there a way for Solr or Lucene to expire documents based on a field in a document? Let's say that I have a createTime field whose type is date; can I set a policy in schema.xml for Solr to delete documents older than X days? Thank you
Re: Local Solr Inconsistent results for radius
Hello Mauricio, do you know why such a problem occurs? Does it have to do with certain latitudes/longitudes? If so, why is it happening? Is it a bug in Local Solr?

On Fri, Feb 12, 2010 at 5:50 PM, Mauricio Scheffer wrote:
> Hi Emad,
> I had the same issue ( http://old.nabble.com/Spatial---Local-Solr-radius-td26943608.html ); it seems that this happens only in eastern areas of the world. Try inverting the sign of all your longitudes, or translating all your longitudes to the west.
> Cheers, Mauricio
> [...]

-- Muhammad Emad Mushtaq http://www.emadmushtaq.com/
Re: expire/delete documents
You could easily have a scheduled job that ran a delete-by-query to remove posts older than a certain date...

On Fri, Feb 12, 2010 at 13:00, Matthieu Labour wrote:
> Hi, Is there a way for Solr or Lucene to expire documents based on a field in a
> document? Let's say that I have a createTime field whose type is date; can I
> set a policy in schema.xml for Solr to delete the documents older than X
> days? Thank you
Re: Deleting spelll checker index
Hi guys,
Opening this thread again. I need to get around this issue.
I have a spellcheck field defined, and I am copying two fields, make and model, to this field.
I have buildOnCommit and buildOnOptimize set to true, hence when I index data and search for the word "accod" I get back the suggestion "accord", since model is also being copied.
I stopped the Solr server and removed the copyField for model, so now I only copy make to the spellText field, and started the Solr server again.
I refreshed the dictionary by issuing the following command:
spellcheck.build=true&spellcheck.dictionary=default
So I hoped it would rebuild my dictionary, but the strange thing is that it still gives a suggestion for accrd.
I have to reindex the data again, and then it won't offer the suggestion, which is the correct behaviour.
How can I recreate the dictionary by changing my schema and issuing the command
spellcheck.build=true&spellcheck.dictionary=default
I can't afford to reindex the data every time.
Any answer ASAP will be appreciated.

Thanks
darniz

darniz wrote:
>
> Then I assume the easiest way is to delete the directory itself.
>
> darniz
>
> hossman wrote:
>>
>> : We are using Index based spell checker.
>> : i was wondering with the help of any url parameters can we delete the spell
>> : check index directory.
>>
>> I don't think so.
>>
>> You might be able to configure two different spell check components that
>> point at the same directory -- one that builds off of a real field, and one
>> that builds off of an (empty) text field (using FileBasedSpellChecker) ..
>> then you could trigger a rebuild of an empty spell checking index using
>> the second component.
>>
>> But I've never tried it so I have no idea if it would work.
>>
>> -Hoss

--
View this message in context: http://old.nabble.com/Deleting-spelll-checker-index-tp27376823p27567465.html
Sent from the Solr - User mailing list archive at Nabble.com.
Re: Local Solr Inconsistent results for radius
Yes, it seems to be a bug, at least with the code you and I are using. If you don't need to search across the whole globe, try translating your longitudes as I suggested.

On Fri, Feb 12, 2010 at 3:04 PM, Emad Mushtaq wrote:
> Hello Mauricio,
>
> Do you know why such a problem occurs? Does it have to do with certain
> latitudes/longitudes? If so, why is it happening? Is it a bug in Local Solr?
>
> On Fri, Feb 12, 2010 at 5:50 PM, Mauricio Scheffer <
> mauricioschef...@gmail.com> wrote:
>
> > Hi Emad,
> >
> > I had the same issue (
> > http://old.nabble.com/Spatial---Local-Solr-radius-td26943608.html ), it
> > seems that this happens only on eastern areas of the world. Try inverting
> > the sign of all your longitudes, or translate all your longitudes to the
> > west.
> >
> > Cheers,
> > Mauricio
> >
> > On Fri, Feb 12, 2010 at 7:22 AM, Emad Mushtaq
> > wrote:
> >
> > > Hello,
> > >
> > > I have a question related to Local Solr. For certain locations (latitude,
> > > longitude), the spatial search does not work. Here is the query I try to
> > > make, which gives me no results:
> > >
> > > q=*&qt=geo&sort=geo_distance asc&lat=33.718151&long=73.060547&radius=450
> > >
> > > However, if I make the same query with radius=449, it gives me results.
> > >
> > > Here is part of my solrconfig.xml containing startTier and endTier:
> > >
> > > <updateRequestProcessorChain>
> > >   <processor class="com.pjaol.search.solr.update.LocalUpdateProcessorFactory">
> > >     <str name="latField">latitude</str>
> > >     <str name="lngField">longitude</str>
> > >     <int name="startTier">9</int>
> > >     <int name="endTier">17</int>
> > >   </processor>
> > > </updateRequestProcessorChain>
> > >
> > > What do I need to do to fix this problem?
> > >
> > > --
> > > Muhammad Emad Mushtaq
> > > http://www.emadmushtaq.com/

--
Muhammad Emad Mushtaq
http://www.emadmushtaq.com/
Re: persistent cache
Hi Tim, We generally run about 1600 cache-warming queries to warm up the OS disk cache and the Solr caches when we mount a new index. Do you have/expect phrase queries? If you don't, then you don't need to get any position information into your OS disk cache. Our position information takes about 85% of the total index size (*prx files). So with a 100GB index, your *frq files might only be 15-20GB and you could probably get more than half of that in 16GB of memory. If you have limited memory and a large index, then you need to choose cache warming queries carefully as once the cache is full, further queries will start evicting older data from the cache. The tradeoff is to populate the cache with data that would require the most disk access if the data was not in the cache versus populating the cache based on your best guess of what queries your users will execute. A good overview of the issues is the paper by Baeza-Yates ( http://doi.acm.org/10.1145/1277741.125 The Impact of Caching on Search Engines ) Tom Burton-West Digital Library Production Service University of Michigan Library -- View this message in context: http://old.nabble.com/persistent-cache-tp27562126p27567840.html Sent from the Solr - User mailing list archive at Nabble.com.
Has anyone prepared a general purpose synonyms.txt for search engines
Hi,

I was wondering if anyone has prepared a synonyms.txt for general-purpose search engines that can be shared. If not, could you refer me to places where such a synonym list or thesaurus can be found? Synonyms for search engines are different from a regular thesaurus. Any help would be highly appreciated. Thanks.

--
Muhammad Emad Mushtaq
http://www.emadmushtaq.com/
Re: Has anyone prepared a general purpose synonyms.txt for search engines
Hi,

at openthesaurus.org or .com you can find a MySQL version of synonyms; you just have to join it to fit the synonym format of Solr yourself.

On 12.02.2010 at 20:03, Emad Mushtaq wrote:

> Hi,
>
> I was wondering if anyone has prepared a synonyms.txt for general-purpose
> search engines that can be shared. If not, could you refer me to places
> where such a synonym list or thesaurus can be found? Synonyms for search
> engines are different from a regular thesaurus. Any help would be highly
> appreciated. Thanks.
>
> --
> Muhammad Emad Mushtaq
> http://www.emadmushtaq.com/

Best regards,
Julian Hille
Re: Encountering a roadblock with my Solr schema design...use dedupe?
Hi all, I am the author of the article referenced in this thread, and after reading it again I can understand where there might have been confusion; my apologies for that. I have edited the article to indicate that a deduplication component is in the works and referenced SOLR-236. The article can still be found at
http://www.lucidimagination.com/Community/Hear-from-the-Experts/Articles/Solr-and-RDBMS-design-basics

My only question after reading this thread is: what does a user purchase? A product identified by a SKU? If that's the case, then certainly indexing by SKU is the way to go, and then using field collapse (the query-time deduplication) should work. Also keep in mind that in my example I was talking about the *exact* same product located in different locations, which could yield a bad user experience if they were all shown on the same search result page. In your case, each SKU is a unique (purchasable) product, so collapsing by product id is nice, but would not doing so degrade the user experience? If I searched for a green shirt and got S, M, L (all product ID 3), is that bad?

Hope that helps some,
Amit

On Sat, Jan 16, 2010 at 3:43 PM, David MARTIN wrote:
> I'm really interested in reading the answer to this thread as my problem is
> rather the same. Maybe my main difference is the huge number of SKUs per
> product I may have.
>
> David
>
> On Thu, Jan 14, 2010 at 2:35 AM, Kelly Taylor
> wrote:
>
> >
> > Hoss,
> >
> > Would you suggest using dedup for my use case; and if so, do you know of a
> > working example I can reference?
> >
> > I don't have an issue using the patched version of Solr, but I'd much
> > rather use the GA version.
> >
> > -Kelly
> >
> >
> > hossman wrote:
> > >
> > > : Dedupe is completely the wrong word. Deduping is something else
> > > : entirely - it is about trying not to index the same document twice.
> > >
> > > Dedup can also certainly be used with field collapsing -- that was one of
> > > the initial use cases identified for the SignatureUpdateProcessorFactory
> > > ... you can compute an 'expensive' signature when adding a document,
> > > index it, and then FieldCollapse on that signature field.
> > >
> > > This gives you "query time deduplication" based on a value computed when
> > > indexing (the canonical example is multiple urls referencing the "same"
> > > content but with slightly different boilerplate markup). You can use a
> > > Signature class that recognizes the boilerplate and computes an identical
> > > signature value for each URL whose content is "the same" but still index
> > > all of the URLs and their content as distinct documents ... so use cases
> > > where people only want "distinct" URLs work using field collapse, but by
> > > default all matching documents can still be returned, and searches on
> > > text in the boilerplate markup also still work.
> > >
> > >
> > > -Hoss
> > >
> > >
> >
> > --
> > View this message in context:
> > http://old.nabble.com/Encountering-a-roadblock-with-my-Solr-schema-design...use-dedupe--tp27118977p27155115.html
> > Sent from the Solr - User mailing list archive at Nabble.com.
> >
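For reference, a sketch of the index-time signature configuration Hoss described (the chain name and field list are illustrative, not from the thread); the signature field can then be used for query-time collapsing:

<!-- solrconfig.xml -->
<updateRequestProcessorChain name="dedupe">
  <processor class="org.apache.solr.update.processor.SignatureUpdateProcessorFactory">
    <bool name="enabled">true</bool>
    <str name="signatureField">signature</str>
    <bool name="overwriteDupes">false</bool> <!-- keep all docs; collapse at query time -->
    <str name="fields">name,sku,features</str>
    <str name="signatureClass">org.apache.solr.update.processor.Lookup3Signature</str>
  </processor>
  <processor class="solr.LogUpdateProcessorFactory"/>
  <processor class="solr.RunUpdateProcessorFactory"/>
</updateRequestProcessorChain>

<!-- schema.xml -->
<field name="signature" type="string" indexed="true" stored="true" multiValued="false"/>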
Re: Has anyone prepared a general purpose synonyms.txt for search engines
Wow, thanks!! You all are awesome! :D :D

On Sat, Feb 13, 2010 at 12:32 AM, Julian Hille wrote:

> Hi,
>
> at openthesaurus.org or .com you can find a MySQL version of synonyms; you
> just have to join it to fit the synonym format of Solr yourself.
>
> On 12.02.2010 at 20:03, Emad Mushtaq wrote:
>
> > Hi,
> >
> > I was wondering if anyone has prepared a synonyms.txt for general-purpose
> > search engines that can be shared. If not, could you refer me to places
> > where such a synonym list or thesaurus can be found? Synonyms for search
> > engines are different from a regular thesaurus. Any help would be highly
> > appreciated. Thanks.
> >
> > --
> > Muhammad Emad Mushtaq
> > http://www.emadmushtaq.com/
>
> Best regards,
> Julian Hille

--
Muhammad Emad Mushtaq
http://www.emadmushtaq.com/
Re: Has anyone prepared a general purpose synonyms.txt for search engines
Hi,

You're welcome. That's something Google came up with some weeks ago :)

On 12.02.2010 at 20:42, Emad Mushtaq wrote:

> Wow, thanks!! You all are awesome! :D :D
>
> On Sat, Feb 13, 2010 at 12:32 AM, Julian Hille wrote:
>
>> Hi,
>>
>> at openthesaurus.org or .com you can find a MySQL version of synonyms;
>> you just have to join it to fit the synonym format of Solr yourself.
>>
>> On 12.02.2010 at 20:03, Emad Mushtaq wrote:
>>
>>> Hi,
>>>
>>> I was wondering if anyone has prepared a synonyms.txt for general-purpose
>>> search engines that can be shared. If not, could you refer me to places
>>> where such a synonym list or thesaurus can be found? Synonyms for search
>>> engines are different from a regular thesaurus. Any help would be highly
>>> appreciated. Thanks.
>>>
>>> --
>>> Muhammad Emad Mushtaq
>>> http://www.emadmushtaq.com/
>>
>> Best regards,
>> Julian Hille
>
> --
> Muhammad Emad Mushtaq
> http://www.emadmushtaq.com/

Best regards,
Julian Hille

---
NetImpact KG
Altonaer Straße 8
20357 Hamburg

Tel: 040 / 6738363 2
Mail: jul...@netimpact.de
Managing Director: Tarek Müller
Re: implementing profanity detector
On Thu, Feb 11, 2010 at 10:49 AM, Grant Ingersoll wrote:
>
> Otherwise, I'd do it via copy fields. Your first field is your main field
> and is analyzed as before. Your second field does the profanity detection
> and simply outputs a single token at the end, safe/unsafe.
>
> How long are your documents? The extra copy field is extra work, but in this
> case it should be fast as you should be able to create a pretty streamlined
> analyzer chain for the second task.
>

The documents are web page text, so they shouldn't be more than 10-20k generally. Would something like this do the trick?

// emit exactly one summary token per document:
// "y" if any profane term was seen, otherwise "n"
private boolean done = false;

@Override
public boolean incrementToken() throws IOException {
  if (done) {
    return false; // the single summary token has already been emitted
  }
  boolean found = false;
  while (input.incrementToken()) {
    if (profanities.contains(termAtt.termBuffer(), 0, termAtt.termLength())) {
      found = true; // one profane term is enough; stop consuming input
      break;
    }
  }
  clearAttributes();
  termAtt.setTermBuffer(found ? "y" : "n", 0, 1);
  done = true;
  return true; // returning true is what actually emits the token
}

mike
For caches, any reason to not set initialSize and size to the same value?
If I've done a lot of research and have a very good idea of where my cache sizes end up, having monitored the stats right before commits, is there any reason why I wouldn't just set the initialSize and size counts to the same values? Is there any reason to set a smaller initialSize if I know reliably where my limit will almost always be?

-Jay
Re: For caches, any reason to not set initialSize and size to the same value?
On Fri, Feb 12, 2010 at 5:23 PM, Jay Hill wrote:
> If I've done a lot of research and have a very good idea of where my cache
> sizes end up, having monitored the stats right before commits, is there any
> reason why I wouldn't just set the initialSize and size counts to the same
> values? Is there any reason to set a smaller initialSize if I know reliably
> where my limit will almost always be?

Probably not much...
The only savings will be the 8 bytes (on a 64-bit proc) per unused array slot (in the HashMap).
Maybe we should consider removing the initialSize param from the example config to reduce the amount of stuff a user needs to think about.

-Yonik
http://www.lucidimagination.com
reloading sharedlib folder
When using solr.xml, you can specify a sharedLib directory to share among cores. Is it possible to reload the classes in this dir without having to restart the servlet container? It would be useful to be able to make changes to those classes on the fly, or to be able to drop in new plugins.
RE: For caches, any reason to not set initialSize and size to the same value?
I always use initial size = max size, just to avoid Arrays.copyOf()...

The initial (default) capacity for HashMap is 16; when that is not enough, the backing array is copied to a new 32-element array, then to 64, ... -- too much wasted space! (same for ConcurrentHashMap)

Excuse me if I didn't understand the question...

-Fuad
http://www.tokenizer.ca

> -----Original Message-----
> From: ysee...@gmail.com [mailto:ysee...@gmail.com] On Behalf Of Yonik Seeley
> Sent: February-12-10 6:30 PM
> To: solr-user@lucene.apache.org
> Subject: Re: For caches, any reason to not set initialSize and size to the same value?
>
> On Fri, Feb 12, 2010 at 5:23 PM, Jay Hill wrote:
> > If I've done a lot of research and have a very good idea of where my cache
> > sizes end up, having monitored the stats right before commits, is there any
> > reason why I wouldn't just set the initialSize and size counts to the same
> > values? Is there any reason to set a smaller initialSize if I know reliably
> > where my limit will almost always be?
>
> Probably not much...
> The only savings will be the 8 bytes (on a 64-bit proc) per unused
> array slot (in the HashMap).
> Maybe we should consider removing the initialSize param from the
> example config to reduce the amount of stuff a user needs to think
> about.
>
> -Yonik
> http://www.lucidimagination.com
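In solrconfig.xml terms, that just means keeping the two attributes equal, e.g. (sizes illustrative):

<filterCache class="solr.FastLRUCache" size="16384" initialSize="16384" autowarmCount="4096"/>
<queryResultCache class="solr.LRUCache" size="16384" initialSize="16384" autowarmCount="1024"/>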
RE: For caches, any reason to not set initialSize and size to the same value?
Funny, Arrays.copy() for HashMap... but something similar... Anyway, I use the same values for initial size and max size, to be safe... and to get the OOM at startup :)

> -----Original Message-----
> From: Fuad Efendi [mailto:f...@efendi.ca]
> Sent: February-12-10 6:55 PM
> To: solr-user@lucene.apache.org; yo...@lucidimagination.com
> Subject: RE: For caches, any reason to not set initialSize and size to the same value?
>
> I always use initial size = max size,
> just to avoid Arrays.copyOf()...
>
> The initial (default) capacity for HashMap is 16; when that is not enough,
> the backing array is copied to a new 32-element array, then to 64, ...
> -- too much wasted space! (same for ConcurrentHashMap)
>
> Excuse me if I didn't understand the question...
>
> -Fuad
> http://www.tokenizer.ca
>
>
> > -----Original Message-----
> > From: ysee...@gmail.com [mailto:ysee...@gmail.com] On Behalf Of Yonik
> > Seeley
> > Sent: February-12-10 6:30 PM
> > To: solr-user@lucene.apache.org
> > Subject: Re: For caches, any reason to not set initialSize and size to
> > the same value?
> >
> > On Fri, Feb 12, 2010 at 5:23 PM, Jay Hill
> > wrote:
> > > If I've done a lot of research and have a very good idea of where my
> > > cache sizes end up, having monitored the stats right before commits,
> > > is there any reason why I wouldn't just set the initialSize and size
> > > counts to the same values? Is there any reason to set a smaller
> > > initialSize if I know reliably where my limit will almost always be?
> >
> > Probably not much...
> > The only savings will be the 8 bytes (on a 64-bit proc) per unused
> > array slot (in the HashMap).
> > Maybe we should consider removing the initialSize param from the
> > example config to reduce the amount of stuff a user needs to think
> > about.
> >
> > -Yonik
> > http://www.lucidimagination.com
Re: Deleting spelll checker index
Any update on this? Do you guys want me to rephrase my question, if it's not clear?

Thanks
darniz

darniz wrote:
>
> Hi guys,
> Opening this thread again. I need to get around this issue.
> I have a spellcheck field defined, and I am copying two fields, make and
> model, to this field.
>
> I have buildOnCommit and buildOnOptimize set to true, hence when I index
> data and search for the word "accod" I get back the suggestion "accord",
> since model is also being copied.
> I stopped the Solr server and removed the copyField for model, so now I
> only copy make to the spellText field, and started the Solr server again.
> I refreshed the dictionary by issuing the following command:
> spellcheck.build=true&spellcheck.dictionary=default
> So I hoped it would rebuild my dictionary, but the strange thing is that it
> still gives a suggestion for accrd.
> I have to reindex the data again, and then it won't offer the suggestion,
> which is the correct behaviour.
>
> How can I recreate the dictionary by changing my schema and issuing the
> command
> spellcheck.build=true&spellcheck.dictionary=default
>
> I can't afford to reindex the data every time.
>
> Any answer ASAP will be appreciated.
>
> Thanks
> darniz
>
> darniz wrote:
>>
>> Then I assume the easiest way is to delete the directory itself.
>>
>> darniz
>>
>> hossman wrote:
>>>
>>> : We are using Index based spell checker.
>>> : i was wondering with the help of any url parameters can we delete the
>>> spell
>>> : check index directory.
>>>
>>> I don't think so.
>>>
>>> You might be able to configure two different spell check components that
>>> point at the same directory -- one that builds off of a real field, and
>>> one that builds off of an (empty) text field (using FileBasedSpellChecker) ..
>>> then you could trigger a rebuild of an empty spell checking index using
>>> the second component.
>>>
>>> But I've never tried it so I have no idea if it would work.
>>>
>>> -Hoss

--
View this message in context: http://old.nabble.com/Deleting-spelll-checker-index-tp27376823p27570613.html
Sent from the Solr - User mailing list archive at Nabble.com.
Re: implementing profanity detector
: Otherwise, I'd do it via copy fields. Your first field is your main
: field and is analyzed as before. Your second field does the profanity
: detection and simply outputs a single token at the end, safe/unsafe.

you don't even need custom code for this ... copyField all your text into a 'has_profanity' field where you use a suitable Tokenizer followed by a KeepWordFilter that only keeps profane words, and then a PatternReplaceFilter that matches .* and replaces it with "HELL_YEA" ... now a search for "has_profanity:HELL_YEA" finds all profane docs, with the added bonus that the scores are based on how many profane words occur in the doc. it could be used as a filter query (probably negated) as needed.

-Hoss
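A schema.xml sketch of that chain ("profanities.txt" and the field names are illustrative, not from the thread):

<fieldType name="profanity_flag" class="solr.TextField">
  <analyzer>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <!-- drop everything except profane terms -->
    <filter class="solr.KeepWordFilterFactory" words="profanities.txt" ignoreCase="true"/>
    <!-- collapse each surviving term to a single marker token -->
    <filter class="solr.PatternReplaceFilterFactory" pattern=".*" replacement="HELL_YEA" replace="all"/>
  </analyzer>
</fieldType>

<field name="has_profanity" type="profanity_flag" indexed="true" stored="false"/>
<copyField source="text" dest="has_profanity"/>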
Re: expire/delete documents
: You could easily have a scheduled job that ran delete by query to
: remove posts older than a certain date...

or since you specifically asked about deleting anything older than X days (in this example I'm assuming x=7)...

createTime:[NOW-7DAYS TO *]

-Hoss
migrating from solr 1.3 to 1.4
Hi there, I'm trying to migrate from Solr 1.3 to Solr 1.4 and I have a few issues. Initially my localsolr was throwing a NullPointerException, and I fixed it by changing the type of lat and lng to 'tdouble'. But now I'm not able to update the index. When I try to update the index, it throws an error saying:

Feb 12, 2010 2:14:11 PM org.apache.solr.update.processor.LogUpdateProcessor finish
INFO: {} 0 0
Feb 12, 2010 2:14:11 PM org.apache.solr.common.SolrException log
SEVERE: java.lang.NoSuchFieldError: log
at com.pjaol.search.solr.update.LocalUpdaterProcessor.processAdd(LocalUpdateProcessorFactory.java:138)

I tried searching on the net, but none of the posts regarding this issue are answered. Has anyone come across this issue?

Thanks, Sachin.
cannot match on phrase queries
I am seeing this in several of my fields. I have something like "Samsung X150" or "Nokia BH-212". And my query will not match on X150 or BH-212.

So, my query is something like +model:(Samsung X150). Through debugQuery, I see that this gets converted to +(model:samsung model:"x 150"). It matches on Samsung, but not X150. A simple query like model:BH-212 simply fails. model:BH212 also fails. The only query that seems to work is model:(BH 212).

Here is the schema for that field:

Any ideas? According to the analyzer, I would expect the phrase "BH-212" to match on "bh" and "212". Or am I missing something?

Also, is there any way to tell the parser not to convert "X150" into a phrase query? I have some cases where it would be more useful to turn it into +(X 150).
Re: Solr 1.4: Full import FileNotFoundException
: I have noticed that when I run concurrent full-imports using DIH in Solr
: 1.4, the index ends up getting corrupted. I see the following in the log

I'm fairly confident that concurrent imports won't work -- but it shouldn't corrupt your index -- even if the DIH didn't actively check for this type of situation, the underlying Lucene LockFactory should ensure that one of the imports "wins" ... you'll need to tell us what kind of filesystem you are using, and show us the relevant settings from your solrconfig (lock type, merge policy, indexDefaults, mainIndex, DIH, etc...)

At worst you should get a lock time out exception.

: But I looked at:
: http://old.nabble.com/dataimporthandler-and-multiple-delta-import-td19160129.html
:
: and was under the impression that this issue was fixed in Solr 1.4.

...right, attempting to run two concurrent imports with DIH should cause the second one to abort immediately.

-Hoss
Re: Cannot get like exact searching to work
: > Can your query consist of more than one words?
:
: Yes, and I expect it almost always will (the query string is coming
: from a search box on a website).
	...
: Actually it won't. The data I am indexing has extra spaces in front
: and is capitalized. I really need to be able to filter it through the
: lowercase and trim filter without tokenizing it.
	...
: >> The idea is that a "phrase" match would be boosted over the
: >> normal
: >> token matches and would show up first in the listing. Let

This is starting to smell like an XY Problem...
http://people.apache.org/~hossman/#xyproblem

...you mentioned wanting prefix type queries to work, but that seems to be based on your initial approach of using an "exact" (ie: untokenized) field for your matches -- all of your examples seem to want matching at a "word" level, not partial words.

If your ultimate goal is just that "exact" matches score higher than documents containing all of the same words in a different order (which should score higher than docs containing only a few of the words), then I think you are just making things harder for yourself than you really need to ... "defType=dismax" should be able to solve all of your problems -- just specify the field(s) you want to search in the qf and pf params, and documents with all the "words" in a phrase will appear first.

-Hoss
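As a sketch, such a dismax request could look like this (field names and terms are illustrative):

http://localhost:8983/solr/select?defType=dismax&q=foo+bar&qf=name+description&pf=name^10&debugQuery=on

Documents matching the whole phrase in the pf field get the extra phrase boost, so exact matches float to the top without needing a separate untokenized field.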
Interesting stuff; Solr as a syslog store.
Hey everyone, I don't actually have a question, but I just thought I'd share something really cool that I did with Solr for our company.

We run a good number of servers, well into the several hundreds, and naturally we need a way to centralize all of the system logs. For a while we used a commercial solution to centralize and search our logs, but they wanted to charge us tens of thousands of dollars for just one gigabyte/day more of indexed data. So I said forget it, I'll write my own solution!

We already use Solr for some of our other backend search systems, so I came up with an idea to index all of our logs to Solr. I wrote a daemon in Perl that listens on the syslog port, and pointed every single system's syslog to forward to this single server. From there, the daemon writes to a Solr indexing server after parsing the messages into fields, such as date/time, host, program, pid, text, etc. I then wrote a cool javascript/ajax web front end for Solr searching, and bam: real-time searching of all of our syslogs from a web interface, for no cost!

Just thought this would be a neat story to share with you all. I've really grown to love Solr, it's something else!

Thanks,
-Antonio
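One plausible schema for a syslog index along these lines (entirely illustrative -- the thread does not include the actual field definitions):

<!-- schema.xml sketch: one document per syslog message -->
<field name="id"        type="string" indexed="true" stored="true" required="true"/>
<field name="timestamp" type="tdate"  indexed="true" stored="true"/>
<field name="host"      type="string" indexed="true" stored="true"/>
<field name="program"   type="string" indexed="true" stored="true"/>
<field name="pid"       type="tint"   indexed="true" stored="true"/>
<field name="text"      type="text"   indexed="true" stored="true"/>

Faceting on host or program then gives per-server and per-daemon counts for any text query.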
Re: sorting
:title^1.2 contentEN^0.8 contentIT^0.8 contentDE^0.8
:title^1.2 contentEN^0.8 contentIT^0.8 contentDE^0.8

FWIW: I don't think you understand what the "bf" param is for ... it's not analogous to qf and pf, it's for expressing a list of boost functions -- a function can be a simple field name, but that typically only makes sense if it's numeric.

that *may* be causing your problem, if the function parser is attempting to generate the FieldCache for your content fields.

: now, solr is complaining about some sorting issues on content* as they

"solr is complaining" is really vague ... please explain *exactly* what the error message is, where you see it, what the full stack trace looks like if there is one, and what you did to trigger the error (ie: did it happen on startup? did it happen when you executed a query? what was the full URL of the query?)

-Hoss
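For contrast, a sketch of the intended usage: qf and pf take field lists, while bf takes functions over (typically numeric or date) fields, e.g. boosting recent documents (the function and field name are illustrative):

qf=title^1.2 contentEN^0.8 contentIT^0.8 contentDE^0.8
pf=title^1.2 contentEN^0.8 contentIT^0.8 contentDE^0.8
bf=recip(rord(creationDate),1,1000,1000)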
Re: sorting
: that *may* be causing your problem, if the function parser is attempting
: to generate the FieldCache for your content fields.

Yep ... that's it ... if you use a bare field name as a function, and that field name is not numeric, the result is an OrdFieldSource, which uses the FieldCache. I opened a bug to improve the error message...

https://issues.apache.org/jira/browse/SOLR-1771

-Hoss
RE: expire/delete documents
> or since you specifically asked about deleting anything older
> than X days (in this example I'm assuming x=7)...
>
> createTime:[NOW-7DAYS TO *]

createTime:[* TO NOW-7DAYS]
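The scheduled job could then post something like the following to the update handler, a minimal sketch:

<delete><query>createTime:[* TO NOW-7DAYS]</query></delete>
<commit/>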
Re: How to reindex data without restarting server
: if you use the core model via solr.xml you can reload a core without having
: to restart the servlet container,
: http://wiki.apache.org/solr/CoreAdmin

For making a schema change, the steps would be:
- create a "new_core" with the new schema
- reindex all the docs into "new_core"
- "SWAP" "old_core" and "new_core" so all the old URLs now point at the new core with the new schema.

-Hoss
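With the CoreAdmin handler from the wiki page above, those steps map onto requests along these lines (host, paths, and core names are illustrative):

http://localhost:8983/solr/admin/cores?action=CREATE&name=new_core&instanceDir=new_core
(reindex all docs into new_core)
http://localhost:8983/solr/admin/cores?action=SWAP&core=new_core&other=old_core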
Re: Deleting spelll checker index
: Any update on this

Patience my friend ... 5 hours after you send an email isn't long enough to wait before asking for "any update on this" -- it's just increasing the volume of mail everyone gets and distracting people from actual bugs/issues.

FWIW: this doesn't really seem directly related to the thread you initially started about deleting the spell checker index -- what you're asking about now is rebuilding the spellchecker index...

: > I stopped the Solr server and removed the copyField for model, so now I
: > only copy make to the spellText field, and started the Solr server again.
: > I refreshed the dictionary by issuing the following command:
: > spellcheck.build=true&spellcheck.dictionary=default
: > So I hoped it would rebuild my dictionary, but the strange thing is that it
: > still gives a suggestion for accrd.

that's because removing the copyField declaration doesn't change anything about the values that have already been copied to the "spellText" field -- rebuilding your spellchecker index is just re-reading the same indexed values from that field.

: > How can I recreate the dictionary by changing my schema and issuing the
: > command
: > spellcheck.build=true&spellcheck.dictionary=default

it's just not possible. a schema change like that doesn't magically undo all of the values that were already copied.

-Hoss
Re: cannot match on phrase queries
It appears that omitTermFreqAndPositions is indeed the culprit. I assume it has to do with the fact that the index-time parsing of BH-212 puts multiple terms in the same position.

From: Kevin Osborn
To: Solr
Sent: Fri, February 12, 2010 5:28:08 PM
Subject: cannot match on phrase queries

I am seeing this in several of my fields. I have something like "Samsung X150" or "Nokia BH-212". And my query will not match on X150 or BH-212. So, my query is something like +model:(Samsung X150). Through debugQuery, I see that this gets converted to +(model:samsung model:"x 150"). It matches on Samsung, but not X150. A simple query like model:BH-212 simply fails. model:BH212 also fails. The only query that seems to work is model:(BH 212).

Here is the schema for that field:

Any ideas? According to the analyzer, I would expect the phrase "BH-212" to match on "bh" and "212". Or am I missing something?

Also, is there any way to tell the parser not to convert "X150" into a phrase query? I have some cases where it would be more useful to turn it into +(X 150).
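If that is the cause, the fix would presumably be re-enabling positions on the field and reindexing; a sketch (the other attributes are illustrative):

<field name="model" type="text" indexed="true" stored="true" omitTermFreqAndPositions="false"/>

Phrase queries such as model:"x 150" need position data, which omitTermFreqAndPositions="true" strips from the index.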
Re: Solr 1.4: Full import FileNotFoundException
concurrent imports are not allowed in DIH, unless you set up multiple DIH instances

On Sat, Feb 13, 2010 at 7:05 AM, Chris Hostetter wrote:
>
> : I have noticed that when I run concurrent full-imports using DIH in Solr
> : 1.4, the index ends up getting corrupted. I see the following in the log
>
> I'm fairly confident that concurrent imports won't work -- but it
> shouldn't corrupt your index -- even if the DIH didn't actively check for
> this type of situation, the underlying Lucene LockFactory should ensure
> that one of the imports "wins" ... you'll need to tell us what kind of
> filesystem you are using, and show us the relevant settings from your
> solrconfig (lock type, merge policy, indexDefaults, mainIndex, DIH,
> etc...)
>
> At worst you should get a lock time out exception.
>
> : But I looked at:
> : http://old.nabble.com/dataimporthandler-and-multiple-delta-import-td19160129.html
> :
> : and was under the impression that this issue was fixed in Solr 1.4.
>
> ...right, attempting to run two concurrent imports with DIH should cause
> the second one to abort immediately.
>
> -Hoss

--
-
Noble Paul | Systems Architect| AOL | http://aol.com
Re: Solr 1.4: Full import FileNotFoundException
: concurrent imports are not allowed in DIH, unless you set up multiple DIH instances

Right, but that's not the issue -- the question is whether attempting to do so might be causing index corruption (either because of a bug, or because of some possibly really odd config we currently know nothing about)

: > : I have noticed that when I run concurrent full-imports using DIH in Solr
: > : 1.4, the index ends up getting corrupted. I see the following in the log
: >
: > I'm fairly confident that concurrent imports won't work -- but it
: > shouldn't corrupt your index -- even if the DIH didn't actively check for
: > this type of situation, the underlying Lucene LockFactory should ensure
: > that one of the imports "wins" ... you'll need to tell us what kind of
: > filesystem you are using, and show us the relevant settings from your
: > solrconfig (lock type, merge policy, indexDefaults, mainIndex, DIH,
: > etc...)
: >
: > At worst you should get a lock time out exception.
: >
: > : But I looked at:
: > : http://old.nabble.com/dataimporthandler-and-multiple-delta-import-td19160129.html
: > :
: > : and was under the impression that this issue was fixed in Solr 1.4.
: >
: > ...right, attempting to run two concurrent imports with DIH should cause
: > the second one to abort immediately.
: >
: > -Hoss

-Hoss
parsing strings into phrase queries
Right now if I have the query model:(Nokia BH-212V), the parser turns this into +(model:nokia model:"bh 212 v"). The problem is that I might have a model called Nokia BH-212, so this is completely missed. In my case, I would like my query to be +(model:nokia model:bh model:212 model:v). This is my schema for the field:
Re: Interesting stuff; Solr as a syslog store.
On 13.02.2010 at 03:02, Antonio Lobato wrote:

> Just thought this would be a neat story to share with you all. I've really
> grown to love Solr, it's something else!

Hi Antonio,

Great. Would you also share the source code somewhere?

May the Source be with you. Thanks.

Olivier