Re: Standard request with functional query
: Thanks for the response, but how would I make recency a factor in
: scoring documents with the standard request handler. The query
: (title:iphone OR bodytext:iphone OR title:firmware OR
: bodytext:firmware) AND _val_:"ord(dateCreated)"^0.1
: seems to do something very similar to just sorting by dateCreated
: rather than having dateCreated be a part of the score.

you have to look at the score explanations (debugQuery=true) and decide
what boost is appropriate. there are no magic numbers that work for
everyone.

: Thanks,
: Sammy
:
: On Thu, Dec 4, 2008 at 1:35 PM, Sammy Yu wrote:
: > Hi guys,
: > I have a standard query that searches across multiple text fields such as
: > q=title:iphone OR bodytext:iphone OR title:firmware OR bodytext:firmware
: >
: > This comes back with documents that have iphone and firmware (I know I
: > can use the dismax handler but it seems to be really slow), which is
: > great. Now I want to give some more weight to more recent documents
: > (there is a dateCreated field in each document).
: >
: > So I've modified the query as such:
: > (title:iphone OR bodytext:iphone OR title:firmware OR
: > bodytext:firmware) AND _val_:"ord(dateCreated)"^0.1
: > URL-encoded to q=(title%3Aiphone+OR+bodytext%3Aiphone+OR+title%3Afirmware+OR+bodytext%3Afirmware)+AND+_val_%3A"ord(dateCreated)"^0.1
: >
: > However, the results are not as one would expect. The first few
: > documents only come back with the word iphone and appear to be sorted
: > by dateCreated. It seems to completely ignore the score and use the
: > dateCreated field for the score.
: >
: > On a not directly related issue, it seems like if you put the weight
: > within the double quotes:
: > (title:iphone OR bodytext:iphone OR title:firmware OR
: > bodytext:firmware) AND _val_:"ord(dateCreated)^0.1"
: >
: > the parser complains:
: > org.apache.lucene.queryParser.ParseException: Cannot parse
: > '(title:iphone OR bodytext:iphone OR title:firmware OR
: > bodytext:firmware) AND _val_:"ord(dateCreated)^0.1"': Expected ',' at
: > position 16 in 'ord(dateCreated)^0.1'
: >
: > Thanks,
: > Sammy

-Hoss
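As a worked illustration of Hoss's advice, the usual loop is to run the boosted query with explain output turned on and then adjust the boost factor by hand (the 0.1 value below is the one from the thread, not a recommendation; whether it is too large for a given corpus is exactly what debugQuery reveals):

```text
q=(title:iphone OR bodytext:iphone OR title:firmware OR bodytext:firmware)
  AND _val_:"ord(dateCreated)"^0.1
&fl=*,score
&debugQuery=true
```

If the per-document explanation shows the function-query term dominating the text-match terms, the boost is too high; if it barely registers, it is too low. Note also that the boost must stay outside the quoted function, as the parse exception in this thread shows.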
Re: [ANNOUNCE] Solr Logo Contest Results
Congratulations Michiel.

Lukas

On Thu, Dec 18, 2008 at 3:44 AM, Matt Mitchell wrote:
> Love it! Congratulations Michiel.
>
> Matt
>
> On Wed, Dec 17, 2008 at 9:15 PM, Chris Hostetter wrote:
> >
> > (replies to solr-user please)
> >
> > On behalf of the Solr Committers, I'm happy to announce that the Solr
> > Logo Contest is officially concluded. (Woot!)
> >
> > And the Winner Is...
> >
> > https://issues.apache.org/jira/secure/attachment/12394264/apache_solr_a_red.jpg
> > ...by Michiel
> >
> > We ran into a few hiccups during the contest making it take longer than
> > intended, but the result was a thorough process in which everyone went
> > above and beyond to ensure that the final choice best reflected the
> > wishes of the community.
> >
> > You can expect to see the new logo appear on the site (and in the Solr
> > app) in the next few weeks.
> >
> > Congrats Michiel!
> >
> > -Hoss

--
http://blog.lukas-vlcek.com/
Solrj - Exception in thread "main" java.lang.ClassCastException: java.lang.Long cannot be cast to org.apache.solr.common.util.NamedList
Hi all,

I used the sample code given below and tried to run it with all the
relevant jars. I receive the exception written below.

package test.general;

import org.apache.solr.client.solrj.SolrServer;
import org.apache.solr.client.solrj.SolrServerException;
import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.response.UpdateResponse;
import org.apache.solr.client.solrj.response.QueryResponse;
import org.apache.solr.client.solrj.response.FacetField;
import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;
import org.apache.solr.common.SolrInputDocument;
import org.apache.solr.common.params.SolrParams;

import java.io.IOException;
import java.util.Collection;
import java.util.HashSet;
import java.util.Random;
import java.util.List;

/**
 * Connect to Solr and issue a query
 */
public class SolrJExample {

  public static final String[] CATEGORIES = {"a", "b", "c", "d"};

  public static void main(String[] args) throws IOException, SolrServerException {
    SolrServer server = new CommonsHttpSolrServer("http://localhost:8080/solr/update");
    Random rand = new Random();

    // Index some documents
    Collection docs = new HashSet();
    for (int i = 0; i < 10; i++) {
      SolrInputDocument doc = new SolrInputDocument();
      doc.addField("link", "http://non-existent-url.foo/" + i + ".html");
      doc.addField("source", "Blog #" + i);
      doc.addField("source-link", "http://non-existent-url.foo/index.html");
      doc.addField("subject", "Subject: " + i);
      doc.addField("title", "Title: " + i);
      doc.addField("content", "This is the " + i + "(th|nd|rd) piece of content.");
      doc.addField("category", CATEGORIES[rand.nextInt(CATEGORIES.length)]);
      doc.addField("rating", i);
      System.out.println("Doc[" + i + "] is " + doc);
      docs.add(doc);
    }

    UpdateResponse response = server.add(docs);
    System.out.println("Response: " + response);
    // Make the documents available for search
    server.commit();
    // Create the query
    SolrQuery query = new SolrQuery("content:piece");
    // Indicate we want facets
    query.setFacet(true);
    // Indicate what field to facet on
    query.addFacetField("category");
    // We only want facets that have at least one entry
    query.setFacetMinCount(1);
    // Run the query
    QueryResponse results = server.query(query);
    System.out.println("Query Results: " + results);
    // Print out the facets
    List facets = results.getFacetFields();
    for (FacetField facet : facets) {
      System.out.println("Facet:" + facet);
    }
  }
}

The exception:

Exception in thread "main" java.lang.ClassCastException: java.lang.Long
cannot be cast to org.apache.solr.common.util.NamedList
  at org.apache.solr.common.util.NamedListCodec.unmarshal(NamedListCodec.java:89)
  at org.apache.solr.client.solrj.impl.BinaryResponseParser.processResponse(BinaryResponseParser.java:39)
  at org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java:385)
  at org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java:183)
  at org.apache.solr.client.solrj.request.UpdateRequest.process(UpdateRequest.java:217)
  at org.apache.solr.client.solrj.SolrServer.add(SolrServer.java:48)
  at test.general.SolrJExample.main(SolrJExample.java:48)

Can someone help me out.

Regards,
Sajith Vimukthi Weerakoon
Associate Software Engineer | ZONE24X7
| Tel: +94 11 2882390 ext 101 | Fax: +94 11 2878261 |
http://www.zone24x7.com
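Two likely culprits can be sketched against the SolrJ 1.3 API (both are assumptions about this setup, not confirmed in the thread): the server URL should point at the Solr root rather than the /update handler, and if the server itself is Solr 1.2 it cannot produce the binary response format that SolrJ 1.3 expects by default, so an XML response parser has to be set explicitly:

```java
// Sketch, assuming SolrJ 1.3 jars on the classpath; adjust host/port to
// your deployment. XMLResponseParser lives in
// org.apache.solr.client.solrj.impl.
CommonsHttpSolrServer server =
    new CommonsHttpSolrServer("http://localhost:8080/solr"); // not .../solr/update
server.setParser(new XMLResponseParser()); // needed when the server is older than 1.3
```

SolrJ appends the handler path (/update, /select) itself, which is why pointing the constructor at /update confuses request routing.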
Re: date facets doubt
has anyone experienced this problem? Can't find an explanation...
Thanks in advance

Marc Sturlese wrote:
>
> Hey there,
>
> 1.- I am trying to use date facets but I am facing a problem. I want to
> use the same field to do 2 facet classifications: I want to show the
> count of the docs of the last week and the count of the docs of the
> last month. What I am doing is:
>
> source_date
> NOW/DAY-1MONTH
> NOW/DAY
> +1MONTH
>
> source_date
> NOW/DAY-7DAY
> NOW/DAY
> +7DAY
>
> What I am getting as a result is 2 facet results that are exactly the
> same (the result is just the first facet shown two times):
>
> 45
> +1MONTH
> 2008-12-17T00:00:00Z
>
> 45
> +1MONTH
> 2008-12-17T00:00:00Z
>
> I suppose I am doing something wrong in the syntax... any advice?
> Thanks in advance

--
View this message in context: http://www.nabble.com/date-facets-doubt-tp21050107p21069438.html
Sent from the Solr - User mailing list archive at Nabble.com.
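For reference, the standard Solr 1.3 date-facet request parameters for one such range look like this (a sketch; source_date is the field from the post, the rest is the documented parameter syntax, with the gap URL-encoded because of the leading +):

```text
facet=true
&facet.date=source_date
&facet.date.start=NOW/DAY-1MONTH
&facet.date.end=NOW/DAY
&facet.date.gap=%2B1MONTH
```

These parameters are keyed per field (facet.date.start, or f.source_date.facet.date.start), so two different start/end/gap sets on the same field collide, which is consistent with the duplicated facet block the post describes.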
Re: Solrj - Exception in thread "main" java.lang.ClassCastException: java.lang.Long cannot be cast to org.apache.solr.common.util.NamedList
which version of the server are you using? The SolrJ documentation says
that the binary format works only with Solr 1.3.

On Thu, Dec 18, 2008 at 2:49 PM, Sajith Vimukthi wrote:
>
> Hi all,
>
> I used the sample code given below and tried to run it with all the
> relevant jars. I receive the exception written below.
>
> package test.general;
>
> import org.apache.solr.client.solrj.SolrServer;
> import org.apache.solr.client.solrj.SolrServerException;
> import org.apache.solr.client.solrj.SolrQuery;
> import org.apache.solr.client.solrj.response.UpdateResponse;
> import org.apache.solr.client.solrj.response.QueryResponse;
> import org.apache.solr.client.solrj.response.FacetField;
> import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;
> import org.apache.solr.common.SolrInputDocument;
> import org.apache.solr.common.params.SolrParams;
>
> import java.io.IOException;
> import java.util.Collection;
> import java.util.HashSet;
> import java.util.Random;
> import java.util.List;
>
> /**
>  * Connect to Solr and issue a query
>  */
> public class SolrJExample {
>
>   public static final String[] CATEGORIES = {"a", "b", "c", "d"};
>
>   public static void main(String[] args) throws IOException, SolrServerException {
>     SolrServer server = new CommonsHttpSolrServer("http://localhost:8080/solr/update");
>     Random rand = new Random();
>
>     // Index some documents
>     Collection docs = new HashSet();
>     for (int i = 0; i < 10; i++) {
>       SolrInputDocument doc = new SolrInputDocument();
>       doc.addField("link", "http://non-existent-url.foo/" + i + ".html");
>       doc.addField("source", "Blog #" + i);
>       doc.addField("source-link", "http://non-existent-url.foo/index.html");
>       doc.addField("subject", "Subject: " + i);
>       doc.addField("title", "Title: " + i);
>       doc.addField("content", "This is the " + i + "(th|nd|rd) piece of content.");
>       doc.addField("category", CATEGORIES[rand.nextInt(CATEGORIES.length)]);
>       doc.addField("rating", i);
>       System.out.println("Doc[" + i + "] is " + doc);
>       docs.add(doc);
>     }
>
>     UpdateResponse response = server.add(docs);
>     System.out.println("Response: " + response);
>     // Make the documents available for search
>     server.commit();
>     // Create the query
>     SolrQuery query = new SolrQuery("content:piece");
>     // Indicate we want facets
>     query.setFacet(true);
>     // Indicate what field to facet on
>     query.addFacetField("category");
>     // We only want facets that have at least one entry
>     query.setFacetMinCount(1);
>     // Run the query
>     QueryResponse results = server.query(query);
>     System.out.println("Query Results: " + results);
>     // Print out the facets
>     List facets = results.getFacetFields();
>     for (FacetField facet : facets) {
>       System.out.println("Facet:" + facet);
>     }
>   }
> }
>
> The exception:
>
> Exception in thread "main" java.lang.ClassCastException: java.lang.Long
> cannot be cast to org.apache.solr.common.util.NamedList
>   at org.apache.solr.common.util.NamedListCodec.unmarshal(NamedListCodec.java:89)
>   at org.apache.solr.client.solrj.impl.BinaryResponseParser.processResponse(BinaryResponseParser.java:39)
>   at org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java:385)
>   at org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java:183)
>   at org.apache.solr.client.solrj.request.UpdateRequest.process(UpdateRequest.java:217)
>   at org.apache.solr.client.solrj.SolrServer.add(SolrServer.java:48)
>   at test.general.SolrJExample.main(SolrJExample.java:48)
>
> Can someone help me out.
>
> Regards,
> Sajith Vimukthi Weerakoon
> Associate Software Engineer | ZONE24X7
> | Tel: +94 11 2882390 ext 101 | Fax: +94 11 2878261 |
> http://www.zone24x7.com

--
--Noble Paul
[SolrJ] SolrException: missing content stream
Hi,

I'm using SolrJ to index a couple of documents. I do this in batches of
50 docs to save some machine memory. I call SolrServer#add(Collection)
for each batch. For some reason, I get the following exception:

org.apache.solr.common.SolrException: missing content stream
  at org.apache.solr.handler.XmlUpdateRequestHandler.handleRequestBody(XmlUpdateRequestHandler.java:114)
  at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:131)
  at org.apache.solr.core.SolrCore.execute(SolrCore.java:1204)
  at org.apache.solr.client.solrj.embedded.EmbeddedSolrServer.request(EmbeddedSolrServer.java:147)
  at org.apache.solr.client.solrj.request.UpdateRequest.process(UpdateRequest.java:217)
  at org.apache.solr.client.solrj.SolrServer.add(SolrServer.java:48)

Any ideas what could be the issue? It actually worked fine when I added
only one doc at a time.

-Gunnar

--
Gunnar Wagenknecht
gun...@wagenknecht.org
http://wagenknecht.org/
Multi language search help
Hi,

I am prototyping language search using Solr 1.3. I have 3 fields in the
schema - id, content and language.

I am indexing 3 pdf files; the languages are foroyo, chinese and
japanese.

I use xpdf to convert the content of the pdf to text and push the text
to Solr in the content field.

What is the analyzer that I need to use for the above?

By using the default text analyzer and posting this content to Solr, I
am not getting any results.

Does Solr support stemming for the above languages?

Regards
Sujatha
Re: [SolrJ] SolrException: missing content stream
are you sure the Collection is not empty? what version are you running?
what do the server logs say when you get this error on the client?

On Dec 18, 2008, at 6:42 AM, Gunnar Wagenknecht wrote:

> Hi,
>
> I'm using SolrJ to index a couple of documents. I do this in batches
> of 50 docs to save some machine memory. I call
> SolrServer#add(Collection) for each batch. For some reason, I get the
> following exception:
>
> org.apache.solr.common.SolrException: missing content stream
>   at org.apache.solr.handler.XmlUpdateRequestHandler.handleRequestBody(XmlUpdateRequestHandler.java:114)
>   at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:131)
>   at org.apache.solr.core.SolrCore.execute(SolrCore.java:1204)
>   at org.apache.solr.client.solrj.embedded.EmbeddedSolrServer.request(EmbeddedSolrServer.java:147)
>   at org.apache.solr.client.solrj.request.UpdateRequest.process(UpdateRequest.java:217)
>   at org.apache.solr.client.solrj.SolrServer.add(SolrServer.java:48)
>
> Any ideas what could be the issue? It actually worked fine when I added
> only one doc at a time.
>
> -Gunnar
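A defensive batch flush along the lines this reply suggests might look like the sketch below (against the SolrJ 1.3 API; flushBatch is an illustrative name, not from the original code, and the empty-collection guard addresses the cause the reply suspects):

```java
// Sketch, assuming SolrJ 1.3 jars on the classpath.
static void flushBatch(SolrServer server, Collection<SolrInputDocument> batch)
    throws IOException, SolrServerException {
  // An add() with zero documents can reach the server with no content
  // stream at all, which is one way to trigger "missing content stream".
  if (!batch.isEmpty()) {
    server.add(batch);
    batch.clear();
  }
}
```

Calling this once after the loop as well as every 50 documents inside it avoids both the empty-batch case and the classic off-by-one that drops the final partial batch.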
Change in config file (synonym.txt) requires container restart?
Hi,

I am using the SolrJ client to connect to the Solr 1.3 server, and the
whole POC (doing a feasibility study) resides in a Tomcat web server. If
I make any change in the synonym.txt file to add a synonym, I have to
restart the Tomcat server for it to take effect. The synonym filter
factory that I am using is in the analyzers for both the index and query
types in schema.xml. Please tell me whether this approach is good, or
whether there is another way to make the change take effect for
searching without restarting the Tomcat server.

Thanks and Regards,
Sagar Khetkade

_
Chose your Life Partner? Join MSN Matrimony FREE http://in.msn.com/matrimony
Re: Change in config file (synonym.txt) requires container restart?
Sagar Khetkade wrote:
> Hi,
>
> I am using the SolrJ client to connect to the Solr 1.3 server, and the
> whole POC (doing a feasibility study) resides in a Tomcat web server.
> If I make any change in the synonym.txt file to add a synonym, I have
> to restart the Tomcat server for it to take effect. The synonym filter
> factory that I am using is in the analyzers for both the index and
> query types in schema.xml. Please tell me whether this approach is
> good, or whether there is another way to make the change take effect
> for searching without restarting the Tomcat server.
>
> Thanks and Regards,
> Sagar Khetkade

You can also reload the core.

- Mark
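In a multicore setup, the reload Mark mentions can be issued over HTTP through the CoreAdmin handler without touching Tomcat (a sketch; the host, port and core name core0 are placeholder values for this deployment):

```text
http://localhost:8080/solr/admin/cores?action=RELOAD&core=core0
```

Reloading a core re-reads schema.xml and its resource files such as synonym.txt. Note that index-time synonyms already applied to stored documents are unaffected until those documents are re-indexed; only query-time synonym changes take effect immediately.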
Re: Get All terms from all documents
I think I'd pin the user down and have him give me the real-world
use-cases that require this, then see if there's a more reasonable way
to satisfy that use-case. Do they want type-ahead? What is the user of
the system going to see? Because, for instance, a drop-down of 10,000
terms is totally useless.

Best
Erick

On Wed, Dec 17, 2008 at 10:02 PM, roberto wrote:
> Grant
>
> It is completely crazy to do something like this, I know, but the
> customer wants it. I'm really trying to figure out how to do it in a
> better way, maybe using the (auto suggest) filter from Solr 1.3 to get
> all the words starting with some letter and cache the letter on the
> client side. Our client is going to be written in Swing. What do you
> guys think?
>
> Thanks,
>
> On Wed, Dec 17, 2008 at 8:05 PM, Grant Ingersoll wrote:
> >
> > All terms from all docs? Really?
> >
> > At any rate, see http://wiki.apache.org/solr/TermsComponent  May need
> > a mod to not require any field, but for now you can enter all fields
> > (which you can get from LukeRequestHandler)
> >
> > -Grant
> >
> > On Dec 17, 2008, at 2:17 PM, roberto wrote:
> >
> >> Hello,
> >>
> >> I need to get all terms from all documents to be placed in my
> >> interface almost like the facets, how can i do it?
> >>
> >> thanks
> >>
> >> --
> >> "Without love, we are birds with broken wings."
> >> Morrie
> >
> > --
> > Grant Ingersoll
> >
> > Lucene Helpful Hints:
> > http://wiki.apache.org/lucene-java/BasicsOfPerformance
> > http://wiki.apache.org/lucene-java/LuceneFAQ
>
> --
> "Without love, we are birds with broken wings."
> Morrie
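For reference, a TermsComponent request of the kind Grant links takes roughly this form (a sketch based on the wiki page he cites; availability depends on the Solr version in use and on the component being registered in solrconfig.xml, and the field name content is a placeholder):

```text
http://localhost:8983/solr/terms?terms=true&terms.fl=content&terms.lower=a&terms.limit=10
```

terms.lower plus terms.limit is what makes the prefix-per-letter caching idea from the thread feasible: the client asks for a bounded slice of the term dictionary rather than all terms at once.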
Re: looking for multilanguage indexing best practice/hint
See the CJKAnalyzer for a start, StandardAnalyzer won't help you much.

Also, tell us a little more about your requirements. For instance, if a
user submits a query in Japanese, do you want to search across documents
in the other languages too? And will you want to associate different
analyzers with the content from different languages?

You really have two options: if you want different analyzers used with
the different languages, you probably have to index the content in
different fields. That is, a Chinese document would have a
chinese_content field, a Japanese document would have a japanese_content
field, etc. Now you can associate a different analyzer with each
*_content field.

If the same analyzer would work for all three languages, you can just
index all the content in a "content" field, and if you need to restrict
searching to the language in which the query was submitted, you could
always add a clause on the language, e.g. AND language:chinese

Hope this helps
Erick

On Wed, Dec 17, 2008 at 11:15 PM, Sujatha Arun wrote:
> Hi,
>
> I am prototyping language search using Solr 1.3. I have 3 fields in the
> schema - id, content and language.
>
> I am indexing 3 pdf files; the languages are foroyo, chinese and
> japanese.
>
> I use xpdf to convert the content of the pdf to text and push the text
> to Solr in the content field.
>
> What is the analyzer that I need to use for the above?
>
> By using the default text analyzer and posting this content to Solr, I
> am not getting any results.
>
> Does Solr support stemming for the above languages?
>
> Regards
> Sujatha
>
> On 12/18/08, Feak, Todd wrote:
> >
> > Don't forget to consider scaling concerns (if there are any). There
> > are strong differences in the number of searches we receive for each
> > language. We chose to create separate schema and config per language
> > so that we can throw servers at a particular language (or set of
> > languages) if we needed to. We see 2 orders of magnitude difference
> > between our most popular language and our least popular.
> >
> > -Todd Feak
> >
> > -----Original Message-----
> > From: Julian Davchev [mailto:j...@drun.net]
> > Sent: Wednesday, December 17, 2008 11:31 AM
> > To: solr-user@lucene.apache.org
> > Subject: looking for multilanguage indexing best practice/hint
> >
> > Hi,
> > From my study of Solr and Lucene so far it seems that I will use a
> > single schema; at least I don't see a scenario where I'd need more
> > than that. So the question is how do I approach multilanguage
> > indexing and multilanguage searching. Will it really make sense to
> > just search by word, or should I rather supply a lang param to the
> > search as well.
> >
> > I see there are those filters and was already advised on them, but I
> > guess the question is more one of best practice.
> > solr.ISOLatin1AccentFilterFactory, solr.SnowballPorterFilterFactory
> >
> > So the solution I see is, using copyField, I have the same field in
> > different langs, or something using a distinct filter.
> > Cheers
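The per-language-field option Erick describes could be sketched in schema.xml like this (an illustrative fragment, not from the thread: the field and type names are made up, and the analyzer class is the Lucene contrib CJKAnalyzer he recommends as a starting point):

```xml
<!-- One analyzer per language family; Solr's TextField accepts a raw
     Lucene Analyzer class directly. -->
<fieldType name="text_cjk" class="solr.TextField">
  <analyzer class="org.apache.lucene.analysis.cjk.CJKAnalyzer"/>
</fieldType>

<!-- Route each document's text into the field matching its language. -->
<field name="chinese_content"  type="text_cjk" indexed="true" stored="true"/>
<field name="japanese_content" type="text_cjk" indexed="true" stored="true"/>
<field name="language" type="string" indexed="true" stored="true"/>
```

At query time the application then picks the *_content field (or adds the AND language:... clause from the single-field variant) based on the detected query language.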
Highlighting broken? String index out of range: 35
Hi everyone,

it seems that I've run into another problem with my Solr setup. :/ The
highlighter just won't highlight anything, no matter which fragmenter or
config params I use. Here's an example, taken straight out of the
example solrconfig.xml:

<requestHandler name="dismax" class="solr.SearchHandler">
  <lst name="defaults">
    <str name="defType">dismax</str>
    <str name="echoParams">explicit</str>
    <float name="tie">0.01</float>
    <str name="qf">
       text^0.5 features^1.0 name^1.2 sku^1.5 id^10.0 manu^1.1 cat^1.4
    </str>
    <str name="pf">
       text^0.2 features^1.1 name^1.5 manu^1.4 manu_exact^1.9
    </str>
    <str name="bf">
       ord(popularity)^0.5 recip(rord(price),1,1000,1000)^0.3
    </str>
    <str name="fl">
       id,name,price,score
    </str>
    <str name="mm">
       2&lt;-1 5&lt;-2 6&lt;90%
    </str>
    <int name="ps">100</int>
    <str name="q.alt">*:*</str>
    <str name="hl.fl">text features name</str>
    <str name="f.name.hl.fragsize">0</str>
    <str name="f.name.hl.alternateField">name</str>
    <str name="f.text.hl.fragmenter">regex</str>
  </lst>
</requestHandler>

Whenever I try to activate the highlighter, it produces an error:
http://localhost:8983/solr/select/?q=ipod&version=2.2&start=0&rows=10&indent=on&qt=dismax&hl=true

HTTP ERROR: 500

String index out of range: 35

java.lang.StringIndexOutOfBoundsException: String index out of range: 35
  at java.lang.String.substring(Unknown Source)
  at org.apache.lucene.search.highlight.Highlighter.getBestTextFragments(Highlighter.java:239)
  at org.apache.solr.highlight.DefaultSolrHighlighter.doHighlighting(DefaultSolrHighlighter.java:310)
  at org.apache.solr.handler.component.HighlightComponent.process(HighlightComponent.java:83)
  at org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:171)
  at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:131)
  at org.apache.solr.core.SolrCore.execute(SolrCore.java:1313)
  at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:303)
  at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:232)
  at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1089)
  at org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:365)
  at org.mortbay.jetty.security.SecurityHandler.handle(SecurityHandler.java:216)
  at org.mortbay.jetty.servlet.SessionHandler.handle(SessionHandler.java:181)
  at org.mortbay.jetty.handler.ContextHandler.handle(ContextHandler.java:712)
  at org.mortbay.jetty.webapp.WebAppContext.handle(WebAppContext.java:405)
  at org.mortbay.jetty.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:211)
  at org.mortbay.jetty.handler.HandlerCollection.handle(HandlerCollection.java:114)
  at org.mortbay.jetty.handler.HandlerWrapper.handle(HandlerWrapper.java:139)
  at org.mortbay.jetty.Server.handle(Server.java:285)
  at org.mortbay.jetty.HttpConnection.handleRequest(HttpConnection.java:502)
  at org.mortbay.jetty.HttpConnection$RequestHandler.headerComplete(HttpConnection.java:821)
  at org.mortbay.jetty.HttpParser.parseNext(HttpParser.java:513)
  at org.mortbay.jetty.HttpParser.parseAvailable(HttpParser.java:208)
  at org.mortbay.jetty.HttpConnection.handle(HttpConnection.java:378)
  at org.mortbay.jetty.bio.SocketConnector$Connection.run(SocketConnector.java:226)
  at org.mortbay.thread.BoundedThreadPool$PoolThread.run(BoundedThreadPool.java:442)

That's what happens with the example setup - on my project it simply
won't highlight anything at all, no matter what I try. :| Can anyone
shed some light on this?

--
View this message in context: http://www.nabble.com/Highlighting-broken--String-index-out-of-range%3A-35-tp21073102p21073102.html
Sent from the Solr - User mailing list archive at Nabble.com.
Re: Highlighting broken? String index out of range: 35
Alright, I pinned it down, I think... The cause of the error seems to be
the "features" field, which has termVectors="true", termPositions="true"
and termOffsets="true". The other 2 fields ("name" and "text") work;
they have the same type but lack the term*-attributes. When you
overwrite the default hl.fl with something like "name text" it works,
but add "features" to it and you get the error.

Steffen B. wrote:
>
> Hi everyone,
>
> it seems that I've run into another problem with my Solr setup. :/ The
> highlighter just won't highlight anything, no matter which fragmenter
> or config params I use. Here's an example, taken straight out of the
> example solrconfig.xml:
>
> <requestHandler name="dismax" class="solr.SearchHandler">
>   <lst name="defaults">
>     <str name="defType">dismax</str>
>     <str name="echoParams">explicit</str>
>     <float name="tie">0.01</float>
>     <str name="qf">
>        text^0.5 features^1.0 name^1.2 sku^1.5 id^10.0 manu^1.1 cat^1.4
>     </str>
>     <str name="pf">
>        text^0.2 features^1.1 name^1.5 manu^1.4 manu_exact^1.9
>     </str>
>     <str name="bf">
>        ord(popularity)^0.5 recip(rord(price),1,1000,1000)^0.3
>     </str>
>     <str name="fl">
>        id,name,price,score
>     </str>
>     <str name="mm">
>        2&lt;-1 5&lt;-2 6&lt;90%
>     </str>
>     <int name="ps">100</int>
>     <str name="q.alt">*:*</str>
>     <str name="hl.fl">text features name</str>
>     <str name="f.name.hl.fragsize">0</str>
>     <str name="f.name.hl.alternateField">name</str>
>     <str name="f.text.hl.fragmenter">regex</str>
>   </lst>
> </requestHandler>
>
> Whenever I try to activate the highlighter, it produces an error:
> http://localhost:8983/solr/select/?q=ipod&version=2.2&start=0&rows=10&indent=on&qt=dismax&hl=true
>
> HTTP ERROR: 500
>
> String index out of range: 35
>
> java.lang.StringIndexOutOfBoundsException: String index out of range: 35
>   at java.lang.String.substring(Unknown Source)
>   at org.apache.lucene.search.highlight.Highlighter.getBestTextFragments(Highlighter.java:239)
>   at org.apache.solr.highlight.DefaultSolrHighlighter.doHighlighting(DefaultSolrHighlighter.java:310)
>   at org.apache.solr.handler.component.HighlightComponent.process(HighlightComponent.java:83)
>   at org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:171)
>   at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:131)
>   at org.apache.solr.core.SolrCore.execute(SolrCore.java:1313)
>   at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:303)
>   at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:232)
>   at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1089)
>   at org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:365)
>   at org.mortbay.jetty.security.SecurityHandler.handle(SecurityHandler.java:216)
>   at org.mortbay.jetty.servlet.SessionHandler.handle(SessionHandler.java:181)
>   at org.mortbay.jetty.handler.ContextHandler.handle(ContextHandler.java:712)
>   at org.mortbay.jetty.webapp.WebAppContext.handle(WebAppContext.java:405)
>   at org.mortbay.jetty.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:211)
>   at org.mortbay.jetty.handler.HandlerCollection.handle(HandlerCollection.java:114)
>   at org.mortbay.jetty.handler.HandlerWrapper.handle(HandlerWrapper.java:139)
>   at org.mortbay.jetty.Server.handle(Server.java:285)
>   at org.mortbay.jetty.HttpConnection.handleRequest(HttpConnection.java:502)
>   at org.mortbay.jetty.HttpConnection$RequestHandler.headerComplete(HttpConnection.java:821)
>   at org.mortbay.jetty.HttpParser.parseNext(HttpParser.java:513)
>   at org.mortbay.jetty.HttpParser.parseAvailable(HttpParser.java:208)
>   at org.mortbay.jetty.HttpConnection.handle(HttpConnection.java:378)
>   at org.mortbay.jetty.bio.SocketConnector$Connection.run(SocketConnector.java:226)
>   at org.mortbay.thread.BoundedThreadPool$PoolThread.run(BoundedThreadPool.java:442)
>
> That's what happens with the example setup - on my project it simply
> won't highlight anything at all, no matter what I try. :| Can anyone
> shed some light on this?

--
View this message in context: http://www.nabble.com/Highlighting-broken--String-index-out-of-range%3A-35-tp21073102p21073356.html
Sent from the Solr - User mailing list archive at Nabble.com.
Solr opening many threads
Hello,

I can see from a thread dump that Solr opens a lot of threads. How does
Solr use these threads? Is there more than one thread for search in
Solr? Does Solr use any type of workManager, or are the threads simple
java.lang.Thread instances? How many concurrent threads does Solr
create? How does it manage them?

--
Alexander Ramos Jardim
Re: Highlighting broken? String index out of range: 35
I think you are facing this problem:
https://issues.apache.org/jira/browse/SOLR-925

I'm just looking into the issue to solve it. I'm not sure that I can fix
it in my time, though...

Koji

Steffen B. wrote:
>
> Hi everyone,
>
> it seems that I've run into another problem with my Solr setup. :/ The
> highlighter just won't highlight anything, no matter which fragmenter
> or config params I use. Here's an example, taken straight out of the
> example solrconfig.xml:
>
> <requestHandler name="dismax" class="solr.SearchHandler">
>   <lst name="defaults">
>     <str name="defType">dismax</str>
>     <str name="echoParams">explicit</str>
>     <float name="tie">0.01</float>
>     <str name="qf">
>        text^0.5 features^1.0 name^1.2 sku^1.5 id^10.0 manu^1.1 cat^1.4
>     </str>
>     <str name="pf">
>        text^0.2 features^1.1 name^1.5 manu^1.4 manu_exact^1.9
>     </str>
>     <str name="bf">
>        ord(popularity)^0.5 recip(rord(price),1,1000,1000)^0.3
>     </str>
>     <str name="fl">
>        id,name,price,score
>     </str>
>     <str name="mm">
>        2&lt;-1 5&lt;-2 6&lt;90%
>     </str>
>     <int name="ps">100</int>
>     <str name="q.alt">*:*</str>
>     <str name="hl.fl">text features name</str>
>     <str name="f.name.hl.fragsize">0</str>
>     <str name="f.name.hl.alternateField">name</str>
>     <str name="f.text.hl.fragmenter">regex</str>
>   </lst>
> </requestHandler>
>
> Whenever I try to activate the highlighter, it produces an error:
> http://localhost:8983/solr/select/?q=ipod&version=2.2&start=0&rows=10&indent=on&qt=dismax&hl=true
>
> HTTP ERROR: 500
>
> String index out of range: 35
>
> java.lang.StringIndexOutOfBoundsException: String index out of range: 35
>   at java.lang.String.substring(Unknown Source)
>   at org.apache.lucene.search.highlight.Highlighter.getBestTextFragments(Highlighter.java:239)
>   at org.apache.solr.highlight.DefaultSolrHighlighter.doHighlighting(DefaultSolrHighlighter.java:310)
>   at org.apache.solr.handler.component.HighlightComponent.process(HighlightComponent.java:83)
>   at org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:171)
>   at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:131)
>   at org.apache.solr.core.SolrCore.execute(SolrCore.java:1313)
>   at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:303)
>   at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:232)
>   at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1089)
>   at org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:365)
>   at org.mortbay.jetty.security.SecurityHandler.handle(SecurityHandler.java:216)
>   at org.mortbay.jetty.servlet.SessionHandler.handle(SessionHandler.java:181)
>   at org.mortbay.jetty.handler.ContextHandler.handle(ContextHandler.java:712)
>   at org.mortbay.jetty.webapp.WebAppContext.handle(WebAppContext.java:405)
>   at org.mortbay.jetty.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:211)
>   at org.mortbay.jetty.handler.HandlerCollection.handle(HandlerCollection.java:114)
>   at org.mortbay.jetty.handler.HandlerWrapper.handle(HandlerWrapper.java:139)
>   at org.mortbay.jetty.Server.handle(Server.java:285)
>   at org.mortbay.jetty.HttpConnection.handleRequest(HttpConnection.java:502)
>   at org.mortbay.jetty.HttpConnection$RequestHandler.headerComplete(HttpConnection.java:821)
>   at org.mortbay.jetty.HttpParser.parseNext(HttpParser.java:513)
>   at org.mortbay.jetty.HttpParser.parseAvailable(HttpParser.java:208)
>   at org.mortbay.jetty.HttpConnection.handle(HttpConnection.java:378)
>   at org.mortbay.jetty.bio.SocketConnector$Connection.run(SocketConnector.java:226)
>   at org.mortbay.thread.BoundedThreadPool$PoolThread.run(BoundedThreadPool.java:442)
>
> That's what happens with the example setup - on my project it simply
> won't highlight anything at all, no matter what I try. :| Can anyone
> shed some light on this?
Problem in Date Format in Solr 1.3
Hi, I have upgraded from Solr/Lucene 1.2 to Solr/Lucene 1.3. I have copied all the "" tags of schema.xml from Solr 1.2 to Solr 1.3, and it gives an error: SEVERE: org.apache.solr.common.SolrException: Invalid Date in Date Math String:'2006-Oct-10T10:06:13Z' Can you help me with this problem? with regards Rohit Arora
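Solr's date fields expect the strict ISO 8601 "Zulu" form with a numeric month (e.g. 2006-10-10T10:06:13Z); the '2006-Oct-10T10:06:13Z' value in the error uses a month name, which the date math parser rejects. A minimal sketch of the conversion the feeding code would need (the helper name is illustrative, not part of Solr):

```python
from datetime import datetime

def to_solr_date(raw: str) -> str:
    """Convert a date like '2006-Oct-10T10:06:13Z' (month name) to the
    ISO 8601 form Solr expects, e.g. '2006-10-10T10:06:13Z'."""
    dt = datetime.strptime(raw, "%Y-%b-%dT%H:%M:%SZ")
    return dt.strftime("%Y-%m-%dT%H:%M:%SZ")

print(to_solr_date("2006-Oct-10T10:06:13Z"))  # 2006-10-10T10:06:13Z
```

Wherever the documents are built, running the date through such a normalizer before posting to Solr should avoid the SolrException.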
TermVectorComponent and SolrJ
Hello everyone, I've started to look at TermVectorComponent and I'm experimenting with using the component in a sort of "top terms" setting for a given query... I was also looking at MLT and the interestingTerms, but I would like to do a query, get say 10k results, and from those results return a list of the "top 10 terms" or something similar... I haven't really thought too much about it yet, but I was wondering if anyone has done any work on making the term vector response available in a simple manner with SolrJ yet? Or if this is planned? (In the same sense as it is today with facets: response.getFacetFields() etc.) Not that I can't manage to write it myself, but I would reckon that more people than me would be interested in this. I'd be more than happy to contribute if it is wanted; I just wanted to check whether anyone has started on this already or not. Cheers, Aleks -- Aleksander M. Stensby Senior software developer Integrasco A/S
Re: Solr openning many threads
On Thu, Dec 18, 2008 at 9:03 AM, Alexander Ramos Jardim wrote: > I can see from a thread dump that Solr opens a lot of threads. > > How does Solr use these threads? Does more than one thread exist for search > in Solr? Does Solr use any type of workManager, or are the threads simple > java.lang.Thread? How many concurrent threads does Solr create? How does it > manage them? Unless distributed search is being used, Solr currently has one single-thread executor for background warming. There is a thread per request, but that's just the way servlet containers work (Jetty, Tomcat, etc.). You can control the max number of threads that are created in the servlet container config. -Yonik
Solr and Autocompletion
Hi, One of the things we are looking for is to autofill keywords when people start typing (e.g. Google autofill). Currently we are using RangeQuery. I read about PrefixQuery and feel that it might be appropriate for this kind of implementation. Has anyone implemented the autofill feature? If so, what do you recommend? Thanks, Raghu
RE: looking for multilanguage indexing best practice/hint
Hi Sujatha. I've developed a search system for 6 different languages, and as it was implemented on Solr 1.2, all those languages are part of the same index, using different fields for each so I can have different analyzers for each one. Like: content_chinese content_english content_russian content_arabic I've also defined a language field that I use to be able to separate those at query time. As you are going to implement it using Solr 1.3, I would rather create one core per language and keep my schema simpler, without the _language suffix. Each schema (one per language) would have only, say, content, which depending on its language will use a proper analyzer and filters. Having a separate core per language is also good, as the scores for a language won't be affected by the indexing of documents in other languages. Do you have any requirement for searching in any language, say q=test, where this term should be found in any language? If so, you may think of distributed search to combine your results, or even take the same approach I've taken, as I couldn't use multi-core. I'm also using the Dismax request handler; that's worth a look so you can pre-define some base query parts and also do score boosting behind the scenes. I hope it helps. Regards, Daniel -Original Message- From: Sujatha Arun [mailto:suja.a...@gmail.com] Sent: 18 December 2008 04:15 To: solr-user@lucene.apache.org Subject: Re: looking for multilanguage indexing best practice/hint Hi, I am prototyping language search using Solr 1.3. I have 3 fields in the schema: id, content and language. I am indexing 3 pdf files; the languages are foroyo, chinese and japanese. I use xpdf to convert the content of the pdf to text and push the text to Solr in the content field. What is the analyzer that I need to use for the above? By using the default text analyzer and posting this content to Solr, I am not getting any results. Does Solr support stemming for the above languages?
Regards Sujatha On 12/18/08, Feak, Todd wrote: > > Don't forget to consider scaling concerns (if there are any). There > are strong differences in the number of searches we receive for each > language. We chose to create separate schema and config per language > so that we can throw servers at a particular language (or set of > languages) if we needed to. We see 2 orders of magnitude difference > between our most popular language and our least popular. > > -Todd Feak > > -Original Message- > From: Julian Davchev [mailto:j...@drun.net] > Sent: Wednesday, December 17, 2008 11:31 AM > To: solr-user@lucene.apache.org > Subject: looking for multilanguage indexing best practice/hint > > Hi, > From my study of Solr and Lucene so far, it seems that I will use a > single schema; at least I don't see a scenario where I'd need more than that. > So the question is how to approach multilanguage indexing and multilanguage > searching. Will it really make sense to just search the word, or should > I rather supply a lang param to the search as well? > > I see there are those filters and was already advised on them, but I guess > the question is more one of best practice. > solr.ISOLatin1AccentFilterFactory, solr.SnowballPorterFilterFactory > > So the solution I see is using copyField so I have the same field in different > langs, or something using a distinct filter. > Cheers
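Daniel's field-per-language layout might be sketched in schema.xml like this (the field and type names here are assumptions for illustration only; the actual analyzer types depend on what is configured in the schema):

```xml
<!-- Illustrative sketch: one content field per language, each bound to a
     field type with a language-appropriate analyzer, plus a language field
     to filter on at query time. Type names are assumptions. -->
<field name="content_english" type="text_en"  indexed="true" stored="true"/>
<field name="content_chinese" type="text_cjk" indexed="true" stored="true"/>
<field name="content_russian" type="text_ru"  indexed="true" stored="true"/>
<field name="language"        type="string"   indexed="true" stored="true"/>
```

With the one-core-per-language approach Daniel suggests for Solr 1.3, each core's schema would instead have a single content field whose type carries that language's analyzer.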
Re: Solr and Autocompletion
Lots of options out there. Rather than doing a slow query like PrefixQuery, I think it's best to index the n-grams so the autocomplete is a fast query. http://www.mail-archive.com/solr-user@lucene.apache.org/msg06776.html On Dec 18, 2008, at 11:56 AM, Kashyap, Raghu wrote: Hi, One of things we are looking for is to Autofill the keywords when people start typing. (e.g. Google autofill) Currently we are using the RangeQuery. I read about the PrefixQuery and feel that it might be appropriate for this kind of implementation. Has anyone implemented the autofill feature? If so what do you recommend? Thanks, Raghu
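The index-time edge n-gram trick can be sketched as follows. This is a hypothetical Python stand-in for what an edge n-gram token filter emits at index time (in Solr this lives in the field's analysis chain, not in client code); with the prefixes indexed as terms, autocomplete becomes an exact-term lookup instead of a prefix scan:

```python
def edge_ngrams(term: str, min_len: int = 1, max_len: int = 10) -> list[str]:
    """Emit the leading prefixes (edge n-grams) of a term, from min_len
    up to max_len characters, capped at the term's own length."""
    return [term[:i] for i in range(min_len, min(len(term), max_len) + 1)]

print(edge_ngrams("iphone"))  # ['i', 'ip', 'iph', 'ipho', 'iphon', 'iphone']
```

At query time, the user's partial input ("iph") is matched as a whole term against these indexed prefixes, which is why it is fast.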
Re: looking for multilanguage indexing best practice/hint
: Subject: looking for multilanguage indexing best practice/hint : References: <49483388.8030...@drun.net> : <502b8706-828b-4eaa-886d-af0dccf37...@stylesight.com> : <8c0c601f0812170825j766cf005i9546b2604a19f...@mail.gmail.com> : In-Reply-To: <8c0c601f0812170825j766cf005i9546b2604a19f...@mail.gmail.com> http://people.apache.org/~hossman/#threadhijack Thread Hijacking on Mailing Lists When starting a new discussion on a mailing list, please do not reply to an existing message, instead start a fresh email. Even if you change the subject line of your email, other mail headers still track which thread you replied to and your question is "hidden" in that thread and gets less attention. It makes following discussions in the mailing list archives particularly difficult. See Also: http://en.wikipedia.org/wiki/Thread_hijacking -Hoss
Re: Solr and Autocompletion
: Subject: Solr and Autocompletion : References: <49483388.8030...@drun.net> : <502b8706-828b-4eaa-886d-af0dccf37...@stylesight.com> : <8c0c601f0812170825j766cf005i9546b2604a19f...@mail.gmail.com> : <4949537a.3050...@drun.net> : <8599f2e4e80ecc44aee81fa2974ce2bd0c31d...@mail-sd1.ad.soe.sony.com> : <414cb3700812172015y2c0481c3hc6345392d514a...@mail.gmail.com> : <359a92830812180538q424a0744j3be8a109cec81...@mail.gmail.com> : In-Reply-To: <359a92830812180538q424a0744j3be8a109cec81...@mail.gmail.com> http://people.apache.org/~hossman/#threadhijack Thread Hijacking on Mailing Lists When starting a new discussion on a mailing list, please do not reply to an existing message, instead start a fresh email. Even if you change the subject line of your email, other mail headers still track which thread you replied to and your question is "hidden" in that thread and gets less attention. It makes following discussions in the mailing list archives particularly difficult. See Also: http://en.wikipedia.org/wiki/Thread_hijacking -Hoss
Re: [ANNOUNCE] Solr Logo Contest Results
Good choice! Mathijs Homminga Chris Hostetter wrote: (replies to solr-user please) On behalf of the Solr Committers, I'm happy to announce that the Solr Logo Contest is officially concluded. (Woot!) And the Winner Is... https://issues.apache.org/jira/secure/attachment/12394264/apache_solr_a_red.jpg ...by Michiel We ran into a few hiccups during the contest, making it take longer than intended, but the result was a thorough process in which everyone went above and beyond to ensure that the final choice best reflected the wishes of the community. You can expect to see the new logo appear on the site (and in the Solr app) in the next few weeks. Congrats Michiel! -Hoss -- Knowlogy Helperpark 290 C 9723 ZA Groningen +31 (0)50 2103567 http://www.knowlogy.nl mathijs.hommi...@knowlogy.nl +31 (0)6 15312977
Re: Get All terms from all documents
Erick, Thanks for the answer; let me clarify. We would like to have a combobox with terms to guide the user in the search. I mean, if I have thousands of documents and want to tell the user how many documents in the base contain a particular word, how can I do that? thanks On Thu, Dec 18, 2008 at 11:25 AM, Erick Erickson wrote: > I think I'd pin the user down and have him give me the real-world > use-cases that require this, then see if there's a more reasonable > way to satisfy that use-case. Do they want type-ahead? What > is the user of the system going to see? Because, for instance, > a drop-down of 10,000 terms is totally useless. > > Best > Erick > > On Wed, Dec 17, 2008 at 10:02 PM, roberto wrote: > > > Grant > > > > It's completely crazy to do something like this, I know, but the customer > wants it; > > I'm really trying to figure out how to do it in a better way, maybe using > > the (auto suggest) filter from solr 1.3 to get all the words starting > with > > some letter and cache the letter on the client side. Our client is going > to > > be written in Swing. What do you guys think? > > > > Thanks, > > > > On Wed, Dec 17, 2008 at 8:05 PM, Grant Ingersoll > >wrote: > > > > > All terms from all docs? Really? > > > > > > At any rate, see http://wiki.apache.org/solr/TermsComponent May need > a > > > mod to not require any field, but for now you can enter all fields > (which > > > you can get from LukeRequestHandler) > > > > > > -Grant > > > > > > > > > > > > On Dec 17, 2008, at 2:17 PM, roberto wrote: > > > > > > Hello, > > >> > > >> I need to get all terms from all documents to be placed in my > interface > > >> almost like the facets, how can i do it? > > >> > > >> thanks > > >> > > >> -- > > >> "Without love, we are birds with broken wings."
> > >> Morrie > > >> > > > > > > -- > > > Grant Ingersoll > > > > > > Lucene Helpful Hints: > > > http://wiki.apache.org/lucene-java/BasicsOfPerformance > > > http://wiki.apache.org/lucene-java/LuceneFAQ > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > -- > > "Without love, we are birds with broken wings." > > Morrie > > > -- "Without love, we are birds with broken wings." Morrie
Approximate release date for 1.4
Just curious: do we have an approximate target release date for 1.4, a list of milestones, or feature sets for the same?
Re: [ANNOUNCE] Solr Logo Contest Results
Looks cool :) How about a talking mascot? Jeryl Cook twoenc...@gmail.com On Thu, Dec 18, 2008 at 1:38 PM, Mathijs Homminga wrote: > Good choice! > > Mathijs Homminga > > Chris Hostetter wrote: >> >> (replies to solr-user please) >> >> On behalf of the Solr Committers, I'm happy to announce that the Solr >> Logo Contest is officially concluded. (Woot!) >> >> And the Winner Is... >> >> https://issues.apache.org/jira/secure/attachment/12394264/apache_solr_a_red.jpg >> ...by Michiel >> >> We ran into a few hiccups during the contest making it take longer than >> intended, but the result was a thorough process in which everyone went above >> and beyond to ensure that the final choice best reflected the wishes of the >> community. >> >> You can expect to see the new logo appear on the site (and in the Solr >> app) in the next few weeks. >> >> Congrats Michiel! >> >> >> -Hoss >> > > -- > Knowlogy > Helperpark 290 C > 9723 ZA Groningen > +31 (0)50 2103567 > http://www.knowlogy.nl > > mathijs.hommi...@knowlogy.nl > +31 (0)6 15312977 > > > -- Jeryl Cook /^\ Pharaoh /^\ http://pharaohofkush.blogspot.com/ "Whether we bring our enemies to justice, or bring justice to our enemies, justice will be done." --George W. Bush, Address to a Joint Session of Congress and the American People, September 20, 2001
Re: Approximate release date for 1.4
On Thu, Dec 18, 2008 at 2:43 PM, Kay Kay wrote: > Just curious - if we have an approximate target release date for 1.4 / list > of milestones / feature sets for the same. Mid January. Issues included: a case-by-case analysis of how ready they are (and obviously affected by committers "scratching their own itch"). -Yonik
Re: looking for multilanguage indexing best practice/hint
Thanks Erick, I think I will go with different language fields, as I want to use different stop words, analyzers etc. I might also consider a schema per language so scaling is more flexible, as I was already advised, but this will really make sense only if I have more than one server, I guess; otherwise all the other data is duplicated for no reason. We have already decided that the language will be passed each time in search, so it won't make sense to search the query in any language. As for CJKAnalyzer, at first look it doesn't seem to be in Solr (haven't tried yet), and since I am a noob in Java I will check how it's done. Will definitely give it a try. Thanks a lot for the help. Erick Erickson wrote: > See the CJKAnalyzer for a start, StandardAnalyzer won't > help you much. > > Also, tell us a little more about your requirements. For instance, > if a user submits a query in Japanese, do you want to search > across documents in the other languages too? And will you want > to associate different analyzers with the content from different > languages? You really have two options: > > if you want different analyzers used with the different languages, > you probably have to index the content in different fields. That is > a Chinese document would have a chinese_content field, a Japanese > document would have a japanese_content field etc. Now you can > associate a different analyzer with each *_content field. > > If the same analyzer would work for all three languages, you > can just index all the content in a "content" field, and if you > need to restrict searching to the language in which the query > was submitted, you could always add a clause on the > language, e.g. AND language:chinese > > Hope this helps > Erick > > On Wed, Dec 17, 2008 at 11:15 PM, Sujatha Arun wrote: > > >> Hi, >> >> I am prototyping lanuage search using solr 1.3 .I have 3 fields in the >> schema -id,content and language. >> >> I am indexing 3 pdf files ,the languages are foroyo,chinese and japanese.
>> >> I use xpdf to convert the content of pdf to text and push the text to solr >> in the content field. >> >> What is the analyzer that i need to use for the above. >> >> By using the default text analyzer and posting this content to solr, i am >> not getting any results. >> >> Does solr support stemmin for the above languages. >> >> Regards >> Sujatha >> >> >> >> >> On 12/18/08, Feak, Todd wrote: >> >>> Don't forget to consider scaling concerns (if there are any). There are >>> strong differences in the number of searches we receive for each >>> language. We chose to create separate schema and config per language so >>> that we can throw servers at a particular language (or set of languages) >>> if we needed to. We see 2 orders of magnitude difference between our >>> most popular language and our least popular. >>> >>> -Todd Feak >>> >>> -Original Message- >>> From: Julian Davchev [mailto:j...@drun.net] >>> Sent: Wednesday, December 17, 2008 11:31 AM >>> To: solr-user@lucene.apache.org >>> Subject: looking for multilanguage indexing best practice/hint >>> >>> Hi, >>> From my study on solr and lucene so far it seems that I will use single >>> scheme.at least don't see scenario where I'd need more than that. >>> So question is how do I approach multilanguage indexing and multilang >>> searching. Will it really make sense for just searching word..or rather >>> I should supply lang param to search as well. >>> >>> I see there are those filters and already advised on them but I guess >>> question is more of a best practice. >>> solr.ISOLatin1AccentFilterFactory, solr.SnowballPorterFilterFactory >>> >>> So solution I see is using copyField I have same field in different >>> langs or something using distinct filter. >>> Cheers >>> >>> >>> >>> >>> > >
does this break Solr? dynamicField name="*" type="ignored"
I'm seeing a weird effect with a '*' field. In the example schema.xml, there is a commented out sample: We have this un-commented, and in the schema browser via the admin interface I see that all non-dynamic fields get a type of "ignored". I see this in the Solr admin interface: Field: uid Dynamically Created From Pattern: * Field Type: ignored though the field definition is: Is this a bug in the admin interface, or a problem with using this '*' in the schema? Thanks, Peter -- -- Peter M. Wolanin, Ph.D. Momentum Specialist, Acquia. Inc. peter.wola...@acquia.com
Re: does this break Solr? dynamicField name="*" type="ignored"
Looks like it's a bug in the schema browser (i.e. just this display, not the inner workings of Solr). Could you open a JIRA issue for this? -Yonik On Thu, Dec 18, 2008 at 3:20 PM, Peter Wolanin wrote: > I'm seeing a weird effect with a '*' field. In the example > schema.xml, there is a commented out sample: > > > > > We have this un-commented, and in the schema browser via the admin > interface I see that all non-dynamic fields get a type of "ignored". > > I see this in the Solr admin interface: > > Field: uid > Dynamically Created From Pattern: * > Field Type: ignored > > though the field definition is: > > > > Is this a bug in the admin interface, or a problem with using this '*' > in the schema? > > Thanks, > > Peter > > -- > -- > Peter M. Wolanin, Ph.D. > Momentum Specialist, Acquia. Inc. > peter.wola...@acquia.com >
Re: Partitioning the index
It's more related to how much memory you have on your boxes, how resource-intensive your queries are, how many fields you are trying to facet on, what acceptable response times are, etc. Anyway... a single box is normally good for between 5M and 50M docs, but can fall out of that range (both up and down) depending on the specifics. -Yonik On Wed, Dec 17, 2008 at 9:34 PM, s d wrote: > Hi, Is there a recommended index size (on disk, number of documents) for when > to start partitioning it to ensure good response time? > Thanks, > S >
Re: Get All terms from all documents
How do you get the word in the first place? If the combobox is for all words in your index, it's probably completely useless to provide this information because there is too much data to guide the user at all. I mean a list of 10,000 words with some sort of document frequency seems to me to require significant developer work without adding to the user experience at all... If that's the case, I'd really work with your customer and try to persuade them that this is a feature that adds little value, and that there are higher-value features you should do first. But if you really, really require the information, here's what I would recommend: Use TermDocs/TermEnum to traverse your index gathering this data *at index time*. Then create a *very special* document that you also put in your index (stored, but not indexed in this case) that contains an unique field (say frequencies). Upon startup of your searcher, read in this very special document, parse it and create a map of words and frequencies that you use to find the number of documents containing that word. Hope this helps Erick On Thu, Dec 18, 2008 at 1:53 PM, roberto wrote: > Erick, > > Thanks for the answer, let me clarify the thing, we would like to have a > combobox with the terms to guide the user in the search i mean, if a have > thousands of documents and want to tell them how many documents in the base > have the particular word, how can i do that? > > thanks > > On Thu, Dec 18, 2008 at 11:25 AM, Erick Erickson >wrote: > > > I think I'd pin the user down and have him give me the real-world > > use-cases that require this, then see if there's a more reasonable > > way to satisfy that use-case. Do they want type-ahead? What > > is the user of the system going to see? Because, for instance, > > a drop-down of 10,000 terms is totally useless. 
> > > > Best > > Erick > > > > On Wed, Dec 17, 2008 at 10:02 PM, roberto wrote: > > > > > Grant > > > > > > It completely crazy do something like this i know, but the customer > > want´s, > > > i´m really trying to figure out how to do it in a better way, maybe > using > > > the (auto suggest) filter from solr 1.3 to get all the words starting > > with > > > some letter and cache the letter in the client side, out client is > going > > to > > > be write in swing, what do you guys think? > > > > > > Thanks, > > > > > > On Wed, Dec 17, 2008 at 8:05 PM, Grant Ingersoll > > >wrote: > > > > > > > All terms from all docs? Really? > > > > > > > > At any rate, see http://wiki.apache.org/solr/TermsComponent May > need > > a > > > > mod to not require any field, but for now you can enter all fields > > (which > > > > you can get from LukeRequestHandler) > > > > > > > > -Grant > > > > > > > > > > > > > > > > On Dec 17, 2008, at 2:17 PM, roberto wrote: > > > > > > > > Hello, > > > >> > > > >> I need to get all terms from all documents to be placed in my > > interface > > > >> almost like the facets, how can i do it? > > > >> > > > >> thanks > > > >> > > > >> -- > > > >> "Without love, we are birds with broken wings." > > > >> Morrie > > > >> > > > > > > > > -- > > > > Grant Ingersoll > > > > > > > > Lucene Helpful Hints: > > > > http://wiki.apache.org/lucene-java/BasicsOfPerformance > > > > http://wiki.apache.org/lucene-java/LuceneFAQ > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > -- > > > "Without love, we are birds with broken wings." > > > Morrie > > > > > > > > > -- > "Without love, we are birds with broken wings." > Morrie >
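Erick's index-time precomputation amounts to building a term-to-document-frequency map. A minimal Python sketch of the idea (the whitespace tokenization and sample data are illustrative only; real code would walk the Lucene index with TermDocs/TermEnum as Erick describes):

```python
from collections import defaultdict

def term_doc_frequencies(docs: dict[str, str]) -> dict[str, int]:
    """For each term, count how many documents contain it at least once
    (the per-term document frequency to precompute at index time)."""
    df = defaultdict(int)
    for text in docs.values():
        # set() so a term repeated within one document is counted once
        for term in set(text.lower().split()):
            df[term] += 1
    return dict(df)

docs = {"d1": "iphone firmware update",
        "d2": "iphone case",
        "d3": "android firmware"}
print(term_doc_frequencies(docs)["iphone"])  # 2
```

The resulting map is what would be serialized into the "very special" document and reloaded at searcher startup.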
Data Import Request Handler problem: Odd performance behaviour for large number of records
Hello, I am using Solr 1.4 (solr-2008-11-19) with Lucene 2.4 dropped in instead of 2.9. I am indexing 500k records using the JDBC Data Import Request Handler. Config: Linux openSUSE 10.2 (X86-64) Dual dual-core 64-bit Xeon 3GHz Dell blade 8GB RAM java version "1.6.0_07" Java(TM) SE Runtime Environment (build 1.6.0_07-b06) Java HotSpot(TM) 64-Bit Server VM (build 10.0-b23, mixed mode) 1GB heap for Tomcat DB: MySQL on a separate but similar server I am finding that when I do a Full-Import, followed by another Full-Import, the import takes much longer the second and subsequent times: Run1 = 0:27:31.491 Run2 = 1:14:44.821 Run3 = 1:14:48.316 Run4 = 2:15:12.296 Run5 = 1:37:6.847 (I have run this ~10 times and got roughly the same results). I have also monitored the load on the Solr machine and the database machine for any other activity that might have an impact. The final Lucene index size is 923MB. The default clean = 'true', so the index is cleared (emptied) each time, so I am concerned that the second run takes 4 times the time of the first run. Am I doing something wrong here? Any help would be appreciated. I have appended my data-config.xml. thanks, Glen
Re: does this break Solr? dynamicField name="*" type="ignored"
created issue: https://issues.apache.org/jira/browse/SOLR-929 -Peter On Thu, Dec 18, 2008 at 3:32 PM, Yonik Seeley wrote: > Looks like it's a bug in the schema browser (i.e. just this display, > no the inner workings of Solr). > Could you open a JIRA issue for this? > > -Yonik > > > On Thu, Dec 18, 2008 at 3:20 PM, Peter Wolanin > wrote: >> I'm seeing a weird effect with a '*' field. In the example >> schema.xml, there is a commented out sample: >> >> >> >> >> We have this un-commented, and in the schema browser via the admin >> interface I see that all non-dynamic fields get a type of "ignored". >> >> I see this in the Solr admin interface: >> >> Field: uid >> Dynamically Created From Pattern: * >> Field Type: ignored >> >> though the field definition is: >> >> >> >> Is this a bug in the admin interface, or a problem with using this '*' >> in the schema? >> >> Thanks, >> >> Peter >> >> -- >> -- >> Peter M. Wolanin, Ph.D. >> Momentum Specialist, Acquia. Inc. >> peter.wola...@acquia.com >> > -- -- Peter M. Wolanin, Ph.D. Momentum Specialist, Acquia. Inc. peter.wola...@acquia.com
Full reindex needed if termVectors added to fields in schema?
hi, I've successfully added fields to my schema.xml before, and been able to incrementally keep indexing documents with just the new ones picking up the fields. This appears to be similar to the case of not including certain fields in certain documents, as the other documents simply don't have them until they're added. I'm looking into testing a MoreLikeThis implementation, and have read on here that termVectors are needed to make it run acceptably. I'd like to rebuild my index, but that will take some time given the number of documents involved, and I'd like to keep incremental updates running at the same time. The constraint is on the database side not the SOLR indexing side, so improvements to indexing performance aren't my main concern here. So, my question is whether adding termVectors="true" to a couple of schema fields will work similarly to adding new fields, where the updated documents will get the vectors added and the others won't get them but will continue to work, allowing me to rebuild "in the background" while not breaking anything in my existing incremental update/release cycle. I appreciate your help. Eric Kilby -- View this message in context: http://www.nabble.com/Full-reindex-needed-if-termVectors-added-to-fields-in-schema--tp21081315p21081315.html Sent from the Solr - User mailing list archive at Nabble.com.
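Enabling term vectors is a per-field attribute in schema.xml, so the change itself is small; a sketch of what the changed field might look like (the field and type names are illustrative, not taken from your schema):

```xml
<!-- Illustrative: enabling term vectors on an existing text field.
     As with a newly added field, only documents (re)indexed after this
     change will carry the vectors; older documents simply lack them. -->
<field name="bodytext" type="text" indexed="true" stored="true"
       termVectors="true" termPositions="true" termOffsets="true"/>
```

termPositions and termOffsets are optional refinements on top of termVectors; whether MoreLikeThis needs them depends on how it is configured.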
Re: Get All terms from all documents
On 18-Dec-08, at 10:53 AM, roberto wrote: Erick, Thanks for the answer, let me clarify the thing, we would like to have a combobox with the terms to guide the user in the search i mean, if a have thousands of documents and want to tell them how many documents in the base have the particular word, how can i do that? Sounds like you want query autocomplete. The best way to do this (including if you want the box filled with some queries), is to use the query logs, not the documents. -Mike
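Mike's query-log approach can be sketched in a few lines of Python (the function shape and ranking are illustrative assumptions; a real deployment would precompute and cache the counts rather than rescan the log per keystroke):

```python
from collections import Counter

def suggest(query_log: list[str], prefix: str, n: int = 5) -> list[str]:
    """Autocomplete from past queries rather than document terms:
    rank log entries matching the typed prefix by how often users ran them."""
    counts = Counter(q.lower() for q in query_log)
    matches = [(q, c) for q, c in counts.items() if q.startswith(prefix.lower())]
    return [q for q, _ in sorted(matches, key=lambda qc: -qc[1])[:n]]

log = ["iphone firmware", "iphone case", "iphone firmware", "ipod nano"]
print(suggest(log, "iph"))  # ['iphone firmware', 'iphone case']
```

Suggesting whole past queries like this also gives users multi-word completions, which a term-based combobox cannot.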
RE: Change in config file (synonym.txt) requires container restart?
But I am using CommonsHttpSolrServer for the Solr server configuration, as it accepts the URL. So how can I reload the core here?

-Sagar

> Date: Thu, 18 Dec 2008 07:55:02 -0500
> From: markrmil...@gmail.com
> To: solr-user@lucene.apache.org
> Subject: Re: Change in config file (synonym.txt) requires container restart?
>
> Sagar Khetkade wrote:
> > Hi,
> >
> > I am using the SolrJ client to connect to the Solr 1.3 server, and the whole POC (doing a feasibility study) resides in a Tomcat web server. If I make any change to the synonym.txt file to add a synonym, I have to restart the Tomcat server for it to take effect. The SynonymFilterFactory I am using is in both the index and query analyzers in schema.xml. Please tell me whether this approach is good, or whether there is another way to make the change take effect while searching, without restarting the Tomcat server.
> >
> > Thanks and Regards,
> > Sagar Khetkade
>
> You can also reload the core.
>
> - Mark
Re: TermVectorComponent and SolrJ
On Dec 18, 2008, at 10:06 AM, Aleksander M. Stensby wrote: Hello everyone, I've started to look at TermVectorComponent and I'm experimenting with the use of the component in a sort of "top terms" setting for a given query... Was also looking at mlt and the interestingTerms, but I would like to do a query, get say 10k results, and from those results return a list of "top 10 terms" or something similar... Haven't really thought too much about it yet, but I was wondering if anyone have done any work on making the term vector response available in a simple manner with solrj yet? Or if this is planned? (In the same sense as it is today with facets (response.getFacetFields() etc..). Not that I cant manage to write it myself, but I would recon that more people than me would be interessted in this. I'd be more than happy to contribute if it is wanted, just wanted to check if anyone have started on this already or not. I think this would be a welcome contribution. -Grant
Re: Multi language search help
On Dec 18, 2008, at 6:25 AM, Sujatha Arun wrote: Hi, I am prototyping language search using solr 1.3. I have 3 fields in the schema: id, content and language. I am indexing 3 pdf files; the languages are foroyo, chinese and japanese. I use xpdf to convert the content of the pdf to text and push the text to solr in the content field. What is the analyzer that I need to use for the above? By using the default text analyzer and posting this content to solr, I am not getting any results. Does solr support stemming for the above languages? I'm not familiar with Foroyo, but there should be tokenizers/analysis available for Chinese and Japanese. Are you putting all three languages into the same field? If that is the case, you will need some type of language detection piece that can choose the correct analyzer. How are your users searching? That is, do you know the language they want to search in? If so, then you can have a field for each language. -Grant
Re: Data Import Request Handler problem: Odd performance behaviour for large number of records
DIH does not maintain any state between two runs. So if there is a perf degradation it could be because:
- Solr indexing is taking longer after you do a delete *:*
- Your RAM is insufficient (your machine is swapping)

On Fri, Dec 19, 2008 at 2:51 AM, Glen Newton wrote:
> Hello,
>
> I am using Solr 1.4 (solr-2008-11-19) with Lucene 2.4 dropped in instead of 2.9.
>
> I am indexing 500k records using the JDBC Data Import Request Handler.
>
> Config:
> Linux openSUSE 10.2 (x86-64)
> Dual dual-core 64-bit Xeon 3GHz Dell blade, 8GB RAM
> java version "1.6.0_07"
> Java(TM) SE Runtime Environment (build 1.6.0_07-b06)
> Java HotSpot(TM) 64-Bit Server VM (build 10.0-b23, mixed mode)
> 1GB heap for Tomcat
> DB: MySQL on a separate but similar server
>
> I am finding that when I do a full-import, followed by another
> full-import, the import takes much longer the second and subsequent
> times:
> Run1 = 0:27:31.491
> Run2 = 1:14:44.821
> Run3 = 1:14:48.316
> Run4 = 2:15:12.296
> Run5 = 1:37:6.847
>
> (I have run this ~10 times and got roughly the same results.) I have
> also monitored the load on the Solr machine and the database machine
> for any other activity that might have an impact.
>
> The final Lucene index size is 923MB. The default is clean = 'true', so
> the index is cleared (emptied) each time; I am concerned that the second
> run takes 4 times as long as the first run.
>
> Am I doing something wrong here? Any help would be appreciated.
>
> I have appended my data-config.xml:
>
> url="jdbc:mysql://blue01/dartejos" user="USER" password="PASSWD"/>
>
> thanks,
>
> Glen

--Noble Paul
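For comparable timing runs, it helps to issue the full-import with explicit commit/optimize settings, since clean=true (the default) deletes the whole index before re-importing. A small sketch of building the DataImportHandler request — the host is a placeholder, and the parameter names are the DIH command parameters as documented at the time:

```python
from urllib.parse import urlencode

def dih_full_import_url(base="http://localhost:8983/solr",
                        clean=True, optimize=True):
    """Build a DataImportHandler full-import URL.  clean=True wipes
    the index first (the default); optimizing after each run keeps
    segment counts comparable between timed runs."""
    params = urlencode({
        "command": "full-import",
        "clean": str(clean).lower(),
        "commit": "true",
        "optimize": str(optimize).lower(),
    })
    return f"{base}/dataimport?{params}"
```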
Re: Change in config file (synonym.txt) requires container restart?
Please note that a core reload will also stop Solr from serving search requests while it reloads.

On Fri, Dec 19, 2008 at 8:24 AM, Sagar Khetkade wrote:
> But I am using CommonsHttpSolrServer for the Solr server configuration, as it accepts the URL. So how can I reload the core in this case?
>
> -Sagar
>
>> Date: Thu, 18 Dec 2008 07:55:02 -0500
>> From: markrmil...@gmail.com
>> To: solr-user@lucene.apache.org
>> Subject: Re: Change in config file (synonym.txt) requires container restart?
>>
>> Sagar Khetkade wrote:
>>> Hi,
>>>
>>> I am using the SolrJ client to connect to the Solr 1.3 server, and the whole POC (doing a feasibility study) resides in a Tomcat web server. If I make any change in the synonym.txt file to add a synonym, I have to restart the Tomcat server for the change to take effect. The SynonymFilterFactory I am using is in the analyzers for both the index and query types in schema.xml. Please tell me whether this approach is good, or whether there is another way to make the change take effect during searching without restarting Tomcat.
>>>
>>> Thanks and Regards,
>>> Sagar Khetkade
>>
>> You can also reload the core.
>>
>> - Mark

--
Regards,
Shalin Shekhar Mangar.
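Since CommonsHttpSolrServer already talks to Solr over HTTP, one way to trigger the reload is to hit the CoreAdmin handler directly (this requires cores to be configured in solr.xml). A sketch of building that request — the host and core name below are placeholders:

```python
from urllib.parse import urlencode

def reload_core_url(base="http://localhost:8983/solr", core="core0"):
    """CoreAdmin RELOAD request for the named core.  Issuing this
    (e.g. with urllib or curl) picks up edited config files such as
    synonyms.txt without restarting the servlet container."""
    return f"{base}/admin/cores?{urlencode({'action': 'RELOAD', 'core': core})}"
```

From SolrJ itself, the CoreAdminRequest helper class offers the same operation programmatically.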
Re: Precisions on solr.xml about cross context forwarding.
: This bothers me too. I find it really strange that Solr's entry-point
: is a servlet filter instead of a servlet.

It traces back to the need for it to decide when to handle a request and when to let it pass through (to a later filter, a servlet, or a JSP). This is the only way legacy support for the /select and /update URLs works without forcing people to modify the web.xml; it's how a handler can be registered with the name /admin/foo even though /admin/ resolves to a JSP (and without forcing people to modify the web.xml); and it's what allows us to use the same core path prefixes for both handler requests and the Admin JSPs.

: "It is unnecessary, and potentially problematic, to have the SolrDispatchFilter
: configured to also filter on forwards. Do not configure
: this dispatcher as FORWARD."
:
: The problem is that if filters do not have this FORWARD thing, then
: cross context forwarding doesn't work.
:
: Is there a workaround to this problem?

You can try adding the FORWARD option, but the risk is that SolrRequestFilter could wind up forwarding to itself infinitely on some requests (depending on your configuration)...

http://www.nabble.com/Re%3A-svn-commit%3A-r640449lucene-solr-trunk-src-webapp-src-org-apache-solr-servlet-SolrDispatchFilter.java-p16262766.html

-Hoss
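For reference, adding the FORWARD option means editing the filter-mapping in Solr's web.xml. A hypothetical fragment sketching that change (the standard servlet `dispatcher` element; use with the caveat above about the filter forwarding to itself):

```xml
<!-- Hypothetical web.xml change: also run the dispatch filter on
     forwarded requests.  Risk: the filter may forward to itself. -->
<filter-mapping>
  <filter-name>SolrRequestFilter</filter-name>
  <url-pattern>/*</url-pattern>
  <dispatcher>REQUEST</dispatcher>
  <dispatcher>FORWARD</dispatcher>
</filter-mapping>
```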
Fwd: Distributed Searching - Limitations?
Hi,

I am planning to use Solr's distributed searching for my project. While going through http://wiki.apache.org/solr/DistributedSearch, I found a few limitations. Can anyone please explain the 2nd and 3rd points in the limitations section on that page? The points are:

- When duplicate doc IDs are received, Solr chooses the first doc and discards subsequent ones
- No distributed idf

Thanks.

Regards,
Pooja
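For context on where those limitations bite, a distributed query is just a normal request with a `shards` parameter listing the cores to fan out to. A sketch of building one (host names are placeholders) — the duplicate-ID limitation means that if the same uniqueKey exists in more than one of the listed shards, only the first document received is kept:

```python
from urllib.parse import urlencode

def sharded_query_url(q, shards, base="http://localhost:8983/solr"):
    """Distributed search request: the receiving node fans the query
    out to each shard in the comma-separated 'shards' list and merges
    the results."""
    params = urlencode({"q": q, "shards": ",".join(shards)})
    return f"{base}/select?{params}"
```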