Lucene Query to Solr query
Hello,

One little question: is there any utility that can convert a core Lucene query (any type, e.g. TermQuery etc.) to a Solr query? It is really a lot of work for me to rewrite existing code.

Thanks,
Reza

--
Reza Safari
LUKKIEN
Copernicuslaan 15
6716 BM Ede
The Netherlands
http://www.lukkien.com
t: +31 (0) 318 698000

This message is for the designated recipient only and may contain privileged, proprietary, or otherwise private information. If you have received it in error, please notify the sender immediately and delete the original. Any other use of the email by you is prohibited.
Re: How to index large set data
On Mon, May 25, 2009 at 10:56 AM, nk 11 wrote:
> Hello
> Interesting thread. One request please, because I don't have much experience
> with solr, could you please use full terms and not DIH, RES etc.?

nk11, DIH = DataImportHandler; RES = the resident-memory column from the Linux "top" output quoted below. It is unavoidable that we end up using short names because of laziness/lack of time, but if you ever come across one, do not hesitate to ask. We will be more than glad to clarify.

> On Mon, May 25, 2009 at 4:44 AM, Jianbin Dai wrote:
>>
>> Hi Paul,
>>
>> Hope you have a great weekend so far. I still have a couple of questions
>> you might help me out with:
>>
>> 1. In your earlier email, you said "if possible, you can set up multiple
>> DIH say /dataimport1, /dataimport2 etc and split your files and can
>> achieve parallelism". I am not sure if I understand it right. I put two
>> requestHandlers in solrconfig.xml, like this:
>>
>>   <requestHandler name="/dataimport"
>>       class="org.apache.solr.handler.dataimport.DataImportHandler">
>>     <lst name="defaults">
>>       <str name="config">./data-config.xml</str>
>>     </lst>
>>   </requestHandler>
>>
>>   <requestHandler name="/dataimport2"
>>       class="org.apache.solr.handler.dataimport.DataImportHandler">
>>     <lst name="defaults">
>>       <str name="config">./data-config2.xml</str>
>>     </lst>
>>   </requestHandler>
>>
>> and created data-config.xml and data-config2.xml, then ran the command
>> http://host:8080/solr/dataimport?command=full-import
>>
>> But only one data set (the first one) was indexed. Did I get something
>> wrong?
>>
>> 2. I noticed that after Solr indexed about 8M documents (around two
>> hours), it gets very, very slow. I use the "top" command in Linux and
>> noticed that RES is 1g of memory. I did several experiments; every time
>> RES reaches 1g, the indexing process becomes extremely slow. Is this
>> memory limit set by the JVM? And how can I set the JVM memory when I use
>> DIH through the web command full-import?
>>
>> Thanks!
>>
>> JB
>>
>> --- On Fri, 5/22/09, Noble Paul നോബിള്‍ नोब्ळ् wrote:
>>
>> > On Sat, May 23, 2009 at 10:27 AM, Jianbin Dai wrote:
>> > >
>> > > Hi Paul, but in your previous post, you said "there is already an
>> > > issue for writing to Solr in multiple threads SOLR-1089". Do you
>> > > think using solrj alone would be better than DIH?
>> >
>> > nope. you will have to do indexing in multiple threads.
>> >
>> > if possible, you can set up multiple DIH say /dataimport1,
>> > /dataimport2 etc and split your files and can achieve parallelism.
>> >
>> > > Thanks and have a good weekend!
>> > >
>> > > --- On Fri, 5/22/09, Noble Paul നോബിള്‍ नोब्ळ् wrote:
>> > >
>> > >> no need to use embedded Solrserver. you can use SolrJ with
>> > >> streaming, in multiple threads.
>> > >>
>> > >> On Fri, May 22, 2009 at 8:36 PM, Jianbin Dai wrote:
>> > >> >
>> > >> > If I do the xml parsing by myself and use an embedded client to
>> > >> > do the push, would it be more efficient than DIH?
>> > >> >
>> > >> > --- On Fri, 5/22/09, Grant Ingersoll wrote:
>> > >> >
>> > >> >> Can you parallelize this? I don't know that the DIH can handle
>> > >> >> it, but having multiple threads sending docs to Solr is the
>> > >> >> best performance-wise, so maybe you need to look at
>> > >> >> alternatives to pulling with DIH and instead use a client to
>> > >> >> push into Solr.
>> > >> >>
>> > >> >> On May 22, 2009, at 3:42 AM, Jianbin Dai wrote:
>> > >> >>
>> > >> >> > about 2.8M total docs were created. only the first run
>> > >> >> > finishes. In my 2nd try, it hangs there forever at the end of
>> > >> >> > indexing (I guess right before commit), with cpu usage of
>> > >> >> > 100%. Total 5G (2050) index files are created. Now I have two
>> > >> >> > problems:
>> > >> >> > 1. why did it hang there and fail?
>> > >> >> > 2. how can I speed up the indexing?
>> > >> >> >
>> > >> >> > Here is my solrconfig.xml:
>> > >> >> > false 3000 1000 2147483647 1 false
>> > >> >> >
>> > >> >> > --- On Thu, 5/21/09, Noble Paul നോബിള്‍ नोब्ळ् wrote:
>> > >> >> >
>> > >> >> >> what is the total no: of docs created?
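As a rough illustration of the "SolrJ in multiple threads" approach Noble describes above, here is a minimal sketch; the URL, field names, document count and thread count are all made up for the example:

    import org.apache.solr.client.solrj.SolrServer;
    import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;
    import org.apache.solr.common.SolrInputDocument;

    public class ParallelPush {
      public static void main(String[] args) throws Exception {
        // One shared, thread-safe server instance.
        final SolrServer server = new CommonsHttpSolrServer("http://localhost:8983/solr");
        Thread[] workers = new Thread[4];
        for (int i = 0; i < workers.length; i++) {
          final int slice = i;
          workers[i] = new Thread(new Runnable() {
            public void run() {
              try {
                // Each thread parses and pushes its own slice of the input.
                for (int n = slice; n < 1000; n += 4) {
                  SolrInputDocument doc = new SolrInputDocument();
                  doc.addField("id", "doc-" + n);
                  doc.addField("text", "body of document " + n);
                  server.add(doc);
                }
              } catch (Exception e) {
                e.printStackTrace();
              }
            }
          });
          workers[i].start();
        }
        for (Thread t : workers) t.join();
        server.commit(); // one commit at the end, not per document
      }
    }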
Re: Lucene Query to Solr query
If you use the SolrJ client to perform searches, does this not work for you?

  SolrQuery solrQuery = new SolrQuery();
  solrQuery.setQuery(myLuceneQuery.toString());
  QueryResponse response = mySolrServer.query(solrQuery);

Cheers
Avlesh

On Mon, May 25, 2009 at 12:39 PM, Reza Safari wrote:
> Hello,
>
> One little question: is there any utility that can convert a core Lucene
> query (any type, e.g. TermQuery etc.) to a Solr query? It is really a lot
> of work for me to rewrite existing code.
>
> Thanks,
> Reza
Sending Mlt POST request
Hello,

I wish to send an MLT (MoreLikeThis) request to Solr and filter the result by a list of values for a specific field. The problem is that sometimes the list can include thousands of values, and it's impossible to send such a GET request. Sending the request as POST didn't work well... Is POST supported by MLT? If not, is it supposed to be added in one of the next versions? Or is there a different solution maybe?

I will appreciate any help and advice,
Thanks,
Ohad.
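A minimal SolrJ sketch of sending such a request as a POST, assuming a SolrJ build whose query() accepts a method argument; the handler name, field name, and filter values below are made up:

    import org.apache.solr.client.solrj.SolrQuery;
    import org.apache.solr.client.solrj.SolrRequest;
    import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;
    import org.apache.solr.client.solrj.response.QueryResponse;

    CommonsHttpSolrServer server = new CommonsHttpSolrServer("http://localhost:8983/solr");
    SolrQuery q = new SolrQuery("id:12345");
    q.setQueryType("/mlt");                       // route to a MoreLikeThis handler
    q.set("mlt.fl", "content");                   // similarity field
    q.addFilterQuery("myfield:(v1 OR v2 OR v3)"); // the long value list goes here
    // POST keeps the long filter out of the URL
    QueryResponse rsp = server.query(q, SolrRequest.METHOD.POST);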
Is it memory leaking in solr?
I am using DIH to do indexing. After I indexed about 8M documents (which took about 1hr40m), it used up almost all memory (4GB), and the indexing became extremely slow. If I delete all the indexes and shut down Tomcat, it still shows over 3GB of memory in use. Is it a memory leak? If it is, is the leak in Solr indexing or in DIH? Thanks.
RE: Boolean query in Solr
Hi Erik,

This mail just got into my junk folder, so it was left unread. After I turned debugQuery on and fired the same query, I got some different vibes from it. Suppose the query is Content:xyz AND Ticket_Id:(123 OR 1234), and the search term "xyz" is in the stop word list; even then, search results were retrieved. The expectation was that no documents would be retrieved. Here I am getting documents where "xyz" is not there at all. The debug query was q=Content:xyz+AND+Ticket_Id:4. When I used SolrAdmin, I saw that the parsed query from Solr was omitting the search field (Content), as the term is marked as a stop word, but was taking the Ticket_Id and giving me the results.

My question is whether it is possible in Solr to fire a query that retrieves only documents having the search term (in Content) within the selected Ticket_Id.

Thanks in advance.
~ Sagar

> From: e...@ehatchersolutions.com
> To: solr-user@lucene.apache.org
> Subject: Re: Boolean query in Solr
> Date: Tue, 14 Apr 2009 09:33:27 -0400
>
> On Apr 14, 2009, at 5:38 AM, Sagar Khetkade wrote:
> >
> > Hi,
> > I am using SolrJ and firing the query on Solr indexes. The index
> > contains three fields, viz.
> > 1. Document_id (type=integer required=true)
> > 2. Ticket_Id (type=integer)
> > 3. Content (type=text)
> >
> > Here the query formulation is such that I am having a query with an
> > "AND" clause. So the query that I am firing on the index files looks
> > like "Content: search query AND Ticket_id:123 Ticket_Id:789)".
>
> That query is invalid query parser syntax, with an unopened paren first
> of all. I assume that's a typo though. Be careful in how you construct
> queries with field selectors. Saying:
>
>   Content:search query
>
> does NOT necessarily mean that the term "query" is being searched in
> the Content field, as that depends on your default field setting for
> the query parser. This, however, does use the Content field for both
> terms:
>
>   Content:(search query)
>
> > I know this type of query is easily fired on Lucene indexes. But
> > when I am firing the above query I am not getting the required
> > result. The result contains documents which do not belong to the
> > ticket id mentioned in the query.
> > Please can anyone help me out of this issue.
>
> What does the query parse to with &debugQuery output? That's mighty
> informative info.
>
> Erik
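For reference, the parsed query Sagar mentions can be inspected by adding debugQuery=on to any request; a request of this shape (host and port are illustrative, field names from the thread) returns a parsedquery entry in the debug section of the response:

    http://localhost:8983/solr/select?q=Content:xyz+AND+Ticket_Id:(123+OR+1234)&debugQuery=on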
Recover crashed solr index
Hi everyone,

I have 8M docs to index, and each doc is around 50KB. Solr crashed in the middle of indexing; the error message said that one of the files in the data directory is missing. I don't know why this happened. So right now I have to find a way to recover the index to avoid re-indexing. Does anyone know of any tools or methods to recover the crashed index? Please help.

Thanks a lot.
Regards
GC
Re: Lucene Query to Solr query
Hmmm, overriding toString() can work wonders. I will try as you suggested. Thanx for the quick reply.

Gr, Reza

On May 25, 2009, at 9:34 AM, Avlesh Singh wrote:

> If you use the SolrJ client to perform searches, does this not work for you?
>
>   SolrQuery solrQuery = new SolrQuery();
>   solrQuery.setQuery(myLuceneQuery.toString());
>   QueryResponse response = mySolrServer.query(solrQuery);
>
> Cheers
> Avlesh
RE: Filtering query terms
Hi,

I tested the new filter configuration and it works fine. The problem with ISOLatin1AccentFilterFactory was not due to Solr, but to a core-dependent configuration in a Solr multi-core environment. It was only necessary to set the 'splitOnCaseChange' property to 0 in solr.WordDelimiterFilterFactory.

Thanks for your support,
Marco

Marco Branca
Consultant
Sytel Reply S.r.l.
Via Ripamonti, 89 - 20139 Milano
Mobile: (+39) 348 2298186
e-mail: m.bra...@reply.it
Website: www.reply.eu

From: Ensdorf Ken [ensd...@zoominfo.com]
Sent: Friday, 22 May 2009 18.16
To: 'solr-user@lucene.apache.org'
Subject: RE: Filtering query terms

> When I try testing the filter "solr.LowerCaseFilterFactory" I get
> different results calling the following urls:
>
> 1. http://[server-ip]:[server-port]/solr/[core-name]/select/?q=all%3Apapa&version=2.2&start=0&rows=10&indent=on
> 2. http://[server-ip]:[server-port]/solr/[core-name]/select/?q=all%3APaPa&version=2.2&start=0&rows=10&indent=on

In this case, the WordDelimiterFilterFactory is kicking in on your second search, so "PaPa" is split into "Pa" and "Pa". You can double-check this by using the analysis tool in the admin UI - http://localhost:8983/solr/admin/analysis.jsp

> Besides, when trying to test the "solr.ISOLatin1AccentFilterFactory" I
> get different results calling the following urls:
>
> 1. http://[server-ip]:[server-port]/solr/[core-name]/select/?q=all%3Apapa&version=2.2&start=0&rows=10&indent=on
> 2. http://[server-ip]:[server-port]/solr/[core-name]/select/?q=all%3Apapà&version=2.2&start=0&rows=10&indent=on

Not sure what is happening here, but again I would check it with the analysis tool.

--
The information transmitted is intended for the person or entity to which it is addressed and may contain confidential and/or privileged material. Any review, retransmission, dissemination or other use of, or taking of any action in reliance upon, this information by persons or entities other than the intended recipient is prohibited. If you received this in error, please contact the sender and delete the material from any computer.
Re: Lucene Query to Solr query
You missed the point, Reza. toString *has to be implemented* by all Query objects in Lucene. All you have to do is compose the right Lucene query matching your needs (any combination of TermQueries, BooleanQueries, RangeQueries etc.) and just do a luceneQuery.toString() when performing a Solr query.

Thinking aloud, does it make sense for the SolrQuery object to take a Lucene Query object? I am suggesting something like this - SolrQuery.setQuery(org.apache.lucene.search.Query luceneQuery)

Cheers
Avlesh

On Mon, May 25, 2009 at 2:32 PM, Reza Safari wrote:
> Hmmm, overriding toString() can work wonders. I will try as you suggested.
> Thanx for the quick reply.
>
> Gr, Reza
RE: Solr statistics of top searches and results returned
Hi all,

I created a script that uses a Solr SearchComponent, which hooks into the main Solr core and catches the searches being done. After this it tokenizes the search and sends both the tokenized and the original query to another Solr core. I have not written a factory for this, but if required, it shouldn't be too hard to modify the script and add database support to it. You can find the source here: http://www.ipros.nl/uploads/Stats-component.zip It includes a README and a schema.xml that should be used.

Please let me know your thoughts.

Best,
Patrick

-----Original Message-----
From: Umar Shah [mailto:u...@wisdomtap.com]
Sent: Friday, 22 May 2009 10:03
To: solr-user@lucene.apache.org
Subject: Re: Solr statistics of top searches and results returned

Hi,

Good feature to have. Maintaining a top N would also require storing all the search queries done so far and keeping them updated (or at least within some time window). Having pluggable persistent storage for all-time search queries would be great. Tell me how I can help?

-umar

On Fri, May 22, 2009 at 12:21 PM, Shalin Shekhar Mangar wrote:
> On Fri, May 22, 2009 at 3:22 AM, Grant Ingersoll wrote:
>>
>> I think you will want some type of persistence mechanism otherwise
>> you will end up consuming a lot of resources keeping track of all the
>> query strings, unless I'm missing something. Either a Lucene index
>> (Solr core) or the option of embedding a DB. Ideally, it would be
>> pluggable such that people could choose their storage mechanism.
>> Most people do this kind of thing offline via log analysis as logs can
>> grow quite large quite quickly.
>
> For a general case, yes. But I was thinking more of a top 'n' queries
> as a running statistic.
>
> --
> Regards,
> Shalin Shekhar Mangar.
Index size concerns
Salaam,

We are using apache-solr to index our files for faster searches. All things happen without a problem; my only concern is the size of the cache. It seems that the trend is that if I cache 1 GB of files, the index goes to 800MB, i.e. we are seeing an 80% cache size. Is this normal, or am I missing something in the configuration of Solr?

Thanks and regards,
Muhammed Sameer
Re: Getting 404 for MoreLikeThis handler
jlist9 wrote:
> Thanks. Will that still be the MoreLikeThisRequestHandler? Or the
> StandardRequestHandler with the mlt option?

Yes, the StandardRequestHandler. The MoreLikeThisComponent is available by default. Set mlt=on when you want to get MLT results.

Koji
Re: Plugin Not Found
hi jeff,
look at these lines in the log:

May 22, 2009 7:38:25 AM org.apache.solr.core.SolrResourceLoader
INFO: Solr home set to '/home/zetasolr/'
May 22, 2009 7:38:25 AM org.apache.solr.core.SolrResourceLoader createClassLoader
INFO: Adding 'file:/home/zetasolr/lib/FacetCubeComponent.jar' to Solr classloader
May 22, 2009 7:38:25 AM org.apache.solr.core.SolrResourceLoader
INFO: Solr home set to '/home/zetasolr/cores/zeta-main/'
May 22, 2009 7:38:25 AM org.apache.solr.core.SolrResourceLoader createClassLoader
INFO: Reusing parent classloader

this means that for the core, Solr is just reusing the webapp classloader instead of creating its own. which version of Solr are you using? is it possible for you to apply this patch and see if you get a different error message?

--

On Fri, May 22, 2009 at 8:15 PM, Jeff Newburn wrote:
> I have included the configuration and the log for the error on startup. It
> does appear it tries to load the lib but then simply can't reference it.
>
> default="true"
>   explicit
>   0.01
>   productId^10.0 personality^15.0 subCategory^20.0 category^10.0 productType^8.0
>   brandName^10.0 realBrandName^9.5 productNameSearch^20 size^1.2 width^1.0
>   heelHeight^1.0 productDescription^5.0 color^6.0 price^1.0 expandedGender^0.5
>
>   brandName^5.0 productNameSearch^5.0 productDescription^5.0 personality^10.0
>   subCategory^20.0 category^10.0 productType^8.0
>
>   productId, productName, price, originalPrice, brandNameFacet, productRating,
>   imageUrl, productUrl, isNew, onSale
>
>   rord(popularity)^1
>   100%
>   1
>   5
>   *:*
>
>   <str name="mlt.fl">brandNameFacet,productTypeFacet,productName,categoryFacet,subCategoryFacet,personalityFacet,colorFacet,heelHeight,expandedGender</str>
>   1
>   1
>
>   spellcheck
>   facetcube
>
> <searchComponent name="facetcube" class="com.zappos.solr.FacetCubeComponent"/>
>
> LOGS
> May 22, 2009 7:38:24 AM org.apache.catalina.startup.SetAllPropertiesRule begin
> WARNING: [SetAllPropertiesRule]{Server/Service/Connector} Setting property
> 'maxProcessors' to '500' did not find a matching property.
> May 22, 2009 7:38:24 AM org.apache.catalina.startup.SetAllPropertiesRule begin
> WARNING: [SetAllPropertiesRule]{Server/Service/Connector} Setting property
> 'maxProcessors' to '500' did not find a matching property.
> May 22, 2009 7:38:24 AM org.apache.catalina.core.AprLifecycleListener init > INFO: The APR based Apache Tomcat Native library which allows optimal > performance in production environments was not found on the > java.library.path: /usr/local/apr/lib > May 22, 2009 7:38:24 AM org.apache.tomcat.util.net.NioSelectorPool > getSharedSelector > INFO: Using a shared selector for servlet write/read > May 22, 2009 7:38:24 AM org.apache.coyote.http11.Http11NioProtocol init > INFO: Initializing Coyote HTTP/1.1 on http-8080 > May 22, 2009 7:38:24 AM org.apache.tomcat.util.net.NioSelectorPool > getSharedSelector > INFO: Using a shared selector for servlet write/read > May 22, 2009 7:38:24 AM org.apache.coyote.http11.Http11NioProtocol init > INFO: Initializing Coyote HTTP/1.1 on http-8443 > May 22, 2009 7:38:24 AM org.apache.catalina.startup.Catalina load > INFO: Initialization processed in 1011 ms > May 22, 2009 7:38:24 AM org.apache.catalina.core.StandardService start > INFO: Starting service Catalina > May 22, 2009 7:38:24 AM org.apache.catalina.core.StandardEngine start > INFO: Starting Servlet Engine: Apache Tomcat/6.0.16 > May 22, 2009 7:38:24 AM org.apache.catalina.startup.HostConfig deployWAR > INFO: Deploying web application archive solr.war > May 22, 2009 7:38:25 AM org.apache.solr.servlet.SolrDispatchFilter init > INFO: SolrDispatchFilter.init() > May 22, 2009 7:38:25 AM org.apache.solr.core.SolrResourceLoader > locateInstanceDir > INFO: No /solr/home in JNDI > May 22, 2009 7:38:25 AM org.apache.solr.core.SolrResourceLoader > locateInstanceDir > INFO: using system property solr.solr.home: /home/zetasolr > May 22, 2009 7:38:25 AM org.apache.solr.core.CoreContainer$Initializer > initialize > INFO: looking for solr.xml: /home/zetasolr/solr.xml > May 22, 2009 7:38:25 AM org.apache.solr.core.SolrResourceLoader > INFO: Solr home set to '/home/zetasolr/' > May 22, 2009 7:38:25 AM org.apache.solr.core.SolrResourceLoader > createClassLoader > INFO: Adding 'file:/home/zetasolr/lib/FacetCubeComponent.jar' to Solr > classloader > May 22, 2009 7:38:25 AM org.apache.solr.core.SolrResourceLoader > INFO: Solr home set to '/home/zetasolr/cores/zeta-main/' > May 22, 2009 7:38:25 AM
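For anyone hitting the same "Reusing parent classloader" symptom: one thing worth checking (a hedged suggestion, not a confirmed fix for this thread) is whether the plugin jar is visible to the core's own loader, either via a per-core lib directory (e.g. /home/zetasolr/cores/zeta-main/lib) or by pointing the cores at the shared directory explicitly in solr.xml, along these lines:

    <solr persistent="true" sharedLib="lib">
      <cores adminPath="/admin/cores">
        <core name="zeta-main" instanceDir="cores/zeta-main"/>
      </cores>
    </solr>

The core name and paths above are taken from the log; the sharedLib path is resolved relative to the Solr home.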
Re: Getting 404 for MoreLikeThis handler
That's the standard request handler. You have to create a mapping in solrconfig.xml to the MoreLikeThisHandler (not MoreLikeThis*Request*Handler) in order to use that. It is not mapped in the default example config (at least on trunk).

Erik

On May 24, 2009, at 11:08 PM, jlist9 wrote:
> Thanks. Will that still be the MoreLikeThisRequestHandler? Or the
> StandardRequestHandler with the mlt option?
>
>> Hi, I'm trying out the mlt handler but I'm getting a 404 error.
>> HTTP Status 404 - /solr/mlt
>> solrconfig.xml seems to say that the mlt handler is available by
>> default. I wonder if there's anything else I should do before I can
>> use it? I'm using version 1.3.
>
> Try /solr/select with the mlt=on parameter.
> Koji
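For what it's worth, a minimal sketch of such a mapping in solrconfig.xml (the handler name and default field here are illustrative):

    <requestHandler name="/mlt" class="solr.MoreLikeThisHandler">
      <lst name="defaults">
        <str name="mlt.fl">content</str>
      </lst>
    </requestHandler>

With that in place, requests would go to /solr/mlt?q=id:123 instead of /solr/select.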
Re: Lucene Query to Solr query
Warning: toString on a Query object is *NOT* guaranteed to be parsable back into the same Query. Don't use Query.toString() in this manner.

What you probably want to do is create your own QParserPlugin for Solr that creates the Query however you need from textual parameters from the client.

Erik

On May 25, 2009, at 5:16 AM, Avlesh Singh wrote:
> You missed the point, Reza. toString *has to be implemented* by all Query
> objects in Lucene. All you have to do is compose the right Lucene query
> matching your needs and just do a luceneQuery.toString() when performing
> a Solr query.
>
> Thinking aloud, does it make sense for the SolrQuery object to take a
> Lucene Query object? I am suggesting something like this -
> SolrQuery.setQuery(org.apache.lucene.search.Query luceneQuery)
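To make Erik's suggestion concrete, a minimal QParserPlugin sketch; the class, package, and field names are made up, and the query construction is deliberately trivial:

    import org.apache.lucene.index.Term;
    import org.apache.lucene.queryParser.ParseException;
    import org.apache.lucene.search.Query;
    import org.apache.lucene.search.TermQuery;
    import org.apache.solr.common.params.SolrParams;
    import org.apache.solr.common.util.NamedList;
    import org.apache.solr.request.SolrQueryRequest;
    import org.apache.solr.search.QParser;
    import org.apache.solr.search.QParserPlugin;

    public class MyQParserPlugin extends QParserPlugin {
      public void init(NamedList args) {}

      public QParser createParser(String qstr, SolrParams localParams,
                                  SolrParams params, SolrQueryRequest req) {
        return new QParser(qstr, localParams, params, req) {
          public Query parse() throws ParseException {
            // Build whatever Lucene Query you need straight from the
            // request parameters -- no string round-trip involved.
            return new TermQuery(new Term("content", qstr));
          }
        };
      }
    }

It would be registered in solrconfig.xml with something like <queryParser name="myparser" class="com.example.MyQParserPlugin"/> and invoked with q={!myparser}... on the request.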
exceptions when using existing index with latest build
Building Solr last night from updated svn, I'm now getting the exception below when I use any fq parameter searching a pre-existing index. So far, I cannot fix it by tweaking config files; I had to delete and re-index. I note that Solr was recently updated to the latest Lucene build, so maybe something broke in the index format?

here's the relevant part of the trace:

org.apache.lucene.index.ReadOnlySegmentReader cannot be cast to org.apache.solr.search.SolrIndexReader
java.lang.ClassCastException: org.apache.lucene.index.ReadOnlySegmentReader cannot be cast to org.apache.solr.search.SolrIndexReader
        at org.apache.solr.search.SortedIntDocSet$2.getDocIdSet(SortedIntDocSet.java:530)
        at org.apache.lucene.search.IndexSearcher.doSearch(IndexSearcher.java:237)
        at org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:221)
        at org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:212)
        at org.apache.lucene.search.Searcher.search(Searcher.java:150)
        at org.apache.solr.search.SolrIndexSearcher.getDocListNC(SolrIndexSearcher.java:1032)
        at org.apache.solr.search.SolrIndexSearcher.getDocListC(SolrIndexSearcher.java:894)
        at org.apache.solr.search.SolrIndexSearcher.search(SolrIndexSearcher.java:337)
        at org.apache.solr.handler.component.QueryComponent.process(QueryComponent.java:176)
        at org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:195)
        at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:131)
        at org.apache.solr.core.SolrCore.execute(SolrCore.java:1328)

--
Peter M. Wolanin, Ph.D.
Momentum Specialist, Acquia. Inc.
peter.wola...@acquia.com
Re: exceptions when using existing index with latest build
Peter - I posted this to the solr-dev list this morning also. The thread to follow is over there.

Erik

On May 25, 2009, at 9:05 AM, Peter Wolanin wrote:
> Building Solr last night from updated svn, I'm now getting the
> exception below when I use any fq parameter searching a pre-existing
> index. [...]
Re: More questions about MoreLikeThis
jlist9 wrote:
> The wiki page (http://wiki.apache.org/solr/MoreLikeThis) says:
> mlt.fl: The fields to use for similarity. NOTE: if possible, these
> should have a stored TermVector
> I didn't set TermVector to true, and MoreLikeThis with the
> StandardRequestHandler seems to work fine. The first question is: is
> TermVector only for performance optimization?

I think yes.

> The second question is: after I changed the mlt.fl fields from both
> indexed and stored to indexed only, I started to get zero results back.
> Do mlt.fl fields always need to be stored? Thanks

MLT uses the termVector if it exists for the field. If a termVector is not available, MLT tries to get the stored field data. If the stored field is not available either, MLT does nothing for the field, as you were seeing. So mlt.fl fields don't always need to be stored.

Koji
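In schema.xml terms, a field that lets MLT use term vectors (rather than falling back to stored content) carries the termVectors attribute; a sketch with an illustrative field name and type:

    <field name="content" type="text" indexed="true" stored="true" termVectors="true"/>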
problem with Solritas (show only some of the fields)
Hello... I have a problem. I want to use Solritas for Solr, but at the moment Solritas presents all fields in the results. I must change it to present only some, like id, name, cat and inStock (example documents). I think this is the code that outputs all fields:

#foreach($fieldname in $doc.fieldNames)
  $fieldname :
  #foreach($value in $doc.getFieldValues($fieldname))
    $value
  #end
#end
#if($params.getBool("debugQuery",false))
  toggle explain
  $response.getExplainMap().get($doc.getFirstValue('id'))
#end

Maybe someone can explain to me how I can change the code to get only some fields.

Jörg Agatz
Re: Index size concerns
On Mon, May 25, 2009 at 3:53 PM, Muhammed Sameer wrote:
>
> We are using apache-solr to index our files for faster searches. All
> things happen without a problem; my only concern is the size of the cache.
>
> It seems that the trend is that if I cache 1 GB of files, the index goes
> to 800MB, i.e. we are seeing an 80% cache size.
>
> Is this normal, or am I missing something in the configuration of Solr?

I'm sorry, I do not understand your question. Which files are you talking about? The Solr cache has got nothing to do with files. It caches the query/filter results and Solr documents.

--
Regards,
Shalin Shekhar Mangar.
Re: Lucene Query to Solr query
Point taken, Erik. But is there really a downside to using Query.toString() if someone is not using any of the complex Query subclasses (like a SpanQuery)?

Cheers
Avlesh

On Mon, May 25, 2009 at 5:38 PM, Erik Hatcher wrote:
> Warning: toString on a Query object is *NOT* guaranteed to be parsable
> back into the same Query. Don't use Query.toString() in this manner.
>
> What you probably want to do is create your own QParserPlugin for Solr
> that creates the Query however you need from textual parameters from the
> client.
>
> Erik
Re: Lucene Query to Solr query
On Mon, May 25, 2009 at 9:16 PM, Avlesh Singh wrote: > Point taken, Erik. But, is there really a downside towards using > Query.toString() if someone is not using any of the complex Query > Subclasses > (like a SpanQuery)? > Well, you will be relying on undocumented behavior that might change in future releases. Also, most (none?) Query objects do not have a parseable toString representation so it may not even work at all. -- Regards, Shalin Shekhar Mangar.
Re: Lucene Query to Solr query
> Also, most (none?) Query objects do not have a parseable toString
> representation so it may not even work at all.

IMO, this behavior is limited to the subclasses of SpanQuery. Anyway, I understand the general notion here.

Cheers
Avlesh

On Mon, May 25, 2009 at 9:30 PM, Shalin Shekhar Mangar wrote:
> On Mon, May 25, 2009 at 9:16 PM, Avlesh Singh wrote:
>
> > Point taken, Erik. But is there really a downside to using
> > Query.toString() if someone is not using any of the complex Query
> > subclasses (like a SpanQuery)?
>
> Well, you will be relying on undocumented behavior that might change in
> future releases.
>
> Also, most (none?) Query objects do not have a parseable toString
> representation so it may not even work at all.
>
> --
> Regards,
> Shalin Shekhar Mangar.
Re: Recover crashed solr index
You can use the Lucene jar shipped with Solr to invoke the CheckIndex method - this will possibly allow you to recover if you pass the -fix param. You may lose some docs, however, so this is only viable if you can, for example, query to check what's missing.

The command looks like this (from the root of the Solr svn checkout):

java -ea:org.apache.lucene -cp lib/lucene-core-2.9-dev.jar org.apache.lucene.index.CheckIndex [path to index directory]

For example, to check the example index:

java -ea:org.apache.lucene -cp lib/lucene-core-2.9-dev.jar org.apache.lucene.index.CheckIndex example/solr/data/index/

-Peter

On Mon, May 25, 2009 at 4:42 AM, Wang Guangchen wrote:
> Hi everyone,
>
> I have 8M docs to index, and each doc is around 50KB. Solr crashed in the
> middle of indexing; the error message said that one of the files in the
> data directory is missing. I don't know why this happened.
>
> So right now I have to find a way to recover the index to avoid
> re-indexing. Does anyone know of any tools or methods to recover the
> crashed index? Please help.
>
> Thanks a lot.
>
> Regards
> GC

--
Peter M. Wolanin, Ph.D.
Momentum Specialist, Acquia. Inc.
peter.wola...@acquia.com
Re: Is it memory leaking in solr?
Again, indexing becomes extremely slow after indexing 8M documents (about 25GB of original file size). Here is the memory usage info of my computer. Does this have anything to do with the Tomcat settings? Thanks.

top - 08:09:53 up 7:22, 1 user, load average: 1.03, 1.01, 1.00
Tasks: 78 total, 2 running, 76 sleeping, 0 stopped, 0 zombie
Cpu(s): 49.9%us, 0.2%sy, 0.0%ni, 49.8%id, 0.2%wa, 0.0%hi, 0.0%si, 0.0%st
Mem: 4044776k total, 3960740k used, 84036k free, 42196k buffers
Swap: 2031608k total, 84k used, 2031524k free, 2729892k cached

 PID  USER  PR  NI  VIRT   RES   SHR  S  %CPU  %MEM  TIME+      COMMAND
 3322 root  21  0   1357m  1.0g  11m  S  100   27.0  397:51.74  java

--- On Mon, 5/25/09, Jianbin Dai wrote:
> I am using DIH to do indexing. After I indexed about 8M documents (which
> took about 1hr40m), it used up almost all memory (4GB), and the indexing
> became extremely slow. If I delete all the indexes and shut down Tomcat,
> it still shows over 3GB of memory in use. Is it a memory leak? If it is,
> is the leak in Solr indexing or in DIH? Thanks.
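On the JVM-memory question raised earlier in the thread: the heap is set on the servlet container, not through DIH. For Tomcat, one common way (the values below are only an example) is to export JAVA_OPTS before startup, e.g. in catalina.sh's environment or a setenv.sh:

    JAVA_OPTS="-Xms512m -Xmx2048m"

A RES of about 1GB as shown above would be consistent with a heap capped near 1GB.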
Re: Index size concerns
Salaam,

Sorry for that; here is the big picture. Actually we use Solr to index all the mails that come to us so that we can allow for faster lookups. We have seen that after our mail server accepts, say, a GB of mails, the index size goes up to 800MB. I hope that this time I am clear in conveying the problem. What I wanted to know is: is this index size normal?

Regards,
Muhammed Sameer

--- On Mon, 5/25/09, Shalin Shekhar Mangar wrote:
> I'm sorry, I do not understand your question. Which files are you talking
> about? The Solr cache has got nothing to do with files. It caches the
> query/filter results and Solr documents.
>
> --
> Regards,
> Shalin Shekhar Mangar.
Re: grouping response docs together
Thanks guys. I looked at the dedup stuff, but the documents I'm adding aren't really duplicates. They're very similar, but different. I checked out the field collapsing feature patch, applied the patch but can't get it to build successfully. Will this patch work with a nightly build? Thanks! On Fri, May 15, 2009 at 7:47 PM, Otis Gospodnetic < otis_gospodne...@yahoo.com> wrote: > > Matt - you may also want to detect near duplicates at index time: > > http://wiki.apache.org/solr/Deduplication > > Otis > -- > Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch > > > > - Original Message > > From: Matt Mitchell > > To: solr-user@lucene.apache.org > > Sent: Friday, May 15, 2009 6:52:48 PM > > Subject: grouping response docs together > > > > Is there a built-in mechanism for grouping similar documents together in > the > > response? I'd like to make it look like there is only one document with > > multiple "hits". > > > > Matt > >
Re: grouping response docs together
Hello Matt,

the patch should work with trunk, and after a small fix with 1.3 too (see my comment in SOLR-236). I just made a successful build to be sure.

Do you see any error messages?

Thomas

Matt Mitchell wrote:
> Thanks guys. I looked at the dedup stuff, but the documents I'm adding
> aren't really duplicates. They're very similar, but different.
>
> I checked out the field collapsing feature patch, applied the patch but
> can't get it to build successfully. Will this patch work with a nightly
> build?
>
> Thanks!
issues with shards
hello, I'm using Solr 1.3 and having some problems when I search with the shards parameter, for example: shards=localhost:9090/isearch (I'm using 9090 as the default port). I get this error:

> INFO: Filter queries (object): [null]
> 25/05/2009 17:06:33 org.apache.solr.core.SolrCore execute
> INFO: webapp=null path=null
> params={facet.zeros=false&facet=true&hl.autofield.excluderegex=.*(?:_blob)$&facet.limit=200&hl.simple.pre=&hl.autofields=true&ling=none&hl=true&fl=id,score&allcats=0&hl.autofield.regex=^(?:(?:show)|(?:ctrl))##.%2B&hl.simple.post=&hl.merge=false&fsv=true&fq=*:*&hl.fragsize=100&hl.fl=&IDENTIFICADOR_ACESSO=&wt=javabin&rows=10&hl.snippets=3&start=0&q=content:(cesar)&idAcl=&hl.notags=true&isShard=true}
> hits=46 status=0 QTime=297
> 25/05/2009 17:06:33 org.apache.solr.common.SolrException log
> SEVERE: java.lang.RuntimeException: This is a binary writer , Cannot write to a characterstream
>         at org.apache.solr.request.BinaryResponseWriter.write(BinaryResponseWriter.java:48)
>         at org.apache.solr.servlet.SolrServlet.doGet(SolrServlet.java:89)
>         at org.apache.solr.servlet.SolrServlet.doPost(SolrServlet.java:65)
>         at javax.servlet.http.HttpServlet.service(HttpServlet.java:637)
>         at javax.servlet.http.HttpServlet.service(HttpServlet.java:717)
>         at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:290)
>         at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206)
>         at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:233)
>         at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:191)
>         at org.apache.catalina.authenticator.AuthenticatorBase.invoke(AuthenticatorBase.java:433)
>         at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:128)
>         at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:102)
>         at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:109)
>         at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:286)
>         at org.apache.coyote.http11.Http11Processor.process(Http11Processor.java:845)
>         at org.apache.coyote.http11.Http11Protocol$Http11ConnectionHandler.process(Http11Protocol.java:583)
>         at org.apache.tomcat.util.net.JIoEndpoint$Worker.run(JIoEndpoint.java:447)
>         at java.lang.Thread.run(Thread.java:619)
>
> 25/05/2009 17:06:33 org.apache.solr.common.SolrException log
> SEVERE: org.apache.solr.common.SolrException: Internal Server Error
>
> Internal Server Error
>
> request: http://localhost:9090/isearch/select

I've been searching for this error for some days (SEVERE: java.lang.RuntimeException: This is a binary writer , Cannot write to a characterstream) without much luck and have no idea how to fix it :( Any help is appreciated :)

I'm posting the full log below if anyone is interested, thanks

> 25/05/2009 17:06:24 com.aileader.isearch.ui.InitISearchServlet init
> INFO: InitISearchServlet.init()
> 25/05/2009 17:06:24 org.apache.solr.core.SolrResourceLoader locateInstanceDir
> INFO: No /solr/home in JNDI
> 25/05/2009 17:06:24 org.apache.solr.core.SolrResourceLoader locateInstanceDir
> INFO: solr home defaulted to 'solr/' (could not find system property or JNDI)
> 25/05/2009 17:06:24 org.apache.solr.core.SolrResourceLoader
> INFO: Solr home set to 'solr/'
> 25/05/2009 17:06:24 org.apache.solr.core.SolrResourceLoader createClassLoader
> INFO: Reusing parent classloader
> 25/05/2009 17:06:24 org.apache.solr.core.SolrConfig
> INFO: Loaded SolrConfig: solrconfig.xml
> 25/05/2009 17:06:24 org.apache.solr.core.SolrCore
> INFO: Opening new SolrCore at solr/, dataDir=./solr/data/
> 25/05/2009 17:06:24 org.apache.solr.schema.IndexSchema readSchema
> INFO: Reading Solr Schema
> 25/05/2009 17:06:24 org.apache.solr.schema.IndexSchema readSchema
> INFO: Schema name=default
> 25/05/2009 17:06:24 org.apache.solr.util.plugin.AbstractPluginLoader load
> INFO: created string: org.apache.solr.schema.StrField
> 25/05/2009 17:06:24 org.apache.solr.util.plugin.AbstractPluginLoader load
> INFO: created boolean: org.apache.solr.schema.BoolField
> 25/05/2009 17:06:24 org.apache.solr.util.plugin.AbstractPluginLoader load
> INFO: created integer: org.apache.solr.schema.IntField
> 25/05/2009 17:06:24 org.apache.solr.util.plugin.AbstractPluginLoader load
> INFO: created long: org.apache.solr.schema.LongField
> 25/05/2009 17:06:24 org.apache.solr.util.plugin.AbstractPluginLoader load
> INFO: created float: org.apache.solr.schema.FloatField
> 25/05/2009 17:06:24 org.apache.solr.util.plugin.AbstractPluginLoader load
> INFO: created double: org.apache.solr.schema.DoubleField
> 25/05/2009 17:06:24 org.apache.solr.util.plugin.AbstractPluginLoader load
> INFO: created sint: org.apache.solr.schema.SortableIntField
> 25/05/2009 17:06:24 org.apache.solr.util.plugin.AbstractPluginLoader load
> INFO: creat
Re: highlighting performance
Thanks Otis. I added termVector="true" for those fields, but there isn't a noticeable difference. So, just to be a little more clear, the dynamic fields I'm adding... there might be hundreds. Do you see this as a problem? Thanks, Matt On Fri, May 15, 2009 at 7:48 PM, Otis Gospodnetic < otis_gospodne...@yahoo.com> wrote: > > Matt, > > I believe indexing those fields that you will use for highlighting with > term vectors enabled will make things faster (and your index a bit bigger). > > > Otis -- > Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch > > > > - Original Message > > From: Matt Mitchell > > To: solr-user@lucene.apache.org > > Sent: Friday, May 15, 2009 5:08:23 PM > > Subject: highlighting performance > > > > Hi, > > > > I'm experimenting with highlighting and am noticing a big drop in > > performance with my setup. I have documents that use quite a few dynamic > > fields (20-30). The fields are multiValued stored/indexed text fields, > each > > with a few paragraphs worth of text. My hl.fl param is set to *_t > > > > What kinds of things can I tweak to make this faster? Is it because I'm > > highlighting so many different fields? > > > > Thanks, > > Matt > >
Re: Lucene Query to Solr query
On Mon, May 25, 2009 at 3:09 AM, Reza Safari wrote:
> One little question: is there any utility that can convert a core Lucene
> query (any type, e.g. TermQuery etc.) to a Solr query? It is really a lot
> of work for me to rewrite existing code.

Solr internal APIs take Lucene query types. Perhaps you mean transforming a Lucene query into a parameter for the external HTTP API?

new TermQuery(new Term("foo","bar")) would be transformed to q=foo:bar

-Yonik
http://www.lucidimagination.com
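For the simplest query types, that mapping can be done by hand. A rough sketch (illustrative only; it ignores boosts, escaping, and the many query types that have no clean textual form):

    import org.apache.lucene.index.Term;
    import org.apache.lucene.search.BooleanClause;
    import org.apache.lucene.search.BooleanQuery;
    import org.apache.lucene.search.Query;
    import org.apache.lucene.search.TermQuery;

    public class SolrSyntax {
      // Hand-maps a couple of simple Lucene query types to Solr
      // query-parser syntax. Deliberately not exhaustive.
      public static String toSolrSyntax(Query q) {
        if (q instanceof TermQuery) {
          Term t = ((TermQuery) q).getTerm();
          return t.field() + ":" + t.text();
        }
        if (q instanceof BooleanQuery) {
          StringBuilder sb = new StringBuilder("(");
          for (BooleanClause c : ((BooleanQuery) q).getClauses()) {
            if (sb.length() > 1) sb.append(' ');
            if (c.getOccur() == BooleanClause.Occur.MUST) sb.append('+');
            if (c.getOccur() == BooleanClause.Occur.MUST_NOT) sb.append('-');
            sb.append(toSolrSyntax(c.getQuery()));
          }
          return sb.append(')').toString();
        }
        throw new IllegalArgumentException("no mapping for " + q.getClass());
      }
    }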
Re: grouping response docs together
Hi Thomas, In a 5-24-09 nightly build, I applied the patch: cd apache-solr-nightly patch -p0 < ~/Projects/apache-solr-patches/SOLR-236_collapsing.patch patching file src/common/org/apache/solr/common/params/CollapseParams.java patching file src/java/org/apache/solr/handler/component/CollapseComponent.java patching file src/java/org/apache/solr/search/CollapseFilter.java patching file src/java/org/apache/solr/search/NegatedDocSet.java patching file src/java/org/apache/solr/search/SolrIndexSearcher.java Hunk #1 succeeded at 1444 (offset -39 lines). patching file src/test/org/apache/solr/search/TestDocSet.java Hunk #1 succeeded at 134 (offset 42 lines). ... and got this when running "ant dist" docs: [mkdir] Created dir: /Users/mwm4n/Downloads/apache-solr-nightly/contrib/javascript/dist/doc [java] Exception in thread "main" java.lang.NoClassDefFoundError: org/mozilla/javascript/tools/shell/Main [java] at JsRun.main(Unknown Source) BUILD FAILED /Users/mwm4n/Downloads/apache-solr-nightly/common-build.xml:338: The following error occurred while executing this line: /Users/mwm4n/Downloads/apache-solr-nightly/common-build.xml:215: The following error occurred while executing this line: /Users/mwm4n/Downloads/apache-solr-nightly/contrib/javascript/build.xml:74: Java returned: 1 Not sure what any of that means, but the "ant dist" task worked fine before the patch. Any ideas? Thanks, Matt On Mon, May 25, 2009 at 3:59 PM, Thomas Traeger wrote: > Hello Matt, > > the patch should work with trunk and after a small fix with 1.3 too (see > my comment in SOLR-236). I just made a successful build to be sure. > > Do you see any error messages? > > Thomas > > Matt Mitchell schrieb: > > Thanks guys. I looked at the dedup stuff, but the documents I'm adding >> aren't really duplicates. They're very similar, but different. >> >> I checked out the field collapsing feature patch, applied the patch but >> can't get it to build successfully. Will this patch work with a nightly >> build? >> >> Thanks! >> >> On Fri, May 15, 2009 at 7:47 PM, Otis Gospodnetic < >> otis_gospodne...@yahoo.com> wrote: >> >> Matt - you may also want to detect near duplicates at index time: >>> >>> http://wiki.apache.org/solr/Deduplication >>> >>> Otis >>> -- >>> Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch >>> >>> >>> >>> - Original Message >>> From: Matt Mitchell To: solr-user@lucene.apache.org Sent: Friday, May 15, 2009 6:52:48 PM Subject: grouping response docs together Is there a built-in mechanism for grouping similar documents together in >>> the >>> response? I'd like to make it look like there is only one document with multiple "hits". Matt >>> >>> >> >
Shuffling results
Hi

I'm responsible for the search engine at yaymicro.com. yaymicro.com is a microstock agency (it sells images). We are using the excellent Solr search engine, but I have a problem with series of similar images showing up. I'll try to explain:

A search for dog, for example
http://yaymicro.com/search.action?search.search=dog&x=0&y=0&search.first=true
very often results in variations of images of the same motif close to each other. This is logical, but unwanted behaviour, since we would love to show our customers more variation in the search results.

So my question is quite simple: is there a way to configure Solr to put some "randomness" into the search results? To shuffle the results - not completely, but "a bit" - to avoid such series of similar images.

Any response would be highly appreciated.

Bjorn
CTO of YayMicro
Re: problem with Solritas (show only some of the fields)
On May 25, 2009, at 11:15 AM, Jörg Agatz wrote:
> I want to use Solritas for Solr.

Yay! Our first customer ;)

> At the moment, Solritas presents all fields in the results, but I must
> change it to present only some, like id, name, cat and inStock (example
> documents). I think this is the code that outputs all fields:
>
> #foreach($fieldname in $doc.fieldNames)
>   $fieldname : #foreach($value in $doc.getFieldValues($fieldname)) $value #end
> #end

Right - this is just generic code to show all stored fields from the document (TODO: it really should be adjusted to be fl-parameter aware).

> Maybe someone can explain to me how I can change the code to get only
> some fields.

Sure... first, this page describes the objects you have available in the template (Velocity) context: http://wiki.apache.org/solr/VelocityResponseWriter - you can link off to javadocs from there to see more about what each object provides in terms of getters and such.

There is a $response. From the default browse.vm template, $doc is a single item in an iteration over $response.results. $doc is an org.apache.solr.common.SolrDocument (http://lucene.apache.org/solr/api/org/apache/solr/common/SolrDocument.html)

So from a $doc, you can do this:

  $doc.getFirstValue("name")

getFirstValue is used when it is known to be a single-valued field (experiment with the other getters on SolrDocument to see how they work with various fields).

In the custom templates I have been using, I define a macro to make this even easier - you can define it in VM_global_library.vm in conf/velocity to make it global to all templates:

  #macro(field $f)$!{esc.html($doc.getFirstValue($f))}#end

Now you can use #field("name") instead, making templates much cleaner. The $!{esc.html(...)} bit is there to HTML-escape the field value, otherwise it leaves the possibility of malformed rendering or even a JavaScript injection vulnerability. The exclamation point is a Velocity templating feature that renders nothing if the value is null; otherwise it would literally render the ${...} reference in the output.

Perhaps more than you were asking for, but I wanted to be thorough since this is a feature of Solr I'd like to see get some more, ahem, visibility.

Erik
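Putting Erik's macro to work on Jörg's original question, a template loop that renders only the four fields he listed might look like this (a sketch; the surrounding markup is illustrative):

  #foreach($doc in $response.results)
    <div class="result">
      #field("id") #field("name") #field("cat") #field("inStock")
    </div>
  #end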
Re: Shuffling results
If simply getting random results (matching your query) from Solr is your requirement, then a dynamic RandomSortField is what you need. Details here - http://lucene.apache.org/solr/api/org/apache/solr/schema/RandomSortField.html

Cheers
Avlesh

On Tue, May 26, 2009 at 6:54 AM, yaymicro_bjorn wrote:
> So my question is quite simple: is there a way to configure Solr to put
> some "randomness" into the search results? To shuffle the results - not
> completely, but "a bit" - to avoid such series of similar images.
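Using RandomSortField takes two schema.xml declarations plus a sort parameter; a sketch along the lines of the example schema (the dynamic-field pattern and the seed suffix are conventions, pick your own):

    <fieldType name="random" class="solr.RandomSortField" indexed="true"/>
    <dynamicField name="random_*" type="random"/>

and then sort=random_1234 desc on the request, where changing the seed (1234) changes the ordering.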
Re: Shuffling results
Hi Avlesh

No, as I was trying to explain, I obviously don't want a totally random result. I just want to mix it up a "little". Is there a way to achieve this with Solr?

Bjorn

Avlesh Singh wrote:
> If simply getting random results (matching your query) from Solr is your
> requirement, then a dynamic RandomSortField is what you need. Details
> here - http://lucene.apache.org/solr/api/org/apache/solr/schema/RandomSortField.html
Re: Shuffling results
> I just want to mix it up a "little"
>
Sounds very subjective and open. Give this a thought - you can try a multi-field sort, with the first sort being on the score (so that the more relevant results appear first) and the second being a sort on the random field (which shuffles the order of results that have the same score).

In Solr, you can do multi-field sorting like this - sort=<field1> <asc|desc>[,<field2> <asc|desc>]...

Cheers
Avlesh

On Tue, May 26, 2009 at 8:59 AM, yaymicro_bjorn wrote:
>
> Hi Avlesh
>
> No, as I was trying to explain, I obviously don't want a totally random
> result. I just want to mix it up a "little". Is there a way to achieve
> this with Solr?
>
> Bjorn
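And a short SolrJ sketch of the combined sort: score first, then a RandomSortField to break ties. The random_42 field name again assumes a random_* dynamic field of type solr.RandomSortField in the schema:

    import org.apache.solr.client.solrj.SolrQuery;

    public class ScoreThenRandomSort {
        public static void main(String[] args) {
            SolrQuery query = new SolrQuery("dog");
            // Primary sort: relevance, so the most relevant images stay on top.
            query.addSortField("score", SolrQuery.ORDER.desc);
            // Secondary sort: only shuffles documents that tie on score.
            // Equivalent to sort=score desc,random_42 desc in the URL.
            query.addSortField("random_42", SolrQuery.ORDER.desc);
            System.out.println(query);
        }
    }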
Re: More questions about MoreLikeThis
Thanks. That explains it! I'll set termVector to true and give it a try again.

On Mon, May 25, 2009 at 7:41 AM, Koji Sekiguchi wrote:
> MLT uses termVector if it exists for the field. If termVector is not
> available, MLT tries to get stored field data. If stored field is not
> available, MLT does nothing for the field, as you were seeing.
>
> So mlt.fl fields don't always need to be stored.
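To illustrate Koji's point, a minimal SolrJ sketch of a MoreLikeThis request. The document id ("SOLR1000") and the "features" field are assumptions taken from the Solr example data; per the explanation above, each mlt.fl field must either have termVector="true" in the schema or be stored, or MLT silently skips it:

    import org.apache.solr.client.solrj.SolrQuery;

    public class MoreLikeThisExample {
        public static void main(String[] args) {
            SolrQuery query = new SolrQuery("id:SOLR1000");
            query.set("mlt", true);            // enable the MLT component
            query.set("mlt.fl", "features");   // fields to mine for "like" terms
            query.set("mlt.mintf", 1);         // min term freq in the source doc
            query.set("mlt.mindf", 1);         // min doc freq across the index
            System.out.println(query);
        }
    }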
Re: Recover crashed solr index
Hi Peter,

Thank you very much for your quick reply. I tried the CheckIndex method, but it doesn't work on my crashed index. The error message says the segments file in the directory is missing, and when I use the -fix param, a new segments file still can't be written. I even tried CheckIndex without assertions; it still doesn't work. Do you know why this is happening? Does it mean that the segments file can't be rewritten at all? Btw, I am using the nightly build of Solr. The following is the error message:

[r...@localhost lib]# java -cp lucene-core-2.9-dev.jar org.apache.lucene.index.CheckIndex -fix /solr/example/data/index/

NOTE: testing will be more thorough if you run java with '-ea:org.apache.lucene...', so assertions are enabled

Opening index @ /solr/example/data/index/

ERROR: could not read any segments file in directory
java.io.FileNotFoundException: /solr/example/data/index/segments_cje (No such file or directory)
        at java.io.RandomAccessFile.open(Native Method)
        at java.io.RandomAccessFile.<init>(RandomAccessFile.java:212)
        at org.apache.lucene.store.FSDirectory$FSIndexInput$Descriptor.<init>(FSDirectory.java:630)
        at org.apache.lucene.store.FSDirectory$FSIndexInput.<init>(FSDirectory.java:660)
        at org.apache.lucene.store.FSDirectory.openInput(FSDirectory.java:566)
        at org.apache.lucene.store.FSDirectory.openInput(FSDirectory.java:560)
        at org.apache.lucene.index.SegmentInfos.read(SegmentInfos.java:224)
        at org.apache.lucene.index.SegmentInfos$1.doBody(SegmentInfos.java:292)
        at org.apache.lucene.index.SegmentInfos$FindSegmentsFile.run(SegmentInfos.java:688)
        at org.apache.lucene.index.SegmentInfos.read(SegmentInfos.java:289)
        at org.apache.lucene.index.CheckIndex.checkIndex(CheckIndex.java:258)
        at org.apache.lucene.index.CheckIndex.main(CheckIndex.java:678)

WARNING: 0 documents will be lost

NOTE: will write new segments file in 5 seconds; this will remove 0 docs from the index. THIS IS YOUR LAST CHANCE TO CTRL+C!
5... 4... 3... 2... 1...
Writing...
Exception in thread "main" java.lang.NullPointerException
        at org.apache.lucene.index.CheckIndex.fixIndex(CheckIndex.java:556)
        at org.apache.lucene.index.CheckIndex.main(CheckIndex.java:697)

Regards
GC

On Tue, May 26, 2009 at 12:49 AM, Peter Wolanin wrote:
> You can use the Lucene jar that ships with Solr to invoke the CheckIndex
> method - this will possibly allow you to recover if you pass the -fix param.
>
> You may lose some docs, however, so this is only viable if you can,
> for example, query to check what's missing.
>
> The command looks like (from the root of the Solr svn checkout):
>
> java -ea:org.apache.lucene -cp lib/lucene-core-2.9-dev.jar
> org.apache.lucene.index.CheckIndex [path to index directory]
>
> For example, to check the example index:
>
> java -ea:org.apache.lucene -cp lib/lucene-core-2.9-dev.jar
> org.apache.lucene.index.CheckIndex example/solr/data/index/
>
> -Peter
>
> On Mon, May 25, 2009 at 4:42 AM, Wang Guangchen wrote:
> > Hi everyone,
> >
> > I have 8M docs to index, and each doc is around 50KB. Solr crashed in the
> > middle of indexing. The error message said that one of the files in the
> > data directory is missing. I don't know why this happened.
> >
> > So right now I have to find a way to recover the index to avoid
> > re-indexing. Does anyone know of any tools or methods to recover the
> > crashed index? Please help.
> >
> > Thanks a lot.
> >
> > Regards
> > GC
>
> --
> Peter M. Wolanin, Ph.D.
> Momentum Specialist, Acquia. Inc.
> peter.wola...@acquia.com
commit question
If I add 10 documents to the SolrServer, as in solrServer.add(docs) (using the embedded server), and the subsequent commit fails for some reason, can I retry the commit later, say after some time, or are the added documents lost?

--
View this message in context:
http://www.nabble.com/commit-question-tp23717415p23717415.html
Sent from the Solr - User mailing list archive at Nabble.com.