Re: Term Dictionary + scoring
Grant, thank you for the link to the wiki. TermsComponent was unknown to me until now. It sounds good!

> Generally, this clickthrough tracking is tied to the query, so you need a
> layer above just popularity. You need popularity per query (or in all
> likelihood a subset of the queries, since you likely only care about this
> where you have a certain level of clickthroughs/queries).

Yes, that's true, but how can I realize that? Saving every query that ever led to a click in a field, together with its clickthrough rate, doesn't sound clean. Okay, I could try to retrieve the values per query, but that sounds really expensive. What did you mean by "layer"?

Thank you,
Mitch

--
View this message in context: http://old.nabble.com/Term-Dictionary-%2B-scoring-tp27174862p27187981.html
Sent from the Solr - User mailing list archive at Nabble.com.
RE: OverlappingFileLockException when using &lt;str name="replicateAfter"&gt;startup
There is no issue here; we had patched our Solr to include SOLR-1595, and our webapps directory contained two wars. With only a single war file there is no issue with this replication handler. Thanks for the quick response.

Joe

> From: isjust...@hotmail.com
> To: solr-user@lucene.apache.org
> Subject: RE: OverlappingFileLockException when using <str name="replicateAfter">startup
> Date: Fri, 15 Jan 2010 18:36:13 -0600
>
> I am using the example solrconfig.xml with only a few changes. Mainly the
> replication section for the master has been changed. I am not using any
> plugins that I am aware of. Here is my replication section:
>
>   <str name="replicateAfter">startup</str>
>   <str name="replicateAfter">optimize</str>
>
> If this is valid, then I will open a jira issue.
>
> Thanks,
> Joe
>
> > Date: Fri, 15 Jan 2010 19:06:15 -0500
> > Subject: Re: OverlappingFileLockException when using <str name="replicateAfter">startup
> > From: yo...@lucidimagination.com
> > To: solr-user@lucene.apache.org
> >
> > Interesting... this should be impossible.
> > Unless there is a bug in Lucene's NativeFSLock (and it doesn't look
> > like it), the only way I see that this could happen is if there were
> > multiple instances of that class loaded in different classloaders.
> > Are you using any kind of plugins?
> >
> > Could you open a JIRA issue for this?
> >
> > -Yonik
> > http://www.lucidimagination.com
> >
> > On Fri, Jan 15, 2010 at 5:50 PM, Joe Kessel wrote:
> > >
> > > I have an instance of Solr that won't start since I added
> > > <str name="replicateAfter">startup</str> to the replication config. I am using Solr 1.4
> > > and only see this with my index that contains 200k documents with a total
> > > size of 400MB. If I remove the replicate-after-startup option, the instance
> > > starts without error. We found that we needed replicate after startup as
> > > there was no version information on the master after restarting the
> > > instance. Is there something special that needs to be done when using
> > > replicate after startup? Or is this a bug?
> > > Below is the Solr portion of the stack trace.
> > >
> > > Thanks,
> > > Joe
> > >
> > > INFO: QuerySenderListener sending requests to searc...@5a425eb9 main
> > > Jan 15, 2010 5:29:46 PM org.apache.solr.common.SolrException log
> > > SEVERE: java.nio.channels.OverlappingFileLockException
> > >     at sun.nio.ch.FileChannelImpl$SharedFileLockTable.checkList(FileChannelImpl.java:1170)
> > >     at sun.nio.ch.FileChannelImpl$SharedFileLockTable.add(FileChannelImpl.java:1072)
> > >     at sun.nio.ch.FileChannelImpl.tryLock(FileChannelImpl.java:878)
> > >     at java.nio.channels.FileChannel.tryLock(FileChannel.java:962)
> > >     at org.apache.lucene.store.NativeFSLock.obtain(NativeFSLockFactory.java:233)
> > >     at org.apache.lucene.store.Lock.obtain(Lock.java:73)
> > >     at org.apache.lucene.index.IndexWriter.init(IndexWriter.java:1545)
> > >     at org.apache.lucene.index.IndexWriter.<init>(IndexWriter.java:1402)
> > >     at org.apache.solr.update.SolrIndexWriter.<init>(SolrIndexWriter.java:190)
> > >     at org.apache.solr.update.UpdateHandler.createMainIndexWriter(UpdateHandler.java:98)
> > >     at org.apache.solr.update.DirectUpdateHandler2.openWriter(DirectUpdateHandler2.java:173)
> > >     at org.apache.solr.update.DirectUpdateHandler2.forceOpenWriter(DirectUpdateHandler2.java:376)
> > >     at org.apache.solr.handler.ReplicationHandler.inform(ReplicationHandler.java:845)
> > >     at org.apache.solr.core.SolrResourceLoader.inform(SolrResourceLoader.java:486)
> > >     at org.apache.solr.core.SolrCore.<init>(SolrCore.java:588)
> > >     at org.apache.solr.core.CoreContainer.create(CoreContainer.java:428)
> > >     at org.apache.solr.core.CoreContainer.load(CoreContainer.java:278)
> > >     at org.apache.solr.core.CoreContainer$Initializer.initialize(CoreContainer.java:117)
> > >     at org.apache.solr.servlet.SolrDispatchFilter.init(SolrDispatchFilter.java:83)
> > >     at org.apache.catalina.core.ApplicationFilterConfig.getFilter(ApplicationFilterConfig.java:275)
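The mailing-list archiver stripped the actual XML out of Joe's message above. A master-side replication section of the shape he describes would typically look like this in solrconfig.xml; this is only a sketch in which the two replicateAfter values come from the thread, while the handler registration wrapped around them is assumed:

```xml
<!-- Sketch: only the replicateAfter values are taken from the thread
     above; the surrounding requestHandler registration is assumed. -->
<requestHandler name="/replication" class="solr.ReplicationHandler">
  <lst name="master">
    <str name="replicateAfter">startup</str>
    <str name="replicateAfter">optimize</str>
  </lst>
</requestHandler>
```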
Re: How to start using Solr
Glad that I found this thread; I was searching for this issue throughout the forum.

Does this mean that if I host my website on a Virtual Private Server, it will be okay to ask my hosting provider for Windows Server (since the website was developed using ASP.NET) with Apache Tomcat installed? Are there any other requirements that I should ask for?

Thank you.
--
View this message in context: http://old.nabble.com/How-to-start-using-Solr-tp25738958p27189121.html
Sent from the Solr - User mailing list archive at Nabble.com.
Fundamental questions of how to build up solr for huge portals
Hello!

Our team wants to use Solr for a community portal built up out of 3 or more sub-portals. We are unsure how we should build up the whole architecture, because we have more than one portal and we want to make them all connected and searchable by Solr. Could some experts help us with these questions?

- What's the best way to use Solr to get the best performance for a huge portal with >5000 users that might expand fast?
- Which client should we use (Java, PHP, ...)? Right now the portal is almost entirely PHP/MySQL based. But we want to make Solr as good as it can be in all ways (performance, accessibility, good programming practice, using the whole feature set of Lucene - tagging, faceting and so on...).

We are thankful for every suggestion :)

Thanks,
Peter
Re: Fundamental questions of how to build up solr for huge portals
Hello Peter,

well, I am no expert on Solr, but what you want to do sounds like a case for several SolrCores [1]. I am thinking of one core per portal and one super-core to search over all portals. This would be redundant: some information would be stored twice or more.

Another way would be to build one super-index. In your schema you define a field (let's call it "portal") that records which portal a document belongs to. If you are searching for content from the news portal, you filter on portal:news, and so on.

Just some thoughts.

Kind regards from Germany,
Mitch

[1] http://wiki.apache.org/solr/CoreAdmin

Peter Gabriel wrote:
> Hello!
>
> Our team wants to use solr for a community portal built up out of 3 or
> more sub-portals. We are unsure how we should build up the whole
> architecture, because we have more than one portal and we want to make
> them all connected and searchable by solr. Could some experts help us on
> these questions?
>
> - What's the best way to use solr to get the best performance for a huge
> portal with >5000 users that might expand fast?
> - Which client to use (Java, PHP, ...)? Right now the portal is almost
> PHP/MySQL based. But we want to make solr as good as it can be in all ways
> (performance, accessibility, good programming practice, using the whole
> feature set of lucene - tagging, faceting and so on...)
>
> We are thankful for every suggestion :)
>
> Thanks,
> Peter

--
View this message in context: http://old.nabble.com/Fundamental-questions-of-how-to-build-up-solr-for-huge-portals-tp27189739p27189905.html
Sent from the Solr - User mailing list archive at Nabble.com.
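The single-super-index idea above can be sketched as a query against one shared index. A small Python sketch follows; the field name `portal` and the /solr/select path are just the example names from this discussion, not stock Solr names:

```python
from urllib.parse import urlencode

def portal_query(q, portal):
    """Build a query string restricted to one sub-portal.

    A filter query (fq) is used rather than folding the restriction
    into q, so Solr can cache the portal filter independently of the
    user's search terms.
    """
    params = [
        ("q", q),
        ("fq", "portal:%s" % portal),  # e.g. portal:news
        ("wt", "json"),
    ]
    return "/solr/select?" + urlencode(params)

# Search only the news portal:
print(portal_query("solar power", "news"))
# → /solr/select?q=solar+power&fq=portal%3Anews&wt=json
```

A super-core that searches everything would simply omit the fq parameter.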
java heap space error when faceting
I have an index with more than 6 million docs. All is well until I turn on faceting and specify a facet.field. There are only about 20 unique values for this particular facet throughout the entire index. I was able to make things a little better by using facet.method=enum. That seems to work, until I add another facet.field to the request - another facet that doesn't have many unique values. I ultimately end up running out of heap space. I should also mention that in every case, the "rows" param is set to 0. I've thrown as much memory as I can at the JVM (3G for start-up and max), tweaked filter cache settings, etc. I can't seem to get this error to go away. Anyone have any tips to throw my way? I'm using a recent nightly build of Solr 1.5-dev and Jetty as my servlet container.

Thanks!
Matt
Re: OverlappingFileLockException when using &lt;str name="replicateAfter"&gt;startup
On Sat, Jan 16, 2010 at 7:38 AM, Joe Kessel wrote:
> There is no issue here, we had patched our solr to include SOLR-1595 and our
> webapps directory contained two wars. With only a single war file there is
> no issue with this replication handler.

Thanks Joe - for now I've added a note to the example solrconfig.xml about "native" not working for multiple solr webapps in the same JVM.

-Yonik
http://www.lucidimagination.com
Re: java heap space error when faceting
On Sat, Jan 16, 2010 at 10:01 AM, Matt Mitchell wrote:
> I have an index with more than 6 million docs. All is well, until I turn on
> faceting and specify a facet.field. There are only about 20 unique values
> for this particular facet throughout the entire index.

Hmmm, that doesn't sound right... unless you're already near max memory usage due to other things.
Is this a single-valued or multi-valued field? If multi-valued, how many values does each document have on average?

-Yonik
http://www.lucidimagination.com
Re: java heap space error when faceting
These are single-valued fields - strings and integers. Is there more specific info I could post to help diagnose what might be happening?

Thanks!
Matt

On Sat, Jan 16, 2010 at 10:42 AM, Yonik Seeley wrote:
> On Sat, Jan 16, 2010 at 10:01 AM, Matt Mitchell wrote:
> > I have an index with more than 6 million docs. All is well, until I turn on
> > faceting and specify a facet.field. There are only about 20 unique values
> > for this particular facet throughout the entire index.
>
> Hmmm, that doesn't sound right... unless you're already near max
> memory usage due to other things.
> Is this a single-valued or multi-valued field? If multi-valued, how
> many values does each document have on average?
>
> -Yonik
> http://www.lucidimagination.com
Re: java heap space error when faceting
On Sat, Jan 16, 2010 at 11:04 AM, Matt Mitchell wrote:
> These are single valued fields. Strings and integers. Is there more specific
> info I could post to help diagnose what might be happening?

Faceting on either should currently take ~24MB (6M docs @ 4 bytes per doc + size_of_unique_values).
With that small number of values, facet.method=enum may be faster in general (and take up less room: 6M/8*20, or 15MB). But you certainly shouldn't be running out of space with the heap sizes you mentioned.

Perhaps look at the stats.jsp page in the admin and see what's listed in the fieldCache? And verify that your heap is really as big as you think it is. You can also use something like jconsole, which ships with the JDK, to manually do a GC and check how much of the heap is in use before you try to facet.

-Yonik
http://www.lucidimagination.com
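Yonik's two estimates can be checked with quick arithmetic; the constants (6M docs, 4 bytes per ord, 20 unique values, one bit per doc per cached filter) are taken from his message:

```python
docs = 6_000_000
unique_values = 20

# fieldCache faceting: one 4-byte ord entry per document, plus the
# unique values themselves (negligible for 20 short strings).
fieldcache_bytes = docs * 4
print(fieldcache_bytes / 1e6)  # → 24.0 (the ~24MB cited)

# facet.method=enum: one cached filter bitset per unique value,
# at one bit per document.
enum_bytes = docs // 8 * unique_values
print(enum_bytes / 1e6)  # → 15.0 (the ~15MB cited)
```

Either figure is far below a 3G heap, which is why a heap setting that isn't actually being applied (as turned out later in this thread) is the more likely culprit.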
Re: Stripping Punctuation in a fieldType
: Subject: Stripping Punctuation in a fieldType
: In-Reply-To: <27179780.p...@talk.nabble.com>
: References:
:   <27178423.p...@talk.nabble.com>
:   <27179780.p...@talk.nabble.com>

http://people.apache.org/~hossman/#threadhijack
Thread Hijacking on Mailing Lists

When starting a new discussion on a mailing list, please do not reply to an existing message, instead start a fresh email. Even if you change the subject line of your email, other mail headers still track which thread you replied to and your question is "hidden" in that thread and gets less attention. It makes following discussions in the mailing list archives particularly difficult.

See Also: http://en.wikipedia.org/wiki/User:DonDiego/Thread_hijacking

-Hoss
Re: Index Corruption after replication by new Solr 1.4 Replication
: Subject: Index Corruption after replication by new Solr 1.4 Replication
: References: <3ca90cc651ae3f4baedf8a5b78639c8c038a1...@mail02.tveyes.com>
:   <667725.5147...@web52905.mail.re2.yahoo.com>
:   <3ca90cc651ae3f4baedf8a5b78639c8c038a1...@mail02.tveyes.com>
:   <359a92831001151042n73a47daby46ee728a86bb...@mail.gmail.com>
:   <3ca90cc651ae3f4baedf8a5b78639c8c038a1...@mail02.tveyes.com>
:   <359a92831001151131o10f71619se49d66bea6fe5...@mail.gmail.com>
:   <3ca90cc651ae3f4baedf8a5b78639c8c038a1...@mail02.tveyes.com>
: In-Reply-To: <3ca90cc651ae3f4baedf8a5b78639c8c038a1...@mail02.tveyes.com>

http://people.apache.org/~hossman/#threadhijack
Thread Hijacking on Mailing Lists

When starting a new discussion on a mailing list, please do not reply to an existing message, instead start a fresh email. Even if you change the subject line of your email, other mail headers still track which thread you replied to and your question is "hidden" in that thread and gets less attention. It makes following discussions in the mailing list archives particularly difficult.

See Also: http://en.wikipedia.org/wiki/User:DonDiego/Thread_hijacking

-Hoss
Re: Errors when registering MBeans
: MBeans. I have tried to deploy it without generating MBeans but with
: no luck.

First off, a quick solution: if you don't care about using JMX to monitor Solr, just completely remove the "<jmx/>" config option from solrconfig.xml. That should eliminate all attempts by Solr to register MBeans at all.

If you do care about JMX, or are interested in helping diagnose this further...

: org.apache.solr.core.JmxMonitoredMap put Failed to register info bean:
: searcher
:
: javax.management.InstanceAlreadyExistsException:
: solr:cell=WC_default_cell,type=searcher,node=WC_default_node,process=server1,id=org.apache.solr.search.SolrIndexSearcher

...at first glance, this seems like it *might* be because the "current" index searcher is in fact tracked twice by Solr: once using a unique name, and once using a generic name ("searcher", I believe). However, I've never seen this cause a problem with any JMX MBeanServers before -- you can definitely get problems if you attempt to register the same bean with the same name more than once, but unique names aren't supposed to be a problem. A quick skim of google results for InstanceAlreadyExistsException seems to bear this out, and even if there were a disconnect between your (IBM) MBeanServer impl and Solr's use of JMX on this point, it still wouldn't explain the rest of these errors below.

Could you try using some JMX tools to query your servlet container to see what is/isn't registered? ...
: org.apache.solr.core.JmxMonitoredMap put Failed to register info bean:
: fieldValueCache
:
: javax.management.InstanceAlreadyExistsException:
: solr:cell=WC_default_cell,type=fieldValueCache,node=WC_default_node,process=server1,id=org.apache.solr.search.FastLRUCache
:
: org.apache.solr.core.JmxMonitoredMap put Failed to register info bean:
: filterCache
:
: javax.management.InstanceAlreadyExistsException:
: solr:cell=WC_default_cell,type=filterCache,node=WC_default_node,process=server1,id=org.apache.solr.search.FastLRUCache
:
: [1/15/10 10:15:04:897 CET] 046e JmxMonitoredM W
: org.apache.solr.core.JmxMonitoredMap put Failed to register info bean:
: queryResultCache
:
: javax.management.InstanceAlreadyExistsException:
: solr:cell=WC_default_cell,type=queryResultCache,node=WC_default_node,process=server1,id=org.apache.solr.search.LRUCache
:
: [1/15/10 10:15:04:897 CET] 046e JmxMonitoredM W
: org.apache.solr.core.JmxMonitoredMap put Failed to register info bean:
: documentCache
:
: javax.management.InstanceAlreadyExistsException:
: solr:cell=WC_default_cell,type=documentCache,node=WC_default_node,process=server1,id=org.apache.solr.search.LRUCache

-Hoss
Re: only use sorting when there's no "q" is "*:*"?
: It uses the doc insertion order by default.

Strictly speaking: it sorts by score, and when multiple docs have identical scores, the secondary sorting is undefined (as an implementation detail it is _usually_ doc insertion order, but that's not really guaranteed).

As for your original question...

: > > > Is it possible to set up Solr such that when there's no query (client
: > > > would send in "*:*" for "q"), Solr would sort results (basically all
: > > > the documents) by date or some other criterion.

Why not use:

  sort = score desc, myDateField asc

-Hoss
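Hoss's suggestion translates directly into request parameters. A small sketch follows; the /solr/select handler path is assumed, and myDateField is just the example field name from this thread:

```python
from urllib.parse import urlencode, parse_qs

# Match all documents, then order them by date: with a constant-scoring
# *:* query every doc gets the same score, so the date tie-breaker
# effectively decides the whole ordering.
params = [
    ("q", "*:*"),
    ("sort", "score desc, myDateField asc"),
]
qs = urlencode(params)
print("/solr/select?" + qs)
```

For a real query the same sort spec still applies: relevance first, date only as the tie-breaker.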
Re: java heap space error when faceting
I'm embarrassed (but hugely relieved) to say that the script I had for starting Jetty had a bug in the way it set Java options! So my heap start/max was always set at the default. I did end up using jconsole and learned quite a bit from that too. Thanks for your help Yonik :)

Matt

On Sat, Jan 16, 2010 at 11:13 AM, Yonik Seeley wrote:
> On Sat, Jan 16, 2010 at 11:04 AM, Matt Mitchell wrote:
> > These are single valued fields. Strings and integers. Is there more specific
> > info I could post to help diagnose what might be happening?
>
> Faceting on either should currently take ~24MB (6M docs @ 4 bytes per
> doc + size_of_unique_values)
> With that small number of values, facet.enum may be faster in general
> (and take up less room: 6M/8*20 or 15MB).
> But you certainly shouldn't be running out of space with the heap
> sizes you mentioned.
>
> Perhaps look at the stats.jsp page in the admin and see what's listed
> in the fieldCache?
> And verify that your heap is really as big as you think it is.
> You can also use something like jconsole that ships with the JDK to
> manually do a GC and check out how much of the heap is in use before
> you try to facet.
>
> -Yonik
> http://www.lucidimagination.com
Re: How to start using Solr
Java 1.6. Also decide if you need 32-bit Java (limited to 2G of heap) or 64-bit. And some kind of log file rolling or size control.

On Sat, Jan 16, 2010 at 4:56 AM, nfire wrote:
> Glad that I found this thread, I was searching for this issue throughout the
> forum.
>
> Does this mean that if I host my website on a Virtual Private Server, will
> it be okay if I ask my hosting provider for Windows Server (since the
> website was developed using asp.net) and Apache Tomcat installed? Are there
> any other requirements that I should ask for.
>
> Thank you.
> --
> View this message in context:
> http://old.nabble.com/How-to-start-using-Solr-tp25738958p27189121.html
> Sent from the Solr - User mailing list archive at Nabble.com.

--
Lance Norskog
goks...@gmail.com
Re: Fundamental questions of how to build up solr for huge portals
Hi!

Your question is quite general in nature, so here are only a few initial remarks on how to get started.

If you want to have a global search over all of your portals, it might be best to start with one Solr instance and access it from all the portals. If you plan to build collections that are special to one or another portal, you can do so at index time: just mark the indexed object in a dedicated field of the index. If you provide query handlers for each of the portals, you can control the behaviour of the search based on the respective portal. You may then use query filters to filter results based on the portal. So much for the server side.

For your question about which client (language) to use: since Solr is able to generate responses for a number of client platforms, you may want to consult http://wiki.apache.org/solr/IntegratingSolr for additional information. I like to use a very lightweight solution in JavaScript, with the query responses from Solr delivered via JSON. Since you can do this also for PHP clients, you might want to give it a try.

Regards,
Sven

--On Saturday, 16 January 2010 15:16 +0100 Peter wrote:

> Hello!
>
> Our team wants to use solr for a community portal built up out of 3 or
> more sub-portals. We are unsure how we should build up the whole
> architecture, because we have more than one portal and we want to make
> them all connected and searchable by solr. Could some experts help us on
> these questions?
>
> - What's the best way to use solr to get the best performance for a huge
> portal with >5000 users that might expand fast?
> - Which client to use (Java, PHP, ...)? Right now the portal is almost
> PHP/MySQL based. But we want to make solr as good as it can be in all ways
> (performance, accessibility, good programming practice, using the whole
> feature set of lucene - tagging, faceting and so on...)
>
> We are thankful for every suggestion :)
>
> Thanks,
> Peter

--
kippdata informationstechnologie GmbH
Sven Maurmann                       Tel: 0228 98549-12
Bornheimer Str. 33a                 Fax: 0228 98549-50
D-53111 Bonn                        sven.maurm...@kippdata.de

HRB 8018 Amtsgericht Bonn / USt.-IdNr. DE 196 457 417
Geschäftsführer: Dr. Thomas Höfer, Rainer Jung, Sven Maurmann
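The JSON route Sven mentions is easy to consume from a lightweight client. Below is a sketch in Python: the overall response shape (responseHeader/response/docs) is the standard Solr wt=json layout, while the document fields and values are invented sample data for illustration:

```python
import json

# A trimmed Solr wt=json response body. The structure is the standard
# one Solr emits; the ids and titles are made-up sample data.
raw = """
{
  "responseHeader": {"status": 0, "QTime": 2},
  "response": {
    "numFound": 2,
    "start": 0,
    "docs": [
      {"id": "news-1", "title": "Portal launch"},
      {"id": "wiki-7", "title": "Getting started"}
    ]
  }
}
"""

data = json.loads(raw)
print(data["response"]["numFound"])  # → 2
for doc in data["response"]["docs"]:
    print(doc["id"], "-", doc["title"])
```

The same parse is a one-liner in PHP (json_decode) or JavaScript (JSON.parse), which is what makes such a thin client practical.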
Re: Encountering a roadblock with my Solr schema design...use dedupe?
I'm really interested in reading the answer to this thread, as my problem is much the same. Maybe my main difference is the huge number of SKUs per product I may have.

David

On Thu, Jan 14, 2010 at 2:35 AM, Kelly Taylor wrote:
> Hoss,
>
> Would you suggest using dedupe for my use case; and if so, do you know of a
> working example I can reference?
>
> I don't have an issue using the patched version of Solr, but I'd much rather
> use the GA version.
>
> -Kelly
>
> hossman wrote:
> >
> > : Dedupe is completely the wrong word. Deduping is something else
> > : entirely - it is about trying not to index the same document twice.
> >
> > Dedup can also certainly be used with field collapsing -- that was one of
> > the initial use cases identified for the SignatureUpdateProcessorFactory
> > ... you can compute an 'expensive' signature when adding a document, index
> > it, and then FieldCollapse on that signature field.
> >
> > This gives you "query time deduplication" based on a value computed when
> > indexing (the canonical example is multiple urls referencing the "same"
> > content but with slightly different boilerplate markup). You can use a
> > Signature class that recognizes the boilerplate and computes an identical
> > signature value for each URL whose content is "the same", but still index
> > all of the URLs and their content as distinct documents ... so use cases
> > where people want only "distinct" URLs work using field collapse, but by
> > default all matching documents can still be returned, and searches on text
> > in the boilerplate markup also still work.
> >
> > -Hoss
>
> --
> View this message in context:
> http://old.nabble.com/Encountering-a-roadblock-with-my-Solr-schema-design...use-dedupe--tp27118977p27155115.html
> Sent from the Solr - User mailing list archive at Nabble.com.
Re: recent query execution cache in Solr
: Yes, it's the cache. But not document/query/filter cache, but http
: cache. Yes, you can disable it in solrconfig.xml

Specifically: it is (probably) your browser cache, as Solr doesn't cache anything between restarts. Info about disabling (or changing the rules for) HTTP caching can be found here:

http://wiki.apache.org/solr/SolrConfigXml#HTTP_Caching

-Hoss
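The wiki page above documents the httpCaching element of solrconfig.xml. As a minimal sketch (assuming the stock Solr 1.4 config layout), turning the cache headers off entirely looks like this:

```xml
<!-- In solrconfig.xml: never generate cache headers or 304 responses,
     so clients (including browsers) always fetch fresh results. -->
<httpCaching never304="true"/>
```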