Solr Size Estimator (JIRA#3435) . . .
Hi,

In working through some updates for the Solr Size Estimator, I have found a number of gaps in the Solr Wiki. I have Googled each of these to a fair degree and found either nothing or an insufficient explanation. In particular, for each of the following I'm looking for:

A) An explanation of what it is
B) How to use it or estimate its size

Topics:
1) fieldValueCache
2) RamBufferSize
3) Transient Factor
4) Average number of Bytes per Term
5) Cache Key Average Size (Bytes)
6) Average QueryResultKey size (in bytes)

Appreciate any input, so I can update the Solr Wiki as needed.

C
Faceting is not Using Field Value Cache . . ?
Seeing something odd going on with faceting . . . we execute facets with every query and yet the fieldValueCache is not being used:

name: fieldValueCache
class: org.apache.solr.search.FastLRUCache
version: 1.0
description: Concurrent LRU Cache(maxSize=1, initialSize=10, minSize=9000, acceptableSize=9500, cleanupThread=false)
stats:
  lookups : 0
  hits : 0
  hitratio : 0.00
  inserts : 0
  evictions : 0
  size : 0
  warmupTime : 0
  cumulative_lookups : 0
  cumulative_hits : 0
  cumulative_hitratio : 0.00
  cumulative_inserts : 0
  cumulative_evictions : 0

I was under the impression the fieldValueCache was an implicit cache (if you don't define it, it will still exist). We are running Solr v3.3 (and NOT using {!cache=false}).

Thoughts?
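For anyone wanting to check this automatically, here is a minimal sketch that parses a stats dump like the one above and reports whether the cache was ever consulted. The function names are made up for illustration, and it assumes the stats are available as plain "name : value" text as pasted here, not through any particular Solr API.

```python
import re

# Stats text copied from the fieldValueCache entry in the message above.
STATS = (
    "lookups : 0 hits : 0 hitratio : 0.00 inserts : 0 evictions : 0 "
    "size : 0 warmupTime : 0 cumulative_lookups : 0 cumulative_hits : 0 "
    "cumulative_hitratio : 0.00 cumulative_inserts : 0 cumulative_evictions : 0"
)


def parse_cache_stats(text):
    """Turn 'name : value' pairs into a dict of floats (hypothetical helper)."""
    pairs = re.findall(r"(\w+)\s*:\s*([\d.]+)", text)
    return {name: float(value) for name, value in pairs}


def cache_was_used(stats):
    """A cache with zero cumulative lookups was never consulted."""
    return stats.get("cumulative_lookups", 0) > 0


stats = parse_cache_stats(STATS)
print(cache_was_used(stats))
```

With the numbers posted above, every counter is zero, so the check reports that the cache was never looked up at all (as opposed to being looked up and missing).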
FastVectorHighlighter ignoring fragmenter parameter . . .
Got the FVH to work in Solr 3.1 (or at least I presume I have, given that I can see multi-color highlighting in the output). But I am not able to get it to recognize the "regex" fragmenter. I get no change in output if I specify the fragmenter. In fact, I can even enter bogus names for the fragmenter and get no change in the output.

Grateful for any suggestions. Settings and output below.

Christopher

*Query*
http://localhost:8983/solr/10k-Fragments/select?
q=content%3Aliquidity
&rows=100
&fl=id%2Ccontent
&qt=standard
&hl.fl=content
&hl.useFastVectorHighlighter=true
&hl=true
&hl.fragmentsBuilder=colored
&hl.fragmenter=regex

*Response* (Abbreviated)
- 0 47 - id,content true content:liquidity regex1text content colored standard true 100 . . . - - - Liquidity is a measure of a bank's ability to fund loans and withdrawals of deposits in a cost-ef . . .

*Field listing in schema.xml*

*Highlighter listing in solrconfig.xml*
100 70 0.5 [-\w ,/\n\"']{20,200}
Re: FastVectorHighlighter ignoring fragmenter parameter . . .
Koji,

Thank you for the reply. Being something of a novice with Solr, I would be grateful if you could clarify my next steps.

I infer from your reply that no implementation similar to the regex fragmenter has yet been contributed for the FVH. Thus I need to write my own custom implementations of the FragmentsBuilder <http://lucene.apache.org/java/3_0_1/api/contrib-fast-vector-highlighter/org/apache/lucene/search/vectorhighlight/FragmentsBuilder.html> and FragListBuilder <http://lucene.apache.org/java/3_0_1/api/contrib-fast-vector-highlighter/org/apache/lucene/search/vectorhighlight/FragListBuilder.html> interfaces to take in and apply the regex.

I would be happy to contribute back what I create.

Appreciate whatever guidance you can offer,

Christopher

On 2:59 PM, Koji Sekiguchi wrote:

(10/12/05 5:53), CRB wrote:

Got the FVH to work in Solr 3.1 (or at least I presume I have given I can see multi-color highlighting in the output.) But I am not able to get it to recognize the "regex" fragmenter. I get no change in output if I specify the fragmenter. In fact, I can even enter bogus names for the fragmenter and get no change in the output. Grateful for any suggestions. Settings and output below. Christopher

*Query*
http://localhost:8983/solr/10k-Fragments/select?
q=content%3Aliquidity
&rows=100
&fl=id%2Ccontent
&qt=standard
&hl.fl=content
&hl.useFastVectorHighlighter=true
&hl=true
&hl.fragmentsBuilder=colored
&hl.fragmenter=regex

Christopher,

Because the algorithm of FVH is totally different from the (traditional) highlighter, FVH doesn't look at hl.fragmenter and hl.formatter, but looks at hl.fragListBuilder and hl.fragmentsBuilder instead.

I think your settings and request/response look good except for hl.fragmenter=regex. FVH simply ignores that parameter.

Koji
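To make Koji's point concrete, here is a minimal sketch that rebuilds the request from this thread with the parameters the FVH actually reads. The helper name is made up, and the value "simple" for hl.fragListBuilder is an assumption about what is registered in solrconfig.xml; only "colored" for hl.fragmentsBuilder comes from the original request.

```python
from urllib.parse import urlencode

BASE_URL = "http://localhost:8983/solr/10k-Fragments/select"


def build_fvh_query(base_url=BASE_URL):
    """Build the highlighting request without the ignored hl.fragmenter."""
    params = {
        "q": "content:liquidity",
        "rows": 100,
        "fl": "id,content",
        "qt": "standard",
        "hl": "true",
        "hl.fl": "content",
        "hl.useFastVectorHighlighter": "true",
        # FVH reads these two parameters instead of hl.fragmenter / hl.formatter.
        "hl.fragmentsBuilder": "colored",
        "hl.fragListBuilder": "simple",  # assumed name, check solrconfig.xml
    }
    return base_url + "?" + urlencode(params)


print(build_fvh_query())
```

This only changes which parameters are sent. It does not add regex-style fragmenting to the FVH, which per the discussion above would still need a custom FragListBuilder.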
Using Saxon 9 as a response writer with Solr 3.1 . . ?
Has anyone been able to get Saxon 9 working with Solr 3.1?

I was following the wiki page (http://wiki.apache.org/solr/XsltResponseWriter): I placed all the saxon-*.jars in Jetty's lib/ext folder and started with

java -Djavax.xml.transform.TransformerFactory=net.sf.saxon.TransformerFactoryImpl -jar start.jar

But I get an ugly dump of errors from Jetty:

2010-12-06 13:29:16.515::WARN: failed SolrRequestFilter
java.lang.NoSuchMethodError: net.sf.saxon.dom.DOMEnvelope.getInstance()Lnet/sf/saxon/dom/DOMEnvelope;
    at net.sf.saxon.java.JavaPlatform.initialize(JavaPlatform.java:43)
    at net.sf.saxon.Configuration.init(Configuration.java:392)
    at net.sf.saxon.Configuration.<init>(Configuration.java:311)
    at net.sf.saxon.xpath.XPathFactoryImpl.makeConfiguration(XPathFactoryImpl.java:41)
    at net.sf.saxon.xpath.XPathFactoryImpl.<init>(XPathFactoryImpl.java:26)
    at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
    at sun.reflect.NativeConstructorAccessorImpl.newInstance(Unknown Source)
    at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(Unknown Source)
    at java.lang.reflect.Constructor.newInstance(Unknown Source)
    at java.lang.Class.newInstance0(Unknown Source)
    at java.lang.Class.newInstance(Unknown Source)
    at javax.xml.xpath.XPathFactoryFinder.loadFromService(Unknown Source)
    at javax.xml.xpath.XPathFactoryFinder._newFactory(Unknown Source)
    at javax.xml.xpath.XPathFactoryFinder.newFactory(Unknown Source)
    at javax.xml.xpath.XPathFactory.newInstance(Unknown Source)
    at javax.xml.xpath.XPathFactory.newInstance(Unknown Source)
    at org.apache.solr.core.Config.<init>(Config.java:50)
    at org.apache.solr.servlet.SolrDispatchFilter.<init>(SolrDispatchFilter.java:68)
    at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
    at sun.reflect.NativeConstructorAccessorImpl.newInstance(Unknown Source)
    at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(Unknown Source)
    at java.lang.reflect.Constructor.newInstance(Unknown Source)
    at java.lang.Class.newInstance0(Unknown Source)
    at java.lang.Class.newInstance(Unknown Source)
    at org.mortbay.jetty.servlet.Holder.newInstance(Holder.java:153)
    at org.mortbay.jetty.servlet.FilterHolder.doStart(FilterHolder.java:94)
    at org.mortbay.component.AbstractLifeCycle.start(AbstractLifeCycle.java:40)
    at org.mortbay.jetty.servlet.ServletHandler.initialize(ServletHandler.java:594)
    at org.mortbay.jetty.servlet.Context.startContext(Context.java:139)
    at org.mortbay.jetty.webapp.WebAppContext.startContext(WebAppContext.java:1218)
    at org.mortbay.jetty.handler.ContextHandler.doStart(ContextHandler.java:500)
    at org.mortbay.jetty.webapp.WebAppContext.doStart(WebAppContext.java:448)
    at org.mortbay.component.AbstractLifeCycle.start(AbstractLifeCycle.java:40)
    at org.mortbay.jetty.handler.HandlerCollection.doStart(HandlerCollection.java:147)
    at org.mortbay.jetty.handler.ContextHandlerCollection.doStart(ContextHandlerCollection.java:161)
    at org.mortbay.component.AbstractLifeCycle.start(AbstractLifeCycle.java:40)
    at org.mortbay.jetty.handler.HandlerCollection.doStart(HandlerCollection.java:147)
    at org.mortbay.component.AbstractLifeCycle.start(AbstractLifeCycle.java:40)
    at org.mortbay.jetty.handler.HandlerWrapper.doStart(HandlerWrapper.java:117)
    at org.mortbay.jetty.Server.doStart(Server.java:210)
    at org.mortbay.component.AbstractLifeCycle.start(AbstractLifeCycle.java:40)
    at org.mortbay.xml.XmlConfiguration.main(XmlConfiguration.java:929)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(Unknown Source)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(Unknown Source)
    at java.lang.reflect.Method.invoke(Unknown Source)
    at org.mortbay.start.Main.invokeMain(Main.java:183)
    at org.mortbay.start.Main.start(Main.java:497)
    at org.mortbay.start.Main.main(Main.java:115)
Function Query Syntax?
Our documents consist of:

- A short list of terms (about 1 to 5 terms per document)
- An estimate of the probability of the terms' occurrence (stored as tint)

For each term in the index, we would like to get the result of the following function:

(our estimate of the probability / 100) x (the term's document frequency)

So if the term "fox" occurred in 7 documents, the desired query result would look something like:

fox 7 23 1.61

We can find a number of examples of using function queries to alter scoring or to sort results, but cannot find any that show how to get the actual value of the function back in the response.
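To pin down the arithmetic we are after, here is a minimal sketch of the calculation outside Solr. The function name is made up for illustration, and this is not function query syntax, only the value we would like returned per term.

```python
def weighted_doc_freq(probability_percent, doc_freq):
    """(estimated probability / 100) x (document frequency of the term)."""
    return (probability_percent / 100) * doc_freq


# The "fox" example: document frequency 7, probability estimate 23.
term, doc_freq, probability = "fox", 7, 23
result = weighted_doc_freq(probability, doc_freq)
print("%s %d %d %.2f" % (term, doc_freq, probability, result))
```

For the example row, 23 / 100 x 7 gives 1.61, which matches the last column of the desired output.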
edismax - Handling collocations mapped to a single token . . ?
We are trying to get edismax to handle collocations mapped to a single token. To do so we need to manipulate the "chunks" (as Hoss referred to them in http://www.lucidimagination.com/blog/2010/05/23/whats-a-dismax/) generated by the dismax parser.

We have numerous collocations (figures of speech whose meaning does not directly relate to the constituent words that make up the saying). For example, at index time "real estate" is mapped to "real_estate" to avoid it colliding with searches for "estate" or "real value". So we need the "chunks" to reflect this mapping of multi-word phrases to a single token, which is done during indexing (via the synonym filter).

In an ideal world, we would just name the queryAnalyzerFieldType that should be used to pre-process the query string before it is divided into "chunks" (similar to what is done with the SpellCheck Component). But our impression thus far is that we are off the beaten path and will need to hack away at org.apache.solr.search.ExtendedDismaxQParser.splitIntoClauses(String, boolean).

Is it correct that the only pre-processing done by dismax is on stopwords?

Is it reasonable to expect that we can limit our customization to splitIntoClauses(String, boolean) to handle this?

Regards,

Christopher
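To illustrate the behavior we want, here is a rough sketch of the pre-processing step in isolation. The mapping table and function names are hypothetical, and the whitespace split is only a stand-in for the parser's chunking, not how splitIntoClauses actually works.

```python
import re

# Hypothetical collocation table; at index time the synonym filter does this.
COLLOCATIONS = {"real estate": "real_estate"}


def map_collocations(query, collocations=COLLOCATIONS):
    """Replace each multi-word collocation with its single-token form."""
    # Longest phrases first, so a longer collocation wins over a shorter one.
    for phrase in sorted(collocations, key=len, reverse=True):
        words = [re.escape(word) for word in phrase.split()]
        pattern = r"\b" + r"\s+".join(words) + r"\b"
        token = collocations[phrase]
        query = re.sub(pattern, lambda match: token, query, flags=re.IGNORECASE)
    return query


def split_into_chunks(query):
    """Stand-in for the parser's chunking: a plain whitespace split."""
    return query.split()


print(split_into_chunks(map_collocations("cheap real estate versus real value")))
```

The point is the ordering: the collocation mapping runs on the whole query string first, so that "real estate" reaches the chunking step as one token while "real value" is still split into two.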