Re: Dynamically calculated range facet
I know this is an old thread, but I am not sure whether it concluded with any concrete result. Martin, I see you started some changes to Solr code in a "RangeRequestHandler". Did you ever release them as a patch? Has anybody been working on this issue?

I read through most of Hossman's posts and I understand the need to make this feature as generic as possible, but for now I am only interested in using it for simple things like prices, percent discounts, etc. I just want to know if anybody has:

1) released any code as a patch which touches this issue,
2) created a JIRA issue which I can read and contribute to, or
3) added something to the wiki.

Otherwise, I would be happy to start this on JIRA with links to all the posts I have found on this issue.

Thanks

Martin Grotzke wrote:
>
> On Tue, 2007-06-26 at 23:22 -0700, Chris Hostetter wrote:
>> : So if it would be possible to go over each item in the search result
>> : I could check the price field and define my ranges for the specific
>> : query on solr side and return the price ranges as a facet.
>>
>> : Otherwise, what would be a good starting point to plug in such
>> : functionality into solr?
>>
>> if you really want to do statistical distributions, one way to avoid
>> doing all of this work on the client side (and needing to pull back all
>> of the prices from all of the matches) would be to write a custom
>> request handler that subclasses whichever one you currently use and does
>> this computation on the server side -- where it has lower-level access
>> to the data and doesn't need to stream it over the wire. FieldCache in
>> particular would come in handy.
>
> Now we want to have fun with statistics and calculation, and I just set
> up a new project with a dependency on apache-solr-1.2.0. I started a
> RangeRequestHandler extending StandardRequestHandler, but I don't really
> see where to plug in. Most probably it's handleRequestBody, but there's
> a lot of stuff in StandardRequestHandler.handleRequestBody that I do not
> want to repeat...
>
> To ask a question: how could I get each document of the result to check
> the price and do some calculation at the end? Logging that to stdout
> would be fine at first; afterwards I would like to add a new facet to
> the result with some info.
>
> Thanx in advance,
> cheers,
> Martin
>
>> it occurs to me that even though there may not be a way to dynamically
>> create facet ranges that can apply usefully on any numeric field, we
>> could add generic support to the request handlers for optionally
>> fetching some basic statistics about a DocSet for clients that want
>> them (either for building ranges, or for any other purpose)
>>
>> min, max, mean, median, mode, midrange ... those should all be easy to
>> compute using the ValueSource from the field type (it would be nice if
>> FieldTypes had some way of indicating which DocValues function can best
>> manage the field type, but we can always assume float or have an option
>> for dictating it ... people might want a float mean for an int field
>> anyway)
>>
>> i suppose even stddev could be computed fairly easily ... there's a
>> formula for that that works well in a single pass over a bunch of
>> values, right?
>>
>> -Hoss
>
> --
> Martin Grotzke
> http://www.javakaffee.de/blog/
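To Hoss's closing question: yes, min, max, mean, and standard deviation can all be maintained in a single pass over the values, e.g. with Welford's online algorithm (which avoids the precision loss of the naive sum-of-squares approach). A minimal standalone sketch in plain Java — not Solr code, and the class name is made up for illustration:

```java
// One-pass accumulator for the per-DocSet statistics Hoss suggests
// (min, max, mean, stddev) using Welford's online algorithm.
public class StreamingStats {
    private long n = 0;
    private double min = Double.POSITIVE_INFINITY;
    private double max = Double.NEGATIVE_INFINITY;
    private double mean = 0.0;
    private double m2 = 0.0; // running sum of squared deviations from the mean

    public void accept(double x) {
        n++;
        min = Math.min(min, x);
        max = Math.max(max, x);
        double delta = x - mean;
        mean += delta / n;          // incremental mean update
        m2 += delta * (x - mean);   // uses both the old and new mean
    }

    public long count()  { return n; }
    public double min()  { return min; }
    public double max()  { return max; }
    public double mean() { return mean; }

    // Sample standard deviation (n - 1 in the denominator).
    public double stddev() {
        return n > 1 ? Math.sqrt(m2 / (n - 1)) : 0.0;
    }

    public static void main(String[] args) {
        StreamingStats stats = new StreamingStats();
        for (double price : new double[] {9.99, 19.99, 4.50, 99.00}) {
            stats.accept(price);
        }
        System.out.printf("min=%.2f max=%.2f mean=%.4f stddev=%.4f%n",
                stats.min(), stats.max(), stats.mean(), stats.stddev());
    }
}
```

In a custom request handler, accept() would be fed from the FieldCache values of the matching docs, so nothing ever crosses the wire except the final numbers.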
Re: grouping response docs together
The collapse component may be of interest to you:
https://issues.apache.org/jira/browse/SOLR-236

On Fri, May 15, 2009 at 3:52 PM, Matt Mitchell wrote:
> Is there a built-in mechanism for grouping similar documents together in
> the response? I'd like to make it look like there is only one document
> with multiple "hits".
>
> Matt
Indexing CSV without HTTP
Hi Everyone,

We are indexing quite a lot of data using the update/csv handler. For reasons I can't get into right now, I can't implement a DIH since I can only access the DB using stored procs, and stored proc support in DIH is not yet available.

Indexing takes about 3 hours and I don't want to tax the server too much during indexing, so I came up with a two-server solution: an indexing server indexes the file every night and subsequently copies the index to the search server. Maintaining a full-fledged Tomcat/Jetty just for indexing is too much of a pain, so I wrote a small utility Java class which starts an embedded server, indexes the CSV, and shuts the server down.

I would like the community's input on this solution. Is this okay to do? Is there a better way to do this without running two separate servers? Is my class safe enough to run every night in a production environment?

Here's my utility class. This is just a POC, and before I productionize it I would like some input from the Solr czars here.

import org.apache.solr.client.solrj.SolrServer;
import org.apache.solr.client.solrj.embedded.EmbeddedSolrServer;
import org.apache.solr.client.solrj.request.ContentStreamUpdateRequest;
import org.apache.solr.common.util.NamedList;
import org.apache.solr.core.CoreContainer;
import org.apache.solr.core.CoreDescriptor;
import org.apache.solr.core.SolrConfig;
import org.apache.solr.core.SolrCore;

import java.io.File;

public class StandaloneSolrIndexer {

    public static void main(String[] args) throws Exception {
        SolrCore core = null;
        CoreContainer container = null;
        try {
            // Bring up an embedded core from /tmp/solr without a servlet container.
            container = new CoreContainer();
            SolrConfig config = new SolrConfig("/tmp/solr", "solrconfig.xml", null);
            CoreDescriptor descriptor = new CoreDescriptor(container, "core1", "/tmp/solr");
            core = new SolrCore("core1", "/tmp/solr/data", config, null, descriptor);
            container.register("core1", core, false);
            SolrServer server = new EmbeddedSolrServer(container, "core1");

            // Start by deleting everything.
            server.deleteByQuery("*:*");

            // Stream the tab-separated file through the CSV update handler.
            ContentStreamUpdateRequest req = new ContentStreamUpdateRequest("/update/csv");
            req.addFile(new File("/tmp/product-5k.tsv"));
            req.setParam("commit", "true");
            req.setParam("stream.contentType", "text/plain;charset=utf-8");
            req.setParam("escape", "\\");
            req.setParam("separator", "\t");
            req.setParam("fieldnames", "product_id,account_id,name,category_tags,short_desc,upc,manu_mdl_num,ext_prd_id,brand,long_desc,sku,seller,seller_email,vertical,cat,subcat");
            req.setParam("skipLines", "1");

            NamedList result = server.request(req);
            System.out.println("Result : \n" + result);
        } finally {
            // Always release the core and container, even if indexing fails.
            if (core != null) core.close();
            if (container != null) container.shutdown();
        }
    }
}

Thanks,
Rohit
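For what it's worth, the "index every night, then copy" step described above could be driven by something as simple as a crontab entry. A hypothetical sketch — every path, classpath, and hostname below is a placeholder:

```shell
# Hypothetical crontab entry: rebuild the index at 02:00, then copy the
# finished segment files to the search server only if indexing succeeded.
0 2 * * * java -cp "solr-lib/*:." StandaloneSolrIndexer && rsync -a --delete /tmp/solr/data/index/ search-host:/opt/solr/data/index/
```

In practice the search server would also need to reopen its searcher after the copy (e.g. by issuing a commit or reloading the core) before the new index becomes visible.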
Re: Indexing CSV without HTTP
Thanks Yonik!

We want to move to index replication soon (in a couple of months), which will also help with incremental updates. But for now we want a quick and dirty solution without running two servers.

Does the utility look OK for indexing a CSV file? Is it safe to run in a production environment? I know maintaining custom server code is not a good idea, but this is just until we can implement index replication.

On Thu, Feb 4, 2010 at 12:28 PM, Yonik Seeley wrote:
> On Thu, Feb 4, 2010 at 3:03 PM, Rohit Gandhe wrote:
>> We are indexing quite a lot of data using update/csv handler. For
>> reasons I can't get into right now, I can't implement a DIH since I
>> can only access the DB using Stored Procs and stored proc support in
>> DIH is not yet available. Indexing takes about 3 hours and I don't
>> want to tax the server too much during indexing so I came up with a
>> two server solution. Indexing server to index the file every night and
>> subsequently copy the index on the search server.
>
> Why not use the built-in index replication?
>
>> Maintaining a full
>> fledged Tomcat/Jetty for just indexing is too much of a pain, so I
>> wrote a small utility Java class which starts an Embedded Server,
>
> Surely maintaining your own custom server code is going to be more
> work than simply running a server provided by the community?
>
> -Yonik
> http://www.lucidimagination.com
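For later reference, the built-in replication Yonik mentions is configured through the ReplicationHandler in solrconfig.xml (available since Solr 1.4). A rough sketch — hostnames and the poll interval are placeholders to adapt:

```xml
<!-- On the indexing (master) box, in solrconfig.xml -->
<requestHandler name="/replication" class="solr.ReplicationHandler">
  <lst name="master">
    <str name="replicateAfter">commit</str>
  </lst>
</requestHandler>

<!-- On the search (slave) box -->
<requestHandler name="/replication" class="solr.ReplicationHandler">
  <lst name="slave">
    <str name="masterUrl">http://indexing-host:8983/solr/replication</str>
    <str name="pollInterval">00:05:00</str>
  </lst>
</requestHandler>
```

With this in place the slave pulls only changed index files after each commit on the master, which removes the need for the manual nightly copy.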