Re: Dynamically calculated range facet

2009-02-19 Thread Rohit Gandhe

I know this is an old thread, but I am not sure if this thread concluded with
any concrete result. 

Martin,
I see you started some changes to Solr Code in "RangeRequestHandler". Did
you ever release them as a patch? Has anybody been working on this issue? I
read through most of Hossman's posts and I understand the necessity to make
this feature as generic as possible, but for now I am only interested in
using it for simple things like prices and percent discount etc.
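
To make the simple case concrete, here is a minimal client-side sketch of
a dynamically calculated range facet: equal-width ranges between the
lowest and highest price of the matching documents, with a count per
range. It is plain Java with no Solr calls. The class name PriceRanges,
the method facet, and the label format are all hypothetical, and nothing
here is taken from Martin's RangeRequestHandler.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

/** Illustrative only: derives equal-width ranges from the values themselves. */
final class PriceRanges {

    /**
     * Splits [min, max] of the given prices into `buckets` equal-width
     * ranges and returns one "low - high: count" label per range.
     * Lower bounds are inclusive; upper bounds are exclusive, except for
     * the last range, which also includes the maximum.
     */
    static List<String> facet(double[] prices, int buckets) {
        List<String> labels = new ArrayList<>();
        if (prices.length == 0 || buckets <= 0) {
            return labels;
        }

        // First pass: find the bounds of this particular result set.
        double min = prices[0];
        double max = prices[0];
        for (double p : prices) {
            min = Math.min(min, p);
            max = Math.max(max, p);
        }

        // Second pass: count how many prices fall into each range.
        double width = (max - min) / buckets;
        int[] counts = new int[buckets];
        for (double p : prices) {
            int i = (width == 0) ? 0 : (int) ((p - min) / width);
            if (i >= buckets) {
                i = buckets - 1; // the maximum belongs to the last range
            }
            counts[i]++;
        }

        for (int i = 0; i < buckets; i++) {
            double lo = min + i * width;
            double hi = (i == buckets - 1) ? max : lo + width;
            labels.add(String.format(Locale.ROOT, "%.2f - %.2f: %d", lo, hi, counts[i]));
        }
        return labels;
    }
}
```

Calling PriceRanges.facet(prices, 3) on the prices of the current result
set gives three ranges that move with the query. Equal-width ranges are
just the easiest choice to show; ranges based on quantiles or on the
standard deviation would need the statistics discussed further down the
thread.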

I just want to know if anybody has 
1) released any code as a patch that touches this issue
2) created a Jira issue that I can read and contribute to
3) added something to the wiki

Otherwise, I would be happy to start this on Jira with links to all the
posts I have found on this issue.
Thanks


Martin Grotzke wrote:
> 
> On Tue, 2007-06-26 at 23:22 -0700, Chris Hostetter wrote:
>> : So if it would be possible to go over each item in the search result
>> : I could check the price field and define my ranges for the specific
>> : query on solr side and return the price ranges as a facet.
>> 
>> : Otherwise, what would be a good starting point to plug in such
>> : functionality into solr?
>> 
>> if you really want to do statistical distributions, one way to avoid
>> doing all of this work on the client side (and needing to pull back
>> all of the prices from all of the matches) would be to write a custom
>> request handler that subclasses whichever one you currently use and
>> does this computation on the server side -- where it has lower level
>> access to the data and doesn't need to stream it over the wire.
>> FieldCache in particular would come in handy.
> Now we want to have fun with statistics and calculation, and I just set
> up a new project with a dependency on apache-solr-1.2.0. I started a
> RangeRequestHandler extending StandardRequestHandler, but I don't really
> see where to plug in. Most probably it's the handleRequestBody, but
> there's a lot of stuff in StandardRequestHandler.handleRequestBody that
> I do not want to repeat...
> 
> To ask a question: how could I get each document of the result to
> check the price and do some calculation at the end?
> Logging that to stdout would be fine at first, afterwards I would
> like to add a new facet to the result with some info.
> 
> Thanx in advance,
> cheers,
> Martin
> 
> 
>> 
>> it occurs to me that even though there may not be a way to dynamically
>> create facet ranges that can apply usefully on any numeric field, we
>> could add generic support to the request handlers for optionally
>> fetching some basic statistics about a DocSet for clients that want
>> them (either for building ranges, or for any other purpose)
>> 
>> min, max, mean, median, mode, midrange ... those should all be easy to
>> compute using the ValueSource from the field type (it would be nice if
>> FieldTypes had some way of indicating which DocValues function can best
>> manage the field type, but we can always assume float or have an option
>> for dictating it ... people might want a float mean for an int field
>> anyway)
>> 
>> i suppose even stddev could be computed fairly easily ... there's a
>> formula for that that works well in a single pass over a bunch of values
>> right?
>> 
>> 
>> 
>> 
>> -Hoss
>> 
> -- 
> Martin Grotzke
> http://www.javakaffee.de/blog/
> 
>  
> 
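
On the single-pass question at the end of Hoss's message: such a formula
does exist and is usually credited to Welford. Below is a minimal sketch,
assuming the field values have already been obtained as doubles (how they
are read, for instance from the FieldCache, is not shown). RunningStats
and its methods are hypothetical names and not part of Solr.

```java
/**
 * Illustrative only: min, max, mean and standard deviation in one pass
 * over the values, using Welford's update so that no value has to be
 * kept in memory or visited twice.
 */
final class RunningStats {
    private long count;
    private double mean;
    private double m2; // running sum of squared deviations from the mean
    private double min = Double.POSITIVE_INFINITY;
    private double max = Double.NEGATIVE_INFINITY;

    /** Folds one more value into the statistics. */
    void add(double value) {
        count++;
        double delta = value - mean;
        mean += delta / count;
        m2 += delta * (value - mean); // uses the already updated mean
        min = Math.min(min, value);
        max = Math.max(max, value);
    }

    long count() { return count; }
    double min() { return min; }
    double max() { return max; }

    /** Mean of the values seen so far, or NaN if there are none. */
    double mean() { return count == 0 ? Double.NaN : mean; }

    /** Midrange as listed in the thread: halfway between min and max. */
    double midrange() { return count == 0 ? Double.NaN : (min + max) / 2.0; }

    /** Population standard deviation (divides by n, not n - 1). */
    double populationStdDev() {
        return count == 0 ? Double.NaN : Math.sqrt(m2 / count);
    }
}
```

A handler would call add() once per matching document and report the
results at the end. Median and mode are not covered here: they cannot be
computed exactly from running totals like these and need the values (or
at least a histogram of them) to be kept.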




Re: grouping response docs together

2009-05-15 Thread Rohit Gandhe

The collapse component may be of interest to you:

https://issues.apache.org/jira/browse/SOLR-236?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel


On Fri, May 15, 2009 at 3:52 PM, Matt Mitchell wrote:

> Is there a built-in mechanism for grouping similar documents together in
> the response? I'd like to make it look like there is only one document
> with multiple "hits".
>
> Matt
>


Indexing CSV without HTTP

2010-02-04 Thread Rohit Gandhe
Hi Everyone,

We are indexing quite a lot of data using the update/csv handler. For
reasons I can't get into right now, I can't implement a DIH, since I
can only access the DB through stored procedures and stored procedure
support in DIH is not yet available. Indexing takes about 3 hours, and I
don't want to tax the search server too much during indexing, so I came
up with a two-server solution: an indexing server indexes the file every
night and then copies the index to the search server. Maintaining a
full-fledged Tomcat/Jetty just for indexing is too much of a pain, so I
wrote a small utility Java class which starts an embedded server,
indexes the CSV and shuts the server down. I would like the
community's input on this solution.

Is this okay to do?
Is there a better way to do this without running two separate servers?
Is my class safe enough to run every night in a production environment?

Here's my utility class. This is just a POC, and before I productionize
it I would like some input from the Solr czars here.

import org.apache.solr.client.solrj.SolrServer;
import org.apache.solr.client.solrj.embedded.EmbeddedSolrServer;
import org.apache.solr.client.solrj.request.ContentStreamUpdateRequest;
import org.apache.solr.common.util.NamedList;
import org.apache.solr.core.CoreContainer;
import org.apache.solr.core.CoreDescriptor;
import org.apache.solr.core.SolrConfig;
import org.apache.solr.core.SolrCore;

import java.io.File;

public class StandaloneSolrIndexer {

    public static void main(String[] args) throws Exception {

        SolrCore core = null;
        CoreContainer container = null;
        try {
            container = new CoreContainer();

            // Build a single core by hand from the config under /tmp/solr
            SolrConfig config = new SolrConfig("/tmp/solr", "solrconfig.xml", null);
            CoreDescriptor descriptor = new CoreDescriptor(container, "core1", "/tmp/solr");

            core = new SolrCore("core1", "/tmp/solr/data", config, null, descriptor);
            container.register("core1", core, false);

            SolrServer server = new EmbeddedSolrServer(container, "core1");

            // Start by deleting everything
            server.deleteByQuery("*:*");

            // Send the tab-separated file to the CSV update handler
            ContentStreamUpdateRequest req = new ContentStreamUpdateRequest("/update/csv");
            req.addFile(new File("/tmp/product-5k.tsv"));

            req.setParam("commit", "true");
            req.setParam("stream.contentType", "text/plain;charset=utf-8");
            req.setParam("escape", "\\");
            req.setParam("separator", "\t");
            req.setParam("fieldnames",
                    "product_id,account_id,name,category_tags,short_desc,upc,"
                    + "manu_mdl_num,ext_prd_id,brand,long_desc,sku,seller,"
                    + "seller_email,vertical,cat,subcat");
            // The first line of the file is a header row
            req.setParam("skipLines", "1");

            NamedList<Object> result = server.request(req);
            System.out.println("Result:\n" + result);

        } finally {
            // Release the core and the container even if indexing fails
            if (core != null) core.close();
            if (container != null) container.shutdown();
        }
    }
}


Thanks,
Rohit


Re: Indexing CSV without HTTP

2010-02-04 Thread Rohit Gandhe
Thanks Yonik! We want to move to index replication soon (in a couple of
months), which will also help with incremental updates. But for now we
want a quick and dirty solution without running two servers. Does the
utility look OK for indexing a CSV file? Is it safe to use in a
production environment? I know maintaining custom server code is not a
good idea, but this is just until we can implement index replication.

On Thu, Feb 4, 2010 at 12:28 PM, Yonik Seeley wrote:
> On Thu, Feb 4, 2010 at 3:03 PM, Rohit Gandhe wrote:
>> We are indexing quite a lot of data using update/csv handler. For
>> reasons I can't get into right now, I can't implement a DIH since I
>> can only access the DB using Stored Procs and stored proc support in
>> DIH is not yet available. Indexing takes about 3 hours and I don't
>> want to tax the server too much during indexing so I came up with a
>> two server solution. Indexing server to index the file every night and
>> subsequently copy the index on the search server.
>
> Why not use the built-in index replication?
>
>> Maintaining a full
>> fledged Tomcat/Jetty for just indexing is too much of a pain, so I
>> wrote a small utility Java class which starts an Embedded Server,
>
> Surely maintaining your own custom server code is going to be more
> work than simply running a server provided by the community?
>
> -Yonik
> http://www.lucidimagination.com
>