We will be using Solr for indexing and Cassandra/Membase/HBase instead of a
database.
That is the idea for now, unless somebody suggests a better solution :-)
thanks
--Siju
On Tue, Aug 31, 2010 at 11:39 AM, Amit Nithian wrote:
I am curious about this too... Are you talking about using HBase/Cassandra as
an auxiliary store for large data, or using Cassandra to store the actual
Lucene index (as in Lucandra)?
On Mon, Aug 30, 2010 at 11:06 PM, Siju George wrote:
Thanks a million Nick,
We are currently debating whether we should use Cassandra, Membase, or
HBase with Solr.
Do you have any advice to contribute?
Thanks again :-)
--Siju
Hey,
Currently we have indexed some biological fulltext files. I was wondering how
to configure the schema.xml so that gene names (e.g. 'met1', 'met2', 'met3')
won't be stemmed into the same word ('met'). I added these gene names to the
protwords.txt file, but it doesn't seem to work.
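For reference, a sketch of an analyzer whose stemmer honors protwords.txt.
This assumes the Snowball stemmer; if your field type uses a different
stemming filter, the protected list is ignored, and a re-index is needed
after any analyzer change:

<fieldType name="text_genes" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <!-- tokens listed in protwords.txt (one per line, e.g. met1)
         pass through unstemmed -->
    <filter class="solr.SnowballPorterFilterFactory" language="English"
            protected="protwords.txt"/>
  </analyzer>
</fieldType>

Since the protected check runs after lower-casing here, the protwords.txt
entries must be lower case too.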
Has anyone managed to deploy Solr 1.4.1 into JBoss AS 6? If yes, could
you provide the required steps for deployment?
Thanks,
Bruno
There are synchronization points, which become chokepoints at some
number of cores. I don't know where they cause Lucene to top out.
Lucene apps are generally disk-bound, not CPU-bound, but yours will
be. There are so many variables that it's really not possible to give
any numbers.
Lance
On Mon,
Lance,
That makes sense. I have heard about the long GC times on large heaps, but I
personally haven't experienced a slowdown, though that doesn't mean much
either :-). Agreed that tuning the Solr caching is the way to go.
I haven't followed all the Solr/Lucene changes, but from what I remember
there
I am also curious, as Amit is. Can you give an example of the garbage
collection problem you mentioned?
- Original Message -
From: "Lance Norskog"
To:
Sent: Tuesday, August 31, 2010 9:14 AM
Subject: Re: Hardware Specs Question
This is a mass batch-processing task, rather than a search task.
Mahout is the right Apache project for implementing this. It would
then create a set of (document->document list). You could then add
this to a Solr index. (And invert the graph and add those lists.)
It might be possible to do this w
: how could I have the highlighting component return only the terms that were
: matched, without any surrounding text ?
I'm not a Highlighter expert, but this is something that certainly
*sounds* like it should be easy.
I took a shot at it and this is the best I could come up with...
http://lo
It generally works best to tune the Solr caches and allocate enough
RAM to run comfortably. Linux, Windows, et al. have their own cache
of disk blocks, and they use very good algorithms for managing it.
Also, they do not make long garbage collection passes.
On Mon, Aug 30, 2010 at 5:48 PM, Am
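For reference, those caches are configured in solrconfig.xml; a sketch with
illustrative sizes (tune them against the hit ratios on the admin stats page):

<!-- solrconfig.xml; the sizes below are placeholders, not recommendations -->
<filterCache class="solr.FastLRUCache" size="512"
             initialSize="512" autowarmCount="128"/>
<queryResultCache class="solr.LRUCache" size="512"
                  initialSize="512" autowarmCount="32"/>
<documentCache class="solr.LRUCache" size="512"
               initialSize="512"/>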
Lance,
Thanks for your help. What do you mean by the OS keeping the index in
memory better than Solr? Do you mean that you should use another means to
keep the index in memory (i.e. a ramdisk)? Is there a generally accepted
heap size to index size ratio that you follow?
Thanks
Amit
On Mon, Aug 30, 20
Short summary:
* Multiple simultaneous phrase boosts with different ps2 parameters
are working very nicely for me on a few million doc QA system.
* I've submitted an updated patch to Jira incorporating feedback
from the jira comments. Will be testing it more this week.
https://issues
The price-performance knee for small servers is 32 GB RAM, 2-6 SATA
disks in a RAID, and 8/16 cores. You can buy these servers and half-fill
them, leaving room for expansion.
I have not done benchmarks on the max # of processors that can be
kept busy during indexing or querying, and the total numbers
Hi all,
I am curious to get some opinions on at what point having more CPU
cores shows diminishing returns in terms of QPS. Our index size is about 8GB
and we have 16GB of RAM on a quad-core 4 x 2.4 GHz AMD Opteron 2216.
Currently I have the heap set to 8GB.
We are looking to get more servers to
Yes, we are using Cassandra. There is nothing much to say really, it just
works. Note we are generating Solr indexes using Java & SolrJ (embedded
mode) and reading data out of Cassandra with Java. Index generation is fast.
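For anyone wondering what the embedded indexing side looks like, a minimal
SolrJ sketch; it assumes solr.home points at a configured core, and the
field names are made up:

import org.apache.solr.client.solrj.embedded.EmbeddedSolrServer;
import org.apache.solr.common.SolrInputDocument;
import org.apache.solr.core.CoreContainer;

public class EmbeddedIndexer {
  public static void main(String[] args) throws Exception {
    // Bootstraps the core(s) from solr.home / solr.xml.
    CoreContainer container = new CoreContainer.Initializer().initialize();
    EmbeddedSolrServer server = new EmbeddedSolrServer(container, "");

    SolrInputDocument doc = new SolrInputDocument();
    doc.addField("id", "row-1");                         // hypothetical fields
    doc.addField("body", "a row read out of Cassandra");
    server.add(doc);
    server.commit();

    container.shutdown();
  }
}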
Hi-
Here is how it works: Lucene uses TF/DF as the "relevance" formula.
This means "term frequency divided by document frequency": the
number of times a term appears in one document over the number of
documents that term appears in.
This is the basic idea: suppose there are 10 documents say "s
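A hedged worked example, using the formulas from Lucene's DefaultSimilarity
(boosts and length norms left out):

tf(t, d) = sqrt(freq)                       freq = 4       ->  tf  = 2.0
idf(t)   = 1 + ln(numDocs / (docFreq + 1))  numDocs = 10,
                                            docFreq = 2    ->  idf ~= 2.2

So a term occurring 4 times in a document but found in only 2 of the 10
documents contributes roughly tf * idf = 4.4, while a term found in all 10
documents gets idf ~= 0.9 and contributes much less.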
On 26.08.2010 at 21:07, Ingo Renner wrote:
For those interested and for "the" Google, I found a working solution myself.
The QParser is now down to this:
public AccessFilterQParser(String qstr, SolrParams localParams,
    SolrParams params, SolrQueryRequest req) {
  super(qstr, localParams, params, req);
}
Hi all,
I'm looking for examples or pointers to some info on implementing custom
scoring in solr/lucene. Basically, what we're looking at doing is to augment
the score from a dismax query with some custom signals based on data in fields
from the row initially matched. There will be several of t
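One common way to get part of this without custom code, sketched here with
made-up field names: the dismax bf parameter adds function-query boosts
computed from fields of each matched document:

q=laptop&defType=dismax
  &qf=title^2 body
  &bf=log(popularity)^0.5 recip(rord(creationDate),1,1000,1000)^0.3

For signals that bf functions can't express, a custom function query (a
ValueSourceParser plugin) is usually a smaller job than replacing the
Similarity.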
The new spatial filtering (SOLR-1586) works great and is much faster
than fq={!frange. However, I am having problems sorting by distance.
If I try
GET
'http://localhost:8983/solr/select/?q=*:*&sort=dist(2,latitude,longitude,0,0)+asc'
I get an error:
Error 400 can not sort on unindexed field: dist(
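A hedged workaround from before function-query sorting (SOLR-1297) was
available: make the distance the score, keep the real query in an fq, and
sort on score. The fq value here is made up:

GET 'http://localhost:8983/solr/select/?q={!func}dist(2,latitude,longitude,0,0)&fq=type:pizza&sort=score+asc'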
please come to the Southern California area
On Mon, Aug 30, 2010 at 1:14 PM, Grant Ingersoll wrote:
I'm pleased to announce the very first ever RTP area (Raleigh, Durham, Chapel
Hill NC) Lucene/Solr meetup on Sept. 21. The event will be held at Lulu Press
and co-sponsored by Lucid Imagination. To learn more and RSVP, please see
http://www.meetup.com/RTP-Apache-Solr-Lucene-Meetup/
Hope to se
Thanks for the section on "Passing parameters to DIH config":
I'm going to try the parameter passing to allow the DIH to index
different DBs based on the system environment (local dev machine or
production machine).
@tommychheng
Programmer and UC Irvine Graduate Student
Find a great grad sch
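For the record, a sketch of what that looks like; the jdbcUrl parameter name
is invented. Anything extra on the request URL becomes visible in
data-config.xml as ${dataimporter.request.*}:

<dataSource driver="com.mysql.jdbc.Driver"
            url="${dataimporter.request.jdbcUrl}"
            user="solr" password="secret"/>

invoked per environment with:

/dataimport?command=full-import&jdbcUrl=jdbc:mysql://localhost/dev_db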
Thanks Lance.
I have decided to just put all of my processing on a bigger server along
with solr. It's too bad, but I can manage.
-Max
On Sun, Aug 29, 2010 at 9:59 PM, Lance Norskog wrote:
> No. Document creation is all-or-nothing, fields are not updateable.
>
> I think you have to filter all
On 8/29/2010 2:17 PM, Erick Erickson wrote:
<<>>
Try putting this after any instances of, say, WhitespaceTokenizerFactory
in your analyzer definition, and I believe you'll see that this is not
true.
At least looking at this in the analysis page from Solr admin sure doesn't
seem to support that
Hello everyone,
I installed the JTeam Solr spatial plugin into Solr 1.4.
It seems to work fine, except that I am unable to get the calculated
distance field back.
q={!spatial lat=49.294854 long=8.36869 radius=100 unit=km calc=arc
threadCount=2}*:*
fl=geo_distance
Any help would greatly be appreci
Hi Grant,
Thanks for the explanation.
Regards
ericz
On Mon, Aug 30, 2010 at 3:22 PM, Grant Ingersoll wrote:
>
> On Aug 30, 2010, at 7:20 AM, Eric Grobler wrote:
>
> > Hi Solr Community
> >
> > If you use a filter like:
> > q=*:*
> > fq=make:Volkswagen
> >
> > and then the next query is:
> >
Hi,
Several documents in my index contain the phrase "PS et".
However, PS is expanded to "parti socialiste", and a phrase search for
"PS et" fails.
A phrase search for "parti socialiste et" succeeds.
Can I make both queries work?
Here's the field type:
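Whatever the exact field type in play, a hedged sketch of the usual fix:
multi-word synonyms expanded at query time break phrase queries, so apply
the SynonymFilterFactory at index time only, e.g.:

<fieldType name="text_syn" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <!-- expand multi-word synonyms while indexing... -->
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt"
            ignoreCase="true" expand="true"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <!-- ...and not at query time, so phrases stay intact -->
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>

This requires a re-index, since the expansion moves to index time.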
Hi,
Is there any implementation in Solr or Lucene of "affinity ranking"? I've
been doing some research on content-based ranking models and came across
the paper "Improving Search Results Using Affinity Graph":
http://research.microsoft.com/apps/pubs/default.aspx?id=67818
Any thoughts?
Cheers
Uk
Some of it will also depend on things like your caches, heap size, etc.
-Grant
On Aug 26, 2010, at 12:37 AM, Chengyang wrote:
> We have about 500 million documents indexed. The index size is about 10G.
> Running on a 32-bit box. During the pressure testing, we monitored that the
> JVM GC is
On Aug 30, 2010, at 7:20 AM, Eric Grobler wrote:
> Hi Solr Community
>
> If you use a filter like:
> q=*:*
> fq=make:Volkswagen
>
> and then the next query is:
> q=blue
> fq=make:Volkswagen
>
> will Solr use the filter cache before the main query, or only after a "blue"
> subset?
The firs
After wasting a few days navigating the somewhat uncharted and murky
waters of DIH, thought I'd share my insights with the community to save
other newbies time, so here goes...
First off, this is not to say DIH is bad; I think it's great and it
works really well for my uses, but it has a few undoc
Hi all,
I am using Solr 1.4.0 with Java.
Recently I observed in my Solr logs, because of an invalid username:
java.sql.SQLException: Access denied for user '1234'@'localhost'
I resolved this, but I am not able to capture this error in my code so as
to throw a proper message to the user.
h
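If the exception is coming out of the DataImportHandler (an assumption
here), two things may help: the entity-level onError attribute decides
whether a bad row aborts the whole import, and the handler's status
response is something client code can poll and inspect for failure
messages:

<!-- data-config.xml: skip offending rows instead of aborting -->
<entity name="users" onError="skip"
        query="select id, name from users"/>

/dataimport?command=status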
Hi Solr Community
If you use a filter like:
q=*:*
fq=make:Volkswagen
and then the next query is:
q=blue
fq=make:Volkswagen
will Solr use the filter cache before the main query, or only after a "blue"
subset?
In other words, would this query make more sense?
q=(blue) AND (make:Volkswagen)
On 26.08.2010 at 21:07, Ingo Renner wrote:
Hi again,
> I implemented a custom filter and am using it through a QParserPlugin. I'm
> wondering however, whether my implementation is that clever yet...
>
> Here's my QParser; I'm wondering whether I should apply the filter to all
> documents in