Max Field Scoring?

2011-10-07 Thread Brian Gerby

Hi all -
I am trying to figure out whether a particular use case is possible with Solr.
Let's say we are using Solr to store a group of people and the universities
they attended. We have four fields: NAME; PHD, for the university they received
a PhD from; MASTERS, for the university they received a master's from; and
BACHELORS, for the university they received a bachelor's from. We want to give
a really big boost to matches on the PHD field (100), a big boost to matches on
the MASTERS field (50) and a small boost to matches on the BACHELORS field
(10). If someone attended the same university for all three degrees, we only
want to use the score from the highest-boosted field, PHD. The desired net
result is that a search for 'Stanford' scores someone who received a PHD from
Stanford the same as someone who received a PHD, a MASTERS and a BACHELORS
from Stanford. We don't want the boosted scores of all three fields to be
combined, so simply adding up the boosted fields won't work. Is this possible
and, if so, what's the best way? A boost function using subqueries?
Many thanks in advance,

Brian
  

Re: Max Field Scoring?

2011-10-08 Thread Brian Gerby
Thanks Ahmet. Do you know of a way to set a tie on only a set of fields, so in
this case on PHD, MASTERS and BACHELORS, but not on NAME? The end result being
that a student named Stanford who went to Stanford for a PHD would get a higher
score than someone named Joe with the same degree, for a query for 'stanford'.
My use case will have a few of these groupings. I'd like to apply the tie
within each group, not across all fields.

On Oct 8, 2011, at 2:12 AM, Ahmet Arslan wrote:

>> &defType=dismax&qf=NAME PHD MASTERS BACHELORS&tie=0.0&q=Stanford
> 
> I forgot to add boost values
> 
> &defType=dismax&qf=NAME PHD^100 MASTERS^50 BACHELORS^10&tie=0.0&q=Stanford
> 
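
For reference, here is a minimal sketch of how the tie parameter combines per-field scores in a disjunction-max query: the best field score, plus tie times the sum of the remaining scores. It assumes every matching field has a raw score of 1.0 before boosting, which is a simplification (real Lucene scores also depend on term frequency, idf and norms), and the function names are hypothetical helpers, not Solr APIs.

```python
def dismax_score(field_scores, tie=0.0):
    """Best field score plus `tie` times the sum of the other field scores."""
    if not field_scores:
        return 0.0
    best = max(field_scores)
    return best + tie * (sum(field_scores) - best)


# Boosts from the thread. A raw match score of 1.0 per field is an assumption.
BOOSTS = {"PHD": 100.0, "MASTERS": 50.0, "BACHELORS": 10.0}


def score_person(matching_fields, tie=0.0):
    """Score one person given the degree fields that matched the query."""
    return dismax_score([BOOSTS[f] for f in matching_fields], tie)


phd_only = score_person(["PHD"])
all_three = score_person(["PHD", "MASTERS", "BACHELORS"])
print(phd_only, all_three)  # 100.0 100.0 with tie=0.0
print(score_person(["PHD", "MASTERS", "BACHELORS"], tie=1.0))  # 160.0
```

With tie=0.0 only the highest-scoring field counts, which is the behaviour asked for in the original question; with tie=1.0 the result degenerates into a plain sum of the boosted fields.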


RE: Three questions about: Commit, single index vs multiple indexes and implementation advice

2011-11-04 Thread Brian Gerby

Gustavo - 

Even with the most basic requirements, I'd recommend setting up a multi-core
configuration so you can RELOAD the main core you will be using whenever you
make simple changes to config files. This is much cleaner than restarting Solr
each time. There are other benefits to a multi-core setup, but this is the main
reason I use one.
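
As a rough sketch of what that looks like in practice, the snippet below builds the CoreAdmin request that reloads a single core. The host, port and core name (localhost:8983, core0) are assumptions for illustration; substitute whatever your own multi-core setup uses.

```python
from urllib.parse import urlencode


def core_reload_url(base_url, core_name):
    """Build the CoreAdmin URL that asks Solr to reload one core."""
    params = urlencode({"action": "RELOAD", "core": core_name})
    return f"{base_url.rstrip('/')}/admin/cores?{params}"


# Hypothetical host and core name, for illustration only.
url = core_reload_url("http://localhost:8983/solr", "core0")
print(url)
```

Requesting that URL (from a browser, curl, or urllib.request.urlopen) after editing the core's config files should pick up the changes without restarting the servlet container.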

Brian 

> Date: Fri, 4 Nov 2011 15:34:27 -0300
> Subject: Re: Three questions about: Commit, single index vs multiple indexes 
> and implementation advice
> From: comfortablynum...@gmail.com
> To: solr-user@lucene.apache.org
> 
> First of all, thanks a lot for your answer.
> 
> 1) I could use 5 to 15 seconds between each commit and give it a try. Is
> this an acceptable configuration? I'll take a look at NRT.
> 2) Currently I'm using a single core, the simplest setup. I don't expect to
> have an overwhelming quantity of records, but I do have lots of classes to
> persist, and I need to search all of them at the same time, not per
> class (entity). For now it is working well. By multiple indexes I mean using
> an index for each entity: say, an index for "Articles", another for
> "Users", etc. The thing is that I don't know when I should split it up and
> use one index per entity (or whether it's possible to run a "UNION"-like
> search across all the indexes). I've read that when an entity reaches
> one million records it's best to give it a dedicated index, although
> I don't expect to reach that size even with all my entities combined. But I
> wanted to hear it from you just to be sure.
> 3) Great! For now I think I'll stick with one index, but it's good to know
> that in case I need to change later for some reason.
> 
> 
> 
> Again, thanks a lot for your help!
> 
> 2011/11/4 Erick Erickson 
> 
> > Let's see...
> > 1> Committing every second, even with commitWithin, is probably going
> > to be a problem. I think that 1 second latency is usually overkill, but
> > that's up to your product manager. Look at the NRT (Near Real Time)
> > stuff if you really need this. I thought that NRT was only on trunk,
> > but it *might* be in the 3.4 code base.
> > 2> I don't understand what "a single index per entity" is. How many
> > cores do you have in total? For not very many records, I'd put
> > everything in a single index and use filter queries to restrict views.
> > 3> I guess this relates to <2>. And I'd use a single core. If, for
> > some reason, you decide that you need multiple indexes, use several
> > cores within ONE Solr rather than starting a new Solr per core; it's
> > more resource-expensive to have multiple JVMs around.
> >
> > Best
> > Erick
> >
> > On Thu, Nov 3, 2011 at 2:03 PM, Gustavo Falco wrote:
> > > Hi guys!
> > >
> > > I have a couple of questions that I hope someone could help me with:
> > >
> > > 1) Recently I've implemented Solr in my app. My use case is not
> > > complicated. Suppose that there will be 50 concurrent users tops. This is
> > > an app like, let's say, a CRM. I tell you this so you have an idea of
> > > how many read and write operations will be needed. What I do need is
> > > for data that is added / updated to be available right after it's
> > > added / updated (a second later is OK). I know that the commit operation
> > > is expensive, so doing a commit right after each write operation is
> > > probably not a good idea. I'm trying to use the autoCommit feature with a
> > > maxTime of 1000ms, but then the question arose: is this the best way to
> > > handle this type of situation? If not, what should I do?
> > >
> > > 2) I'm using a single index per entity type because I've read that if the
> > > app is not handling lots of data (let's say, 1 million records) then
> > > it's "safe" to use a single index. Is this true? If not, why?
> > >
> > > 3) Is it a problem if I use a simple setup of Solr with a single core
> > > for this use case? If it is, what do you recommend?
> > >
> > >
> > >
> > > Any help in any of these topics would be greatly appreciated.
> > >
> > > Thanks in advance!
> > >
> >
  

Highlight with multi word synonyms

2011-11-22 Thread Brian Gerby
I'm trying to use multi-word synonyms. For example, in my synonyms file I have
nhl, national hockey league. If I apply the synonyms at index time only, a
search for nhl returns the correct match but highlights only the first word,
national. Ideally, it would highlight national hockey league or not highlight
at all. If I apply the synonyms at both index and query time, it finds the
match and highlights correctly, but I understand it is not ideal to apply
synonyms at both index and query time. I am expanding synonyms and using
edismax. Thoughts?


RE: import Data via PHP

2011-07-14 Thread Brian Gerby

Joerg - 
In your PHP code, you can build a snippet of XML containing your fields and
send it to your Solr server via an HTTP POST. The XML format for update
messages is described at http://wiki.apache.org/solr/UpdateXmlMessages.
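
To illustrate the idea, here is a minimal sketch that builds an `<add>` update message and prepares the POST request. It is written in Python rather than PHP purely for illustration; the same two steps translate directly. The field values, the update URL and the helper names are assumptions, and the request is only constructed here, not sent.

```python
import urllib.request
import xml.etree.ElementTree as ET


def build_add_message(fields):
    """Return a Solr XML <add> message containing one document."""
    add = ET.Element("add")
    doc = ET.SubElement(add, "doc")
    for name, value in fields.items():
        field = ET.SubElement(doc, "field", name=name)
        field.text = str(value)  # ElementTree escapes &, < and > for us
    return ET.tostring(add, encoding="unicode")


def build_update_request(update_url, xml_body):
    """Prepare (but do not send) the POST that delivers the XML to Solr."""
    return urllib.request.Request(
        update_url,
        data=xml_body.encode("utf-8"),
        headers={"Content-Type": "text/xml; charset=utf-8"},
        method="POST",
    )


# Hypothetical document values and server URL.
xml_body = build_add_message({"id": "42", "name": "T-shirt", "color": "red & blue"})
request = build_update_request("http://myserver/solr/update", xml_body)
print(xml_body)
```

Sending the request with urllib.request.urlopen(request), followed by a second POST whose body is `<commit/>`, should make the document searchable. Building the XML with a library rather than string concatenation avoids problems with special characters in the posted form values.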



> Date: Thu, 14 Jul 2011 20:48:20 +0200
> Subject: import Data via PHP
> From: joerg.ag...@googlemail.com
> To: solr-user@lucene.apache.org
> 
> Hello users...
> 
> I have a problem.
> 
> I have to index data via PHP. The data exists, and the
> fields do too.
> 
> <?php
> $id = $_POST['id'];
> $name = $_POST['name'];
> $url = $_POST['url'];
> $color = $_POST['color'];
> $size = $_POST['size'];
> // etc...
> ?>
> 
> The fields id, name, url, color and size exist in the schema, and that works.
> 
> 
> But how can I send it to Solr? I tried it with curl, but it doesn't work:
> 
> curl "
> http://myserver/solr/update/extract?commit=true&literal.id=$id&literal.name=$name&literal.url=$url&literal.size=$size&stream.file=
> "