Solr 1.4 Replication index directories
Hi,

We're using the new replication and it's working pretty well. There's one detail I'd like to get some more information about.

As the replication works, it creates versions of the index in the data directory. Originally we had index/, but now there are dated versions such as index.20100127044500/, which are the replicated versions. Each copy is sized in the vicinity of 65G. With our current hard drive it's fine to have two around, but three gets a little dicey. Sometimes we're finding that the replication doesn't always clean up after itself. I would like to understand this better, or to not have this happen. It could be a configuration issue.

Some more specific questions:

- Is it safe to remove the index/ directory (the one without the date on it)? I think I tried this once and the whole thing broke, but maybe something else was wrong at the time.
- Is there a way to know which one is the current one? (I'm looking at the file index.properties, and it seems to be correct, but sometimes there's a newer version in the directory, which is later removed.)
- Could it be that the index does not finish replicating within the poll interval I give it? What happens if there's a poll interval X and replicating the index sometimes takes longer than X? (Our current poll interval is 45 minutes, and every time I'm watching it, it completes in time.)

Thanks in advance,
Mark
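For context, the slave-side polling Mark mentions lives in the ReplicationHandler section of solrconfig.xml. A minimal sketch, with a placeholder master host and the 45-minute interval from the post:

    <requestHandler name="/replication" class="solr.ReplicationHandler">
      <lst name="slave">
        <!-- placeholder URL; point this at the master core's /replication handler -->
        <str name="masterUrl">http://master-host:8983/solr/replication</str>
        <!-- poll every 45 minutes, matching the interval described above -->
        <str name="pollInterval">00:45:00</str>
      </lst>
    </requestHandler>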
Re: Solr 1.4 Replication index directories
Thanks, Otis. Responses inline.

>> Hi, We're using the new replication and it's working pretty well. There's one detail I'd like to get some more information about. As the replication works, it creates versions of the index in the data directory. Originally we had index/, but now there are dated versions such as index.20100127044500/, which are the replicated versions. Each copy is sized in the vicinity of 65G. With our current hard drive it's fine to have two around, but 3 gets a little dicey. Sometimes we're finding that the replication doesn't always clean up after itself. I would like to understand this better, or to not have this happen. It could be a configuration issue. Some more specific questions:
>>
>> - Is it safe to remove the index/ directory (that doesn't have the date on it)? I think I tried this once and the whole thing broke, however maybe something else was wrong at the time.
>
> No, that's the real, live index, you don't want to remove that one.

Yeah... I tried it once and remember things breaking. However, nothing in this directory has been modified for over a week (since the last replication initialization), and I'm still sitting on 130GB of data for what is only 65GB on the master.

>> - Is there a way to know which one is the current one? (I'm looking at the file index.properties, and it seems to be correct, but sometimes there's a newer version in the directory, which later is removed)
>
> I think the "index" one is always current, no? If not, I imagine the admin replication page will tell you, or even the Statistics page.

e.g.

reader : SolrIndexReader{this=46a55e,r=readonlysegmentrea...@46a55e,segments=1}
readerDir : org.apache.lucene.store.NIOFSDirectory@/mnt/solrhome/cores/foo/data/index

reader : SolrIndexReader{this=5c3aef1,r=readonlydirectoryrea...@5c3aef1,refCnt=1,segments=9}
readerDir : org.apache.lucene.store.NIOFSDirectory@/home/solr/solr_1.4/solr/data/index.20100127044500

>> - Could it be that the index does not finish replicating in the poll interval I give it? What happens if, say, there's a poll interval X and replicating the index happens to take longer than X sometimes? (Our current poll interval is 45 minutes, and every time I'm watching it, it completes in time.)
>
> I think only 1 replication will/should be happening at a time.

Whew, that's comforting.
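For readers following along: on a 1.4 slave, the data directory's index.properties records which copy of the index is live, and the replication handler can report the same thing over HTTP. A sketch of what to look at (the directory name and host below are illustrative):

    # data/index.properties on the slave; the named directory is the one in use
    index=index.20100127044500

    # the replication handler's details command also reports the index path currently in use
    http://slave-host:8983/solr/replication?command=details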
Filter by Group
Hey all,

Let's say I have an index of one hundred documents, and these documents are grouped into 4 groups A, B, C, and D. The groups do in fact overlap.

What would people recommend as the best way to apply a search query and return only the documents that are in group A? Also, how about if we run the same search query but return only those documents in groups A, C and D?

I imagine that I could do this by indexing a text field populated with the group names and adding something like "groups:A" to the query, but I'm wondering if there's a better solution.

Thanks in advance,
Mark

mark angelillo
snooth inc.
o: 646.723.4328
c: 484.437.9915
[EMAIL PROTECTED]
snooth -- 1.7 million ratings and counting...
Re: Filter by Group
Thanks, Pieter. I'll go for that then.

Mark

On Sep 19, 2007, at 10:15 PM, Pieter Berkel wrote:

> Sounds like you're on the right track. If your groups overlap (i.e. a document can be in group A and B), then you should ensure your "groups" field is multivalued. If you are searching for "foo" in documents contained in group "A", then it might be more efficient to use a filter query (fq) like:
>
> q=foo&fq=groups:A
>
> See the wiki page on common query parameters for more info:
> http://wiki.apache.org/solr/CommonQueryParameters#head-6522ef80f22d0e50d2f12ec487758577506d6002
>
> cheers,
> Pieter
>
> On 20/09/2007, mark angelillo <[EMAIL PROTECTED]> wrote:
>> Hey all,
>>
>> Let's say I have an index of one hundred documents, and these documents are grouped into 4 groups A, B, C, and D. The groups do in fact overlap.
>>
>> What would people recommend as the best way to apply a search query and return only the documents that are in group A? Also, how about if we run the same search query but return only those documents in groups A, C and D?
>>
>> I imagine that I could do this by indexing a text field populated with the group names and adding something like "groups:A" to the query, but I'm wondering if there's a better solution.
>>
>> Thanks in advance,
>> Mark
>>
>> mark angelillo
>> snooth inc.
>> o: 646.723.4328
>> c: 484.437.9915
>> [EMAIL PROTECTED]
>> snooth -- 1.7 million ratings and counting...

mark angelillo
snooth inc.
o: 646.723.4328
c: 484.437.9915
[EMAIL PROTECTED]
snooth -- 1.7 million ratings and counting...
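To make the schema side of this concrete, a minimal sketch (field and group names are illustrative, not from Mark's actual schema):

    <!-- schema.xml: a multivalued string field holding each document's group names -->
    <field name="groups" type="string" indexed="true" stored="true" multiValued="true"/>

    Restrict a search for "foo" to group A:
    q=foo&fq=groups:A

    Restrict it to groups A, C and D:
    q=foo&fq=groups:(A OR C OR D)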
Forced Top Document
Hi all,

Is there a way to get a specific document to appear on top of search results even if a sorting parameter would push it further down?

Thanks in advance,
Mark

mark angelillo
snooth inc.
o: 646.723.4328
c: 484.437.9915
[EMAIL PROTECTED]
snooth -- 1.8 million ratings and counting...
Re: Forced Top Document
Charlie,

That's interesting. I did try something like this. Did you try your query with a sorting parameter? What I've read suggests that all the results are returned based on the query specified, but then resorted as specified. Boosting (which modifies the document's score) should not change the order unless the results are sorted by score.

Mark

On Oct 24, 2007, at 1:05 PM, Charlie Jackson wrote:

> Do you know which document you want at the top? If so, I believe you could just add an "OR" clause to your query to boost that document very high, such as:
>
> ?q=foo OR id:bar^1000
>
> Tried this on my installation and it did, indeed, push the specified document to the top.
>
> -----Original Message-----
> From: Matthew Runo [mailto:[EMAIL PROTECTED]
> Sent: Wednesday, October 24, 2007 10:17 AM
> To: solr-user@lucene.apache.org
> Subject: Re: Forced Top Document
>
> I'd love to know this, as I just got a development request for this very feature. I'd rather not spend time on it if it already exists.
>
> Matthew Runo
> Zappos Development
> [EMAIL PROTECTED]
> 702-943-7833
>
> On Oct 23, 2007, at 10:12 PM, mark angelillo wrote:
>
>> Hi all,
>>
>> Is there a way to get a specific document to appear on top of search results even if a sorting parameter would push it further down?
>>
>> Thanks in advance,
>> Mark
>>
>> mark angelillo
>> snooth inc.
>> o: 646.723.4328
>> c: 484.437.9915
>> [EMAIL PROTECTED]
>> snooth -- 1.8 million ratings and counting...

mark angelillo
snooth inc.
o: 646.723.4328
c: 484.437.9915
[EMAIL PROTECTED]
snooth -- 1.8 million ratings and counting...
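A small illustration of Mark's point, using Charlie's example query (the field names are made up): the boost only wins when score is what's being sorted on.

    q=foo OR id:bar^1000                    (default relevancy sort: the boosted doc rises to the top)
    q=foo OR id:bar^1000&sort=price asc     (explicit field sort: the boost changes scores, not the ordering)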
Re: Forced Top Document
That's the ticket exactly, Kyle.

What I have is the ID of my document, so I indexed a dynamic field with the name id_*. Then I just set that field for each document with the proper ID. So, for example, to pop one document to the top of the index, I just run:

&q=field: value; id_700390+desc, date+desc

Works like a charm, even with multiple documents:

&q=field: value; id_700390+desc, id_604030+desc, date+desc

Mark

On Oct 24, 2007, at 4:15 PM, Kyle Banerjee wrote:

> The typical use case, though, is for the featured document to be on top only for certain queries. Like in an intranet where someone queries 401K or retirement or similar, you want to feature a document about benefits that would otherwise rank really low for that query. I have not been able to make sorting strategies work very well.
>
> Depending on how many of these certain queries you have, it seems like you could still use some variation of the strategy based on a bogus tag sort. If you place a dynamic field for each query term (e.g. foo_s, bar_s, etc.) relevant to a document, then when one of the special query terms is detected, you can still sort on the appropriate dynamic field before applying the rest of the sort.
>
> kyle

mark angelillo
snooth inc.
o: 646.723.4328
c: 484.437.9915
[EMAIL PROTECTED]
snooth -- 1.8 million ratings and counting...
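For anyone reproducing this, a sketch of the schema side of Mark's approach (the field type here is a guess, not his actual definition):

    <!-- matches id_700390, id_604030, etc.; set a value only on documents that should be pinned -->
    <dynamicField name="id_*" type="string" indexed="true" stored="false"/>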
Re: Forced Top Document
Thanks for your thoughts, Chris. I agree with you about the user's experience.

Snooth doesn't serve any ads/sponsored results -- the goal here is to make sure that the most recent document the user has acted on shows up at the top in searches for recent activity. My aim is to forcibly preserve the sort order until the document can be reindexed/updated.

Since the dynamic field is too memory intensive, I'll try boosting on the date field -- and boosting more on the date field for the document that needs to be up top. If that doesn't end up working, I'll just perform two queries and be done with it.

Mark

On Oct 25, 2007, at 3:11 AM, Chris Hostetter wrote:

> : The typical use case, though, is for the featured document to be on top only
> : for certain queries. Like in an intranet where someone queries 401K or
> : retirement or similar, you want to feature a document about benefits that
> : would otherwise rank really low for that query. I have not been able to make
> : sorting strategies work very well.
>
> this type of question typically falls into two use cases:
>
> 1) "targeted ads"
> 2) "sponsored results"
>
> in the targeted ads case, the "special" matches aren't part of the normal flow of results, and don't fit into pagination -- they always appear at the top, or to the right, on every page, no matter what the sort. this kind of usage doesn't really need any special logic; it can be solved as easily by a second Solr hit as it can by custom request handler logic.
>
> in the "sponsored results" use case, the "special" matches should appear in the normal flow of results as the #1 (2, 3, etc) matches, so that they don't appear on page #2 ... but that also means that it's extremely disconcerting for users if those matches are still at the top when the users resort. if a user is looking at product listings sorted by "relevancy" and the top 3 results all say they are "sponsored", that's fine ... but if the user sorts by "price" and those 3 results are still at the top of the list, even though they clearly aren't the cheapest, that's just going to piss the user off.
>
> in my professional opinion: don't fuck with your users. default to whatever order you want, but if the user specifically requests to sort the results by some option, do it.
>
> assuming you follow my professional opinion, then "boosting" docs to have an artificially high score will work fine.
>
> if you absolutely *MUST* have certain docs "sorting" before others, regardless of which sort option the user picks, then it is still possible to do ... i'm hesitant to even say how, but if people insist on knowing...
>
> always sort by score first, then by whatever field the user wants to sort by ... but when the user wants to sort on a specific field, move the user's main query input into an "fq" (so it doesn't influence the score) ... and use an extremely low boost matchalldocs query along with your "special doc matching query" as the main (scoring) query param. the key being that even though your primary sort is on score, every doc except your special matches has an identical score.
>
> (this may not be possible with dismax because it's not trivial to move the query into an fq; it might work if you can use "0" as the boost on fields in the qf so it still dictates the matches but doesn't influence the score enough to throw off the sort)
>
> -Hoss

mark angelillo
snooth inc.
o: 646.723.4328
c: 484.437.9915
[EMAIL PROTECTED]
snooth -- 1.8 million ratings and counting...
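A rough sketch of the kind of request Hoss describes, with invented field names, IDs and boosts (exact parameter spelling depends on your handler and Solr version):

    q=*:*^0.001 OR id:700390^100       (scoring query: a tiny uniform score for everything, a big boost for the pinned doc)
    fq=wine                            (the user's real query, moved to a filter so it doesn't influence scores)
    sort=score desc, price asc         (score first pins the special doc; everything else ties on score and falls through to price)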
dynamicField Scaling
Hello,

I've got a Solr index running, and I want to use a dynamicField to store n different sorting fields. The field that is used to actually sort the results will be determined by the application that is querying the index.

I'm wondering if anyone has done something similar to this, or if anyone has an idea of how Solr will perform as the number n of sorting fields grows larger. Is there a way to make sure this doesn't start to slow the index down? Is there any information out there about the number of dynamicFields that can be declared in this way before the entire index suffers? Is there such a limit? (I'm assuming the number of documents in the index will eventually be around 500k -- perhaps more in the future.)

TIA,
Mark Angelillo
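For reference, the kind of declaration being discussed might look roughly like this (the name pattern and type are illustrative; sfloat is the sortable float type from the 1.x example schema):

    <!-- one pattern covering all of the per-application sort fields -->
    <dynamicField name="sort_*" type="sfloat" indexed="true" stored="false"/>

Each application would then pick its own field at query time, e.g. by sorting on sort_price or sort_popularity.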
Re: dynamicField Scaling
On Mar 7, 2007, at 2:17 PM, Mike Klaas wrote:

> On 3/7/07, mark angelillo <[EMAIL PROTECTED]> wrote:
>> Hello,
>>
>> I've got a Solr index running, and I want to use a dynamicField to store n different sorting fields. The field that is used to actually sort the results will be determined by the application that is querying the index.
>>
>> I'm wondering if anyone has done something similar to this, or if anyone has an idea of how Solr will perform as the number n of sorting fields grows larger. Is there a way to make sure this doesn't start to slow the index down? Is there any information out there about the number of dynamicFields that can be declared in this way before the entire index suffers? Is there such a limit?
>
> It's not really about the number of dynamic fields. The key variable is the number of sort fields. To sort efficiently, Solr needs to maintain a cache of field values. This consumes memory per field on the order of
>
> D x S + U
>
> where D is the document count, S is the size of the data type (e.g. 4 bytes for ints, 8 bytes for doubles, 4/8 bytes for anything else [pointers]), and U is the cumulative size of the unique field values (if sorting on a non-primitive type, like Strings).
>
> If you have sufficient memory to store this data for each field you are sorting on, you shouldn't have any problems.
>
> best,
> -Mike

Okay, makes sense. Thanks,

Mark
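To put very rough numbers on Mike's formula (back-of-the-envelope figures, not measurements): with D = 500,000 documents, an int sort field costs about 500,000 x 4 bytes, roughly 2 MB; a double about 4 MB; and a String sort field roughly 2-4 MB of pointers plus the cumulative size of its unique values (U). So a few dozen numeric sort fields stay in the tens of megabytes, while many large String sort fields are what would start to hurt.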
Error loading custom similarity class
Hiya,

I'm currently trying to compile and load my own similarity class in Solr, and I'm having a bit of a problem. Here's what I've done so far:

1) Create the .java for the class, using SweetSpotSimilarity as a model. I'm using the code below to make sure I can get this working -- my real class will do something a bit different.

package org.apache.lucene.misc;

import org.apache.lucene.search.Similarity;
import org.apache.lucene.search.DefaultSimilarity;

public class CustomSimilarity extends DefaultSimilarity {

    public CustomSimilarity() {
        super();
    }

    public float lengthNorm(String fieldName, int numTerms) {
        return (float)1.0;
    }

    public float tf(int freq) {
        return (float)1.0;
    }
}

2) Create the .jar file. (Maybe I'm doing this wrong?)

> javac -classpath lucene-core-nightly.jar CustomSimilarity.java
> jar -cvf CustomSimilarity.jar CustomSimilarity.class

3) Put the .jar file in my Solr home /lib directory. (/var/solr/lib for me.)

4) Edit schema.xml with this line:

<similarity class="org.apache.lucene.misc.CustomSimilarity"/>

5) I'm using Jetty, and I read that I may need to ensure the .jar is in the classpath, so I added this to start.config (I've tried with and without this):

# solr specific jars
/var/solr/lib/CustomSimilarity.jar always

Then, when I fire up Jetty, I get the following error:

10:59:01.885 WARN!! [main] org.mortbay.jetty.Server.main(Server.java:465) >08> EXCEPTION
org.mortbay.util.MultiException[org.apache.solr.core.SolrException: Error loading class 'org.apache.lucene.misc.CustomSimilarity']
        at org.mortbay.http.HttpServer.doStart(HttpServer.java:686)
        at org.mortbay.util.Container.start(Container.java:72)
        at org.mortbay.jetty.Server.main(Server.java:460)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
        at java.lang.reflect.Method.invoke(Method.java:585)
        at org.mortbay.start.Main.invokeMain(Main.java:151)
        at org.mortbay.start.Main.start(Main.java:476)
        at org.mortbay.start.Main.main(Main.java:94)
org.apache.solr.core.SolrException: Error loading class 'org.apache.lucene.misc.CustomSimilarity'
        at org.apache.solr.core.Config.findClass(Config.java:208)
        at org.apache.solr.core.Config.newInstance(Config.java:213)
        at org.apache.solr.schema.IndexSchema.readConfig(IndexSchema.java:363)
        at org.apache.solr.schema.IndexSchema.<init>(IndexSchema.java:69)
        at org.apache.solr.core.SolrCore.<init>(SolrCore.java:191)
        at org.apache.solr.core.SolrCore.getSolrCore(SolrCore.java:172)
        at org.apache.solr.servlet.SolrServlet.init(SolrServlet.java:72)
        at javax.servlet.GenericServlet.init(GenericServlet.java:211)
        at org.mortbay.jetty.servlet.ServletHolder.initServlet(ServletHolder.java:383)
        at org.mortbay.jetty.servlet.ServletHolder.start(ServletHolder.java:243)
        at org.mortbay.jetty.servlet.ServletHandler.initializeServlets(ServletHandler.java:446)
        at org.mortbay.jetty.servlet.WebApplicationHandler.initializeServlets(WebApplicationHandler.java:321)
        at org.mortbay.jetty.servlet.WebApplicationContext.doStart(WebApplicationContext.java:509)
        at org.mortbay.util.Container.start(Container.java:72)
        at org.mortbay.http.HttpServer.doStart(HttpServer.java:708)
        at org.mortbay.util.Container.start(Container.java:72)
        at org.mortbay.jetty.Server.main(Server.java:460)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
        at java.lang.reflect.Method.invoke(Method.java:585)
        at org.mortbay.start.Main.invokeMain(Main.java:151)
        at org.mortbay.start.Main.start(Main.java:476)
        at org.mortbay.start.Main.main(Main.java:94)
Caused by: java.lang.ClassNotFoundException: org.apache.lucene.misc.CustomSimilarity
        at java.net.URLClassLoader$1.run(URLClassLoader.java:200)
        at java.security.AccessController.doPrivileged(Native Method)
        at java.net.URLClassLoader.findClass(URLClassLoader.java:188)
        at java.lang.ClassLoader.loadClass(ClassLoader.java:306)
        at java.net.FactoryURLClassLoader.loadClass(URLClassLoader.java:580)
        at java.lang.ClassLoader.loadClass(ClassLoader.java:251)
        at java.lang.ClassLoader.loadClassInternal(ClassLoader.java:319)
        at java.lang.Class.forName0(Native Method)
        at java.lang.Class.forName(Class.java:242)
        at org.apache.solr.core.Config.findClass(Config.java:192)
        ... 23 more
[0]=org.apache.solr.core.Solr
Re: Error loading custom similarity class
Thanks, Yonik. I was definitely missing that.

On Apr 9, 2007, at 2:08 PM, Yonik Seeley wrote:

> On 4/9/07, mark angelillo <[EMAIL PROTECTED]> wrote:
>> package org.apache.lucene.misc;
>> [...]
>> 2) Create the .jar file. (Maybe I'm doing this wrong?)
>>
>> > javac -classpath lucene-core-nightly.jar CustomSimilarity.java
>> > jar -cvf CustomSimilarity.jar CustomSimilarity.class
>
> This may be the problem. The path in the jar file needs to reflect the package. So the CustomSimilarity.class file needs to be in the org/apache/lucene/misc/ directory.
>
> -Yonik
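For anyone hitting the same error, the fix boils down to compiling with -d so the class file lands in a directory tree matching the package, and jarring that tree. A sketch, using the file names from the original post:

    javac -classpath lucene-core-nightly.jar -d . CustomSimilarity.java
    # -d . writes org/apache/lucene/misc/CustomSimilarity.class based on the package declaration
    jar -cvf CustomSimilarity.jar org/apache/lucene/misc/CustomSimilarity.class
    # verify the entry path inside the jar
    jar -tf CustomSimilarity.jar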