Re: Plans for a new Solr Python library

2008-03-24 Thread Christian Vogler
On Monday 24 March 2008 01:01:59 Leonardo Santagada wrote:
> I have done some modifications on the solr python client[1], and
> though we kept the same license and my work could be put back in solr
> I think if there are more people interested we could improve the
> module a lot.

Have you taken a look at SOLR-216 on the issue tracker? I've been using this 
version in production, and it is quite nice.

Maybe it is possible to take the best from both versions?

Best regards
- Christian


Re: Plans for a new Solr Python library

2008-03-24 Thread Leonardo Santagada


On 24/03/2008, at 04:39, Christian Vogler wrote:

On Monday 24 March 2008 01:01:59 Leonardo Santagada wrote:

I have done some modifications on the solr python client[1], and
though we kept the same license and my work could be put back in solr
I think if there are more people interested we could improve the
module a lot.


Have you taken a look at SOLR-216 on the issue tracker? I've been using this
version in production, and it is quite nice.

Maybe it is possible to take the best from both versions?

Best regards
- Christian



Thanks, I think most of the stuff that I wanted to do is there... I
will take a closer look and if there is something missing I will add to
that. Why is this on the issue tracker and not committed to svn?


--
Leonardo Santagada






Re: Plans for a new Solr Python library

2008-03-24 Thread Ed Summers
On Mon, Mar 24, 2008 at 6:32 AM, Leonardo Santagada
<[EMAIL PROTECTED]> wrote:
>  Thanks, I think most of the stuff that I wanted to do is there... I
>  will take a closer look and if there is something missing I will add to
>  that. Why is this on the issue tracker and not committed to svn?

Back in September of last year I started with SOLR-216 and ran into a
few problems (one of which is documented in that ticket's history).
Personally, I'd like to see a little project on google-code, and a
record up at the CheeseShop so the code could be easily installed and
deployed.

I'd be willing to help out on it too, as I'm embarking on another Solr
project where python integration is key.

//Ed


Re: Help Requested

2008-03-24 Thread Norberto Meijome
On Thu, 20 Mar 2008 09:07:08 -0700 (PDT)
Raghav Kapoor <[EMAIL PROTECTED]> wrote:

[...]

> > Any particular reason why need the server in this
> > situation? pretty much
> > everything you are doing can be done locally.
> > Except, probably, cross linking
> > between client's documents. I have no idea in what
> > kind of environment this app
> > is supposed to run (home? office LAN? the interweb
> > :P ? ). 
> 
> So its going to be a client/server app where all the
> documents will be stored on the client and only
> metadata of those docs will be sent to the server.
> That way server does not have to store any real
> documents. Its an internet based application. Search
> on the server will read the metadata for keywords and
> send the request to all the clients that contain
> documents with that keyword. We cannot store
> everything on one client, all clients are different
> machines distributed all over the world.

I see.

> > you don't need a webserver for this, just generate a
> > page in from your
> > webserver with file:// links and all you need is to
> > render it locally. 
> 
> How will the client serve the documents stored locally
> through a standard mechanism (like port 80) to send
> documents to the server when the server requests the
> documents ? The client will not open any special ports
> for the server, so we need the web server, I guess ? 

I see. Sure, a webserver will work for this... but tcp/80 will also have to be
opened @ the client's firewalls. IOW, whichever way you choose for the server to
connect, each of the clients needs to work across a global namespace and across
firewalls... 

contact me privately if you want to discuss more on this ... it hardly relates
to Solr per se. 

[.]
B
_
{Beto|Norberto|Numard} Meijome

Percussive Maintenance - The art of tuning or repairing equipment by hitting it.

I speak for myself, not my employer. Contents may be hot. Slippery when wet.
Reading disclaimers makes you go blind. Writing them is worse. You have been
Warned.


Re: Plans for a new Solr Python library

2008-03-24 Thread Leonardo Santagada

On 24/03/2008, at 09:11, Ed Summers wrote:

On Mon, Mar 24, 2008 at 6:32 AM, Leonardo Santagada
<[EMAIL PROTECTED]> wrote:

Thanks, I think most of the stuff that I wanted to do is there... I
will take a closer look and if there is something missing I will add to
that. Why is this on the issue tracker and not committed to svn?


Back in September of last year I started with SOLR-216 and ran into a
few problems (one of which is documented in that ticket's history).
Personally, I'd like to see a little project on google-code, and a
record up at the CheeseShop so the code could be easily installed and
deployed.

I'd be willing to help out on it too, as I'm embarking on another Solr
project where python integration is key.



I can create the project, but I think Jason Carter should be on board
with this or we would end up forking the code one more time. If he
doesn't respond this week I will start the move, and we should write
more unit tests for this code, especially for unicode problems (this
being the problem you had, and the most complicated part of Python, at
least until py3k).



--
Leonardo Santagada






Re: off list: Plans for a new Solr Python library

2008-03-24 Thread Leonardo Santagada


On 24/03/2008, at 10:15, Christian Vogler wrote:
Ok, so, let's get started. I made a few modifications to SOLR-216 that fix
some unicode and timezone conversion issues, and I can upload them wherever
we want to host the project.

There is an outstanding XML unicode bug that was discussed on the list
about a month ago, so we could take a stab at fixing it, as well.

Best regards
- Christian

On Monday 24 March 2008 14:11:44 Ed Summers wrote:

On Mon, Mar 24, 2008 at 6:32 AM, Leonardo Santagada

<[EMAIL PROTECTED]> wrote:

Thanks, I think most of the stuff that I wanted to do is there... I
will take a closer look and if there is something missing I will add to
that. Why is this on the issue tracker and not committed to svn?


Back in September of last year I started with SOLR-216 and ran into a
few problems (one of which is documented in  that ticket history).
Personally, I'd like to see a little project on google-code, and a
record up at the CheeseShop so the code could be easily installed and
deployed.

I'd be willing to help out on it too, as I'm embarking on another  
Solr

project where python integration is key.

//Ed




I created the project on Google Code; the URL is http://code.google.com/p/solrpy/
Please send me your Google account if you want to join the project.
I already added Vogler as another admin on the project.


Christian, please commit your modified version inside the trunk
directory... or better yet, commit the issue 216 version and then your
modified version so we keep the history of the changes. Later we should
discuss writing the necessary distutils files, and maybe make a test
directory and choose a testing framework (I would vote to use
py.test or nose and not unittest, but I am ok with anything you guys
want). When everyone joins we should make a mailing list to discuss
these things, or meet on IRC.


--
Leonardo Santagada






Re: Plans for a new Solr Python library

2008-03-24 Thread Yonik Seeley
On Mon, Mar 24, 2008 at 6:32 AM, Leonardo Santagada <[EMAIL PROTECTED]> wrote:
>  Thanks, I think most of the stuff that I wanted to do is there... I
>  will take a closer look and if there is something missing I will add to
>  that. Why is this on the issue tracker and not committed to svn?

AFAIK, no one has fixed the outstanding bugs and indicated it was
ready to be committed.

-Yonik


Re: Plans for a new Solr Python library

2008-03-24 Thread Ed Summers
On Mon, Mar 24, 2008 at 12:13 PM, Yonik Seeley <[EMAIL PROTECTED]> wrote:
>  AFAIK, no one has fixed the outstanding bugs and indicated it was
>  ready to be committed.

What is the preferred approach to fixing bugs in patches? Attaching new patches?

//Ed


Re: Plans for a new Solr Python library

2008-03-24 Thread Yonik Seeley
On Mon, Mar 24, 2008 at 12:27 PM, Ed Summers <[EMAIL PROTECTED]> wrote:
> On Mon, Mar 24, 2008 at 12:13 PM, Yonik Seeley <[EMAIL PROTECTED]> wrote:
>  >  AFAIK, no one has fixed the outstanding bugs and indicated it was
>  >  ready to be committed.
>
>  What is the preferred approach to fixing bugs in patches? Attaching new 
> patches?

Yes, just attach a new version of the patch and describe what you changed.

-Yonik


missing content stream - simple tab file

2008-03-24 Thread tim robertson
Hi all,
I am a newbie with SOLR, trying to index a very simple tab-delimited file
(using a nightly build from a couple of days ago).
Any help would be greatly appreciated!

My test tab file has only 3 lines:

Passer domesticus 1787248
Passer domesticus (Linnaeus, 1758) 694
Passer domesticus (Linnaeus,1758) 8

My schema:

[schema.xml snippet not preserved by the archive; only the field name "name" survives]


And I am uploading using this command:
curl
http://localhost:8983/solr/update/csv?fieldnames=name,count&separator=%09&escape=\&header=false--data-binary
@test -H 'Content-type:text/plain; charset=utf-8'

It gives a missing content stream error with the stack trace at the bottom
of this email.

Any help greatly appreciated!!!

Thanks

Tim


Mar 24, 2008 8:35:56 PM org.apache.solr.core.SolrCore execute
INFO: /update/csv fieldnames=id,name 0 3
Mar 24, 2008 8:36:16 PM org.apache.solr.common.SolrException log
SEVERE: org.apache.solr.common.SolrException: missing content stream
at org.apache.solr.handler.CSVRequestHandler.handleRequestBody(
CSVRequestHandler.java:49)
at org.apache.solr.handler.RequestHandlerBase.handleRequest(
RequestHandlerBase.java:118)
at
org.apache.solr.core.RequestHandlers$LazyRequestHandlerWrapper.handleRequest
(RequestHandlers.java:228)
at org.apache.solr.core.SolrCore.execute(SolrCore.java:948)
at org.apache.solr.servlet.SolrDispatchFilter.execute(
SolrDispatchFilter.java:326)
at org.apache.solr.servlet.SolrDispatchFilter.doFilter(
SolrDispatchFilter.java:280)
at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(
ServletHandler.java:1089)
at org.mortbay.jetty.servlet.ServletHandler.handle(
ServletHandler.java:365)
at org.mortbay.jetty.security.SecurityHandler.handle(
SecurityHandler.java:216)
at org.mortbay.jetty.servlet.SessionHandler.handle(
SessionHandler.java:181)
at org.mortbay.jetty.handler.ContextHandler.handle(
ContextHandler.java:712)
at org.mortbay.jetty.webapp.WebAppContext.handle(WebAppContext.java
:405)
at org.mortbay.jetty.handler.ContextHandlerCollection.handle(
ContextHandlerCollection.java:211)
at org.mortbay.jetty.handler.HandlerCollection.handle(
HandlerCollection.java:114)
at org.mortbay.jetty.handler.HandlerWrapper.handle(
HandlerWrapper.java:139)
at org.mortbay.jetty.Server.handle(Server.java:285)
at org.mortbay.jetty.HttpConnection.handleRequest(
HttpConnection.java:502)
at org.mortbay.jetty.HttpConnection$RequestHandler.headerComplete(
HttpConnection.java:821)
at org.mortbay.jetty.HttpParser.parseNext(HttpParser.java:513)
at org.mortbay.jetty.HttpParser.parseAvailable(HttpParser.java:208)
at org.mortbay.jetty.HttpConnection.handle(HttpConnection.java:378)
at org.mortbay.jetty.bio.SocketConnector$Connection.run(
SocketConnector.java:226)
at org.mortbay.thread.BoundedThreadPool$PoolThread.run(
BoundedThreadPool.java:442)

Mar 24, 2008 8:36:16 PM org.apache.solr.core.SolrCore execute
INFO: /update/csv fieldnames=name,count 0 2




Fwd: missing content stream - simple tab file

2008-03-24 Thread tim robertson
Ah, for some reason I am not receiving SOLR-user messages even though I am
subscribed.
If anyone has any ideas, can you please copy me in on the reply?

Thanks

-- Forwarded message --
From: tim robertson <[EMAIL PROTECTED]>
Date: Mon, Mar 24, 2008 at 8:53 PM
Subject: missing content stream - simple tab file
To: solr-user@lucene.apache.org


[body of the forwarded message is identical to the original posting above]


CJKTokenizer in Solr 1.3?

2008-03-24 Thread Vinci

Hi,

I would like to ask: is support for CJKTokenizer
(org.apache.lucene.analysis.cjk.CJKTokenizer) available in Solr 1.3 now?
If it is supported, which nightly build can I try and how can I turn it on?
(I have nightly builds up to 2008 Mar 8 on hand.)
If it is not supported, how can I use a plugin to turn on this feature in the
1.3 nightly build?

Thank you,
Vinci

-- 
View this message in context: 
http://www.nabble.com/CJKTokenizer-in-Solr-1.3--tp16260321p16260321.html
Sent from the Solr - User mailing list archive at Nabble.com.



Re: missing content stream - simple tab file

2008-03-24 Thread Chris Hostetter

Tim: double check that solr-user mail isn't showing up in your spam 
folder, you may need to whitelist it since it identifies itself as bulk 
mail.

: And I am uploading using this command:
: curl
: 
http://localhost:8983/solr/update/csv?fieldnames=name,count&separator=%09&escape=\&header=false--data-binary
: @test -H 'Content-type:text/plain; charset=utf-8'

It looks like you aren't quoting the URL so that your shell knows it's a 
single string .. the "&" characters are getting treated specially .. you can 
tell because the URL with params that Solr says is getting hit ends with 
"...,count" ...

: Mar 24, 2008 8:36:16 PM org.apache.solr.core.SolrCore execute
: INFO: /update/csv fieldnames=name,count 0 2

everything after that "&" is probably getting interpreted by your shell as 
additional commands (don't you see any errors in your terminal where you 
run this command?)

Also: it looks like you are missing a space between "false" and 
"--data-binary".
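Putting the two fixes together (quoting the URL and restoring the space before --data-binary), a corrected command would look roughly like this; it is a sketch reusing the URL and filename from Tim's message:

```shell
# Quote the whole URL so the shell does not split the command at each
# "&", and keep --data-binary as a separate argument.
url='http://localhost:8983/solr/update/csv?fieldnames=name,count&separator=%09&escape=\&header=false'
cmd="curl \"$url\" --data-binary @test -H 'Content-type:text/plain; charset=utf-8'"
echo "$cmd"
```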


-Hoss



Re: missing content stream - simple tab file

2008-03-24 Thread tim robertson
Thanks,
You are correct... single quotes around the URL solved it - schoolboy error

thanks

Tim


On Mon, Mar 24, 2008 at 9:48 PM, Chris Hostetter <[EMAIL PROTECTED]>
wrote:

>
> Tim: double check that solr-user mail isn't showing up in your spam
> folder, you may need to whitelist it since it identifies itself as bulk
> mail.
>
> : And I am uploading using this command:
> : curl
> :
> http://localhost:8983/solr/update/csv?fieldnames=name,count&separator=%09&escape=\&header=false--data-binary
> : @test -H 'Content-type:text/plain; charset=utf-8'
>
> It looks like you aren't quoting the URL so that your shell knows it's a
> single string .. the "&" characters are getting treated specially .. you can
> tell because the URL with params that Solr says is getting hit ends with
> "...,count" ...
>
> : Mar 24, 2008 8:36:16 PM org.apache.solr.core.SolrCore execute
> : INFO: /update/csv fieldnames=name,count 0 2
>
> everything after that "&" is probably getting interpreted by your shell as
> additional commands (don't you see any errors in your terminal where you
> run this command?)
>
> Also: it looks like you are missing a space between "false" and
> "--data-binary".
>
>
> -Hoss
>
>


Re: Converting lucene index into solr usable xml

2008-03-24 Thread Chris Hostetter

: How can we convert the lucene index file into format
: that solr can understand. I have very little knowledge

This seems to be the exact same question...

http://www.nabble.com/Reusing-lucene-index-file-in-Solr-to16215877.html



-Hoss



What are the limits? Billions of records anyone?

2008-03-24 Thread tim robertson
Hi all,
I have just got a SOLR index working for the first time on a few hundred
thousand records from a custom database dump, and the results are very
impressive, both in the speed it indexes (even on my macbook) and the
response times.

If I want to index "what, where (grid based to 0.1 degree cells), when, who"
type information (let's say a schema of 10 strings, 2 dates, 4 ints), what are
the limitations going to be?

Is there any documentation on whether indexes can be partitioned easily, so
scaling is somewhat linear?

My reasoning to look for this is our current searchable "index" is on a
mysql database with 2 main fact tables of 150,000,000 records and 15,000,000
records which are normally joined for most queries.  We are looking to
increase to 10x that size so I am looking at Billions of records...

How likely will this scale on SOLR?
What's the biggest number of items people have indexed?
How complicated do the queries have to get before things get slow? This is
the kind of thing I am looking for:
(name:"Passer domesticus*" AND cell:[36543 TO 43324] AND mod360Cell:[45 TO
65] AND year:[1950 TO *])
- if you care, this is a search for "The bird of type Sparrows in a geo
bounding box and collected/observed after 1950"...

I'm going to be trying anyway, but any pointers appreciated (Hadoop
perhaps?)

Thanks,

Tim
PS - This is an open source open access project to create an index of
biodiversity data (http://data.gbif.org) so your help is going towards a
worthwhile cause!


Re: Survey: How do you store your fields?

2008-03-24 Thread Chris Hostetter

: I'm curious: do you store everything in a database and just use Solr
: for indexing/searching, or do you store everything in Solr so that
: your search results come back with context?  Or something in between?
: (I know if you want highlighting you have to store those fields.)

There are really two ways of interpreting your question: one is about the 
"authoritativeness" of the data, the other is about the "access" of the 
data.

You'd be hard pressed to find a bigger Solr advocate than myself, but I've 
never encountered a situation where I thought it made sense for Solr to be 
the authoritative copy of a dataset -- as in: Solr is the primary copy of 
the data, you might generate other copies in other formats from Solr (ie: 
HTML pages, RSS feeds, plain text reports, xml dumps, etc..) but Solr 
is the "master" datastore of record and when you change it every other 
copy is eventually changed.  Solr isn't as well suited to be an 
authoritative datastore as something like a relational database because 
"updating" isn't as easy (and "column" based updates are really hard).  I 
almost always treat a relational DB as the primary copy of the data, and 
build my Solr indexes from it (rebuilding from scratch periodically in 
some cases, or updating "in real time" as updates are made to the DB in 
others).

The second form of your question, though -- the question of "access" -- I 
encourage people to put whatever data they need in Solr to power whatever 
application they want to power.  If you want to search on 10 fields, 
display 5 on the listing pages and 200 on the "item" pages, you really need 
to store the 10 fields, and I would suggest you store all 200 and power 
your item pages using Solr as well ... that way you can be certain your 
data is consistent across all views.  It may not be 100% up to date with 
your authoritative data store in the case of indexing/updating lag, but 
you don't have to worry about a record which has changed matching on one 
term that isn't there when you go to show the full item, because the fields 
you fetched from your database have changed since the last time you 
updated/committed on your Solr index. 


-Hoss



Re: Plans for a new Solr Python library

2008-03-24 Thread Leonardo Santagada


On 24/03/2008, at 15:34, Yonik Seeley wrote:

On Mon, Mar 24, 2008 at 12:27 PM, Ed Summers <[EMAIL PROTECTED]> wrote:
On Mon, Mar 24, 2008 at 12:13 PM, Yonik Seeley <[EMAIL PROTECTED]>  
wrote:

AFAIK, no one has fixed the outstanding bugs and indicated it was
ready to be committed.


What is the preferred approach to fixing bugs in patches? Attaching  
new patches?


Yes, just attach a new version of the patch and describe what you  
changed.


-Yonik



We started the project on Google Code to do an independent module...
that is, so we can have version control and everything. If anyone else
wants to join, just send me their Google account.


I intend to commit the library back, or at least maintain the same
license, so our lib can become the official one in the future...
but yep, let's fix the bugs first and think about integration later.


--
Leonardo Santagada






Re: What are the limits? Billions of records anyone?

2008-03-24 Thread Yonik Seeley
On Mon, Mar 24, 2008 at 5:30 PM, tim robertson
<[EMAIL PROTECTED]> wrote:
>  Is there any documentation on whether indexes can be partitioned easily, so
>  scaling is somewhat linear?

http://wiki.apache.org/solr/DistributedSearch

It's very new, so you would need a recent nightly build.
If you try it, let us know how it works (or what issues you run into).
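As a sketch of what that wiki page describes, a two-shard request would look roughly like the following (the hostnames are hypothetical):

```shell
# The node receiving the request fans the query out to every shard in
# the "shards" list and merges the results (hosts are made up).
shards='solr1:8983/solr,solr2:8983/solr'
request="http://solr1:8983/solr/select?q=name:sparrow&shards=${shards}"
echo "$request"
```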

>  (name:"Passer domesticus*" AND cell:[36543 TO 43324] AND mod360Cell[45 TO
>  65] AND year:[1950 TO *])

Range queries can be slow if the number of terms in the range is large.
If the range query is common, it can be pulled out into a separate
filter query (fq param) and cached separately.  If it's rather unique
(different endpoint values each time), then there is currently no
quick fix.  But due to some basic work being done in Lucene, I predict
some relief not too far in the future.
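As a sketch using Tim's field names, the recurring year cutoff would move out of q into a separate fq parameter (URL-encoding omitted for readability):

```shell
# q keeps the part that varies per search; fq carries the recurring
# range so Solr can cache the resulting filter independently.
q='name:"Passer domesticus*" AND cell:[36543 TO 43324]'
fq='year:[1950 TO *]'
echo "http://localhost:8983/solr/select?q=${q}&fq=${fq}"
```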

-Yonik


Re: What are the limits? Billions of records anyone?

2008-03-24 Thread Norberto Meijome
On Mon, 24 Mar 2008 22:30:08 +0100
"tim robertson" <[EMAIL PROTECTED]> wrote:

> How likely will this scale on SOLR?
> What's the biggest number of items people have indexed?
> How complicated do the queries have to get before things get slow? This is
> the kind of thing I am looking for:
> (name:"Passer domesticus*" AND cell:[36543 TO 43324] AND mod360Cell[45 TO
> 65] AND year:[1950 TO *])
> - if you care, this is a search for "The bird of type Sparrows in a geo
> bounding box and collected/observed after 1950"...

hi Tim,
probably OT and without much real-world experience in this area... but have
you looked into using domain-specific data objects, like PostGIS (GIS
extensions for PostgreSQL), or even custom data types in pgsql? I imagine
converting a geographical 'data set' into a plain format wouldn't be as
effective as using something developed specifically for that purpose. 

B
_
{Beto|Norberto|Numard} Meijome

"The freethinking of one age is the common sense of the next."
   Matthew Arnold

I speak for myself, not my employer. Contents may be hot. Slippery when wet.
Reading disclaimers makes you go blind. Writing them is worse. You have been
Warned.


Re: Nightly build archives don't contain solrJ source?

2008-03-24 Thread Chris Hostetter
: 
: I downloaded the zip of nightly Solr source files for 03/18/2008. This doesn't
: seem to contain
: the corresponding client folder with the SolrJ files. Is there a way to get

you are correct ... at the moment none of the clients have their sources 
included in the builds ... I'll open a bug so we make sure to fix that 
before 1.3

: files as of that date? I have downloaded the solrj source from trunk, but am
: not sure if this will work 
: with a specific nightly build for solr.

typically, the safest thing to do is to just check out all of Solr from 
subversion and build the whole thing directly (particularly if you are 
already familiar with doing Java development, which I'm guessing you are 
if you are using SolrJ)


-Hoss



Query Time Boosting

2008-03-24 Thread Amitha Talasila

Hi All,
We have a requirement for our project: there is a date field called
start date in the schema. When a query is done, products whose start date is
within 10 days of today's date need to be boosted by 100 points, start dates
within 30 days should be boosted by 80 points, and so on. Is there any way
to do this at query time?

Thanks in Advance


Amitha




Re: introduction and help!

2008-03-24 Thread Vinci

Hi David,

Start Solr from Jetty first, then check the following:
1. your war file is placed outside of the normal webapps
2. your context file, solr.xml, is written properly. In case you are not
sure, use an absolute path
3. copy and paste the trace after SEVERE, especially the last line. If you
see something like XPathFactory, check the lib folder of your solr webapp
to see if some library is missing. This post may help:
http://www.nabble.com/-Update--Solr-can-be-started-from-jetty-but-not-tomcat-td15998642.html#a16023734
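On the solr.solr.home route that David mentions further down: the system property is normally injected through JAVA_OPTS in the environment Tomcat inherits. A sketch, using the path from David's message:

```shell
# Export the property before invoking Tomcat's startup script so the
# JVM picks it up (the path is the one from David's message).
export JAVA_OPTS="$JAVA_OPTS -Dsolr.solr.home=/usr/local/solr"
echo "$JAVA_OPTS"
```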

Vinci

David Arpad Geller wrote:
> 
> I normally wouldn't just signup to a list and post immediately but...
> 
> I hope there are some Tomcat experts here.
> I'm trying to setup solr and tomcat.  I get the following:
> 
> INFO: HTMLManager: start: Starting web application at '/solr'
> Mar 19, 2008 12:57:26 AM org.apache.solr.servlet.SolrDispatchFilter init
> INFO: SolrDispatchFilter.init()
> Mar 19, 2008 12:57:26 AM org.apache.solr.core.Config getInstanceDir
> INFO: No /solr/home in JNDI
> Mar 19, 2008 12:57:26 AM org.apache.solr.core.Config getInstanceDir
> INFO: Solr home defaulted to 'null' (could not find system property or
> JNDI)
> Mar 19, 2008 12:57:26 AM org.apache.solr.core.Config setInstanceDir
> INFO: Solr home set to 'solr/'
> Mar 19, 2008 12:57:26 AM org.apache.catalina.core.StandardContext 
> filterStart
> SEVERE: Exception starting filter SolrRequestFilter
> 
> 
> I get that I should be specifying my solr home somewhere and some have 
> suggested that it should be specified in 
> $CATALINA_HOME/conf/Catalina/localhost.  Others have mentioned specifying 
> a context fragment in a solr.xml file placed in the $CATALINA_HOME/conf 
> directory.  Others have said that solr.war is a special webapp that 
> should not be placed in the standard Tomcat webapps directory and 
> JAVA_OPTS should have solr.solr.home set.  Huh?  Where? I tried setting 
> it in the ENV before starting Tomcat to no avail. Sigh.
> 
> Help!
> 
> I'm running Tomcat 6.0.16 and SOLR 1.2.0
> 
> 1. What the heck is a "conf/Catalina/localhost?"  Is it a directory?  A 
> file?  I have neither a Catalina or a localhost directory there.
> 2. Tomcat does absolutely nothing unless I put solr.war in its webapps 
> directory.  Then I can get it to fail on solr, at least.
> 3. I tried putting the following into my server.xml file (no effect):
> 
> <Context docBase="/usr/local/solr/dist/apache-solr-1.2.0.war"
>          debug="0"
>          crossContext="true" >
>   <Environment name="solr/home"
>                value="/usr/local/solr"
>                type="java.lang.String"
>                override="true" />
> </Context>
> 
> 
> 4. I tried putting the same into my context.xml file but then it just 
> overrode the context for the manager and ruined that whole thing for me.
> 
> Honestly.  Any help would be *much* appreciated but shouldn't Tomcat be 
> the easiest way to run SOLR?
> 
> Thank you,
> David
> 
> 

-- 
View this message in context: 
http://www.nabble.com/introduction-and-help%21-tp16137526p16268727.html
Sent from the Solr - User mailing list archive at Nabble.com.



Re: What are the limits? Billions of records anyone?

2008-03-24 Thread Vinci

Hi,

100,000 is not a big number in the IR world, and Lucene plays some tricks
that let people build very large IR systems on top of it. I have read blog
posts from people running much larger search document bases. The main
concern is to give the JRE a bigger heap size to avoid an OutOfMemoryError.
*Hadoop is more focused on distributed crawling, as far as I know...

Hope it helps,
Vinci
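The heap-size advice above, as a command-line sketch (the -Xms/-Xmx values are illustrative only, not a recommendation from this thread):

```shell
# Illustrative heap settings for the JVM hosting Solr; tune against
# your index size and cache configuration before relying on them.
HEAP_OPTS="-Xms512m -Xmx2048m"
CMD="java $HEAP_OPTS -jar start.jar"    # Jetty example launcher
echo "$CMD"                             # echoed only; run it to start Solr
```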


TimRobertson100 wrote:
> 
> Hi all,
> I have just got a SOLR index working for the first time on a few 100,000
> records from a custom database dump, and the results are very impressive,
> both in the speed it indexes (even on my macbook) and the response times.
> 
> If I want to index "what, where (grid based to 0.1 degree cells), when,
> who"
> type information (lets say a schema of 10 strings, 2 dates, 4 ints) what
> are
> the limitations going to be?
> 
> Is there any documentation on whether indexes can be partitioned easily,
> so
> scaling is somewhat linear?
> 
> My reasoning to look for this is our current searchable "index" is on a
> mysql database with 2 main fact tables of 150,000,000 records and
> 15,000,000
> records which are normally joined for most queries.  We are looking to
> increase to 10x that size so I am looking at Billions of records...
> 
> How likely will this scale on SOLR?
> What's the biggest number of items people have indexed?
> How complicated do the queries have to get before things get slow? This is
> the kind of thing I am looking for:
> (name:"Passer domesticus*" AND cell:[36543 TO 43324] AND mod360Cell:[45 TO
> 65] AND year:[1950 TO *])
> - if you care, this is a search for "The bird of type Sparrows in a geo
> bounding box and collected/observed after 1950"...
> 
> I'm going to be trying anyway, but any pointers appreciated (Hadoop
> perhaps?)
> 
> Thanks,
> 
> Tim
> PS - This is an open source open access project to create an index of
> biodiversity data (http://data.gbif.org) so your help is going towards a
> worthwhile cause!
> 
> 
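Tim's example filter, assembled into a request URL for a local Solr instance (the host, port, and /select handler are assumptions; the field names come from his post, and a real client must URL-encode the query before sending it):

```shell
# Build the query string from the thread's example; echoed here rather
# than sent, since it still needs URL-encoding (e.g. curl --data-urlencode).
Q='name:"Passer domesticus*" AND cell:[36543 TO 43324] AND mod360Cell:[45 TO 65] AND year:[1950 TO *]'
URL="http://localhost:8983/solr/select?rows=10&q=${Q}"
echo "$URL"
```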

-- 
View this message in context: 
http://www.nabble.com/What-are-the-limits--Billions-of-records-anyone--tp16262032p16268808.html
Sent from the Solr - User mailing list archive at Nabble.com.



Adding custom field for sorting?

2008-03-24 Thread Vinci

Hi,

Inspired by the previous post: is it possible to add my own custom field and
use it for sorting the search results?
If it is possible, what are the steps? Do I need to modify the source
code?
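For what it is worth, in Solr 1.x sorting on a custom field is normally a schema addition rather than a source-code change; a hedged sketch assuming the stock schema.xml conventions (the field name and type here are made up, not from this thread):

```xml
<!-- In schema.xml: a sort field must be indexed and single-valued;
     sint/sfloat are the sortable numeric types of this Solr era. -->
<field name="myrank" type="sfloat" indexed="true" stored="true"
       multiValued="false"/>
<!-- the field can then be named in the request's sort specification -->
```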
 
Thank you,
Vinci
-- 
View this message in context: 
http://www.nabble.com/Adding-custom-field-for-sorting--tp16269118p16269118.html
Sent from the Solr - User mailing list archive at Nabble.com.



Beginner questions: Jetty and solr with utf-8 + cached page + dedup

2008-03-24 Thread Vinci

Hi all,

I am new to Solr and have just gotten Solr (3-8 nightly) running on the
machine. I want the system to be more portable, so I want to use the Jetty
Solr from the example... Before I try to index the documents, I would like
to ask some questions:
1. Do I need to pay special attention when dealing with queries containing
non-English UTF-8 characters?
2. Does the Jetty version have something like a "cached page" so I can view
the documents retrieved?
3. If I want to use a Tokenizer available in the Lucene sandbox and do my
own XSL transformation, which articles can I reference?
4. Is dedup (de-duplication) purely my own task?
Also, any introductory references are welcome.

Thank you,
Vinci
-- 
View this message in context: 
http://www.nabble.com/Beginner-questions%3A-Jetty-and-solr-with-utf-8-%2B-cached-page-%2B-dedup-tp16269261p16269261.html
Sent from the Solr - User mailing list archive at Nabble.com.