Ruby writer - arrays for facet data?

2007-05-15 Thread Nigel McNie
Hi,

I'm using solr to implement a faceted search interface. I found that I
needed the ability to search for *:* so I could get the entire facet
counts, so I upgraded solr from the latest stable to the latest nightly
release.

However, it seems that the ruby writer has changed, or something else
inside solr has changed, because now for the facet data I am getting an
array instead of a hash. Example:

[EMAIL PROTECTED]:~/sounz/solr_server$ curl -s
'http://localhost:8983/solr/select?indent=1&rows=0&q=thea*&facet=true&facet.field=year_group_facet&wt=ruby';
echo
{
 'responseHeader'=>{
  'status'=>0,
  'QTime'=>0,
  'params'=>{
   'wt'=>'ruby',
   'rows'=>'0',
   'facet'=>'true',
   'facet.field'=>'year_group_facet',
   'indent'=>'1',
   'q'=>'thea*'}},
 'response'=>{'numFound'=>5,'start'=>0,'docs'=>[]
 },
 'facet_counts'=>{
  'facet_queries'=>{},
  'facet_fields'=>{
   'year_group_facet'=>[
    '1970-1979',2,
    '1990-1999',2,
    '1980-1989',1,
    '',0,
    '1950-1954',0,
    '1955-1959',0,
    '1960-1964',0,
    '1965-1969',0,
    '2000-2004',0,
    '2005-',0,
    'before-1950',0]}}}

See how the 'year_group_facet' data is in an array instead of the hash
it used to be in the stable version.

Is there a reason for this change? From my "hasn't used solr too much
before" perspective, the change does not seem to make much sense.
Perhaps it's a bug?
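
For what it's worth, the flat [term, count, term, count, ...] list can be
folded back into pairs in Ruby (sample data taken from the response above):

```ruby
# Facet counts from wt=ruby arrive as a flat [term, count, ...] array.
flat = ['1970-1979', 2, '1990-1999', 2, '1980-1989', 1]

pairs  = flat.each_slice(2).to_a  # [['1970-1979', 2], ...] keeps Solr's sort order
counts = pairs.to_h               # {'1970-1979' => 2, ...} if order doesn't matter
```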

Thanks in advance to anyone who can help :)

-- 
Regards,
Nigel McNie
Catalyst IT Ltd.
DDI: +64 4 803 2203





Re: delete for multiple documents at once

2007-05-15 Thread Maximilian Hütter
Yonik Seeley schrieb:
> On 5/11/07, Mike Klaas <[EMAIL PROTECTED]> wrote:
>> On 11-May-07, at 9:43 AM, Maximilian Hütter wrote:
>> > I'm trying to delete multiple documents at once, but it doesn't work.
>> >
>> > I am sending this:
>> >
> >> <delete>
> >>   <id>1_3223_po_opc_2</id>
> >>   <id>1_2454_po_opc_4</id>
> >> </delete>
>> >
>>
>> > Isn't it possible to do deletes like that?
>>
>> No it isn't, but you can do multi deletes using delete by query:
> 
> Sounds like it should be added though...
> 
> -Yonik
> 

I would definitely want it, as it would make things a lot easier for my app.
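
In the meantime, the same effect can be had by building a delete-by-query
body from the ids; a sketch (it assumes the uniqueKey field is named "id"):

```ruby
# Build a <delete><query> body that removes several documents at once.
ids = ['1_3223_po_opc_2', '1_2454_po_opc_4']
xml = "<delete><query>id:(#{ids.join(' OR ')})</query></delete>"
# POST this to /solr/update with Content-Type text/xml, then send <commit/>.
```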

-- 
Maximilian Hütter
blue elephant systems GmbH
Wollgrasweg 49
D-70599 Stuttgart

Tel:  (+49) 0711 - 45 10 17 578
Fax:  (+49) 0711 - 45 10 17 573
e-mail :  [EMAIL PROTECTED]
Sitz   :  Stuttgart, Amtsgericht Stuttgart, HRB 24106
Geschäftsführer:  Joachim Hörnle, Thomas Gentsch, Holger Dietrich


Re: Documenting function queries [was Re: NumberFormat exception when trying to use recip function query]

2007-05-15 Thread Yonik Seeley

On 5/14/07, Mekin Maheshwari <[EMAIL PROTECTED]> wrote:

>   2) eliminate the space inside the recip functions

This solved it :)


I think that issue is probably due to the dismax handler splitting up
function queries on whitespace, and not the parsing of the individual
function queries.  We could probably handle that better by not
splitting inside parens.

-Yonik


Re: Ruby writer - arrays for facet data?

2007-05-15 Thread Yonik Seeley

On 5/15/07, Nigel McNie <[EMAIL PROTECTED]> wrote:

I'm using solr to implement a faceted search interface. I found that I
needed the ability to search for *:* so I could get the entire facet
counts, so I upgraded solr from the latest stable to the latest nightly
release.

However, it seems that the ruby writer has changed, or something else
inside solr has changed, because now for the facet data I am getting an
array instead of a hash.


It's documented for JSON (which Python and Ruby formats inherit from)
in both CHANGES.txt and on the Wiki:

http://wiki.apache.org/solr/SolJSON
http://svn.apache.org/repos/asf/lucene/solr/trunk/CHANGES.txt

'''
The JSON response format for facets has changed to make it easier for
clients to retain sorted order.  Use json.nl=map explicitly in clients
to get the old behavior, or add it as a default to the request handler
in solrconfig.xml
'''
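
A request that opts back into the old hash format might be built like this
(sketch; the hostname and field name are taken from the original post):

```ruby
require 'cgi'

# json.nl=map restores the {term => count} hash shape for facet counts.
params = {
  'q'           => 'thea*',
  'rows'        => '0',
  'facet'       => 'true',
  'facet.field' => 'year_group_facet',
  'wt'          => 'ruby',
  'json.nl'     => 'map',
}
query = params.map { |k, v| "#{CGI.escape(k)}=#{CGI.escape(v)}" }.join('&')
url   = "http://localhost:8983/solr/select?#{query}"
```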

-Yonik


system architecture question when using solr/lucene

2007-05-15 Thread Ajanta


We are currently looking at large numbers of queries/sec and would like to
optimize that as much as possible. The special need is that we would like to
show specific results based on a specific field - territory field and
depending on where in the world you're coming from we'd like to show you
specific results. The index is very large (currently 2 million rows) and
could grow even larger (2-3 times) in the future. How do we accomplish this
given that we have some domain knowledge (the territory) to use to our
advantage? Is there a way we can hint solr/lucene to use this information to
provide better results? We could use filters on territory or we could use
different indexes for different territories (individually or in a
combination.)  Are there any other ways to do this? How do we figure out the
best case in this situation?
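
For the filter option mentioned above, a per-territory filter query keeps the
territory restriction separate from the main query, and Solr caches the filter
independently (a sketch; the field name `territory` is an assumption):

```ruby
# fq restricts the result set without affecting relevance scoring.
territory    = 'NZ'  # e.g. resolved from the visitor's IP
fq           = "territory:#{territory}"
query_string = "q=solr&fq=#{fq}&rows=10"
```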


-- 
View this message in context: 
http://www.nabble.com/system-architecture-question-when-using-solr-lucene-tf3759225.html#a10625155
Sent from the Solr - User mailing list archive at Nabble.com.



update no work

2007-05-15 Thread Alessandro Ferrucci

I installed solr

solr-2007-05-10.zip


I ran example indexing and it indexes it fine.  Search also works fine.

Now when I go to delete the docs I do:
curl http://localhost:8983/solr/update --data-binary
'<delete><id>SOLR1000</id></delete>'

I get



Error 400 missing content stream

HTTP ERROR: 400 missing content stream
RequestURI=/solr/update
Powered by Jetty://

also when I go to

http://localhost:8983/solr/update in browser I get 400 error (this is
understandable since no post body is in request).

I'm using curl v 7.15.0

Thanks

Alessandro ferrucci


Re: Question: Pagination with multi index box

2007-05-15 Thread Mike Klaas


On 14-May-07, at 10:05 PM, James liu wrote:


2007/5/15, Mike Klaas <[EMAIL PROTECTED]>:


I'm not ignoring it: I'm implying that the above is the correct
descending score-sorted order.  You have to perform that sort manually.



i mean merging the results (from 60 partitions) and sorting them myself,
not solr's sort. every result from each box has already been sorted by score.


Yep, me too.




so it will not be sorted by score correctly.
>
> and if the user clicks page 2, how do we show the data?
>
> does p1 start from 10, or do we query the other partitions?

Assemble results 1 through 20, then display 11-20 to the user.



for example, i want to query "solr"

p1 has 100 results whose scores are bigger than 80

p2 has 100 results whose scores are smaller than 20

so if i use rows=10, the scores are not correct.

if i want to guarantee 10 pages sorted correctly by score,

then i have to get 100 (rows=100) results from every box,

and merge the results, sort them, and finally get the top 100 results.

but that will be very slow.

i don't know how other search engines solve this? maybe they don't sort
by score very correctly.


Hmm, I feel as though we are going in circles.

If you want to cache the top 100 documents for a query, there is
essentially no efficient means of accumulating these results in one
request--as you note, to be sure of having the top 100 documents, 100
documents from each partition must be requested.


Your options are essentially:

1) request a smaller number of documents, and accept some
inaccuracies (for instance, if you request 10 docs, then the first page
is guaranteed to be correct, but page 10 probably won't be quite right)


2) request a smaller number of documents and attempt to assemble the
top 100 docs.  if you can't, then request more documents from the
partitions that were exhausted soonest.


Keep in mind also that the scores across independent solr partitions
are comparable, but not exact, due to idf differences.  The relative
exactitude of page 10 results might not be too important.
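
The brute-force merge described above can be sketched in a few lines (it
assumes each partition returns score-sorted docs, and that scores are roughly
comparable across partitions, per the idf caveat):

```ruby
# Merge per-partition results and slice out one page. For page N to be
# exact, every partition must contribute at least N * per_page docs.
def merged_page(partitions, page, per_page = 10)
  all = partitions.flatten.sort_by { |doc| -doc[:score] }
  all[(page - 1) * per_page, per_page] || []
end

p1 = [{ id: 'a', score: 0.9 }, { id: 'b', score: 0.8 }]
p2 = [{ id: 'c', score: 0.85 }]
page1 = merged_page([p1, p2], 1, 2)  # the two highest-scoring docs overall
```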


-Mike


problem installing solr

2007-05-15 Thread Yosvanys Aponte Báez
This is the error:

 

type Exception report

message

description The server encountered an internal error () that prevented it
from fulfilling this request.

exception

org.apache.jasper.JasperException
    org.apache.jasper.servlet.JspServletWrapper.handleJspException(JspServletWrapper.java:510)
    org.apache.jasper.servlet.JspServletWrapper.service(JspServletWrapper.java:375)
    org.apache.jasper.servlet.JspServlet.serviceJspFile(JspServlet.java:314)
    org.apache.jasper.servlet.JspServlet.service(JspServlet.java:264)
    javax.servlet.http.HttpServlet.service(HttpServlet.java:802)

root cause

javax.servlet.ServletException
    org.apache.jasper.runtime.PageContextImpl.doHandlePageException(PageContextImpl.java:858)
    org.apache.jasper.runtime.PageContextImpl.handlePageException(PageContextImpl.java:791)
    org.apache.jsp.admin.index_jsp._jspService(index_jsp.java:313)
    org.apache.jasper.runtime.HttpJspBase.service(HttpJspBase.java:97)
    javax.servlet.http.HttpServlet.service(HttpServlet.java:802)
    org.apache.jasper.servlet.JspServletWrapper.service(JspServletWrapper.java:332)
    org.apache.jasper.servlet.JspServlet.serviceJspFile(JspServlet.java:314)
    org.apache.jasper.servlet.JspServlet.service(JspServlet.java:264)
    javax.servlet.http.HttpServlet.service(HttpServlet.java:802)

root cause

java.lang.NoClassDefFoundError
    org.apache.jsp.admin.index_jsp._jspService(index_jsp.java:80)
    org.apache.jasper.runtime.HttpJspBase.service(HttpJspBase.java:97)
    javax.servlet.http.HttpServlet.service(HttpServlet.java:802)
    org.apache.jasper.servlet.JspServletWrapper.service(JspServletWrapper.java:332)
    org.apache.jasper.servlet.JspServlet.serviceJspFile(JspServlet.java:314)
    org.apache.jasper.servlet.JspServlet.service(JspServlet.java:264)
    javax.servlet.http.HttpServlet.service(HttpServlet.java:802)

note The full stack trace of the root cause is available in the Apache
Tomcat/5.5.17 logs.

 






What does the name attribute of lst element in highlighting result mean?

2007-05-15 Thread Teruhiko Kurosaka
I am trying to understand the highlighting output example, the last
one on this page:
http://wiki.apache.org/solr/StandardRequestHandler

The example shows that the top-level element of a set of highlighted
results for a document is <lst name="SOLR1000">.

What does this, SOLR1000, mean? Or rather, how does Solr
choose what to put as the value of the name attribute in the lst
element? I see this is a value of either the id field or the sku field,
but neither field looks any special among other fields in 
schema.xml.  Neither field looks like a unique field.
How can I associate each highlighting result
(each node in response/lst[@name='highlighting']) with
a doc node in response/result?

-kuro


RE: problem installing solr

2007-05-15 Thread Teruhiko Kurosaka
I ran into this a few weeks ago.
You are probably starting Tomcat from somewhere other than the Solr
home.

See "Simple Example Install" section of
http://wiki.apache.org/solr/SolrTomcat

There, tomcat is started from the Solr home by:
./apache-tomcat-5.5.20/bin/startup.sh

If you do
cd apache-tomcat-5.5.20/bin
./startup.sh

you'll get this error. (I have no idea why the error is reported
as NoClassDefFoundError, which is not what it is.  Some 
improvement is needed in the error reporting, IMHO.)

If you want to start Tomcat from other location, follow the
instructions in the next subsection of this page.

Hope this helps.
-kuro


Re: update no work

2007-05-15 Thread Brian Whitman
Add -H 'Content-type:text/xml; charset=utf-8'  after the '' bit.
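
The same fix from Ruby, for reference (a sketch; the actual send is
commented out so nothing is posted by accident):

```ruby
require 'net/http'
require 'uri'

# Solr answers "missing content stream" when the POST body arrives
# without a recognized Content-Type, so set it explicitly.
uri = URI('http://localhost:8983/solr/update')
req = Net::HTTP::Post.new(uri.request_uri)
req['Content-Type'] = 'text/xml; charset=utf-8'
req.body = '<delete><id>SOLR1000</id></delete>'
# Net::HTTP.start(uri.host, uri.port) { |http| http.request(req) }
```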



On May 15, 2007, at 12:55 PM, Alessandro Ferrucci wrote:


I installed solr

solr-2007-05-10.zip



I ran example indexing and it indexes it fine.  Search also works  
fine.


Now when I go to delete the docs I do:
curl http://localhost:8983/solr/update --data-binary
'<delete><id>SOLR1000</id></delete>'

I get



Error 400 missing content stream

HTTP ERROR: 400 missing content stream
RequestURI=/solr/update
Powered by Jetty://

also when I go to

http://localhost:8983/solr/update in browser I get 400 error (this is
understandable since no post body is in request).

I'm using curl v 7.15.0

Thanks

Alessandro ferrucci



--
http://variogr.am/
[EMAIL PROTECTED]





Using Solr without using a web-app

2007-05-15 Thread bhecht

Hello all,

I am using Lucene and hibernate search. I downloaded the Solr sources and
found there some classes I want to use:

1) SynonymFilter
2) WordDelimiterFilter

I have no use for Solr except this at the moment.
How can I use these classes without needing Solr's configuration files,
or is it possible to use them together with the Solr config files,
without loading the whole Solr framework just for this?

Thanks in advance
-- 
View this message in context: 
http://www.nabble.com/Using-Solr-without-using-a-web-app-tf3760714.html#a10629880
Sent from the Solr - User mailing list archive at Nabble.com.



Re: Using Solr without using a web-app

2007-05-15 Thread Yonik Seeley

On 5/15/07, bhecht <[EMAIL PROTECTED]> wrote:

I am using Lucene and hibernate search. I downloaded the Solr sources and
found there some classes I want to use:

1) SynonymFilter
2) WordDelimiterFilter

I have no use in Solr except for this at the moment.
How can I use these classes without the need of using the configuration
files of Solr, or is it possible to use these classes and using the Solr
config files, without loading the whole Solr framework just for this?


Use the filters without the filter factories... the filters themselves
have no Solr dependencies.
Look at the factories for how to create and configure the filters.

-Yonik


Re: Using Solr without using a web-app

2007-05-15 Thread bhecht

Thanks Yonik.
For some reason I thought it was much more complicated, but looking at the
sources again I see it's going to be an easy task.
Thanks for the help.
-- 
View this message in context: 
http://www.nabble.com/Using-Solr-without-using-a-web-app-tf3760714.html#a10630065
Sent from the Solr - User mailing list archive at Nabble.com.



Re: Using Solr without using a web-app

2007-05-15 Thread bhecht

Hi Yonik again,

When trying to do so, I discovered that the function parseRules in
SynonymFilterFactory is private, and I need to call this function when I
want to populate the SynonymMap, like here:
parseRules(wlist, synMap, "=>", ",", ignoreCase, expand);

any suggestions?


bhecht wrote:
> 
> Thanks yonik. 
> For some reason I thought it was much more complicated, but looking at the
> sources again I see its going to be an easy task. 
> Thanks for the help.
> 

-- 
View this message in context: 
http://www.nabble.com/Using-Solr-without-using-a-web-app-tf3760714.html#a10630317
Sent from the Solr - User mailing list archive at Nabble.com.



Re: Using Solr without using a web-app

2007-05-15 Thread Yonik Seeley

On 5/15/07, bhecht <[EMAIL PROTECTED]> wrote:

When trying to do so, i discover that  the function parseRules in
SynonymFilterFactory is private, and I need to call this function when I
want to populate the SynonymMap like here:
parseRules(wlist, synMap, "=>", ",", ignoreCase,expand);

any suggestions?


If you start using Solr's configuration, you drag more of Solr in.

You can add the synonyms to the SynonymMap yourself, or if you want to
use a solr-style synonyms.txt, perhaps just extract the code that
implements that.

-Yonik


Re: Using Solr without using a web-app

2007-05-15 Thread bhecht

OK, that was what I was trying to avoid, but for this cool functionality I
will do so.
I'll just cut and paste the factory classes and change the init functions so
they don't use the args[] parameter.

What kind of copyright (if any) do I need to include in these 2 classes
that I will cut and paste?
I've just never incorporated sources from open-source frameworks into my own
code in such an obvious way.
Thanks
Thanks


Yonik Seeley wrote:
> 
> On 5/15/07, bhecht <[EMAIL PROTECTED]> wrote:
>> When trying to do so, i discover that  the function parseRules in
>> SynonymFilterFactory is private, and I need to call this function when I
>> want to populate the SynonymMap like here:
>> parseRules(wlist, synMap, "=>", ",", ignoreCase,expand);
>>
>> any suggestions?
> 
> If you start using Solr's configuration, you drag more of Solr in.
> 
> You can add the synonyms to the SynonymMap yourself, or if you want to
> use a solr-style synonyms.txt, perhaps just extract the code that
> implements that.
> 
> -Yonik
> 
> 

-- 
View this message in context: 
http://www.nabble.com/Using-Solr-without-using-a-web-app-tf3760714.html#a10630646
Sent from the Solr - User mailing list archive at Nabble.com.



RE: Requests per second/minute monitor?

2007-05-15 Thread Chris Hostetter

: Why does SolrCore.setResponseHeaderValues(...) set the QTime (and other
: response header options) instead of having it as a function of

the main reason was to ensure that that data would *always* be there no
matter who wrote a request handler (or whether or not they subclassed
RequestHandlerBase) ... it's a backwards compatibility issue; we want to
ensure that data is there even if you have a custom request handler you've
been using since Solr 1.1 (or earlier)

: RequestHandlerBase?  If things were tracked in the RequestHandlers you
: could add timing information there: avg query time, etc.  I know some

you still can track those things, and return that data or log that data as
well ... although if you really wanted to know how long the *whole*
request took you would need to do it in the ResponseWriter (or in the
core, after the response has been written)

: I'm happy to make the changes and supply a patch to move the logic as
: well as adding a few simple metrics unless enough people on this thread
: really feel that it's always better to do it with log files and
: postmortem math.

moving the logic would be bad .. adding new helper utilities to
RequestHandlerBase for handlers that want to do more sounds fine to me.

see also http://issues.apache.org/jira/browse/SOLR-176 which already adds
a lot of timing info to requests.


-Hoss



RE: Null pointer exception

2007-05-15 Thread Chris Hostetter

: I decided to trash the whole installation and start again. I downloaded
: last nights build and untarred it. Put the .war into
: $TOMCAT_HOME/webapps. Copied the example/solr directory as
: /var/www/html/solr. No JNDI file this time, just updated solrconfig to
: read /var/www/html/solr as my data.dir.

if you don't use JNDI or a system property to point solr at your
/var/www/html/solr directory, then it's never going to be able to find
your solrconfig.xml to know where your data directory is.

with a setup like that, besides the error you got in your browser trying
to load the admin page, you should have seen an exception in (one of) the
tomcat logs on startup, most likely a NoClassDefFoundError ... if you
don't see an exception like that then you are *definitely* not seeing all
of your errors; maybe you've got tomcat configured to not log them, or
maybe it's logging them someplace you aren't expecting, because based on
what you described you would definitely get one.




-Hoss



Re: What does the name attribute of lst element in highlighting result mean?

2007-05-15 Thread Chris Hostetter
: element? I see this is a value of either the id field or the sku field,
: but neither field looks any special among other fields in
: schema.xml.  Neither field looks like a unique field.

it is the "id" field, which is declared to be the uniqueKey field in the
<uniqueKey> declaration...

[EMAIL PROTECTED] grep unique solr/conf/schema.xml
 <uniqueKey>id</uniqueKey>

Re: Using Solr without using a web-app

2007-05-15 Thread Yonik Seeley

On 5/15/07, bhecht <[EMAIL PROTECTED]> wrote:

OK, that was what I was trying to avoid, but for this cool functionality I
will do so.
Just cut and paste the factory classes and change the init functions, not to
use the args[] parameter.

What kind of copyright (if any) do I need to include in these 2 classes
which I will cut and paste ?
I just never incorporated in my source, sources from open source frameworks
in such an obvious way.
Thanks


Just keep the top with the Apache blurb.
The LICENSE.txt actually spells it out.  If you are already using
Lucene, I don't think you should have to do anything extra to
incorporate Solr components that come from the ASF.

-Yonik


Re: Using Solr without using a web-app

2007-05-15 Thread bhecht

Great, thanks for the patience and help.

Good day.


Yonik Seeley wrote:
> 
> On 5/15/07, bhecht <[EMAIL PROTECTED]> wrote:
>> OK, that was what I was trying to avoid, but for this cool functionality
>> I
>> will do so.
>> Just cut and paste the factory classes and change the init functions, not
>> to
>> use the args[] parameter.
>>
>> What kind of copyright (if any) do I need to include in these 2 classes
>> which I will cut and paste ?
>> I just never incorporated in my source, sources from open source
>> frameworks
>> in such an obvious way.
>> Thanks
> 
> Just keep the top with the Apache blurb.
> The LICENSE.txt actually spells it out.  If you are already using
> Lucene, I don't think you should have to do anything extra to
> incorporate Solr components that come from the ASF.
> 
> -Yonik
> 
> 

-- 
View this message in context: 
http://www.nabble.com/Using-Solr-without-using-a-web-app-tf3760714.html#a10630985
Sent from the Solr - User mailing list archive at Nabble.com.



Re: Requests per second/minute monitor?

2007-05-15 Thread Mike Klaas


On 15-May-07, at 1:41 PM, Chris Hostetter wrote:

: I'm happy to make the changes and supply a patch to move the logic as
: well as adding a few simple metrics unless enough people on this thread
: really feel that it's always better to do it with log files and
: postmortem math.

moving the logic would be bad .. adding new helper utilities to
RequestHandlerBase for handlers that want to do more sounds fine to me.

see also http://issues.apache.org/jira/browse/SOLR-176 which already adds
a lot of timing info to requests.


Yeah, I held off on that as it seemed that timing/statistics might be
handleable on a larger scale.  OTOH, it does give an easy way for
request handlers to insert detailed timing data in a logical place in
the output.


-Mike


Re: Ruby writer - arrays for facet data?

2007-05-15 Thread Nigel McNie
Yonik Seeley wrote:
> 
> '''
> The JSON response format for facets has changed to make it easier for
> clients to retain sorted order.  Use json.nl=map explicitly in clients
> to get the old behavior, or add it as a default to the request handler
> in solrconfig.xml
> '''

Thanks for pointing that out; it has fixed my issue. Though I wonder:
why was it decided to do [key, value, key, value] instead of [{key =>
value}, {key => value}] when it was found that sorting was needed? IMHO
the latter is a little easier to work with. (I presume there's a good
reason for it though...)

-- 
Regards,
Nigel McNie
Catalyst IT Ltd.
DDI: +64 4 803 2203





Re: Ruby writer - arrays for facet data?

2007-05-15 Thread Yonik Seeley

On 5/15/07, Nigel McNie <[EMAIL PROTECTED]> wrote:

Yonik Seeley wrote:
>
> '''
> The JSON response format for facets has changed to make it easier for
> clients to retain sorted order.  Use json.nl=map explicitly in clients
> to get the old behavior, or add it as a default to the request handler
> in solrconfig.xml
> '''

Thanks for pointing that out; it has fixed my issue. Though I wonder:
why was it decided to do [key, value, key, value] instead of [{key =>
value}, {key => value}] when it was found that sorting was needed? IMHO
the latter is a little easier to work with. (I presume there's a good
reason for it though...)


Memory is one reason... a map or hash per count isn't ideal.

Another is the API for accessing such a data structure.  While it
looks fine on paper, getting a single key/value easily and efficiently
is not what a map/hash is good for.  What's a map doing in the
equation if there is only ever going to be a single key?
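
The difference shows up directly in client code (same sample terms as
earlier in the thread):

```ruby
# A list of single-entry hashes forces an unwrap for every pair:
as_hashes   = [{ '1970-1979' => 2 }, { '1990-1999' => 2 }]
term, count = as_hashes.first.first  # dig the pair out of a throwaway Hash

# The flat list hands over pairs directly, in order, with no extra objects:
flat        = ['1970-1979', 2, '1990-1999', 2]
term, count = flat[0, 2]
flat.each_slice(2) { |t, c| }  # ordered iteration over all pairs
```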

-Yonik