Feature idea - delete and commit from web interface ?

2008-06-18 Thread JLIST
It seems that the web interface only supports select but not delete.
Is it possible to do a delete from the browser? It would be nice to be
able to do delete and commit, and even post (put XML in an HTML form)
from the admin web interface :)

Also, does delete have to be a POST? A GET should do.





Default result rows

2008-06-18 Thread Mihails Agafonovs
Hi!

Where can I define how many rows should be returned in the result?
The default is 10, and specifying another value each time through the URL
or the advanced interface isn't convenient.
 Sincerely, Mihails

Deleting Solr index

2008-06-18 Thread Mihails Agafonovs
How can I clear the whole Solr index?
 Sincerely, Mihails

Re: Deleting Solr index

2008-06-18 Thread j . L
just rm -r SOLR_DIR/data/index.
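A cautious sketch of that, with placeholders; SOLR_DIR and the restart step
are assumptions, and Solr should be stopped first so the open searcher does
not keep the deleted files live:

  # stop the servlet container running Solr first (e.g. the Jetty process)
  rm -rf SOLR_DIR/data/index    # SOLR_DIR is a placeholder for your Solr home
  # restart the container; Solr creates a fresh empty index on startup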


2008/6/18 Mihails Agafonovs <[EMAIL PROTECTED]>:

> How can I clear the whole Solr index?
>  Sincerely, Mihails




-- 
regards
j.L


Re: Default result rows

2008-06-18 Thread Shalin Shekhar Mangar
You can configure this in solrconfig.xml under the "defaults" section for
StandardRequestHandler:

<requestHandler name="standard" class="solr.StandardRequestHandler" default="true">
  <lst name="defaults">
    <str name="echoParams">explicit</str>
    <int name="rows">30</int>
    <str name="fl">*</str>
    <str name="version">2.1</str>
  </lst>
</requestHandler>

2008/6/18 Mihails Agafonovs <[EMAIL PROTECTED]>:

> Hi!
>
> Where can I define, how many rows must be returned in the result?
> Default is 10, and specifying other value each time through URL or
> advanced interface isn't comfortable.
>  Sincerely, Mihails




-- 
Regards,
Shalin Shekhar Mangar.


Re: Deleting Solr index

2008-06-18 Thread Shalin Shekhar Mangar
You can delete by query *:* (which matches all documents)

http://wiki.apache.org/solr/UpdateXmlMessages
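A minimal sketch of that update message over HTTP, assuming a default local
install at localhost:8983 (host and port are assumptions, not from the thread):

  # delete everything, then commit so searchers see the empty index
  curl http://localhost:8983/solr/update -H 'Content-type:text/xml; charset=utf-8' \
       -d '<delete><query>*:*</query></delete>'
  curl http://localhost:8983/solr/update -H 'Content-type:text/xml; charset=utf-8' \
       -d '<commit/>'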

2008/6/18 Mihails Agafonovs <[EMAIL PROTECTED]>:

> How can I clear the whole Solr index?
>  Sincerely, Mihails




-- 
Regards,
Shalin Shekhar Mangar.


Re: Default result rows

2008-06-18 Thread Mihails Agafonovs
Doesn't work :(. None of the parameters in the "defaults" section is
being read. Solr still uses the predefined default parameters.

P.S. In the "defaults" section I should also be able to specify which
stylesheet to use, right?
 Quoting Shalin Shekhar Mangar : You can configure this in
solrconfig.xml under the "defaults" section for
 StandardRequestHandler

 <requestHandler name="standard" class="solr.StandardRequestHandler" default="true">
   <lst name="defaults">
     <str name="echoParams">explicit</str>
     <int name="rows">30</int>
     <str name="fl">*</str>
     <str name="version">2.1</str>
   </lst>
 </requestHandler>
 2008/6/18 Mihails Agafonovs <[EMAIL PROTECTED]>:
 > Hi!
 >
 > Where can I define, how many rows must be returned in the
result?
 > Default is 10, and specifying other value each time through URL
or
 > advanced interface isn't comfortable.
 >  Sincerely, Mihails
 -- 
 Regards,
 Shalin Shekhar Mangar.
 Sincerely, Mihails



SOLR-236 patch works

2008-06-18 Thread JLIST
I had the patch problem but I manually created that file and
solr nightly builds fine.

After replacing solr.war with apache-solr-solrj-1.3-dev.jar,
in solrconfig.xml, I added this:

<searchComponent name="collapse"
  class="org.apache.solr.handler.component.CollapseComponent" />
Then I added this to the standard and dismax handlers:

<arr name="components">
  <str>collapse</str>
</arr>


I added &collapse.field=&collapse.threshold=, and the result
collapsed as expected.
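For illustration, hypothetical values filled into those two parameters (the
field name "site", the threshold, and host/port are all made up):

  curl 'http://localhost:8983/solr/select?q=video&collapse.field=site&collapse.threshold=1'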

> Can you provide feedback about this particular patch once you try
> it?  I'd like to get it on Solr 1.3, actually, so any feedback would
> help.

> Thanks,
> Otis





Re: "Did you mean" functionality

2008-06-18 Thread Lucas F. A. Teixeira

Yeah, I read it.
Thanks a lot, I'm waiting for it!

[]s,

Lucas

Lucas Frare A. Teixeira
[EMAIL PROTECTED] 
Tel: +55 11 3660.1622 - R3018



Grant Ingersoll escreveu:

Also see http://wiki.apache.org/solr/SpellCheckComponent

I expect to commit fairly soon.

On Jun 17, 2008, at 5:46 PM, Otis Gospodnetic wrote:


Hi Lucas,

Have a look at (the patch in) SOLR-572, lots of work happening there 
as we speak.


Otis
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch


- Original Message 

From: Lucas F. A. Teixeira <[EMAIL PROTECTED]>
To: solr-user@lucene.apache.org
Sent: Tuesday, June 17, 2008 4:30:12 PM
Subject: "Did you mean" functionality

Hello everybody,

I need to integrate the Lucene SpellChecker Contrib lib in my
application, but I'm using the EmbeddedSolrServer to access all
indexes.
I want to know what I should do (if someone has any step-by-step guide,
link, tutorial or smoke signal) during indexing, and of course how to
search through the words generated by this API.

I can use the lib itself to search the suggestions, without using Solr,
but I'm confused about how to proceed when indexing these docs.

Thanks a lot,

[]s,

--
Lucas Frare A. Teixeira
[EMAIL PROTECTED]
Tel: +55 11 3660.1622 - R3018









Re: Feature idea - delete and commit from web interface ?

2008-06-18 Thread Koji Sekiguchi

A patch for this was posted before, though I don't know if it can delete.
It can add documents and commit from the admin GUI.

https://issues.apache.org/jira/browse/SOLR-85

Koji

JLIST wrote:

It seems that the web interface only supports select but not delete.
Is it possible to do delete from the browser? It would be nice to be
able to do delete and commit, and even post (put XML in an html form)
from the admin web interface :)

Also, does delete have to be a POST? A GET should do.




  




Solr/bin/commit problem - fails to commit correctly and render response

2008-06-18 Thread McBride, John
Hello,

I am using the solr/bin/commit file to commit index changes after index
distribution in the collection distribution operations model.

The commit script is printed at the end of the email.

When I run the script as is, I get the following error:

commit request to Solr at port 8080 failed

This is corrected with the following addition to the line:

rs=`curl http://${solr_hostname}:${solr_port}/solr/update -s -d "<commit/>"`
Becomes:
rs=`curl http://${solr_hostname}:${solr_port}/solr/update -s -d "<commit/>" -H 'Content-type:text/xml; charset=utf-8'`

This works, but the log reports an error, because the response is not as
expected.
SOLR returns: <int name="status">0</int>

But the commit script expects: <result.*status="0" [regular expression]


Has anybody else had problems using this commit script?
Where can I get the latest version?  I got this script from the solr 1.2
package.

Thanks,
John

---
#!/bin/bash
#
# Licensed to the Apache Software Foundation (ASF) under one or more
# contributor license agreements.  See the NOTICE file distributed with
# this work for additional information regarding copyright ownership.
# The ASF licenses this file to You under the Apache License, Version 2.0
# (the "License"); you may not use this file except in compliance with
# the License.  You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
#
# Shell script to force a commit of all changes since last commit
# for a Solr server

orig_dir=$(pwd)
cd ${0%/*}/..
solr_root=$(pwd)
cd ${orig_dir}

unset solr_hostname solr_port webapp_name user verbose debug
. ${solr_root}/bin/scripts-util

# set up variables
prog=${0##*/}
log=${solr_root}/logs/${prog}.log

# define usage string
USAGE="\
usage: $prog [-h hostname] [-p port] [-w webapp_name] [-u username] [-v]
   -h  specify Solr hostname
   -p  specify Solr port number
   -w  specify name of Solr webapp (defaults to solr)
   -u  specify user to sudo to before running script
   -v  increase verbosity
   -V  output debugging info
"

# parse args
while getopts h:p:w:u:vV OPTION
do
case $OPTION in
h)
solr_hostname="$OPTARG"
;;
p)
solr_port="$OPTARG"
;;
w)
webapp_name="$OPTARG"
;;
u)
user="$OPTARG"
;;
v)
verbose="v"
;;
V)
debug="V"
;;
*)
echo "$USAGE"
exit 1
esac
done

[[ -n $debug ]] && set -x

if [[ -z ${solr_port} ]]
then
echo "Solr port number missing in $confFile or command line."
echo "$USAGE"


exit 1
fi

# use default hostname if not specified
if [[ -z ${solr_hostname} ]]
then
solr_hostname=localhost
fi

# use default webapp name if not specified
if [[ -z ${webapp_name} ]]
then
webapp_name=solr
fi

fixUser "$@"

start=`date +"%s"`

logMessage started by $oldwhoami
logMessage command: $0 $@

rs=`curl http://${solr_hostname}:${solr_port}/solr/update -s -d "<commit/>"`
if [[ $? != 0 ]]
then
  logMessage failed to connect to Solr server at port ${solr_port}
  logMessage commit failed
  logExit failed 1
fi

# check status of commit request
echo $rs | grep '<result.*status="0"' > /dev/null 2>&1
if [[ $? != 0 ]]
then
  logMessage commit request to Solr at port ${solr_port} failed:
  logMessage $rs
  logExit failed 2
fi

logExit ended 0
---



RE: Solr/bin/commit problem - fails to commit correctly and render response

2008-06-18 Thread McBride, John
OK, I checked out the nightly builds and the two changes have been made.

I will use the SOLR 1.3 version of solr/bin/commit.

Thanks,
John 

-Original Message-
From: McBride, John [mailto:[EMAIL PROTECTED] 
Sent: 18 June 2008 11:48
To: solr-user@lucene.apache.org
Subject:  Solr/bin/commit problem - fails to commit correctly and
render response

Hello,

I am using the solr/bin/commit file to commit index changes after index
distribution in the collection distribution operations model.

The commit script is printed at the end of the email.

When I run the script as is, I get the following error:

commit request to Solr at port 8080 failed

This is corrected with the following addition to the line:

rs=`curl http://${solr_hostname}:${solr_port}/solr/update -s -d "<commit/>"`
Becomes:
rs=`curl http://${solr_hostname}:${solr_port}/solr/update -s -d "<commit/>" -H 'Content-type:text/xml; charset=utf-8'`

This works, but the log reports an error, because the response is not as
expected.
SOLR returns: <int name="status">0</int>

But the commit script expects: <result.*status="0" [regular expression]


Has anybody else had problems using this commit script?
Where can I get the latest version?  I got this script from the solr 1.2
package.

Thanks,
John

---
#!/bin/bash
#
# Licensed to the Apache Software Foundation (ASF) under one or more
# contributor license agreements.  See the NOTICE file distributed with
# this work for additional information regarding copyright ownership.
# The ASF licenses this file to You under the Apache License, Version 2.0
# (the "License"); you may not use this file except in compliance with
# the License.  You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
#
# Shell script to force a commit of all changes since last commit
# for a Solr server

orig_dir=$(pwd)
cd ${0%/*}/..
solr_root=$(pwd)
cd ${orig_dir}

unset solr_hostname solr_port webapp_name user verbose debug
. ${solr_root}/bin/scripts-util

# set up variables
prog=${0##*/}
log=${solr_root}/logs/${prog}.log

# define usage string
USAGE="\
usage: $prog [-h hostname] [-p port] [-w webapp_name] [-u username] [-v]
   -h  specify Solr hostname
   -p  specify Solr port number
   -w  specify name of Solr webapp (defaults to solr)
   -u  specify user to sudo to before running script
   -v  increase verbosity
   -V  output debugging info
"

# parse args
while getopts h:p:w:u:vV OPTION
do
case $OPTION in
h)
solr_hostname="$OPTARG"
;;
p)
solr_port="$OPTARG"
;;
w)
webapp_name="$OPTARG"
;;
u)
user="$OPTARG"
;;
v)
verbose="v"
;;
V)
debug="V"
;;
*)
echo "$USAGE"
exit 1
esac
done

[[ -n $debug ]] && set -x

if [[ -z ${solr_port} ]]
then
echo "Solr port number missing in $confFile or command line."
echo "$USAGE"


exit 1
fi

# use default hostname if not specified
if [[ -z ${solr_hostname} ]]
then
solr_hostname=localhost
fi

# use default webapp name if not specified
if [[ -z ${webapp_name} ]]
then
webapp_name=solr
fi

fixUser "$@"

start=`date +"%s"`

logMessage started by $oldwhoami
logMessage command: $0 $@

rs=`curl http://${solr_hostname}:${solr_port}/solr/update -s -d "<commit/>"`
if [[ $? != 0 ]]
then
  logMessage failed to connect to Solr server at port ${solr_port}
  logMessage commit failed
  logExit failed 1
fi

# check status of commit request
echo $rs | grep '<result.*status="0"' > /dev/null 2>&1
if [[ $? != 0 ]]
then
  logMessage commit request to Solr at port ${solr_port} failed:
  logMessage $rs
  logExit failed 2
fi

logExit ended 0
---



never desallocate RAM...during search

2008-06-18 Thread Roberto Nieto
Hi users,

Some days ago I asked a question about RAM use during searches, but I didn't
solve my problem with the ideas that some expert users gave me. After making
some tests I can ask a more specific question, hoping someone can help me.

My problem is that I need highlighting and I have quite big docs (txt of
40MB). The conclusion of my tests is that if I set "rows" to 10, the content
of the first 10 results is cached. This is normal, because it's probably
needed for the highlighting, but this memory is never deallocated even
though I set Solr's caches to 0. With this, the memory grows until it is
close to the heap limit, then the GC starts to deallocate memory... but at
that point the searches are quite slow. Is this normal behavior? Can I
configure some Solr parameter to force deallocation of results after each
search? [I'm using Solr 1.2]

Another thing that I found is that although I comment out (in solrconfig) all
these options:
filterCache, queryResultCache, documentCache, enableLazyFieldLoading,
useFilterForSortedQuery, boolTofilterOptimizer
the stats always show "caching:true".

I'm probably missing some stupid thing but I can't find it.

If anyone can help me... I'm quite desperate.


Rober.


Re: Default result rows

2008-06-18 Thread Otis Gospodnetic
Use &rows=NNN in the URL.
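For example, assuming a default local install at localhost:8983:

  # override the default of 10 for this request only
  curl 'http://localhost:8983/solr/select?q=*:*&rows=50'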


Otis --
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch


- Original Message 
> From: Mihails Agafonovs <[EMAIL PROTECTED]>
> To: solr-user@lucene.apache.org
> Sent: Wednesday, June 18, 2008 4:30:53 AM
> Subject: Default result rows
> 
> Hi!
> 
> Where can I define, how many rows must be returned in the result?
> Default is 10, and specifying other value each time through URL or
> advanced interface isn't comfortable.
> Sincerely, Mihails



Re: SOLR-236 patch works

2008-06-18 Thread Otis Gospodnetic
That looks right.  CollapseComponent replaces QueryComponent.


Otis --
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch


- Original Message 
> From: JLIST <[EMAIL PROTECTED]>
> To: Otis Gospodnetic 
> Sent: Wednesday, June 18, 2008 5:24:25 AM
> Subject: SOLR-236 patch works
> 
> I had the patch problem but I manually created that file and
> solr nightly builds fine.
> 
> After replacing solr.war with apache-solr-solrj-1.3-dev.jar,
> in solrconfig.xml, I added this:
> 
> <searchComponent name="collapse"
>   class="org.apache.solr.handler.component.CollapseComponent" />
> 
> Then added this to the standard and dismax handler handler
> 
> <arr name="components">
>   <str>collapse</str>
> </arr>
> 
> 
> I added &collapse.field=&collapse.threshold=, and the result
> collapsed as expected.
> 
> > Can you provide feedback about this particular patch once you try
> > it?  I'd like to get it on Solr 1.3, actually, so any feedback would
> > help.
> 
> > Thanks,
> > Otis



Re: Feature idea - delete and commit from web interface ?

2008-06-18 Thread Otis Gospodnetic
As for POST vs. GET - don't let REST purists hear you. :)
Actually, isn't there a DELETE HTTP method that REST purists would say should 
be used in case of doc deletion?

Otis
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch


- Original Message 
> From: JLIST <[EMAIL PROTECTED]>
> To: solr-user@lucene.apache.org
> Sent: Wednesday, June 18, 2008 4:13:09 AM
> Subject: Feature idea - delete and commit from web interface ?
> 
> It seems that the web interface only supports select but not delete.
> Is it possible to do delete from the browser? It would be nice to be
> able to do delete and commit, and even post (put XML in an html form)
> from the admin web interface :)
> 
> Also, does delete have to be a POST? A GET should do.



Re: never desallocate RAM...during search

2008-06-18 Thread Otis Gospodnetic
Hi,
I don't have the answer about why cache still shows "true", but as far as 
memory usage goes, based on your description I'd guess the memory is allocated 
and used by the JVM which typically  tries not to run GC unless it needs to.  
So if you want to get rid of that used memory, you need to talk to the JVM and 
persuade it to run GC.  I don't think there is a way to manage memory usage 
directly.  There is System.gc() that you can call, but that's only a 
"suggestion" for the JVM to run GC.


Otis --
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch


- Original Message 
> From: Roberto Nieto <[EMAIL PROTECTED]>
> To: solr-user 
> Sent: Wednesday, June 18, 2008 7:43:12 AM
> Subject: never desallocate RAM...during search
> 
> Hi users,
> 
> Somedays ago I made a question about RAM use during searchs but I didn't
> solve my problem with the ideas that some expert users told me. After making
> somes test I can make a more specific question hoping someone can help me.
> 
> My problem is that i need highlighting and i have quite big docs (txt of
> 40MB). The conclusion of my tests is that if I set "rows" to 10, the content
> of the first 10 results are cached. This if something normal because its
> probable needed for the highlighting, but this memory is never desallocate
> although I set solr's caches to 0. With this, the memory grows up until is
> close to the heap, then the gc start to desallocate memory..but at that
> point the searches are quite slow. Is this a normal behavior? Can I
> configure some solr parameter to force the desallocation of results after
> each search? [I´m using solr 1.2]
> 
> Another thing that I found is that although I comment (in solrconfig) all
> this options:
> > filterCache, queryResultCache, documentCache, enableLazyFieldLoading,
> useFilterForSortedQuery, boolTofilterOptimizer
> In the stats always appear "caching:true".
> 
> I'm probably leaving some stupid thing but I can't find it.
> 
> If anyone can help me..i'm quite desperate.
> 
> 
> Rober.



Re: Default result rows

2008-06-18 Thread Yonik Seeley
2008/6/18 Mihails Agafonovs <[EMAIL PROTECTED]>:
> Doesn't work :(. None of the parameters in the "defaults" section is
> being read.

Everyone uses this functionality, so it's a bug in your request or
config somewhere.
- Make sure you restarted Solr after changing solrconfig.xml
- Make sure you changed the defaults in the right request handler
- Add echoParams=all to your request to see what parameters are being used
- if you can't get it to work, post the URL of the query you are using
to test, the response output, and the relevant part of the
solrconfig.xml
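For instance (host, port and query are placeholders):

  # echoParams=all echoes the merged default and request parameters back
  curl 'http://localhost:8983/solr/select?q=*:*&echoParams=all'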

-Yonik


RE: never desallocate RAM...during search

2008-06-18 Thread r.nieto
Hi Otis,

Thank you for your attention.

I've read the lucene and solr mailing lists for days and no one has problems
with anything similar; that's why this behaviour seems a bit strange to me.

I can try what you suggest about the GC, but is what I'm describing normal
behaviour? Must I configure my JVM with special GC parameters for Solr?

Thanks a lot. I hope I can arrive at a solution with your help.

Rober.
-Original Message-
From: Otis Gospodnetic [mailto:[EMAIL PROTECTED] 
Sent: Wednesday, 18 June 2008 14:55
To: solr-user@lucene.apache.org
Subject: Re: never desallocate RAM...during search

Hi,
I don't have the answer about why cache still shows "true", but as far as
memory usage goes, based on your description I'd guess the memory is
allocated and used by the JVM which typically  tries not to run GC unless it
needs to.  So if you want to get rid of that used memory, you need to talk
to the JVM and persuade it to run GC.  I don't think there is a way to
manage memory usage directly.  There is System.gc() that you can call, but
that's only a "suggestion" for the JVM to run GC.


Otis --
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch


- Original Message 
> From: Roberto Nieto <[EMAIL PROTECTED]>
> To: solr-user 
> Sent: Wednesday, June 18, 2008 7:43:12 AM
> Subject: never desallocate RAM...during search
> 
> Hi users,
> 
> Somedays ago I made a question about RAM use during searchs but I didn't
> solve my problem with the ideas that some expert users told me. After
making
> somes test I can make a more specific question hoping someone can help me.
> 
> My problem is that i need highlighting and i have quite big docs (txt of
> 40MB). The conclusion of my tests is that if I set "rows" to 10, the
content
> of the first 10 results are cached. This if something normal because its
> probable needed for the highlighting, but this memory is never desallocate
> although I set solr's caches to 0. With this, the memory grows up until is
> close to the heap, then the gc start to desallocate memory..but at that
> point the searches are quite slow. Is this a normal behavior? Can I
> configure some solr parameter to force the desallocation of results after
> each search? [I´m using solr 1.2]
> 
> Another thing that I found is that although I comment (in solrconfig) all
> this options:
> > filterCache, queryResultCache, documentCache,
enableLazyFieldLoading,
> useFilterForSortedQuery, boolTofilterOptimizer
> In the stats always appear "caching:true".
> 
> I'm probably leaving some stupid thing but I can't find it.
> 
> If anyone can help me..i'm quite desperate.
> 
> 
> Rober.



Re: Default result rows

2008-06-18 Thread Mihails Agafonovs
The "problem" was in the search query submit form, where rows value
was defined as 10.
 Quoting Yonik Seeley : 2008/6/18 Mihails Agafonovs
<[EMAIL PROTECTED]>:
 > Doesn't work :(. None of the parameters in the
"defaults" section is
 > being read.
 Everyone uses this functionality, so it's a bug in your request or
 config somewhere.
 - Make sure you restarted Solr after changing solrconfig.xml
 - Make sure you changed the defaults in the right request handler
 - Add echoParams=all to your request to see what parameters are
being used
 - if you can't get it to work, post the URL of the query you are
using
 to test, the response output, and the relevant part of the
 solrconfig.xml
 -Yonik
 Sincerely, Mihails



Re: Feature idea - delete and commit from web interface ?

2008-06-18 Thread Noble Paul നോബിള്‍ नोब्ळ्
I guess delete over REST is not so evil. Let us do it.
Don't we delete mails over HTTP GET?
--Noble

On Wed, Jun 18, 2008 at 6:20 PM, Otis Gospodnetic
<[EMAIL PROTECTED]> wrote:
> As for POST vs. GET - don't let REST purists hear you. :)
> Actually, isn't there a DELETE HTTP method that REST purists would say should 
> be used in case of doc deletion?
>
> Otis
> --
> Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch
>
>
> - Original Message 
>> From: JLIST <[EMAIL PROTECTED]>
>> To: solr-user@lucene.apache.org
>> Sent: Wednesday, June 18, 2008 4:13:09 AM
>> Subject: Feature idea - delete and commit from web interface ?
>>
>> It seems that the web interface only supports select but not delete.
>> Is it possible to do delete from the browser? It would be nice to be
>> able to do delete and commit, and even post (put XML in an html form)
>> from the admin web interface :)
>>
>> Also, does delete have to be a POST? A GET should do.
>
>



-- 
--Noble Paul


Re: Feature idea - delete and commit from web interface ?

2008-06-18 Thread Walter Underwood
Only if each document in Solr has a URI. Add/replace would use PUT.

Never, never delete with a GET. The Ultraseek spider deleted 20K
documents on an intranet once because they gave it admin perms and
it followed the "delete this page" link on every page.

wunder

On 6/18/08 5:50 AM, "Otis Gospodnetic" <[EMAIL PROTECTED]> wrote:

> As for POST vs. GET - don't let REST purists hear you. :)
> Actually, isn't there a DELETE HTTP method that REST purists would say should
> be used in case of doc deletion?
> 
> Otis
> --
> Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch
> 
> 
> - Original Message 
>> From: JLIST <[EMAIL PROTECTED]>
>> To: solr-user@lucene.apache.org
>> Sent: Wednesday, June 18, 2008 4:13:09 AM
>> Subject: Feature idea - delete and commit from web interface ?
>> 
>> It seems that the web interface only supports select but not delete.
>> Is it possible to do delete from the browser? It would be nice to be
>> able to do delete and commit, and even post (put XML in an html form)
>> from the admin web interface :)
>> 
>> Also, does delete have to be a POST? A GET should do.
> 



missing document count?

2008-06-18 Thread Geoffrey Young

hi all :)

maybe I'm just missing it, but I don't see a way to consistently (and 
easily) know the number of returned documents without a lot of acrobatics.


  numFound represents the entire number of matching documents

  if numFound <= rows then numFound is the number of documents in the response

  if numFound > rows then rows is the number of documents in the response

  if echoParams=none is set I need to count docs myself

is there a reason for this complexity?  a numDocs response field would 
seem much simpler.


--Geoff


Slight issue with classloading and DataImportHandler

2008-06-18 Thread Brendan Grainger

Hi,

I set up the new DataimportHandler last night to replace some custom  
import code I'd written and so far I'm loving it thank you.


I had one issue you might want to know about it. I have some solr  
extensions I've written and packaged in a jar which I place in:


solr-home/lib

as per:

http://wiki.apache.org/solr/SolrPlugins#head-59e2685df65335e82f8936ed55d260842dc7a4dc

This works well for my handlers but a custom Transformer I wrote and  
packaged the same way was throwing a ClassNotFoundException. I tracked  
it down to the DocBuilder.loadClass method which was just doing a  
Class.forName. Anyway, I fixed it for the moment by probably do  
something stupid and creating a SolrResourceLoader (which I imagine  
could be an instance variable, but at 3am I just wanted to get it  
working). Anyway, this fixes the problem:


  @SuppressWarnings("unchecked")
  static Class loadClass(String name) throws ClassNotFoundException {
SolrResourceLoader loader = new SolrResourceLoader( null );
return loader.findClass(name);
 // return Class.forName(name);
  }

Brendan

Re[2]: SOLR-236 patch works

2008-06-18 Thread JLIST

> That looks right.  CollapseComponent replaces QueryComponent.

Does it mean that if collapse parameters show up in the URL,
CollapseComponent will be used automatically, and if not,
QueryComponent will be used? Or is it always going to be
CollapseComponent, which defaults to QueryComponent's behavior
when collapse parameters are absent?



Re: missing document count?

2008-06-18 Thread Walter Underwood
Parse the results into a list and do something like this:

  Math.min(results.size(), numFound)  // Java
  min(len(results), numFound) # Python

Doesn't seem all that hard, and not worth duplicating the info.

I don't ever remember needing to know the number of hits returned
in a page. Either print the total or iterate over all returned rows.
Next and prev buttons can be done with numFound, start, and the
requested row count.
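A small sketch of that paging arithmetic as requests; host, port, and the
page size of 10 are assumptions:

  # page N uses start = N * rows; a next page exists while start + rows < numFound
  curl 'http://localhost:8983/solr/select?q=*:*&start=10&rows=10'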

wunder

On 6/18/08 10:07 AM, "Geoffrey Young" <[EMAIL PROTECTED]> wrote:
> 
> maybe I'm just missing it, but I don't see a way to consistently (and
> easily) know the number of returned documents without a lot of acrobatics.
> 
>numFound represents the entire number of matching documents
> 
>if numFound <= rows then numFound is the number of documents in the
> response
> 
>if numFound > rows then rows is the number of documents in the response
> 
>if echoParams=none is set I need to count docs myself
> 
> is there a reason for this complexity?  a numDocs response field would
> seem much simpler.
> 
> --Geoff



Re: missing document count?

2008-06-18 Thread Geoffrey Young



Walter Underwood wrote:

Parse the results into a list and do something like this:

  Math.min(results.size(), numFound)  // Java
  min(len(results), numFound) # Python


we're using json, but sure, I can calculate it :)



Doesn't seem all that hard, and not worth duplicating the info.


not hard, but useful information to have handy without additional 
manipulations on my part.




I don't ever remember needing to know the number of hits returned
in a page. Either print the total or iterate over all returned rows.
Next and prev buttons can be done with numFound, start, and the
requested row count.


our pages are the results of multiple queries.  so, given a max number 
of records per page (or total), the rows asked of query2 is max - 
query1, of query 3 max - query2 - query1, etc.  makes pagination a pain, 
but what can you do when product wants what it wants ;)


anyway, I guess I'll just figure it out.  thanks.

--Geoff




Re: missing document count?

2008-06-18 Thread Chris Hostetter

: not hard, but useful information to have handy without additional
: manipulations on my part.

: our pages are the results of multiple queries.  so, given a max number of
: records per page (or total), the rows asked of query2 is max - query1, of

in the common case, counting the number of "doc"s in a "result" is just as 
easy as reading some attribute containing the count.  It sounds like you 
have a more complicated case where what you really want is the count of 
how many "doc"s there are in the entire response (ie: multiple "result" 
sections) ... that count is admittedly a little more work but would also be 
completely useless to most clients if it was included in the response 
(just as the number of fields in each doc, or the total number of strings 
in the response) ... there is a lot of metadata that *could* be included 
in the response, but we don't bother when the client can compute that 
metadata just as easily as the server -- among other things, it helps keep 
the response size smaller.

This was actually one of the original guiding principles of Solr: support 
features that are faster/cheaper/easier/more-efficient on the central 
server than they would be on the clients (sorting, docset caching, 
faceting, etc...)



-Hoss



Re: missing document count?

2008-06-18 Thread Geoffrey Young



Chris Hostetter wrote:

: not hard, but useful information to have handy without additional
: manipulations on my part.

: our pages are the results of multiple queries.  so, given a max number of
: records per page (or total), the rows asked of query2 is max - query1, of

in the common case, counting the number of "doc"s in a "result" is just as 
easy as reading some attribute containing the count. 


I suppose :)  in my mind, one (potentially) requires just a read, while 
the other requires some further manipulations.  but I suppose most 
modern languages have optimizations for things like array size :)


It sounds like you 
have a more complicated case where what you really wnat is the count of 
how many "doc"s there are in the entire response 


I don't know how complex it is to ask for documents in the response, but 
yes :)


(ie: multiple "result" 
sections) ... 


multiple results from multiple queries, not a single query.

but really, I wasn't planning on having anyone (solr or otherwise) 
solving my needs.  I just find it odd that I need to discern the number 
of returned results.


that count is admitedly a little more work but would also be 
completley useless to most clients if it was included in the response 


perhaps :)

(just as the number of fields in each doc, or the total number of strings 
in the response) ... there is a lot of metadata that *could* be included 
in the response, but we don't bother when the client can compute that 
metadata just as easily as the server -- among other things, it helps keep 
the response size smaller.


agreed - smaller is better.

as for the client computing it as easily as the server: I assumed that solr 
was keeping track of the document count already, if only to see when the 
number of documents exceeds the rows parameter.  if so, all the people who 
care about the number of documents in the result (which, I'll assume, is 
more than those who care about total strings in the response ;) are all 
re-computing a known value.




This was actually one of the orriginal guiding principles of Solr: support 
features that are faster/cheaper/easier/more-efficient on the central 
server then they would be on the clients (sorting, docset caching, 
faceting, etc...)


sure, I'll buy that.  but in my mind it was only exposing something solr 
already was calculating anyway.


regardless, thanks for taking the time :)

--Geoff


Re[2]: Feature idea - delete and commit from web interface ?

2008-06-18 Thread JLIST
GET makes it possible to delete from a browser address bar,
which you can not do with DELETE :)

> As for POST vs. GET - don't let REST purists hear you. :)
> Actually, isn't there a DELETE HTTP method that REST purists
> would say should be used in case of doc deletion?




Re[2]: Feature idea - delete and commit from web interface ?

2008-06-18 Thread JLIST

Sounds like web designer's fault. No permission check and no
confirmation for deletion?

> Never, never delete with a GET. The Ultraseek spider deleted 20K
> docments on an intranet once because they gave it admin perms and
> it followed the "delete this page" link on every page.




Re: Re[2]: Feature idea - delete and commit from web interface ?

2008-06-18 Thread Walter Underwood
The spider was given an admin login so it could access all
content. Reasonable decision if the pages had been designed well.

Even with a confirmation, never delete with a GET. Use POST.
If the spider ever discovers the URL that the confirmation
uses, it will still delete the content.

Luckily, they had a backup.

wunder

On 6/18/08 1:55 PM, "JLIST" <[EMAIL PROTECTED]> wrote:

> 
> Sounds like web designer's fault. No permission check and no
> confirmation for deletion?
> 
>> Never, never delete with a GET. The Ultraseek spider deleted 20K
>> docments on an intranet once because they gave it admin perms and
>> it followed the "delete this page" link on every page.
> 
> 



Re: Re[2]: Feature idea - delete and commit from web interface ?

2008-06-18 Thread Craig McClanahan
On Wed, Jun 18, 2008 at 1:55 PM, JLIST <[EMAIL PROTECTED]> wrote:
>
> Sounds like web designer's fault. No permission check and no
> confirmation for deletion?
>

Nope ... application designer's fault for misusing the web.  Allowing
deletes on a GET violates HTTP/1.1 requirements (not just RESTful
ones) that GET requests not have side effects, so an app that works
that way is going to mess up when HTTP caching is in use ... as lots
of people found to their chagrin when they installed Google Desktop's
caching capabilities, and the cache played by the standard HTTP rules
(GETs are supposed to be idempotent, having no side effects, so it's
just fine to issue the same GET as many times as desired).

If you want an easy way to do deletes from a browser, just set up a
little form that does a POST and includes the id of the document you
want to delete.  Then you're playing by the rules, and won't make a
fool of yourself when crawlers or caches interact with your
application.
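A sketch of the same rule-abiding delete issued directly, with a made-up
document id and a default local install assumed:

  # POST the delete and the commit; never wire these to a GET link
  curl http://localhost:8983/solr/update -H 'Content-type:text/xml; charset=utf-8' \
       -d '<delete><id>EXAMPLE_DOC_ID</id></delete>'
  curl http://localhost:8983/solr/update -H 'Content-type:text/xml; charset=utf-8' \
       -d '<commit/>'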

Craig McClanahan

>> Never, never delete with a GET. The Ultraseek spider deleted 20K
>> docments on an intranet once because they gave it admin perms and
>> it followed the "delete this page" link on every page.
>
>
>


Re: scaling / sharding questions

2008-06-18 Thread Phillip Farber
This may be slightly off topic, for which I apologize, but is related to 
the question of searching several indexes as Lance describes below, quoting:


 "We also found that searching a few smaller indexes via the Solr 1.3 
Distributed Search feature is actually faster than searching one large

index, YMMV."

The wiki describing distributed search lists several limitations, which 
made me wonder about two of them in particular and what their impact is, 
mainly with respect to scoring:


1) No distributed idf

Does this mean that the Lucene scoring algorithm is computed without the 
idf factor, i.e. we just get term frequency scoring?


2) Doesn't support consistency between stages, e.g. a shard index can be 
changed between STAGE_EXECUTE_QUERY and STAGE_GET_FIELDS


What does this mean or where can I find out what it means?

Thanks!

Phil




Lance Norskog wrote:

Yes, I've done this split-by-delete several times. The halved index still
uses as much disk space until you optimize it.

As to splitting policy: we use an MD5 signature as our unique ID. This has
the lovely property that we can wildcard.  'contentid:f*' denotes 1/16 of
the whole index. This 1/16 is a very random sample of the whole index. We
use this for several things. If we use this for shards, we have a query that
matches a shard's contents.

The Solr/Lucene syntax does not support modular arithmetic,and so it will
not let you query a subset that matches one of your shards.

We also found that searching a few smaller indexes via the Solr 1.3
Distributed Search feature is actually faster than searching one large
index, YMMV. So for us, a large pile of shards will be optimal anyway, so we
have no need to "rebalance".

It sounds like you're not storing the data in a backing store, but are
storing all data in the index itself. We have found this "challenging".

Cheers,

Lance Norskog

-Original Message-
From: Jeremy Hinegardner [mailto:[EMAIL PROTECTED] 
Sent: Friday, June 13, 2008 3:36 PM

To: solr-user@lucene.apache.org
Subject: Re: scaling / sharding questions

Sorry for not keeping this thread alive, lets see what we can do...

One option I've thought of for 'resharding' would splitting an index into
two by just copying it, the deleting 1/2 the documents from one, doing a
commit, and delete the other 1/2 from the other index and commit.  That is:

  1) Take original index
  2) copy to b1 and b2
  3) delete docs from b1 that match a particular query A
  4) delete docs from b2 that do not match a particular query A
  5) commit b1 and b2

Has anyone tried something like that?

As for how to know where each document is stored, generally we're
considering unique_document_id % N.  If we rebalance we change N and
redistribute, but that probably will take too much time.  That makes us
move more towards a staggered age-based approach where the most recent
docs filter down to "permanent" indexes based upon time.

Another thought we've had recently is to have many many many physical
shards, on the indexing writer side, but then merge groups of them into
logical shards which are snapshotted to reader solrs' on a frequent basis.
I haven't done any testing along these lines, but logically it seems like an
idea worth pursuing.

enjoy,

-jeremy

On Fri, Jun 06, 2008 at 03:14:10PM +0200, Marcus Herou wrote:

Cool sharding technique.

We as well are thinking of howto "move" docs from one index to another 
because we need to re-balance the docs when we add new nodes to the

cluster.
We do only store id's in the index otherwise we could have moved stuff 
around with IndexReader.document(x) or so. Luke 
(http://www.getopt.org/luke/) is able to reconstruct the indexed Document

data so it should be doable.
However I'm thinking of actually just delete the docs from the old 
index and add new Documents to the new node. It would be cool to not 
waste cpu cycles by reindexing already indexed stuff but...


And we as well will have data amounts in the range you are talking 
about. We perhaps could share ideas ?


How do you plan to store where each document is located ? I mean you 
probably need to store info about the Document and it's location 
somewhere perhaps in a clustered DB ? We will probably go for HBase for

this.
I think the number of documents is less important than the actual data 
size (just speculating). We currently search 10M (will get much much 
larger) indexed blog entries on one machine where the JVM has 1G heap, 
the index size is 3G and response times are still quite fast. This is 
a readonly node though and is updated every morning with a freshly 
optimized index. Someone told me that you probably need twice the RAM 
if you plan to both index and search at the same time. If I were you I 
would just test to index X entries of your data and then start to 
search in the index with lower JVM settings each round and when 
response times get too slow or you hit OOE then you get a rough estimate

of the bare minimum X RAM needed for Y entries.
I

Re: scaling / sharding questions

2008-06-18 Thread Yonik Seeley
On Wed, Jun 18, 2008 at 5:53 PM, Phillip Farber <[EMAIL PROTECTED]> wrote:
> Does this mean that the Lucene scoring algorithm is computed without the idf
> factor, i.e. we just get term frequency scoring?

No, it means that the idf calculation is done locally on a single shard.
With a big index that is randomly mixed, this should not have a
practical impact.

> 2) Doesn't support consistency between stages, e.g. a shard index can be
> changed between STAGE_EXECUTE_QUERY and STAGE_GET_FIELDS
>
> What does this mean or where can I find out what it means?

STAGE_EXECUTE_QUERY finds the ids of matching documents.
STAGE_GET_FIELDS retrieves the fields of matching documents.

A change to a document could possibly happen inbetween, and one would
end up retrieving a document that no longer matched the query.  In
practice, this is rarely an issue.

-Yonik


Re: Slight issue with classloading and DataImportHandler

2008-06-18 Thread Jon Baer
Thanks.  Yeah took me a while to figure out I needed to do something  
like transformer="com.mycompany.solr.MyTransformer" on the entity  
before it would work ...


- Jon

On Jun 18, 2008, at 1:51 PM, Brendan Grainger wrote:


Hi,

I set up the new DataimportHandler last night to replace some custom  
import code I'd written and so far I'm loving it thank you.


I had one issue you might want to know about it. I have some solr  
extensions I've written and packaged in a jar which I place in:


solr-home/lib

as per:

http://wiki.apache.org/solr/SolrPlugins#head-59e2685df65335e82f8936ed55d260842dc7a4dc

This works well for my handlers but a custom Transformer I wrote and  
packaged the same way was throwing a ClassNotFoundException. I  
tracked it down to the DocBuilder.loadClass method which was just  
doing a Class.forName. Anyway, I fixed it for the moment by probably  
do something stupid and creating a SolrResourceLoader (which I  
imagine could be an instance variable, but at 3am I just wanted to  
get it working). Anyway, this fixes the problem:


 @SuppressWarnings("unchecked")
 static Class loadClass(String name) throws ClassNotFoundException {
   SolrResourceLoader loader = new SolrResourceLoader( null );
   return loader.findClass(name);
// return Class.forName(name);
 }

Brendan




Seeking suggestions - keyword related site promotion

2008-06-18 Thread JLIST
Hi all,

This is what I'm trying to do: since some sources (say,
some web sites) are more authoritative than other sources
on certain subjects, I'd like to promote those sites when
the query contains certain keywords. I'm not sure what
is the best way to implement this. I suppose I can index
the keywords in a field for all pages from that site but
this isn't very efficient, and any changes in the keyword
list would require re-indexing all pages of that site.
I wonder if there is a more efficient way that can dynamically
promote sites from a domain that is considered more related
to the queries. Any suggestion is welcome.

Thanks,
Jack



Re: Slight issue with classloading and DataImportHandler

2008-06-18 Thread Noble Paul നോബിള്‍ नोब्ळ्
Hi,
DIH does not load classes using the SolrResourceLoader. It tries
Class.forName() with the name you provide; if that fails, it prepends
"org.apache.solr.handler.dataimport." and retries.

This is true not just for transformers but also for EntityProcessor,
DataSource and Evaluator.

The reason for doing so is that we do not use any of the 'solr.'
packages in DIH. All our implementations fall into the default package
and we can directly use them w/o the package name.

So, if you are writing your own implementations, use the default
package or provide the fully qualified class name.

--Noble

On Thu, Jun 19, 2008 at 8:09 AM, Jon Baer <[EMAIL PROTECTED]> wrote:
> Thanks.  Yeah took me a while to figure out I needed to do something like
> transformer="com.mycompany.solr.MyTransformer" on the entity before it would
> work ...
>
> - Jon
>
> On Jun 18, 2008, at 1:51 PM, Brendan Grainger wrote:
>
>> Hi,
>>
>> I set up the new DataimportHandler last night to replace some custom
>> import code I'd written and so far I'm loving it thank you.
>>
>> I had one issue you might want to know about it. I have some solr
>> extensions I've written and packaged in a jar which I place in:
>>
>> solr-home/lib
>>
>> as per:
>>
>>
>> http://wiki.apache.org/solr/SolrPlugins#head-59e2685df65335e82f8936ed55d260842dc7a4dc
>>
>> This works well for my handlers but a custom Transformer I wrote and
>> packaged the same way was throwing a ClassNotFoundException. I tracked it
>> down to the DocBuilder.loadClass method which was just doing a
>> Class.forName. Anyway, I fixed it for the moment by probably do something
>> stupid and creating a SolrResourceLoader (which I imagine could be an
>> instance variable, but at 3am I just wanted to get it working). Anyway, this
>> fixes the problem:
>>
>>  @SuppressWarnings("unchecked")
>>  static Class loadClass(String name) throws ClassNotFoundException {
>>   SolrResourceLoader loader = new SolrResourceLoader( null );
>>   return loader.findClass(name);
>>// return Class.forName(name);
>>  }
>>
>> Brendan
>
>



-- 
--Noble Paul


Re: Re[2]: Feature idea - delete and commit from web interface ?

2008-06-18 Thread Noble Paul നോബിള്‍ नोब्ळ्
The implementation may provide a form where the user can
type in a doc id to delete, or a lucene query.

If it is a POST, so be it.
But let us have the functionality.

--Noble

On Thu, Jun 19, 2008 at 2:40 AM, Craig McClanahan <[EMAIL PROTECTED]> wrote:
> On Wed, Jun 18, 2008 at 1:55 PM, JLIST <[EMAIL PROTECTED]> wrote:
>>
>> Sounds like web designer's fault. No permission check and no
>> confirmation for deletion?
>>
>
> Nope ... application designer's fault for misusing the web.  Allowing
> deletes on a GET violates HTTP/1.1 requirements (not just RESTful
> ones) that GET requests not have side effects, so an app that works
> that way is going to mess up when HTTP caching is in use ... as lots
> of people found to their chagrin when they installed Google Desktop's
> caching capabilities, and the cache played by the standard HTTP rules
> (GETs are supposed to be idempotent, having no side effects, so it's
> just fine to issue the same GET as many times as desired.
>
> If you want an easy way to do deletes from a browser, just set up a
> little form that does a POST and includes the id of the document you
> want to delete.  Then you're playing by the rules, and won't make a
> fool of yourself when crawlers or caches interact with your
> application.
>
> Craig McClanahan
>
>>> Never, never delete with a GET. The Ultraseek spider deleted 20K
>>> docments on an intranet once because they gave it admin perms and
>>> it followed the "delete this page" link on every page.
>>
>>
>>
>



-- 
--Noble Paul


Re: Slight issue with classloading and DataImportHandler

2008-06-18 Thread Brendan Grainger

Hi,

I am actually providing the fully qualified classname in the  
configuration and I was still getting a ClassNotFoundException. If you  
look at the code in SolrResourceLoader they actually explicitly add  
the jars in solr-home/lib to the classloader:


static ClassLoader createClassLoader(File f, ClassLoader loader) {
  if( loader == null ) {
    loader = Thread.currentThread().getContextClassLoader();
  }
  if (f.canRead() && f.isDirectory()) {
    File[] jarFiles = f.listFiles();
    URL[] jars = new URL[jarFiles.length];
    try {
      for (int j = 0; j < jarFiles.length; j++) {
        jars[j] = jarFiles[j].toURI().toURL();
        log.info("Adding '" + jars[j].toString() + "' to Solr classloader");
      }
      return URLClassLoader.newInstance(jars, loader);
    } catch (MalformedURLException e) {
      SolrException.log(log, "Can't construct solr lib class loader", e);
    }
  }
  log.info("Reusing parent classloader");
  return loader;
}


This seems to me to be why my class is now found when I include my
utilities jar in solr-home/lib.


Thanks
Brendan

On Jun 18, 2008, at 11:49 PM, Noble Paul നോബിള്‍  
नोब्ळ् wrote:



hi,
DIH does not load class using the SolrResourceLoader. It tries a
Class.forName() with the name you provide if it fails it prepends
"org.apache.solr.handler.dataimport." and retries.

This is true for not just transformers but also for Entityprocessor,
DataSource and Evaluator

The reason for doing so is that we do not use any of the 'solr.'
packages in DIH. All our implementations fall into the default package
and we can directly use them w/o the package name.

So , if you are writing your own implementations use the default
package or provide the fully qualified class name.

--Noble

On Thu, Jun 19, 2008 at 8:09 AM, Jon Baer <[EMAIL PROTECTED]> wrote:
Thanks.  Yeah took me a while to figure out I needed to do  
something like
transformer="com.mycompany.solr.MyTransformer" on the entity before  
it would

work ...

- Jon

On Jun 18, 2008, at 1:51 PM, Brendan Grainger wrote:


Hi,

I set up the new DataimportHandler last night to replace some custom
import code I'd written and so far I'm loving it thank you.

I had one issue you might want to know about it. I have some solr
extensions I've written and packaged in a jar which I place in:

solr-home/lib

as per:


http://wiki.apache.org/solr/SolrPlugins#head-59e2685df65335e82f8936ed55d260842dc7a4dc

This works well for my handlers but a custom Transformer I wrote and
packaged the same way was throwing a ClassNotFoundException. I  
tracked it

down to the DocBuilder.loadClass method which was just doing a
Class.forName. Anyway, I fixed it for the moment by probably do  
something
stupid and creating a SolrResourceLoader (which I imagine could be  
an
instance variable, but at 3am I just wanted to get it working).  
Anyway, this

fixes the problem:

@SuppressWarnings("unchecked")
static Class loadClass(String name) throws ClassNotFoundException {
 SolrResourceLoader loader = new SolrResourceLoader( null );
 return loader.findClass(name);
  // return Class.forName(name);
}

Brendan







--
--Noble Paul




Re: Slight issue with classloading and DataImportHandler

2008-06-18 Thread Noble Paul നോബിള്‍ नोब्ळ्
Aah! We always assumed that people put custom jars in the
WEB-INF/lib folder of the solr webapp, and hence they are automatically on
the classpath. We shall make the necessary changes.
--Noble

On Thu, Jun 19, 2008 at 10:06 AM, Brendan Grainger
<[EMAIL PROTECTED]> wrote:
> Hi,
>
> I am actually providing the fully qualified classname in the configuration
> and I was still getting a ClassNotFoundException. If you look at the code in
> SolrResourceLoader they actually explicitly add the jars in solr-home/lib to
> the classloader:
>
> static ClassLoader createClassLoader(File f, ClassLoader loader) {
>if( loader == null ) {
>  loader = Thread.currentThread().getContextClassLoader();
>}
>if (f.canRead() && f.isDirectory()) {
>  File[] jarFiles = f.listFiles();
>  URL[] jars = new URL[jarFiles.length];
>  try {
>for (int j = 0; j < jarFiles.length; j++) {
>  jars[j] = jarFiles[j].toURI().toURL();
>  log.info("Adding '" + jars[j].toString() + "' to Solr
> classloader");
>}
>return URLClassLoader.newInstance(jars, loader);
>  } catch (MalformedURLException e) {
>SolrException.log(log,"Can't construct solr lib class loader", e);
>  }
>}
>log.info("Reusing parent classloader");
>return loader;
>  }
>
>
> This seems to be me to be why my class is now found when I include my
> utilities jar in solr-home/lib.
>
> Thanks
> Brendan
>
> On Jun 18, 2008, at 11:49 PM, Noble Paul നോബിള്‍ नोब्ळ् wrote:
>
>> hi,
>> DIH does not load class using the SolrResourceLoader. It tries a
>> Class.forName() with the name you provide if it fails it prepends
>> "org.apache.solr.handler.dataimport." and retries.
>>
>> This is true for not just transformers but also for Entityprocessor,
>> DataSource and Evaluator
>>
>> The reason for doing so is that we do not use any of the 'solr.'
>> packages in DIH. All our implementations fall into the default package
>> and we can directly use them w/o the package name.
>>
>> So , if you are writing your own implementations use the default
>> package or provide the fully qualified class name.
>>
>> --Noble
>>
>> On Thu, Jun 19, 2008 at 8:09 AM, Jon Baer <[EMAIL PROTECTED]> wrote:
>>>
>>> Thanks.  Yeah took me a while to figure out I needed to do something like
>>> transformer="com.mycompany.solr.MyTransformer" on the entity before it
>>> would
>>> work ...
>>>
>>> - Jon
>>>
>>> On Jun 18, 2008, at 1:51 PM, Brendan Grainger wrote:
>>>
 Hi,

 I set up the new DataimportHandler last night to replace some custom
 import code I'd written and so far I'm loving it thank you.

 I had one issue you might want to know about it. I have some solr
 extensions I've written and packaged in a jar which I place in:

 solr-home/lib

 as per:



 http://wiki.apache.org/solr/SolrPlugins#head-59e2685df65335e82f8936ed55d260842dc7a4dc

 This works well for my handlers but a custom Transformer I wrote and
 packaged the same way was throwing a ClassNotFoundException. I tracked
 it
 down to the DocBuilder.loadClass method which was just doing a
 Class.forName. Anyway, I fixed it for the moment by probably do
 something
 stupid and creating a SolrResourceLoader (which I imagine could be an
 instance variable, but at 3am I just wanted to get it working). Anyway,
 this
 fixes the problem:

 @SuppressWarnings("unchecked")
 static Class loadClass(String name) throws ClassNotFoundException {
  SolrResourceLoader loader = new SolrResourceLoader( null );
  return loader.findClass(name);
  // return Class.forName(name);
 }

 Brendan
>>>
>>>
>>
>>
>>
>> --
>> --Noble Paul
>
>



-- 
--Noble Paul


Re: Slight issue with classloading and DataImportHandler

2008-06-18 Thread Brendan Grainger

Awesome thank you very much and thanks for the very useful contribution.

Brendan

On Jun 19, 2008, at 12:52 AM, Noble Paul നോബിള്‍  
नोब्ळ् wrote:



aah!. We always assumed that people put the custom jars in the
WEB-INF/lib folder of solr webapp and hence they are automatically in
the classpath we shall make the necessary changes  .
--Noble

On Thu, Jun 19, 2008 at 10:06 AM, Brendan Grainger
<[EMAIL PROTECTED]> wrote:

Hi,

I am actually providing the fully qualified classname in the  
configuration
and I was still getting a ClassNotFoundException. If you look at  
the code in
SolrResourceLoader they actually explicitly add the jars in solr- 
home/lib to

the classloader:

static ClassLoader createClassLoader(File f, ClassLoader loader) {
  if( loader == null ) {
loader = Thread.currentThread().getContextClassLoader();
  }
  if (f.canRead() && f.isDirectory()) {
File[] jarFiles = f.listFiles();
URL[] jars = new URL[jarFiles.length];
try {
  for (int j = 0; j < jarFiles.length; j++) {
jars[j] = jarFiles[j].toURI().toURL();
log.info("Adding '" + jars[j].toString() + "' to Solr
classloader");
  }
  return URLClassLoader.newInstance(jars, loader);
} catch (MalformedURLException e) {
  SolrException.log(log,"Can't construct solr lib class  
loader", e);

}
  }
  log.info("Reusing parent classloader");
  return loader;
}


This seems to be me to be why my class is now found when I include my
utilities jar in solr-home/lib.

Thanks
Brendan

On Jun 18, 2008, at 11:49 PM, Noble Paul നോബിള്‍  
नोब्ळ् wrote:



hi,
DIH does not load class using the SolrResourceLoader. It tries a
Class.forName() with the name you provide if it fails it prepends
"org.apache.solr.handler.dataimport." and retries.

This is true for not just transformers but also for Entityprocessor,
DataSource and Evaluator

The reason for doing so is that we do not use any of the 'solr.'
packages in DIH. All our implementations fall into the default  
package

and we can directly use them w/o the package name.

So , if you are writing your own implementations use the default
package or provide the fully qualified class name.

--Noble

On Thu, Jun 19, 2008 at 8:09 AM, Jon Baer <[EMAIL PROTECTED]> wrote:


Thanks.  Yeah took me a while to figure out I needed to do  
something like
transformer="com.mycompany.solr.MyTransformer" on the entity  
before it

would
work ...

- Jon

On Jun 18, 2008, at 1:51 PM, Brendan Grainger wrote:


Hi,

I set up the new DataimportHandler last night to replace some  
custom

import code I'd written and so far I'm loving it thank you.

I had one issue you might want to know about it. I have some solr
extensions I've written and packaged in a jar which I place in:

solr-home/lib

as per:



http://wiki.apache.org/solr/SolrPlugins#head-59e2685df65335e82f8936ed55d260842dc7a4dc

This works well for my handlers but a custom Transformer I wrote  
and
packaged the same way was throwing a ClassNotFoundException. I  
tracked

it
down to the DocBuilder.loadClass method which was just doing a
Class.forName. Anyway, I fixed it for the moment by probably do
something
stupid and creating a SolrResourceLoader (which I imagine could  
be an
instance variable, but at 3am I just wanted to get it working).  
Anyway,

this
fixes the problem:

@SuppressWarnings("unchecked")
static Class loadClass(String name) throws  
ClassNotFoundException {

SolrResourceLoader loader = new SolrResourceLoader( null );
return loader.findClass(name);
// return Class.forName(name);
}

Brendan







--
--Noble Paul







--
--Noble Paul




Re: Slight issue with classloading and DataImportHandler

2008-06-18 Thread Chris Hostetter

: aah!. We always assumed that people put the custom jars in the
: WEB-INF/lib folder of solr webapp and hence they are automatically in
: the classpath we shall make the necessary changes  .

It would be better to use the classloader from the SolrResourceLoader ... 
that should be safe for anyone with any setup. 

>> DIH does not load class using the SolrResourceLoader. It tries a
>> Class.forName() with the name you provide if it fails it prepends
>> "org.apache.solr.handler.dataimport." and retries.
...
>> The reason for doing so is that we do not use any of the 'solr.'
>> packages in DIH. All our implementations fall into the default package
>> and we can directly use them w/o the package name.

FWIW: there isn't really a "solr." package ... "solr." can be used as 
a short-form alias for the "likely package" when Solr resolves classes, 
where the "likely package" varies by context and there can be multiple 
options that it tries in order.

DIH could do the same thing, letting short form "solr." signify that
Transformers, Evaluators, etc are in the o.a.s.handler.dataimport package.

the advantage of this over what it sounds like DIH currently does is that 
if there is an o.a.s.handler.dataimport.WizWatTransformer but someone 
wants to write their own (packageless) WizWatTransformer they can, and 
refer to it simply as "WizWatTransformer" (whereas to use the one that 
ships with DIH they would specify "solr.WizWatTransformer").  There's no 
ambiguity as to which one someone means unless they create a package 
called "solr" ... but then they'd just be looking for trouble :)



-Hoss



Re: Seeking suggestions - keyword related site promotion

2008-06-18 Thread Stephen Weiss
Is there a fixed set of keywords?  If so, I suppose you could simply  
index these keywords into a field for each site (either through some  
kind of automatic parser or manually - from personal experience I  
would recommend manually unless you have tens of thousands of these  
things), and then search that field with each word in the query (with  
OR).  Any site that had one of these keywords would match if it  
were used in the query...
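A hedged sketch of that query shape; the field names, the boost factor, and
host/port are all made up for illustration:

  # OR the user's terms against the per-site keyword field, boosted so
  # authoritative sites score higher without excluding other matches
  curl 'http://localhost:8983/solr/select?q=text:(solar+power)+OR+site_keywords:(solar+power)^10'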


If there is no list here and you're just indexing all the content of  
all these sites... isn't that what Nutch is designed for?


--
Steve

On Jun 18, 2008, at 11:05 PM, JLIST wrote:


Hi all,

This is what I'm trying to do: since some sources (say,
some web sites) are more authoritative than other sources
on certain subjects, I'd like to promote those sites when
the query contains certain keywords. I'm not sure what
is the best way to implement this. I suppose I can index
the keywords in a field for all pages from that site but
this isn't very efficient, and any changes in the keyword
list would require re-indexing all pages of that site.
I wonder if there is a more efficient way that can dynamically
promote sites from a domain that is considered more related
to the queries. Any suggestion is welcome.

Thanks,
Jack





Re: Slight issue with classloading and DataImportHandler

2008-06-18 Thread Noble Paul നോബിള്‍ नोब्ळ्
We plan to use SolrResourceLoader (in the next patch). That is the
best way to go.

But we still prefer the usage of DIH package classes without any prefix:
type="HttpDataSource"
instead of
type="solr.HttpDataSource"

But users must be able to load their classes using the "solr." format.
--Noble


On Thu, Jun 19, 2008 at 10:57 AM, Chris Hostetter
<[EMAIL PROTECTED]> wrote:
>
> : aah!. We always assumed that people put the custom jars in the
> : WEB-INF/lib folder of solr webapp and hence they are automatically in
> : the classpath we shall make the necessary changes  .
>
> It would be better to use the classloader from the SolrResourceLoader ...
> that should be safe for anyone with any setup.
>
>>> DIH does not load class using the SolrResourceLoader. It tries a
>>> Class.forName() with the name you provide if it fails it prepends
>>> "org.apache.solr.handler.dataimport." and retries.
>...
>>> The reason for doing so is that we do not use any of the 'solr.'
>>> packages in DIH. All our implementations fall into the default package
>>> and we can directly use them w/o the package name.
>
> FWIW: there isn't relaly a "solr." package ... "solr." can be used as
> an short form alias for the "likely package" when Solr resolves classes,
> where the "likely package" varies by context and there can be multiple
> options that it tries in order
>
> DIH could do the same thing, letting short form "solr." signify that
> Transformers, Evaluators, etc are in the o.a.s.handler.dataimport package.
>
> the advantage of this over what it sounds like DIH currently does is that
> if there is an o.a.s.handler.dataimport.WizWatTransformer but someone
> wants to write their own (package less) WizWatTransformer they can and
> refer to it simply as "WizWatTransformer" (whereas to use the one that
> ships with DIH they would specify "solr.WizWatTransformer").  There's no
> ambiguity as to which one someone means unless they create a package
> called "solr" ... but then they'ed just be looking for trouble :)
>
>
>
> -Hoss
>
>



-- 
--Noble Paul