Re: i wanna find one crawl that can crawl with defined urls and defined data

2007-04-30 Thread Graeme Merrall

I want to crawl http://www.amazone.com/ and extract just the product title,
product information, writer, and publisher.

All other data I want to ignore.


How about 
http://blog.foofactory.fi/2007/02/online-indexing-integrating-nutch-with.html

Or, if you're prepared to wait or help out, there's
http://svn.apache.org/repos/asf/labs/droids/README.TXT


numFound for facet results

2007-04-30 Thread mirko
Hi,

could you tell me what is the (simplest|elegant|fast) way of implementing
the following:

I use faceted browsing, but I limit the number of facet counts to 5 (i.e.,
facet.limit=5).

1. I would like to be able to show whether there are more facet values
(this can be achieved with the trick of asking for 6 values and displaying only 5;
if the 6th is non-empty, obviously there are more than 5 :)

2. I would like to be able to tell how many facet values there are in
total. (This would be a value like numFound for the results.)
Is there such a thing, or a workaround like the one for 1?

thanks,
mirko


Re: numFound for facet results

2007-04-30 Thread Yonik Seeley

On 4/30/07, [EMAIL PROTECTED] <[EMAIL PROTECTED]> wrote:

could you tell me what is the (simplest|elegant|fast) way of implementing
the following:

I use faceted browsing, but I limit the number of facet counts to 5 (i.e.,
facet.limit=5).

1. I would like to be able to show whether there are more facet values
(this can be achieved with the trick of asking for 6 values and displaying only 5;
if the 6th is non-empty, obviously there are more than 5 :)


That's a decent workaround.
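The limit+1 workaround can be sketched in a few lines, assuming the facet counts have already been parsed into an ordered list of (value, count) pairs fetched with facet.limit set to one more than the number displayed. The helper name and the sample data are hypothetical; the extra entry is only treated as "more" if its count is non-zero, matching the "if the 6th is non-empty" condition.

```python
from typing import List, Tuple

# (facet value, count) pairs in the order Solr returned them.
FacetCounts = List[Tuple[str, int]]


def split_facets(counts: FacetCounts, display_limit: int) -> Tuple[FacetCounts, bool]:
    """Apply the limit+1 trick to facets fetched with facet.limit=display_limit + 1.

    Returns the entries to display and whether more values exist beyond them.
    """
    shown = counts[:display_limit]
    extra = counts[display_limit:]
    # The extra entry only signals "more" if it actually has hits.
    has_more = any(count > 0 for _, count in extra)
    return shown, has_more


# Hypothetical response for facet.limit=6 when 5 values are displayed.
response = [("a", 40), ("b", 31), ("c", 12), ("d", 9), ("e", 4), ("f", 2)]
shown, has_more = split_facets(response, 5)
print(len(shown), has_more)  # prints: 5 True
```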


2. I would like to be able to tell how many facet values there are in
total. (This would be a value like numFound for the results.)
Is there such a thing, or a workaround like the one for 1?


Number of facet values in the field (independent of the query), or
number of non-zero facet counts for the particular query?
The former will be relatively easy; the latter can't really be done
that efficiently.
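For the second case (non-zero counts for a particular query), the brute-force route is to ask for every facet value and count them on the client. This is a sketch under stated assumptions: it assumes the Solr version in use supports facet.mincount, "ipod" and "cat" are placeholder query and field names, and it is exactly the inefficient path described above, since the full list still has to be computed and sent over the wire.

```python
from typing import List, Tuple
from urllib.parse import urlencode

FacetCounts = List[Tuple[str, int]]


def all_facet_values_params(query: str, field: str) -> str:
    """Query string asking Solr for every non-zero facet value of `field`."""
    params = [
        ("q", query),
        ("rows", "0"),            # only the facet counts are needed
        ("facet", "true"),
        ("facet.field", field),
        ("facet.limit", "-1"),    # no limit: return all values
        ("facet.mincount", "1"),  # drop values with zero hits
    ]
    return urlencode(params)


def summarize_facets(counts: FacetCounts, display_limit: int) -> Tuple[FacetCounts, int]:
    """Return the first `display_limit` non-zero values and how many there are in total."""
    nonzero = [(value, count) for value, count in counts if count > 0]
    return nonzero[:display_limit], len(nonzero)


print(all_facet_values_params("ipod", "cat"))
```

The total returned by summarize_facets plays the role of a numFound for facet values; the zero-count filter is a guard in case zero counts are returned anyway.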

-Yonik


Re: sorting by matched field, then title alpha

2007-04-30 Thread Simon Kahl


You can approximate it by doing something like:
A:"phrase"^10 B:"phrase"^1 C:"phrase"^1000 D:"phrase"^100
E:"phrase"^30 
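The boosted query suggested above can be generated from a field-to-boost map. A minimal sketch: the field names A to E and the boost values come from the suggestion, while the helper name is hypothetical; backslashes and double quotes in the phrase are escaped so it stays a single Lucene phrase.

```python
from typing import Mapping


def boosted_phrase_query(phrase: str, boosts: Mapping[str, float]) -> str:
    """Build a disjunction of per-field phrase queries, each with its own boost."""
    # Escape backslashes and double quotes so the phrase stays one quoted term.
    escaped = phrase.replace("\\", "\\\\").replace('"', '\\"')
    return " ".join(f'{field}:"{escaped}"^{boost}' for field, boost in boosts.items())


print(boosted_phrase_query("phrase", {"A": 10, "B": 1, "C": 1000, "D": 100, "E": 30}))
# prints: A:"phrase"^10 B:"phrase"^1 C:"phrase"^1000 D:"phrase"^100 E:"phrase"^30
```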



Thanks for the suggestion, Mike. I tried boosting like this, but all docs get
slightly different scores (because of tf, idf, etc.), so the secondary sort on
field X has no effect. I'm now thinking of trying a custom
SortComparatorSource implementation (based on DistanceComparatorSource in Lucene
in Action) that uses fixed values corresponding to matches in field A, B, C, etc.
I will then use that as the primary sort, followed by the secondary sort on field
X.
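The ordering Simon describes (a fixed value per matched field as the primary key, then title alphabetically) can be illustrated outside Lucene. This sketch shows only the ordering logic, not Lucene's SortComparatorSource API; the field priorities are hypothetical, and a case-insensitive substring check stands in for "the query matched this field".

```python
from dataclasses import dataclass
from typing import Dict, List

# Hypothetical field priorities: a match in a lower-ranked field sorts first.
FIELD_RANK: Dict[str, int] = {"A": 0, "B": 1, "C": 2}
NO_MATCH = len(FIELD_RANK)  # documents matching none of these fields go last


@dataclass
class Doc:
    title: str
    fields: Dict[str, str]


def match_rank(doc: Doc, phrase: str) -> int:
    """Fixed value for the best-priority field containing the phrase (not a score)."""
    ranks = [
        FIELD_RANK[name]
        for name, text in doc.fields.items()
        if name in FIELD_RANK and phrase.lower() in text.lower()
    ]
    return min(ranks, default=NO_MATCH)


def sort_docs(docs: List[Doc], phrase: str) -> List[Doc]:
    """Primary key: matched-field rank; secondary key: title, alphabetically."""
    return sorted(docs, key=lambda d: (match_rank(d, phrase), d.title.lower()))


docs = [
    Doc("Zebra", {"A": "a phrase here"}),
    Doc("apple", {"B": "phrase"}),
    Doc("Mango", {"A": "phrase"}),
    Doc("Kiwi", {"C": "nothing relevant"}),
]
print([d.title for d in sort_docs(docs, "phrase")])
# prints: ['Mango', 'Zebra', 'apple', 'Kiwi']
```

Because the primary key is a small fixed integer rather than a score, tf/idf differences cannot break ties, so the title sort takes effect within each group.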

I think I will have to modify o.a.s.s.QueryParsing.parseSort to hook in the custom
sort. Is there a better way?

Kind Regards,
Simon


Re: resin faile to start with solr.

2007-04-30 Thread Ken Krugler

2007/4/29, Ken Krugler <[EMAIL PROTECTED]>:



Now I tested the newest Solr (nothing modified).

I failed to start Solr with Resin 3.0.


1. Which exact version of Resin? Still 3.0.23?



3.0.23

2. Just to confirm, you uncommented out the lines in web.xml
mentioned previously?



Just the newest Solr's web.xml; I did not modify it.


Try uncommenting out the lines in the web.xml and see if that fixes 
your problem.


-- Ken



2007/4/28, James liu <[EMAIL PROTECTED]>:


Yes, I tried and it failed.

This afternoon I will re-download Solr and test it.

2007/4/28, Bill Au <[EMAIL PROTECTED]>:


  Have you tried using the schema.xml that is in example/solr/conf? If that
  works, then the problem is definitely in your schema.xml.

  Bill

  On 4/26/07, James liu < [EMAIL PROTECTED]> wrote:
  >
  > But it is OK when I use Tomcat.
  >
  > 2007/4/26, Ken Krugler <[EMAIL PROTECTED]>:
  > >
  > > >3.0.23. Yesterday I tried it and it failed.
  > > >
  > > >Which version do you use? I just don't use the Pro version.
  > >
  > > From the error below, either your schema.xml file is messed up, or it
  > > might be that you still need to uncomment out the lines at the
  > > beginning of the web.xml file.
  > >
  > > (These are the ones that say "Uncomment if you are trying to use a
  > > Resin version before 3.0.19".) Even though you're using a later
  > > version of Resin, I've had lots of issues with their XML parsing.
  > >
  > > -- Ken
  > >
  > >
  > >
  > > >
  > > >2007/4/26, Bill Au <[EMAIL PROTECTED]>:
  > > >>
  > > >>Have you tried Resin 3.0.x?  3.1 is a development branch, so it is less
  > > >>stable than 3.0.
  > > >>
  > > >>Bill
  > > >>
  > > >>On 4/19/07, James liu <[EMAIL PROTECTED] > wrote:
  > > >>>
  > > >>>  It works well when I use Tomcat with Solr.
  > > >>>
  > > >>>  Now I want to test Resin; I use resin-3.1.0.
  > > >>>
  > > >>>  Now it shows me:
  > > >>>
  > > >>>  [03:47:34.047] WebApp[http://localhost:8080] starting
  > > >>>  [03:47:34.691] WebApp[http://localhost:8080/resin-doc] starting
  > > >>>  [03:47:34.927] WebApp[http://localhost:8080/solr1] starting
  > > >>>  [03:47:35.051] SolrServlet.init()
  > > >>>  [03:47:35.077] Solr home set to '/usr/solrapp/solr1/'
  > > >>>  [03:47:35.077] user.dir=/tmp/resin-3.1.0/bin
  > > >>>  [03:47:35.231] Loaded SolrConfig: solrconfig.xml
  > > >>>  [03:47:35.522] adding requestHandler standard=solr.StandardRequestHandler
  > > >>>  [03:47:35.621] adding requestHandler dismax=solr.DisMaxRequestHandler
  > > >>>  [03:47:35.692] adding requestHandler partitioned=solr.DisMaxRequestHandler
  > > >>>  [03:47:35.721] adding requestHandler instock=solr.DisMaxRequestHandler
  > > >>>  [03:47:35.819] Opening new SolrCore at /usr/solrapp/solr1/, dataDir=/usr/solrapp/solr1/data
  > > >>>  [03:47:35.884] Reading Solr Schema
  > > >>>  [03:47:35.916] Schema name=example
  > > >>>  [03:47:35.929] org.apache.solr.core.SolrException: Schema Parsing Failed
  > > >>>  [03:47:35.929]  at org.apache.solr.schema.IndexSchema.readConfig(IndexSchema.java:441)
  > > >>>  [03:47:35.929]  at org.apache.solr.schema.IndexSchema.(IndexSchema.java:69)
  > > >>>  [03:47:35.929]  at org.apache.solr.core.SolrCore.(SolrCore.java:191)
  > > >>>
  > > >>>
  > > >>>
  > > >>>  --
  > > >>>  regards
  > > >>  > jl
  > >
  > > --
  > > Ken Krugler
  > > Krugle, Inc.
  > > +1 530-210-6378
  > > "Find Code, Find Answers"
  > >
  >
  >
  >
  > --
  > regards
  > jl
  >







RE: EmbeddedSolr class from Wiki

2007-04-30 Thread Chris Hostetter

: :you could even have the postCommit hook of your writer trigger a commit
: :call on your readers so they reopen the newly updated index.
:
: Thanks, I need "separate JVMs" so "writer triggers a commit call on readers"
: is slightly unclear... I want to use separate applications, webmodule with
: reader, and standalone writer (it could be webmodule too, but with different
: JEE context; similar to separate JVMs).

The postCommit and postOptimize hooks can be any implementation of
SolrEventListener, so you can trigger arbitrary Java code if you want to write
your own (use JMS, make an HTTP call, whatever).

The RunExecutableListener that ships with Solr would be the easiest option:
just have it execute the "commit" command-line script on your slave (which
will make it reopen the index you just modified).
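The "make an HTTP call" option, or the executable that RunExecutableListener runs, could be as small as the sketch below. It assumes the reader (slave) Solr exposes its update handler at a hypothetical http://reader-host:8983/solr/update and that posting a bare <commit/> there makes it reopen the index, as described above; the function names are made up for illustration.

```python
import urllib.request

# Hypothetical address of the reader (slave) Solr update handler.
READER_UPDATE_URL = "http://reader-host:8983/solr/update"


def build_commit_request(update_url: str = READER_UPDATE_URL) -> urllib.request.Request:
    """Build an HTTP POST carrying a bare <commit/> message."""
    return urllib.request.Request(
        update_url,
        data=b"<commit/>",
        headers={"Content-Type": "text/xml; charset=utf-8"},
        method="POST",
    )


def send_commit(update_url: str = READER_UPDATE_URL, timeout: float = 30.0) -> int:
    """Send the commit to the reader and return the HTTP status code."""
    with urllib.request.urlopen(build_commit_request(update_url), timeout=timeout) as resp:
        return resp.status
```

A script launched by the writer's postCommit hook would just call send_commit() with the reader's real address; keeping request construction separate makes it easy to check what is sent without a live server.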



-Hoss



Re: resin faile to start with solr.

2007-04-30 Thread Chris Hostetter

: >>1. Which exact version of Resin? Still 3.0.23?

: >2. Just to confirm, you uncommented out the lines in web.xml
: >>mentioned previously?

: Try uncommenting out the lines in the web.xml and see if that fixes
: your problem.

Ken: I'm not very familiar with the problem you are describing; would you
mind adding a short section about it to the wiki?

http://wiki.apache.org/solr/SolrResin



-Hoss



Re: sorting by matched field, then title alpha

2007-04-30 Thread Chris Hostetter

: Think I will have to modify o.a.s.s.QueryParsing.parseSort to hook in custom
: sort.  Is there any better way?

If you write a custom SortComparatorSource, then the easiest way to use it
would probably be to write your own subclass of TextField and override the
getSortField method to construct a SortField that uses it.



-Hoss



Re: resin faile to start with solr.

2007-04-30 Thread Ryan McKinley

Chris Hostetter wrote:

: >>1. Which exact version of Resin? Still 3.0.23?

: >2. Just to confirm, you uncommented out the lines in web.xml
: >>mentioned previously?

: Try uncommenting out the lines in the web.xml and see if that fixes
: your problem.

Ken: I'm not very familiar with the problem you are describing; would you
mind adding a short section about it to the wiki?

http://wiki.apache.org/solr/SolrResin




If you are running the trunk version, Resin should start fine without any
changes.

Solr 1.1 had XML parsing issues (even with Resin versions after 3.0.19):
https://issues.apache.org/jira/browse/SOLR-92

Otherwise, uncomment the "resin 3.0.19" block in web.xml:


  






Re: numFound for facet results

2007-04-30 Thread Erik Hatcher


On Apr 30, 2007, at 11:16 AM, Yonik Seeley wrote:


On 4/30/07, [EMAIL PROTECTED] <[EMAIL PROTECTED]> wrote:


2. I would like to be able to tell how many facet values there are in
total. (This would be a value like numFound for the results.)
Is there such a thing, or a workaround like the one for 1?


Number of facet values in the field (independent of the query), or
number of non-zero facet counts for the particular query?
The former will be relatively easy; the latter can't really be done
that efficiently.


I'm sure the need is the latter. At least, that would be helpful for me.


Even if faceting still computed all the facet values but limited the response
to a given number and provided a total count of facet values (given the current
constraints), it would at least reduce the communication over the wire. I think
that would make a big difference in performance for one of my applications,
where we have an unusually large number of facet values.


Erik



Delete from Solr index...

2007-04-30 Thread escher2k

I am trying to remove documents from my index using "delete by query".
However, when I did this, the deleted items seemed to remain. This is the
format of the XML file I am using:

load_id:20070424150841
load_id:20070425145301
load_id:20070426145301
load_id:20070427145302
load_id:20070428145301
load_id:20070429145301

When I do the deletes individually (i.e., create each of the above in a
separate file), it seems to work. Does this mean that each delete query
request has to be executed separately?

Thanks.

-- 
View this message in context: 
http://www.nabble.com/Delete-from-Solr-index...-tf3673529.html#a10264940
Sent from the Solr - User mailing list archive at Nabble.com.



Faceted count syntax (exclude zeros)...

2007-04-30 Thread escher2k

I am trying to execute a faceted count on a field called "load_id" and want
to exclude 0s. The URL below doesn't seem to be excluding zeros:
http://localhost:12002/solr/select/?qt=dismax&q=Y&qf=show_all_flag&fl=load_id&facet=true&facet.limit=-1&facet.field=load_id&facet.mincount=1&rows=0

Result (relevant part of XML):


   0
   0
   80
   81
   77
   62
   31061
  


Thanks.
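Whatever the reason facet.mincount is not taking effect here, zero counts can be dropped on the client as a fallback. This sketch assumes Solr's standard XML response layout (an lst named facet_counts containing an lst named facet_fields, with one lst per field holding int elements named after each value); the function name is hypothetical and the sample values and counts are made up.

```python
import xml.etree.ElementTree as ET
from typing import List, Tuple


def nonzero_facet_counts(xml_text: str, field: str) -> List[Tuple[str, int]]:
    """Return (value, count) pairs for `field`, dropping zero counts client-side."""
    root = ET.fromstring(xml_text)
    path = (
        "./lst[@name='facet_counts']/lst[@name='facet_fields']/"
        f"lst[@name='{field}']/int"
    )
    pairs = [(el.get("name"), int(el.text)) for el in root.findall(path)]
    return [(value, count) for value, count in pairs if count > 0]


# Hypothetical response fragment; the values and counts are made up.
SAMPLE = """<response>
  <lst name="facet_counts">
    <lst name="facet_fields">
      <lst name="load_id">
        <int name="v1">0</int>
        <int name="v2">5</int>
      </lst>
    </lst>
  </lst>
</response>"""

print(nonzero_facet_counts(SAMPLE, "load_id"))  # prints: [('v2', 5)]
```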



Re: Delete from Solr index...

2007-04-30 Thread Ryan McKinley

escher2k wrote:

I am trying to remove documents from my index using "delete by query".
However, when I did this, the deleted items seemed to remain. This is the
format of the XML file I am using:

load_id:20070424150841
load_id:20070425145301
load_id:20070426145301
load_id:20070427145302
load_id:20070428145301
load_id:20070429145301

When I do the deletes individually (i.e., create each of the above in a
separate file), it seems to work. Does this mean that each delete query
request has to be executed separately?



correct, delete (unlike ) only accepts one command.

Just to note, if "load_id" is your unique key, you could also use:
 20070424150841

This will give you better performance and does not commit the changes 
until you explicitly send 
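Since a delete accepts only one command, the six queries from the question can be turned into six separate delete messages, each posted on its own. A sketch, assuming the messages are sent to the usual update handler; the helper names are hypothetical. It also shows an alternative that folds the values into one Lucene OR query (field:(a OR b)), which should let a single delete cover them all.

```python
from typing import Iterable, List
from xml.sax.saxutils import escape

COMMIT = "<commit/>"


def delete_by_query_messages(queries: Iterable[str]) -> List[str]:
    """One <delete> message per query, since each delete accepts a single command."""
    return [f"<delete><query>{escape(q)}</query></delete>" for q in queries]


def combined_delete_message(field: str, values: Iterable[str]) -> str:
    """Alternative: fold the values into a single OR query and send one delete."""
    clause = " OR ".join(values)
    query = f"{field}:({clause})"
    return f"<delete><query>{escape(query)}</query></delete>"


ids = ["20070424150841", "20070425145301"]
for message in delete_by_query_messages(f"load_id:{i}" for i in ids):
    print(message)
print(combined_delete_message("load_id", ids))
print(COMMIT)
```

A range query such as load_id:[20070424150841 TO 20070429145301] could be passed the same way, assuming the load_id values all have the same length so that their lexicographic order matches numeric order. Sending an explicit <commit/> after the deletes is a safe default.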


Re: Delete from Solr index...

2007-04-30 Thread escher2k

Thanks Ryan. I need to use a query since I am deleting a range of documents.
From your comment, I wasn't sure whether an explicit commit is needed when
using delete by query. Does delete by query not need an explicit commit?

Thanks.


ryan mckinley wrote:
> 
> escher2k wrote:
>> I am trying to remove documents from my index using "delete by query".
>> However, when I did this, the deleted items seemed to remain. This is the
>> format of the XML file I am using:
>> 
>> load_id:20070424150841
>> load_id:20070425145301
>> load_id:20070426145301
>> load_id:20070427145302
>> load_id:20070428145301
>> load_id:20070429145301
>> 
>> When I do the deletes individually (i.e., create each of the above in a
>> separate file), it seems to work. Does this mean that each delete query
>> request has to be executed separately?
>> 
> 
> correct, delete (unlike ) only accepts one command.
> 
> Just to note, if "load_id" is your unique key, you could also use:
>   20070424150841
> 
> This will give you better performance and does not commit the changes 
> until you explicitly send 
> 
> 




Specifying no-ops...

2007-04-30 Thread escher2k

I want to capture information about the user who is executing a particular
search. Is there a way to specify in Solr that certain fields should just be
treated as pass-through and not processed? That way I can use arbitrary
params to do better logging.

Thanks.