Re: i wanna find one crawl that can crawl with defined urls and defined data
> I want to crawl http://www.amazone.com/ and extract only the product title, product information, writer, and publisher. I want to ignore all other data.

How about http://blog.foofactory.fi/2007/02/online-indexing-integrating-nutch-with.html or, if you're prepared to wait or help out, there's http://svn.apache.org/repos/asf/labs/droids/README.TXT
numFound for facet results
Hi,

could you tell me the (simplest|most elegant|fastest) way of implementing the following: I use faceted browsing, but I limit the number of facet counts to 5 (i.e., facet.limit=5).

1. I would like to be able to show whether there are more facet values. (This can be achieved with the trick of asking for 6 values and displaying only 5; if the 6th is non-empty, there are obviously more than 5 :)
2. I would like to be able to tell how many facet values there are in total. (This would be a value like numFound for the results.) Is there such a thing, or a workaround like the one for 1?

thanks, mirko
Re: numFound for facet results
On 4/30/07, [EMAIL PROTECTED] <[EMAIL PROTECTED]> wrote:
> could you tell me what is the (simplest|elegant|fast) way of implementing the following: I use faceted browsing, but I limit the number of facet counts to 5 (i.e., facet.limit=5).
>
> 1. I would like to be able to show if there are more facet values (this can be achieved with the trick for asking 6 values and only displaying 5 and if the 6th is non-empty obviously there are more than 5 :)

That's a decent workaround.

> 2. I would like to be able to tell how many facet values are there total. (This would be a value like numFound for the results). Is there such a thing or a workaround like for 1.

Number of facet values in the field (independent of the query), or number of non-zero facet counts for the particular query? The former will be relatively easy; the latter can't really be done that efficiently.

-Yonik
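For what it's worth, the "ask for 6, show 5" workaround could look roughly like the sketch below. It only builds the query string and post-processes facet counts that have already been parsed into (value, count) pairs; the helper names and the sample field name "cat" are made up for illustration, and the parsing step depends on whichever response format you use.

```python
from urllib.parse import urlencode


def build_facet_query(query, field, display_limit):
    """Build a query string that asks for one facet value more than is displayed."""
    params = [
        ("q", query),
        ("rows", 0),                         # only the facet counts are of interest here
        ("facet", "true"),
        ("facet.field", field),
        ("facet.limit", display_limit + 1),  # the extra value is only a "has more" probe
    ]
    return urlencode(params)


def split_facets(counts, display_limit):
    """Return (values to display, has_more) from a list of (value, count) pairs."""
    shown = counts[:display_limit]
    # Only a non-empty extra entry means there really is something more to show.
    has_more = any(count > 0 for _, count in counts[display_limit:])
    return shown, has_more
```

With display_limit=5, the request carries facet.limit=6, and has_more is true only when the sixth entry has a non-zero count.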
Re: sorting by matched field, then title alpha
> You can approximate it by doing something like:
> A:"phrase"^10 B:"phrase"^1 C:"phrase"^1000 D:"phrase"^100 E:"phrase"^30

Thanks for the suggestion, Mike. I tried boosting like this, but all docs get slightly different scores (because of tf, idf, etc.), so the secondary sort on field X has no impact.

I'm now thinking of trying a custom SortComparatorSource implementation (based on DistanceComparatorSource in Lucene in Action) that uses fixed values corresponding to matches in field A, B, C, etc. I will then use that as the primary sort, followed by the secondary sort on field X. I think I will have to modify o.a.s.s.QueryParsing.parseSort to hook in the custom sort. Is there any better way?

Kind Regards, Simon
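To make the intended ordering concrete, here is a small sketch of "fixed value per matched field, then title alpha". It is plain Python over in-memory records, not the Lucene SortComparatorSource API; the "matched" key (the set of fields a document matched in) and the rank table are assumptions for illustration, with ranks taken from the boost order in the quoted suggestion (C, D, E, A, B).

```python
# Lower rank sorts first; the order mirrors the boosts C^1000 > D^100 > E^30 > A^10 > B^1.
FIELD_RANK = {"C": 0, "D": 1, "E": 2, "A": 3, "B": 4}


def sort_key(doc):
    """Primary key: best (lowest) rank among matched fields. Secondary key: title."""
    best = min(FIELD_RANK[field] for field in doc["matched"])
    return (best, doc["title"].lower())


def sort_docs(docs):
    """Return docs ordered by matched-field rank, then alphabetically by title."""
    return sorted(docs, key=sort_key)
```

Because the primary key is a fixed integer rather than a score, documents that matched in the same field tie exactly, and the secondary sort on the title actually takes effect.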
Re: resin faile to start with solr.
2007/4/29, Ken Krugler <[EMAIL PROTECTED]>:
>> Now I test the newest Solr (nothing modified). I failed to start Solr with Resin 3.0.
>
> 1. Which exact version of Resin? Still 3.0.23?

3.0.23

> 2. Just to confirm, you uncommented out the lines in web.xml mentioned previously?

Just the newest Solr's web.xml. I did not modify it.

> Try uncommenting out the lines in the web.xml and see if that fixes your problem.
>
> -- Ken

-- regards jl

> 2007/4/28, James liu <[EMAIL PROTECTED]>:
> Yes, I tried and failed. This afternoon I will re-download Solr and test.
>
> 2007/4/28, Bill Au <[EMAIL PROTECTED]>:
> Have you tried using the schema.xml that is in example/solr/conf? If that works, then the problem is definitely in your schema.xml.
> Bill
>
> On 4/26/07, James liu <[EMAIL PROTECTED]> wrote:
> But it is OK when I use Tomcat.
>
> 2007/4/26, Ken Krugler <[EMAIL PROTECTED]>:
> > 3.0.23. Yesterday I tried and failed.
> > Which version do you use? I am just not using the pro version.
>
> From the error below, either your schema.xml file is messed up, or it might be that you still need to uncomment out the lines at the beginning of the web.xml file.
>
> These are the ones that say "Uncomment if you are trying to use a Resin version before 3.0.19". Even though you're using a later version of Resin, I've had lots of issues with their XML parsing.
>
> -- Ken
>
> 2007/4/26, Bill Au <[EMAIL PROTECTED]>:
> Have you tried Resin 3.0.x? 3.1 is a development branch, so it is less stable than 3.0.
> Bill
>
> On 4/19/07, James liu <[EMAIL PROTECTED]> wrote:
> It works well when I use Tomcat with Solr.
> Now I want to test Resin. I use resin-3.1.0.
> Now it shows me:
>
> [03:47:34.047] WebApp[http://localhost:8080] starting
> [03:47:34.691] WebApp[http://localhost:8080/resin-doc] starting
> [03:47:34.927] WebApp[http://localhost:8080/solr1] starting
> [03:47:35.051] SolrServlet.init()
> [03:47:35.077] Solr home set to '/usr/solrapp/solr1/'
> [03:47:35.077] user.dir=/tmp/resin-3.1.0/bin
> [03:47:35.231] Loaded SolrConfig: solrconfig.xml
> [03:47:35.522] adding requestHandler standard=solr.StandardRequestHandler
> [03:47:35.621] adding requestHandler dismax=solr.DisMaxRequestHandler
> [03:47:35.692] adding requestHandler partitioned=solr.DisMaxRequestHandler
> [03:47:35.721] adding requestHandler instock=solr.DisMaxRequestHandler
> [03:47:35.819] Opening new SolrCore at /usr/solrapp/solr1/, dataDir=/usr/solrapp/solr1/data
> [03:47:35.884] Reading Solr Schema
> [03:47:35.916] Schema name=example
> [03:47:35.929] org.apache.solr.core.SolrException: Schema Parsing Failed
> [03:47:35.929]   at org.apache.solr.schema.IndexSchema.readConfig(IndexSchema.java:441)
> [03:47:35.929]   at org.apache.solr.schema.IndexSchema.<init>(IndexSchema.java:69)
> [03:47:35.929]   at org.apache.solr.core.SolrCore.<init>(SolrCore.java:191)
>
> -- regards jl
>
> -- Ken Krugler
> Krugle, Inc.
> +1 530-210-6378
> "Find Code, Find Answers"
RE: EmbeddedSolr class from Wiki
: :you could even have the postCommit hook of your writer trigger a commit
: :call on your readers so they reopen the newly updated index.
:
: Thanks, I need "separate JVMs" so "writer triggers a commit call on readers"
: is slightly unclear... I want to use separate applications, webmodule with
: reader, and standalone writer (it could be webmodule too, but with different
: JEE context; similar to separate JVMs).

postCommit and postOptimize hooks can be any subclass of SolrEventListener, so you can trigger arbitrary Java code if you want to write your own (use JMS, or make an HTTP call, whatever).

The RunExecutableListener that ships with Solr would be the easiest thing to do ... just have it execute the "commit" command line script on your slave (which will make it reopen the index you just modified).

-Hoss
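As a rough illustration of the "make an HTTP call" option: the writer side could post a commit message to the reader's update URL once it has finished writing. The sketch below assumes the reader is a separate Solr webapp that accepts XML update messages over HTTP; the function names are made up, and the URL is whatever your reader instance actually exposes.

```python
from urllib.request import Request, urlopen

COMMIT_BODY = b"<commit/>"


def build_commit_request(update_url):
    """Build (but do not send) a POST request carrying a commit message."""
    return Request(
        update_url,
        data=COMMIT_BODY,
        headers={"Content-Type": "text/xml; charset=utf-8"},
        method="POST",
    )


def notify_reader(update_url, timeout=10):
    """Send the commit so the reader reopens its index; returns the HTTP status."""
    with urlopen(build_commit_request(update_url), timeout=timeout) as response:
        return response.status
```

Keeping request construction separate from sending makes it easy to check what would go over the wire without a running reader.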
Re: resin faile to start with solr.
: >>1. Which exact version of Resin? Still 3.0.23?
: >2. Just to confirm, you uncommented out the lines in web.xml
: >>mentioned previously?
: Try uncommenting out the lines in the web.xml and see if that fixes
: your problem.

Ken: I'm not very familiar with the problem you are describing; would you mind adding a short section about it to the wiki? ... http://wiki.apache.org/solr/SolrResin

-Hoss
Re: sorting by matched field, then title alpha
: Think I will have to modify o.a.s.s.QueryParsing.parseSort to hook in custom
: sort. Is there any better way?

If you write a custom SortComparatorSource, then the easiest way to use it would probably be to write your own subclass of TextField and override the getSortField method to construct a SortField that uses it.

-Hoss
Re: resin faile to start with solr.
Chris Hostetter wrote:
> : >>1. Which exact version of Resin? Still 3.0.23?
> : >2. Just to confirm, you uncommented out the lines in web.xml
> : >>mentioned previously?
> : Try uncommenting out the lines in the web.xml and see if that fixes
> : your problem.
>
> Ken: I'm not very familiar with the problem you are describing; would you mind adding a short section about it to the wiki? ... http://wiki.apache.org/solr/SolrResin

If you are running the trunk version, Resin should start fine without any changes. Solr 1.1 had XML parsing issues (even for Resin versions after 3.0.19): https://issues.apache.org/jira/browse/SOLR-92

Otherwise, uncomment the lines marked with the "resin 3.0.19" message in web.xml.
Re: numFound for facet results
On Apr 30, 2007, at 11:16 AM, Yonik Seeley wrote:
> On 4/30/07, [EMAIL PROTECTED] <[EMAIL PROTECTED]> wrote:
>> 2. I would like to be able to tell how many facet values are there total. (This would be a value like numFound for the results). Is there such a thing or a workaround like for 1.
>
> Number of facet values in the field (independent of the query), or number of non-zero facet counts for the particular query? The former will be relatively easy, the latter can't really be done that efficiently.

I'm sure that the need is the latter. At least for me, that would be helpful. Even if faceting only had a feature to still compute all the facets, but limit the response to a given number and provide a total count of facet values (given the current constraints), it would at least reduce the communication over the wire. I think that would make a big difference in performance for one of my applications, where we have an unusually large number of facet values.

Erik
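Until something like that exists on the server, the total has to be computed on the client from the full facet list (facet.limit=-1), which is exactly the wire cost described above. A minimal sketch, assuming the facet counts are already parsed into (value, count) pairs; the function name and the returned keys are made up for illustration:

```python
def summarize_facets(counts, display_limit):
    """Count the non-zero facet values and keep only the first few for display."""
    nonzero = [(value, count) for value, count in counts if count > 0]
    return {
        "total": len(nonzero),             # the numFound-like figure for this query
        "shown": nonzero[:display_limit],  # what the UI actually renders
    }
```

The whole list still travels over the wire here; the sketch only shows what a server-side "total" would save the client from computing.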
Delete from Solr index...
I am trying to remove documents from my index using "delete by query". However, when I did this, the deleted items seemed to remain. This is the format of the XML file I am using:

load_id:20070424150841
load_id:20070425145301
load_id:20070426145301
load_id:20070427145302
load_id:20070428145301
load_id:20070429145301

When I do the deletes individually (i.e., put each of the above in a separate file), it seems to work. Does this mean that each delete query request has to be executed separately?

Thanks.

-- View this message in context: http://www.nabble.com/Delete-from-Solr-index...-tf3673529.html#a10264940 Sent from the Solr - User mailing list archive at Nabble.com.
Faceted count syntax (exclude zeros)...
I am trying to execute a faceted count on a field called "load_id" and want to exclude zeros. The URL below doesn't seem to exclude them.

http://localhost:12002/solr/select/?qt=dismax&q=Y&qf=show_all_flag&fl=load_id&facet=true&facet.limit=-1&facet.field=load_id&facet.mincount=1&rows=0

Result (relevant part of XML): 0 0 80 81 77 62 31061

Thanks.
Re: Delete from Solr index...
escher2k wrote:
> I am trying to remove documents from my index using "delete by query". However when I did this, the deleted items seem to remain. This is the format of the XML file I am using -
>
> load_id:20070424150841
> load_id:20070425145301
> load_id:20070426145301
> load_id:20070427145302
> load_id:20070428145301
> load_id:20070429145301
>
> When I do the deletes individually, it seems to work (i.e. create each of the above in a separate file). Does this mean that each delete query request has to be executed separately ?

Correct, <delete> (unlike <add>) only accepts one command.

Just to note, if "load_id" is your unique key, you could also use:

<delete><id>20070424150841</id></delete>

This will give you better performance, and it does not commit the changes until you explicitly send <commit/>.
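Since each delete takes a single command, one way to handle a batch is to generate one update message per query and finish with a single commit. The sketch below only builds the message strings, assuming the usual XML update message format; the function name is made up, and posting each message to the update URL is left to whatever client you already use.

```python
from xml.sax.saxutils import escape


def delete_by_query_messages(queries):
    """Return one delete message per query, followed by a single commit message."""
    messages = [
        "<delete><query>%s</query></delete>" % escape(query)  # escape &, <, > in the query
        for query in queries
    ]
    messages.append("<commit/>")  # nothing is visible to searchers until this is sent
    return messages
```

Each returned string is meant to be sent as its own request, in order, with the commit last.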
Re: Delete from Solr index...
Thanks Ryan. I need to use a query since I am deleting a range of documents. From your comment, I wasn't sure whether an explicit commit is needed when using delete by query. Does delete by query not need an explicit commit?

Thanks.

ryan mckinley wrote:
> escher2k wrote:
>> I am trying to remove documents from my index using "delete by query".
>> However when I did this, the deleted
>> items seem to remain. This is the format of the XML file I am using -
>>
>> load_id:20070424150841
>> load_id:20070425145301
>> load_id:20070426145301
>> load_id:20070427145302
>> load_id:20070428145301
>> load_id:20070429145301
>>
>> When I do the deletes individually, it seems to work (i.e. create each of
>> the above in a separate file). Does this
>> mean that each delete query request has to be executed separately ?
>
> Correct, <delete> (unlike <add>) only accepts one command.
>
> Just to note, if "load_id" is your unique key, you could also use:
> <delete><id>20070424150841</id></delete>
>
> This will give you better performance and does not commit the changes
> until you explicitly send <commit/>.
Specifying no-ops...
I want to capture information about the user who is executing a particular search. Is there a way to specify in Solr that certain parameters should just be treated as pass-through and not processed? That way I could use arbitrary params to do better logging.

Thanks.