Re: Looking for help with Solr implementation

2011-03-02 Thread composite
Hi, I am a freelancer based in New Delhi, India. I have just completed a project in Apache Solr for a bioinformatics company. The project involved, among other things, importing 46 million records from a MySQL database to create Solr indexes and developing a user interface for doing searches (with au

Re: memory leak during undeploying

2011-03-02 Thread Chris Hostetter
: When I did heap analysis, the culprit always seems to : be TimeLimitedCollector thread. Because of this, considerable amount of : classes are not getting unloaded. ... : > > There are couple of JIRA's related to this: : > > https://issues.apache.org/jira/browse/LUCENE-2237, : > > https:/

RE: Solr under Tomcat

2011-03-02 Thread Thumuluri, Sai
Thank you - I found it. -Original Message- From: rajini maski [mailto:rajinima...@gmail.com] Sent: Thursday, March 03, 2011 12:03 AM To: solr-user@lucene.apache.org Subject: Re: Solr under Tomcat Sai, The index directory will be in your Solr_home//Conf//data directory.. The path f

Re: Solr under Tomcat

2011-03-02 Thread rajini maski
Sai, The index directory will be in your Solr_home//Conf//data directory. You can point this directory wherever you want by changing the data-dir path in the config XML that is present in the same //conf folder. You need to stop the Tomcat service to delete this directory and t

RE: Understanding multi-field queries with q and fq

2011-03-02 Thread Bob Sandiford
Have you looked at the 'qf' parameter? Bob Sandiford | Lead Software Engineer | SirsiDynix P: 800.288.8020 X6943 | bob.sandif...@sirsidynix.com www.sirsidynix.com  _ http://www.cosugi.org/  > -Original Message- > From: mrw [mailto:mikerobertsw...@gmail.com] > Sent: Wednesday, March

Re: MLT with boost

2011-03-02 Thread Koji Sekiguchi
(11/03/03 2:54), Mark wrote: High level overview. We have items and we have sellers. The scoring of our documents is such that our boost functions outweigh the pure Lucene term/query scoring. Our boost functions basically take into account how "good" the seller is. Now for MLT searches we wou

Re: multiple localParams for each query clause

2011-03-02 Thread Roman Chyla
Thanks Jonathan, this will be useful -- in the meantime, I have implemented the query rewriting, using the QueryParsing.toString() utility as an example. On Wed, Mar 2, 2011 at 5:40 PM, Jonathan Rochkind wrote: > Not per clause, no. But you can use the "nested queries" feature to set > local para

Re: sort by price puts unknown prices first

2011-03-02 Thread Yonik Seeley
On Wed, Mar 2, 2011 at 4:19 PM, Scott K wrote: > On Wed, Mar 2, 2011 at 12:21, Chris Hostetter > wrote: >> historically it has been because of a fundamental limitation in how the >> Lucene FieldCache has historically worked where the array backed FieldCaches >> use the default numeric value (ie: 0)

Re: Solr Admin Interface, reworked - Go on? Go away?

2011-03-02 Thread Chris Hostetter
: even in this WIP-State? if so .. i'll try one tomorrow evening after work When in doubt, remember Yonik's Law Of Patches... http://wiki.apache.org/solr/HowToContribute?highlight=law+of+patches#Contributing_Code_.28Features.2C_Big_Fixes.2C_Tests.2C_etc29 A half-baked patch in Jira,

Re: Solr Admin Interface, reworked - Go on? Go away?

2011-03-02 Thread Robert Muir
On Wed, Mar 2, 2011 at 5:34 PM, Stefan Matheis wrote: > Robert, > > even in this WIP-State? if so .. i'll try one tomorrow evening after work > It's totally up to you, sometimes it can be useful to upload a partial or WIP solution to an issue: as Hoss mentioned it's a good way to get feedback and a

Re: Solr Admin Interface, reworked - Go on? Go away?

2011-03-02 Thread Stefan Matheis
mrw, you mean a field like here (http://files.mathe.is/solr-admin/02_query.png) on the right side, between meta-navigation and plain solr-xml response? actually it's just to display the computed url, but if so .. we could use a larger field for that, of course :) Regards Stefan Am 02.03.2

Re: Solr Admin Interface, reworked - Go on? Go away?

2011-03-02 Thread Stefan Matheis
Robert, even in this WIP-State? if so .. i'll try one tomorrow evening after work Regards Stefan On 02.03.2011 22:02, Robert Muir wrote: On Wed, Mar 2, 2011 at 3:47 PM, Stefan Matheis wrote: Any Questions, Ideas, Thoughts outta there? Please, let me know :) My only question would be:

Re: Solr Admin Interface, reworked - Go on? Go away?

2011-03-02 Thread Chris Hostetter
: given that fact that my java-knowledge is sort of non-existing .. my idea was : to rework the Solr Admin Interface. Contributions of all kinds are welcome! : Actually it's completely work-in-progress .. but i'm interested in what you : guys think. Right direction? Completely Wrong, just drop it?

Dismax, q, q.alt, and defaultSearchField?

2011-03-02 Thread mrw
We have two banks of Solr nodes with identical schemas. The data I'm searching for is in both banks. One has defaultSearchField set to field1, the other has defaultSearchField set to field2. We need to support both user queries and facet queries that have no user content. For the latter, it app

Re: Solr Admin Interface, reworked - Go on? Go away?

2011-03-02 Thread mrw
Looks nice. Might be also worth it to create a page with large query field for pasting in complete URL-encoded queries that cross cores, etc. I did that at work (via ASP.net) so we could paste in queries from logs and debug them. We tend to use that quite a bit. Cheers Stefan Matheis wrote:

Re: memory leak during undeploying

2011-03-02 Thread Search Learn
Thanks for the suggestions. Tomcat does release permgen memory with appropriate jvm options and configuration settings ( clearReferencesStopTimerThreads, clearReferencesThreadLocals). When I did heap analysis, the culprit always seems to be TimeLimitedCollector thread. Because of this, considerable

Re: sort by price puts unknown prices first

2011-03-02 Thread Scott K
On Wed, Mar 2, 2011 at 12:21, Chris Hostetter wrote: > historically it has been because of a fundamental limitation in how the > Lucene FieldCache has historically worked where the array backed FieldCaches > use the default numeric value (ie: 0) when docs have no value (but in the > case of Strings, t

Re: Solr Admin Interface, reworked - Go on? Go away?

2011-03-02 Thread Robert Muir
On Wed, Mar 2, 2011 at 3:47 PM, Stefan Matheis wrote: > Any Questions, Ideas, Thoughts outta there? Please, let me know :) > My only question would be: would you mind creating a JIRA issue with your modifications? I was just yesterday looking at this admin stuff and thinking, man this could rea

Re: Solr Admin Interface, reworked - Go on? Go away?

2011-03-02 Thread Stefan Matheis
Hey Markus, actually it's CC BY 3.0 - Yusuke Kamiyamane created the "Fugue Icons" (http://p.yusukekamiyamane.com/) Regards Stefan On 02.03.2011 21:46, Markus Jelsma wrote: Nice! It makes multi core navigation a lot easier. What license do the icons have? Hi List, given that fact that my

Re: multi-core solr, specifying the data directory

2011-03-02 Thread Chris Hostetter
: I wonder if what doesn't work is trying to set an explicit relative path : there, instead of using the baked in default "data". If you set an explicit : relative path, is it relative to the current core solr.home, or to the main : solr.home? it's relative to the current working dir of the process

Re: Solr Admin Interface, reworked - Go on? Go away?

2011-03-02 Thread Markus Jelsma
Nice! It makes multi core navigation a lot easier. What license do the icons have? > Hi List, > > given that fact that my java-knowledge is sort of non-existing .. my > idea was to rework the Solr Admin Interface. > > Compared to CouchDBs Futon or the MongoDB Admin-Utils .. not that fancy, > bu

Solr Admin Interface, reworked - Go on? Go away?

2011-03-02 Thread Stefan Matheis
Hi List, given that fact that my java-knowledge is sort of non-existing .. my idea was to rework the Solr Admin Interface. Compared to CouchDBs Futon or the MongoDB Admin-Utils .. not that fancy, but it was an idea few weeks ago - and i would like to contrib something, a thing which has to b

Re: memory leak during undeploying

2011-03-02 Thread Markus Jelsma
Hi, I remember reading somewhere that undeploying an application in Tomcat won't release memory, thus repeating the cycle will indeed exhaust the permgen. You could enable garbage collection of the permgen. HotSpot can do this for you but it depends on using CMS which you might not want to us

Re: multi-core solr, specifying the data directory

2011-03-02 Thread Jonathan Rochkind
I wonder if what doesn't work is trying to set an explicit relative path there, instead of using the baked in default "data". If you set an explicit relative path, is it relative to the current core solr.home, or to the main solr.home? Let's try it to see Yep, THAT's what doesn't work, an

Re: Efficient boolean query

2011-03-02 Thread Ofer Fort
That's great, just what I needed, I was debugging and was expecting to see something like this. i'll look through the SVN history to see in which version it was added. Thanks On Wednesday, March 2, 2011, Yonik Seeley wrote: > On Wed, Mar 2, 2011 at 2:43 PM, Ofer Fort wrote: >> I didn't see this

Re: multi-core solr, specifying the data directory

2011-03-02 Thread Jonathan Rochkind
Meanwhile, I'm having trouble getting the expected behavior at all. I'll try to give the right details (without overwhelming with too many), if anyone can see what's going on. Solr 1.4.1. Multi-core. 'Main' solr home with solr.xml at /opt/solr/solr_indexer/solr.xml The solr.xml includes actu

Re: dismax query with no/empty/*:* q parameter?

2011-03-02 Thread mrw
Ah...so I need to be doing &q.alt=*:* &fq=:. Of course, now that you showed me what I look for, I also see the explanation in the Packt book. Sheesh. Thanks for the tip! Chris Hostetter-3 wrote: > > : For standard query handler fq-only queries, we used q=*:*. However, > with > : dismax, t

Re: memory leak during undeploying

2011-03-02 Thread François Schiettecatte
Hi I get the same problem on tomcat with other applications, so this does not appear to be limited to SOLR. I got the error on tomcat 6 and 7. The only solution I found was to kill tomcat and start it again. François On Mar 2, 2011, at 2:28 PM, Search Learn wrote: > Hello, > We currently depl

Re: sort by price puts unknown prices first

2011-03-02 Thread Chris Hostetter
: When I sort by price ascending, documents with no price are listed : first. I would like them listed last. I tried adding the : sortMissingLast flag, even though it says it is only for strings, but it works for any field type *backed* by a string, including the SortableIntField (and it's breat

Re: Efficient boolean query

2011-03-02 Thread Ofer Fort
I didn't see this behavior, running solr 1.4.1, was that implemented after this release? On Wednesday, March 2, 2011, Yonik Seeley wrote: > On Wed, Mar 2, 2011 at 1:58 PM, Ofer Fort wrote: >> Thanks, >> But each query tries to see if there is something new since the last result >> that was found

Re: multi-core solr, specifying the data directory

2011-03-02 Thread Mike Sokolov
Yes - I commented out the <dataDir> element in solrconfig.xml and then got the expected behavior: the core used a data subdirectory in the core subdirectory. It seems like the problem arises from using the solrconfig.xml that's distributed as example/solr/conf/solrconfig.xml The solrconfig.xml's in

Re: Understanding multi-field queries with q and fq

2011-03-02 Thread Sujit Pal
This could probably be done using a custom QParser plugin? Define the pattern like this: String queryTemplate = "title:%Q%^2.0 body:%Q%"; then replace the %Q% with the value of the Q param, send it through QueryParser.parse() and return the query. -sujit On Wed, 2011-03-02 at 11:28 -0800, mrw
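
A minimal sketch of the template substitution described above, written in Python rather than as an actual Solr QParser plugin. The `expand_template` helper and the parentheses wrapped around the user input are assumptions made for illustration; they are not part of Solr or of the original message.

```python
def expand_template(template: str, user_query: str, placeholder: str = "%Q%") -> str:
    """Substitute the user's query text for every placeholder in the template.

    Hypothetical helper: wrapping the input in parentheses is an assumption,
    made so that multi-word input stays attached to each field.
    """
    return template.replace(placeholder, f"({user_query})")


# Template taken from the message above.
query_template = "title:%Q%^2.0 body:%Q%"

# The expanded string would then be handed to the regular query parser.
expanded = expand_template(query_template, "solr admin")
print(expanded)
```

Note that this sketch does not escape query-syntax characters in the user input; a real plugin would need to handle that before parsing.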

Re: Efficient boolean query

2011-03-02 Thread Ofer Fort
I'm guessing what i was describing is a short-circuit evaluation and i see that lucene doesn't have it: http://lucene.472066.n3.nabble.com/Short-circuit-in-query-td738551.html Still would love to hear any suggestions for my type of query ofer On Wed, Mar 2, 2011 at 8:58 PM, Ofer Fort wrote: >

memory leak during undeploying

2011-03-02 Thread Search Learn
Hello, We currently deploy and undeploy the Solr web application potentially hundreds of times during a typical day. When Solr is undeployed, its classes are not getting unloaded and eventually we run into a permgen error. There are a couple of JIRA issues related to this: https://issues.apache.org

Re: Efficient boolean query

2011-03-02 Thread Ofer Fort
Thanks, But each query tries to see if there is something new since the last result that was found, so rounding things will return the same documents over and over again, until we reach the next rounded point. Could I use the document id somehow? or something else that's bigger than my last se

Re: Formatting the XML returned

2011-03-02 Thread Markus Jelsma
If you're comfortable with XSL you can create a transformer and use Solr's XSLTResponseWriter to do the job. http://wiki.apache.org/solr/XsltResponseWriter > Hi all, > > This list has proven itself quite useful since I got started with Solr. I'm > wondering if it is possible to dictate the XML t

sort by price puts unknown prices first

2011-03-02 Thread Scott K
When I sort by price ascending, documents with no price are listed first. I would like them listed last. I tried adding the sortMissingLast flag, even though it says it is only for strings, but it did not help. Why doesn't sortMissingLast work on non-strings? This seems like a very common issue, bu

Formatting the XML returned

2011-03-02 Thread Brian Lamb
Hi all, This list has proven itself quite useful since I got started with Solr. I'm wondering if it is possible to dictate the XML that is returned by a search? Right now it seems very inefficient in that it is formatted like: Val Val Etc. I would like to change it so that it reads something li

Re: dismax query with no/empty/*:* q parameter?

2011-03-02 Thread Chris Hostetter
: For standard query handler fq-only queries, we used q=*:*. However, with : dismax, that returns 0 results. Are fq-only queries possible with dismax? they are if you use the q.alt param. http://wiki.apache.org/solr/DisMaxQParserPlugin#q.alt -Hoss
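
As a rough illustration of the `q.alt` suggestion above, the following Python sketch builds the query string for a dismax request that has no user query and only a filter. The field name `category` and the value `books` are made-up placeholders, and selecting dismax via `defType=dismax` is an assumption about how the handler is configured.

```python
from urllib.parse import urlencode


def dismax_filter_only_params(filter_query: str) -> str:
    """Build a URL query string for a dismax request with no user query.

    q is omitted on purpose; q.alt supplies the match-all fallback.
    """
    params = [
        ("defType", "dismax"),  # assumption: dismax is selected per request
        ("q.alt", "*:*"),       # fallback query used when q is missing
        ("fq", filter_query),   # the filter that actually restricts results
    ]
    return urlencode(params)


# "category:books" is a hypothetical filter, not one from the thread.
query_string = dismax_filter_only_params("category:books")
print(query_string)
```

The resulting string would be appended to the select URL of the Solr instance in question.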

Re: Efficient boolean query

2011-03-02 Thread Yonik Seeley
On Wed, Mar 2, 2011 at 2:43 PM, Ofer Fort wrote: > I didn't see this behavior, running solr 1.4.1, was that implemented > after this release? I think so. It's implemented now in BooleanWeight.scorer() for (Weight w : weights) { BooleanClause c = cIter.next(); Scorer subSc

Re: Understanding multi-field queries with q and fq

2011-03-02 Thread mrw
Anyone understand how to do boolean logic across multiple fields? Dismax is nice for searching multiple fields, but doesn't necessarily support our syntax requirements. eDismax appears to be not available until Solr 3.1. In the meantime, it looks like we need to support applying the user's q

Re: Efficient boolean query

2011-03-02 Thread Yonik Seeley
On Wed, Mar 2, 2011 at 1:58 PM, Ofer Fort wrote: > Thanks, > But each query tries to see if there is something new since the last result > that was found, so rounding things will return the same documents over  and > over again, till we reach to the next rounded point. > > Could i use the document

'Registering' a query / Percolation

2011-03-02 Thread Baillie, Robert
Hi, I wondered if anyone knew if there are capabilities in Solr to 'register' a query much like Elasticsearch's 'percolation' functionality. I.E. Instruct Solr that you are interested in documents that match a given query and then have Solr notify you (through whatever callback mechanism is speci

Re: Efficient boolean query

2011-03-02 Thread Ofer Fort
timestamp is of type: On Wed, Mar 2, 2011 at 8:11 PM, Ofer Fort wrote: > you are correct that my query is a tange one, probably should have > mentioned it in the first post. > this is the debug data: > > > > > > 0 > 4173 > > on > on > > 0 > timestamp:[2011-02-01T00:00:00Z TO NO

Re: Efficient boolean query

2011-03-02 Thread Ofer Fort
you are correct that my query is a range one, probably should have mentioned it in the first post. this is the debug data: 0 4173 on on 0 timestamp:[2011-02-01T00:00:00Z TO NOW] AND oferiko 2.2 10 timestamp:[2011-02-01T00:00:00Z TO NOW] AND oferiko timestamp:[2011-02-0

dismax query with no/empty/*:* q parameter?

2011-03-02 Thread mrw
For standard query handler fq-only queries, we used q=*:*. However, with dismax, that returns 0 results. Are fq-only queries possible with dismax? Thanks! -- View this message in context: http://lucene.472066.n3.nabble.com/dismax-query-with-no-empty-q-parameter-tp2619170p2619170.html Sen

Re: MLT with boost

2011-03-02 Thread Mark
mlt.boost - [true/false] set if the query will be boosted by the interesting term relevance. This is not the same as boost functions: http://wiki.apache.org/solr/DisMaxQParserPlugin#bf_.28Boost_Functions.29 On 3/2/11 7:45 AM, Markus Jelsma wrote: There is a mlt.boost parameter. On Wednesda

Re: MLT with boost

2011-03-02 Thread Mark
High level overview. We have items and we have sellers. The scoring of our documents is such that our boost functions outweigh the pure Lucene term/query scoring. Our boost functions basically take into account how "good" the seller is. Now for MLT searches we would like to incorporate this s

Re: Efficient boolean query

2011-03-02 Thread Yonik Seeley
On Wed, Mar 2, 2011 at 12:11 PM, Ofer Fort wrote: > Hey all, > I have an index with a lot of documents with the term X and no documents > with the term Y. > If i query for X it take a few seconds and returns the results. > If I query for Y it takes a millisecond and returns an empty set. > If i qu

Re: Solr Sharding and idf

2011-03-02 Thread Jae Joo
Yes, I knew that the ticket is still open. This is why I am looking for the solutions now. 2011/3/2 Tomás Fernández Löbbe > Hi Jae, this is the Jira created for the problem of IDF on distributed > search: > > https://issues.apache.org/jira/browse/SOLR-1632 > > It's still open > > On Wed, Mar 2,

Re: Efficient boolean query

2011-03-02 Thread Ofer Fort
Thanks, I tried it in the past and found out that my hit ratio was pretty low, so it doesn't help most of my queries ofer On Wed, Mar 2, 2011 at 7:16 PM, Geert-Jan Brits wrote: > If you often query X as part of several other queries (e.g: X | X AND Y | > X AND Z) > you might consider putting

Re: Efficient boolean query

2011-03-02 Thread Geert-Jan Brits
If you often query X as part of several other queries (e.g: X | X AND Y | X AND Z) you might consider putting X in a filter query ( http://wiki.apache.org/solr/CommonQueryParameters#fq) leading to: q=*:*&fq=X q=Y&fq=X q=Z&fq=X Filter queries are cached separately which means that after the firs
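
The caching idea above can be pictured with a toy in-memory model. This is only a sketch of the concept, not Solr's filter cache implementation: the documents, the `filter_cache` dictionary and all function names are invented for illustration.

```python
# Toy index: document id -> set of terms it contains (made-up data).
docs = {
    1: {"X", "Y"},
    2: {"X"},
    3: {"X", "Z"},
    4: {"Y"},
}

filter_cache = {}        # filter term -> set of matching document ids
filter_computations = 0  # counts how often a filter had to be evaluated


def matching_ids(term):
    """Scan the toy index for documents containing the term."""
    return {doc_id for doc_id, terms in docs.items() if term in terms}


def cached_filter(term):
    """Evaluate a filter once, then serve it from the cache."""
    global filter_computations
    if term not in filter_cache:
        filter_computations += 1
        filter_cache[term] = matching_ids(term)
    return filter_cache[term]


def search(q, fq):
    """Intersect the main query with the (cached) filter result."""
    candidates = cached_filter(fq)
    if q == "*:*":
        return set(candidates)
    return matching_ids(q) & candidates


# The three requests from the message: q=*:*&fq=X, q=Y&fq=X, q=Z&fq=X
results = [search("*:*", "X"), search("Y", "X"), search("Z", "X")]
print(results, filter_computations)
```

In this model the filter for X is evaluated only for the first request; the second and third requests reuse the cached set.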

Re: design help

2011-03-02 Thread Bill Bell
You might want to hire a consultant. Tika can deal with Word documents. IDs need to be unique. One index might work, not sure based on your info below. For the database you need to use a Java DB thin connector to SQL Server. Throw the jar in the lib directory and restart. Then set up DIH settings t

Re: Boost function problem with disquerymax

2011-03-02 Thread Yonik Seeley
On Wed, Mar 2, 2011 at 11:34 AM, Gastone Penzo wrote: > HI, > for search i use disquery max > and a i want to boost a field with bf parameter like: > ...&bf=boost_has_img^5& > the boost_has_img field of my document is 3: > 3 > if i see the results in debug query mode i can see: >   0.0 = (MATC

Efficient boolean query

2011-03-02 Thread Ofer Fort
Hey all, I have an index with a lot of documents with the term X and no documents with the term Y. If i query for X it take a few seconds and returns the results. If I query for Y it takes a millisecond and returns an empty set. If i query for Y AND X it takes a few seconds and returns an empty set

Re: Solr Sharding and idf

2011-03-02 Thread Tomás Fernández Löbbe
Hi Jae, this is the Jira created for the problem of IDF on distributed search: https://issues.apache.org/jira/browse/SOLR-1632 It's still open On Wed, Mar 2, 2011 at 1:48 PM, Upayavira wrote: > As I understand it there is, and the best you can do is keep the same > number of docs per shard, an

Re: Solr Sharding and idf

2011-03-02 Thread Upayavira
As I understand it there is, and the best you can do is keep the same number of docs per shard, and keep your documents randomised across shards. That way you'll minimise the chances of suffering from distributed IDF issues. Upayavira On Wed, 02 Mar 2011 10:10 -0500, "Jae Joo" wrote: > Is there
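
To see why uneven shards distort scores, here is a small worked example in Python. It assumes the classic Lucene `DefaultSimilarity` formula, idf = 1 + ln(numDocs / (docFreq + 1)); the shard sizes and document frequencies are invented numbers, not figures from this thread.

```python
import math


def lucene_idf(doc_freq: int, num_docs: int) -> float:
    """Classic Lucene DefaultSimilarity idf (assumed here for illustration)."""
    return 1.0 + math.log(num_docs / (doc_freq + 1))


# Hypothetical shards of equal size where the term is unevenly distributed.
shard_a = lucene_idf(doc_freq=9, num_docs=1000)    # term is rare on shard A
shard_b = lucene_idf(doc_freq=499, num_docs=1000)  # term is common on shard B

# What a single combined index would compute for the same term.
combined = lucene_idf(doc_freq=9 + 499, num_docs=2000)

print(shard_a, shard_b, combined)
```

Each shard scores with its own local value, so the same term weighs far more on shard A than on shard B, and neither matches the combined figure. Spreading documents randomly across shards pushes the per-shard frequencies, and therefore the idf values, closer together.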

Re: Problem - Help me with DataImport

2011-03-02 Thread Matias Alonso
Stefan, Thank you very much! It works perfect... Any idea for the other question? Someone? Matias. 2011/3/2 Stefan Matheis > Matias, > > for indexing constant/static values .. try > http://wiki.apache.org/solr/DataImportHandler#TemplateTransformer > > Regards > Stefan > > On Wed, Mar 2, 201

Re: multiple localParams for each query clause

2011-03-02 Thread Jonathan Rochkind
Not per clause, no. But you can use the "nested queries" feature to set local params for each nested query instead. Which is in fact one of the most common use cases for local params. &q=_query_:"{!type=x q.field=z}something" AND _query_:"{!type=database}something" URL encode that whole thin
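
A short Python sketch of the URL-encoding step mentioned above. It assumes the standard `{!...}` local params prefix on both clauses; `type=x`, `q.field=z` and `type=database` are the placeholder values used in this thread, not real parser names.

```python
from urllib.parse import urlencode

# Two nested queries, each carrying its own local params.
nested = (
    '_query_:"{!type=x q.field=z}something" '
    'AND _query_:"{!type=database}something"'
)

# urlencode percent-escapes braces, quotes, colons and spaces in the value.
encoded = urlencode({"q": nested})
print(encoded)
```

The encoded string can then be appended to the select URL as the `q` parameter.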

Boost function problem with disquerymax

2011-03-02 Thread Gastone Penzo
Hi, for search I use dismax and I want to boost a field with the bf parameter like: ...&bf=boost_has_img^5& The boost_has_img field of my document is 3: 3 If I look at the results in debug query mode I can see: 0.0 = (MATCH) FunctionQuery(int(boost_has_img)), product of: 0.0 = int(bo

Re: Indexed, but cannot search

2011-03-02 Thread Brian Lamb
So here's something interesting. I did a delta import this morning and it looks like I can do a global search across those fields. I'll do another full import and see if that fixed the problem. I had done a fullimport after making this change but it seems like another reindex is in order. On Wed,

Re: solr different sizes on master and slave

2011-03-02 Thread Mike Franon
Thank you very much for this info, that helps a lot! On Wed, Mar 2, 2011 at 10:05 AM, Jayendra Patil wrote: > Hi Mike, > > There was an issue with the Snappuller wherein it fails to clean up > the old index directories on the slave side. > https://issues.apache.org/jira/browse/SOLR-2156 > > Th

Re: MLT with boost

2011-03-02 Thread Markus Jelsma
There is a mlt.boost parameter. On Wednesday 02 March 2011 16:28:35 dar...@ontrenet.com wrote: > I think what's being asked is obvious, in that, they want to modify the > sorted relevancy of the results of MLT. Where, instead of (or in addition > to) sorting by the mlt score, some modified functio

Re: [ANNOUNCE] Web Crawler

2011-03-02 Thread Nestor Oviedo
Hi everyone! I've been following this thread and I realized we've constructed something similar to "Crawl Anywhere". The main difference is that our project is oriented to the digital libraries and digital repositories context. Specifically related to metadata collection from multiple sources, info

Re: MLT with boost

2011-03-02 Thread darren
I think what's being asked is obvious, in that, they want to modify the sorted relevancy of the results of MLT. Where, instead of (or in addition to) sorting by the mlt score, some modified function/subquery can be used to further sort the results. One example. You run a MLT query against a docum

Re: Indexed, but cannot search

2011-03-02 Thread Markus Jelsma
Please also provide analysis part of fieldType text. You can also use Luke to inspect the index. http://localhost:8983/solr/admin/luke?fl=globalField&numTerms=100 On Wednesday 02 March 2011 16:09:33 Brian Lamb wrote: > Here are the relevant parts of schema.xml: > > multiValued="true"/> > glob

multiple localParams for each query clause

2011-03-02 Thread Roman Chyla
Hi, Is it possible to set local arguments for each query clause? example: {!type=x q.field=z}something AND {!type=database}something I am pulling together result sets coming from two sources, Solr index and DB engine - however I realized that local parameters apply only to the whole query - so

Re: Groupped results

2011-03-02 Thread Jayendra Patil
Hi Rok, If I understood the use case correctly, grouping of the results is possible in Solr http://wiki.apache.org/solr/FieldCollapsing Probably, you can create new fields with the combination for the groups and use the field collapsing feature to group the results. Id Type1 Type2 Title Grou

Solr Sharding and idf

2011-03-02 Thread Jae Joo
Is there still an issue with distributed IDF in a sharded environment in Solr 1.4 or 4.0? If yes, any suggestions to resolve it? Thanks, Jae

Re: Indexed, but cannot search

2011-03-02 Thread Brian Lamb
Here are the relevant parts of schema.xml: globalField This is what is returned when I search: - 0 1 - Mammal true - Mammal Mammal globalField:mammal globalField:mammal LuceneQParser - 1.0 - 1.0 - 1.0 - 0.0 - 0.0 - 0.0 - 0.0 - 0.0 - 0.0 - 0.0 - 0.0 - 0.0 - 0.0 -

Re: solr different sizes on master and slave

2011-03-02 Thread Jayendra Patil
Hi Mike, There was an issue with the Snappuller wherein it fails to clean up the old index directories on the slave side. https://issues.apache.org/jira/browse/SOLR-2156 The patch can be applied to fix the issue. You can also delete the old index directories, except for the current one which is m

Re: Solr under Tomcat

2011-03-02 Thread Savvas-Andreas Moysidis
Hi Sai, You can find your index files at: {%TOMCAT_HOME}\solr\data\index If you want to clear the index just delete the whole index directory. Regards, - Savvas On 2 March 2011 14:09, Thumuluri, Sai wrote: > Good Morning, > We have deployed Solr 1.4.1 under Tomcat and it works great, however I

Re: [ANNOUNCE] Web Crawler

2011-03-02 Thread Dominique Bejean
Hi, No, it doesn't. It looks like an Apache HttpClient 3.x limitation. https://issues.apache.org/jira/browse/HTTPCLIENT-579 Dominique On 02/03/11 15:04, Thumuluri, Sai wrote: Dominique, Does your crawler support NTLM2 authentication? We have content under SiteMinder which uses NTLM2 a

Re: solr different sizes on master and slave

2011-03-02 Thread Markus Jelsma
Yes. But keep in mind that Solr may be actually using an index. directory for its live search. See either the replication.properties file or consult the replication page to see what index directory it uses. If it uses an index. directory you can safely move it to index and remove or modify repl

Re: Problem - Help me with DataImport

2011-03-02 Thread Stefan Matheis
Matias, for indexing constant/static values .. try http://wiki.apache.org/solr/DataImportHandler#TemplateTransformer Regards Stefan On Wed, Mar 2, 2011 at 2:46 PM, Matias Alonso wrote: > Good Morning, > > > First, sorry for my poor english. > > > I trying to index “blogs” (rss) to my solr, so I

Solr under Tomcat

2011-03-02 Thread Thumuluri, Sai
Good Morning, We have deployed Solr 1.4.1 under Tomcat and it works great, however I cannot find where the index (directory) is created. I set solr home in web.xml under /webapps/solr/WEB-INF/, but not sure where the data directory is. I have a need where I need to completely index the site and it

Re: multi-core solr, specifying the data directory

2011-03-02 Thread Nagendra Nagarajayya
HI Jonathan: Did you try : This should create the indexes under some_core/data or you can make datadir relative to some_core dir. Regards, - NN http://solr-ra.tgels.com http://rankingalgorithm.tgels.com On 3/1/2011 7:21 AM, Jonathan Rochkind wrote: I did try that, yes. I tried that fi

RE: [ANNOUNCE] Web Crawler

2011-03-02 Thread Thumuluri, Sai
Dominique, Does your crawler support NTLM2 authentication? We have content under SiteMinder which uses NTLM2 and that is posing challenges with Nutch? -Original Message- From: Dominique Bejean [mailto:dominique.bej...@eolya.fr] Sent: Wednesday, March 02, 2011 6:22 AM To: solr-user@lucene

Re: solr different sizes on master and slave

2011-03-02 Thread Mike Franon
Is it ok if I just delete the old copies manually? or maybe run a script that does it? On Tue, Mar 1, 2011 at 7:47 PM, Markus Jelsma wrote: > Indeed, the slave should not have useless copies but it does, at least in > 1.4.0, i haven't seen it in 3.x, but that was just a small test that did not >

Re: solr different sizes on master and slave

2011-03-02 Thread Mike Franon
Right now I have the slave polling every 10 seconds, because we want to make sure they stay in sync. I have users who will post directly from a web application. But I do notice it syncs very quickly, because usually the update is only one or two records at a time. I am thinking maybe 10 seconds

Re: MLT with boost

2011-03-02 Thread Koji Sekiguchi
(11/03/02 0:23), Mark wrote: Is it possible to add function queries/boosts to the results that are by MLT? If not out of the box how would one go about achieving this functionality? Thanks Beside the point, why do you need such function? If you give us more information/background of your need

Problem - Help me with DataImport

2011-03-02 Thread Matias Alonso
Good Morning, First, sorry for my poor English. I'm trying to index “blogs” (RSS) into my Solr, so I'm using a DataImportHandler for this. I can't index the date and I don't know how to index static (constant) values in a field. When I run a “Full Import” it doesn't index the docs; if I delete the

Re: Split analysis

2011-03-02 Thread Markus Jelsma
There is an updateRequestProcessorChain you can use to execute some processors. Check the page for deduplication, it already has methods for creating signatures but you can easily create your own if you have to. Use copyField to copy the value to a non-analyzed field (string) and obtain the orig

Split analysis

2011-03-02 Thread dan sutton
Hi All, I have a requirement to analyze a field with a series of filters, calculate a 'signature' then concatenate with the original input e.g. input => 'this is the input' tokenized and filtered, input becomes say 'this input' => 12ef5e (signature) so the final output indexed is:

Groupped results

2011-03-02 Thread Rok Rejc
I have an index with a number of documents. For example (this example is representative; the real documents contain many other fields):

Id  Type1  Type2  Title
1   a      b      xfg
2   a      c      abd
3   a      d      thm
4   b      a      efd
5   b      b      ikj
6   b      c      azd
...

I want to query an index on

Re: [ANNOUNCE] Web Crawler

2011-03-02 Thread Paul Libbrecht
Viewing the indexing result, which is a part of what you are describing I think, is a nice job for such an indexing framework. Do you guys know whether such a feature is already out there? paul On 2 March 2011 at 12:20, Geert-Jan Brits wrote: > Hi Dominique, > > This looks nice. > In the past

Re: [ANNOUNCE] Web Crawler

2011-03-02 Thread Dominique Bejean
Hi, The crawler comes with a extendible document processing pipeline. If you know java libraries or web services for 'wrapper induction' processing, it is possible to implement a dedicated stage in the pipeline. Dominique Le 02/03/11 12:20, Geert-Jan Brits a écrit : Hi Dominique, This look

Re: [ANNOUNCE] Web Crawler

2011-03-02 Thread Dominique Bejean
Aditya, The crawler is not open source and won't be in the near future. Anyway, I have to change the license because it can be used for any personal or commercial projects. Sincerely, Dominique On 02/03/11 10:02, findbestopensource wrote: Hello Dominique Bejean, Good job. We identified

Re: [ANNOUNCE] Web Crawler

2011-03-02 Thread Dominique Bejean
Lukas, I am thinking about it but no decision yet. Anyway, in next release, I will provide source code of pipeline stages and connectors as samples. Dominique Le 02/03/11 10:01, Lukáš Vlček a écrit : Hi, is there any plan to open source it? Regards, Lukas [OT] I tried HuriSearch, input "

Re: [ANNOUNCE] Web Crawler

2011-03-02 Thread Geert-Jan Brits
Hi Dominique, This looks nice. In the past, I've been interested in (semi)-automatically inducing a scheme/wrapper from a set of example webpages (the scientific field is often called 'wrapper induction'). This would allow for fast scheme-creation which could be used as a basis for extraction. L

Re: [ANNOUNCE] Web Crawler

2011-03-02 Thread Dominique Bejean
Rosa, In the pipeline, there is a stage that extracts the text from the original document (PDF, HTML, ...). It is possible to plug in scripts (Java 6 compliant) in order to keep only relevant parts of the document. See http://www.wiizio.com/confluence/display/CRAWLUSERS/DocTextExtractor+stage Do

Re: [ANNOUNCE] Web Crawler

2011-03-02 Thread Dominique Bejean
David, The UI was not the only reason that made me choose to write a totally new crawler. After eliminating candidate crawlers for various reasons (inactive project, ...), Nutch and Heritrix were the 2 crawlers in my short list of possible candidates to be used. In my mind, the crawler and

design help

2011-03-02 Thread Ken Foskey
I have read the Solr book and the other book is on its way for me to read. I need some help in the meantime. a) Using the example Solr system, how do I send the Word document into the system using curl? I want to have the ID as the full path of the document. I have tried various commands

Re: [ANNOUNCE] Web Crawler

2011-03-02 Thread findbestopensource
Hello Dominique Bejean, Good job. We identified almost 8 open source web crawlers http://www.findbestopensource.com/tagged/webcrawler I don't know how far yours would be different from the rest. Your license states that it is not open source but it is free for personal use. Regards Aditya ww

Re: [ANNOUNCE] Web Crawler

2011-03-02 Thread Lukáš Vlček
Hi, is there any plan to open source it? Regards, Lukas [OT] I tried HuriSearch, input "Java" into search field, it returned a lot of references to coldfusion error pages. May be a recrawl would help? On Wed, Mar 2, 2011 at 1:25 AM, Dominique Bejean wrote: > Hi, > > I would like to announce Cr

Re: [ANNOUNCE] Web Crawler

2011-03-02 Thread Rosa (Anuncios)
Nice job! It would be good to be able to extract specific data from a given page via XPATH though. Regards, Le 02/03/2011 01:25, Dominique Bejean a écrit : Hi, I would like to announce Crawl Anywhere. Crawl-Anywhere is a Java Web Crawler. It includes : * a crawler * a document pro

Re: indexing mysql dateTime/timestamp into solr date field

2011-03-02 Thread cyang2010
It turns out you don't need to use DateFormatTransformer at all. The reason the MySQL timestamp column failed to be inserted into Solr is that in schema.xml I mistakenly set indexed="false" and stored="false". Of course the field never makes it into the index that way. No wonder the schema browser always shows no

Re: how to debug dataimporthandler

2011-03-02 Thread Stefan Matheis
Hey, normally .. if i have problems with dih: * i start having a look at the mysql-query-log, to check which queries are executed. * re-run the query myself, verify the return data * Activate http://wiki.apache.org/solr/DataImportHandler#LogTransformer and log the important data, check console ou