help needed on solr-uima integration

2011-10-23 Thread Xue-Feng Yang
Hi, after searching online, some parts of the "puzzle" are still missing. The best would be to find a simple example showing the whole process. Is there any example like apache-uima/examples/descriptors/tutorial/ex3 RoomNumber and DateTime integrated into Solr? In particular, how to feed "text" into solr f

Re: Implement Custom Soundex

2011-10-23 Thread Paul Libbrecht
Momo, if you have the conversion text to tokens then all you need to do is implement a custom analyzer, deploy it inside the solr webapp, then plug it into the schema. Is that the part that is hard? I thought the wiki was helpful there but maybe some other issue is holding you up. One zoology of suc

RE: Implement Custom Soundex

2011-10-23 Thread Momo..Lelo ..
thank you for this information. > Subject: Re: Implement Custom Soundex > From: p...@hoplahup.net > Date: Sun, 23 Oct 2011 10:58:49 +0200 > To: solr-user@lucene.apache.org > > Momo, > > if you have the conversion text to tokens then all you need to do is > implement a custom analyzer, deploy

Update document field with solrj

2011-10-23 Thread hadi
I want to edit a document field in Solr, for example the author name, so I use the following code in SolrJ: params.set("literal.author","anaconda"). But author is multiValued="true" in the schema, and because of that "anaconda" does not replace the previous name but is added to the end of the author n

Re: questions about autocommit & committing documents

2011-10-23 Thread darul
Could someone explain the different use cases when both or only one of the autoCommit parameters is set? I really need to understand it. For example, with these configurations: 1 or 1000 or 1 1000 Thanks to everyone -- View this message in context: http://lucen

Re: Selective Result Grouping

2011-10-23 Thread Martijn v Groningen
> The current grouping functionality using group.field is basically > all-or-nothing: all documents will be grouped by the field value or none > will. So there would be no way to, for example, collapse just the videos or > images like they do in google. When using the group.field option values must

Re: Solr indexing plugin: skip single faulty document?

2011-10-23 Thread Erick Erickson
Some work has been done in this general area, see SOLR-445. That might give you some pointers Best Erick On Mon, Oct 17, 2011 at 11:00 AM, samuele.mattiuzzo wrote: > Hi all, as far as i know, when solr finds a faulty document (inside an xml > containing let say 1000 docs) it skips the whole

Re: multiple document types in a core

2011-10-23 Thread Erick Erickson
Yes, stored fields are placed verbatim for every doc. But I wonder at the utility of trying to share stored information. The stored info is put in certain files in the index, see: http://lucene.apache.org/java/3_0_2/fileformats.html#file-names and the files that store data are pretty much irreleva

Re: use lucene to create index(with synonym) and solr query index

2011-10-23 Thread Erick Erickson
I'm not quite sure what you're asking, but the values returned for documents to the client are the *stored* values, not the indexed values. So your synonyms will never be returned as part of a document. Does that help? Best Erick On Wed, Oct 19, 2011 at 4:23 AM, cmd wrote: > 1.use lucene to cre

Re: Find Documents with field = maxValue

2011-10-23 Thread Erick Erickson
Right, but consider the general case. You could potentially return every document in your index in a single packet with this functionality. I suspect that this is an edge case that you'll have to 1> implement the two-or-more query solution 2> write your own component that investigates the terms in

Re: where is solr data import handler looking for my file?

2011-10-23 Thread Erick Erickson
I think you need to back up and state the problem you're trying to solve. Offhand, it looks as though you're trying to do something with DIH that it wasn't intended to do. But that's just a guess since the details of what you're trying to do are so sparse... Best Erick On Wed, Oct 19, 2011 at 10:

Re: Dismax and phrases

2011-10-23 Thread Erick Erickson
Hmmm dismax is, indeed, different. Note that dismax doesn't respect the default operator at all, so don't be misled there. Could you paste the debug output for both the queries? Perhaps something will jump out at us. Best Erick On Thu, Oct 20, 2011 at 11:08 AM, Hyttinen Lauri wrote: > Thank yo

Re: Question about near query order

2011-10-23 Thread Erick Erickson
Just to chime in here... You will get different results for "A B"~2 and "B A"~2. In the simple two-term case, changing the order requires an extra move(s). There's a very good explanation of this in Lucene In Action II. Best Erick On Thu, Oct 20, 2011 at 3:35 PM, Jason, Kim wrote: > Which one is
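
To make the order dependence concrete, here is a minimal sketch of the usual "number of moves" model for a two-term sloppy phrase. It is a simplified illustration, not Lucene's actual scorer; the function name required_slop and the sample document "A x B" are assumptions made up for this example.

```python
def required_slop(first_pos, second_pos):
    """Smallest slop at which a two-term phrase query matches, given the
    document positions of the query's first and second term (simplified model)."""
    # In-order adjacent terms (second_pos == first_pos + 1) need slop 0;
    # every position of displacement from that layout costs one move.
    return abs(second_pos - first_pos - 1)


# Hypothetical document "A x B": A at position 0, B at position 2.
positions = {"A": 0, "B": 2}

slop_a_b = required_slop(positions["A"], positions["B"])  # query "A B"
slop_b_a = required_slop(positions["B"], positions["A"])  # query "B A"

print(slop_a_b, slop_b_a)  # prints: 1 3
```

Under this model "A B"~2 matches the sample document while "B A"~2 does not, which is the asymmetry described above.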

Re: how to handle large relational data in Solr

2011-10-23 Thread Erick Erickson
In addition to Otis' suggestion, think about using multivalued fields with a position increment gap of, say, 100 (assuming your accessories had fewer than 100 fields). Then proximity searches with a slop < 100 (e.g. "red swing"~90) would not match across your multiple entries. If this is clea
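
A rough sketch of why the gap keeps proximity matches inside one entry. This is a simplified model of token positions in a multivalued field, not Lucene code; assign_positions, min_slop and the sample values are hypothetical names and data for illustration only.

```python
def assign_positions(values, gap):
    """Return (token, position) pairs for a multivalued field, adding
    `gap` extra positions between consecutive values."""
    positions = []
    pos = -1
    for index, value in enumerate(values):
        if index > 0:
            pos += gap  # jump ahead before the next value starts
        for token in value.split():
            pos += 1
            positions.append((token, pos))
    return positions


def min_slop(positions, first, second):
    """Smallest slop at which the phrase "first second" would match."""
    firsts = [p for t, p in positions if t == first]
    seconds = [p for t, p in positions if t == second]
    return min(abs(s - f - 1) for f in firsts for s in seconds)


one_entry = assign_positions(["red swing"], gap=100)
two_entries = assign_positions(["red chair", "blue swing"], gap=100)

same_entry_slop = min_slop(one_entry, "red", "swing")
cross_entry_slop = min_slop(two_entries, "red", "swing")

print(same_entry_slop, cross_entry_slop)  # prints: 0 102
```

With a gap of 100, "red swing"~90 matches when both words are in the same entry but not when they come from two different entries.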

Re: OS Cache - Solr

2011-10-23 Thread Erick Erickson
Think about using cores rather than instances if you really must have this kind of separation. Otherwise you might have much better luck combining these into a single index. Best Erick On Fri, Oct 21, 2011 at 7:07 AM, Sujatha Arun wrote: > Yes its same ,we have a base static schema and wherever

Question about dismax and score boost with date

2011-10-23 Thread Craig Stadler
Solr Specification Version: 1.4.0 Solr Implementation Version: 1.4.0 833479 - grantingersoll - 2009-11-06 12:33:40 Lucene Specification Version: 2.9.1 Lucene Implementation Version: 2.9.1 832363 - 2009-11-03 04:37:25 precisionStep="6" positionIncrementGap="0"/> stored="false" omitNorms="true"

Re: inconsistent results when faceting on multivalued field

2011-10-23 Thread Erick Erickson
I think the key here is you are a bit confused about what the multiValued thing is all about. The fq clause says, essentially, "restrict all my search results to the documents where 1213206 occurs in sou_codeMetier". That's *all* the fq clause does. Now, by saying facet.field=sou_codeMetier you're
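
A toy illustration of that point, assuming three made-up documents: the filter only selects documents, and faceting then counts every value those documents carry, including values other than the one filtered on. The value 1110101 and the helper names are hypothetical.

```python
from collections import Counter

# Hypothetical documents with a multivalued sou_codeMetier field.
docs = [
    {"id": 1, "sou_codeMetier": ["1213206", "1110101"]},
    {"id": 2, "sou_codeMetier": ["1213206"]},
    {"id": 3, "sou_codeMetier": ["1110101"]},
]


def apply_filter(documents, field, value):
    """Keep documents in which `value` occurs in the multivalued `field`."""
    return [d for d in documents if value in d[field]]


def facet_counts(documents, field):
    """Count every value of `field` across the given documents."""
    return Counter(v for d in documents for v in d[field])


matched = apply_filter(docs, "sou_codeMetier", "1213206")
counts = facet_counts(matched, "sou_codeMetier")

print(dict(counts))
```

Document 1 passes the filter and also carries 1110101, so that value shows up in the facet counts even though it was never part of the fq.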

Re: SOLRNET combine LocalParams with SolrMultipleCriteriaQuery?

2011-10-23 Thread Erick Erickson
Hmmm, this is the Java forum, you might get a faster response on the Solr .net users list, especially since I don't find any reference to SolrMultipleCriteriaQuery in the Java 3.x code. Best Erick On Fri, Oct 21, 2011 at 1:44 PM, Grüger, Joscha wrote: > Hello, > > does anybody know how to c

Re: Can Solr handle large text files?

2011-10-23 Thread Erick Erickson
Also be aware that by default Solr is configured to only index the first 10,000 tokens of a field. See maxFieldLength in solrconfig.xml. Best Erick On Fri, Oct 21, 2011 at 7:34 PM, Peter Spam wrote: > Thanks for your note, Anand.  What was the maximum chunk size for you?  Could > you post the releva

Re: Date boosting with dismax question

2011-10-23 Thread Erick Erickson
Have you seen this? http://wiki.apache.org/solr/SolrRelevancyFAQ#How_can_I_boost_the_score_of_newer_documents Best Erick On Sat, Oct 22, 2011 at 3:26 AM, Craig Stadler wrote: > Solr Specification Version: 1.4.0 > Solr Implementation Version: 1.4.0 833479 - grantingersoll - 2009-11-06 > 12:33:4

Re: Date boosting with dismax question

2011-10-23 Thread Craig Stadler
Yes I have and I cannot get it to work. Perhaps something is out of version for my setup? I tried for 3 hours to get every example I could find to work. - Original Message - From: "Erick Erickson" To: Sent: Sunday, October 23, 2011 5:07 PM Subject: Re: Date boosting with dismax questi

Re: Update document field with solrj

2011-10-23 Thread Erick Erickson
You cannot update a single field in a document in Solr; you need to replace the entire document. multiValued is irrelevant to this problem. Or did I misunderstand your problem? Best Erick On Sun, Oct 23, 2011 at 1:32 PM, hadi wrote: > I want to edit document field in solr, for example edit the
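
A minimal sketch of the "replace the entire document" approach, using plain dictionaries rather than a real client. build_replacement and the sample document are hypothetical; the sketch assumes every field you need is available to you (for instance because it is stored), since values that are not stored cannot be read back from the index.

```python
def build_replacement(existing_doc, field, new_values):
    """Return a full copy of existing_doc with one field overwritten."""
    replacement = {
        name: list(value) if isinstance(value, list) else value
        for name, value in existing_doc.items()
    }
    # Overwrite the whole multivalued field instead of appending to it.
    replacement[field] = list(new_values)
    return replacement


# Hypothetical document as it currently exists.
existing = {"id": "doc1", "title": "Snakes", "author": ["python"]}

updated = build_replacement(existing, "author", ["anaconda"])

print(updated)
```

Re-adding the updated document under the same unique key replaces the old one, so the author field ends up holding only the new value.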

Re: questions about autocommit & committing documents

2011-10-23 Thread Erick Erickson
A full commit of all pending documents is performed whenever the first trigger is reached. So, say maxDocs = 1000 and maxTime = 1 minute. Index a packet with 999 docs. Index another packet with 50 documents immediately after. One commit of 1049 documents happens. Index a packet of 999 docs. Do nothin
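
The "first trigger wins" rule can be sketched with a small simulation. This is a toy model, not Solr's update handler: AutoCommitModel, its methods and the timestamps are assumptions for illustration, and the idle scenario is this sketch's own example of the time trigger.

```python
class AutoCommitModel:
    """Toy model: commit all pending docs when either limit is reached."""

    def __init__(self, max_docs, max_time_s):
        self.max_docs = max_docs
        self.max_time_s = max_time_s
        self.pending = 0
        self.oldest_pending_at = None
        self.commits = []  # size of each commit, in order

    def _commit(self):
        self.commits.append(self.pending)
        self.pending = 0
        self.oldest_pending_at = None

    def tick(self, now_s):
        """Fire the time trigger if the oldest pending doc is old enough."""
        if self.pending and now_s - self.oldest_pending_at >= self.max_time_s:
            self._commit()

    def add(self, n_docs, now_s):
        """Index a packet of n_docs at time now_s (seconds)."""
        self.tick(now_s)
        if self.pending == 0:
            self.oldest_pending_at = now_s
        self.pending += n_docs
        if self.pending >= self.max_docs:  # document-count trigger
            self._commit()


# 999 docs, then 50 more right away: the count trigger fires once.
burst = AutoCommitModel(max_docs=1000, max_time_s=60)
burst.add(999, now_s=0)
burst.add(50, now_s=1)

# 999 docs, then nothing for a minute: the time trigger fires.
idle = AutoCommitModel(max_docs=1000, max_time_s=60)
idle.add(999, now_s=0)
idle.tick(now_s=60)

print(burst.commits, idle.commits)  # prints: [1049] [999]
```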

Re: Date boosting with dismax question

2011-10-23 Thread Erick Erickson
Define "not working". Show what you're getting and what you expect to find. Show your data. Note that the example given boosts on quite coarse dates, it *tends* to make documents published in a particular *year* score higher. You might review: http://wiki.apache.org/solr/UsingMailingLists Best Er
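
To see why that kind of boost is coarse, here is a small worked sketch. It assumes the FAQ example being discussed is the recip(ms(NOW,mydatefield),3.16e-11,1,1) form, where recip(x,m,a,b) = a / (m*x + b); the helper names below are made up for the illustration.

```python
MS_PER_DAY = 24 * 3600 * 1000
MS_PER_YEAR = 365.25 * MS_PER_DAY


def recip(x, m, a, b):
    """Reciprocal function: a / (m * x + b)."""
    return a / (m * x + b)


def age_boost(age_ms):
    """Boost for a document that is age_ms milliseconds old."""
    # 3.16e-11 is roughly 1 / (milliseconds per year).
    return recip(age_ms, 3.16e-11, 1, 1)


for label, age in [("new", 0), ("1 day", MS_PER_DAY),
                   ("1 year", MS_PER_YEAR), ("10 years", 10 * MS_PER_YEAR)]:
    print(label, round(age_boost(age), 4))
```

A document that is one day old gets almost the same boost as a brand-new one, while a year of age roughly halves it, so differences only become noticeable at the scale of years.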

Re: where is solr data import handler looking for my file?

2011-10-23 Thread Fred Zimmerman
Figured it out. See step 12 in http://business.zimzaz.com/wordpress/2011/10/how-to-clone-wikipedia-mirror-and-index-wikipedia-with-solr/. Thanks! On Sun, Oct 23, 2011 at 1:31 PM, Erick Erickson wrote: > I think you need to back up and state the problem you're trying to > solve. Offhand, it look

schema.xml bloat?

2011-10-23 Thread Fred Zimmerman
Hi, it seems from my limited experience thus far that as new data types are added, schema.xml will tend to become bloated with many different field and fieldtype definitions. Is this a problem in real life, and if so, what strategies are used to address it? FredZ

Re: schema.xml bloat?

2011-10-23 Thread Erik Hatcher
On Oct 23, 2011, at 19:34 , Fred Zimmerman wrote: > it seems from my limited experience thus far that as new data types are > added, schema.xml will tend to become bloated with many different field and > fieldtype definitions. Is this a problem in real life, and if so, what > strategies are used

questions on query format

2011-10-23 Thread Memory Makers
Hi, I've spent quite some time reading up on the query format and can't seem to solve this problem: 1. If I send Solr the following query: q={!lucene}profile_description:* I get what I would expect. 2. If I send Solr the following query: q=*:* I get nothing, just: Would appreciate some

Re: schema.xml bloat?

2011-10-23 Thread Fred Zimmerman
So, basically, yes, it is a real problem and there is no designed solution? e.g. optional sub-schema files that can be turned off and on? On Sun, Oct 23, 2011 at 6:38 PM, Erik Hatcher wrote: > > On Oct 23, 2011, at 19:34 , Fred Zimmerman wrote: > > it seems from my limited experience thus far th

Re: schema.xml bloat?

2011-10-23 Thread Erik Hatcher
On Oct 23, 2011, at 20:23 , Fred Zimmerman wrote: > So, basically, yes, it is a real problem and there is no designed solution? Hmmm problem? Not terribly so, is it? Certainly I'm more for a de-XMLification of configuration myself though. And we probably should bake-in all the basic fi

data-import problem

2011-10-23 Thread Radha Krishna Reddy
Hi, I am trying to configure Solr on an AWS Ubuntu instance. I have MySQL on a different server, so I created an SSH tunnel for MySQL on port 3309. I downloaded the MySQL JDBC driver and copied it to the lib folder. *I edited the example/solr/conf/solrconfig.xml* data-config.xml *example/solr/conf/da

Re: questions on query format

2011-10-23 Thread Ahmet Arslan
> 2. If send solr the following query: >   q=*:* > >   I get nothing just: >     name="response" numFound="0" start="0" > maxScore="0.0"/> name="highlighting"/> > > Would appreciate some insight into what is going on. If you are using dismax as query parser, then *:* won't function as match all
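
One commonly used way to get a match-all with dismax is to leave q out and pass q.alt=*:* instead; whether that fits this particular setup is an assumption. The sketch below only builds the request parameters; match_all_params and the rows value are hypothetical.

```python
from urllib.parse import parse_qs, urlencode


def match_all_params(def_type="dismax", rows=10):
    """Build a query string that asks dismax for all documents via q.alt."""
    params = {
        "defType": def_type,
        "q.alt": "*:*",  # used by dismax when q is not supplied
        "rows": rows,
    }
    return urlencode(params)


query_string = match_all_params()
decoded = parse_qs(query_string)

print(query_string)
print(decoded["q.alt"])  # prints: ['*:*']
```

The encoded string can be appended to the select URL of whatever Solr instance is in use.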

Re: Dismax and phrases

2011-10-23 Thread Hyttinen Lauri
On 10/23/2011 09:34 PM, Erick Erickson wrote: Hmmm dismax is, indeed, different. Note that dismax doesn't respect the default operator at all, so don't be misled there. Could you paste the debug output for both the queries? Perhaps something will jump out at us. Best Erick Thank you Erick. I'

Re: Want to support "did you mean xxx" but is Chinese

2011-10-23 Thread Floyd Wu
Hi Li Li, Thanks for your detailed explanation. Basically I have an implementation similar to yours. I just want to know if there is a better, complete solution. I'll keep trying and see if I have any improvements that I can share with you and the community. Any ideas or advice are welcome. Floyd 2