Re: Solr 3.5 takes very long to commit gradually

2012-04-12 Thread Jan Høydahl
What operating system? Are you using spellchecker with buildOnCommit? Anything special in your Update Chain? -- Jan Høydahl, search solution architect Cominvent AS - www.cominvent.com Solr Training - www.solrtraining.com On 12. apr. 2012, at 06:45, Rohit wrote: > We recently migrated from solr3.

Re: Solr 3.5 takes very long to commit gradually

2012-04-12 Thread Tirthankar Chatterjee
Hi Rohit, What would be the average size of your documents and also can you please share your idea of having 2 cores in the master. I just wanted to know the reasoning behind the design. Thanks in advance Tirthankar On Apr 12, 2012, at 3:19 AM, Jan Høydahl wrote: > What operating system? > A

Re: Solr 3.5 takes very long to commit gradually

2012-04-12 Thread Tirthankar Chatterjee
Hi Rohit, Can you please check the solrconfig.xml in 3.5 and compare it with 3.1 if there are any warming queries specified while opening the searchers after a commit. Thanks, Tirthankar On Apr 12, 2012, at 3:30 AM, Tirthankar Chatterjee wrote: > Hi Rohit, > What would be the average size of

RE: Solr 3.5 takes very long to commit gradually

2012-04-12 Thread Rohit
Hi Tirthankar, The average document size is a few KB; these are mostly tweets being saved. The two cores store different kinds of data and nothing else. Regards, Rohit Mobile: +91-9901768202 About Me: http://about.me/rohitg -Original Message- From: Tirthankar Chat

RE: Solr 3.5 takes very long to commit gradually

2012-04-12 Thread Rohit
Operating system is Ubuntu Linux. No, I am not using the spellchecker. Only language detection is in my update chain. Regards, Rohit Mobile: +91-9901768202 About Me: http://about.me/rohitg -Original Message- From: Jan Høydahl [mailto:jan@cominvent.com] Sent: 12 April 2012 12:50 To: solr-user@luc

Re: Solr 3.5 takes very long to commit gradually

2012-04-12 Thread Tirthankar Chatterjee
thanks Rohit.. for the information. On Apr 12, 2012, at 4:08 AM, Rohit wrote: > Hi Tirthankar, > > The average size of documents would be a few Kb's this is mostly tweets > which are being saved. The two cores are storing different kind of data and > nothing else. > > Regards, > Rohit > Mobile:

Problem to integrate Solr in Jetty (the first example in the Apache Solr 3.1 Cookbook)

2012-04-12 Thread Bastian Hepp
Hi, I'm using Apache Solr 3.5.0 and Jetty 8.1.2 with Windows 7. (Versions in the Book used... Solr 3.1, Jetty 6.1.26) I've tried to get Solr running with Jetty. - I copied the jetty.xml and the webdefault.xml from the example Solr. - I copied the solr.war to webapps - I copied the solr directory

Re: Facets involving multiple fields

2012-04-12 Thread Marc SCHNEIDER
Hi, Thanks for your answer. Let's say I have two fields : 'keywords' and 'short_title'. For these fields I'd like to make a faceted search : if 'Computer' is stored in at least one of these fields for a document I'd like to get it added in my results. doc1 => keywords : 'Computer' / short_title : '

Lexical analysis tools for German language data

2012-04-12 Thread Michael Ludwig
Given an input of "Windjacke" (probably "wind jacket" in English), I'd like the code that prepares the data for the index (tokenizer etc) to understand that this is a "Jacke" ("jacket") so that a query for "Jacke" would include the "Windjacke" document in its result set. It appears to me that such

Re: Large Index and OutOfMemoryError: Map failed

2012-04-12 Thread Michael McCandless
Your largest index has 66 segments (690 files) ... biggish but not insane. With 64K maps you should be able to have ~47 searchers open on each core. Enabling compound file format (not the opposite!) will mean fewer maps ... ie should improve this situation. I don't understand why Solr defaults t

codecs for sorted indexes

2012-04-12 Thread Carlos Gonzalez-Cadenas
Hello, We're using a sorted index in order to implement early termination efficiently over an index of hundreds of millions of documents. As of now, we're using the default codecs coming with Lucene 4, but we believe that due to the fact that the docids are sorted, we should be able to do much bet

AW: Lexical analysis tools for German language data

2012-04-12 Thread Michael Ludwig
> Given an input of "Windjacke" (probably "wind jacket" in English), > I'd like the code that prepares the data for the index (tokenizer > etc) to understand that this is a "Jacke" ("jacket") so that a > query for "Jacke" would include the "Windjacke" document in its > result set. > > It appears t

Re: Lexical analysis tools for German language data

2012-04-12 Thread Paul Libbrecht
Michael, I've been on this list and the Lucene list for several years and have not found this yet. It's been one of the "neglected topics", to my taste. There is a CompoundAnalyzer but it requires the compounds to be dictionary based, as you indicate. I am convinced there's a way to build the de-compound

Re: Lexical analysis tools for German language data

2012-04-12 Thread Bernd Fehling
You might have a look at: http://www.basistech.com/lucene/ On 12.04.2012 11:52, Michael Ludwig wrote: > Given an input of "Windjacke" (probably "wind jacket" in English), I'd > like the code that prepares the data for the index (tokenizer etc) to > understand that this is a "Jacke" ("jacket")

Re: EmbeddedSolrServer and StreamingUpdateSolrServer

2012-04-12 Thread pcrao
Hi Mikhail Khludnev, Thank you for the reply. I think the index is getting corrupted because StreamingUpdateSolrServer is keeping reference to some index files that are being deleted by EmbeddedSolrServer during commit/optimize process. As a result when I Index(Full) using EmbeddedSolrServer and t

Re: Lexical analysis tools for German language data

2012-04-12 Thread Valeriy Felberg
If you want the query "jacke" to match a document containing the word "windjacke" or "kinderjacke", you could use a custom update processor. This processor could search the indexed text for words matching the pattern ".*jacke" and inject the word "jacke" into an additional field which you can searc
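
To illustrate the idea in the message above, here is a minimal sketch of the matching-and-injecting logic only. It is plain Python, not a Solr update processor, and the field names `text` and `compound_heads` are hypothetical; the original message does not name them.

```python
import re

# Hypothetical field names, chosen only for this illustration.
SOURCE_FIELD = "text"
HEAD_FIELD = "compound_heads"

# A word that ends in "jacke" and has at least one character before it,
# i.e. a compound such as "windjacke", but not the bare word "jacke".
JACKE_COMPOUND = re.compile(r"\b\w+jacke\b", re.IGNORECASE)


def add_compound_heads(doc):
    """Return a copy of doc with "jacke" injected if a compound is present."""
    enriched = dict(doc)
    if JACKE_COMPOUND.search(doc.get(SOURCE_FIELD, "")):
        enriched[HEAD_FIELD] = ["jacke"]
    return enriched
```

A query against the extra field would then find "Windjacke" documents when searching for "jacke". A hard-coded pattern only covers one head word, so a real processor would presumably need a list of heads or a dictionary.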

Re: Lexical analysis tools for German language data

2012-04-12 Thread Paul Libbrecht
Bernd, can you please say a little more? I think this list is ok to contain some description for commercial solutions that satisfy a request formulated on list. Is there any product at BASIS Tech that provides a compound-analyzer with a big dictionary of decomposed compounds in German? If yes,

Solr Scoring

2012-04-12 Thread Kissue Kissue
Hi, I have a field in my index called itemDesc which i am applying EnglishMinimalStemFilterFactory to. So if i index a value to this field containing "Edges", the EnglishMinimalStemFilterFactory applies stemming and "Edges" becomes "Edge". Now when i search for "Edges", documents with "Edge" score

two structures in solr

2012-04-12 Thread tkoomzaaskz
Hi all, I'm a solr newbie, so sorry if I do anything wrong ;) I want to use SOLR not only for fast text search, but mainly to create a very fast search engine for a high-traffic system (MySQL would not do the job if the db grows too big). I need to store *two big structures* in SOLR: projects an

Re: Question about solr.WordDelimiterFilterFactory

2012-04-12 Thread Erick Erickson
WordDelimiterFilterFactory will _almost_ do what you want by setting things like catenateWords=0 and catenateNumbers=1, _except_ that the punctuation will be removed. So 12.34 -> 1234 ab,cd -> ab cd is that "close enough"? Otherwise, writing a simple Filter is probably the way to go. Best Erick
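
The two examples above can be mimicked with a rough emulation. This is only a sketch of the observable effect described in the message (punctuation dropped, number parts joined, word parts kept separate); it is not the filter's actual algorithm, and the function name `delimit` is made up for the illustration.

```python
import re


def delimit(token):
    """Split on non-alphanumerics; join the parts only if all are numeric."""
    parts = [part for part in re.split(r"[^0-9A-Za-z]+", token) if part]
    if parts and all(part.isdigit() for part in parts):
        # Mimics the catenateNumbers=1 effect: "12.34" becomes "1234".
        return ["".join(parts)]
    # Mimics catenateWords=0: word parts stay separate tokens.
    return parts
```

Because the delimiter itself is discarded, any two numeric inputs that differ only in punctuation end up as the same token under this emulation.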

Dismax request handler differences Between Solr Version 3.5 and 1.4

2012-04-12 Thread mechravi25
Hi, We are currently using Solr (version 1.4.0.2010.01.13.08.09.44). We have a strange situation with the dismax request handler. When we search for a keyword and append qt=dismax, we are not getting any results. The Solr request is as follows: http://local:8983/solr/core2/select/?q=Bank&version=2.

Re: Facets involving multiple fields

2012-04-12 Thread Erick Erickson
facet.query=keywords:computer short_title:computer seems like what you're asking for. On Thu, Apr 12, 2012 at 3:19 AM, Marc SCHNEIDER wrote: > Hi, > > Thanks for your answer. > Let's say I have two fields : 'keywords' and 'short_title'. > For these fields I'd like to make a faceted search : if 'Co
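
As a sketch of how the suggested facet.query could be sent, the following builds the request parameters in Python. The `q=*:*` and `rows=0` values are assumptions added for the illustration, and the space-separated clauses only behave as an OR if the default query operator is OR.

```python
from urllib.parse import urlencode


def facet_query_params(term, fields):
    """Build parameters that count documents with the term in any given field."""
    clause = " ".join(f"{field}:{term}" for field in fields)
    return {"q": "*:*", "rows": 0, "facet": "true", "facet.query": clause}


params = facet_query_params("computer", ["keywords", "short_title"])
# URL-encoded form, ready to append to a select URL.
query_string = urlencode(params)
```

The count for the facet query then covers every document that has the term in at least one of the listed fields.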

Re: Lexical analysis tools for German language data

2012-04-12 Thread Bernd Fehling
Paul, nearly two years ago I requested an evaluation license and tested BASIS Tech Rosette for Lucene & Solr. It worked excellently, but the price was much, much too high. Yes, they also have compound analysis for several languages including German. Just configure your pipeline in solr and setup the pr

Re: Options for automagically Scaling Solr (without needing distributed index/replication) in a Hadoop environment

2012-04-12 Thread Darren Govoni
You could use SolrCloud (for the automatic scaling) and just mount a fuse[1] HDFS directory and configure solr to use that directory for its data. [1] https://ccp.cloudera.com/display/CDHDOC/Mountable+HDFS On Thu, 2012-04-12 at 16:04 +0300, Ali S Kureishy wrote: > Hi, > > I'm trying to setup a

is there a downside to combining search fields with copyfield?

2012-04-12 Thread geeky2
hello everyone, can people give me their thoughts on this. currently, my schema has individual fields to search on. are there advantages or disadvantages to taking several of the individual search fields and combining them in to a single search field? would this affect search times, term tokeni

AW: Lexical analysis tools for German language data

2012-04-12 Thread Michael Ludwig
> From: Valeriy Felberg > If you want the query "jacke" to match a document containing the word > "windjacke" or "kinderjacke", you could use a custom update processor. > This processor could search the indexed text for words matching the > pattern ".*jacke" and inject the word "jacke" into an addi

Re: Lexical analysis tools for German language data

2012-04-12 Thread Markus Jelsma
Hi, We've done a lot of tests with the HyphenationCompoundWordTokenFilter using a FOP XML file generated from TeX for the Dutch language and have seen decent results. A bonus was that now some tokens can be stemmed properly because not all compounds are listed in the dictionary for the Hunspell

Further questions about behavior in ReversedWildcardFilterFactory

2012-04-12 Thread neosky
I asked the question in http://lucene.472066.n3.nabble.com/A-little-onfusion-with-maxPosAsterisk-tt3889226.html However, while implementing it, I have some further questions. 1. Suppose I don't use ReversedWildcardFilterFactory at index time, it seems that Solr doesn't allow the leading wild

Re: Options for automagically Scaling Solr (without needing distributed index/replication) in a Hadoop environment

2012-04-12 Thread Ali S Kureishy
Thanks Darren. Actually, I would like the system to be homogenous - i.e., use Hadoop based tools that already provide all the necessary scaling for the lucene index (in terms of throughput, latency of writes/reads etc). Since SolrCloud adds its own layer of sharding/replication that is outside Had

AW: Lexical analysis tools for German language data

2012-04-12 Thread Michael Ludwig
> From: Markus Jelsma > We've done a lot of tests with the HyphenationCompoundWordTokenFilter > using a FOP XML file generated from TeX for the Dutch language and > have seen decent results. A bonus was that now some tokens can be > stemmed properly because not all compounds are listed in the > dic

RE: Re: Options for automagically Scaling Solr (without needing distributed index/replication) in a Hadoop environment

2012-04-12 Thread Darren Govoni
SolrCloud or any other tech-specific replication isn't going to 'just work' with Hadoop replication. But with some significant custom coding anything should be possible. Interesting idea. --- Original Message --- On 4/12/2012 09:21 AM Ali S Kureishy wrote: Thanks Darren. Actually, I

Re: Question about solr.WordDelimiterFilterFactory

2012-04-12 Thread Jian Xu
Erick, Thank you for your response!  The problem with this approach is that searching for "12:34" will also match "12.34" which is not what I want. From: Erick Erickson To: solr-user@lucene.apache.org; Jian Xu Sent: Thursday, April 12, 2012 8:01 AM Subject:

RE: SOLR 3.3 DIH and Java 1.6

2012-04-12 Thread randolf.julian
Thanks guys for all the help. We moved to an upgraded O.S. version and the java script worked. - Randolf

Re: solr 3.4 with nTiers >= 2: usage of ids param causes NullPointerException (NPE)

2012-04-12 Thread Dmitry Kan
Can anyone help me out with this? Is this too complicated / unclear? I could share more detail if needed. On Wed, Apr 11, 2012 at 3:16 PM, Dmitry Kan wrote: > Hello, > > Hopefully this question is not too complex to handle, but I'm currently > stuck with it. > > We have a system with nTiers, tha

Re: Error

2012-04-12 Thread Erick Erickson
Please review: http://wiki.apache.org/solr/UsingMailingLists You haven't said whether, for instance, you're using trunk which is the only version that supports the "termfreq" function. Best Erick On Thu, Apr 12, 2012 at 4:08 AM, Abhishek tiwari wrote: > http://xyz.com:8080/newschema/mainsearch

Import null values from XML file

2012-04-12 Thread randolf.julian
We import an XML file directly to SOLR using the script called post.sh in the exampledocs. This is the script: FILES=$* URL=http://localhost:8983/solr/update for f in $FILES; do echo Posting file $f to $URL curl $URL --data-binary @$f -H 'Content-type:text/xml; charset=utf-8' echo done #

Re: Lexical analysis tools for German language data

2012-04-12 Thread Walter Underwood
German noun decompounding is a little more complicated than it might seem. There can be transformations or inflections, like the "s" in "Weihnachtsbaum" (Weihnachten/Baum). Internal nouns should be recapitalized, like "Baum" above. Some compounds probably should not be decompounded, like "Fahrrad
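
The points above (a linking "s", compounds that should stay whole) can be sketched with a toy dictionary-based splitter. Everything here is an assumption for illustration: the lexicon, the stem form "weihnacht", and the exception list are invented, and real decompounding needs a far larger dictionary and more linking elements.

```python
# Toy data, invented for this sketch.
LEXICON = {"wind", "jacke", "kinder", "weihnacht", "baum"}
LINKERS = ("", "s")       # linking elements tried between the two parts
KEEP_WHOLE = {"fahrrad"}  # compounds that should not be split


def decompound(word):
    """Split a two-part compound into [head, tail], or return it unchanged."""
    word = word.lower()
    if word in KEEP_WHOLE:
        return [word]
    for split in range(1, len(word)):
        head, tail = word[:split], word[split:]
        if tail not in LEXICON:
            continue
        for linker in LINKERS:
            # Strip the linking element, if any, from the end of the head.
            stem = head[: len(head) - len(linker)] if linker else head
            if head.endswith(linker) and stem in LEXICON:
                return [stem, tail]
    return [word]
```

This returns lowercased parts, so recapitalizing internal nouns, as the message recommends, would be a separate step.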

[Solr 4.0] Is it possible to do soft commit from code and not configuration only

2012-04-12 Thread Lyuba Romanchuk
Hi, I need to configure Solr so that the opened searcher will see a new document immediately after it is added to the index. And I don't want to perform a commit each time a new document is added. I tried to configure maxDocs=1 under autoSoftCommit in solrconfig.xml but it didn't help. Is

AW: Lexical analysis tools for German language data

2012-04-12 Thread Michael Ludwig
> From: Walter Underwood > German noun decompounding is a little more complicated than it might > seem. > > There can be transformations or inflections, like the "s" in > "Weihnachtsbaum" (Weihnachten/Baum). I remember from my linguistics studies that the terminus technicus for these is "Fugenmorph

Re: [Solr 4.0] Is it possible to do soft commit from code and not configuration only

2012-04-12 Thread Mark Miller
On Apr 12, 2012, at 11:28 AM, Lyuba Romanchuk wrote: > Hi, > > > > I need to configure Solr so that the opened searcher will see a new > document immediately after it is added to the index. > > And I don't want to perform a commit each time a new document is added. > > I tried to configu

Re: I've broken delete in SolrCloud and I'm a bit clueless as to how

2012-04-12 Thread Mark Miller
Please see the documentation: http://wiki.apache.org/solr/SolrCloud#Required_Config schema.xml You must have a _version_ field defined: On Apr 11, 2012, at 9:10 AM, Benson Margulies wrote: > I didn't have a _version_ field, since nothing in the schema says that > it's required! > > On Wed,

Re: AW: Lexical analysis tools for German language data

2012-04-12 Thread Paul Libbrecht
On 12 Apr 2012, at 17:46, Michael Ludwig wrote: >> Some compounds probably should not be decompounded, like "Fahrrad" >> (fahren/Rad). With a dictionary-based stemmer, you might decide to >> avoid decompounding for words in the dictionary. > > Good point. More or less, Fahrrad is generally ab

Re: Problem to integrate Solr in Jetty (the first example in the Apache Solr 3.1 Cookbook)

2012-04-12 Thread Shawn Heisey
On 4/12/2012 2:21 AM, Bastian Hepp wrote: When I try to start I get this error message: C:\jetty-solr>java -jar start.jar java.lang.reflect.InvocationTargetException at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(U

Re: Large Index and OutOfMemoryError: Map failed

2012-04-12 Thread Mark Miller
On Apr 12, 2012, at 6:07 AM, Michael McCandless wrote: > Your largest index has 66 segments (690 files) ... biggish but not > insane. With 64K maps you should be able to have ~47 searchers open > on each core. > > Enabling compound file format (not the opposite!) will mean fewer maps > ... ie s

Re: I've broken delete in SolrCloud and I'm a bit clueless as to how

2012-04-12 Thread Benson Margulies
On Thu, Apr 12, 2012 at 11:56 AM, Mark Miller wrote: > Please see the documentation: > http://wiki.apache.org/solr/SolrCloud#Required_Config Did I fail to find this in google or did I just goad you into a writing job? I'm inclined to write a JIRA asking for _version_ to be configurable just lik

Re: AW: Lexical analysis tools for German language data

2012-04-12 Thread Walter Underwood
On Apr 12, 2012, at 8:46 AM, Michael Ludwig wrote: > I remember from my linguistics studies that the terminus technicus for > these is "Fugenmorphem" (interstitial or joint morpheme). That is some excellent linguistic jargon. I'll file that with "hapax legomenon". If you don't highlight, you ca

Re: AW: Lexical analysis tools for German language data

2012-04-12 Thread Markus Jelsma
On Thursday 12 April 2012 18:00:14 Paul Libbrecht wrote: > On 12 Apr 2012, at 17:46, Michael Ludwig wrote: > >> Some compounds probably should not be decompounded, like "Fahrrad" > >> (fahren/Rad). With a dictionary-based stemmer, you might decide to > >> avoid decompounding for words in the dic

Re: AW: Lexical analysis tools for German language data

2012-04-12 Thread Walter Underwood
On Apr 12, 2012, at 9:00 AM, Paul Libbrecht wrote: > More or less, Fahrrad is generally abbreviated as Rad. > (even though Rad can mean wheel and bike) A synonym could handle this, since "fahren" would not be a good match. It is a judgement call, but this seems more like an equivalence "Fahrrad =

Re: codecs for sorted indexes

2012-04-12 Thread Michael McCandless
Do you mean you are pre-sorting the documents (by what criteria?) yourself, before adding them to the index? In which case... you should already be seeing some benefits (smaller index size) than had you "randomly" added them (ie the vInts should take fewer bytes), I think. (Probably the savings w

Re: Error

2012-04-12 Thread Abhishek tiwari
i am using 3.4 solr version... please assist... On Thu, Apr 12, 2012 at 8:41 PM, Erick Erickson wrote: > Please review: > > http://wiki.apache.org/solr/UsingMailingLists > > You haven't said whether, for instance, you're using trunk which > is the only version that supports the "termfreq" functio

Re: EmbeddedSolrServer and StreamingUpdateSolrServer

2012-04-12 Thread Shawn Heisey
On 4/12/2012 4:52 AM, pcrao wrote: I think the index is getting corrupted because StreamingUpdateSolrServer is keeping reference to some index files that are being deleted by EmbeddedSolrServer during commit/optimize process. As a result when I Index(Full) using EmbeddedSolrServer and then do Inc

Re: [Solr 4.0] Is it possible to do soft commit from code and not configuration only

2012-04-12 Thread Lyuba Romanchuk
Hi Mark, Thank you for reply. I tried to normalize data like in relational databases: - there are some types of documents where \ - documents with the same type have the same fields - documents with not equal types may have different fields - but all documents have "type" fi

Re: Error

2012-04-12 Thread Erick Erickson
The "termfreq" function is only valid for trunk. You're using 3.4. Since 'termfreq' is not recognized, Solr gets confused. Best Erick On Thu, Apr 12, 2012 at 10:20 AM, Abhishek tiwari wrote: > i am using 3.4 solr version... please assist... > > On Thu, Apr 12, 2012 at 8:41 PM, Erick Erickson >

Re: is there a downside to combining search fields with copyfield?

2012-04-12 Thread Shawn Heisey
On 4/12/2012 7:27 AM, geeky2 wrote: currently, my schema has individual fields to search on. are there advantages or disadvantages to taking several of the individual search fields and combining them in to a single search field? would this affect search times, term tokenization or possibly othe

Re: Problem to integrate Solr in Jetty (the first example in the Apache Solr 3.1 Cookbook)

2012-04-12 Thread Bastian Hepp
Thanks Shawn, I think I'll stay with the built-in one. I had problems with Solr Cell, but I could fix it. Greetings, Bastian On 12 April 2012 18:02, Shawn Heisey wrote: > > Bastian, > > The jetty.xml included with Solr is littered with org.mortbay class > references, which are appropriate for Jet

Re: I've broken delete in SolrCloud and I'm a bit clueless as to how

2012-04-12 Thread Mark Miller
google must not have found it - i put that in a month or so ago I believe - at least weeks. As you can see, there is still a bit to fill in, but it covers the high level. I'd like to add example snippets for the rest soon. On Thu, Apr 12, 2012 at 12:04 PM, Benson Margulies wrote: > On Thu, Apr 12

Re: I've broken delete in SolrCloud and I'm a bit clueless as to how

2012-04-12 Thread Chris Hostetter
: Please see the documentation: http://wiki.apache.org/solr/SolrCloud#Required_Config : : schema.xml : : You must have a _version_ field defined: : : Seems like this is the kind of thing that should make Solr fail hard and fast on SolrCore init if it sees you are running in cloud mode and y

Re: solr 3.4 with nTiers >= 2: usage of ids param causes NullPointerException (NPE)

2012-04-12 Thread Mikhail Khludnev
Dmitry, The last NPE in HighlightingComponent is just a sad coding issue. A few lines later we can see that the developer expected some docs not to be found: // remove nulls in case not all docs were able to be retrieved rb.rsp.add("highlighting", SolrPluginUtils.removeNulls(new SimpleOrderedMap(a

Re: solr 3.4 with nTiers >= 2: usage of ids param causes NullPointerException (NPE)

2012-04-12 Thread Yonik Seeley
On Wed, Apr 11, 2012 at 8:16 AM, Dmitry Kan wrote: > We have a system with nTiers, that is: > > Solr front base ---> Solr front --> shards Although the architecture had this in mind (multi-tier), all of the pieces are not yet in place to allow it. The errors you see are a direct result of that.

RE: solr 3.5 taking long to index

2012-04-12 Thread Rohit
Thanks for pointing these out, but I still have one concern, why is the Virtual Memory running in 300g+? Regards, Rohit Mobile: +91-9901768202 About Me: http://about.me/rohitg -Original Message- From: Bernd Fehling [mailto:bernd.fehl...@uni-bielefeld.de] Sent: 12 April 2012 11:58 To: so

RE: Solr 3.5 takes very long to commit gradually

2012-04-12 Thread Rohit
Thanks for pointing these out, but I still have one concern, why is the Virtual Memory running in 300g+? Regards, Rohit -Original Message- From: Tirthankar Chatterjee [mailto:tchatter...@commvault.com] Sent: 12 April 2012 13:43 To: solr-user@lucene.apache.org Subject: Re: Solr 3.5 takes

Re: term frequency outweighs exact phrase match

2012-04-12 Thread alxsss
In that case documents 1 and 2 will not be in the results. We need them to be shown in the results as well, but ranked after the docs with an exact match. I think omitting term frequency when calculating the ranking of phrase queries would solve this issue, but I do not see such a parameter in the configs.

Re: solr 3.4 with nTiers >= 2: usage of ids param causes NullPointerException (NPE)

2012-04-12 Thread Dmitry Kan
Mikhail, Thanks for sharing your thoughts. Yes I have tried checking for NULL and the entire chain of queries between tiers seems to work. But I suspect, that some docs will be missing. In principle, unless there is an OutOfMemory or a shard down, the doc ids should be retrieving valid documents.

Wildcard searching

2012-04-12 Thread Kissue Kissue
Hi, I am using the edismax query handler with solr 3.5. From the Solr admin interface when i do a wildcard search with the string: edge*, all documents are returned with exactly the same score. When i do the same search from my application using SolrJ to the same solr instance, only a few document

Re: solr 3.4 with nTiers >= 2: usage of ids param causes NullPointerException (NPE)

2012-04-12 Thread Dmitry Kan
Thanks Yonik, This is what I expected. How big would the change be if I started with just the Query and Highlight components? Did the change I made to QueryComponent make any sense to you? It would of course mean a custom solution, which I'm willing to contribute as a patch (in case anyone interested

Re: I've broken delete in SolrCloud and I'm a bit clueless as to how

2012-04-12 Thread Mark Miller
I think someone already made a JIRA issue like that. I think Yonik might have had an opinion about it that I cannot remember right now. On Thu, Apr 12, 2012 at 2:21 PM, Chris Hostetter wrote: > > : Please see the documentation: > http://wiki.apache.org/solr/SolrCloud#Required_Config > : > : schem

Re: Wildcard searching

2012-04-12 Thread Kissue Kissue
Correction, this difference between Solr admin scores and SolrJ scores happens with leading wildcard queries e.g. *edge On Thu, Apr 12, 2012 at 8:13 PM, Kissue Kissue wrote: > Hi, > > I am using the edismax query handler with solr 3.5. From the Solr admin > interface when i do a wildcard searc

Re: is there a downside to combining search fields with copyfield?

2012-04-12 Thread geeky2
>> You end up with one multivalued field, which means that you can only have one analyzer chain. << actually two of the three fields being considered for combination in to a single field ARE multivalued fields. would this be an issue? >> With separate fields, each field can be analyzed differ

Re: Suggester not working for digit starting terms

2012-04-12 Thread jmlucjav
Well now I am really lost... 1. yes I want to suggest whole sentences too, I want the tokenizer to be taken into account, and apparently it is working for me in 3.5.0?? I get suggestions that are like "foo bar abc". Maybe what you mention is only for file based dictionaries? I am using the field

searching across multiple fields using edismax - am i setting this up right?

2012-04-12 Thread geeky2
hello all, i just want to check to make sure i have this right. i was reading on this page: http://wiki.apache.org/solr/ExtendedDisMax, thanks to shawn for educating me. *i want the user to be able to fire a requestHandler but search across multiple fields (itemNo, productType and brand) WITHOUT

Re: Responding to Requests with Chunks/Streaming

2012-04-12 Thread Mikhail Khludnev
Hello Developers, I just want to ask: don't you think that response streaming can be useful for things like OLAP, e.g. if you have a sharded index presorted and pre-joined the BJQ way, you can calculate counts in many cube cells in parallel? Essential distributed test for response streaming just passed

Re: I've broken delete in SolrCloud and I'm a bit clueless as to how

2012-04-12 Thread Yonik Seeley
On Thu, Apr 12, 2012 at 2:21 PM, Chris Hostetter wrote: > > : Please see the documentation: > http://wiki.apache.org/solr/SolrCloud#Required_Config> : > > : schema.xml > : > : You must have a _version_ field defined: > : > : > > Seems like this is the kind of thing that should make Solr fail har

[ANNOUNCE] Apache Solr 3.6 released

2012-04-12 Thread Robert Muir
12 April 2012, Apache Solr™ 3.6.0 available The Lucene PMC is pleased to announce the release of Apache Solr 3.6.0. Solr is the popular, blazing fast open source enterprise search platform from the Apache Lucene project. Its major features include powerful full-text search, hit highlighting, facet

Re: I've broken delete in SolrCloud and I'm a bit clueless as to how

2012-04-12 Thread Chris Hostetter
: Off the top of my head: : _version_ is needed for solr cloud where a leader forwards updates to : replicas, unless you're handing update distribution yourself or : providing pre-built shards. : _version_ is needed for realtime-get and optimistic locking : : We should document for sure... but at

RE: [ANNOUNCE] Apache Solr 3.6 released

2012-04-12 Thread Robert Petersen
I think this page needs updating... it says it's not out yet. https://wiki.apache.org/solr/Solr3.6 -Original Message- From: Robert Muir [mailto:rm...@apache.org] Sent: Thursday, April 12, 2012 1:33 PM To: d...@lucene.apache.org; solr-user@lucene.apache.org; Lucene mailing list; anno

Re: [ANNOUNCE] Apache Solr 3.6 released

2012-04-12 Thread Robert Muir
Hi, Just edit it! It's a wiki page anyone can edit! There are probably other out-of-date ones too. On Thu, Apr 12, 2012 at 5:57 PM, Robert Petersen wrote: > I think this page needs updating... it says it's not out yet. > > https://wiki.apache.org/solr/Solr3.6 > > > -Original Message-

Re: I've broken delete in SolrCloud and I'm a bit clueless as to how

2012-04-12 Thread Benson Margulies
I'm probably confused, but it seems to me that the case I hit does not meet any of Yonik's criteria. I have no replicas. I'm running SolrCloud in the simple mode where each doc ends up in exactly one place. I think that it's just a bug that the code refuses to do the local deletion when there's n

Re: codecs for sorted indexes

2012-04-12 Thread Carlos Gonzalez-Cadenas
Hello Michael, Yes, we are pre-sorting the documents before adding them to the index. We have a score associated to every document (not an IR score but a document-related score that reflects its "importance"). Therefore, the document with the biggest score will have the lowest docid (we add it fir

Re: I've broken delete in SolrCloud and I'm a bit clueless as to how

2012-04-12 Thread Benson Margulies
On Thu, Apr 12, 2012 at 2:14 PM, Mark Miller wrote: > google must not have found it - i put that in a month or so ago I believe - > at least weeks. As you can see, there is still a bit to fill in, but it > covers the high level. I'd like to add example snippets for the rest soon. Mark, is it all

Re: is there a downside to combining search fields with copyfield?

2012-04-12 Thread Shawn Heisey
On 4/12/2012 1:37 PM, geeky2 wrote: can you elaborate on this and how EDisMax would preclude the need for copyfield? i am using extended dismax now in my response handlers. here is an example of one of my requestHandlers edismax all 5 itemNo^1.0 *:*

Re: Solr Scoring

2012-04-12 Thread Erick Erickson
No, I don't think there's an OOB way to make this happen. It's a recurring theme, "make exact matches score higher than stemmed matches". Best Erick On Thu, Apr 12, 2012 at 5:18 AM, Kissue Kissue wrote: > Hi, > > I have a field in my index called itemDesc which i am applying > EnglishMinimalStem

Re: Solr Scoring

2012-04-12 Thread Walter Underwood
It is easy. Create two fields, text_exact and text_stem. Don't use the stemmer in the first chain, do use the stemmer in the second. Give the text_exact a bigger weight than text_stem. wunder On Apr 12, 2012, at 4:34 PM, Erick Erickson wrote: > No, I don't think there's an OOB way to make this
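
One way to apply the weighting described above is through the `qf` parameter of a dismax-style parser. The sketch below only builds the request parameters; the boost values are hypothetical, the choice of `edismax` is an assumption, and the schema is assumed to fill both `text_exact` and `text_stem` from the same source text.

```python
def exact_over_stem_params(user_query, exact_boost=2.0, stem_boost=1.0):
    """Build query parameters that weight the unstemmed field more heavily."""
    return {
        "q": user_query,
        "defType": "edismax",
        # Boost values are placeholders; tune them against real queries.
        "qf": f"text_exact^{exact_boost} text_stem^{stem_boost}",
    }


params = exact_over_stem_params("Edges")
```

With these weights a document that matches in both fields outscores one that matches only in the stemmed field.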

Re: two structures in solr

2012-04-12 Thread Erick Erickson
You have to take off your DB hat when using Solr ... There is no problem at all having documents in the same index that are of different types. There is no penalty for field definitions that aren't used. That is, you can easily have two different types of documents in the same index. It's all abo

Re: solr 3.5 taking long to index

2012-04-12 Thread Shawn Heisey
On 4/12/2012 12:42 PM, Rohit wrote: Thanks for pointing these out, but I still have one concern, why is the Virtual Memory running in 300g+? Solr 3.5 uses MMapDirectoryFactory by default to read the index. This does an mmap on the files that make up your index, so their entire contents are s
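
To make the mmap point above concrete, here is a small Python sketch, unrelated to Solr's own code, that maps a scratch file standing in for an index file. The mapping spans the whole file, which is why memory-mapped index files count toward a process's virtual memory size.

```python
import mmap
import os
import tempfile


def map_file(path):
    """Memory-map a file read-only; return (mapped length, first 4 bytes)."""
    with open(path, "rb") as handle:
        # Length 0 maps the whole file into the process address space.
        with mmap.mmap(handle.fileno(), 0, access=mmap.ACCESS_READ) as view:
            return len(view), bytes(view[:4])


# Stand-in for an index file: a 4-byte marker padded to 1 MiB.
with tempfile.NamedTemporaryFile(delete=False) as scratch:
    scratch.write(b"SEGM" + b"\0" * (1024 * 1024 - 4))
    scratch_path = scratch.name

mapped_length, header = map_file(scratch_path)
os.remove(scratch_path)
```

Only the pages actually touched need to be resident in physical memory, so a large virtual size by itself does not mean the same amount of RAM is in use.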

Re: Dismax request handler differences Between Solr Version 3.5 and 1.4

2012-04-12 Thread Erick Erickson
Then I suspect your solrconfig is different or you're using a *slightly* different URL. When you specify defType=dismax, you're NOT going to the "dismax" requestHandler. You're specifying a "dismax" style parser, and Solr expects that you're going to provide all the parameters on the URL. To wit:
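To illustrate the point about defType, here is a rough sketch of a request that selects the dismax parser and passes its parameters explicitly on the URL. The host, port, field names and `mm` value are assumptions made for the example; they do not come from the thread.

```python
from urllib.parse import urlencode

# Hypothetical Solr location; adjust to your own install.
SOLR_SELECT_URL = "http://localhost:8983/solr/select"


def dismax_url(user_query, qf, mm=None):
    """Build a select URL that names the dismax parser via defType.

    Nothing is inherited from a handler called "dismax" in solrconfig.xml
    here, so parser parameters such as qf and mm are passed on the URL.
    """
    params = [("q", user_query), ("defType", "dismax"), ("qf", qf)]
    if mm is not None:
        params.append(("mm", mm))
    return SOLR_SELECT_URL + "?" + urlencode(params)


# Field names below are placeholders.
url = dismax_url("ipod", "name^2.0 description", mm="2")
print(url)
```

If results differ between two Solr versions, comparing the fully expanded URLs built this way is one way to rule out defaults that lived in only one solrconfig.xml.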

Re: Further questions about behavior in ReversedWildcardFilterFactory

2012-04-12 Thread Erick Erickson
There is special handling built into Solr (but not Lucene, I don't think) that deals with the reversed case; that's probably the source of your differences. Leading wildcards are extremely painful if you don't do some trick like Solr does with the reversed stuff. In order to run, you have to spin t

Re: Suggester not working for digit starting terms

2012-04-12 Thread Robert Muir
On Thu, Apr 12, 2012 at 3:52 PM, jmlucjav wrote: > Well now I am really lost... > > 1. yes I want to suggest whole sentences too, I want the tokenizer to be > taken into account, and apparently it is working for me in 3.5.0?? I get > suggestions that are like "foo bar abc".  Maybe what you mention

Re: Import null values from XML file

2012-04-12 Thread Erick Erickson
What does "treated as null" mean? Deleted from the doc? The problem here is that null-ness is kind of tricky. What behaviors do you want out of Solr in the NULL case? You can drop this out of the document by writing a custom updateHandler. It's actually quite simple to do. Best Erick On Thu, Apr

Re: codecs for sorted indexes

2012-04-12 Thread Robert Muir
On Thu, Apr 12, 2012 at 6:35 PM, Carlos Gonzalez-Cadenas wrote: > Hello Michael, > > Yes, we are pre-sorting the documents before adding them to the index. We > have a score associated to every document (not an IR score but a > document-related score that reflects its "importance"). Therefore, the

Re: searching across multiple fields using edismax - am i setting this up right?

2012-04-12 Thread Erick Erickson
Looks good on a quick glance. There are a couple of things... 1> there's no need for the "qt" param _if_ you specify the name as "/partItemNoSearch", just use blahblah/solr/partItemNoSearch. There's a JIRA about when/if you need it. Either will do; it's up to you which you prefer. 2> I'd consider
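The two ways of reaching the handler mentioned in point 1 can be sketched as follows. The handler name "/partItemNoSearch" is from the thread; the base URL, the helper names and the sample item number are assumptions for illustration only.

```python
from urllib.parse import urlencode

BASE_URL = "http://localhost:8983/solr"  # hypothetical host and port
HANDLER = "/partItemNoSearch"            # handler name from the thread


def url_by_path(user_query):
    """Address the handler directly by its path; no qt parameter needed."""
    return BASE_URL + HANDLER + "?" + urlencode({"q": user_query})


def url_by_qt(user_query):
    """Go through /select and name the handler with the qt parameter."""
    return BASE_URL + "/select?" + urlencode({"qt": HANDLER, "q": user_query})


print(url_by_path("12345"))
print(url_by_qt("12345"))
```

As the message says, either form should work; the path form simply keeps the URL shorter.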

Re: Solr Scoring

2012-04-12 Thread Erick Erickson
GAH! I had my head in "make this happen in one field" when I wrote my response, without being explicit. Of course Walter's solution is pretty much the standard way to deal with this. Best Erick On Thu, Apr 12, 2012 at 5:38 PM, Walter Underwood wrote: > It is easy. Create two fields, text_exact a

Re: solr hangs

2012-04-12 Thread Peter Markey
Thanks for the response. I have given the instance 8 GB, and it has only a few thousand documents (with 15 fields, each holding a small amount of data). Apparently the problem is that the process (the Solr Jetty instance) is consuming lots of threads; one time it consumed around 50k threads a

Re: Solr Http Caching

2012-04-12 Thread Chris Hostetter
: Are any of you using Solr Http caching? I am interested to see how people : use this functionality. I have an index that basically changes once a day : at midnight. Is it okay to enable Solr Http caching for such an index and : set the max age to 1 day? Any potential issues? : : I am using solr

Re: Does the lucene can read the index file from solr?

2012-04-12 Thread a sd
Hi neosky, how did you do it? I need to do this too. Thanks. On Thu, Apr 12, 2012 at 9:35 PM, neosky wrote: > Thanks! I will try again > > -- > View this message in context: > http://lucene.472066.n3.nabble.com/Does-the-lucene-can-read-the-index-file-from-solr-tp3902927p3905364.html > Sent from the Solr - U

Re: Options for automagically Scaling Solr (without needing distributed index/replication) in a Hadoop environment

2012-04-12 Thread Otis Gospodnetic
Hello Ali, > I'm trying to setup a large scale *Crawl + Index + Search *infrastructure > using Nutch and Solr/Lucene. The targeted scale is *5 Billion web pages*, > crawled + indexed every *4 weeks, *with a search latency of less than 0.5 > seconds. That's fine.  Whether it's doable with any te

Re: term frequency outweighs exact phrase match

2012-04-12 Thread Chris Hostetter
: I use solr 3.5 with edismax. I have the following issue with phrase : search. For example if I have three documents with content like : : 1.apache apache : 2. solr solr : 3.apache solr : : then search for apache solr displays documents in the order 1,.2,3 : instead of 3, 2, 1 because term fr

RE: solr 3.5 taking long to index

2012-04-12 Thread Rohit
The machine has a total of around 46 GB of RAM. My biggest concern is that Solr index time gradually increases and then the commit stops because of timeouts. Our commit rate is very high, but I am not able to find the root cause of the issue. Regards, Rohit Mobile: +91-9901768202 About Me: http://about.

Re: solr 3.5 taking long to index

2012-04-12 Thread Shawn Heisey
On 4/12/2012 8:42 PM, Rohit wrote: The machine has a total of around 46 GB of RAM. My biggest concern is that Solr index time gradually increases and then the commit stops because of timeouts. Our commit rate is very high, but I am not able to find the root cause of the issue. For good performance, S

Re: EmbeddedSolrServer and StreamingUpdateSolrServer

2012-04-12 Thread pcrao
Hi Shawn, Thanks for sharing your opinion. Mikhail Khludnev, what do you think of Shawn's opinion? Thanks, PC Rao. -- View this message in context: http://lucene.472066.n3.nabble.com/EmbeddedSolrServer-and-StreamingUpdateSolrServer-tp3889073p3907223.html Sent from the Solr - User mailing list

Re: Trouble handling Unit symbol

2012-04-12 Thread Rajani Maski
Hi All, I tried to index with UTF-8 encoding but the issue is still not fixed. Please see my inputs below. *Indexed XML:* 0.100 µ *Search Query - * BODY:µ numfound : 0 results obtained. *What can be the reason for this? How do I need to make the search query so that the abov
