I'm planning on having one master and multiple slaves (cloud based; slaves
go up and down randomly).
The slaves should be constantly available, meaning search performance
should ideally not be affected by the updates at all.
It's unclear to me how the cluster-based replication works, does
We found that optimising too often killed our slave performance. An optimise
will cause you to merge and ship the whole index rather than just the relevant
portions when you replicate.
The change on our slaves in terms of IO and CPU as well as RAM was marked.
Andrew
Sent on the run.
On 23/
Jan's point that keeping different fields can make some statistical issues
more correct is sound.
The basic idea is that a common word in a rare language should be treated
as a common word if you are working in that language. The simplest way to
make that happen is by having a different field for
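A rough schema.xml sketch of that per-language approach (the field and type
names are illustrative, not from the original mail, and it assumes
text_en / text_fr style field types already exist in your schema):

<!-- one field per language, each with its own analysis chain, so that
     term statistics (e.g. IDF for a word common only in one language)
     stay separate per language -->
<field name="body_en" type="text_en" indexed="true" stored="true"/>
<field name="body_fr" type="text_fr" indexed="true" stored="true"/>

At query time you would then search only the field for the language you are
working in.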
I was running 32-bit Java (JDK, JRE & Tomcat) on my 64-bit Windows. For
indexing I was not able to allocate more than 1.5GB of heap space on my machine.
My Tomcat process would hit that upper bound (i.e. 1.5GB) very
quickly, so I thought of moving to 64-bit Java/Tomcat. Now I don't see
Hi,
We're building quite a large shared index of resources, using Solr. The
application that makes use of these resources is a multitenant one
(i.e., many customers using the same index). For resources that are
"private" to a customer, it's fairly easy to tag a document with their
customer ID a
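(As an illustration of that kind of tagging, a per-customer request might
carry a filter query along these lines; the field name and value are
hypothetical:)

fq=customer_id:12345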
Thanks for the article.
I am indexing each page of a document as if it were a document.
I think the answer is to configure SOLR for use of the TermVector Component:
http://wiki.apache.org/solr/TermVectorComponent
I have not tried it yet, but someone on the StackExchange forum told me to try
this on
Hi wunder,
for us, it works with internal dots when specifying the properties in
$SOLR_HOME/[core]/conf/solrcore.properties, like this:
db.url=xxx
db.user=yyy
db.passwd=zzz
$SOLR_HOME/[core]/conf/data-config.xml:
Cheers,
Chantal
On Sat, 2012-01-21 at 01:01 +0100, Walter Underwood wrote:
>
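(The data-config.xml content is cut off above; presumably the properties are
picked up through Solr's ${...} substitution, roughly along these lines, where
everything except the ${db.*} references is a placeholder:)

<dataConfig>
  <dataSource type="JdbcDataSource" driver="com.mysql.jdbc.Driver"
              url="${db.url}" user="${db.user}" password="${db.passwd}"/>
  <document>
    <entity name="item" query="SELECT id, name FROM item">
      <field column="id" name="id"/>
      <field column="name" name="name"/>
    </entity>
  </document>
</dataConfig>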
Hi Dipti,
just to make sure: are you aware of
http://wiki.apache.org/solr/DisMaxQParserPlugin
This will handle the user input in a very conventional and user-friendly
way. You just have to specify which fields you want it to search.
With the 'mm' parameter you have a powerful option to speci
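A minimal example of such a request (field names, boosts and the mm value are
only illustrative):

http://localhost:8983/solr/select?defType=dismax&q=solar+panels&qf=title^2+body&mm=2

Here mm=2 means at least two of the query clauses must match; the full mm
syntax also allows percentages and conditional rules.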
check your defaultOperator, ensure it's OR
On 23 January 2012 05:56, jawedshamshedi wrote:
> Hi
> Thanks for the reply..
> I am using NGramFilterFactory for this. But it's not working as desired.
> Like I have a field article_type that has been indexed using the below
> mentioned field type.
>
>
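For reference, the defaultOperator mentioned above is set in schema.xml; a
minimal sketch:

<solrQueryParser defaultOperator="OR"/>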
Hi,
Do you have any kind of "group" membership for your users? If you do, a
resource's list of security access tokens could be smaller, and you would avoid
re-indexing most resources when adding "normal" users, who mostly belong to
groups. The common way is to add filters on the query. You could also, on
selection, issue another query to get your additional data (if I
follow what you want).
On 22 January 2012 18:53, Dave wrote:
> I take it from the overwhelming silence on the list that what I've asked is
> not possible? It seems like the suggester component is not well supported
> or understood,
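As a sketch of the filter approach described above (the acl field name and the
token values are hypothetical):

fq=acl:(group_sales OR group_support OR user_12345)

Each document would carry the access tokens of the groups and users allowed to
see it, and the application appends the current user's tokens as a filter query.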
I'm using trunk and FVH, and even though I filter stopwords when searching, I
would like to highlight stopwords in fragments. Using a different field
without the stopwords filter did not have the desired effect.
Is there a way to do this?
Hey,
Thanks for that, I have uploaded a new patch as advised.
Cheers,
David
On 23/01/2012 1:01 PM, Erick Erickson wrote:
David:
There's some good info here:
http://wiki.apache.org/solr/HowToContribute#Working_With_Patches
But the short form is to go into solr_home and issue this command
You can update the document in the index quite frequently. I don't know what
your requirement is; another option would be to boost at query time.
On Sun, Jan 22, 2012 at 5:51 AM, Bing Li wrote:
> Dear Shashi,
>
> Thanks so much for your reply!
>
> However, I think the value of PageRank is not a static one.
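A sketch of what boosting at query time (as suggested above) could look like,
assuming a dismax-style handler and a hypothetical numeric pagerank field that
is kept up to date in the index:

q=solr&defType=dismax&qf=title+body&bf=log(pagerank)

bf adds the value of the function to the score, so a document's current
pagerank value influences ranking at query time.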
David,
Thank you for taking the time to evaluate SOLR-2585. Perhaps the title of the
issue advertises more than it delivers? (The name is borrowed from a section
in the first book listed here:
http://wiki.apache.org/lucene-java/InformationRetrieval) In any case, I think
SOLR-2585 is a step
What is the proper query URL to limit the term frequency to just one term
in a document?
Below is an example query to search for the term frequency in a document,
but it is returning the frequency for all the terms.
[
http://localhost:8983/solr/select/
Hi, I've been wondering why some of my queries did not return the
results I expected. A debugQuery resulted in the following:
"java"^0.0 OR "haskell"^0.0 OR "python"^0.0 OR ("ruby"^0.0) AND
(("programming"^0.0)) OR "programming language"^0.0 OR "code
coding"^0.0 OR -"mobile"^0.0 OR -"android"^0.0
In general, do not optimize unless you
1> have a very static index
2> actually test the search performance afterwards.
First, as Andrew says, optimizing will force a complete
copy of the entire index at replication. If you do NOT
optimize, only the most recent segments to be written
are copied.
S
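For context, the replication being described here is the ReplicationHandler
configured in solrconfig.xml; a minimal sketch (host name, port, and poll
interval are placeholders):

<!-- master solrconfig.xml -->
<requestHandler name="/replication" class="solr.ReplicationHandler">
  <lst name="master">
    <str name="replicateAfter">commit</str>
    <str name="confFiles">schema.xml,stopwords.txt</str>
  </lst>
</requestHandler>

<!-- slave solrconfig.xml -->
<requestHandler name="/replication" class="solr.ReplicationHandler">
  <lst name="slave">
    <str name="masterUrl">http://master-host:8983/solr/replication</str>
    <str name="pollInterval">00:00:60</str>
  </lst>
</requestHandler>

With this setup only the new segments are pulled on each poll, which is why
avoiding optimize keeps the transfer small.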
Please provide more info. In particular what is the
output when you attach &debugQuery=on?
Best
Erick
On Mon, Jan 23, 2012 at 5:11 AM, Lee Carroll
wrote:
> check your defaultOperator, ensure its OR
>
> On 23 January 2012 05:56, jawedshamshedi wrote:
>> Hi
>> Thanks for the reply..
>> I am using
A second, but arguably quite expert option, is to use the no-cache option.
See: https://issues.apache.org/jira/browse/SOLR-2429
The idea here is that you can specify that a filter is "expensive" and it
will only be run after all the other filters etc. have been applied.
Furthermore,
it will not b
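The syntax, per SOLR-2429, is a local param on the filter query; a sketch with
a placeholder filter:

fq={!cache=false cost=200}expensive_field:some_value

cache=false keeps the result out of the filter cache, and a higher cost pushes
its evaluation after the cheaper filters.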
Count your parentheses (anyone here speak Lisp?) I think that +
is outside the entire clause, meaning it's saying that there is
a single mandatory clause, and it's the whole thing
But boosting by 0.0 is probably a really bad thing. This may be
dropping all the scores to 0, which means "no matc
You can have a large number of cores; some people run several
hundred. Having multiple cores is preferred over having
multiple JVMs since it's more efficient at sharing system
resources. If you're running a 32 bit JVM, you are limited in
the amount of memory you can let the JVM use, so that's a
Wonderful input. Thank you very much Erick.
One question: I've been told that Solr supports a multi-core mode of operation
where you build the index on the master (optimized or not), then pass it
to the "stand-by" core on the slaves. Once the synchronization is complete
you switch, on the slave, betw
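(For reference, that kind of core swap is done through the CoreAdmin handler;
the core names below are just placeholders:)

http://localhost:8983/solr/admin/cores?action=SWAP&core=live&other=ondeck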
My first reaction is that, unless you have a specific use-case,
this is unnecessary. When using a slave the Solr replication
goes on in the background. Autowarming also is carried out
in the background. Only when the autowarming is done are
queries sent to the new (internal-to-solr) searcher. All w
Hi,
I would really appreciate any hint/guide to fix this query issue. A Java
webapp hits Solr with a query that does not return any results, but it works for
other states (FL, CA for instance).
From logs:
[code]
solr path=/select
params={facet=on&facet.mincount=5&facet.sort=count&q=listing.property.s
Hi!
On Mon, Jan 23, 2012 at 18:42, Erick Erickson wrote:
> Count your parentheses (anyone here speak Lisp?) I think that +
> is outside the entire clause, meaning it's saying that there is
> a single mandatory clause, and it's the whole thing
You're right in that case it's the whole query. P
> I would really appreciate any hint/guide to fix this query
> issue. A Java
> webapp hits solr with a query that does not returns any
> result but works for
> other states. (FL, CA for instance)
> From logs:
> [code]
> solr path=/select
> params={facet=on&facet.mincount=5&facet.sort=count&q=listin
Right. Essentially, the precedence is given to AND, so this is parsed
as though it were python OR (ruby AND programming) OR "programming language"
Best
Erick
On Mon, Jan 23, 2012 at 10:55 AM, Michael Jakl wrote:
> Hi!
>
> On Mon, Jan 23, 2012 at 18:42, Erick Erickson wrote:
>> Count your parent
Hi,
I've been trying to figure this out for a few days now and I'm just not
getting anywhere, so any pointers would be MOST welcome. I'm in the
process of upgrading from 1.3 to the latest and greatest version of
Solr and I'm getting there slowly. However I have this (final) problem
that when sending
> Below is an example query to search for the term frequency
> in a document,
> but it is returning the frequency for all the terms.
>
> [
> http://localhost:8983/solr/select/?fl=documentPageId&q=documentPageId:49667.3&qt=tvrh&tv.tf=true&tv.fl=contents][1
> ]
>
> I would like to be able to limit
I have some hierarchical data that I want to represent in the Solr UI
(/browse). I've read through many discussions on this topic, including
http://wiki.apache.org/solr/HierarchicalFaceting and
http://packtlib.packtpub.com/library/9781849516068/ch06lvl1sec09 . However, I
didn't see a solution
On Mon, Jan 23, 2012 at 22:05, Erick Erickson wrote:
> Right. Essentially, the precedence is given to AND, so this is parsed
> as though it were python OR (ruby AND programming) OR "programming language"
That's exactly what I'd expect, but the problem is that "ruby" is
marked as mandatory, that i
Hi,
I implemented the facet using
query.addFacetQuery
query.addFilterQuery
to facet on:
gender:male
state:DC
This works fine. How can I facet on multiple values using the SolrJ API, like
the following:
gender:male
gender:female
state:DC
I've tried, but it returns 0. Can anyone help?
Thanks,
-jingjung n
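A rough SolrJ sketch of one way to get a count per value (untested; it assumes
you want each count independently of the others, either as separate facet
queries or by faceting on the field):

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.SolrServer;
import org.apache.solr.client.solrj.SolrServerException;
import org.apache.solr.client.solrj.response.QueryResponse;

public class GenderFacetExample {
    public static QueryResponse facetCounts(SolrServer server) throws SolrServerException {
        SolrQuery query = new SolrQuery("*:*");
        query.setFacet(true);
        // each facet.query gets its own, independent count
        query.addFacetQuery("gender:male");
        query.addFacetQuery("gender:female");
        query.addFacetQuery("state:DC");
        // alternatively, facet on the field to get counts for all of its values
        query.addFacetField("gender");
        return server.query(query);
    }
}

Note that if the same values are also added as filter queries
(query.addFilterQuery), the filters intersect, and gender:male AND
gender:female matches nothing, which could explain the zero counts.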
(12/01/23 23:14), O. Klein wrote:
Im using trunk and FVH and eventhough I filter stopwords when searching, I
would like to highlight stopwords in fragments. Using a different field
without the stopwords filter did not have the desired effect.
Please provide more info. In particular, how your qu
Hello,
I'm no expert here (just started learning/using Solr a few months ago) but I
ran into the same issue of needing to search for and facet on the OR
abbreviation.
What worked for me was to double-escape OR (a la :\\OR) for queries and single
escape (:\OR) when doing a facet query.
The pag
Hi,
It's because lowernames=true by default in solrconfig.xml, and it will convert
any "-" into "_" in field names. So try adding a request parameter
&lowernames=false or change the default in solrconfig.xml. Alternatively, leave
as is but name your fields project_id and company_id :)
http://w
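(For example, something along these lines against the extracting handler,
where the handler path and the literal field are only illustrative:)

http://localhost:8983/solr/update/extract?lowernames=false&literal.id=doc1&commit=true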
On Mon, 23 Jan 2012 14:33:00 -0800 (PST), Yuhao
wrote:
> Programmatically, something like this might work: for each facet field,
> add another hidden field that identifies its parent. Then, program
> additional logic in the UI to show only the facet terms at the currently
> selected level. For
Another way is to store the original hierarchy in a SQL database (in
the form: id, parent_id, name, level) and, in the Lucene index, store the
complete hierarchy (from root to leaf node) for each document in one field,
using the ids from the SQL database. That way you can get documents
at any level
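A sketch of what that single field might hold for a document sitting under
Electronics (id 17) > Cameras (id 243), with made-up ids:

category_path: 1/17/243

A query for everything under Electronics could then filter on the prefix
(e.g. category_path:1/17/*), while the SQL table supplies the names and levels
for display.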
Koji Sekiguchi wrote
>
> (12/01/23 23:14), O. Klein wrote:
>> Im using trunk and FVH and eventhough I filter stopwords when searching,
>> I
>> would like to highlight stopwords in fragments. Using a different field
>> without the stopwords filter did not have the desired effect.
>
> Please provi
Hi All,
I need the community's feedback about deploying newer versions of a Solr schema
into production while the existing (older) schema is in use by applications.
How do people perform these upgrades? What have people learned
about this?
Any thoughts are welcome.
Thanks
Saroj
Thanks for your replies. I can't apply an index-time boost because I don't
know the term frequencies in advance. Additionally, new documents come in
every few minutes, which makes maintaining term frequencies outside Solr a
difficult task.
Facet prefix would probably help in this case. I thought there
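For reference, a facet.prefix request along these lines (the field name and
prefix are placeholders):

http://localhost:8983/solr/select?q=*:*&rows=0&facet=true&facet.field=suggest_text&facet.prefix=pyth

This returns the indexed terms starting with the given prefix together with
their document counts.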
Hi
Is there some index size (or number of docs) at which it becomes necessary to
break the index into shards?
I have an index of 100GB. This index grows by 10GB per year
(I don't have information on how many docs it has) and the docs will never
be deleted. Thinking 30 years out, the index will be around 40
Well, at root the Lucene query parser makes no claim of
enforcing boolean logic. Think in terms of MUST, SHOULD
and NOT instead.
Here's a good writeup...
http://www.lucidimagination.com/blog/2011/12/28/why-not-and-or-and-not/
Best
Erick
On Mon, Jan 23, 2012 at 2:43 PM, Michael Jakl wrote:
> On
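As an illustration of thinking in those terms (this is not a claim about the
original query's intent, just the operator syntax):

+programming +(java haskell python ruby) -mobile -android

Here the first two clauses are MUST, the terms inside the parentheses are
SHOULDs of which at least one has to match, and the last two are prohibited.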
Hi Mukund,
Since I have been getting this issue for a long time, I did some trial and
error. In my case I am connecting to the local Tomcat server using SolrJ.
SolrJ's default HttpClient allows only 2 connections per host and 20 in total.
As I have a heavy load and a lot of dependencies on Solr, this seems very low.
To increase the
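A rough sketch of raising those limits with SolrJ 3.x's CommonsHttpSolrServer
(the method names and numbers are assumptions to verify against your version):

import java.net.MalformedURLException;
import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;

public class SolrClientFactory {
    public static CommonsHttpSolrServer create() throws MalformedURLException {
        CommonsHttpSolrServer server =
            new CommonsHttpSolrServer("http://localhost:8080/solr");
        // raise the HttpClient defaults of 2 connections per host / 20 total
        server.setDefaultMaxConnectionsPerHost(50);
        server.setMaxTotalConnections(100);
        return server;
    }
}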
Hi All,
I have two Tomcat instances on the same server. One is for Solr and the other
is my application server. I am connecting to the Solr server with SolrJ from
the application server. As I am connecting locally, the default connection
limit seems to be very low. My server stops responding every few hours and
only comes back up when I reset the
Hi Kuli,
Did you find a solution to this problem? I am still facing it.
Please help me overcome it.
regards
On Wed, Oct 26, 2011 at 1:16 PM, Michael Kuhlmann wrote:
> Hi;
>
> we have a similar problem here. We already raised the file ulimit on the
> server to 4096, but
Hello,
AFAIK, by setting connectionManager.closeIdleConnections(0L); you are
preventing your HTTP connections from being reused, i.e. disabling keep-alive.
If you increase it enough you won't see many CLOSE_WAIT connections.
Here is some explanation and a solution for the JDK's HTTP client (URLConnection),
not for your
On Tue, Jan 24, 2012 at 06:27, Erick Erickson wrote:
> Well, at root the Lucene query parser makes no claim of
> enforcing boolean logic. Think in terms of MUST, SHOULD
> and NOT instead.
>
> Here's a good writeup...
>
> http://www.lucidimagination.com/blog/2011/12/28/why-not-and-or-and-not/
Hi,