Re: Java heap space exception in 4.2.1

2013-05-27 Thread Jam Luo
I have the same problem. On 4.1, a Solr instance could hold 8,000,000,000 docs, but on 4.2.1 an instance can only hold 400,000,000 docs; it goes OOM on facet queries. The facet field is tokenized on whitespace. May 27, 2013 11:12:55 AM org.apache.solr.common.SolrException log SEVERE: null:java.lang.RuntimeExcep

Re: Java heap space exception in 4.2.1

2013-05-27 Thread Jam Luo
Sorry, I made a typo: 8,000,000,000 -> 800,000,000 2013/5/27 Jam Luo > I have the same problem. at 4.1 ,a solr instance could take 8,000,000,000 > doc. but at 4.2.1, a instance only take 400,000,000 doc, it will oom at > facet query. the facet field was token by space. > > May 27,

Re: Why would one not use RemoveDuplicatesTokenFilterFactory?

2013-05-27 Thread Dotan Cohen
On Sun, May 26, 2013 at 8:16 PM, Jack Krupansky wrote: > The only comment I was trying to make here is the relationship between the > RemoveDuplicatesTokenFilterFactory and the KeywordRepeatFilterFactory. > > No, stemmed terms are not considered the same text as the original word. By > definition,

Re: Indexing message module

2013-05-27 Thread Gora Mohanty
On 27 May 2013 12:58, Arkadi Colson wrote: > Hi > > We would like to index our messages system. We should be able to search for > messages for specific recipients due to performance issues on our databases. > But the message is of course the same for all recipients and the message > text should b

Indexing message module

2013-05-27 Thread Arkadi Colson
Hi We would like to index our messages system. We should be able to search for messages for specific recipients due to performance issues on our databases. But the message is of course the same for all recipients and the message text should be saved only once! Is it possible to have some kin

Re: Overlapping onDeckSearchers=2

2013-05-27 Thread heaven
Hi, thanks for the response. Seems like this is the case because there are no other applications that could fire commit/optimize calls. All commits are triggered by Solr and the optimize is triggered by a cron task. Because of all that it looks like a bug in Solr. It probably should not run co

Re: Indexing message module

2013-05-27 Thread Arkadi Colson
Yes indeed... Thx! On 05/27/2013 09:33 AM, Gora Mohanty wrote: On 27 May 2013 12:58, Arkadi Colson wrote: Hi We would like to index our messages system. We should be able to search for messages for specific recipients due to performance issues on our databases. But the message is of course th

RE: Tika: How can I import automatically all metadata without specifying them explicitly

2013-05-27 Thread Gian Maria Ricci
Thanks for the help. @Alexandre: Thanks for the suggestion, I'll try to use an ExtractingRequestHandler, I thought that I was missing some DIH option :). @Erik: I'm interested in knowing them all to do various forms of analysis. I have documents coming from heterogeneous sources and I'm interested

Application connecting to SOLR cloud

2013-05-27 Thread sathish_ix
Hi, We have set up SolrCloud with ZooKeeper. ZooKeeper (localhost:8000) 1 shard (localhost:9000) 2 replicas (localhost:9001,localhost:9002) Question: We load the Solr index from a relational DB using DIH. Based on the SolrCloud documentation, the request to load the data will be forwarded

Re: Overlapping onDeckSearchers=2

2013-05-27 Thread Jack Krupansky
The intent is that optimize is obsolete and should no longer be used, especially with tiered merge policy running. In other words, merging should be occurring on the fly in Lucene now. What release of Solr are you running? -- Jack Krupansky -Original Message- From: heaven Sent: Monda

Re: How can I import automatically all metadata without specifying them explicitly

2013-05-27 Thread Jack Krupansky
Setting the uprefix parameter of SolrCell (ERH) to something like "attr_" will result in all metadata attributes that are not named in the Solr schema being given the "attr_" prefix to their metadata attribute names. For example, curl "http://localhost:8983/solr/update/extract?literal.id=doc-
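The renaming rule described in this message can be modelled with a small sketch. This is only a simplified simulation of the rule as stated (fields not named in the schema get the prefix); it is not Solr's actual code, and the class name, method name, and sample field names are hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;

// Simplified model of the uprefix rule described above; NOT Solr's implementation.
final class UprefixSketch {

    // Returns a copy of the metadata where every attribute whose name is not
    // in schemaFields has the given prefix prepended to its name.
    static Map<String, String> applyUprefix(Map<String, String> metadata,
                                            Set<String> schemaFields,
                                            String uprefix) {
        Map<String, String> out = new LinkedHashMap<>();
        for (Map.Entry<String, String> e : metadata.entrySet()) {
            String name = schemaFields.contains(e.getKey())
                    ? e.getKey()              // known to the schema: keep as-is
                    : uprefix + e.getKey();   // unknown: add the prefix
            out.put(name, e.getValue());
        }
        return out;
    }

    public static void main(String[] args) {
        // Hypothetical metadata and schema, for illustration only.
        Map<String, String> metadata = new LinkedHashMap<>();
        metadata.put("title", "Some title");
        metadata.put("producer", "Some producer");
        System.out.println(applyUprefix(metadata, Set.of("title"), "attr_").keySet());
    }
}
```

In a real setup the prefixed names would typically need to match a dynamic field in the schema; that part is outside this sketch.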

RE: Tika: How can I import automatically all metadata without specifying them explicitly

2013-05-27 Thread Alexandre Rafalovitch
Standalone Tika can also run in a network server mode. That increases data roundtrips but gives you more options. Even in .net . Regards, Alex On 27 May 2013 04:22, "Gian Maria Ricci" wrote: > Thanks for the help. > > @Alexandre: Thanks for the suggestion, I'll try to use an > ExtractingR

Re: Java heap space exception in 4.2.1

2013-05-27 Thread Erick Erickson
400M docs is quite a large number of documents for a single piece of hardware, and if you're faceting over a large number of unique values, this will chew up memory. So it's not surprising that you're seeing OOMs; I suspect you just have too many documents on a single machine. Best Erick On Mo

Re: Application connecting to SOLR cloud

2013-05-27 Thread Erick Erickson
There's no requirement to send the document to any leader, send updates to any node in the system. The documents will be automatically forwarded to the appropriate leaders. You may be getting confused by the "leader aware" solr client stuff. It's slightly more efficient to send updates to the lead

Re: Note on The Book

2013-05-27 Thread Koji Sekiguchi
Hi Jack, I'd like to ask as a person who contributed a case study article about "Automatically acquiring synonym knowledge from Wikipedia" to the book. (13/05/24 8:14), Jack Krupansky wrote: To those of you who may have heard about the Lucene/Solr book that I and two others are writing on Luce

A strange RemoteSolrException

2013-05-27 Thread Hans-Peter Stricker
Hello, I'm writing my first little Solrj program, but don't get it running because of a RemoteSolrException: Server at http://localhost:8983/solr returned non ok status:404 The server is definitely running and the url works in the browser. I am working with Solr 4.3.0. This is my source code

Re: Note on The Book

2013-05-27 Thread Jack Krupansky
If you would like to Solr-ize your contribution, that would be great. The focus of the book will be hard-core Solr. -- Jack Krupansky -Original Message- From: Koji Sekiguchi Sent: Monday, May 27, 2013 8:07 AM To: solr-user@lucene.apache.org Subject: Re: Note on The Book Hi Jack, I'd

Re: index multiple files into one index entity

2013-05-27 Thread Alexandre Rafalovitch
You did not open source it by any chance? :-) Regards, Alex. Personal blog: http://blog.outerthoughts.com/ LinkedIn: http://www.linkedin.com/in/alexandrerafalovitch - Time is the quality of nature that keeps events from happening all at once. Lately, it doesn't seem to be working. (Anonymous

Re: index multiple files into one index entity

2013-05-27 Thread Yury Kats
No, the implementation was very specific to my needs. On 5/27/2013 8:28 AM, Alexandre Rafalovitch wrote: > You did not open source it by any chance? :-) > > Regards, >Alex.

using solr for web page classification

2013-05-27 Thread Rajesh Nikam
Hello, I am working on implementing a system to categorize URLs/web pages. I would have categories like ... Adult Health Business Arts Home Science I am looking at how Lucene/Solr could help me achieve this. I came across links that mention MoreLik

sourceId of JMX

2013-05-27 Thread 菅沼 嘉一
Hello Our team faced the problem regarding the sourceId of JMX when getting the information of JMX from tomcat manager. Command: curl http://localhost:${PORT}/manager/jmxproxy?qry=solr:type=documentCache,* Here is the error log (tomcat/manager log).

Solr/Lucene Analyzer That Writes To File

2013-05-27 Thread Furkan KAMACI
Hi; I want to use Solr for academic research. As one step, I want to store tokens in a file (I will store them in a database later) without indexing them. For this kind of purpose, should I use core Lucene or Solr? Is there an example for writing a custom analyzer and just

Re: Solr/Lucene Analyzer That Writes To File

2013-05-27 Thread Rafał Kuć
Hello! Take a look at custom posting formats. For example here is a nice post showing what you can do with Lucene SimpleText codec: http://blog.mikemccandless.com/2010/10/lucenes-simpletext-codec.html However please remember that it is not advised to use that codec in production environmen

Re: Overlapping onDeckSearchers=2

2013-05-27 Thread Yonik Seeley
On Mon, May 27, 2013 at 7:11 AM, Jack Krupansky wrote: > The intent is that optimize is obsolete and should no longer be used That's incorrect. People need to understand the cost of optimize, and that its use is optional. It's up to the developer to figure out if the benefits of calling optimiz

Re: A strange RemoteSolrException

2013-05-27 Thread Shalin Shekhar Mangar
I downloaded solr 4.3.0, started it up with java -jar start.jar (from inside the example directory) and executed your program. No exceptions are thrown. Is there something you did differently? On Mon, May 27, 2013 at 5:45 PM, Hans-Peter Stricker wrote: > Hello, > > I'm writing my first little So

RE: Overlapping onDeckSearchers=2

2013-05-27 Thread Markus Jelsma
forceMerge is very useful if you delete a significant portion of an index. It can take a very long time before any merge policy decides to finally merge them all away, especially for a static or infrequently changing index. Also, having a lot of deleted docs in the index can be an issue if your

Re: Overlapping onDeckSearchers=2

2013-05-27 Thread Jack Krupansky
As the wiki does say: "if at all ... Segments are normally merged over time anyway (as determined by the merge policy), and optimize just forces these merges to occur immediately." So, the only real question here is if the optimize really does lie outside the "if at all" category and whether "

Re: sourceId of JMX

2013-05-27 Thread Shalin Shekhar Mangar
This is a bug. The sourceId should have been removed from the SolrDynamicMBean. I'll create an issue. On Mon, May 27, 2013 at 6:39 PM, 菅沼 嘉一 wrote: > Hello > > Our team faced the problem regarding the sourceId of JMX when getting the > information of JMX from tomcat manager. > > Command: > curl

Re: Overlapping onDeckSearchers=2

2013-05-27 Thread heaven
I am on 4.2.1 @Yonik Seeley I do understand the cost and run it once per 24 hours and perhaps later this interval will be increased up to a few days. In general I am optimizing not to merge the fragments but to remove deleted docs. My index refreshes quickly and number of deleted docs could reach

AW: Core admin action "CREATE" fails to persist some settings in solr.xml with Solr 4.3

2013-05-27 Thread André Widhani
I created SOLR-4862 ... I found no way to assign the ticket to somebody though (I guess it is under "Workflow", but the button is greyed out). Thanks, André

Re: sourceId of JMX

2013-05-27 Thread Shalin Shekhar Mangar
I opened https://issues.apache.org/jira/browse/SOLR-4863 On Mon, May 27, 2013 at 7:35 PM, Shalin Shekhar Mangar < shalinman...@gmail.com> wrote: > This is a bug. The sourceId should have been removed from the > SolrDynamicMBean. I'll create an issue. > > > On Mon, May 27, 2013 at 6:39 PM, 菅沼 嘉一

Re: Distributed query: strange behavior.

2013-05-27 Thread Luis Cappa Banda
Hello, guys! Well, I've done some tests and I think there is some kind of bug related to distributed search. Currently I'm setting a key field that cannot be duplicated, and I have experienced the same wrong behavior with the numFound field while changing the rows parameter. Has any

Re: A strange RemoteSolrException

2013-05-27 Thread Hans-Peter Stricker
Yes, I started it up with java -Dsolr.solr.home=example-DIH/solr -jar start.jar. Without the java options I don't get the exceptions either! (I should have checked.) What now? -- From: "Shalin Shekhar Mangar" Sent: Monday, May 27, 2013 3:58 P

Re: A strange RemoteSolrException

2013-05-27 Thread Shawn Heisey
On 5/27/2013 6:15 AM, Hans-Peter Stricker wrote: > I'm writing my first little Solrj program, but don't get it running because > of an RemoteSolrException: Server at http://localhost:8983/solr returned non > ok status:404 > > The server is definitely running and the url works in the browser. >

Re: A strange RemoteSolrException

2013-05-27 Thread Shawn Heisey
On 5/27/2013 8:24 AM, Hans-Peter Stricker wrote: > Yes, I started it up with java -Dsolr.solr.home=example-DIH/solr -jar > start.jar. That explains it. See my other reply. The solr.xml file for example-DIH does not have a defaultCoreName attribute. Thanks, Shawn
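A rough illustration of the consequence, assuming that without a defaultCoreName the client URL has to name a core explicitly. The helper below only builds the URL string; the class and method names are hypothetical, and "rss" is used because that core name appears later in this thread.

```java
// Hypothetical helper: builds a per-core Solr URL from a base URL and a core name.
final class CoreUrlSketch {

    // Joins base URL and core name, tolerating a trailing slash on the base.
    static String coreUrl(String baseUrl, String coreName) {
        String base = baseUrl.endsWith("/")
                ? baseUrl.substring(0, baseUrl.length() - 1)
                : baseUrl;
        return base + "/" + coreName;
    }

    public static void main(String[] args) {
        // The resulting string is what one would pass to the SolrJ client
        // instead of the bare http://localhost:8983/solr URL.
        System.out.println(coreUrl("http://localhost:8983/solr", "rss"));
    }
}
```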

Re: A strange RemoteSolrException

2013-05-27 Thread Hans-Peter Stricker
Dear Shawn, dear Shalin, thanks for your valuable replies! Could/should I have known better (by reading more carefully the manual)? I'll try to fix it - and I am confident that it will work! Best regards Hans -- From: "Shawn Heisey" Sent: Mond

RE: Tika: How can I import automatically all metadata without specifying them explicitly

2013-05-27 Thread Gian Maria Ricci
Thanks a lot, more useful hints, and standalone Tika could probably be a solution. I have another little question: how can I express filters in the DIH configuration to run the import of the server incrementally? Actually I have two distinct scenarios. In the first scenario I have documents stored inside datab

[blog post] Automatically Acquiring Synonym Knowledge from Wikipedia

2013-05-27 Thread Koji Sekiguchi
Hello, Sorry for cross post. I just wanted to announce that I've written a blog post on how to create synonyms.txt file automatically from Wikipedia: http://soleami.com/blog/automatically-acquiring-synonym-knowledge-from-wikipedia.html Hope that the article gives someone a good experience! koji

Re: A strange RemoteSolrException

2013-05-27 Thread Shawn Heisey
On 5/27/2013 8:34 AM, Hans-Peter Stricker wrote: > Dear Shawn, dear Shalin, > > thanks for your valuable replies! > > Could/should I have known better (by reading more carefully the manual)? I just looked at the wiki. The SolrJ wiki page doesn't mention using the core name, which I find surpris

Re: Note on The Book

2013-05-27 Thread Koji Sekiguchi
Now my contribution can be read on soleami blog in English: Automatically Acquiring Synonym Knowledge from Wikipedia http://soleami.com/blog/automatically-acquiring-synonym-knowledge-from-wikipedia.html koji (13/05/27 21:16), Jack Krupansky wrote: If you would like to Solr-ize your contributio

Specify columns to return for mlt results

2013-05-27 Thread Achim Domma
Hi, I'm executing a search and retrieve more like this results. For the search results, I can specify the columns to be returned via the "fl" parameter. The "mlt.fl" parameter defines the columns to be used for similarity calculation. The mlt-results seem to return the columns specified in "fl"

Problems with DIH in Solrj

2013-05-27 Thread Hans-Peter Stricker
I start the SOLR example with java -Dsolr.solr.home=example-DIH/solr -jar start.jar and run public static void main(String[] args) { String url = "http://localhost:8983/solr/rss"; SolrServer server; SolrQuery query; try { server = new HttpSolrServer

Re: Problems with DIH in Solrj

2013-05-27 Thread Shalin Shekhar Mangar
Your program is not specifying a command. You need to add: query.setParam("command", "full-import"); On Mon, May 27, 2013 at 9:31 PM, Hans-Peter Stricker wrote: > I start the SOLR example with > > java -Dsolr.solr.home=example-DIH/solr -jar start.jar > > and run > > public static void main(Stri

Re: Problems with DIH in Solrj

2013-05-27 Thread Hans-Peter Stricker
Marvelous!! Once again: where could/should I have read this? What kinds of concepts/keywords are "command" and "full-import"? (Couldn't find them in any config file. Where are they explained?) Anyway: Now it works like a charm! Thanks Hans

Re: Problems with DIH in Solrj

2013-05-27 Thread Shalin Shekhar Mangar
Details about the DataImportHandler are on the wiki: http://wiki.apache.org/solr/DataImportHandler In general, the SolrJ client just makes HTTP requests to the corresponding Solr APIs so you need to learn about the http parameters for the corresponding solr component. The solr wiki is your best b
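Since SolrJ just issues HTTP requests, the full-import call from this thread boils down to a URL with a "command" parameter. The following is a minimal sketch using only the JDK: it builds the URL and does not send it. The handler path "/dataimport" is an assumption (the usual DIH registration; check solrconfig.xml), and the class and method names are hypothetical.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper: shows which HTTP request a DIH command corresponds to.
final class DihRequestUrl {

    // Appends URL-encoded query parameters to coreUrl + handlerPath.
    static String buildUrl(String coreUrl, String handlerPath, Map<String, String> params) {
        StringBuilder sb = new StringBuilder(coreUrl).append(handlerPath);
        char separator = '?';
        for (Map.Entry<String, String> e : params.entrySet()) {
            sb.append(separator)
              .append(URLEncoder.encode(e.getKey(), StandardCharsets.UTF_8))
              .append('=')
              .append(URLEncoder.encode(e.getValue(), StandardCharsets.UTF_8));
            separator = '&';
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        Map<String, String> params = new LinkedHashMap<>();
        params.put("command", "full-import"); // the parameter named in the thread
        // "/dataimport" is an assumed handler path, not confirmed by the thread.
        System.out.println(buildUrl("http://localhost:8983/solr/rss", "/dataimport", params));
    }
}
```

Pasting the printed URL into a browser is a quick way to see what the handler expects before wiring the same parameters into SolrJ.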

Re: Problems with DIH in Solrj

2013-05-27 Thread Shawn Heisey
On 5/27/2013 10:20 AM, Hans-Peter Stricker wrote: > Marvelous!! > > Once again: where could/should I have read this? What kinds of > concepts/keywords are "command" and "full-import"? (Couldn't find them > in any config file. Where are they explained?) > > Anyway: Now it works like a charm! http

Unable to start solr 4.3

2013-05-27 Thread Gian Maria Ricci
I have a test VM where I usually test Solr installations. In that VM I already configured Solr 4.0 and everything went well. Today I downloaded the 4.3 version, unpacked everything, and configured Tomcat as I did for the 4.0 version, but the application does not start, and in the catalina log I find only May 2

Re: Unable to start solr 4.3

2013-05-27 Thread Alexandre Rafalovitch
The usual answer (which may or may not be relevant) is that Solr 4.3 has moved the logging libraries around and you need to copy specific library implementations to your Tomcat lib files. If that sounds like a possibility, search the mailing list for a number of detailed discussions on this topic. Rega

Prevention of heavy wildcard queries

2013-05-27 Thread Isaac Hebsh
Hi. Searching for terms with a leading wildcard is solved with ReversedWildcardFilterFactory. But what about terms with a wildcard at both the start AND the end? Such a query is heavy, and I want to disallow such queries from my users. I'm looking for a way to cause these queries to fail. I guess there i
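The kind of check being asked for can be sketched naively as follows. This is not a Lucene query parser: it assumes whitespace-separated terms with an optional "field:" prefix and ignores quoting, escaping, and nesting, so it is only an illustration of the rule (wildcard at both ends), with hypothetical names throughout.

```java
// Naive illustration only; real queries need a proper parser.
final class HeavyWildcardCheck {

    private static boolean isWildcard(char c) {
        return c == '*' || c == '?';
    }

    // A term is "heavy" if it both starts and ends with a wildcard character.
    static boolean isHeavyTerm(String term) {
        return term.length() >= 2
                && isWildcard(term.charAt(0))
                && isWildcard(term.charAt(term.length() - 1));
    }

    // Splits on whitespace, strips an optional field prefix, checks each term.
    static boolean containsHeavyTerm(String query) {
        for (String token : query.trim().split("\\s+")) {
            int colon = token.lastIndexOf(':');
            String term = colon >= 0 ? token.substring(colon + 1) : token;
            if (isHeavyTerm(term)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        System.out.println(containsHeavyTerm("title:*foo*")); // wildcard at both ends
        System.out.println(containsHeavyTerm("*:*"));         // match-all is not flagged
    }
}
```

A caller could reject the request whenever containsHeavyTerm returns true; where exactly to hook that in is what the rest of this thread discusses.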

Re: Unable to start solr 4.3

2013-05-27 Thread Shawn Heisey
On 5/27/2013 12:00 PM, Alexandre Rafalovitch wrote: > The usual answer (which may or may not be relevant) is that Solr 4.3 has > moved the logging libraries around and you need to copy specific library > implementations to your Tomcat lib files. If that sounds as a possible, > search the mailing li

RE: Unable to start solr 4.3

2013-05-27 Thread Gian Maria Ricci
Thanks, I'll check :) -- Gian Maria Ricci Mobile: +39 320 0136949 -Original Message- From: Alexandre Rafalovitch [mailto:arafa...@gmail.com] Sent: Monday, May 27, 2013 8:00 PM To: solr-user@lucene.apache.org; alkamp...@nablasoft.com Subject: Re: Unable to start solr 4.3 The usual

Re: Prevention of heavy wildcard queries

2013-05-27 Thread Roman Chyla
You are right that starting to parse the query before the query component can soon get very ugly and complicated. You should take advantage of the flex parser, it is already in lucene contrib - but if you are interested in the better version, look at https://issues.apache.org/jira/browse/LUCENE-501

RE: Unable to start solr 4.3

2013-05-27 Thread Gian Maria Ricci
Thanks a lot, it seems that Solr probably wouldn't start because all the logging libraries were missing. Once I copied all the needed logging libraries into c:\tomcat\libs, Solr started with no problem. If other people are interested, here is the link on the wiki that states changes in logging library in solr

Re: Keeping a rolling window of indexes around solr

2013-05-27 Thread Otis Gospodnetic
Hi, SolrCloud now has the same index aliasing as Elasticsearch. I can't look up the link now, but Zoie from LinkedIn has Hourglass, which is used for a circular-buffer sort of index setup, if I recall correctly. Otis Solr & ElasticSearch Support http://sematext.com/ On May 24, 2013 10:26 AM, "Saikat

Re: Keeping a rolling window of indexes around solr

2013-05-27 Thread Alexandre Rafalovitch
But how is Hourglass going to help Solr? Or is it a portable implementation? Regards, Alex. Personal blog: http://blog.outerthoughts.com/ LinkedIn: http://www.linkedin.com/in/alexandrerafalovitch - Time is the quality of nature that keeps events from happening all at once. Lately, it doesn't se

Re: Prevention of heavy wildcard queries

2013-05-27 Thread Isaac Hebsh
Thanks Roman. Based on some of your suggestions, will the steps below do the work? * Create (and register) a new SearchComponent * In its prepare method: Do for Q and all of the FQs (so this SearchComponent should run AFTER QueryComponent, in order to see all of the FQs) * Create org.apache.lucene

RE: sourceId of JMX

2013-05-27 Thread 菅沼 嘉一
Thank you, Shalin. I'll see it. >-Original Message- >From: Shalin Shekhar Mangar [mailto:shalinman...@gmail.com] >Sent: Monday, May 27, 2013 11:11 PM >To: solr-user@lucene.apache.org >Subject: Re: sourceId of JMX > >I opened https://issues.apache.org/jira/browse/SOLR-4863 > > >On Mon, May

Solr 4.3.0 geo search with multiple coordinates

2013-05-27 Thread Eric Grobler
Hi Solr experts, I have a solr 4.3 schema and xml data 51.1164,6.9612 52.3473,9.77564 If I run this query: fq={!geofilt pt=51.11,6.9 sfield=location_geo d=20} I get no result. But if I remove the second geo line and only have this geo coordinate it works: 51.1164,6.9612 *Thus it seems that

Re: Prevention of heavy wildcard queries

2013-05-27 Thread Roman Chyla
Hi Issac, it is as you say, with the exception that you create a QParserPlugin, not a search component * create QParserPlugin, give it some name, eg. 'nw' * make a copy of the pipeline - your component should be at the same place, or just above, the wildcard processor also make sure you are setti

Re: Solr 4.3.0 geo search with multiple coordinates

2013-05-27 Thread Eric Grobler
I think I found the reason/bug the type was wrong, it should be On Tue, May 28, 2013 at 1:37 AM, Eric Grobler wrote: > Hi Solr experts, > > I have a solr 4.3 schema > "solr.SpatialRecursivePrefixTreeFieldType" geo="true" distErrPct="0.025" > maxDistErr="0.09" units="degrees" /> > > multi

RE: sourceId of JMX

2013-05-27 Thread 菅沼 嘉一
Shalin, We tried using it after removing staticStats.add("sourceId"), and it seems to work with no problem. Do you know of any other side effects of removing it? Regards suganuma >-Original Message- >From: 菅沼 嘉一 [mailto:yo_sugan...@waku-2.com] >Sent: Tuesday, May 28, 2013 9:30 AM >To: solr-user@l

Re: Prevention of heavy wildcard queries

2013-05-27 Thread Isaac Hebsh
I don't want to affect the (correctness of the) real query parsing, so creating a QParserPlugin is risky. Instead, if I parse the query in my search component, it will be detached from the real query parsing (obviously this causes double parsing, but assume it's OK)... On Tue, May 28, 2013

Re: [blog post] Automatically Acquiring Synonym Knowledge from Wikipedia

2013-05-27 Thread Rajesh Nikam
Hello Koji, This seems a pretty useful post on how to create a synonyms file. Thanks a lot for sharing this! Have you shared the source code / jar for it so that it could be used? Thanks, Rajesh On Mon, May 27, 2013 at 8:44 PM, Koji Sekiguchi wrote: > Hello, > > Sorry for cross post. I jus

Re: sourceId of JMX

2013-05-27 Thread Shalin Shekhar Mangar
Suganuma, No, there shouldn't be any side effects. On Tue, May 28, 2013 at 7:13 AM, 菅沼 嘉一 wrote: > Shalin, > > We tried use it after removing staticStats.add("sourceId"), it seems going > with no problem. > Do you know any other side effects by removing it ? > > Regards > suganuma > > >-Or

Re: Benchmarking Solr

2013-05-27 Thread Otis Gospodnetic
Hi Benson, We typically use https://github.com/sematext/ActionGenerator As a matter of fact, we are using it right now to test one of our search clusters... Otis -- Solr & ElasticSearch Support http://sematext.com/ On Sun, May 26, 2013 at 10:38 AM, Benson Margulies wrote: > I'd like to run

RE: sourceId of JMX

2013-05-27 Thread 菅沼 嘉一
Thank you, Shalin. >-Original Message- >From: Shalin Shekhar Mangar [mailto:shalinman...@gmail.com] >Sent: Tuesday, May 28, 2013 2:22 PM >To: solr-user@lucene.apache.org >Subject: Re: sourceId of JMX > >Suganuma, > >No, there shouldn't be any side effects. > > >On Tue, May 28, 2013 at 7:13

delta-import tweaking?

2013-05-27 Thread Kristian Rink
Folks; playing with Solr and an existing (legacy) RDBMS structure which we can't change much, I am trying to figure out how to best make Solr's full/delta import work for me. A few thoughts: (a) The usual tutorials outline something like WHERE LASTMODIFIED > '${dih.last_index_time}' in order to

Re: multiple cache for same field

2013-05-27 Thread J Mohamed Zahoor
It does not seem to be memory footprint also ? looks too high for my index. ./zahoor On 20-May-2013, at 10:55 PM, Jason Hellman wrote: > Most definitely not the number of unique elements in each segment. My 32 > document sample index (built from the default example docs data) has the > fol

HyperLogLog for Solr

2013-05-27 Thread J Mohamed Zahoor
Hi Has anyone tried using HLL for finding unique values of a field in Solr? I am planning to use it to facet count on certain fields to reduce memory footprint. ./Zahoor

Re: Indexing message module

2013-05-27 Thread Arkadi Colson
Is it ok to just change the multivalue attribute to true and reindex the message module data? There are also other modules indexed on the same schema with multivalued = false. Will it become a problem? BR, Arkadi On 05/27/2013 09:33 AM, Gora Mohanty wrote: On 27 May 2013 12:58, Arkadi Colson