Re: suggestions w.r.t Issue with Collections API in 4.1

2013-02-14 Thread Mark Miller
I don't know - by chance, I'm actually doing about the same sequence of events right now with Solr 4.1, and the cores are running fine… What do the logs say? - Mark On Feb 14, 2013, at 10:18 PM, Anirudha Jadhav wrote: > *1.empty Zookeeper* > *2.empty index directories for solr* > *3.empty sol

How to make this work with SOLR ( LUCENE-2899 : Add OpenNLP Analysis capabilities as a module)

2013-02-14 Thread Vinay B,
I'm trying to explore Parts-Of-Speech tagging with SOLR. Firstly, am I right in assuming that OpenNLP integration is the right direction in which to proceed? With respect to getting OpenNLP to work with SOLR ( http://wiki.apache.org/solr/OpenNLP ) , I tried following the instructions , only to be

suggestions w.r.t Issue with Collections API in 4.1

2013-02-14 Thread Anirudha Jadhav
*1.empty Zookeeper* *2.empty index directories for solr* *3.empty solr.xml* *3.1 upload / link cfg in zookeeper for test collection* *4*.* start 4 solr servers on different machines* *5. Access server* : i see that's ok *6. CREATE collection* http://hostname:15000/solr/admin/collections?a

Re: What should focus be on hardware for solr servers?

2013-02-14 Thread Otis Gospodnetic
You could run Lucene benchmark stuff and compare. Or look at ActionGenerator from Sematext on Github which you could also use for performance testing and comparing. Otis Solr & ElasticSearch Support http://sematext.com/ On Feb 14, 2013 10:56 AM, "Michael Della Bitta" < michael.della.bi...@appinion

Fetching the date based on lastupdate

2013-02-14 Thread ballusethuraman
Hi, I have a column called 'lastUpdate' in my Solr index which contains the last updated date. Now I want to fetch the last 24 last-updated dates from that column. How do I do this? Querying the Solr server with the following URL fetches me the result, http://localhost/solr/MC_10701_catalogEntry/q=lastU

Re: Query question

2013-02-14 Thread Jack Krupansky
Use the edismax query parser and set the PF, PF2, and PF3 parameters so that adjacent pairs and triples of query terms will get "phrase boosted". See: http://wiki.apache.org/solr/ExtendedDisMax#pf_.28Phrase_Fields.29 http://wiki.apache.org/solr/ExtendedDisMax#pf2_.28Phrase_bigram_fields.29 -- J
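As a rough illustration of the suggestion above, the request parameters might be assembled like this. It is only a sketch: the field name `name` is borrowed from the original question, the query text is made up, and using the same field for qf, pf, pf2, and pf3 is an assumption rather than a recommendation from the thread.

```python
from urllib.parse import urlencode


def edismax_params(user_query, field="name"):
    """Build request parameters for an edismax query with phrase boosting.

    The field name is an assumption; substitute the field(s) from your schema.
    """
    return {
        "q": user_query,
        "defType": "edismax",  # select the edismax query parser
        "qf": field,           # field(s) the individual terms are matched against
        "pf": field,           # boost documents matching the whole query as a phrase
        "pf2": field,          # boost adjacent pairs of query terms
        "pf3": field,          # boost adjacent triples of query terms
    }


params = edismax_params("big red barn")
query_string = urlencode(params)
print(query_string)
```

The resulting string would be appended to the select handler URL of whichever core is being queried.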

Query question

2013-02-14 Thread dm_tim
Howdy, I have a straight-forward index that contains a "name" field. I am currently taking a string of text, tokenizing it into individual strings and making a query out of them all against the "name" field. Note that the name field is split up by a whitespace tokenizer and a lower case filter du

Re: long QTime for big index

2013-02-14 Thread Mou
We have two boxes; they are really nice servers, with 32-core CPUs and 192 GB of memory, and both RAID arrays and Fusion IO cards. But each of them is running two instances of Solr, one for indexing and the other for searching. The search index is on a Fusion IO card. Each instance has 11 cores and a small core for making in

Re: Solr 3.3.0 - Random CPU problem

2013-02-14 Thread federico.wachs
I took your advice, waited for the servers to go down then: [ec2-user@zuk-solr-slave-02 ~]$ ps -wwwf -p 10131 UID PID PPID C STIME TTY TIME CMD tomcat 10131 1 17 23:00 ? 00:03:13 /usr/sbin/sshd This doesn't say much :( What should I do now? -- View this mess

Re: long QTime for big index

2013-02-14 Thread alxsss
Hi, I am curious to know how many Linux boxes you have and how many cores are in each of them. It was my understanding that Solr puts in memory all documents found for a keyword, not the whole index. So why must it be faster with more cores, when the number of selected documents from many sepa

Re: long QTime for big index

2013-02-14 Thread Mou
Just to close this discussion, we solved the problem by splitting the index. It turned out that distributed search across 12 cores is faster than searching two cores. All queries, Tomcat configuration, and JVM configuration remain the same. Now queries are served in milliseconds. On Thu, Jan 31, 2013 at

Re: fastest way to rebuild Solr index

2013-02-14 Thread Mingfeng Yang
Shawn, Awesome. Exactly what I am looking for. Thanks! Ming On Thu, Feb 14, 2013 at 12:00 PM, Shawn Heisey wrote: > On 2/14/2013 12:46 PM, Mingfeng Yang wrote: > >> I have a few Solr indexes, each with 20-200 million documents, which were >> indexed by querying multiple PostgreSQL data

Re: fastest way to rebuild Solr index

2013-02-14 Thread Shawn Heisey
On 2/14/2013 12:46 PM, Mingfeng Yang wrote: I have a few Solr indexes, each with 20-200 million documents, which were indexed by querying multiple PostgreSQL databases. If I rebuild the index the same way, it would take a few months, because the PostgreSQL query is slow. Now, I need to

fastest way to rebuild Solr index

2013-02-14 Thread Mingfeng Yang
I have a few Solr indexes, each with 20-200 million documents, which were indexed by querying multiple PostgreSQL databases. If I rebuild the index the same way, it would take a few months, because the PostgreSQL query is slow. Now, I need to make the following changes to all indexes. 1. de

RE: What should focus be on hardware for solr servers?

2013-02-14 Thread Toke Eskildsen
Steve Rowe [sar...@gmail.com] wrote: > On Feb 14, 2013, at 11:24 AM, Walter Underwood wrote: > > Laptop disks are slower than the EC2 disks. > My laptop disk is an SSD. So it's not a disk? ...Sorry, couldn't resist. Unfortunately Amazon only has two SSD-backed solutions and they are #3 and #2

RE: Can't determine Sort Order: 'prijs ASC', pos=5

2013-02-14 Thread Chris Hostetter
: I think the order needs to be in lowercase. Try "asc" instead of "ASC". Should be trivial to support uppercase ASC and DESC as well, not sure why no one thought of adding that before... https://issues.apache.org/jira/browse/SOLR-4458 ...patches welcome -Hoss
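Until uppercase directions are supported, a client can lowercase the direction before sending the sort parameter. A minimal sketch, assuming simple `field direction` clauses separated by commas; `normalize_sort` is a hypothetical helper, not part of Solr or any client library.

```python
def normalize_sort(sort_spec):
    """Lowercase the direction in each 'field direction' clause of a sort spec.

    Field names are left untouched because they are case-sensitive.
    """
    clauses = []
    for clause in sort_spec.split(","):
        parts = clause.split()
        # Only touch the second word, and only if it is a sort direction.
        if len(parts) == 2 and parts[1].lower() in ("asc", "desc"):
            parts[1] = parts[1].lower()
        clauses.append(" ".join(parts))
    return ",".join(clauses)


print(normalize_sort("prijs ASC"))
```

Only the direction is changed here, so a field such as `Prijs` would keep its capital letter.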

Re: Combining Solr score with customized user ratings for a document

2013-02-14 Thread Timothy Potter
Oops - that's definitely not the link I meant to give ;-) Here's the link from slideshare: http://www.slideshare.net/thelabdude/boosting-documents-in-solr-lucene-revolution-2011 In there we used Mahout to calculate recommendation scores and then loaded them using external file field. Cheers, Tim

Re: Combining Solr score with customized user ratings for a document

2013-02-14 Thread Timothy Potter
Start by looking at Solr's external file field and http://www.linkedin.com/profile/view?id=18807864&trk=tab_pro On Thu, Feb 14, 2013 at 6:24 AM, Á_o wrote: > Well, thinking a bit more, the second solution is not practical. > > If Solr retrieves, say, 1.000 documents, I would have to navigate

Re: How to define a lowercase fieldtype without tokenizer

2013-02-14 Thread Bing Hua
Works perfectly. Thank you. I didn't know this tokenizer does nothing before :) -- View this message in context: http://lucene.472066.n3.nabble.com/How-to-define-a-lowercase-fieldtype-without-tokenizer-tp4040500p4040507.html Sent from the Solr - User mailing list archive at Nabble.com.

Re: How to define a lowercase fieldtype without tokenizer

2013-02-14 Thread Upayavira
You can use a KeywordTokenizerFactory, which will tokenise into a single term, and then do your lowercasing. Does that get you what you want? Upayavira On Thu, Feb 14, 2013, at 05:11 PM, Bing Hua wrote: > Hi, > > I don't want the field to be tokenized because Solr doesn't support > sorting > on
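To see why this helps with sorting, here is a plain-Python imitation of the effect. It is not Solr configuration, and `keyword_lowercase` is a made-up name: the whole value stays one term, and only its case changes.

```python
def keyword_lowercase(value):
    """Imitate a keyword tokenizer followed by a lowercase filter."""
    # Keyword tokenizer: the entire input is kept as a single token.
    # Lowercase filter: that single token is lowercased.
    return [value.lower()]


titles = ["apple", "Banana", "cherry"]

# Sorting on the raw values puts uppercase letters first.
case_sensitive = sorted(titles)

# Sorting on the single lowercased term ignores case.
case_insensitive = sorted(titles, key=lambda t: keyword_lowercase(t)[0])

print(case_sensitive)
print(case_insensitive)
```

Because each value yields exactly one term, there is a single unambiguous sort key per document, which is what sorting needs.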

Re: Implement price range filter: DataImportHandler started. Not Initialized. No commands can be run

2013-02-14 Thread Steve Rowe
Hi Peter, Your "original query" didn't make it to the mailing list. You're experiencing a long-standing nabble bug: nabble eats code. (I've told them about it a couple of times, but the problem persists, so I guess they're not interested in fixing it.) My suggestion: don't use nabble for pos

How to define a lowercase fieldtype without tokenizer

2013-02-14 Thread Bing Hua
Hi, I don't want the field to be tokenized because Solr doesn't support sorting on a tokenized field. In order to do case insensitive sorting I need to copy a field to a lowercase but not tokenized field. How to define this? I did below but it says I need to specify a tokenizer or a class for ana

RE: Implement price range filter: DataImportHandler started. Not Initialized. No commands can be run

2013-02-14 Thread PeterKerk
Ok, something went wrong with posting the code, since I did not escape the quotes and ampersands. I tried your code, but no luck. Here's the original query I'm trying to execute. What characters do I need to escape? I thought only the < and > characters? Thanks! -- View this message in contex

Re: Multi Core / On demand loading

2013-02-14 Thread vybe3142
Thanks, We run SOLR 4.0 in production. Yesterday, I ported our configuration to 4.1 on my local workstation. I just looked at the SOLR-4400 fix versions and as per the info, I might wait till 4.2 before porting. -- View this message in context: http://lucene.472066.n3.nabble.com/Multi-Core-On-

Re: compare two shards.

2013-02-14 Thread Michael Della Bitta
If you can spare the load of a long request, I'd do an unsorted query for everything, non-paged. I'd dump that into a line-per-row format and use something like Apache Hive to do the analysis. Michael Della Bitta Appinions 18 East 41st Street, 2nd

Re: What should focus be on hardware for solr servers?

2013-02-14 Thread Steve Rowe
On Feb 14, 2013, at 11:24 AM, Walter Underwood wrote: > Laptop disks are slower than the EC2 disks. My laptop disk is an SSD.

Re: What should focus be on hardware for solr servers?

2013-02-14 Thread Michael Della Bitta
Just for sake of comparison, http://www.ec2instances.info/ At the low end, EC2 CPUs come in 1, 2, 2.5, and 3.25 unit sizes. A m2.xlarge uses 3.25 unit CPUs, so one would have to step up to the high storage, high IO, or cluster compute nodes to do better than that at single threaded tasks. Good th

RE: Implement price range filter: DataImportHandler started. Not Initialized. No commands can be run

2013-02-14 Thread Dyer, James
No, you still have to fix problems with data-config.xml. It's just that prior to 4.0-alpha, if you started Solr with a problem in the config, you had no way to fix it and refresh without restarting Solr (or at least doing a core reload). With 4.0, you can fix your config file and just retry. I t

Re: What should focus be on hardware for solr servers?

2013-02-14 Thread Walter Underwood
Just using a single CPU (log processing with Python), my MacBook Pro (2GHz Intel Core i7) is twice as fast as an m2.xlarge EC2 instance. Laptop disks are slower than the EC2 disks. EC2 is for quantity, not quality. wunder On Feb 14, 2013, at 5:10 AM, Jack Krupansky wrote: > That raises the qu

Re: compare two shards.

2013-02-14 Thread Paul
I do a brute-force regression test where I read all the documents from shard 1 and compare them to documents in shard 2. I had to have all the fields stored to do that, but in my case that doesn't change the size of the index much. So, in other words, I do a search for a page's worth of documents
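The comparison step described above could look roughly like this once the documents have been fetched. A sketch only: it assumes each document is a dict of stored fields with a unique `id` key, and `diff_shards` is a hypothetical helper; fetching pages from Solr is left out.

```python
def diff_shards(docs_a, docs_b, key="id"):
    """Compare two lists of documents and report where they disagree.

    Returns three sorted lists of keys: only in A, only in B, and present
    in both but with different field values.
    """
    index_a = {doc[key]: doc for doc in docs_a}
    index_b = {doc[key]: doc for doc in docs_b}

    only_a = sorted(index_a.keys() - index_b.keys())
    only_b = sorted(index_b.keys() - index_a.keys())
    changed = sorted(
        k for k in index_a.keys() & index_b.keys() if index_a[k] != index_b[k]
    )
    return only_a, only_b, changed


shard1 = [{"id": "1", "title": "a"}, {"id": "2", "title": "b"}, {"id": "3", "title": "c"}]
shard2 = [{"id": "2", "title": "b"}, {"id": "3", "title": "x"}, {"id": "4", "title": "d"}]

only_1, only_2, changed = diff_shards(shard1, shard2)
print(only_1, only_2, changed)
```

Comparing whole dicts only works if every field of interest is stored, which matches the caveat in the message.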

RE: Implement price range filter: DataImportHandler started. Not Initialized. No commands can be run

2013-02-14 Thread PeterKerk
Ok, but I restarted solr several times and the issue still occurs. So my guess is that the entity I added contains errors: 50' END as PriceCategory From products) Select PriceCategory, Count(*) as Cnt From Categorized Group By PriceCategory "> Or are you saying tha

RE: Implement price range filter: DataImportHandler started. Not Initialized. No commands can be run

2013-02-14 Thread Dyer, James
This looks like https://issues.apache.org/jira/browse/SOLR-2115 , which was fixed for 4.0-Alpha. Basically, if you do not put a data-config.xml file in the "defaults" section in solrconfig.xml, or if your config file has any errors, you won't be able to use DIH unless you fix the problem and r

Re: What should focus be on hardware for solr servers?

2013-02-14 Thread Michael Della Bitta
Or perhaps we should develop our own, Solr-based benchmark... Michael Della Bitta Appinions 18 East 41st Street, 2nd Floor New York, NY 10017-6271 www.appinions.com Where Influence Isn’t a Game On Thu, Feb 14, 2013 at 10:54 AM, Michael Della Bi

Re: What should focus be on hardware for solr servers?

2013-02-14 Thread Michael Della Bitta
My dual-core, HT-enabled Dell Latitude from last year has this CPU: model name : Intel(R) Core(TM) i5-2520M CPU @ 2.50GHz bogomips: 4988.65 An m3.xlarge reports: model name : Intel(R) Xeon(R) CPU E5645 @ 2.40GHz bogomips : 4000.14 I tried running geekbench and phoronix-test-suite and fa

RE: Solr 4.1.0 not using solrcore.properties ?

2013-02-14 Thread Dyer, James
Daniel, This bug has already been recorded and hopefully will be fixed in time for 4.2. See https://issues.apache.org/jira/browse/SOLR-4361 . James Dyer Ingram Content Group (615) 213-4311 -Original Message- From: Daniel Rijkhof [mailto:daniel.rijk...@gmail.com] Sent: Wednesday, Febr

Re: MockAnalyzer in Lucene: attach stemmer or any custom filter?

2013-02-14 Thread Robert Muir
MockAnalyzer is really just MockTokenizer+MockTokenFilter+ Instead you just define your own analyzer chain using MockTokenizer. This is the way all of Lucene's own analysis tests work: e.g. http://svn.apache.org/repos/asf/lucene/dev/trunk/lucene/analysis/common/src/test/org/apache/lucene/analysis

Re: Maximum Number of Records In Index

2013-02-14 Thread Macroman
Partial updates is nothing as clever as I may have made it sound, it is just changing a record value , for example last name from Smith to Jones, that's my partial update. No errors at all in indexing, I have not yet checked the logs , but the DIH output counts show no errors, here is an example

Re: Combining Solr score with customized user ratings for a document

2013-02-14 Thread Á_____o
Well, thinking a bit more, the second solution is not practical. If Solr retrieves, say, 1.000 documents, I would have to navigate through ALL (maybe less with some reasonable upper limit) of them to recalculate the scores and reorder them according to the new score although the Web App is going t

Re: Multi Core / On demand loading

2013-02-14 Thread Erick Erickson
Almost forgot. Do be aware of https://issues.apache.org/jira/browse/SOLR-4400. This came to light under an absurd load of opening/closing transient cores, which only means it won't show up until you go into production. The fix is on both trunk and 4x. On Thu, Feb 14, 2013 at 7:46 AM, Erick Eric

Re: Solr 4.1.0 not using solrcore.properties ?

2013-02-14 Thread Erick Erickson
Daniel: It would be great if you would go ahead and edit the Wiki, all you have to do is create a signon. Having just gone through the pain of figuring this out, you're best positioned to know how to warn others! Best Erick On Thu, Feb 14, 2013 at 4:56 AM, Daniel Rijkhof wrote: > James, > > I'

Re: What should focus be on hardware for solr servers?

2013-02-14 Thread Jack Krupansky
That raises the question of how your average professional notebook computer (PC or Mac or Linux) compares to a garden-variety cloud server such as an Amazon EC2 m1.large (or m3.xlarge) in terms of performance such as document ingestion rate or how many documents you can load before load and/or q

Re: Most common query

2013-02-14 Thread Ahmet Arslan
Hi, If I am not mistaken I saw some open jira to collect queries and calculate popular searches etc. Some commercial solutions exist: http://sematext.com/search-analytics/index.html http://soleami.com/blog/soleami-start_en.html --- On Wed, 2/13/13, ROSENBERG, YOEL (YOEL)** CTR ** wrote: Fr

Re: Multi Core / On demand loading

2013-02-14 Thread Erick Erickson
I updated this page: http://wiki.apache.org/solr/CoreAdmin, look for "transientCacheSize" and "loadOnStartup". Be aware that this is somewhat in flux, but anything you find please report! Man, oh man, do I have a lot of documentation to do on all this once the dust settles Erick On Wed, Feb

Re: What should focus be on hardware for solr servers?

2013-02-14 Thread Erick Erickson
One data point: I can comfortably index and search the Wikipedia dump (11M articles, 5M with text) on my Macbook Pro. Admittedly not heavy-duty queries, but Erick On Wed, Feb 13, 2013 at 4:01 PM, Matthew Shapiro wrote: > Excellent, thank you very much for the reply! > > On Wed, Feb 13, 201

Re: Most common query

2013-02-14 Thread Erick Erickson
If I'm understanding your question correctly, you have to build that out yourself. Solr doesn't store the searches, nor the results. Hmm, though if you keep the Solr logs around you can reconstruct the queries from them, although it takes a bit of work. The other place would be your servlet contain

get filterCache in Component

2013-02-14 Thread Markus Jelsma
Hi, We need to get the filterCache in a Component but SolrIndexSearcher.getCache(String name) does not return it. It seems the filterCache is not added to cacheMap and can therefore not be returned. SolrCache filterCache = rb.req.getSearcher().getCache("filterCache"); Will always return null.

JMX generation number is wrong

2013-02-14 Thread Aristedes Maniatis
I'm trying to monitor the state of a master-slave Solr4.1 cluster. I can easily get the generation number of the slaves using JMX like this: solr/{corename}/org.apache.solr.handler.ReplicationHandler/generation That works fine. However on the master, this number is always 1. Which makes it

Re: Index-time synonyms and trailing wildcard issue

2013-02-14 Thread Johannes Rodenwald
Hello Jack, Thanks for your answer, it helped me gain a deeper understanding of what happens at index time, and find a solution myself: It seems that putting the synonym filter in both filter chains (index and query), setting expand="false", and putting the desired synonym first in the row,

solr 4.1 spatial with JTS - spatial query within a WKT polygon contained within another query ...

2013-02-14 Thread Pires, Guilherme
Hello Everyone, I've been integrating Solr 4.1 into a Web GIS solution and it's working great. I have implemented JTS within Solr 4.1 and indexed thousands of WKT polygons provided by an XML document generated by GE's GIS Core system. Everything seems to be working out great. Now I have a feature

Implement price range filter: DataImportHandler started. Not Initialized. No commands can be run

2013-02-14 Thread PeterKerk
On all products I have I want to implement a price range filter. Since this pricerange is applied on the entire population and not on a single product, my assumption was that it would not make sense to define this within the "shopitem" entity, but rather under the document "shopitems". So that's wh

RE: Why a phrase is getting searched against default fields in solr

2013-02-14 Thread Pragyanshis Pattanaik
Yes, I made some changes to the request handler. I added edismax and removed the df field specified there, and now it's working as I expected. Thanks for the help, Ahmet. > Date: Thu, 14 Feb 2013 01:31:14 -0800 > From: iori...@yahoo.com > Subject: RE: Why a phrase is getting searched against defa

Re: Why SolrInputDocument use a LinkedHashMap

2013-02-14 Thread Andre Bois-Crettez
Almost. I did not benchmark it but tend to believe this http://docs.oracle.com/javase/6/docs/api/java/util/LinkedHashMap.html : "iteration over the collection-views of a LinkedHashMap requires time proportional to the /size/ of the map, regardless of its capacity. Iteration over a HashMap is like

Re: How-to get date of indexing process

2013-02-14 Thread Miguel
Thanks Markus, I didn't know that page. It's all I need. Thanks again. On 14/02/2013 10:47, Markus Jelsma wrote: See: admin/luke?show=index or the admin UI. -Original message- From:Miguel Sent: Thu 14-Feb-2013 10:45 To: solr-user@lucene.apache.org Subject: How-to get d

Re: Solr 4.1.0 not using solrcore.properties ?

2013-02-14 Thread Daniel Rijkhof
James, I'm not completely sure, and i have not tested the following: .last_index_time might also not be accessible... Daniel On Thu, Feb 14, 2013 at 12:47 AM, Daniel Rijkhof wrote: > James, > > I debugged it until I found where things go 'wrong'. > > Apparently the current implementation Varia

RE: How-to get date of indexing process

2013-02-14 Thread Markus Jelsma
See: admin/luke?show=index or the admin UI. -Original message- > From:Miguel > Sent: Thu 14-Feb-2013 10:45 > To: solr-user@lucene.apache.org > Subject: How-to get date of indexing process > > Hi everybody > >    I am looking for the way to get date of last indexing process or comm

How-to get date of indexing process

2013-02-14 Thread Miguel
Hi everybody, I am looking for a way to get the date of the last indexing process or commit event that happened on my Solr server. I found a possible solution, adding a timestamp field, for example: || But I would like a solution that does not modify the schema of the Solr server. I checked statistics p

Re: How to protect Solr 4.1 Admin page?

2013-02-14 Thread Gora Mohanty
On 14 February 2013 14:42, Bayu Widyasanyata wrote: > On Thu, Feb 14, 2013 at 3:53 PM, Gora Mohanty wrote: > >> 3. Depending on how you installed Solr, there should be a folder >> like webapps/solr/WEB-INF/ . In that folder, edit web.xml, and >> add and tags. The entries >> for the

RE: Why a phrase is getting searched against default fields in solr

2013-02-14 Thread Ahmet Arslan
Hi, instead of &edismax=true can you try &defType=edismax ahmet --- On Thu, 2/14/13, Pragyanshis Pattanaik wrote: > From: Pragyanshis Pattanaik > Subject: RE: Why a phrase is getting searched against default fields in solr > To: "solr Forum" > Date: Thursday, February 14, 2013, 10:21 AM > It

Re: How to protect Solr 4.1 Admin page?

2013-02-14 Thread Bayu Widyasanyata
On Thu, Feb 14, 2013 at 3:53 PM, Gora Mohanty wrote: > 3. Depending on how you installed Solr, there should be a folder > like webapps/solr/WEB-INF/ . In that folder, edit web.xml, and > add and tags. The entries > for the latter should match the entries in step 1. > One thing that

Re: How to protect Solr 4.1 Admin page?

2013-02-14 Thread Gora Mohanty
On 14 February 2013 14:05, Bayu Widyasanyata wrote: > Hi, > > I'm sure it's an "old" question.. > I just want to protect the Admin page (/solr) with Basic Authentication. > But I can't find a good answer yet out there. > > I use Solr 4.1 with Apache Tomcat/7.0.35. [...] The easiest way to do this with

How to protect Solr 4.1 Admin page?

2013-02-14 Thread Bayu Widyasanyata
Hi, I'm sure it's an "old" question.. I just want to protect the Admin page (/solr) with Basic Authentication. But I can't find a good answer yet out there. I use Solr 4.1 with Apache Tomcat/7.0.35. Could anyone give me some quick hints or links? Thanks in advance! -- wassalam, [bayu]

Re: Boost Specific Phrase

2013-02-14 Thread Ahmet Arslan
Hi Hemant, I think your use case would be useful for relevancy tuning. It could be implemented as either a SearchComponent or a QParserPlugin. The edismax query parser has pf2 and pf3 parameters that can remedy this to some degree. An edismax extension will probably be the best place to put it. Similar to https://issues.

RE: Why a phrase is getting searched against default fields in solr

2013-02-14 Thread Pragyanshis Pattanaik
It is returning all the documents which contain the phrase, as it is searching against the default field. My default field is like below. I have defined SearchableField as the default field. Thanks, Pragyanshis > Date: Wed, 13 Feb 2013 23:18:06 -0800 > From: iori...@yahoo.com > Subject: Re: Why a

Re: replication problems with solr4.1

2013-02-14 Thread Amit Nithian
I may be missing something but let me go back to your original statements: 1) You build the index once per week from scratch 2) You replicate this from master to slave. My understanding of the way replication works is that it's meant to only send along files that are new and if any files named the

Re: Anyone else see this error when running unit tests?

2013-02-14 Thread Amit Nithian
Okay so I think I found a solution if you are a maven user and don't mind forcing the test codec to Lucene40 then do the following: Add this to your pom.xml under the " " section org.apache.maven.plugins maven-surefire-plugin 2.13 -Dtests.codec=Lucene40 I