Re: Clone (or Restore) Solrcloud

2014-02-03 Thread Shalin Shekhar Mangar
Hi David, The parent metadata persists only until the sub-shards become active. Actually the logic to make the sub-shards active depends on knowing when all 'sibling' sub-shards' replicas have recovered successfully. We store the parent to make that easier to look up. Once all replicas of all sub-

Apache Solr.

2014-02-03 Thread vignesh
Hi Team, I am Vignesh. I am using Apache Solr 3.6 and am able to index XML files, but I am not able to index PDF files. Can you give me the steps to carry out PDF indexing? It would be very useful. Kindly guide me through this process. Thanks & Regards. Vign

Solr and SDL Tridion Integration

2014-02-03 Thread Prasi S
Hi, I want to index SDL Tridion content into Solr. Can you suggest how this can be achieved? Is there any document or tutorial for this? Thanks, Prasi

Fwd: Need help for integrating solr-4.5.1 with UIMA

2014-02-03 Thread rashi gandhi
Hi, I'm trying to integrate Solr 4.5.1 with UIMA, following the steps in solr-4.5.1\contrib\uima\readme.txt. I edited solrconfig.xml as described in readme.txt and registered the required keys. But each time I index data, Solr returns an error: Feb 3, 2014 2:04:32 P

Re: Solr and SDL Tridion Integration

2014-02-03 Thread Alexandre Rafalovitch
This is a new one. You may want to start from Tridion's list and ask about the API, export, or any other ways to get to the data. Then come back with a more specific question once you know what the data looks like and the granularity of updates (hook on document change vs. full export only). Regards, Alex. Pers

Re: Apache Solr.

2014-02-03 Thread Siegfried Goeschl
Hi Vignesh, a few keywords for further investigations * Solr Data Import Handler * Apache Tika * Apache PDFBox Cheers, Siegfried Goeschl On 03.02.14 09:15, vignesh wrote: Hi Team, I am Vignesh, am using Apache Solr 3.6 and able to Index XML file and now trying to Ind

Special NGRAMish requirement

2014-02-03 Thread Lochschmied, Alexander
Hi, we need to use something very similar to EdgeNGram (minGramSize="1" maxGramSize="50" side="front"). The only thing missing is that we would like to reduce the number of matches. The request we need to implement is returning only those matches with the longest tokens (or terms if that is the
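
For readers unfamiliar with the filter being discussed, here is a minimal Python sketch of what front-side edge n-grams look like for the settings quoted above (minGramSize="1", maxGramSize="50"). The function name `edge_ngrams` is made up for illustration; it only mimics the idea and is not Solr's implementation.

```python
def edge_ngrams(term, min_gram=1, max_gram=50):
    """Return the front-anchored n-grams of `term`, shortest first.

    Illustrative only: mirrors the idea of an edge n-gram filter with
    side="front"; it is not the Solr/Lucene code.
    """
    longest = min(len(term), max_gram)
    return [term[:n] for n in range(min_gram, longest + 1)]


if __name__ == "__main__":
    print(edge_ngrams("solr"))
```

Every gram of a term is indexed, which is why a short query matches many documents and why the poster wants a way to keep only the longest matches.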

Re: Solr and SDL Tridion Integration

2014-02-03 Thread Jack Krupansky
If SDL Tridion can export to CSV format, Solr can then import from CSV format. Otherwise, you may have to write a custom script or even maybe Java code to read from SDL Tridion and output a supported Solr format, such as Solr XML, Solr JSON, or CSV. -- Jack Krupansky -Original Message--
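
As a rough sketch of the "custom script" route, the following Python turns already-exported records into CSV text that could then be handed to Solr's CSV import. The field names (`id`, `title`) and the way records are obtained from Tridion are assumptions; nothing here talks to Tridion or Solr.

```python
import csv
import io


def records_to_csv(records, fieldnames):
    """Serialize a list of dicts to CSV text with a header row.

    `records` is assumed to have been exported from the CMS already;
    the field names must match the fields in the Solr schema.
    """
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=fieldnames, lineterminator="\n")
    writer.writeheader()
    writer.writerows(records)
    return buffer.getvalue()


if __name__ == "__main__":
    print(records_to_csv([{"id": "1", "title": "Home, page"}], ["id", "title"]))
```

Using the `csv` module rather than joining strings by hand keeps values that contain commas or quotes correctly escaped.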

weird exception on update

2014-02-03 Thread Dmitry Kan
Hello! We are hitting a really strange and nasty issue when trying to delete by query and not when just adding documents. The exception says: http://pastebin.com/B1x5dAF7 Any ideas as to what is going on? The delete by query is referencing the unique field. The core's index does not contain the

Score of Search Term for every character remove

2014-02-03 Thread Lusung, Abner
Hi, I'm new with using SOLR and I'm curious if this is capable of doing the following or similar. Sample: Query: "ABCDEF" Returns: ABCDEF > 0 hits ABCDE > 2 hits ABCD > 3 hits ABC > 10 hits AB > 20 hits A > 100 hits In one request only. Thanks. Abner G. Lusung Jr.| Java Web Development, Inte

Re: Import data from mysql to sold

2014-02-03 Thread Alexei Martchenko
I've been using DIH to import large Databases to XML file batches and It's blazing fast. alexei martchenko Facebook | Linkedin| Steam | 4sq

Re: Geospatial clustering + zoom in/out help

2014-02-03 Thread Bojan Šmid
Hi David, I was hoping to get an answer on the geospatial topic from you :). These links basically confirm that the approach I wanted to take should work OK with a similar (or even bigger) amount of data than I plan to have. Instead of my custom NxM division of the world, I'll try the existing GeoHash encoding, i

Re: Apache Solr.

2014-02-03 Thread Alexei Martchenko
That's right, Solr doesn't import PDFs the way it imports XML. You'll need to use Tika to import binary or other specific file types. http://tika.apache.org/1.4/formats.html alexei martchenko Facebook | Linkedin| Steam

Re: Apache Solr.

2014-02-03 Thread Jack Krupansky
PDF files can be directly imported into Solr using Solr Cell (AKA ExtractingRequestHandler). See: https://cwiki.apache.org/confluence/display/solr/Uploading+Data+with+Solr+Cell+using+Apache+Tika Internally, Solr Cell uses Tika, which in turn uses PDFBox. -- Jack Krupansky -Original Messag
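
A minimal sketch of how a Solr Cell request URL is typically put together, assuming the ExtractingRequestHandler is registered at `/update/extract` in solrconfig.xml and that the unique key field is `id`. The base URL and document id are placeholders; the PDF itself would be sent as the body of a POST to this URL.

```python
from urllib.parse import urlencode


def extract_url(base_url, doc_id, commit=True):
    """Build the URL for posting one file to Solr Cell.

    Assumes the handler path /update/extract and a unique key field
    named "id"; both depend on the local Solr configuration.
    """
    params = {"literal.id": doc_id}
    if commit:
        params["commit"] = "true"
    return base_url.rstrip("/") + "/update/extract?" + urlencode(params)


if __name__ == "__main__":
    print(extract_url("http://localhost:8983/solr", "doc1"))
```

The `literal.id` parameter supplies the unique key, since a PDF has no field of its own that Solr could use for it.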

Announce list

2014-02-03 Thread Arie Zilberstein
Hi, Is there a mailing list for getting just announcements about new versions? Thanks, Arie

Writing a customize updateRequestHandler

2014-02-03 Thread neerajp
Hi, I want to write a custom updateRequestHandler. Can you please guide me through the steps I need to perform for that? -- View this message in context: http://lucene.472066.n3.nabble.com/Writing-a-customize-updateRequestHandler-tp4115059.html Sent from the Solr - User mailing list archive at Nabble.co

Re: weird exception on update

2014-02-03 Thread Dmitry Kan
This exception is similar to what is talked about here: https://gist.github.com/mbklein/6367133 http://irc.projecthydra.org/2013-08-28.html We found out that: 1. this happens iff on two cores inside the same container there is a query parser defined via defType. 2. After removing index files on o

Re: Writing a customize updateRequestHandler

2014-02-03 Thread Jorge Luis Betancourt Gonzalez
In the book Apache Solr Beginner’s Guide there is a section dedicated to writing new Solr plugins; perhaps it would be a good place to start. There is also a wiki page about this, but it’s a light introduction. I’ve found that a very good starting point is just to browse through the code o

Strange Error Message while Full Import

2014-02-03 Thread Peter Schütt
Hello, when I do a full import of a Solr index I get a strange error message: org.apache.solr.handler.dataimport.DataImportHandlerException: java.sql.SQLRecoverableException: Closed Resultset: next It is only a simple query select FIRMEN_ID, FIRMIERUNG, FIRMENKENNUNG, PZN, DEBITORNUMMER,

Re: Announce list

2014-02-03 Thread Alexandre Rafalovitch
I don't think so. What would be the value? Would you be upgrading every 6-8 weeks as the new versions come out? Or are you downstream of Solr and want to check compatibility? Curious what the use case would be. Regards, Alex. Personal website: http://www.outerthoughts.com/ LinkedIn: http://w

Re: Announce list

2014-02-03 Thread Lajos
There's always http://projects.apache.org/feeds/rss.xml. L On 03/02/2014 14:59, Arie Zilberstein wrote: Hi, Is there a mailing list for getting just announcements about new versions? Thanks, Arie

Re: weird exception on update

2014-02-03 Thread Dmitry Kan
The solution (or workaround?) is to drop the defType from one of the cores and use {!qparser} local param on every query, including the delete by query. It would be really great, if this could be handled on the solr config side only without involving the client changes. On Mon, Feb 3, 2014 at 4
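
The workaround above amounts to prefixing every query string with a local-params parser selector. A small client-side sketch in Python; the helper name `with_parser` and the example parser name are illustrative, not part of any Solr client library.

```python
def with_parser(query, parser):
    """Prefix a query with a {!parser} local-params selector.

    `parser` is the name of a query parser known to the target core,
    for example "lucene" or "edismax".
    """
    return "{!" + parser + "}" + query


if __name__ == "__main__":
    # The same prefix has to be applied to delete-by-query strings.
    print(with_parser("id:12345", "lucene"))
```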

Re: shard1 gone missing ... (upgrade to 4.6.1)

2014-02-03 Thread David Santamauro
Mark, I am testing the upgrade and indexing gives me this error: 914379 [http-apr-8080-exec-4] ERROR org.apache.solr.core.SolrCore ? org.apache.solr.common.SolrException: Invalid UTF-8 middle byte 0xe0 (at char #1, byte #-1) ... and a bunch of these request: http://xx.xx.xx.xx/col1/update

SolrCloud query results order master vs replica

2014-02-03 Thread M. Flatterie
Greetings, My setup is: - SolrCloud V4.3 - One collection - one shard - 1 master, 1 replica so each instance contains the entire index. The index is rather small and the replica is used for robustness. There is no need (IMHO) to split shard the index (yet, until the index gets bigger). My que

Elevation and nested queries

2014-02-03 Thread Holger Rieß
I have a simple query 'q=hurco' (parser type edismax). Elevation is properly configured, so I get the expected results: ... 7HURCO 0~* true A similar query with a nested query 'q=(hurco AND _query_:"{!field f=debtoritem v=0~*}")' returns the same document but without el

Re: Need help for integrating solr-4.5.1 with UIMA

2014-02-03 Thread Luca Foppiano
On Mon, Feb 3, 2014 at 10:20 AM, rashi gandhi wrote: > Hi, > > Hi, > I'm trying to integrate Solr 4.5.1 with UIMA and following the steps of the > solr-4.5.1\contrib\uima\readme.txt. > > Edited the solrconfig.xml as given in readme.txt. Also I have registered > the required keys. > [...] >

Re: SolrCloudServer questions

2014-02-03 Thread Greg Walters
I've seen best throughput while indexing by sending in batches of documents rather than individual documents per request. You might try queueing on your indexing machines for a bit then sending off a batch every N documents. Thanks, Greg On Feb 1, 2014, at 6:49 PM, Software Dev wrote: > Also,
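
The "send a batch every N documents" idea can be sketched with a small generator. This is plain Python and independent of any Solr client; how each batch is actually sent is left to the caller.

```python
from itertools import islice


def batches(docs, size):
    """Yield lists of at most `size` documents from any iterable."""
    iterator = iter(docs)
    while True:
        chunk = list(islice(iterator, size))
        if not chunk:
            return
        yield chunk


if __name__ == "__main__":
    for batch in batches(range(5), 2):
        print(batch)  # each batch would go out as a single update request
```

Because it consumes the input lazily, the generator also works when documents arrive from a queue or a file rather than from a list in memory.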

Duplicate Facet.FIelds cause same results, should dedupe?

2014-02-03 Thread William Bell
If we add : facet.field=prac_spec_heir&facet.field=prac_spec_heir we get it twice in the results. This breaks deserialization on wt=json since you cannot have the same name twice Thoughts? Seems like a new bug in 4.6 ? "facet.field": ["prac_spec_heir","all_proc_name_code","all_cond_name_co
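
Until the behaviour is settled on the server side, one possible client-side workaround is to drop repeated `facet.field` values before the request is sent. A sketch, assuming the request parameters are held as a list of (name, value) pairs; the helper name is made up.

```python
def dedupe_params(params, key="facet.field"):
    """Remove repeated values of one multi-valued parameter.

    Order is preserved and parameters other than `key` are untouched.
    """
    seen = set()
    result = []
    for name, value in params:
        if name == key:
            if value in seen:
                continue
            seen.add(value)
        result.append((name, value))
    return result


if __name__ == "__main__":
    print(dedupe_params([
        ("q", "*:*"),
        ("facet.field", "prac_spec_heir"),
        ("facet.field", "prac_spec_heir"),
    ]))
```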

Re: need help in understating solr cloud stats data

2014-02-03 Thread Greg Walters
I've had some issues monitoring Solr with the per-core mbeans and ended up writing a custom "request handler" that gets loaded then registers itself as an mbean. When called it polls all the per-core mbeans then adds or averages them where appropriate before returning the requested value. I'm no
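
The "adds or averages them where appropriate" step could look roughly like the sketch below. It is not Greg's code: the stat names `requests` and `avgTimePerRequest` are used as example keys, and weighting the average by request count is an assumption about what "appropriate" means here.

```python
def aggregate_core_stats(per_core):
    """Combine per-core stats into one node-level view.

    Counters are summed; the average time is weighted by each core's
    request count so that busy cores dominate the result.
    """
    total_requests = sum(stats["requests"] for stats in per_core.values())
    if total_requests == 0:
        return {"requests": 0, "avgTimePerRequest": 0.0}
    weighted_time = sum(
        stats["requests"] * stats["avgTimePerRequest"]
        for stats in per_core.values()
    )
    return {
        "requests": total_requests,
        "avgTimePerRequest": weighted_time / total_requests,
    }


if __name__ == "__main__":
    print(aggregate_core_stats({
        "core1": {"requests": 10, "avgTimePerRequest": 2.0},
        "core2": {"requests": 30, "avgTimePerRequest": 4.0},
    }))
```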

Re: need help in understating solr cloud stats data

2014-02-03 Thread Mark Miller
You should contribute that and spread the dev load with others :) We need something like that at some point, it’s just no one has done it. We currently expect you to aggregate in the monitoring layer and it’s a lot to ask IMO. - Mark http://about.me/markrmiller On Feb 3, 2014, at 10:49 AM, Gr

Re: need help in understating solr cloud stats data

2014-02-03 Thread Greg Walters
The code I wrote is currently a bit of an ugly hack so I'm a bit reluctant to share it and there's some legal concerns with open-sourcing code within my company. That being said, I wouldn't mind rewriting it on my own time. Where can I find a starter kit for contributors with coding guidelines a

SolrCloud multiple data center support

2014-02-03 Thread Darrell Burgan
Hello, we are using Solr in a SolrCloud configuration, with two Solr instances running with three Zookeepers in a single data center. We presently have a single search index with about 35 million entries in it, about 60GB disk space on each of the two Solr servers (120GB total). I would expect o

Re: Announce list

2014-02-03 Thread Daniel Collins
I have seen other projects that have a releases mailing list, the only use cases I can think of are: 1) users who want notifications about new releases, but don't want the "flood" of the full user-list. 2) historical searching to see how often releases were made. Given there isn't an official tim

Re: Solr and SDL Tridion Integration

2014-02-03 Thread Chris Warner
There are many ways to do this, Prasi. You have a lot of thinking to do on the subject. You could decide to publish your content to database, and then index that database in Solr. You could publish XML or CSV files of your content for Solr to read and index. You could use nutch or some other t

[ANN] Heliosearch 0.03 with off-heap field cache

2014-02-03 Thread Yonik Seeley
A new Heliosearch pre-release has been cut for people to try out: https://github.com/Heliosearch/heliosearch/releases Release Notes: - This is Heliosearch v0.03 Heliosearch is forked from Apache Solr and includ

Re: Apache Solr.

2014-02-03 Thread solr2020
You can have this kind of configuration in Data import handler xml file to index different type of files. Hope this helps. -- View this message in context: http://lucene.472066.n3

RE: JVM heap constraints and garbage collection

2014-02-03 Thread Michael Della Bitta
> i2.xlarge looks vastly better than m2.2xlarge at about the same price, so I must be missing something: Is it the 120 IPs that explains why anyone would choose m2.2xlarge? i2.xlarge is a relatively new instance type (December 2013). In our case, we're partway through a yearlong reservation of m2.

Getting index schema in SolrCloud mode

2014-02-03 Thread Peter Keegan
I'm indexing data with a SolrJ client via SolrServer. Currently, I parse the schema returned from a HttpGet on: localhost:8983/solr/collection/schema/fields What is the recommended way to read the schema with CloudSolrServer? Can it be done with a single HttpGet to a ZK server? Thanks, Peter

Re: SolrCloud multiple data center support

2014-02-03 Thread Mark Miller
SolrCloud has not tackled multi data center yet. I don’t think a or b are very good options yet. Honestly, I think the best current bet is to use something like Apache Flume to send data to both data centers - it will handle retries and keeping things in sync and splitting the stream. Doesn’t s

Re: need help in understating solr cloud stats data

2014-02-03 Thread Joel Cohen
I had to come up with some Solr stats monitoring for my Zabbix instance. I found that using JMX was the easiest way for us. There is a command line jmx client that works quite well for me. http://crawler.archive.org/cmdline-jmxclient/ I wrote a shell script to wrap around that and shove the data

Re: need help in understating solr cloud stats data

2014-02-03 Thread David Santamauro
Zabbix 2.2 has a jmx client built in as well as a few JVM templates. I wrote my own templates for my solr instance and monitoring and graphing is wonderful. David On 02/03/2014 12:55 PM, Joel Cohen wrote: I had to come up with some Solr stats monitoring for my Zabbix instance. I found that

Re: Announce list

2014-02-03 Thread Chris Hostetter
: Is there a mailing list for getting just announcements about new versions? This is the primary usecase for the "general" list, although it does occasionally get other traffic from people with questions/discussion about the project as a whole... https://lucene.apache.org/solr/discussion.html#

Solr and Polygon/Radius based spatial searches

2014-02-03 Thread leevduhl
We have a public property search site that we are looking to replace the back end index server on and we are looking at Solr as a possible replacement (ElasticSearch is another possibility). One of the key search components of our site is to search on a bounding box (rectangle), custom multi-point

Re: Score of Search Term for every character remove

2014-02-03 Thread Erick Erickson
Maybe edgeNgram tokenizer? You haven't told us what the fields in the docs you care about are Best, Erick On Mon, Feb 3, 2014 at 4:48 AM, Lusung, Abner wrote: > Hi, > > > > I'm new with using SOLR and I'm curious if this is capable of doing the > following or similar. > > > > Sample: > > Q

Re: Score of Search Term for every character remove

2014-02-03 Thread Jack Krupansky
I think he wants to do a bunch of separate queries and return separate result sets for each. Hmmm... maybe it would be nice to allow multiple "q" parameters in one query request, each returning a separate set of results. -- Jack Krupansky -Original Message- From: Erick Erickson Sent

Re: SolrCloud query results order master vs replica

2014-02-03 Thread Erick Erickson
This should only be happening if the scores are _exactly_ the same, which is actually quite rare. In that case, the tied scores are broken by the internal Lucene document ID, and the relative order of the docs on the two machines isn't guaranteed to be the same, the internal ID can change during se
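
To illustrate why a stable tiebreaker helps, the Python sketch below orders documents by score and then by a unique key, which is the effect of a Solr sort such as `score desc, id asc`. Whether that is what the rest of this message recommends is not visible in the truncated text; the field name `id` is an assumption.

```python
def stable_rank(docs):
    """Order by descending score, breaking ties on the unique key.

    Two nodes holding the same documents produce the same order
    regardless of their internal document numbering.
    """
    return sorted(docs, key=lambda doc: (-doc["score"], doc["id"]))


if __name__ == "__main__":
    master = [{"id": "b", "score": 1.0}, {"id": "a", "score": 1.0}]
    replica = [{"id": "a", "score": 1.0}, {"id": "b", "score": 1.0}]
    print(stable_rank(master) == stable_rank(replica))
```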

Re: need help in understating solr cloud stats data

2014-02-03 Thread Erick Erickson
See: http://wiki.apache.org/solr/HowToContribute It outlines how to get the code, how to work with patches, how to set up IntelliJ and Eclipse IDEs (links near the bottom?). There are formatting files for both IntelliJ and Eclipse that'll do the right thing in terms of indents and such. Legal is

Re: Not finding part of fulltext field when word ends in dot

2014-02-03 Thread Thomas Michael Engelke
That was a complicated answer, but ultimately the right one. Thank you very much. 2014-01-30 Jack Krupansky : > The word delimiter filter will turn 26KA into two tokens, as if you had > written "26 KA" without the quotes. The autoGeneratePhraseQueries option > will cause the multiple terms to be

Re: Solr and Polygon/Radius based spatial searches

2014-02-03 Thread Smiley, David W.
Hi Lee, On 2/3/14, 1:59 PM, "leevduhl" wrote: >We have a public property search site that we are looking to replace the >back >end index server on and we are looking at Solr as a possible replacement >(ElasticSearch is another possibility). Both should work equally well. > >One of the key sear

Re: SolrCloud multiple data center support

2014-02-03 Thread Daniel Collins
Option a) doesn't really work out of the box, *if you need NRT support*. The main reason (for us at least) is the ZK ensemble and maintaining quorum. If you have a single ensemble, say 3 ZKs in 1 DC and 2 in another, then if you lose DC 2, you lose 2 ZKs and the rest are fine. But if you lose the
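
The quorum arithmetic behind this can be checked with a few lines of Python. A ZooKeeper ensemble stays available only while a strict majority of its members is reachable; the 3 + 2 split is the example from the message.

```python
def has_quorum(ensemble_size, alive):
    """True while a strict majority of the ensemble is still reachable."""
    return alive > ensemble_size // 2


if __name__ == "__main__":
    ensemble = 5  # 3 ZooKeeper nodes in DC 1, 2 in DC 2
    print("lose DC 2:", has_quorum(ensemble, alive=3))
    print("lose DC 1:", has_quorum(ensemble, alive=2))
```

With any split across two data centers, one side always holds the majority, so losing that side takes the whole ensemble down.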

Adding HTTP Request Header in SolrJ

2014-02-03 Thread Andrew Doyle
Our web services are using PKI authentication so we have a user DN, however we're querying an external Solr which is managed via a proxy which is expecting our server DN proxying the user DN. My question is, how do we add an HTTP header to the request being made by SolrJ? I looked through the sour

Re: Adding HTTP Request Header in SolrJ

2014-02-03 Thread Shawn Heisey
On 2/3/2014 3:40 PM, Andrew Doyle wrote: Our web services are using PKI authentication so we have a user DN, however we're querying an external Solr which is managed via a proxy which is expecting our server DN proxying the user DN. My question is, how do we add an HTTP header to the request bein

Re: Special NGRAMish requirement

2014-02-03 Thread Otis Gospodnetic
Hi, Can you provide an example, Alexander? Otis Solr & ElasticSearch Support http://sematext.com/ On Feb 3, 2014 5:28 AM, "Lochschmied, Alexander" < alexander.lochschm...@vishay.com> wrote: > Hi, > > we need to use something very similar to EdgeNGram (minGramSize="1" > maxGramSize="50" side="fro

Re: how to write an efficient query with a subquery to restrict the search space?

2014-02-03 Thread Otis Gospodnetic
Hi, Sounds like a possible document and query routing use case. Otis Solr & ElasticSearch Support http://sematext.com/ On Jan 31, 2014 7:11 AM, "svante karlsson" wrote: > It seems to be faster to first restrict the search space and then do the > scoring compared to just use the full query and l

Re: Adding DocValues in an existing field

2014-02-03 Thread Otis Gospodnetic
Hi, You can change the field definition and then reindex. Otis Solr & ElasticSearch Support http://sematext.com/ On Jan 30, 2014 1:12 PM, "yriveiro" wrote: > Hi, > > Can I add to an existing field the docvalue feature without wipe the > actual? > > The modification on the schema will be somethi

Re: need help in understating solr cloud stats data

2014-02-03 Thread Otis Gospodnetic
Hi, Oh, I just saw Greg's email on dev@ about this. IMHO aggregating in the search engine is not the way to go. Leave that to external tools, which are likely to be more flexible when it comes to this. For example, our SPM for Solr can do all kinds of aggregations and filtering by a number of So

Re: Duplicate Facet.FIelds cause same results, should dedupe?

2014-02-03 Thread Otis Gospodnetic
Hi, Don't know if this is old or new problem, but it does feel like a bug to me. Otis -- Performance Monitoring * Log Analytics * Search Analytics Solr & Elasticsearch Support * http://sematext.com/ On Mon, Feb 3, 2014 at 10:48 AM, William Bell wrote: > If we add : > > facet.field=prac_spec_h

Re: Duplicate Facet.FIelds cause same results, should dedupe?

2014-02-03 Thread William Bell
This is in 4.6.1. On Mon, Feb 3, 2014 at 9:11 PM, Otis Gospodnetic wrote: > Hi, > > Don't know if this is old or new problem, but it does feel like a bug to > me. > > Otis > -- > Performance Monitoring * Log Analytics * Search Analytics > Solr & Elasticsearch Support * http://sematext.com/ > >

Re: Solr and SDL Tridion Integration

2014-02-03 Thread Prasi S
Thanks a lot for the options. Our site has dynamic content as well. I would look into what best suits. Thanks, Prasi On Mon, Feb 3, 2014 at 10:34 PM, Chris Warner wrote: > There are many ways to do this, Prasi. You have a lot of thinking to do on > the subject. > > You could decide to publish y

Solr ranking query..

2014-02-03 Thread Chris
Hi, I have a document structure that looks like the below. I would like to implement something like - (urlKeywords:"+keyword+" AND domainRank:[3 TO 1] AND adultFlag:N)^60 " + "OR (title:"+keyword+" AND domainRank:[3 TO 1] AND adultFlag:N)^20 " + "OR (title:"+keyword+" AND domainRank:[1