Re: Indexing a token to a different field in a custom filter

2013-11-12 Thread Dileepa Jayakody
I need to index the processed token to a different field (e.g. stanbolResponse), in the same document that's being indexed. I am looking for a way to retrieve the document.id from the TokenStream so that I can update the same document with new field values. (In my sample code above I'm adding a new

SV: spellcheck solr 4.3.1

2013-11-12 Thread Daniel Borup
Hi, my spell request handler looks like this: spell default wordbreak on true 10 5 5 true true 10 5 spellcheck. As you see, the parameter spellcheck.maxResultsForSugge

Re: Indexing a token to a different field in a custom filter

2013-11-12 Thread Alvaro Cabrerizo
Hi, Maybe the synonym filter is the model to look at. You can start by creating a new field type in your schema that is Stanbol-enhanced. Continuing the parallel, in the case of synonyms we could have

SolrCloud unstable

2013-11-12 Thread Martin de Vries
Hi, We have: Solr 4.5.1 - 5 servers 36 cores, 2 shards each, 2 servers per shard (every core is on 4 servers) about 4.5 GB total data on disk per server 4GB JVM-Memory per server, 3GB average in use Zookeeper 3.3.5 - 3 servers (one shared with Solr) haproxy load balancing Our Solrcloud is ver

serialization error - BinaryResponseWriter

2013-11-12 Thread giovanni.bricc...@banzai.it
Hi, I'm getting some errors reading boolean fields, can you give me any suggestions? In this example I only have four "false" fields: leasing=false, FiltroNovita=false, FiltroFreeShipping=false, Outlet=false. This is the stack trace (Solr 4.2.1): java.lang.NumberFormatException: For input st

How we can get JSON response using CloudSolrServer

2013-11-12 Thread Dharmendra Jaiswal
I am using Solr 4.4.0 with SolrCloud (i.e. a ZooKeeper-aware setup). I want to get the search response in JSON using CloudSolrServer (from the SolrJ API). Any pointers will be helpful.

Re: SolrCloud unstable

2013-11-12 Thread Yago Riveiro
Hi Martin, I see the same behaviour you are describing, with a very similar setup: 6 machines, ~50 shards, with replicationFactor equal to two. The most critical issue IMHO is that failover doesn't work because one node is down and the other is in recovery mode. In log I can se

Re: SolrCloud unstable

2013-11-12 Thread Henrik Ossipoff Hansen
Hello, I’m experiencing sort of the same issue, but with much smaller indexes - although with much higher latency on disks during backup sessions on our NFS. I have a feeling the solution could be the same, so I’ll just leave my story here just in case, no solution found yet. http://lucene.472

Re: SolrCloud unstable

2013-11-12 Thread yriveiro
Some time ago I posted this issue: http://lucene.472066.n3.nabble.com/Leader-election-fails-in-some-point-td4096514.html The link to the screenshot is no longer available. When a shard fails and loses its leader I get those exceptions. Best regards

Re: Replicate Solr Cloud

2013-11-12 Thread michael.boom
You'll have to provide some more details on your problem. What do you mean by "location A and B": 2 different machines? By default SolrCloud shards can have replicas which can be hosted on different machines. It can offer you redundancy: if one of your machines dies, your search system will still

Re: Why do people want to deploy to Tomcat?

2013-11-12 Thread Andre Bois-Crettez
We are using Solr running on Tomcat. I think the top reasons for us are : - we already have nagios monitoring plugins for tomcat that trace queries ok/error, http codes / response time etc in access logs, number of threads, jvm memory usage etc - start, stop, watchdogs, logs : we also use our s

Re: Why do people want to deploy to Tomcat?

2013-11-12 Thread Alvaro Cabrerizo
In my case, the selection of the servlet container has never been a hard requirement. I mean, some customers provide us a virtual machine configured with java/tomcat, others have a tomcat installed and want to share it with solr, others prefer jetty because their sysadmins are used to configuring it

Re: Why do people want to deploy to Tomcat?

2013-11-12 Thread Roland Everaert
In my case, the first time I had to deploy and configure Solr on Tomcat (and JBoss) it was a requirement to reuse as much as possible the application/web server already in place. For the next deployment I also used Tomcat, because I was used to deploying on Tomcat and I don't know Jetty at all. I could as

Multi-Tenant Setup in Single Core

2013-11-12 Thread Christian Ramseyer
Hi guys, I'm prototyping a multi-tenant search. I have various document sources and a tenant can potentially access subsets of any source. Also, tenants have overlapping access to the sources, which is why I'm trying to do it in a single core. I'm doing this by labeling the source (origin, single value)

Re: Indexing a token to a different field in a custom filter

2013-11-12 Thread Erick Erickson
Whether what Alvaro outlined works for you or not, do NOT commit after every document if you use SolrJ. The commit will hurt performance much more than the HTTP overhead. And you can always batch up, say, 1,000 documents and use the server.add(doclist) method. Overall, worrying about HTTP overhea
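The batching advice above can be sketched roughly as follows. This is a minimal illustration in Python rather than SolrJ; `send_batch` and `commit` are hypothetical callables standing in for whatever client is in use (in SolrJ they would correspond to server.add(doclist) and a single commit at the end).

```python
from itertools import islice
from typing import Callable, Iterable, Iterator, List


def batches(docs: Iterable[dict], size: int = 1000) -> Iterator[List[dict]]:
    """Yield successive lists of at most `size` documents."""
    it = iter(docs)
    while True:
        chunk = list(islice(it, size))
        if not chunk:
            return
        yield chunk


def index_all(
    docs: Iterable[dict],
    send_batch: Callable[[List[dict]], None],  # hypothetical: sends one batch
    commit: Callable[[], None],                # hypothetical: issues a commit
    size: int = 1000,
) -> int:
    """Send documents in batches and commit once at the end."""
    sent = 0
    for chunk in batches(docs, size):
        send_batch(chunk)
        sent += len(chunk)
    commit()  # one commit for the whole run, not one per document
    return sent
```

With 2,500 documents and the default batch size, this makes three add calls and one commit instead of 2,500 of each.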

Re: Multi-Tenant Setup in Single Core

2013-11-12 Thread Erick Erickson
When you mention velocity, you're talking about the stock Velocity Response Writer that comes with the example? Because if you're exposing the Solr http address to the world, accessing each other's data is the least of your worries. To wit: http://machine:8983/solr/collection/update?commit=true&st

Re: Unit of dimension for solr field

2013-11-12 Thread eakarsu
Erick, I haven't written any Solr plugin before, so it takes time to understand the concepts. This is simpler to implement, and I think this way there is no need to write any Solr plugin, is there? An outside process analyses values with dimensions and prepares 2 fields as you described. Erol Akarsu -

Re: Why do people want to deploy to Tomcat?

2013-11-12 Thread Paul Libbrecht
I personally felt Tomcat to be in a more appropriate community, that of the Apache Foundation, than Jetty. Also, Jetty has always been striving for simplicity, and that's really not always what you want when you plan an app server. E.g. features such as the manager or mod_ajp appeared import

Re: Multi-Tenant Setup in Single Core

2013-11-12 Thread Christian Ramseyer
On 11/12/13 1:51 PM, Erick Erickson wrote: > When you mention velocity, you're talking about the stock Velocity Response > Writer that comes with the example? Because if you're exposing the Solr > http address to the world, accessing each others data is the least of your > worries. To whit: > > ht

Re: Why do people want to deploy to Tomcat?

2013-11-12 Thread Doug Turnbull
As an aside, I think one reason people feel compelled to deviate from the distributed jetty distribution is because the folder is named "example". I've had to explain to a few clients that this is a bit of a misnomer. The IT dept especially sees "example" and feels uncomfortable using that as a sta

Re: Why do people want to deploy to Tomcat?

2013-11-12 Thread Amit Aggarwal
Agreed with Doug On 12-Nov-2013 6:46 PM, "Doug Turnbull" wrote: > As an aside, I think one reason people feel compelled to deviate from the > distributed jetty distribution is because the folder is named "example". > I've had to explain to a few clients that this is a bit of a misnomer. The > IT

Re: Why do people want to deploy to Tomcat?

2013-11-12 Thread Sebastián Ramírez
I agree with Doug, when I started I had to spend some time figuring out what was just an "example" and what I would have to change in a "production" environment... until I found that all the "example" was ready for production. Of course, you commonly have to change the settings, parameters, fields

SorlCloud recovery issue while search stress test

2013-11-12 Thread Alejandro Marqués Rodríguez
Hi, We've been experiencing some problems during search stress tests and we don't even have a clue why this is happening. We have the following: - 3 servers - Websphere 7 - Zookeeper 3.4.5 on each server - Solr 4.5.0 on each server - 1 shard (so it is one leader and 2 replicas) - The index con

Re: Indexing a token to a different field in a custom filter

2013-11-12 Thread Jack Krupansky
Any kind of cross-field processing is best done in an update processor. There are a lot of built-in update processors as well as a JavaScript script update processor. -- Jack Krupansky -Original Message- From: Dileepa Jayakody Sent: Tuesday, November 12, 2013 1:31 AM To: solr-user@lu

Date range faceting with various gap sizes?

2013-11-12 Thread jimi.hullegard
Hi, I'm experimenting with date range faceting, and would like to use different gaps depending on how old the date is. But I am not sure on how to do that. This is what I have tried, using the java API Solrj 4.0.0 and Solr 4.1.0: solrQuery.addDateRangeFacet("scheduledate_start_tdate", date1, da

Re: How we can get JSON response using CloudSolrServer

2013-11-12 Thread Jack Krupansky
If you're using SolrJ, then the response is already in Java format, so just use a JSON package to map the Java map to JSON text. Or, just call Solr directly using HTTP and specify wt=json to get raw JSON from Solr. BUT... this sounds more like an XY problem. I mean, if the response is alread
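The second option mentioned above, calling Solr directly over HTTP with wt=json, might look something like this. It is only a sketch: the host, port, and collection name are assumptions for illustration, and `fetch_json` is a hypothetical helper that is defined but not called here.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen


def select_url(base_url: str, query: str, rows: int = 10) -> str:
    """Build a /select URL that asks Solr for a JSON response."""
    params = {"q": query, "rows": rows, "wt": "json"}
    return f"{base_url}/select?{urlencode(params)}"


def fetch_json(url: str) -> dict:
    """Fetch the URL and parse the body as JSON (needs a reachable Solr)."""
    with urlopen(url) as resp:
        return json.load(resp)


# Hypothetical host and collection name, for illustration only.
url = select_url("http://localhost:8983/solr/collection1", "*:*")
```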

SV: Date range faceting with various gap sizes?

2013-11-12 Thread jimi.hullegard
Directly after I sent my email, I tested using two different field names, instead of the same field name for both range facets. And then it worked. So, it seems there is a bug that can't handle multiple range facets for the same field name. A workaround is to use a copyfield to another field, an
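As a rough sketch of the workaround described above, the request can carry one range facet per field, each with its own gap, using Solr's per-field `f.<field>.facet.range.*` parameters. The second field name (`scheduledate_start_copy_tdate`) is a hypothetical copyField target, and the date-math values are only examples.

```python
from urllib.parse import urlencode


def range_facet_params(field: str, start: str, end: str, gap: str) -> list:
    """Return the per-field range facet parameters for one field."""
    prefix = f"f.{field}.facet.range"
    return [
        ("facet.range", field),
        (f"{prefix}.start", start),
        (f"{prefix}.end", end),
        (f"{prefix}.gap", gap),
    ]


params = [("q", "*:*"), ("facet", "true")]
# Recent dates: daily buckets on the original field.
params += range_facet_params(
    "scheduledate_start_tdate", "NOW/DAY-7DAYS", "NOW/DAY", "+1DAY"
)
# Older dates: monthly buckets on a hypothetical copyField of the same data.
params += range_facet_params(
    "scheduledate_start_copy_tdate", "NOW/DAY-12MONTHS", "NOW/DAY-7DAYS", "+1MONTH"
)
query_string = urlencode(params)
```

A list of pairs is used instead of a dict so that `facet.range` can appear once per field.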

Re: Multi-core support for indexing multiple servers

2013-11-12 Thread Robert Veliz
I have two sources/servers--one of them is Magento. Since Magento has a more or less out of the box integration with Solr, my thought was to run Solr server from the Magento instance and then use DIH to get/merge content from the other source/server. Seem feasible/appropriate? I spec'd it out a

RE: Why do people want to deploy to Tomcat?

2013-11-12 Thread Hoggarth, Gil
For me, a side effect of 'example' is that it's just that, not appropriate for production. But also, there's the organisation factor beyond Solr that is about staff expertise - we don't have any systems that utilise jetty so we're unfamiliar with its configuration, issues, or oddities. Tomcat is

RE: Why do people want to deploy to Tomcat?

2013-11-12 Thread Henrik Ossipoff Hansen
I agree with the previous statements that the ‘example’ name is putting people off. Not only that though, I believe there are still some official wiki pages that directly state that the shipped Jetty is not appropriate for production use, which was what made us use Tomcat for a long while (

[Spellcheck] NullPointerException on QueryComponent.mergeIds

2013-11-12 Thread Jean-Marc Desprez
Hello, I'm following this tutorial: http://wiki.apache.org/solr/SolrCloud with Solr 4.5.0. I'm at the very first step, with only two replicas and two shards, and I have only *one* document in the index. When I try to get a spellcheck, I get this error: java.lang.NullPointerException at org.apach

RE: [Spellcheck] NullPointerException on QueryComponent.mergeIds

2013-11-12 Thread Dyer, James
Jean-Marc, This might not solve the particular problem you're having, but to get spellcheck to work properly in a distributed environment, be sure to set the "shards.qt" parameter to the name of your request handler. See http://wiki.apache.org/solr/SpellCheckComponent#Distributed_Search_Suppor
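To illustrate the "shards.qt" suggestion, a small helper could add the parameter to a set of request parameters when it is missing. The handler name "/spell" below is an assumption for illustration; use the name of the request handler actually configured.

```python
def with_shards_qt(params: dict, handler: str) -> dict:
    """Return a copy of params with shards.qt set to the handler name.

    An existing shards.qt value is left untouched.
    """
    out = dict(params)
    out.setdefault("shards.qt", handler)
    return out


# "/spell" is a hypothetical handler name.
spell_params = with_shards_qt({"q": "exampel", "spellcheck": "true"}, "/spell")
```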

Re: Why do people want to deploy to Tomcat?

2013-11-12 Thread Dejan Caric
We're a .NET shop. We use Windows Server for both .NET code and Solr hosting. With Tomcat we can get everything up and running with a few mouse clicks (it's as simple as next, next, next...) while setting up Jetty as a windows service can be quite tricky for non-Java developers. That's the only rea

Missing Documents after db import

2013-11-12 Thread Köhler Christian
Hi! I experience a mismatch between the number of indexed documents and the number of documents actually in the Solr index. I cannot find any reason for this in the log files. How do I find out why some documents are deleted from the index? Setup: Solr 4.4 using DIH fetching 1000 rows from a My

Re: Missing Documents after db import

2013-11-12 Thread Gora Mohanty
On 12 November 2013 21:29, Köhler Christian wrote: > Hi! > > I experience a mismatch between the number of indexed documents and the > number of documents actually in the solr index. I can not find any > reason for this in the log files. How do I find out, why some documents > are deleted from the

Re: SolrCloud 4.5.1 and Zookeeper SASL

2013-11-12 Thread Shawn Heisey
On 11/11/2013 11:37 PM, Sven Stark wrote: > We are testing to upgrade Solr from 4.3 to 4.5.1 . We're using SolrCloud > and our problem is that the core does not appear to be loaded anymore. > > We've set logging to DEBUG and we've found lots of those > > 2013-11-12 06:30:43,339 [pool-2-thread-1-S

Re: serialization error - BinaryResponseWriter

2013-11-12 Thread Shawn Heisey
On 11/12/2013 2:37 AM, giovanni.bricc...@banzai.it wrote: > I'm getting some errors reading boolean filelds, can you give me any > suggestions? in this example I only have four "false" fields: > leasing=false, FiltroNovita=false, FiltroFreeShipping=false, Outlet=false. > > this is the stack trace

Re: Multi-Tenant Setup in Single Core

2013-11-12 Thread Shawn Heisey
On 11/12/2013 6:13 AM, Christian Ramseyer wrote: > So I'm worried about something that uses these URL paths, say > > https://reverse-proxy/mapping-to-solr/searchui_client?qt=update&; > commit=true&stream.body=*:* Ensure that all handler names start with a slash character, so they are things like

optimization suggstions

2013-11-12 Thread Eric Katherman
Stats: default config for 4.3.1 on a high memory AWS instance using jetty. Two collections each with less than 700k docs per collection. We seem to hit some performance lags when doing large commits. Our front end service allows customers to import data which is stored in Mongo and then indexed

Re: Missing Documents after db import

2013-11-12 Thread Köhler Christian
Hi Gora, thanks for pointing me in the right direction. The problem was indeed that some ids were not unique. Regards Chris On 12.11.2013 17:05, Gora Mohanty wrote: > On 12 November 2013 21:29, Köhler Christian wrote: >> Hi! >> >> I experience a mismatch between the number of indexed documents
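Since Solr overwrites a document when another one arrives with the same unique key, non-unique ids explain a gap between rows fetched and documents in the index. A quick way to check the source data for this, sketched here with made-up ids, is to count how often each id occurs:

```python
from collections import Counter
from typing import Dict, Iterable


def duplicate_ids(ids: Iterable[str]) -> Dict[str, int]:
    """Return each id that occurs more than once, with its count."""
    counts = Counter(ids)
    return {doc_id: n for doc_id, n in counts.items() if n > 1}


# Made-up ids standing in for the id column of the imported rows.
rows = ["a1", "b2", "a1", "c3", "b2", "a1"]
dupes = duplicate_ids(rows)
expected_docs = len(set(rows))  # rows with the same id collapse into one document
```

Here six rows are fetched but only three distinct ids exist, so only three documents would remain in the index.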

Sorting memory-efficiently by any numeric field (dates too?)

2013-11-12 Thread Erick Erickson
Before I go and pat myself on the back, what do people think about this trick? The base problem is "Is there a space-efficient way to return the top N documents, sorted by a numeric field". The numeric field includes dates. It come to me in a vision in a flash! (The Pickle Song, Arlo Guthrie). If

Re: optimization suggstions

2013-11-12 Thread Shalin Shekhar Mangar
On Tue, Nov 12, 2013 at 9:56 PM, Eric Katherman wrote: > Stats: > default config for 4.3.1 on a high memory AWS instance using jetty. > Two collections each with less than 700k docs per collection. > > We seem to hit some performance lags when doing large commits. Our front end > service allows

Re: Why do people want to deploy to Tomcat?

2013-11-12 Thread Siegfried Goeschl
Hi Alex, in my case * ignorance that Tomcat is not fully supported * Tomcat configuration and operations know-how in-house * could migrate to Jetty but need an approved change request to do so Cheers, Siegfried Goeschl On 12.11.13 04:54, Alexandre Rafalovitch wrote: Hello, I keep seeing here an

Re: optimization suggstions

2013-11-12 Thread Andre Bois-Crettez
I suggest setting autoCommit to something as big as your memory allows (e.g. 15 minutes) to flush the update log to disk and start merging segments, without making the changes visible to searches yet. Then at the end, send an explicit commit, which will both persist on disk the remainder of the indexed docs and make everything

Re: Indexing a token to a different field in a custom filter

2013-11-12 Thread Dileepa Jayakody
Thanks all for your valuable inputs. I looked at the suggested solutions and I too feel a *custom update processor* during indexing will be the best solution to handle the content field by changing the value and storing it in another field. Do I only need to change the below request handler to interc

Re: Why do people want to deploy to Tomcat?

2013-11-12 Thread Lukasz Salwinski
On 12.11.13 04:54, Alexandre Rafalovitch wrote: Hello, I keep seeing here and on Stack Overflow people trying to deploy Solr to Tomcat. We don't usually ask why, just help when where we can. But the question happens often enough that I am curious. What is the actual business case. Is that becau

Re: Why do people want to deploy to Tomcat?

2013-11-12 Thread Lukasz Salwinski
On 11/12/2013 09:28 AM, Lukasz Salwinski wrote: On 12.11.13 04:54, Alexandre Rafalovitch wrote: Hello, I keep seeing here and on Stack Overflow people trying to deploy Solr to Tomcat. We don't usually ask why, just help when where we can. But the question happens often enough that I am curious

Re: Unit of dimension for solr field

2013-11-12 Thread Erick Erickson
Yep, doing this outside Solr at ingestion should be a simpler model if you already have an external ingestion method. Otherwise a custom update processor would be reasonably easy. Best, Erick On Tue, Nov 12, 2013 at 8:04 AM, eakarsu wrote: > Erick, > > I haven't written any SOLR plugin before

Re: Why do people want to deploy to Tomcat?

2013-11-12 Thread Sujit Pal
In our case, it is because all our other applications are deployed on Tomcat and ops is familiar with the deployment process. We also had customizations that needed to go in, so we inserted our custom JAR into the solr.war's WEB-INF/lib directory, so to ops the process of deploying Solr was (almost

Re: SorlCloud recovery issue while search stress test

2013-11-12 Thread Erick Erickson
Check your Solr transaction log size. It's possible that your killed Solr is replaying transaction logs. Or syncing from the current leader (perhaps by replicating the entire shard index). This is usually the case when you're getting updates while killing the leader. Here's a writeup on tlogs

JVM tuning?

2013-11-12 Thread Scott Stults
We've been using a slightly older version of this script to start Solr in server environments: https://github.com/apache/cassandra/blob/trunk/conf/cassandra-env.sh The thing I especially like about it is its ability to dynamically cap memory usage, and the garbage collection log section is a grea

Re: Why do people want to deploy to Tomcat?

2013-11-12 Thread Gopal Patwa
My case is also similar to "Sujit Pal" but we have jboss6. On Tue, Nov 12, 2013 at 9:47 AM, Sujit Pal wrote: > In our case, it is because all our other applications are deployed on > Tomcat and ops is familiar with the deployment process. We also had > customizations that needed to go in, so we

Re: unable to load core after cluster restart

2013-11-12 Thread kaustubh147
Hi, So we finally got our JDK/JRE upgraded to 1.6.0-33, but it didn't solve the problem. I am still seeing the same write.lock error. I was able to solve the problem by changing the lock type from native to single, but I am not sure about other ramifications of this approach. Do you see any problems

RE: Sorting memory-efficiently by any numeric field (dates too?)

2013-11-12 Thread Petersen, Robert
Hi Erick, I like your idea, FWIW please also leave room for boost by function query which takes many numeric fields as input but results in a single value. I don't know if this counts as a really clever function but here's one that I currently use: {!boost b=pow(sum(log(sum(product(boosted,90

Re: Sorting memory-efficiently by any numeric field (dates too?)

2013-11-12 Thread Yonik Seeley
For a reasonable top-N, the space efficiency should still be the same as it is really just dominated by the FieldCache representation (is it in-memory or disk-docvalue based). Directly sorting on that numeric field vs deriving a score from the field and sorting on that shouldn't really be that dif

Boosting documents by categorical preferences

2013-11-12 Thread Amit Nithian
Hi all, I have a question around boosting. I wanted to use the &boost= to write a nested query that will boost a document based on categorical preferences. For a movie search for example, say that a user likes drama, comedy, and action. I could use things like qq=&q={!boost%20b=$b%20defType=edis

High disk IO during UpdateCSV

2013-11-12 Thread Utkarsh Sengar
Hello, I load data from CSV to Solr via UpdateCSV. There are about 50M documents with 10 columns in each document. The index size is about 15GB and I am using a 3-node distributed Solr cluster. While loading the data the disk IO goes to 100%. If the load balancer in front of solr hits the machine

Re: Sorting memory-efficiently by any numeric field (dates too?)

2013-11-12 Thread Erick Erickson
Yonik: Of course I'm not really up on the details of sorting, but aren't there various control structures that are allocated for a sort but not for scoring? I'm thinking of long[maxDoc] type structures in addition to the actual values in the FieldCache. I've been thinking about docValues for this

Modify the querySearch to q=*:*

2013-11-12 Thread Abhijith Jain -X (abhijjai - DIGITAL-X INC at Cisco)
Hello, I upgraded Solr to 4.4.0 (the previous Solr version was 3.5). After the full-import was run on Solr 4.4.0 I could see the expected number of records by accessing the URL http://myhost:7983/collection1/select/?q=*. But when I access my application I find only a partial number of records. Later

Re: SolrCloud 4.5.1 and Zookeeper SASL

2013-11-12 Thread Sven Stark
Shawn, thanks for taking the time to reply. Turned out it was something entirely different. We failed to deploy the newly added core.properties files. Adding them immediately fixed everything. The zookeeper debug messages apparently are there even if SASL is turned off. We just got sidetracked because

Re: Modify the querySearch to q=*:*

2013-11-12 Thread Jack Krupansky
Usually we use the “/select” handler now – check your solrconfig.xml and see if you have that as well. And check to see how “handleSelect” is set if you intend not to use the “/select” handler. Overall, read the new Solr 4.4 solrconfig file carefully and try to switch to all of the new style of

RE: Modify the querySearch to q=*:*

2013-11-12 Thread Abhijith Jain -X (abhijjai - DIGITAL-X INC at Cisco)
Thanks for the quick reply. I have following lines in my SolrConfig.xml. I didn’t have luck with following Solrconfig.xml. none *:* Thanks

Re: Modify the querySearch to q=*:*

2013-11-12 Thread Jack Krupansky
Better to migrate to the 4.4 select handler conventions, as shown in the example schema and config, rather than mess around with the old style of config. Start with the new config and schema and only add in or change things you really need. But first, add the debugQuery=true parameter to see

Re: Modify the querySearch to q=*:*

2013-11-12 Thread Shawn Heisey
On 11/12/2013 6:03 PM, Abhijith Jain -X (abhijjai - DIGITAL-X INC at Cisco) wrote: I am trying to set the query to q=*:* permanently. I tried to set q=*:* in SolrConfig.xml file as follows. none *:*

Re: Sorting memory-efficiently by any numeric field (dates too?)

2013-11-12 Thread Yonik Seeley
On Tue, Nov 12, 2013 at 7:01 PM, Erick Erickson wrote: > Yonik: > > Of course I'm not really up on the details of sorting, but aren't there > various control structures that are allocated for a sort but not for > scoring? I'm thinking of long[maxDoc] type structures in addition to > the actual val

Re: Sorting memory-efficiently by any numeric field (dates too?)

2013-11-12 Thread Erick Erickson
Siiigggh. Yet another "brilliant" idea bites the dust. Thanks! Erick On Tue, Nov 12, 2013 at 8:49 PM, Yonik Seeley wrote: > On Tue, Nov 12, 2013 at 7:01 PM, Erick Erickson > wrote: > > Yonik: > > > > Of course I'm not really up on the details of sorting, but aren't there > > various control s

solrcloud - forward update to a shard failed

2013-11-12 Thread Aileen
Keep getting "forward update to shard failed" error. Not sure if someone else had seen it and is able to resolve. Here is our set-up: running solrcloud 4.3.1 on 4 hosts - 2 shards, 2 replications. Updates are from two different SolrCloud Server connections, with hard commits every 1

Re: Multi-core support for indexing multiple servers

2013-11-12 Thread Liu Bo
As far as I know about Magento, its DB schema is designed for extensible property storage, and relationships between DB tables are kind of complex. A product has its attribute sets and properties, which are stored in different tables. A configurable product may have different attribute values for each

Re: Adding a server to an existing SOLR cloud cluster

2013-11-12 Thread Gopal Patwa
Did you try adding core.properties file in your core folder with below content and change the value for name and collection property ex: core1/core.properties content: numShards=1 name=core1 shard=shard1 collection=collection1 On Mon, Nov 11, 2013 at 8:14 AM, michael.boom wrote: > From my und

RE: distributed search is significantly slower than direct search

2013-11-12 Thread Elran Dvir
Erick, Thanks for your response. We are upgrading our system using Solr. We need to preserve old functionality. Our client displays 5K documents and groups them. Is there a way to refactor code in order to improve distributed document fetching? Thanks. -Original Message- From: Erick