In-memory collections?

2013-08-06 Thread Per Steffensen
Hi, is there a way I can configure Solr so that it handles its shards completely in memory? If yes, how? No writing to disk, neither transaction log nor Lucene indices. Of course I accept that data is lost if Solr crashes or is shut down. Regards, Per Steffensen

Re: Collection - loadOnStartup

2013-08-06 Thread Srivatsan
Thanks Erick -- View this message in context: http://lucene.472066.n3.nabble.com/Collection-loadOnStartup-tp4082531p4082938.html Sent from the Solr - User mailing list archive at Nabble.com.

Re: Solr MaxCollections

2013-08-06 Thread Srivatsan
Thanks Jack -- View this message in context: http://lucene.472066.n3.nabble.com/Solr-MaxCollections-tp4082772p4082937.html

Re: Solr list all records but fq matching records first

2013-08-06 Thread Thyagaraj
I verified, the code is proper, I just highlighted with bold few things. Below I have pasted it again, Method 1 SolrQuery query = new SolrQuery().setStart(first).setRows( searchCommand.getRowsPerPage()); //setting query query.setQuery("*"); //setting spati

Re: Measuring SOLR performance

2013-08-06 Thread Roman Chyla
Hi Dmitry, I've modified the solrjmeter to retrieve data from under the core (the -t parameter) and the rest from the /solr/admin - I could test it only against 4.0, but it is there the same as 4.3 - it seems...so you can try the fresh checkout my test was: python solrjmeter.py -a -x ./jmx/SolrQu

Re: Problems with distributed MoreLikeThis

2013-08-06 Thread manju16832003
I'm not sure about the root cause in your case. However, one thing to remember with MLT is that *MLT does not work with integer fields*. In your case, if 'catchall' is a copyField and you are trying to copy any integer values, verify it again :-). Thanks -- View this message in context: http

Re: entity classification solr

2013-08-06 Thread manju16832003
Can you provide a sample structure of the document with entities? What does the document look like? As far as I can assume, you do not need to apply any filters. If your entities are searchable, include them in the fulltext or keyword search. Are your entities part of the document and are they

Solr design. Choose Cores or Shards?

2013-08-06 Thread manju16832003
Hi, I have a confusion over choosing Cores or Shards for the project scenario. My scenario is as follows I have three entities 1. Customers 2. Product Info 3. Listings [Contains all the listings posted by customer based on product] I'm planning to design Solr structure for the above scenario l

entity classification solr

2013-08-06 Thread smanad
I have the following situation when using Solr 4.3. My document contains "entities" for example "peanut butter". I have a list of such entities. These are items that go together and are not to be treated as two individual words. During indexing, I want solr to realize this and treat "peanut butter

Re: Transform data at index time: country -> continent

2013-08-06 Thread Jack Krupansky
I've implemented a JavaScript script for the StatelessScriptUpdate processor that does country code to continent code mapping. It will appear in the next early access of my "Solr 4.x Deep Dive" book (on 8/16.) One interesting issue: These countries that span continents - Turkey and Russia and

Re: Schema Lint

2013-08-06 Thread Alexandre Rafalovitch
Funny, you should ask. Here are the relevant suggestions from the Solr Usability contest that is going right now: *) https://solrstart.uservoice.com/forums/216001-usability-contest/suggestions/4249791-solr-lint-a-tool-to-check-solr-configuration-and *) https://solrstart.uservoice.com/forums/216001-

Re: Unexpected behavior when sorting groups

2013-08-06 Thread Paul Masurel
Here is some detail about how grouping is implemented in Solr. http://fulmicoton.com/posts/grouping-in-solr/ On Mon, Aug 5, 2013 at 2:42 AM, Tony Paloma wrote: > Thanks Paul. That's helpful. I'm not familiar with the concept of custom > caches. Would this be custom Java code or something defin

Re: problems running solr 4.4 with HDFS HA

2013-08-06 Thread Mark Miller
On Aug 6, 2013, at 3:15 PM, Greg Walters wrote: > -Dsolr.hdfs.confdir=/etc/hadoop/conf.cloudera.hdfs1 Have you set that up in the directoryFactory section of solrconfig.xml? Make sure you have something like: ${solr.hdfs.home:} ${solr.hdfs.confdir:} - Mark

RE: external zookeeper with SolrCloud

2013-08-06 Thread Joshi, Shital
Machines are definitely up. Solr4 node and zookeeper instance share the machine. We're using -DzkHost=zk1,zk2,zk3,zk4,zk5 to let solr nodes know about the zk instances. -Original Message- From: Erick Erickson [mailto:erickerick...@gmail.com] Sent: Tuesday, August 06, 2013 5:03 PM To:

Re: Schema Lint

2013-08-06 Thread Andy Lester
On Aug 6, 2013, at 9:55 AM, Steven Bower wrote: > Is there an easy way in code / command line to lint a solr config (or even > just a solr schema)? No, there's not. I would love there to be one, especially for the DIH. -- Andy Lester => a...@petdance.com => www.petdance.com => AIM:petdance

Re: external zookeeper with SolrCloud

2013-08-06 Thread Erick Erickson
First off, even 6 ZK instances are overkill, vast overkill. 3 should be more than enough. That aside, however, how are you letting your Solr nodes know about the zk machines? Is it possible you've pointed some of your Solr nodes at specific ZK machines that aren't up when you have this problem? I.

RE: external zookeeper with SolrCloud

2013-08-06 Thread Joshi, Shital
Hi, We have SolrCloud (4.4.0) cluster (5 shards and 2 replicas) on 10 boxes. We have 6 zookeeper instances. We are planning to change to odd number of zookeeper instances. With Solr 4.3.0, if all zookeeper instances are not up, the solr4 node never connects to zookeeper (can't see the admin p
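
The usual argument for an odd number of ZooKeeper instances is quorum arithmetic: the ensemble stays available only while a strict majority of its configured servers is up. A minimal sketch of that arithmetic follows; the function names are hypothetical and used only for illustration.

```python
def quorum_size(ensemble_size: int) -> int:
    """Smallest strict majority of a ZooKeeper ensemble."""
    return ensemble_size // 2 + 1


def tolerated_failures(ensemble_size: int) -> int:
    """Number of servers that can be down while a majority is still up."""
    return ensemble_size - quorum_size(ensemble_size)


# Compare the ensemble sizes mentioned in this thread (3 and 6) with 5.
for n in (3, 5, 6):
    print(n, quorum_size(n), tolerated_failures(n))
```

Under this arithmetic, six instances tolerate the same number of failures as five, so the sixth instance adds no extra fault tolerance.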

Re: SolrCloud Indexing question

2013-08-06 Thread Shawn Heisey
On 8/6/2013 12:55 PM, Kalyan Kuram wrote: Hi All, I need a suggestion on how to send indexing commands to 2 different Solr servers. Basically I want to mirror my index. Here is the scenario: I have 2 clusters, each cluster has one master and 2 slaves with external ZooKeeper in the front. I need a suggestion

Problems with distributed MoreLikeThis

2013-08-06 Thread Shawn Heisey
I'm having some problems with distributed MLT. On 4.4, it seems completely broken. Searches that work on 4.2.1 return an exception on 4.4.0. This stackoverflow post shows the EarlyTerminatingCollectorException I'm getting: http://stackoverflow.com/questions/17866313/earlyterminatingcollecto

problems running solr 4.4 with HDFS HA

2013-08-06 Thread Greg Walters
Good day, I've been working to test Solr 4.4 in our dev environment with the HDFS integration that was just announced and am having some issues getting NameNode HA to work. To start off with I had to change out all of the Hadoop jars in WEB-INF/lib/ with the matching jars from our Hadoop distri

SolrCloud Indexing question

2013-08-06 Thread Kalyan Kuram
Hi All, I need a suggestion on how to send indexing commands to 2 different Solr servers. Basically I want to mirror my index. Here is the scenario: I have 2 clusters, each cluster has one master and 2 slaves with external ZooKeeper in the front. I need a suggestion on what Solr API class I should use to send

Re: Suggest aka "autocomplete" request handler with solr 4.4

2013-08-06 Thread Utkarsh Sengar
Jack/Chris, 1. This is my complete schema.xml: https://gist.github.com/utkarsh2012/6167128/raw/1d5ac6520b666435cd040b5cc6dcb434cdfd7925/schema.xml More specifically, allText is of type: text_general which has a LowerCaseFilterFactory during index time. 2. allText has values: http://solr_server/solr/pro

Re: Knowing what field caused the retrival of the document

2013-08-06 Thread Raymond Wiker
One option might be to run two queries with fq set to +name:"whatever phrase" and +comment:"whatever phrase". The query results may then be annotated and merged (assuming that the hit scores only depend on the main query and the document content - i.e, no normalization, and no score contribution
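
The annotate-and-merge step of the two-query approach could look roughly like the sketch below. It assumes the two filtered queries have already been run and that each returned a list of document ids; the function name and the sample ids are hypothetical.

```python
def merge_annotated(name_hits, comment_hits):
    """Map each document id to the set of fields whose filter it matched."""
    matched = {}
    for doc_id in name_hits:
        matched.setdefault(doc_id, set()).add("name")
    for doc_id in comment_hits:
        matched.setdefault(doc_id, set()).add("comment")
    return matched


# Hypothetical ids returned by the fq=+name:... and fq=+comment:... queries.
result = merge_annotated(["a", "b"], ["b", "c"])
print(result)
```

A document that appears in both result lists ends up annotated with both field names, which is the information the original question asks for.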

Re: Transform data at index time: country -> continent

2013-08-06 Thread Walter Underwood
Would synonyms help? If you generate the query terms for the continents, you could do something like this: usa => continent-na canada => continent-na germany => continent-europe und so weiter. wunder On Aug 6, 2013, at 2:18 AM, Christian Köhler - ZFMK wrote: > Am 05.08.2013 15:52, schrieb Jac
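
The synonym rules suggested above could be generated from a lookup table instead of being typed by hand. This is a minimal sketch: the three countries and the continent tokens are the ones from the message, while the dictionary and function names are hypothetical.

```python
COUNTRY_TO_CONTINENT = {
    "usa": "continent-na",
    "canada": "continent-na",
    "germany": "continent-europe",
}


def synonym_lines(mapping):
    """Render one explicit 'term => replacement' rule per country, sorted by country."""
    return [f"{country} => {continent}" for country, continent in sorted(mapping.items())]


# Print the rules in the same "a => b" form used in the message.
print("\n".join(synonym_lines(COUNTRY_TO_CONTINENT)))
```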

Re: Knowing what field caused the retrival of the document

2013-08-06 Thread Jeff Wartes
For what it's worth, I had the same question last year, and I never really got a good solution: http://mail-archives.apache.org/mod_mbox/lucene-solr-user/201212.mbox/%3C81 e9a7879c550b42a767f0b86b2b81591a15b...@ex4.corp.w3data.com%3E I dug into the highlight component for a while, but it turned

TermRangeTermsEnum usage and performance

2013-08-06 Thread Chet Vora
Hi I have an index consisting of a double value that can range between certain values and an associated tag. I am trying to find all the docs which match a certain tag (or combination of tags) and a certain range. I'm trying to use the TermRangeTermsEnum from the Flex API as part of a custom pars

Re: 'Optimizing' Solr Index Size

2013-08-06 Thread Brendan Grainger
To maybe answer another one of my questions about the 50Gb recovered when running: curl ' http://localhost:8983/solr/update?optimize=true&maxSegments=10&waitFlush=false ' It looks to me that it was from deleted docs being completely removed from the index. Thanks On Tue, Aug 6, 2013 at 11:45

Re: 'Optimizing' Solr Index Size

2013-08-06 Thread Brendan Grainger
Well, I guess I can answer one of my questions which I didn't exactly explicitly state, which is: how do I force solr to merge segments to a given maximum. I forgot about doing this: curl ' http://localhost:8983/solr/update?optimize=true&maxSegments=10&waitFlush=false ' which reduced the number o
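
For scripting the same request, the URL from the curl command can be assembled programmatically. The sketch below only builds the URL and sends nothing; the helper name is hypothetical, and the parameters are exactly the ones shown in the message.

```python
from urllib.parse import urlencode


def optimize_url(base_url, max_segments, wait_flush=False):
    """Build the update URL that asks Solr to merge down to max_segments segments."""
    params = {
        "optimize": "true",
        "maxSegments": str(max_segments),
        "waitFlush": "true" if wait_flush else "false",
    }
    return f"{base_url}/update?{urlencode(params)}"


print(optimize_url("http://localhost:8983/solr", 10))
```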

Re: Solr 4.4 and Google Protobuf

2013-08-06 Thread Shawn Heisey
On 8/6/2013 8:37 AM, Guido Medina wrote: I saw inside the solr.war file there is a protobuf version 2.4.0a, I have two questions about it: 1. Where does Solr use protobuf? And is it better than HTTP? 2. Why is it such an old version if protobuf recommended versions are 2.4.1 and 2.5.0 - 2.5

Re: Multiple sorting does not work as expected

2013-08-06 Thread Mysurf Mail
I don't see how it is sorted. this is the order as displayed above 1-> BOM Total test2 2-> BOM Total test - Copy 3-> BOM Total test2 all in the same 2.2388418 score On Tue, Aug 6, 2013 at 5:28 PM, Jack Krupansky wrote: > The Name field is sorted as you have requested - "desc". I suspect that

Re: Adding Postgres and Mysql JDBC drivers to Solr

2013-08-06 Thread Spadez
Thank you very much -- View this message in context: http://lucene.472066.n3.nabble.com/Adding-Postgres-and-Mysql-JDBC-drivers-to-Solr-tp4082806p4082832.html

Schema Lint

2013-08-06 Thread Steven Bower
Is there an easy way in code / command line to lint a solr config (or even just a solr schema)? Steve

'Optimizing' Solr Index Size

2013-08-06 Thread Brendan Grainger
Hi All, First of all, what I was actually trying to do is actually get a little space back. So if there is a better way to do this by adjusting the MergePolicy or something else please let me know. My index is currently 200Gb. In the past (Solr 1.4) we've found that optimizing the index will doubl

Re: How to plan field boosting

2013-08-06 Thread Jack Krupansky
Mostly guessing and trial and error - and eventually experience - unless you are able to do tf-idf similarity math in your head! You can look at the "explain" section of the output of the debugQuery=true parameter and work through the math yourself as well. Look at the final scores of document

Re: Knowing what field caused the retrival of the document

2013-08-06 Thread Jack Krupansky
Add the debugQuery=true parameter and the "explain" section will detail exactly what terms matched for each document. You could also use the Solr term vectors component to get info on what terms occur where in a document, but that adds more overhead to the index for "stored term vectors". --

Solr 4.4 and Google Protobuf

2013-08-06 Thread Guido Medina
Hi, I saw inside the solr.war file there is a protobuf version 2.4.0a, I have two questions about it: 1. Where does Solr use protobuf? And is it better than HTTP? 2. Why is it such an old version if protobuf recommended versions are 2.4.1 and 2.5.0 - 2.5.0 has an extra 25% performance over

Re: Multiple sorting does not work as expected

2013-08-06 Thread Jack Krupansky
The Name field is sorted as you have requested - "desc". I suspect that you wanted name to be sorted "asc" (natural order.) -- Jack Krupansky -Original Message- From: Mysurf Mail Sent: Tuesday, August 06, 2013 10:22 AM To: solr-user@lucene.apache.org Subject: Re: Multiple sorting does

Help importing xml file as raw xml

2013-08-06 Thread jimtronic
Hi, I found a few threads out there dealing with this problem, but there didn't really seem to be much detail to the solution. I have large xml files (500M to 2+ G) with a complex nested structure. It's impossible for me to import the exact structure into a solr representation, and, honestly, I d

Re: Solr MaxCollections

2013-08-06 Thread Jack Krupansky
Although there is no hard limit or published guidelines, I would say that you should try to limit your number of collections per cluster to "dozens" or no more than 100. More than that and you are in uncharted territory. If it works for you, fine, but if it doesn't please don’t complain. But...

Re: Multiple sorting does not work as expected

2013-08-06 Thread Mysurf Mail
my schema On Tue, Aug 6, 2013 at 5:06 PM, Mysurf Mail wrote: > My documents has 2 indexed attribute - name (string) and version (number) > I want within the same score the documents will be displayed by the > following order > > score(desc),name(desc),version(desc) > > Therefor

Spellchecker suggests Tokens

2013-08-06 Thread Snubbel
Hello, I have a problem getting started with DirectSolrSpellChecker. I use NGramFilterFactory to index and query for strings of length greater than 3. So, if I index the word "aQuiteLongWord" I can search for "long" and get the result. Now I'm adding the DirectSolrSpellChecker. And when searching

Multiple sorting does not work as expected

2013-08-06 Thread Mysurf Mail
My documents have 2 indexed attributes: name (string) and version (number). Within the same score, I want the documents to be displayed in the following order: score(desc),name(desc),version(desc). Therefore I query using: http://localhost:8983/solr/vault/select? q=BOM&fl=*:score&
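
The requested ordering maps onto a comma-separated sort parameter with one "field direction" clause per key. The sketch below builds such a parameter for the three keys named in the message. The helper name is hypothetical, and the fl value "*,score" is an assumption, since the URL in the message is cut off.

```python
from urllib.parse import urlencode


def sort_param(clauses):
    """Join (field, direction) pairs into a comma-separated sort value."""
    return ",".join(f"{field} {direction}" for field, direction in clauses)


params = {
    "q": "BOM",
    "fl": "*,score",  # assumed field list; the original URL is truncated
    "sort": sort_param([("score", "desc"), ("name", "desc"), ("version", "desc")]),
}

# urlencode takes care of escaping the spaces and commas in the sort value.
print(urlencode(params))
```

Later clauses only act as tie-breakers, so name and version matter only for documents whose scores are exactly equal.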

Re: Measuring SOLR performance

2013-08-06 Thread Dmitry Kan
Hi, Thanks for the clarification, Shawn! So with this in mind, the following work: http://localhost:8983/solr/statements/admin/system?wt=json http://localhost:8983/solr/statements/admin/mbeans?wt=json not copying their output to save space. Roman: is this something that should be set via -t p

Re: Adding Postgres and Mysql JDBC drivers to Solr

2013-08-06 Thread Shawn Heisey
On 8/6/2013 6:15 AM, Spadez wrote: > Hi, > > I am running Solr4 on Jetty9 and I am trying to include the JDBC drivers for > both MySQL and PostgreSQL. I'm a little confused about how I do this. > > I believe these to be the two files I need: > http://cdn.mysql.com/Downloads/Connector-J/mysql-conn

Re: Measuring SOLR performance

2013-08-06 Thread Shawn Heisey
On 8/6/2013 6:17 AM, Dmitry Kan wrote: > Of three URLs you asked for, only the 3rd one gave response: > The rest report 404. > > On Mon, Aug 5, 2013 at 8:38 PM, Roman Chyla wrote: > >> Hi Dmitry, >> So I think the admin pages are different on your version of solr, what do >> you see when you re

Re: Measuring SOLR performance

2013-08-06 Thread Dmitry Kan
Hi Roman, With fresh checkout, the reported admin_endpoint is: http://localhost:8983/solr/admin. This url redirects to http://localhost:8983/solr/#/ . I'm using solr 4.3.1. Is your tool supporting this version? Of three URLs you asked for, only the 3rd one gave response: {"responseHeader":{"stat

Adding Postgres and Mysql JDBC drivers to Solr

2013-08-06 Thread Spadez
Hi, I am running Solr4 on Jetty9 and I am trying to include the JDBC drivers for both MySQL and PostgreSQL. I'm a little confused about how I do this. I believe these to be the two files I need: http://cdn.mysql.com/Downloads/Connector-J/mysql-connector-java-5.1.26.tar.gz http://jdbc.postgresql.o

Re: Unexpected behavior when sorting groups

2013-08-06 Thread Paul Masurel
On Mon, Aug 5, 2013 at 2:42 AM, Tony Paloma wrote: > Thanks Paul. That's helpful. I'm not familiar with the concept of custom > caches. Would this be custom Java code or something defined in the > config/schema? Can you point me to some documentation? > > My solution requires both writing custom

Re: Collection - loadOnStartup

2013-08-06 Thread Erick Erickson
I don't think you can, really. Collections, at this point, is more geared towards SolrCloud. The idea of lazy loading matched with SolrCloud makes my head hurt. I'm afraid for the nonce you'll have to individually edit the solr.xml or core.properties files on the nodes once the collections are cre

Re: Customize Velocity Output, Utility Class or Custom Tool

2013-08-06 Thread Erick Erickson
_Everyone_ is qualified to submit a patch, it just takes some additional karma to be able to commit it to the code line. So please do create and attach any patch you'd like to a JIRA! Best Erick On Mon, Aug 5, 2013 at 4:39 PM, O. Olson wrote: > Thank you very much *Erik*. At this point I hav

Re: Transform data at index time: country -> continent

2013-08-06 Thread Christian Köhler - ZFMK
Hi, On 06.08.2013 12:56, Raymond Wiker wrote: Another option might be to use a pre-existing web service... it should be relatively easy to add that to your dataimporthandler configuration (if you're using DIH, that is :-) A quick google search gave me http://www.geonames.org; see http://www.g

Re: Transform data at index time: country -> continent

2013-08-06 Thread Raymond Wiker
Another option might be to use a pre-existing web service... it should be relatively easy to add that to your dataimporthandler configuration (if you're using DIH, that is :-) A quick google search gave me http://www.geonames.org; see http://www.geonames.org/export/ for API information. On Tue,

Re: Knowing what field caused the retrival of the document

2013-08-06 Thread Mysurf Mail
But what if this is for multiple words? I am guessing Solr knows why the document is there since I get to see the paragraph in the highlight (hl) section. On Tue, Aug 6, 2013 at 11:36 AM, Raymond Wiker wrote: > If you were searching for single words (terms), you could use the 'tf' > function, by

Solr MaxCollections

2013-08-06 Thread Srivatsan
Hi, I am using Solr 4.3 for my search application with Apache ZooKeeper 3.4.5. I came across the znode size limit of ZooKeeper. The default is 1 MB, right? I have read in one article that the size of a znode reaches 1 MB with just 1000 collections. Is that so? And is it preferable to increase the znode size to s

Re: Transform data at index time: country -> continent

2013-08-06 Thread Christian Köhler - ZFMK
On 05.08.2013 15:52, Jack Krupansky wrote: You can write a brute force JavaScript script using the StatelessScript update processor that hard-codes the mapping. I'll probably do something like this. Unfortunately I have no influence on the original db itself, so I have to fix this in Solr. Chee

How to plan field boosting

2013-08-06 Thread Mysurf Mail
I query using qf=Name+Tag. Now I want documents that have the phrase in Tag to appear first, so I use qf=Name+Tag^2 and they do appear first. What should be the rule of thumb regarding the number that comes after the field? How do I know what number to set?

Re: Knowing what field caused the retrival of the document

2013-08-06 Thread Raymond Wiker
If you were searching for single words (terms), you could use the 'tf' function, by adding something like matchesinname:tf(name, "whatever") to the 'fl' parameter - if the 'name' field contains "whatever", the (result) field 'matchesinname' will be 1. On Tue, Aug 6, 2013 at 10:24 AM, Mysurf M
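
The pseudo-field described above can be added to a request as an extra fl entry. The sketch below only builds the query string; the helper name is hypothetical, the alias, field, and term are the ones from the message, and the q value is assumed. It does not escape quotes inside the term, so treat it as an illustration only.

```python
from urllib.parse import urlencode


def tf_pseudo_field(alias, field, term):
    """Build an aliased tf() function entry for the fl parameter."""
    return f'{alias}:tf({field},"{term}")'


# Repeating fl keeps the function's internal comma separate from the field list.
params = [
    ("q", "whatever"),  # assumed query
    ("fl", "*"),
    ("fl", tf_pseudo_field("matchesinname", "name", "whatever")),
]
print(urlencode(params))
```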

Knowing what field caused the retrival of the document

2013-08-06 Thread Mysurf Mail
I have two indexed fields in my document: Name and Comment. The user searches for a phrase and I need to act differently depending on whether it appeared in the comment or the name. Is there a way to know why the document was retrieved? Thanks.

Re: solr - using fq parameter does not retrieve an answer

2013-08-06 Thread Mysurf Mail
Thanks. On Mon, Aug 5, 2013 at 4:57 PM, Shawn Heisey wrote: > On 8/5/2013 2:35 AM, Mysurf Mail wrote: > > When I query using > > > > http://localhost:8983/solr/vault/select?q=*:* > > > > I get results including the following > > > > > > ... > > ... > > 7 > > ... > > > > > > Now I try

Re: Encountered invalid class name

2013-08-06 Thread anpm1989
That's right, I came to the same conclusion after checking ServiceLoaderProcessor: it just logs a warning and the className is still added to the list. Thanks very much, Artem. Best wishes to you, An Pham Minh -- View this message in context: http://lucene.472066.n3.nabble.com/Encountered-invalid-class-name-t

Re: Encountered invalid class name

2013-08-06 Thread Artem Karpenko
I'm not a JBoss expert, but I'm pretty sure it should work fine. The validator throws warnings, that's true. But it looks like those warnings do not influence the loading of the classes. I suggest you have a look at this diff (https://source.jboss.org/browse/JBossAS/server/src/main/java

Re: Boosting in function queries?

2013-08-06 Thread Upayavira
Try: _query_:"{!dismax qf=Fname^8.0 v=$f_name}" OR _query_:"{!dismax qf=Lname^8.0 v=$l_name}" If you are using one of the later 4.x releases, you might find you can do away with the _query_: {!dismax qf=Fname^8.0 v=$f_name} OR {!dismax qf=Lname^8.0 v=$l_name} I haven't tried any of this
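
The first form above relies on parameter dereferencing: $f_name and $l_name refer to separate request parameters that carry the user's input. A minimal sketch of assembling such a request follows. The helper name and the sample values "John" and "Smith" are hypothetical, and like the suggestion itself this has not been run against a Solr instance.

```python
from urllib.parse import urlencode


def nested_dismax(field, boost, param_name):
    """Build one nested dismax clause that reads its text from another request parameter."""
    return f'_query_:"{{!dismax qf={field}^{boost} v=${param_name}}}"'


params = {
    "q": " OR ".join([
        nested_dismax("Fname", 8.0, "f_name"),
        nested_dismax("Lname", 8.0, "l_name"),
    ]),
    "f_name": "John",   # hypothetical user input
    "l_name": "Smith",  # hypothetical user input
}
print(params["q"])
print(urlencode(params))
```

Keeping the user input in its own parameters means it does not have to be escaped into the nested query string.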

Re: Invalid UTF-8 character 0xfffe during shard update

2013-08-06 Thread Federico Chiacchiaretta
2013/8/6 Raymond Wiker > Ok, let me rephrase that slightly: does your database extraction include > BLOBs or CLOBs that are actually complete documents, that might be UTF-8 > encoded text? > > It definitely does, each entry I have in PostgreSQL has a field of type "text" that include UTF-8 encode