Delta-Import adding duplicate entry.

2012-02-27 Thread Suneel
Hello Friend, I am working on delta-import. I have configured it according to the given article "http://wiki.apache.org/solr/DataImportHandler#head-9ee74e0ad772fd57f6419033fb0af9828222e041", but every time I execute delta-import through DIH it picks up only the changed data, which is OK, but rather th

Reply:Re: Does solrj support compound type for field?

2012-02-27 Thread SuoNayi
Thanks Mikhail, what I mean is that when I index an instance of my POJO which has a property of List type with the Field annotation, and its element is a complex type rather than a primitive type, such as my own Contact class, can solr index this instance successfully? If successful how can I retrieve via

Re: TikaLanguageIdentifierUpdateProcessorFactory(since Solr3.5.0) to be used in Solr3.3.0?

2012-02-27 Thread bing
Hi, Erick, I get your point. Thank you so much. Best Regards, Bing -- View this message in context: http://lucene.472066.n3.nabble.com/TikaLanguageIdentifierUpdateProcessorFactory-since-Solr3-5-0-to-be-used-in-Solr3-3-0-tp3771620p3782938.html Sent from the Solr - User mailing list archive at

Re: Solr Cloud, Commits and Master/Slave configuration

2012-02-27 Thread Erick Erickson
As I understand it (and I'm just getting into SolrCloud myself), you can essentially forget about master/slave stuff. If you're using NRT, the soft commit will make the docs visible, you don't need to do a hard commit (unlike the master/slave days). Essentially, the update is sent to each shard lead

Re: How to Index Custom XML structure

2012-02-27 Thread Erick Erickson
You might be able to do something with the XSL Transformer step in DIH. It might also be easier to just write a SolrJ program to parse the XML and construct a SolrInputDocument to send to Solr. It's really pretty straightforward. Best Erick On Sun, Feb 26, 2012 at 11:31 PM, Anupam Bhattacharya

Re: TIKA Errors Importing MS Word Documents into SOLR Cloud

2012-02-27 Thread Erick Erickson
You *probably* can update the Tika libraries in Solr, but it'll be "interesting" to get all the right ones updated, there are a bunch of them in Tika. And I make no guarantees. If it proves difficult, it's not too hard to write a SolrJ program that does the Tika extraction and run it on a client to

Re: TikaLanguageIdentifierUpdateProcessorFactory(since Solr3.5.0) to be used in Solr3.3.0?

2012-02-27 Thread Erick Erickson
It runs any place that has access to the raw files and an HTTP connection to the Solr server, which is another way of saying "sounds good to me". Erick On Mon, Feb 27, 2012 at 9:18 PM, bing wrote: > HI, Erick, > > I can write SolrJ client to call Tika, but I am not certain where to invoke > the

Re: Inconsistent Results with ZooKeeper Ensemble and Four SOLR Cloud Nodes

2012-02-27 Thread Matthew Parker
I'll have to check on the commit situation. We have been pushing data from SharePoint the last week or so. Would that somehow block the documents moving between the solr instances? I'll try another version tomorrow. Thanks for the suggestions. On Mon, Feb 27, 2012 at 5:34 PM, Mark Miller wrote:

Re: TikaLanguageIdentifierUpdateProcessorFactory(since Solr3.5.0) to be used in Solr3.3.0?

2012-02-27 Thread bing
Hi, Erick, I can write a SolrJ client to call Tika, but I am not certain where to invoke the client. In my case, I work on Dspace to call Solr, and I suppose the client should be invoked in-between Dspace and Solr. That is, Dspace invokes the SolrJ client when doing index/query, which calls Tika and So

Modify Standalone solr server to use it application without http request

2012-02-27 Thread Neel
Hi, We are already using embedded solr in our application. In production we have 3 app servers and each app server has a copy of the index of each type. These indexes are built externally once a week and replaced. We now want to allow incremental indexing and auto update to other servers rather than bui

Re: Is there a way to implement a IntRangeField in Solr?

2012-02-27 Thread federico.wachs
Exactly, I'm using a tint field type and works really well. The only problem is when I have a set of very wide ranges and make Solr make fireworks out of the blue. Thank you a lot Michael, I appreciate your help on this one :) -- View this message in context: http://lucene.472066.n3.nabble.com/I

Re: Is there a way to implement a IntRangeField in Solr?

2012-02-27 Thread Mike Sokolov
I don't know if this would help with OOM conditions, but are you using a tint type field for this? That should be more efficient to search than a regular int or string. -Mike On 02/27/2012 05:27 PM, federico.wachs wrote: Yeah that's what I'm doing right now. But whenever I try to index an ap

Re: Inconsistent Results with ZooKeeper Ensemble and Four SOLR Cloud Nodes

2012-02-27 Thread Mark Miller
Hmmm...all of that looks pretty normal... Did a commit somehow fail on the other machine? When you view the stats for the update handler, are there a lot of pending adds for one of the nodes? Do the commit counts match across nodes? You can also query an individual node with distrib=false to che
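The per-node check suggested above can be sketched roughly as follows. This is only an illustration: the `distrib=false` parameter comes from the message, while the node base URLs, the `/select` path, and the helper names are assumptions made up for the example and would need to match the real deployment.

```python
from urllib.parse import urlencode


def node_query_url(node_base_url, query="*:*"):
    """Build a query URL that asks one node for its own results only.

    distrib=false is the parameter named in the message; rows=0 is used
    because only the hit count is of interest here.
    """
    params = {"q": query, "distrib": "false", "rows": 0}
    return f"{node_base_url}/select?{urlencode(params)}"


def per_node_urls(node_base_urls, query="*:*"):
    """Map each (hypothetical) node base URL to its non-distributed query URL."""
    return {base: node_query_url(base, query) for base in node_base_urls}


if __name__ == "__main__":
    # Hypothetical node addresses; replace with the real ones.
    nodes = ["http://localhost:8983/solr", "http://localhost:8984/solr"]
    for base, url in per_node_urls(nodes).items():
        print(base, "->", url)
```

Fetching each URL and comparing the reported hit counts would show which node is missing documents.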

Re: Is there a way to implement a IntRangeField in Solr?

2012-02-27 Thread federico.wachs
Yeah that's what I'm doing right now. But whenever I try to index an apartment that has many wide ranges, my master solr server throws OutOfMemoryError ( I have set max heap to 1024m). So I thought this could be a good workaround but puf it is a lot harder than it seems! -- View this message in c

Re: Is there a way to implement a IntRangeField in Solr?

2012-02-27 Thread Mike Sokolov
Yes, I see - I think your best bet is to index every day as a distinct value. Don't worry about having 100's of values. -Mike On 02/27/2012 05:11 PM, federico.wachs wrote: This is used on an apartment booking system, and what I store as solr documents can be seen as apartments. These apartmen
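A minimal sketch of the "index every day as a distinct value" idea, assuming each range is an inclusive pair of integer day numbers and that the result would be stored in a multivalued integer field. The function name is hypothetical, and the sample ranges are the ones quoted elsewhere in this thread.

```python
def expand_ranges(ranges):
    """Expand inclusive (start, end) integer ranges into sorted distinct values.

    Overlapping ranges are merged naturally because a set is used.
    """
    days = set()
    for start, end in ranges:
        days.update(range(start, end + 1))
    return sorted(days)


if __name__ == "__main__":
    # Ranges taken from the thread: 1 TO 15, 5 TO 30, 50 TO 60.
    values = expand_ranges([(1, 15), (5, 30), (50, 60)])
    print(len(values), "distinct values to index")
```

Each value would then be added to the document as one entry of the multivalued field, so a plain term or range query on that field matches the document.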

Re: Is there a way to implement a IntRangeField in Solr?

2012-02-27 Thread federico.wachs
This is used on an apartment booking system, and what I store as solr documents can be seen as apartments. These apartments can be booked for a certain amount of days with a check in and a check out date hence the ranges I was speaking of before. What I want to do is to filter off the apartments t

Re: Combining ShingleFilter and DisMaxParser, with a twist

2012-02-27 Thread Alexey Verkhovsky
Actually, "use raw parser unless query has dismax syntax" approach doesn't fit, because it kills a lot of useful dismax-related functionality, described here: http://wiki.apache.org/solr/DisMaxQParserPlugin#Parameters. However, there is a little cleaner solution than what I originally had in mind:

Re: Is there a way to implement a IntRangeField in Solr?

2012-02-27 Thread Mike Sokolov
No; contiguous means there are no gaps between them. You need something like what you described initially. Another approach is to de-normalize your data so that you have a single document for every range. But this might or might not suit your application. You haven't said anything about the

Re: Is there a way to implement a IntRangeField in Solr?

2012-02-27 Thread federico.wachs
Oh no, I think I misunderstood when you said that my ranges were contiguous. I could have ranges like this: 1 TO 15, 5 TO 30, 50 TO 60, and so on... I'm not sure that what you supposed would work, right? -- View this message in context: http://lucene.472066.n3.nabble.com/Is-there-a-way-to-imp

Re: Is there a way to implement a IntRangeField in Solr?

2012-02-27 Thread Mike Sokolov
I think your example case would end up like this: ... 1 -- single-valued range field 15 ... On 02/27/2012 04:26 PM, federico.wachs wrote: Michael thanks a lot for your quick answer, but i'm not exactly sure I understand your solution. How would the document you are proposin

Re: Is there a way to implement a IntRangeField in Solr?

2012-02-27 Thread federico.wachs
Michael, thanks a lot for your quick answer, but I'm not exactly sure I understand your solution. What would the document you are proposing look like? Do you mind showing me a simple xml as example? Again, thank you for your cooperation. And yes, the ranges are contiguous! -- View this messag

Re: Is there a way to implement a IntRangeField in Solr?

2012-02-27 Thread Mike Sokolov
If your ranges are always contiguous, you could index two fields: range-start and range-end and then perform queries like: range-start:[* TO 30] AND range-end:[5 TO *] If you have multiple ranges which could have gaps in between then you need something more complicated :) On 02/27/2012 04:09
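The two-field query above is the usual interval-overlap test: a stored range overlaps the requested range when it starts no later than the request ends and ends no earlier than the request starts. A small sketch, assuming the field names `range-start` and `range-end` from the message (the helper names are hypothetical):

```python
def overlap_query(query_start, query_end,
                  start_field="range-start", end_field="range-end"):
    """Build the query string for documents whose range overlaps
    the inclusive range [query_start, query_end]."""
    return (f"{start_field}:[* TO {query_end}] AND "
            f"{end_field}:[{query_start} TO *]")


def overlaps(doc_start, doc_end, query_start, query_end):
    """The same condition evaluated locally, handy for checking expectations."""
    return doc_start <= query_end and doc_end >= query_start


if __name__ == "__main__":
    print(overlap_query(5, 30))
    print(overlaps(1, 15, 5, 30))   # stored range 1..15 against request 5..30
    print(overlaps(50, 60, 5, 30))  # stored range 50..60 against request 5..30
```

This only works with one range per document; with several ranges in one document, a start value from one range and an end value from another could satisfy the query together, which is the complication the message alludes to.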

Is there a way to implement a IntRangeField in Solr?

2012-02-27 Thread federico.wachs
Hi all ! Here's my dreadful case, thank you for helping out! I want to have a document like this: ... -- multivalued range field 1 TO 10 5 TO 15 ... And the reason why I want to do this is because it's so much lighter than having all the numbers in there, of

Re: Inconsistent Results with ZooKeeper Ensemble and Four SOLR Cloud Nodes

2012-02-27 Thread Matthew Parker
Here is most of the cluster state: Connected to Zookeeper localhost:2181, localhost:2182, localhost:2183 /(v=0 children=7) "" /CONFIGS(v=0, children=1) /CONFIGURATION(v=0 children=25) < all the configuration files, velocity info, xslt, etc. /NODE_STATES(v=0 child

RE: sun-java6 alternatives for Solr 3.5

2012-02-27 Thread Demian Katz
For what it's worth, I run Solr 3.5 on Ubuntu using the OpenJDK packages and I haven't run into any problems. I do realize that sometimes the Sun JDK has features that are missing from other Java implementations, but so far it hasn't affected my use of Solr. - Demian > -Original Message--

RE: Combining ShingleFilter and DisMaxParser, with a twist

2012-02-27 Thread Steven A Rowe
On 2/27/2012 at 3:16 PM, Alexey Verkhovsky wrote: > By the way, I'm not sure that edismax interpreting 'wal mart' as 'wal' OR > 'mart' is really a bug that should be fixed. It's a counter-intuitive > behavior, for sure, but - per my understanding - edismax is supposed to > treat consecutive words a

performance between ExternalFileField and Join

2012-02-27 Thread Kevin Osborn
I am looking at two different options to filter results in Solr, basically a per-user access control list. Our index is about 2.5 million documents. The first option is to use ExternalFileField. It seems pretty straightforward. Just put the necessary data in the files and query against that data.

Re: Combining ShingleFilter and DisMaxParser, with a twist

2012-02-27 Thread Alexey Verkhovsky
On Mon, Feb 27, 2012 at 12:36 PM, Steven A Rowe wrote: > Separately, do you know about the "raw" query parser[2]? I'm not sure if > it would help, but you may be able to use it in alternate solution. > And explicitly route to edismax when dismax syntax is detected in the query? That would make

Re: Inconsistent Results with ZooKeeper Ensemble and Four SOLR Cloud Nodes

2012-02-27 Thread Matthew Parker
I was trying to use the new interface. I see it using the old admin page. Is there a piece of it you're interested in? I don't have access to the Internet where it exists so it would mean transcribing it. On Mon, Feb 27, 2012 at 2:47 PM, Mark Miller wrote: > > On Feb 27, 2012, at 2:22 PM, Matth

Re: sun-java6 alternatives for Solr 3.5

2012-02-27 Thread Octavian Covalschi
I'm not an Ubuntu user, but I think I read somewhere that sun's jdks packages have been removed from repositories. Don't know more details, but you should be able to install them by yourself... download and install appropriate rpm's, that's the way I did using Fedora 14-16 On Mon, Feb 27, 2012 at

Re: Inconsistent Results with ZooKeeper Ensemble and Four SOLR Cloud Nodes

2012-02-27 Thread Mark Miller
On Feb 27, 2012, at 2:22 PM, Matthew Parker wrote: > Thanks for your reply Mark. > > I believe the build was towards the beginning of the month. The > solr.spec.version is 4.0.0.2012.01.10.38.09 > > I cannot access the clusterstate.json contents. I clicked on it a couple of > times, but nothing

[Job] Research Internships

2012-02-27 Thread Grant Ingersoll
Hi, I have internships open for this summer for students interested in working on search and machine learning. Description is below. -Grant Research Engineer Internship DESCRIPTION Lucid Imagination, the leading commercial company for Apache Lucene and Solr, is looking for interns to work on

Re: Speeding up indexing

2012-02-27 Thread Memory Makers
A quick add on to this -- we have over 30 million documents. I take it that we should be looking @ Distributed Solr? as in http://www.lucidimagination.com/content/scaling-lucene-and-solr#d0e344 Thanks. On Mon, Feb 27, 2012 at 2:33 PM, Memory Makers wrote: > Many thanks for the response. > > H

RE: Combining ShingleFilter and DisMaxParser, with a twist

2012-02-27 Thread Steven A Rowe
Hi Alexey, Lucene's QueryParser, and at least some of Solr's query parsers - I'm not familiar with all of them - have the problem you mention: analyzers are fed queries word-by-word, instead of whole strings between operators. There is a JIRA issue for fixing this, but no work done yet:

Re: Speeding up indexing

2012-02-27 Thread Memory Makers
Many thanks for the response. Here are the revised questions: For example if I have N processes that are producing documents to index: 1. Should I have them simultaneously submit documents to Solr (will this improve the indexing throughput)? 2. Is there anything I can do Solr configuration wise th

Re: Inconsistent Results with ZooKeeper Ensemble and Four SOLR Cloud Nodes

2012-02-27 Thread Matthew Parker
Thanks for your reply Mark. I believe the build was towards the beginning of the month. The solr.spec.version is 4.0.0.2012.01.10.38.09 I cannot access the clusterstate.json contents. I clicked on it a couple of times, but nothing happens. Is that stored on disk somewhere? I configured a custom r

sun-java6 alternatives for Solr 3.5

2012-02-27 Thread ku3ia
Hi all! I had installed an Ubuntu 10.04 LTS. I had added a 'partner' repository to my sources list and updated it, but I can't find a package sun-java6-*: root@ubuntu:~# apt-cache search java6 default-jdk - Standard Java or Java compatible Development Kit default-jre - Standard Java or Java compati

Re: Speeding up indexing

2012-02-27 Thread Mikhail Khludnev
My two cents: - pulling is better than pushing - http://wiki.apache.org/solr/Solrj#Streaming_documents_for_an_update - DIH is not thread safe https://issues.apache.org/jira/browse/SOLR-3011 But there are few patches for trunk which fix it. Regards On Mon, Feb 27, 2012 at 10:46 PM, Erik Hatcher

Re: Time Stats

2012-02-27 Thread Raimon Bosch
Anyone up to provide an answer? The idea is to have a kind of CustomInteger composed of an array of timestamps. The value shown in this field would be based on the date range that you're sending. The biggest problem will be that this field would be in all the documents on your solr index so you need to

Re: Speeding up indexing

2012-02-27 Thread Erik Hatcher
Yes, absolutely. Parallelizing indexing can make a huge difference. How you do so will depend on your indexing environment. Most crudely, running multiple indexing scripts on different subsets of data up to the limitations of your operating system and hardware is how many do it. SolrJ h
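As a rough illustration of the "different subsets of data in parallel" approach, the sketch below splits documents into batches and hands them to a thread pool. It is an assumption-laden outline, not a Solr client: `send_batch` is a hypothetical caller-supplied function that would post one batch to Solr and return how many documents it sent, and the batch size and worker count are arbitrary.

```python
from concurrent.futures import ThreadPoolExecutor


def chunked(items, size):
    """Split a list into consecutive batches of at most `size` items."""
    return [items[i:i + size] for i in range(0, len(items), size)]


def index_in_parallel(docs, send_batch, batch_size=100, workers=4):
    """Send batches concurrently and return the total number of documents sent.

    `send_batch` is a placeholder for whatever actually posts a batch to
    Solr; it must accept a list of documents and return a count.
    """
    batches = chunked(docs, batch_size)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return sum(pool.map(send_batch, batches))


if __name__ == "__main__":
    # Stand-in sender that just counts documents instead of posting them.
    docs = [{"id": i} for i in range(250)]
    print(index_in_parallel(docs, send_batch=len, batch_size=100, workers=3))
```

The useful number of workers depends on the hardware and on how many CPU cores the Solr server has, so it is worth measuring throughput at a few settings.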

Combining ShingleFilter and DisMaxParser, with a twist

2012-02-27 Thread Alexey Verkhovsky
Say, there is an index of business names (fairly short text snippets), containing: Walmart, Walmart Bakery and Mini Mart. And say we need a query for 'wal mart' to match all three, with an appropriate ranking order. Also need 'walmart', 'walmart bakery' and 'bakery' to find the right things in the

Re: Solr 4.0 Question

2012-02-27 Thread Jamie Johnson
Thanks for clarifying Yonik. On Sat, Feb 25, 2012 at 3:57 PM, Yonik Seeley wrote: > On Sat, Feb 25, 2012 at 3:39 PM, Jamie Johnson wrote: >> "Unfortunately, Apache Solr still uses this horrible code in a lot of >> places, leaving us with a major piece of work undone. Major parts of >> Solr’s fac

Re: Solr Transaction Log Question

2012-02-27 Thread Jamie Johnson
perfect, thanks Yonik! On Sat, Feb 25, 2012 at 11:41 PM, Yonik Seeley wrote: > On Sat, Feb 25, 2012 at 11:30 PM, Jamie Johnson wrote: >> How large will the transaction log grow, and how long should it be kept >> around? > > We keep around enough logs to satisfy a minimum of 100 updates > lookba

Re: Inconsistent Results with ZooKeeper Ensemble and Four SOLR Cloud Nodes

2012-02-27 Thread Mark Miller
Hey Matt - is your build recent? Can you visit the cloud/zookeeper page in the admin and send the contents of the clusterstate.json node? Are you using a custom index chain or anything out of the ordinary? - Mark On Feb 27, 2012, at 12:26 PM, Matthew Parker wrote: > TWIMC: > > Environment >

Re: Does solrj support compound type for field?

2012-02-27 Thread Mikhail Khludnev
Hello, From what you are saying, I can conclude you need something like http://blog.mikemccandless.com/2012/01/searching-relational-content-with.html The news is not really great for you, work in progress https://issues.apache.org/jira/browse/SOLR-3076 I've heard that ElasticSearch has some sort of

Inconsistent Results with ZooKeeper Ensemble and Four SOLR Cloud Nodes

2012-02-27 Thread Matthew Parker
TWIMC: Environment = Apache SOLR rev-1236154 Apache Zookeeper 3.3.4 Windows 7 JDK 1.6.0_23.b05 I have built a SOLR Cloud instance with 4 nodes using the embedded Jetty servers. I created a 3 node zookeeper ensemble to manage the solr configuration data. All the instances run on one serve

Re: General question on understanding Solr log output

2012-02-27 Thread Mikhail Khludnev
Hello Loren, I suppose you are confused by printing list of *present* commits by SolrDeletionPolicy Feb 27, 2012 6:22:37 AM org.apache.solr.core.*SolrDeletionPolicy onCommit* INFO: SolrDeletionPolicy.*onCommit: commits:num=2* commit{dir=/home/search/solr/solr/data/index,segFN=segments_141z,versio

Re: TermsComponent show only terms that matched query?

2012-02-27 Thread Jay Hill
Yes, per-doc. I mentioned TermsComponent but meant TermVectorComponent, where we get back all the terms in the doc. Just wondering if there was a way to only get back the terms that matched the query. Thanks EE, -Jay On Sat, Feb 25, 2012 at 2:54 PM, Erick Erickson wrote: > Jay: > > I've seen th

Re: Index empty after restart.

2012-02-27 Thread zarni aung
Check in the data directory to make sure that they are present. If so, you just need to load the cores again. On Mon, Feb 27, 2012 at 11:30 AM, Wouter de Boer < wouter.de.b...@springest.nl> wrote: > Hi, > > I run SOLR on Jetty. After a restart of Jetty, the indices are empty. > Anyone > an idea

Index empty after restart.

2012-02-27 Thread Wouter de Boer
Hi, I run SOLR on Jetty. After a restart of Jetty, the indices are empty. Anyone an idea what the reason can be? Regards, Wouter. -- View this message in context: http://lucene.472066.n3.nabble.com/Index-empty-after-restart-tp3781237p3781237.html Sent from the Solr - User mailing list archive a

Re: Customizing Solr score with DixMax query

2012-02-27 Thread Xiao
Yes! Thank you! I also got this this morning from the Sematext Blog: Edismax "Supports the “boost” parameter.. like the dismax bf param, but multiplies the function query instead of adding it in" http://blog.sematext.com/2010/01/20/solr-digest-january-2010/ -- View this message in context: htt

Re: distributed deletes working?

2012-02-27 Thread Jamie Johnson
Thanks Mark. I'll pull the latest trunk today and run with that. On Sun, Feb 26, 2012 at 10:37 AM, Mark Miller wrote: >> >> >> >> Are there any outstanding issues that I should be aware of? >> >> > Not that I know of - we were trying to track down an issue around peer > sync recovery that our C

[Announce] Solr 4.0 with RankingAlgorithm 1.4, NRT

2012-02-27 Thread Nagendra Nagarajayya
I am very excited to announce the availability of Solr 4.0 with RankingAlgorithm 1.4 (NRT support) (Early Access Release). RankingAlgorithm 1.4 supports the entire Lucene Query Syntax, ± and/or boolean queries and is much faster than 1.3 and is compatible with Lucene 4.0. You can get more in

Re: Solr Performance Improvement and degradation Help

2012-02-27 Thread naptowndev
I've run some tests on both versions of Solr we are testing... one is the 2010.12.10 build and the other is the 2012.02.16 build. The latter one is where we were initially seeing poor response performance. I've attached 4 text files which have the results of a few runs against each of the buil

Re: Solr Performance Improvement and degradation Help

2012-02-27 Thread naptowndev
I will run some queries today, both with lazyfield loading on and off (for the 2010 build we're using and the 2012 build we're using) and get you some of the debug data. On Sun, Feb 26, 2012 at 4:13 PM, Yonik Seeley-2-2 [via Lucene] < ml-node+s472066n318...@n3.nabble.com> wrote: > On Sun, F

Re: TikaLanguageIdentifierUpdateProcessorFactory(since Solr3.5.0) to be used in Solr3.3.0?

2012-02-27 Thread Erick Erickson
My *real* suggestion would be to not do it. Write a SolrJ program that uses whatever version of Tika you want to download and use *that* to index rather than try to sort through the various jar dependencies in Solr. It'd be safer. Otherwise, you're on your own here. Here's some example code: htt

Re: nutch and solr

2012-02-27 Thread alessio crisantemi
Now it all works! I have another problem if I use a connector with my solr-nutch. This is the error: Grave: java.lang.RuntimeException: org.apache.lucene.index.CorruptIndexException: Unknown format version: -11 at org.apache.solr.core.SolrCore.getSearcher(SolrCore.java:1068) at org.apache.solr.cor

Getting Junk Values in Dynamic fields

2012-02-27 Thread mechravi25
Hi, I am getting junk values in a dynamic field in SOLR. I am using the Sqlserver driver (net.sourceforge.jtds.jdbc.Driver) to connect to the database, and the driver's name shows up as a junk value in my dynamic field values. Below is a sample junk value: - net.sourceforge.jtds.jdbc.ClobImpl@55

Re: Customizing Solr score with DixMax query

2012-02-27 Thread Ahmet Arslan
--- On Mon, 2/27/12, Xiao wrote: > From: Xiao > Subject: Customizing Solr score with DixMax query > To: solr-user@lucene.apache.org > Date: Monday, February 27, 2012, 5:59 AM > In my application logic, I want to > implement the ranking (scoring) logic as > follows: > > score = "Solr relecenc

Solr Cloud, Commits and Master/Slave configuration

2012-02-27 Thread roz dev
Hi All, I am trying to understand features of Solr Cloud, regarding commits and scaling. - If I am using Solr Cloud then do I need to explicitly call commit (hard-commit)? Or, a soft commit is okay and Solr Cloud will do the job of writing to disk? - Do We still need to use Master