Solr Cloud, Commits and Master/Slave configuration

2012-02-27 Thread roz dev
Hi All,

I am trying to understand features of Solr Cloud, regarding commits and
scaling.


   - If I am using SolrCloud, do I need to explicitly call commit
   (hard commit)? Or is a soft commit enough, with SolrCloud doing the job of
   writing to disk?


   - Do we still need to use a master/slave setup to scale searching? If we
   have to use a master/slave setup, do I need to issue a hard commit to make
   my changes visible to the slaves?
   - If I were to use NRT with a master/slave setup and soft commits, will
   the slaves be able to see changes made on the master with a soft commit?
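For reference, the two commit flavors being contrasted can be issued as plain update requests; a minimal sketch (the host, port, and endpoint are hypothetical placeholders):

```python
from urllib.parse import urlencode

base = "http://localhost:8983/solr/update"  # hypothetical endpoint

# Hard commit: flushes index segments to disk (durable).
hard_commit = base + "?" + urlencode({"commit": "true"})

# Soft commit: makes new documents searchable without the disk flush (NRT).
soft_commit = base + "?" + urlencode({"softCommit": "true"})

print(hard_commit)  # http://localhost:8983/solr/update?commit=true
print(soft_commit)  # http://localhost:8983/solr/update?softCommit=true
```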

Any inputs are welcome.

Thanks

-Saroj


Re: Customizing Solr score with DisMax query

2012-02-27 Thread Ahmet Arslan


--- On Mon, 2/27/12, Xiao  wrote:

> From: Xiao 
> Subject: Customizing Solr score with DisMax query
> To: solr-user@lucene.apache.org
> Date: Monday, February 27, 2012, 5:59 AM
> In my application logic, I want to
> implement the ranking (scoring) logic as
> follows: 
> 
> score = "Solr relevancy score" * a_special_field_value.
> 
> I tried to use DisMax to do this. My query statement is
> q={!type=dismax qf='title content' bf=field1}data. However, when I turn on
> the debugQuery option, I find that what Solr does is just a sum of the two
> scores, i.e., the TF-IDF score and the FunctionQuery score. But what I want
> is to multiply the two together. How can I implement the multiplication
> operation?

edismax has a boost parameter for this:
q={!type=edismax qf='title content' boost=field1}data
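The difference is easy to state numerically; with a hypothetical relevancy score of 2.0 and a field1 value of 3.0:

```python
relevance = 2.0   # hypothetical TF-IDF relevancy score
field1 = 3.0      # hypothetical value of the boosting field

bf_score = relevance + field1      # dismax bf: function query is ADDED
boost_score = relevance * field1   # edismax boost: function query MULTIPLIES

print(bf_score, boost_score)  # 5.0 6.0
```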


Getting Junk Values in Dynamic fields

2012-02-27 Thread mechravi25
Hi,

I am getting junk values in a dynamic field in Solr.

I am using the SQL Server driver (net.sourceforge.jtds.jdbc.Driver) to
connect to the database, and the driver's CLOB object reference ends up as a
junk value in my dynamic field. A sample junk value:

net.sourceforge.jtds.jdbc.ClobImpl@5570e0f7

The solrconfig.xml and schema.xml that I am using:

Solrconfig.xml
[configuration markup stripped from the archived message]

Schema.xml
[field definitions stripped from the archived message]
Am I missing something here? Please guide me.
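A common cause of values like net.sourceforge.jtds.jdbc.ClobImpl@... is a CLOB column being indexed without conversion, so the Clob object's toString() output lands in the field. If the DataImportHandler is in use, its ClobTransformer is the usual fix; a sketch (the entity name, query, and column are hypothetical):

```xml
<entity name="doc" transformer="ClobTransformer"
        query="SELECT id, description FROM docs">
  <!-- clob="true" tells DIH to read the CLOB's character stream
       instead of storing the Clob object's string representation -->
  <field column="description" clob="true" />
</entity>
```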

Thanks.


--
View this message in context: 
http://lucene.472066.n3.nabble.com/Getting-Junk-Values-in-Dynamic-fields-tp3780560p3780560.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: nutch and solr

2012-02-27 Thread alessio crisantemi
Now everything works!

I have another problem if I use a connector with my Solr-Nutch setup.
This is the error:

Grave: java.lang.RuntimeException:
org.apache.lucene.index.CorruptIndexException: Unknown format version: -11
 at org.apache.solr.core.SolrCore.getSearcher(SolrCore.java:1068)
 at org.apache.solr.core.SolrCore.(SolrCore.java:579)
 at org.apache.solr.core.CoreContainer.create(CoreContainer.java:428)
 at org.apache.solr.core.CoreContainer.load(CoreContainer.java:278)
 at
org.apache.solr.core.CoreContainer$Initializer.initialize(CoreContainer.java:117)
 at connector.SolrConnector.(SolrConnector.java:33)
 at connector.SolrConnector.getInstance(SolrConnector.java:69)
 at connector.SolrConnector.getSolrServer(SolrConnector.java:77)
 at connector.QueryServlet.doGet(QueryServlet.java:117)
 at javax.servlet.http.HttpServlet.service(HttpServlet.java:621)
 at javax.servlet.http.HttpServlet.service(HttpServlet.java:722)
 at
org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:305)
 at
org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:210)
 at
org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:224)
 at
org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:169)
 at
org.apache.catalina.authenticator.AuthenticatorBase.invoke(AuthenticatorBase.java:472)
 at
org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:168)
 at
org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:98)
 at
org.apache.catalina.valves.AccessLogValve.invoke(AccessLogValve.java:927)
 at
org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:118)
 at
org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:407)
 at
org.apache.coyote.http11.AbstractHttp11Processor.process(AbstractHttp11Processor.java:987)
 at
org.apache.coyote.AbstractProtocol$AbstractConnectionHandler.process(AbstractProtocol.java:579)
 at
org.apache.tomcat.util.net.JIoEndpoint$SocketProcessor.run(JIoEndpoint.java:309)
 at
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
 at
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
 at java.lang.Thread.run(Thread.java:722)
Caused by: org.apache.lucene.index.CorruptIndexException: Unknown format
version: -11
 at org.apache.lucene.index.SegmentInfos.read(SegmentInfos.java:247)
 at
org.apache.lucene.index.DirectoryReader$1.doBody(DirectoryReader.java:72)
 at
org.apache.lucene.index.SegmentInfos$FindSegmentsFile.run(SegmentInfos.java:683)
 at org.apache.lucene.index.DirectoryReader.open(DirectoryReader.java:69)
 at org.apache.lucene.index.IndexReader.open(IndexReader.java:476)
 at org.apache.lucene.index.IndexReader.open(IndexReader.java:403)
 at
org.apache.solr.core.StandardIndexReaderFactory.newReader(StandardIndexReaderFactory.java:38)
 at org.apache.solr.core.SolrCore.getSearcher(SolrCore.java:1057)
 ... 26 more

SUGGESTIONS?
thanks,
alessio

On 25 February 2012 at 10:52, alessio crisantemi <
alessio.crisant...@gmail.com> wrote:

> This is the problem!
> Because in my root there is a url!
>
> I write you my step-by-step configuration of nutch:
> (I use cygwin because I work on windows)
>
> *1. Extract the Nutch package*
>
> *2. Configure Solr*
> *a. Copy the provided Nutch schema from directory apache-nutch-1.0/conf to
> directory apache-solr-1.3.0/example/solr/conf (overriding the existing
> file).* To allow Solr to create the snippets for search results, we need to
> store the content in addition to indexing it:
>
> *b. Change schema.xml so that the stored attribute of field “content” is
> true.*
>
> We want to be able to tweak the relevancy of queries easily, so we'll
> create a new dismax request handler configuration for our use case:
>
> *d. Open apache-solr-1.3.0/example/solr/conf/solrconfig.xml and paste the
> following fragment into it* [XML markup reconstructed from the surviving
> parameter values; the archive stripped the original tags]:
>
> <requestHandler name="/nutch" class="solr.SearchHandler">
>   <lst name="defaults">
>     <str name="defType">dismax</str>
>     <str name="echoParams">explicit</str>
>     <float name="tie">0.01</float>
>     <str name="qf">content^0.5 anchor^1.0 title^1.2</str>
>     <str name="pf">content^0.5 anchor^1.5 title^1.2 site^1.5</str>
>     <str name="fl">url</str>
>     <str name="mm">2&lt;-1 5&lt;-2 6&lt;90%</str>
>     <int name="ps">100</int>
>     <str name="q.alt">*:*</str>
>     <str name="hl.fl">title url content</str>
>     <str name="f.title.hl.fragsize">0</str>
>     <str name="f.title.hl.alternateField">title</str>
>     <str name="f.url.hl.fragsize">0</str>
>     <str name="f.url.hl.alternateField">url</str>
>     <str name="f.content.hl.fragmenter">regex</str>
>   </lst>
> </requestHandler>
>
> *3. Start Solr*
>
> cd apache-solr-1.3.0/example
>
> java -jar start.jar
>
> *4. Configure Nutch*
>
> *a. Open nutch-site.xml in directory apache-nutch-1.0/conf and replace its
> contents with the following (we specify our crawler name and active
> plugins, and limit the maximum url count for a single host per run to
> 100)* [XML markup reconstructed; the archive stripped the original tags]:
>
> <?xml version="1.0"?>
> <configuration>
>   <property>
>     <name>http.agent.name</name>
>     <value>nutch-solr-integration</value>
>   </property>
>   <property>
>     <name>generate.max.per.host</name>
>     <value>100</value>
>   </property>
>   <property>
>     <name>plugin.includes</name>
>     <value>protocol-http|urlfilter-regex|parse-html|index-(basic|anchor)|query-(basic|site|url)|response-(json|xml)|summary-basic|scoring-opic|urlnormalizer-(pass|regex|basic)</value>
>   </property>
> </configuration>
>
> *b. Open regex-urlfilter.txt in directory apache-nutch-1.0/conf and replace
> its content with the following:*
>
> -^(https|telnet|fil

Re: TikaLanguageIdentifierUpdateProcessorFactory(since Solr3.5.0) to be used in Solr3.3.0?

2012-02-27 Thread Erick Erickson
My *real* suggestion would be to not do it. Write a SolrJ
program that uses whatever version of Tika you want
to download and use *that* to index rather than try to
sort through the various jar dependencies in Solr. It'd be
safer.

Otherwise, you're on your own here.

Here's some example code:

http://www.lucidimagination.com/blog/2012/02/14/indexing-with-solrj/

Best
Erick

On Sun, Feb 26, 2012 at 9:01 PM, bing  wrote:
> Hi, Erick,
>
> My idea is to use Tika0.10 in Dspace1.7.2, which is based on two steps:
>
> 1. Upgrade Solr1.4.1 to Solr3.3.0 in Dspace1.7.2
> In the following link, upgraded Solr & Lucene 3.3.0 has been resolved.
> https://jira.duraspace.org/browse/DS-980
>
> 2. Upgrade to Tika0.10 in Solr3.3.0
> In the following link, people have tried upgrading Tika 0.8 to Tika 0.9:
> http://lucene.472066.n3.nabble.com/upgrading-to-Tika-0-9-on-Solr-1-4-1-td2570526.html
>
> I was thinking that if both of the above steps can be achieved, then maybe
> I can get it done. What is your suggestion?
>
> Thank you.
>
> Best Regards,
> Bing
>
> --
> View this message in context: 
> http://lucene.472066.n3.nabble.com/TikaLanguageIdentifierUpdateProcessorFactory-since-Solr3-5-0-to-be-used-in-Solr3-3-0-tp3771620p3779437.html
> Sent from the Solr - User mailing list archive at Nabble.com.


Re: Solr Performance Improvement and degradation Help

2012-02-27 Thread naptowndev
I will run some queries today, both with lazyfield loading on and off (for
the 2010 build we're using and the 2012 build we're using) and get you some
of the debug data.



On Sun, Feb 26, 2012 at 4:13 PM, Yonik Seeley-2-2 [via Lucene] <
ml-node+s472066n318...@n3.nabble.com> wrote:

> On Sun, Feb 26, 2012 at 3:32 PM, Erick Erickson <[hidden 
> email]>
> wrote:
> > Would you hypothesize that lazy field loading could be that much
> > slower if a large fraction of fields were selected?
>
> If you actually use the lazy field later, it will cause an extra read
> for each field.
> If you don't have enough free RAM for the OS to cache the entire index
> it could be even worse... the first time reading the document you take
> a hit from a real disk seek, then when you go and access those fields
> (assuming they have already been evicted from the OS cache) you take
> the hit of another disk seek.  Those could really add up.
>
> So if we're actually seeing much worse performance for lazy loading
> now than in the past, one guess would be it's due to that scenario in
> conjunction with something that is actually accessing the lazy fields.
>
> -Yonik
> lucenerevolution.com - Lucene/Solr Open Source Search Conference.
> Boston May 7-10
>
>
> --
>  If you reply to this email, your message will be added to the discussion
> below:
>
> http://lucene.472066.n3.nabble.com/Solr-Performance-Improvement-and-degradation-Help-tp3767015p318.html
>  To unsubscribe from Solr Performance Improvement and degradation Help, click
> here
> .
> NAML
>
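Yonik's per-field-read point quoted above can be reduced to a toy counting model; this is purely illustrative and not Solr's actual I/O path:

```python
class TinyDoc:
    """Counts the 'disk reads' each field-loading strategy would make."""

    def __init__(self, fields):
        self._fields = fields
        self.reads = 0

    def load_eager(self):
        self.reads += 1            # one read fetches the whole stored document
        return dict(self._fields)

    def load_lazy(self, wanted):
        self.reads += 1            # initial read of the document
        loaded = {}
        for name in wanted:
            self.reads += 1        # one extra read per lazily loaded field
            loaded[name] = self._fields[name]
        return loaded

fields = {"title": "t", "content": "c", "url": "u"}

eager = TinyDoc(fields)
eager.load_eager()

lazy = TinyDoc(fields)
lazy.load_lazy(["title", "content", "url"])  # caller touches every field

print(eager.reads, lazy.reads)  # 1 4
```

With most fields accessed, lazy loading does strictly more reads, which matches the regression seen when a large fraction of fields is selected.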


--
View this message in context: 
http://lucene.472066.n3.nabble.com/Solr-Performance-Improvement-and-degradation-Help-tp3767015p3780843.html
Sent from the Solr - User mailing list archive at Nabble.com.

Re: Solr Performance Improvement and degradation Help

2012-02-27 Thread naptowndev
I've run some test on both the versions of Solr we are testing... one is the
2010.12.10 build and the other is the 2012.02.16 build.  The latter one is
where we were initially seeing poor response performance.  I've attached 4
text files which have the results of a few runs against each of the builds
with and without LazyFieldLoading enabled (plus some on the later build with
wildcard fl parameters enabled).

From what I see, the timings don't seem to be too telling (but, not really
knowing the ins and outs of it, you may see something different). Where we
see the performance hit is on the response time getting the information
back.

Hopefully this helps some.

http://lucene.472066.n3.nabble.com/file/n3780995/2010-12-10build_lazyfieldloading_false.txt
2010-12-10build_lazyfieldloading_false.txt 
http://lucene.472066.n3.nabble.com/file/n3780995/2010-12-10build_lazyfieldloading_true.txt
2010-12-10build_lazyfieldloading_true.txt 
http://lucene.472066.n3.nabble.com/file/n3780995/2012-02-16build_lazyfieldloading_false.txt
2012-02-16build_lazyfieldloading_false.txt 
http://lucene.472066.n3.nabble.com/file/n3780995/2012-02-16build_lazyfieldloading_true.txt
2012-02-16build_lazyfieldloading_true.txt 

--
View this message in context: 
http://lucene.472066.n3.nabble.com/Solr-Performance-Improvement-and-degradation-Help-tp3767015p3780995.html
Sent from the Solr - User mailing list archive at Nabble.com.


[Announce] Solr 4.0 with RankingAlgorithm 1.4, NRT

2012-02-27 Thread Nagendra Nagarajayya
I am very excited to announce the availability of Solr 4.0 with 
RankingAlgorithm 1.4 (NRT support) (Early Access Release).


RankingAlgorithm 1.4 supports the entire Lucene query syntax and ± and/or 
boolean queries, is much faster than 1.3, and is compatible with 
Lucene 4.0.


You can get more information about NRT performance from here:
http://solr-ra.tgels.org/wiki/en/Near_Real_Time_Search_ver_4.x

You can download Solr 4.0 with RankingAlgorithm 1.4 from here:
http://solr-ra.tgels.org

Please download and give the new version a try.

Regards,

Nagendra Nagarajayya
http://solr-ra.tgels.org
http://rankingalgorithm.tgels.org


Re: distributed deletes working?

2012-02-27 Thread Jamie Johnson
Thanks Mark.  I'll pull the latest trunk today and run with that.

On Sun, Feb 26, 2012 at 10:37 AM, Mark Miller  wrote:
>>
>>
>>
>> Are there any outstanding issues that I should be aware of?
>>
>>
> Not that I know of - we were trying to track down an issue around peer
> sync recovery that our ChaosMonkey* tests were tripping, but it looks like
> Yonik may have tracked that down last night.
>
> * The ChaosMonkey tests randomly start, stop, and kill servers as we index
> and delete with multiple threads - at the end we make sure everything is
> consistent and that the client(s) had no errors sending requests.
>
> --
> - Mark
>
> http://www.lucidimagination.com


Re: Customizing Solr score with DisMax query

2012-02-27 Thread Xiao
Yes! Thank you! I also got this this morning from the Sematext blog.

Edismax
"Supports the “boost” parameter... like the dismax bf param, but multiplies
the function query instead of adding it in."

http://blog.sematext.com/2010/01/20/solr-digest-january-2010/

--
View this message in context: 
http://lucene.472066.n3.nabble.com/Customizing-Solr-score-with-DixMax-query-tp3779591p3781200.html
Sent from the Solr - User mailing list archive at Nabble.com.


Index empty after restart.

2012-02-27 Thread Wouter de Boer
Hi,

I run Solr on Jetty. After a restart of Jetty, the indices are empty. Does
anyone have an idea what the reason could be?

Regards,
Wouter.

--
View this message in context: 
http://lucene.472066.n3.nabble.com/Index-empty-after-restart-tp3781237p3781237.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Index empty after restart.

2012-02-27 Thread zarni aung
Check the data directory to make sure the index files are present. If so,
you just need to reload the cores.

On Mon, Feb 27, 2012 at 11:30 AM, Wouter de Boer <
wouter.de.b...@springest.nl> wrote:

> Hi,
>
> I run Solr on Jetty. After a restart of Jetty, the indices are empty.
> Does anyone have an idea what the reason could be?
>
> Regards,
> Wouter.
>
> --
> View this message in context:
> http://lucene.472066.n3.nabble.com/Index-empty-after-restart-tp3781237p3781237.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>


Re: TermsComponent show only terms that matched query?

2012-02-27 Thread Jay Hill
Yes, per-doc. I mentioned TermsComponent but meant TermVectorComponent,
where we get back all the terms in the doc. Just wondering if there was a
way to only get back the terms that matched the query.

Thanks EE,
-Jay


On Sat, Feb 25, 2012 at 2:54 PM, Erick Erickson wrote:

> Jay:
>
> I've seen the this question go 'round before, but don't remember
> a satisfactory solution. Are you talking on a per-document basis
> here? If so, I vaguely remember it being possible to do something
> with highlighting, just counting the tags returned after highlighting.
>
> Best
> Erick
>
> On Fri, Feb 24, 2012 at 3:31 PM, Jay Hill  wrote:
> > I have a situation where I want to show the term counts as is done in the
> > TermsComponent, but *only* for terms that are *matched* in a query, so I
> > get something returned like this (pseudo code):
> >
> > q=title:(golf swing)
> >
> > 
> > title: golf legends show how to improve your golf swing on the golf
> course
> > ...other fields
> > 
> >
> > 
> > golf (3)
> > swing (1)
> > 
> >
> > rather than getting back all of the terms in the doc.
> >
> > Thanks,
> > -Jay
>
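Erick's suggestion, highlight and then count the returned tags, can be sketched as follows (the snippet text and the <em> markup are hypothetical highlighter output):

```python
import re

# A highlighted snippet as the highlighter might return it.
snippet = ("<em>golf</em> legends show how to improve your "
           "<em>golf</em> <em>swing</em> on the <em>golf</em> course")

# Count each distinct highlighted (i.e. matched) term.
counts = {}
for term in re.findall(r"<em>(.*?)</em>", snippet):
    counts[term] = counts.get(term, 0) + 1

print(counts)  # {'golf': 3, 'swing': 1}
```

The result matches the pseudo-output in the original question: golf (3), swing (1).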


Re: General question on understanding Solr log output

2012-02-27 Thread Mikhail Khludnev
Hello Loren,

I suppose you are confused by the SolrDeletionPolicy printing the list of
commits *currently present*:

Feb 27, 2012 6:22:37 AM org.apache.solr.core.*SolrDeletionPolicy onCommit*
INFO: SolrDeletionPolicy.*onCommit: commits:num=2*
commit{dir=/home/search/solr/solr/data/index,segFN=segments_141z,version=
1328113878743,generation=*51911*,
...
commit{dir=/home/search/solr/solr/data/index,segFN=segments_1420,version=1328113878746,generation=
*51912*

It looks like your expectation doesn't match what the app is actually doing.

Regards

On Mon, Feb 27, 2012 at 11:25 AM, Loren Siebert  wrote:

> I'm trying to understand what Solr is doing regarding commits based on the
> logs below. I have a 60 second autocommit and no explicit commits coming
> in.
> What I am seeing in my Solr log appears to be 2 (or 3?) commits per 60
> second commit cycle. Everything below happens within a few seconds. I say
> maybe 3 commits because the third one reports the same generation as the
> second one. At any rate, I would expect to only see one commit and one
> group
> of cache statistics, not three. Am I misinterpreting the logs?
>
> Feb 27, 2012 6:22:36 AM org.apache.solr.update.DirectUpdateHandler2 commit
> INFO: start
>
> commit(optimize=false,waitFlush=false,waitSearcher=true,expungeDeletes=false)
> Feb 27, 2012 6:22:37 AM org.apache.solr.core.SolrDeletionPolicy onCommit
> INFO: SolrDeletionPolicy.onCommit: commits:num=2
>
>
> commit{dir=/home/search/solr/solr/data/index,segFN=segments_141z,version=1328113878743,generation=51911,filenames=[_189h.tii,
> _189f.frq, _189h.fdx, _17nb.fdx, _187m.tis, _17nb.fdt, _187w_3.del,
> _187m.tii, _187w.fdt, _17nb.tis, _1894.tii, _189f.prx, _189l.tii,
> _187w.fdx,
> _189k.frq, _189n.frq, _1887.frq, _1894.tis, _189n.fdt, _1887.prx,
> _189n.fdx,
> _16q1.nrm, _189i.frq, _189l.frq, _1887.tii, _17nb.prx, _189f_1.del,
> _189f.nrm, _187m.prx, _187m.fnm, _189j.fnm, _1894.nrm, _189i.tii,
> _1887.tis,
> _17nb.tii, _189h.fdt, _1887.nrm, _189m.fnm, _16q1.frq, _189l.nrm,
> _189k.prx,
> _189i.prx, _189n_1.del, _17nb.fnm, _189i.tis, _16q1.tii, _17nb_2f.del,
> _189j.nrm, _16q1.tis, _189j.fdt, _189m.fdt, _189i_1.del, _187w.frq,
> _189m.fdx, _189i.nrm, segments_141z, _189g.frq, _188s.frq, _187m.nrm,
> _187f.tii, _1894.prx, _189l.tis, _189j.fdx, _189k.nrm, _1894.fnm,
> _187w.tii,
> _187f.prx, _187f.fnm, _189j.prx, _187w.tis, _189h.fnm, _188t.frq,
> _189h.tis,
> _1894.fdt, _187f.frq, _189n.fnm, _187f.fdx, _1894.fdx, _189l.prx,
> _187w.nrm,
> _188s.nrm, _189h.frq, _1887.fnm, _187f.tis, _16q1.prx, _189j.tis,
> _188s.prx,
> _189k_1.del, _189l.fnm, _189f.fnm, _189h.prx, _189k.fnm, _188s_1.del,
> _189n.prx, _189m.nrm, _189n.nrm, _189g.tii, _188t.tis, _16q1.fdt,
> _189i.fnm,
> _1887.fdx, _189m.prx, _16q1_1a5.del, _189j.frq, _189n.tis, _1887.fdt,
> _187w.prx, _187f.nrm, _189f.fdt, _189f.tis, _189k.tis, _189f.fdx,
> _189m.frq,
> _16q1.fdx, _187m.frq, _189n.tii, _187f.fdt, _188t.tii, _189m.tis,
> _17nb.frq,
> _189k.fdt, _189j_1.del, _189g.prx, _189k.tii, _189k.fdx, _189g.fdx,
> _16q1.fnm, _189m.tii, _189g.nrm, _189f.tii, _189m_1.del, _187w.fnm,
> _187m.fdt, _188t.fdt, _1894.frq, _189g.fnm, _188t.nrm, _189i.fdx,
> _188t.fdx,
> _188t.prx, _189h.nrm, _189g.tis, _187m.fdx, _188s.fnm, _189i.fdt,
> _188s.tis,
> _17nb.nrm, _189j.tii, _188t.fnm, _187f_5.del, _189l.fdx, _189g.fdt,
> _188s.fdx, _189l.fdt, _188s.tii, _188s.fdt]
>
>
> commit{dir=/home/search/solr/solr/data/index,segFN=segments_1420,version=1328113878746,generation=51912,filenames=[_189h.tii,
> _189f.frq, _189h.fdx, _17nb.fdx, _187m.tis, _17nb.fdt, _187w_3.del,
> _187m.tii, _187w.fdt, _17nb.tis, _1894.tii, _189o.tii, _189f.prx,
> _189l.tii,
> _187w.fdx, _189k.frq, _189n.frq, _1887.frq, _1894.tis, _189n.fdt,
> _1887.prx,
> _189o.tis, _189n.fdx, _16q1.nrm, _189o.frq, _189i.frq, _189l.frq,
> _1887.tii,
> _17nb.prx, _189f_1.del, _189f.nrm, _187m.prx, _187m.fnm, _189o.prx,
> _189j.fnm, _1894.nrm, _189i.tii, _1887.tis, _17nb.tii, _189h.fdt,
> _1887.nrm,
> _189m.fnm, _16q1.frq, _189l.nrm, _189k.prx, _189i.prx, _189n_1.del,
> _17nb.fnm, _189i.tis, _16q1.tii, _17nb_2f.del, _189o.fdt, _189j.nrm,
> _16q1.tis, _189j.fdt, _189m.fdt, _189i_1.del, _187w.frq, _189o.fdx,
> _189m.fdx, _189i.nrm, _189g.frq, _188s.frq, _187m.nrm, _187f.tii,
> _1894.prx,
> _189l.tis, _189j.fdx, _189k.nrm, _1894.fnm, _187w.tii, _187f.prx,
> _187f.fnm,
> _189j.prx, _187w.tis, _189h.fnm, _188t.frq, _189h.tis, _1894.fdt,
> _187f.frq,
> _189n.fnm, _187f.fdx, _1894.fdx, _189l.prx, _187w.nrm, _188s.nrm,
> _189h.frq,
> _1887.fnm, _187f.tis, _16q1.prx, _189j.tis, _188s.prx, _189k_1.del,
> _189l.fnm, _189f.fnm, _189h.prx, _189k.fnm, _188s_1.del, _189n.prx,
> _189m.nrm, _189n.nrm, _189g.tii, _188t.tis, segments_1420, _16q1.fdt,
> _189o.fnm, _189i.fnm, _1887.fdx, _189m.prx, _189j.frq, _189n.tis,
> _1887.fdt,
> _187w.prx, _187f.nrm, _189f.fdt, _189f.tis, _189k.tis, _189f.fdx,
> _189m.frq,
> _16q1.fdx, _187m.frq, _189n.tii, _187f.fdt, _188t.tii, _189m.tis,
> _17nb.frq,
> _189k.fdt, _189j_1.del, _189g.prx, _189k.tii, _189k.fdx, _189g.fdx,

Inconsistent Results with ZooKeeper Ensemble and Four SOLR Cloud Nodes

2012-02-27 Thread Matthew Parker
TWIMC:

Environment
=
Apache SOLR rev-1236154
Apache Zookeeper 3.3.4
Windows 7
JDK 1.6.0_23.b05

I have built a SOLR Cloud instance with 4 nodes using the embeded Jetty
servers.

I created a 3 node zookeeper ensemble to manage the solr configuration data.

All the instances run on one server so I've had to move ports around for
the various applications.

I start the 3 zookeeper nodes.

I started the first instance of solr cloud with the parameter to have two
shards.

Then start the remaining 3 Solr nodes.

The system comes up fine. No errors thrown.

I can view the solr cloud console and I can see the SOLR configuration
files managed by ZooKeeper.

I published data into the SOLR Cloud instances from SharePoint using Apache
Manifold 0.4-incubating. Manifold is setup to publish the data into
collection1, which is the only collection defined in the cluster.

When I query the data from collection1 as per the solr wiki, the results
are inconsistent. Sometimes all the results are there, other times nothing
comes back at all.

It seems to be having an issue auto replicating the data across the cloud.

Is there some specific setting I might have missed? Based upon what I read,
I thought that SOLR cloud would take care of distributing and replicating
the data automatically. Do you have to tell it what shard to publish the
data into as well?

Any help would be appreciated.

Thanks,

Matt

--
This e-mail and any files transmitted with it may be proprietary.  Please note 
that any views or opinions presented in this e-mail are solely those of the 
author and do not necessarily represent those of Apogee Integration.


Re: Does solrj support compound type for field?

2012-02-27 Thread Mikhail Khludnev
Hello,

From what you are saying, I conclude you need something like
http://blog.mikemccandless.com/2012/01/searching-relational-content-with.html
The news is not great for you; this is still work in progress:
https://issues.apache.org/jira/browse/SOLR-3076

I've heard that ElasticSearch has some sort of support of parent-child
search.

Regards

2012/2/27 SuoNayi 

> Hi all, I'm new to Solr. I know that SolrJ indexes instances of a POJO by
> transforming them into SolrInputDocument instances with
> DocumentObjectBinder.
>
> Suppose my POJO has a property of List type whose elements are a compound
> type, i.e. my own customized class, a Contact class for example.
>
> In this case, does SolrJ support indexing my List whose elements are my
> own customized class?
>
> Thanks.
>
>
>
> SuoNayi




-- 
Sincerely yours
Mikhail Khludnev
Lucid Certified
Apache Lucene/Solr Developer
Grid Dynamics


 


Re: Inconsistent Results with ZooKeeper Ensemble and Four SOLR Cloud Nodes

2012-02-27 Thread Mark Miller
Hey Matt - is your build recent?

Can you visit the cloud/zookeeper page in the admin and send the contents of 
the clusterstate.json node?

Are you using a custom index chain or anything out of the ordinary?


- Mark

On Feb 27, 2012, at 12:26 PM, Matthew Parker wrote:

> TWIMC:
> 
> Environment
> =
> Apache SOLR rev-1236154
> Apache Zookeeper 3.3.4
> Windows 7
> JDK 1.6.0_23.b05
> 
> I have built a SOLR Cloud instance with 4 nodes using the embeded Jetty
> servers.
> 
> I created a 3 node zookeeper ensemble to manage the solr configuration data.
> 
> All the instances run on one server so I've had to move ports around for
> the various applications.
> 
> I start the 3 zookeeper nodes.
> 
> I started the first instance of solr cloud with the parameter to have two
> shards.
> 
> Then start the remaining 3 Solr nodes.
> 
> The system comes up fine. No errors thrown.
> 
> I can view the solr cloud console and I can see the SOLR configuration
> files managed by ZooKeeper.
> 
> I published data into the SOLR Cloud instances from SharePoint using Apache
> Manifold 0.4-incubating. Manifold is setup to publish the data into
> collection1, which is the only collection defined in the cluster.
> 
> When I query the data from collection1 as per the solr wiki, the results
> are inconsistent. Sometimes all the results are there, other times nothing
> comes back at all.
> 
> It seems to be having an issue auto replicating the data across the cloud.
> 
> Is there some specific setting I might have missed? Based upon what I read,
> I thought that SOLR cloud would take care of distributing and replicating
> the data automatically. Do you have to tell it what shard to publish the
> data into as well?
> 
> Any help would be appreciated.
> 
> Thanks,
> 
> Matt
> 
> --
> This e-mail and any files transmitted with it may be proprietary.  Please 
> note that any views or opinions presented in this e-mail are solely those of 
> the author and do not necessarily represent those of Apogee Integration.

- Mark Miller
lucidimagination.com


Re: Solr Transaction Log Question

2012-02-27 Thread Jamie Johnson
perfect, thanks Yonik!

On Sat, Feb 25, 2012 at 11:41 PM, Yonik Seeley
 wrote:
> On Sat, Feb 25, 2012 at 11:30 PM, Jamie Johnson  wrote:
>> How large will the transaction log grow, and how long should it be kept 
>> around?
>
> We keep around enough logs to satisfy a minimum of 100 updates
> lookback.  Unneeded log files are deleted automatically.
> When a hard commit is done, we create a new log file (since we know
> the normal index files have been sync'd and hence we no longer need
> the update log for durability).
>
> -Yonik
> lucenerevolution.com - Lucene/Solr Open Source Search Conference.
> Boston May 7-10
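Yonik's retention rule, keep enough log files to cover at least 100 updates of lookback, can be sketched like this (illustrative only, not Solr's actual implementation):

```python
def logs_to_keep(log_sizes, min_lookback=100):
    """log_sizes: update counts per transaction-log file, oldest to newest.
    Returns the newest files whose records cover >= min_lookback updates;
    anything older is safe to delete."""
    kept, covered = [], 0
    for size in reversed(log_sizes):
        kept.append(size)
        covered += size
        if covered >= min_lookback:
            break
    return list(reversed(kept))

# Three tlog files with 70, 50 and 60 updates: the newest two already
# cover 110 >= 100 updates, so the oldest file can be dropped.
print(logs_to_keep([70, 50, 60]))  # [50, 60]
```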


Re: Solr 4.0 Question

2012-02-27 Thread Jamie Johnson
Thanks for clarifying Yonik.

On Sat, Feb 25, 2012 at 3:57 PM, Yonik Seeley
 wrote:
> On Sat, Feb 25, 2012 at 3:39 PM, Jamie Johnson  wrote:
>> "Unfortunately, Apache Solr still uses this horrible code in a lot of
>> places, leaving us with a major piece of work undone. Major parts of
>> Solr’s facetting and filter caching need to be rewritten to work per
>> atomic segment! For those implementing plugins or other components for
>> Solr, SolrIndexSearcher exposes a “atomic view” of its underlying
>> reader via SolrIndexSearcher.getAtomicReader()."
>
> Some of this is just a misunderstanding, and some of it is a
> difference of opinion.
>
> Solr uses a top-level FieldCache entry for certain types of faceting,
> but it's optional. Solr can also use per-segment FieldCache entries
> when faceting.  The reason we haven't removed the top-level FieldCache
> faceting is that it's faster unless you are doing near-realtime (NRT)
> search (due to the cost of merging terms across segments).  Top level
> fieldcache entries are also more memory efficient for Strings as
> string values are not repeated across each segment.  The right
> approach depends on the specific use-case, and Solr will continue to
> strive to have faceting algorithms optimized for both NRT and non-NRT.
>
> -Yonik
> lucenerevolution.com - Lucene/Solr Open Source Search Conference.
> Boston May 7-10
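Yonik's memory argument for top-level string entries (values not repeated across segments) can be illustrated with a toy example; the data is hypothetical and this is not Solr's FieldCache:

```python
# Each inner list is the sequence of field values appearing in one segment.
segments = [["nyc", "sf", "nyc"], ["sf", "la"], ["nyc", "la"]]

# Per-segment caches store each distinct value once per segment...
per_segment_entries = sum(len(set(seg)) for seg in segments)

# ...while a single top-level cache stores each distinct value once overall.
top_level_entries = len({value for seg in segments for value in seg})

print(per_segment_entries, top_level_entries)  # 6 3
```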


Combining ShingleFilter and DisMaxParser, with a twist

2012-02-27 Thread Alexey Verkhovsky
Say, there is an index of business names (fairly short text snippets),
containing: Walmart, Walmart Bakery and Mini Mart. And say we need a query
for 'wal mart' to match all three, with an appropriate ranking order. Also
need 'walmart', 'walmart bakery' and 'bakery' to find the right things in
the right order.

Here is the solution we came up with:

1. edismax query parser (we don't need it for this, but do for a number of
other requirements)

2. On the index, apply ShingleFilter, then remove word separators in the
shingles, so that "walmart bakery" is indexed as  "walmart", "bakery",
"walmartbakery"
Schema for this index looks like this:
[field type / analyzer definition stripped from the archived message]

3. Before sending the original query to Solr, modify it by adding a
whitespace-stripped version of it. Thus, 'wal mart' becomes 'wal mart
walmart' and walmart bakery becomes 'walmart bakery walmartbakery'. Don't
modify the query if it only has one word in it, or contains any edismax
syntax (double quotes; pluses and minuses in the beginning of a query or
after whitespace).

4. ... profit.

The reason we have to shingle the query before Solr is that the edismax
parser treats 'wal mart' as two queries, 'wal' OR 'mart', so applying the
ShingleFilter in the query analyzer doesn't do anything.

This works, but feels a little dirty. Is there a more elegant way to solve
this problem?
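Step 3 above (pre-shingling the query before it reaches edismax) can be sketched as:

```python
def add_query_shingle(q):
    """Append a whitespace-stripped version of a multi-word query.
    Skip single-word queries and anything using edismax syntax
    (double quotes, or +/- at the start of a token)."""
    if '"' in q or any(tok[0] in "+-" for tok in q.split()):
        return q
    words = q.split()
    if len(words) < 2:
        return q
    return q + " " + "".join(words)

print(add_query_shingle("wal mart"))    # wal mart walmart
print(add_query_shingle("walmart"))     # walmart
print(add_query_shingle('"wal mart"'))  # "wal mart"
```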

-- 
Alex Verkhovsky


Re: Speeding up indexing

2012-02-27 Thread Erik Hatcher
Yes, absolutely.  Parallelizing indexing can make a huge difference.  How you 
do so will depend on your indexing environment.  Most crudely, running multiple 
indexing scripts on different subsets of data, up to the limitations of your 
operating system and hardware, is how many do it.  SolrJ has some multithreaded 
facility, as does DataImportHandler.  Distributing the indexing across multiple 
machines, all pointing at the same Solr server, is effectively the same as 
multi-threading: push documents into Solr from wherever, as fast as it can 
handle them.  This is definitely how many do this.
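The crude approach Erik describes, several workers each indexing a subset, looks roughly like this; the indexing call itself is stubbed out (in practice it would be a SolrJ or HTTP update request):

```python
from concurrent.futures import ThreadPoolExecutor

def index_batch(batch):
    """Placeholder for a real indexing call; reports docs handled."""
    return len(batch)

docs = [{"id": i} for i in range(1000)]
batches = [docs[i:i + 100] for i in range(0, len(docs), 100)]

# Several indexing workers pushing batches to the same Solr server.
with ThreadPoolExecutor(max_workers=4) as pool:
    indexed = sum(pool.map(index_batch, batches))

print(indexed)  # 1000
```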

Erik

On Feb 27, 2012, at 13:24 , Memory Makers wrote:

> Hi,
> 
> Is there a way to speed up indexing by increasing the number of threads
> doing the indexing or perhaps by distributing indexing on multiple machines?
> 
> Thanks.



Re: Time Stats

2012-02-27 Thread Raimon Bosch
Anyone up to provide an answer?

The idea is to have a kind of custom integer field composed of an array of
timestamps. The value shown in this field would be based on the date range
that you're sending.

The biggest problem is that this field would be present in every document in
your Solr index, so you would need to calculate this number in real time.

2012/2/26 Raimon Bosch 

>
> Hi,
>
> Today I was playing with StatsComponent just to extract some statistics
> from my index. I'm using a solr index to store user searches. Basically
> what I did is to aggregate data from accesslog into my solr index. So now I
> can see average bounce rate for a group of user searches and see which ones
> are performing better in google.
>
> Now I would like to see the evolution of these stats through time. For
> that I would need a field with different values through time, i.e.:
>
> "flats for rent new york" at 1/12/2011 => bounce_rate=48.6%
> "flats for rent new york" at 1/1/2012 => bounce_rate=49.7%
> "flats for rent new york" at 1/2/2012 => bounce_rate=46.4%
>
> There is any solr type field that could fit to solve this?
>
> Thanks in advance,
> Raimon Bosch.
>


Re: Speeding up indexing

2012-02-27 Thread Mikhail Khludnev
My two cents:
 - pulling is better than pushing -
http://wiki.apache.org/solr/Solrj#Streaming_documents_for_an_update
 - DIH is not thread safe https://issues.apache.org/jira/browse/SOLR-3011 But
there are few patches for trunk which fix it.

Regards

On Mon, Feb 27, 2012 at 10:46 PM, Erik Hatcher wrote:

> Yes, absolutely.  Parallelizing indexing can make a huge difference.  How
> you do so will depend on your indexing environment.  Most crudely, running
> multiple indexing scripts on different subsets of data, up to the
> limitations of your operating system and hardware, is how many do it.
> SolrJ has some multithreaded facility, as does DataImportHandler.
> Distributing the indexing to multiple machines, but pointing all of them
> at the same Solr server, is effectively the same as multi-threading:
> push documents into Solr from wherever, as fast as it can handle them.
> This is definitely how many do this.
>
>Erik
>
> On Feb 27, 2012, at 13:24 , Memory Makers wrote:
>
> > Hi,
> >
> > Is there a way to speed up indexing by increasing the number of threads
> > doing the indexing or perhaps by distributing indexing on multiple
> machines?
> >
> > Thanks.
>
>


-- 
Sincerely yours
Mikhail Khludnev
Lucid Certified
Apache Lucene/Solr Developer
Grid Dynamics


 


sun-java6 alternatives for Solr 3.5

2012-02-27 Thread ku3ia
Hi all!
I had installed an Ubuntu 10.04 LTS. I had added a 'partner' repository to
my sources list and updated it, but I can't find a package sun-java6-*:
root@ubuntu:~# apt-cache search java6
default-jdk - Standard Java or Java compatible Development Kit
default-jre - Standard Java or Java compatible Runtime
default-jre-headless - Standard Java or Java compatible Runtime (headless)
openjdk-6-jdk - OpenJDK Development Kit (JDK)
openjdk-6-jre - OpenJDK Java runtime, using Hotspot JIT
openjdk-6-jre-headless - OpenJDK Java runtime, using Hotspot JIT (headless)

Then I googled and found an article:
https://lists.ubuntu.com/archives/ubuntu-security-announce/2011-December/001528.html

I'm using Solr 3.5 and Apache Tomcat 6.0.32.
Please advise me what to do in this situation, because I have always used
the sun-java6-* packages for Tomcat and Solr and they worked fine.
Thanks!

--
View this message in context: 
http://lucene.472066.n3.nabble.com/sun-java6-alternatives-for-Solr-3-5-tp3781792p3781792.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Inconsistent Results with ZooKeeper Ensemble and Four SOLR Cloud Nodes

2012-02-27 Thread Matthew Parker
Thanks for your reply Mark.

I believe the build was towards the beginning of the month. The
solr.spec.version is 4.0.0.2012.01.10.38.09

I cannot access the clusterstate.json contents. I clicked on it a couple of
times, but nothing happens. Is that stored on disk somewhere?

I configured a custom request handler to calculate a unique document id
based on the file's url.

On Mon, Feb 27, 2012 at 1:13 PM, Mark Miller  wrote:

> Hey Matt - is your build recent?
>
> Can you visit the cloud/zookeeper page in the admin and send the contents
> of the clusterstate.json node?
>
> Are you using a custom index chain or anything out of the ordinary?
>
>
> - Mark
>
> On Feb 27, 2012, at 12:26 PM, Matthew Parker wrote:
>
> > TWIMC:
> >
> > Environment
> > =
> > Apache SOLR rev-1236154
> > Apache Zookeeper 3.3.4
> > Windows 7
> > JDK 1.6.0_23.b05
> >
> > I have built a SOLR Cloud instance with 4 nodes using the embedded Jetty
> > servers.
> >
> > I created a 3 node zookeeper ensemble to manage the solr configuration
> data.
> >
> > All the instances run on one server so I've had to move ports around for
> > the various applications.
> >
> > I start the 3 zookeeper nodes.
> >
> > I started the first instance of solr cloud with the parameter to have two
> > shards.
> >
> > Then start the remaining 3 solr nodes.
> >
> > The system comes up fine. No errors thrown.
> >
> > I can view the solr cloud console and I can see the SOLR configuration
> > files managed by ZooKeeper.
> >
> > I published data into the SOLR Cloud instances from SharePoint using
> Apache
> > Manifold 0.4-incubating. Manifold is setup to publish the data into
> > collection1, which is the only collection defined in the cluster.
> >
> > When I query the data from collection1 as per the solr wiki, the results
> > are inconsistent. Sometimes all the results are there, other times
> nothing
> > comes back at all.
> >
> > It seems to be having an issue auto replicating the data across the
> cloud.
> >
> > Is there some specific setting I might have missed? Based upon what I
> read,
> > I thought that SOLR cloud would take care of distributing and replicating
> > the data automatically. Do you have to tell it what shard to publish the
> > data into as well?
> >
> > Any help would be appreciated.
> >
> > Thanks,
> >
> > Matt
> >
> > --
> > This e-mail and any files transmitted with it may be proprietary.
>  Please note that any views or opinions presented in this e-mail are solely
> those of the author and do not necessarily represent those of Apogee
> Integration.
>
> - Mark Miller
> lucidimagination.com
>
>
>
>
>
>
>
>
>
>
>
>


-- 
Regards,

Matt Parker (CTR)
Senior Software Architect
Apogee Integration, LLC
5180 Parkstone Drive, Suite #160
Chantilly, Virginia 20151
703.272.4797 (site)
703.474.1918 (cell)
www.apogeeintegration.com

--
This e-mail and any files transmitted with it may be proprietary.  Please note 
that any views or opinions presented in this e-mail are solely those of the 
author and do not necessarily represent those of Apogee Integration.


Re: Speeding up indexing

2012-02-27 Thread Memory Makers
Many thanks for the response.

Here are the revised questions:

For example, if I have N processes that are producing documents to index:
1. Should I have them simultaneously submit documents to Solr (will this
improve the indexing throughput)?
2. Is there anything I can do Solr-configuration-wise to speed up indexing?
3. Is there an architecture where I can have two (or more) Solr servers do
indexing in parallel?

Thanks.

On Mon, Feb 27, 2012 at 1:46 PM, Erik Hatcher wrote:

> Yes, absolutely.  Parallelizing indexing can make a huge difference.  How
> you do so will depend on your indexing environment.  Most crudely, running
> multiple indexing scripts on different subsets of data, up to the
> limitations of your operating system and hardware, is how many do it.
> SolrJ has some multithreaded facility, as does DataImportHandler.
> Distributing the indexing to multiple machines, but pointing all of them
> at the same Solr server, is effectively the same as multi-threading:
> push documents into Solr from wherever, as fast as it can handle them.
> This is definitely how many do this.
>
>Erik
>
> On Feb 27, 2012, at 13:24 , Memory Makers wrote:
>
> > Hi,
> >
> > Is there a way to speed up indexing by increasing the number of threads
> > doing the indexing or perhaps by distributing indexing on multiple
> machines?
> >
> > Thanks.
>
>


RE: Combining ShingleFilter and DisMaxParser, with a twist

2012-02-27 Thread Steven A Rowe
Hi Alexey,

Lucene's QueryParser, and at least some of Solr's query parsers - I'm not 
familiar with all of them - have the problem you mention: analyzers are fed 
queries word-by-word, instead of whole strings between operators.  There is a 
JIRA issue for fixing this (LUCENE-2605), but no work has been done yet.

Separately, do you know about the "raw" query parser[2]?  I'm not sure if it 
would help, but you may be able to use it in an alternate solution.

One small simplification I can think of for your current setup: 
ShingleFilterFactory[1] takes an option called "tokenSeparator" - if you set 
this to the empty string (""), you can eliminate your whitespace-stripping 
filter.

Steve

[1] 
http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#solr.ShingleFilterFactory
[2] 
http://wiki.apache.org/solr/SolrQuerySyntax#Other_built-in_useful_query_parsers
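With that option, the index-side analyzer under discussion could be collapsed to something like the following sketch. The shingle settings and the pattern="'+" char filter come from the thread; the factory class names and the whitespace tokenizer are assumptions based on the description.

```xml
<analyzer>
  <charFilter class="solr.PatternReplaceCharFilterFactory" pattern="'+" replacement=""/>
  <tokenizer class="solr.WhitespaceTokenizerFactory"/>
  <filter class="solr.ShingleFilterFactory" maxShingleSize="3"
          outputUnigrams="true" tokenSeparator=""/>
</analyzer>
```

With tokenSeparator="", the shingle for "walmart bakery" is emitted as "walmartbakery" directly, so the separate whitespace-stripping filter step is no longer needed.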

> -Original Message-
> From: Alexey Verkhovsky [mailto:alexey.verkhov...@gmail.com]
> Sent: Monday, February 27, 2012 1:26 PM
> To: solr-user@lucene.apache.org
> Subject: Combining ShingleFilter and DisMaxParser, with a twist
> 
> Say, there is an index of business names (fairly short text snippets),
> containing: Walmart, Walmart Bakery and Mini Mart. And say we need a query
> for 'wal mart' to match all three, with an appropriate ranking order. Also
> need 'walmart', 'walmart bakery' and 'bakery' to find the right things in
> the right order.
> 
> Here is the solution we came up with:
> 
> 1. edismax query parser (we don't need it for this, but do for a number of
> other requirements)
> 
> 2. On the index, apply ShingleFilter, then remove word separators in the
> shingles, so that "walmart bakery" is indexed as  "walmart", "bakery",
> "walmartbakery"
> Schema for this index looks like this:
>   <analyzer>
>     <charFilter class="solr.PatternReplaceCharFilterFactory" pattern="'+" replacement=""/>
>     <tokenizer class="solr.WhitespaceTokenizerFactory"/>
>     <filter class="solr.ShingleFilterFactory" maxShingleSize="3" outputUnigrams="true"/>
>     <filter class="solr.PatternReplaceFilterFactory" replacement=""/>
>   </analyzer>
> 
> 3. Before sending the original query to Solr, modify it by adding a
> whitespace-stripped version of it. Thus, 'wal mart' becomes 'wal mart
> walmart' and walmart bakery becomes 'walmart bakery walmartbakery'. Don't
> modify the query if it only has one word in it, or contains any edismax
> syntax (double quotes; pluses and minuses in the beginning of a query or
> after whitespace).
> 
> 4. ... profit.
> 
> The reason we have to shingle the query before Solr is that edismax parser
> treats 'wal mart' as two queries - 'wal' OR 'mart', so applying the
> ShingleFilter in the query analyzer doesn't do anything.
> 
> This works, but feels a little dirty. Is there a more elegant way to solve
> this problem?
> 
> --
> Alex Verkhovsky
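The client-side preprocessing in step 3 above can be sketched as follows. Note this is a simplified illustration: the syntax check here is cruder than the rule described (it flags any quote, plus, or minus anywhere in the query, not just at word starts).

```java
// Sketch of the query preprocessing in step 3: append a whitespace-stripped
// variant of the query, unless it is a single word or contains edismax
// syntax characters.
public class ShingleQuery {
    public static String expand(String q) {
        // crude edismax-syntax check: any double quote, plus, or minus
        boolean hasSyntax = q.matches(".*[\"+-].*");
        String[] words = q.trim().split("\\s+");
        if (hasSyntax || words.length < 2) {
            return q;
        }
        // 'wal mart' -> 'wal mart walmart'
        return q + " " + q.replaceAll("\\s+", "");
    }
}
```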


Re: Speeding up indexing

2012-02-27 Thread Memory Makers
A quick add on to this -- we have over 30 million documents.

I take it that we should be looking @ Distributed Solr?
  as in
http://www.lucidimagination.com/content/scaling-lucene-and-solr#d0e344

Thanks.

On Mon, Feb 27, 2012 at 2:33 PM, Memory Makers wrote:

> Many thanks for the response.
>
> Here are the revised questions:
>
> For example, if I have N processes that are producing documents to index:
> 1. Should I have them simultaneously submit documents to Solr (will this
> improve the indexing throughput)?
> 2. Is there anything I can do Solr-configuration-wise to speed up indexing?
> 3. Is there an architecture where I can have two (or more) Solr servers do
> indexing in parallel?
>
> Thanks.
>
> On Mon, Feb 27, 2012 at 1:46 PM, Erik Hatcher wrote:
>
>> Yes, absolutely.  Parallelizing indexing can make a huge difference.  How
>> you do so will depend on your indexing environment.  Most crudely, running
>> multiple indexing scripts on different subsets of data, up to the
>> limitations of your operating system and hardware, is how many do it.
>> SolrJ has some multithreaded facility, as does DataImportHandler.
>> Distributing the indexing to multiple machines, but pointing all of them
>> at the same Solr server, is effectively the same as multi-threading:
>> push documents into Solr from wherever, as fast as it can handle them.
>> This is definitely how many do this.
>>
>>Erik
>>
>> On Feb 27, 2012, at 13:24 , Memory Makers wrote:
>>
>> > Hi,
>> >
>> > Is there a way to speed up indexing by increasing the number of threads
>> > doing the indexing or perhaps by distributing indexing on multiple
>> machines?
>> >
>> > Thanks.
>>
>>
>


[Job] Research Internships

2012-02-27 Thread Grant Ingersoll
Hi,

I have internships open for this summer for students interested in working on 
search and machine learning.  Description is below.

-Grant

Research Engineer Internship

DESCRIPTION
Lucid Imagination, the leading commercial company for Apache Lucene and Solr, 
is looking for interns to work on building next generation search, analytics 
and machine learning technologies based on Apache Solr, Mahout, Hadoop and 
other cutting edge capabilities. This internship will be practically focused 
on working on real problems in search and machine learning as they relate to 
Lucid products and technologies as well as open source.

Interested students (see eligibility below) should send their resume/profile, 
course work and evidence of open source activity (github account, ASF patches 
or other, etc.) to care...@lucidimagination.com.  

To learn more about Lucid Imagination, visit http://www.lucidimagination.com.

REQUIREMENTS
• Interest in working on high performance and large scale problems 
involving structured and unstructured content.
• Relevant coursework in search and/or machine learning.  Coursework in 
linear algebra and statistics is a bonus
• Experience with Lucene, Solr, Hadoop and Mahout (or other machine 
learning libraries) is a plus
• Core Java knowledge
• Eagerness to learn and to apply knowledge to real problems 
• Working on a Computer Science degree (or related field or have 
demonstrable programming skills).  Position is open to both graduate and 
undergraduate students


ELIGIBILITY
In order to participate in the program, you must be a student. Lucid defines a 
student as an individual enrolled in or accepted into an accredited institution 
including colleges, universities, masters programs, PhD programs and 
undergraduate programs. You should be prepared, upon request, to provide Lucid 
with transcripts or other documentation from your accredited institution as 
proof of enrollment or admission status. 

You may be enrolled as a full-time or part-time student. You must also be 
eligible to work in the United States and able to work from our headquarters in 
Redwood City, CA.  You may not participate if prohibited by law.


LOCATION
Redwood City, California  (San Francisco)

Re: Inconsistent Results with ZooKeeper Ensemble and Four SOLR Cloud Nodes

2012-02-27 Thread Mark Miller

On Feb 27, 2012, at 2:22 PM, Matthew Parker wrote:

> Thanks for your reply Mark.
> 
> I believe the build was towards the beginning of the month. The
> solr.spec.version is 4.0.0.2012.01.10.38.09
> 
> I cannot access the clusterstate.json contents. I clicked on it a couple of
> times, but nothing happens. Is that stored on disk somewhere?

Are you using the new admin UI? That has recently been updated to work better 
with cloud - it had some troubles not too long ago. If you are, you should 
try using the old admin UI's zookeeper page - that should show the cluster 
state.

That being said, there has been a lot of bug fixes over the past month - so you 
may just want to update to a recent version.

> 
> I configured a custom request handler to calculate a unique document id
> based on the file's url.
> 
> On Mon, Feb 27, 2012 at 1:13 PM, Mark Miller  wrote:
> 
>> Hey Matt - is your build recent?
>> 
>> Can you visit the cloud/zookeeper page in the admin and send the contents
>> of the clusterstate.json node?
>> 
>> Are you using a custom index chain or anything out of the ordinary?
>> 
>> 
>> - Mark
>> 
>> On Feb 27, 2012, at 12:26 PM, Matthew Parker wrote:
>> 
>>> TWIMC:
>>> 
>>> Environment
>>> =
>>> Apache SOLR rev-1236154
>>> Apache Zookeeper 3.3.4
>>> Windows 7
>>> JDK 1.6.0_23.b05
>>> 
>>> I have built a SOLR Cloud instance with 4 nodes using the embedded Jetty
>>> servers.
>>> 
>>> I created a 3 node zookeeper ensemble to manage the solr configuration
>> data.
>>> 
>>> All the instances run on one server so I've had to move ports around for
>>> the various applications.
>>> 
>>> I start the 3 zookeeper nodes.
>>> 
>>> I started the first instance of solr cloud with the parameter to have two
>>> shards.
>>> 
>>> Then start the remaining 3 solr nodes.
>>> 
>>> The system comes up fine. No errors thrown.
>>> 
>>> I can view the solr cloud console and I can see the SOLR configuration
>>> files managed by ZooKeeper.
>>> 
>>> I published data into the SOLR Cloud instances from SharePoint using
>> Apache
>>> Manifold 0.4-incubating. Manifold is setup to publish the data into
>>> collection1, which is the only collection defined in the cluster.
>>> 
>>> When I query the data from collection1 as per the solr wiki, the results
>>> are inconsistent. Sometimes all the results are there, other times
>> nothing
>>> comes back at all.
>>> 
>>> It seems to be having an issue auto replicating the data across the
>> cloud.
>>> 
>>> Is there some specific setting I might have missed? Based upon what I
>> read,
>>> I thought that SOLR cloud would take care of distributing and replicating
>>> the data automatically. Do you have to tell it what shard to publish the
>>> data into as well?
>>> 
>>> Any help would be appreciated.
>>> 
>>> Thanks,
>>> 
>>> Matt
>>> 
>>> --
>>> This e-mail and any files transmitted with it may be proprietary.
>> Please note that any views or opinions presented in this e-mail are solely
>> those of the author and do not necessarily represent those of Apogee
>> Integration.
>> 
>> - Mark Miller
>> lucidimagination.com
>> 
>> 
>> 
>> 
>> 
>> 
>> 
>> 
>> 
>> 
>> 
>> 
> 
> 
> -- 
> Regards,
> 
> Matt Parker (CTR)
> Senior Software Architect
> Apogee Integration, LLC
> 5180 Parkstone Drive, Suite #160
> Chantilly, Virginia 20151
> 703.272.4797 (site)
> 703.474.1918 (cell)
> www.apogeeintegration.com
> 
> --
> This e-mail and any files transmitted with it may be proprietary.  Please 
> note that any views or opinions presented in this e-mail are solely those of 
> the author and do not necessarily represent those of Apogee Integration.

- Mark Miller
lucidimagination.com













Re: sun-java6 alternatives for Solr 3.5

2012-02-27 Thread Octavian Covalschi
I'm not an Ubuntu user, but I think I read somewhere that Sun's JDK
packages have been removed from the repositories. I don't know more details,
but you should be able to install them yourself: download and install the
appropriate packages; that's the way I did it on Fedora 14-16 (with RPMs).

On Mon, Feb 27, 2012 at 1:24 PM, ku3ia  wrote:

> Hi all!
> I had installed an Ubuntu 10.04 LTS. I had added a 'partner' repository to
> my sources list and updated it, but I can't find a package sun-java6-*:
> root@ubuntu:~# apt-cache search java6
> default-jdk - Standard Java or Java compatible Development Kit
> default-jre - Standard Java or Java compatible Runtime
> default-jre-headless - Standard Java or Java compatible Runtime (headless)
> openjdk-6-jdk - OpenJDK Development Kit (JDK)
> openjdk-6-jre - OpenJDK Java runtime, using Hotspot JIT
> openjdk-6-jre-headless - OpenJDK Java runtime, using Hotspot JIT (headless)
>
> Then I googled and found an article:
>
> https://lists.ubuntu.com/archives/ubuntu-security-announce/2011-December/001528.html
>
> I'm using Solr 3.5 and Apache Tomcat 6.0.32.
> Please advise me what to do in this situation, because I have always used
> the sun-java6-* packages for Tomcat and Solr and they worked fine.
> Thanks!
>
> --
> View this message in context:
> http://lucene.472066.n3.nabble.com/sun-java6-alternatives-for-Solr-3-5-tp3781792p3781792.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>


Re: Inconsistent Results with ZooKeeper Ensemble and Four SOLR Cloud Nodes

2012-02-27 Thread Matthew Parker
I was trying to use the new interface. I see it using the old admin page.

Is there a piece of it you're interested in? I don't have access to the
Internet where it exists so it would mean transcribing it.

On Mon, Feb 27, 2012 at 2:47 PM, Mark Miller  wrote:

>
> On Feb 27, 2012, at 2:22 PM, Matthew Parker wrote:
>
> > Thanks for your reply Mark.
> >
> > I believe the build was towards the beginning of the month. The
> > solr.spec.version is 4.0.0.2012.01.10.38.09
> >
> > I cannot access the clusterstate.json contents. I clicked on it a couple
> of
> > times, but nothing happens. Is that stored on disk somewhere?
>
> Are you using the new admin UI? That has recently been updated to work
> better with cloud - it had some troubles not too long ago. If you are, you
> should try using the old admin UI's zookeeper page - that should show
> the cluster state.
>
> That being said, there has been a lot of bug fixes over the past month -
> so you may just want to update to a recent version.
>
> >
> > I configured a custom request handler to calculate a unique document id
> > based on the file's url.
> >
> > On Mon, Feb 27, 2012 at 1:13 PM, Mark Miller 
> wrote:
> >
> >> Hey Matt - is your build recent?
> >>
> >> Can you visit the cloud/zookeeper page in the admin and send the
> contents
> >> of the clusterstate.json node?
> >>
> >> Are you using a custom index chain or anything out of the ordinary?
> >>
> >>
> >> - Mark
> >>
> >> On Feb 27, 2012, at 12:26 PM, Matthew Parker wrote:
> >>
> >>> TWIMC:
> >>>
> >>> Environment
> >>> =
> >>> Apache SOLR rev-1236154
> >>> Apache Zookeeper 3.3.4
> >>> Windows 7
> >>> JDK 1.6.0_23.b05
> >>>
> >>> I have built a SOLR Cloud instance with 4 nodes using the embedded Jetty
> >>> servers.
> >>>
> >>> I created a 3 node zookeeper ensemble to manage the solr configuration
> >> data.
> >>>
> >>> All the instances run on one server so I've had to move ports around
> for
> >>> the various applications.
> >>>
> >>> I start the 3 zookeeper nodes.
> >>>
> >>> I started the first instance of solr cloud with the parameter to have
> two
> >>> shards.
> >>>
> >>> Then start the remaining 3 solr nodes.
> >>>
> >>> The system comes up fine. No errors thrown.
> >>>
> >>> I can view the solr cloud console and I can see the SOLR configuration
> >>> files managed by ZooKeeper.
> >>>
> >>> I published data into the SOLR Cloud instances from SharePoint using
> >> Apache
> >>> Manifold 0.4-incubating. Manifold is setup to publish the data into
> >>> collection1, which is the only collection defined in the cluster.
> >>>
> >>> When I query the data from collection1 as per the solr wiki, the
> results
> >>> are inconsistent. Sometimes all the results are there, other times
> >> nothing
> >>> comes back at all.
> >>>
> >>> It seems to be having an issue auto replicating the data across the
> >> cloud.
> >>>
> >>> Is there some specific setting I might have missed? Based upon what I
> >> read,
> >>> I thought that SOLR cloud would take care of distributing and
> replicating
> >>> the data automatically. Do you have to tell it what shard to publish
> the
> >>> data into as well?
> >>>
> >>> Any help would be appreciated.
> >>>
> >>> Thanks,
> >>>
> >>> Matt
> >>>
> >>> --
> >>> This e-mail and any files transmitted with it may be proprietary.
> >> Please note that any views or opinions presented in this e-mail are
> solely
> >> those of the author and do not necessarily represent those of Apogee
> >> Integration.
> >>
> >> - Mark Miller
> >> lucidimagination.com
> >>
> >>
> >>
> >>
> >>
> >>
> >>
> >>
> >>
> >>
> >>
> >>
> >
> >
> > --
> > Regards,
> >
> > Matt Parker (CTR)
> > Senior Software Architect
> > Apogee Integration, LLC
> > 5180 Parkstone Drive, Suite #160
> > Chantilly, Virginia 20151
> > 703.272.4797 (site)
> > 703.474.1918 (cell)
> > www.apogeeintegration.com
> >
> > --
> > This e-mail and any files transmitted with it may be proprietary.
>  Please note that any views or opinions presented in this e-mail are solely
> those of the author and do not necessarily represent those of Apogee
> Integration.
>
> - Mark Miller
> lucidimagination.com
>
>
>
>
>
>
>
>
>
>
>
>



Re: Combining ShingleFilter and DisMaxParser, with a twist

2012-02-27 Thread Alexey Verkhovsky
On Mon, Feb 27, 2012 at 12:36 PM, Steven A Rowe  wrote:

> Separately, do you know about the "raw" query parser[2]?  I'm not sure if
> it would help, but you may be able to use it in alternate solution.
>

And explicitly route to edismax when dismax syntax is detected in the
query? That would make sense, at least from an aesthetic point of view
(aka code readability).

By the way, I'm not sure that edismax interpreting 'wal mart' as 'wal' OR
'mart' is really a bug that should be fixed. It's a counter-intuitive
behavior, for sure, but - per my understanding - edismax is supposed to
treat consecutive words as parts of an OR clause, not as a single phrase.
If what the analyzer gets is changed, it would fix some things, but break some
other things.

One small simplification I can think of for your current setup:
> ShingleFilterFactory[1] takes an option called "tokenSeparator" - if you
> set this to the empty string (""), you can eliminate your
> whitespace-stripping filter.
>

Indeed. Thanks for the pointer.

-- 
Alex Verkhovsky


performance between ExternalFileField and Join

2012-02-27 Thread Kevin Osborn
I am looking at two different options to filter results in Solr, basically
a per-user access control list. Our index is about 2.5 million documents.

The first option is to use ExternalFileField. It seems pretty
straightforward. Just put the necessary data in the files and query against
that data.

I was also intrigued by the Join feature in 4.0 trunk (SOLR-2272). In this
case, I would keep my access data in a separate core, and do cross-core
join queries. The two cores would have about the same number of documents
(2.5 million), but one core would have the actual data and the other core
would have the access information. So, the number of unique terms on the
key would be quite high. Would this be too slow?
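For concreteness, a hedged sketch of what the two query shapes might look like. The field, file, and core names below are illustrative, not from this setup.

```
# Option 1: ExternalFileField -- keyField is the uniqueKey, one file per
# user in the index data dir, e.g. external_acl_user42.txt with lines like:
#   doc123=1
# Filter to the allowed documents with a function range query:
fq={!frange l=1}field(acl_user42)

# Option 2: cross-core join (SOLR-2272) -- an "acl" core holding
# (doc_key, user_id) pairs, joined back to the main core's id field:
fq={!join fromIndex=acl from=doc_key to=id}user_id:42
```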

If someone has any knowledge about the performance of these two
methods, please share your advice. Thanks.

-- 
KEVIN OSBORN
LEAD SOFTWARE ENGINEER
T 949.399.8714  C 949.310.4677
5 Park Plaza, Suite 600, Irvine, CA 92614


RE: Combining ShingleFilter and DisMaxParser, with a twist

2012-02-27 Thread Steven A Rowe
On 2/27/2012 at 3:16 PM, Alexey Verkhovsky wrote:
> By the way, I'm not sure that edismax interpreting 'wal mart' as 'wal' OR
> 'mart' is really a bug that should be fixed. It's a counter-intuitive
> behavior, for sure, but - per my understanding - edismax is supposed to
> treat consecutive words as parts of an OR clause, not as a single phrase.
> If what the analyzer gets is changed, it would fix some things, but break some
> other things.

Right - Hoss Man's comment on LUCENE-2605 expresses similar concerns.

Steve


RE: sun-java6 alternatives for Solr 3.5

2012-02-27 Thread Demian Katz
For what it's worth, I run Solr 3.5 on Ubuntu using the OpenJDK packages and I 
haven't run into any problems.  I do realize that sometimes the Sun JDK has 
features that are missing from other Java implementations, but so far it hasn't 
affected my use of Solr.

- Demian

> -Original Message-
> From: ku3ia [mailto:dem...@gmail.com]
> Sent: Monday, February 27, 2012 2:25 PM
> To: solr-user@lucene.apache.org
> Subject: sun-java6 alternatives for Solr 3.5
> 
> Hi all!
> I had installed an Ubuntu 10.04 LTS. I had added a 'partner' repository to
> my sources list and updated it, but I can't find a package sun-java6-*:
> root@ubuntu:~# apt-cache search java6
> default-jdk - Standard Java or Java compatible Development Kit
> default-jre - Standard Java or Java compatible Runtime
> default-jre-headless - Standard Java or Java compatible Runtime (headless)
> openjdk-6-jdk - OpenJDK Development Kit (JDK)
> openjdk-6-jre - OpenJDK Java runtime, using Hotspot JIT
> openjdk-6-jre-headless - OpenJDK Java runtime, using Hotspot JIT (headless)
> 
> Then I googled and found an article:
> https://lists.ubuntu.com/archives/ubuntu-security-announce/2011-
> December/001528.html
> 
> I'm using Solr 3.5 and Apache Tomcat 6.0.32.
> Please advise me what to do in this situation, because I have always used
> the sun-java6-* packages for Tomcat and Solr and they worked fine.
> Thanks!
> 
> --
> View this message in context: http://lucene.472066.n3.nabble.com/sun-java6-
> alternatives-for-Solr-3-5-tp3781792p3781792.html
> Sent from the Solr - User mailing list archive at Nabble.com.


Re: Inconsistent Results with ZooKeeper Ensemble and Four SOLR Cloud Nodes

2012-02-27 Thread Matthew Parker
Here is most of the cluster state:

Connected to Zookeeper
localhost:2181, localhost: 2182, localhost:2183

/(v=0 children=7) ""
   /CONFIGS(v=0, children=1)
  /CONFIGURATION(v=0 children=25)
 < all the configuration files, velocity info, xslt, etc.

  /NODE_STATES(v=0 children=4)
 MACHINE1:8083_SOLR (v=121)"[{"shard_id":"shard1",
"state":"active","core":"","collection":"collection1","node_name:"..."
 MACHINE1:8082_SOLR (v=101)"[{"shard_id":"shard2",
"state":"active","core":"","collection":"collection1","node_name:"..."
 MACHINE1:8081_SOLR (v=92)"[{"shard_id":"shard1",
"state":"active","core":"","collection":"collection1","node_name:"..."
 MACHINE1:8084_SOLR (v=73)"[{"shard_id":"shard2",
"state":"active","core":"","collection":"collection1","node_name:"..."
  /ZOOKEEPER (v-0 children=1)
 QUOTA(v=0)

/CLUSTERSTATE.JSON(V=272)"{"collection1":{"shard1":{MACHINE1:8081_solr_":{shard_id":"shard1","leader":"true","..."
  /LIVE_NODES (v=0 children=4)
 MACHINE1:8083_SOLR(ephemeral v=0)
 MACHINE1:8082_SOLR(ephemeral v=0)
 MACHINE1:8081_SOLR(ephemeral v=0)
 MACHINE1:8084_SOLR(ephemeral v=0)
  /COLLECTIONS (v=1 children=1)
 COLLECTION1(v=0 children=2)"{"configName":"configuration1"}"
 LEADER_ELECT(v=0 children=2)
 SHARD1(V=0 children=1)
 ELECTION(v=0 children=2)

87186203314552835-MACHINE1:8081_SOLR_-N_96(ephemeral v=0)

87186203314552836-MACHINE1:8083_SOLR_-N_84(ephemeral v=0)
 SHARD2(v=0 children=1)
 ELECTION(v=0 children=2)

231301391392833539-MACHINE1:8084_SOLR_-N_85(ephemeral v=0)

159243797356740611-MACHINE1:8082_SOLR_-N_84(ephemeral v=0)
 LEADERS (v=0 children=2)
 SHARD1 (ephemeral
v=0)"{"core":"","node_name":"MACHINE1:8081_solr","base_url":"
http://MACHINE1:8081/solr"}";
 SHARD2 (ephemeral
v=0)"{"core":"","node_name":"MACHINE1:8082_solr","base_url":"
http://MACHINE1:8082/solr"}";
  /OVERSEER_ELECT (v=0 children=2)
 ELECTION (v=0 children=4)
 231301391392833539-MACHINE1:8084_SOLR_-N_000251(ephemeral v=0)
 87186203314552835-MACHINE1:8081_SOLR_-N_000248(ephemeral v=0)
 159243797356740611-MACHINE1:8082_SOLR_-N_000250(ephemeral v=0)
 87186203314552836-MACHINE1:8083_SOLR_-N_000249(ephemeral v=0)
 LEADER (emphemeral
v=0)"{"id":"87186203314552835-MACHINE1:8081_solr-n_00248"}"



On Mon, Feb 27, 2012 at 2:47 PM, Mark Miller  wrote:

>
> On Feb 27, 2012, at 2:22 PM, Matthew Parker wrote:
>
> > Thanks for your reply Mark.
> >
> > I believe the build was towards the beginning of the month. The
> > solr.spec.version is 4.0.0.2012.01.10.38.09
> >
> > I cannot access the clusterstate.json contents. I clicked on it a couple
> of
> > times, but nothing happens. Is that stored on disk somewhere?
>
> Are you using the new admin UI? That has recently been updated to work
> better with cloud - it had some troubles not too long ago. If you are, you
> should try using the old admin UI's zookeeper page - that should show
> the cluster state.
>
> That being said, there has been a lot of bug fixes over the past month -
> so you may just want to update to a recent version.
>
> >
> > I configured a custom request handler to calculate a unique document id
> > based on the file's url.
> >
> > On Mon, Feb 27, 2012 at 1:13 PM, Mark Miller 
> wrote:
> >
> >> Hey Matt - is your build recent?
> >>
> >> Can you visit the cloud/zookeeper page in the admin and send the
> contents
> >> of the clusterstate.json node?
> >>
> >> Are you using a custom index chain or anything out of the ordinary?
> >>
> >>
> >> - Mark
> >>
> >> On Feb 27, 2012, at 12:26 PM, Matthew Parker wrote:
> >>
> >>> TWIMC:
> >>>
> >>> Environment
> >>> =
> >>> Apache SOLR rev-1236154
> >>> Apache Zookeeper 3.3.4
> >>> Windows 7
> >>> JDK 1.6.0_23.b05
> >>>
> >>> I have built a SOLR Cloud instance with 4 nodes using the embedded Jetty
> >>> servers.
> >>>
> >>> I created a 3 node zookeeper ensemble to manage the solr configuration
> >> data.
> >>>
> >>> All the instances run on one server so I've had to move ports around
> for
> >>> the various applications.
> >>>
> >>> I start the 3 zookeeper nodes.
> >>>
> >>> I started the first instance of solr cloud with the parameter to have
> two
> >>> shards.
> >>>
> >>> The start the remaining 3 solr nodes.
> >>>
> >>> The system comes up fine. No errors thrown.
> >>>
> >>> I can view the solr cloud console and I can see the SOLR configuration
> >>> files managed by ZooKeeper.
> >>>
> >>> I published data into the SOLR Cloud instances from SharePoint using
> >> Apache
> >>> Manifold 0.4-incubating. Manifold is setup to publish the data into
> >>> collection1, which is the only collection defined in the cluster.
> >>>
> >>> When I query the data from collection1 as per the solr wiki, the
> results
> >>> are inconsistent. Sometimes all the results are there, other times
> >> nothing

Is there a way to implement a IntRangeField in Solr?

2012-02-27 Thread federico.wachs
Hi all ! 

Here's my dreadful case, thank you for helping out! I want to have a
document like this:


<doc>
...
  <!-- multivalued range field -->
  <field name="occupiedDays">1 TO 10</field>
  <field name="occupiedDays">5 TO 15</field>
</doc>
...

And the reason why I want to do this is because it's so much lighter than
having all the numbers in there, of course. Just to be clear, I want to
avoid having this in solr:


<doc>
...
  <!-- multivalued range field -->
  <field name="occupiedDays">1</field>
  <field name="occupiedDays">2</field>
  <field name="occupiedDays">3</field>
  <field name="occupiedDays">4</field>
  <field name="occupiedDays">5</field>
  <field name="occupiedDays">6</field>
  <field name="occupiedDays">7</field>
  <field name="occupiedDays">8</field>
  <field name="occupiedDays">9</field>
  <field name="occupiedDays">10</field>
</doc>
...

And then perform range queries on this range field like: fq=-occupiedDays:[5
TO 30]

Anybody has any idea? I have asked and searched all over the internet and
seems solr does not support this.

Any help would be really appreciated! Thanks in advance.

Federico

--
View this message in context: 
http://lucene.472066.n3.nabble.com/Is-there-a-way-to-implement-a-IntRangeField-in-Solr-tp3782083p3782083.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Is there a way to implement a IntRangeField in Solr?

2012-02-27 Thread Mike Sokolov
If your ranges are always contiguous, you could index two fields: 
range-start and range-end and then perform queries like:


range-start:[* TO 30] AND range-end:[5 TO *]

If you have multiple ranges which could have gaps in between then you 
need something more complicated :)


On 02/27/2012 04:09 PM, federico.wachs wrote:

Hi all !

Here's my dreadful case, thank you for helping out! I want to have a
document like this:


<doc>
...
  <!-- multivalued range field -->
  <field name="occupiedDays">1 TO 10</field>
  <field name="occupiedDays">5 TO 15</field>
</doc>
...

And the reason why I want to do this is because it's so much lighter than
having all the numbers in there, of course. Just to be clear, I want to
avoid having this in solr:


<doc>
...
  <!-- multivalued range field -->
  <field name="occupiedDays">1</field>
  <field name="occupiedDays">2</field>
  <field name="occupiedDays">3</field>
  <field name="occupiedDays">4</field>
  <field name="occupiedDays">5</field>
  <field name="occupiedDays">6</field>
  <field name="occupiedDays">7</field>
  <field name="occupiedDays">8</field>
  <field name="occupiedDays">9</field>
  <field name="occupiedDays">10</field>
</doc>
...

And then perform range queries on this range field like: fq=-occupiedDays:[5
TO 30]

Anybody has any idea? I have asked and searched all over the internet and
seems solr does not support this.

Any help would be really appreciated! Thanks in advance.

Federico

--
View this message in context: 
http://lucene.472066.n3.nabble.com/Is-there-a-way-to-implement-a-IntRangeField-in-Solr-tp3782083p3782083.html
Sent from the Solr - User mailing list archive at Nabble.com.
   


Re: Is there a way to implement a IntRangeField in Solr?

2012-02-27 Thread federico.wachs
Michael, thanks a lot for your quick answer, but I'm not exactly sure I
understand your solution.
What would the document you are proposing look like? Do you mind
showing me a simple XML example?

Again, thank you for your cooperation. And yes, the ranges are contiguous!

--
View this message in context: 
http://lucene.472066.n3.nabble.com/Is-there-a-way-to-implement-a-IntRangeField-in-Solr-tp3782083p3782139.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Is there a way to implement a IntRangeField in Solr?

2012-02-27 Thread Mike Sokolov

I think your example case would end up like this:


<doc>
...
  <!-- single-valued range fields -->
  <field name="range-start">1</field>
  <field name="range-end">15</field>
...
</doc>




On 02/27/2012 04:26 PM, federico.wachs wrote:

Michael, thanks a lot for your quick answer, but I'm not exactly sure I
understand your solution.
What would the document you are proposing look like? Do you mind
showing me a simple XML example?

Again, thank you for your cooperation. And yes, the ranges are contiguous!

--
View this message in context: 
http://lucene.472066.n3.nabble.com/Is-there-a-way-to-implement-a-IntRangeField-in-Solr-tp3782083p3782139.html
Sent from the Solr - User mailing list archive at Nabble.com.
   


Re: Is there a way to implement a IntRangeField in Solr?

2012-02-27 Thread federico.wachs
Oh no, I think I misunderstood when you said that my ranges were
contiguous.

I could have ranges like this:

1 TO 15
5 TO 30
50 TO 60

And so on... I'm not sure that what you proposed would work, right?

--
View this message in context: 
http://lucene.472066.n3.nabble.com/Is-there-a-way-to-implement-a-IntRangeField-in-Solr-tp3782083p3782202.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Is there a way to implement a IntRangeField in Solr?

2012-02-27 Thread Mike Sokolov

No; contiguous means there are no gaps between them.

You need something like what you described initially.

Another approach is to de-normalize your data so that you have a single 
document for every range.  But this might or might not suit your 
application.  You haven't said anything about the context in which this 
is to be used.


-Mike

On 02/27/2012 04:43 PM, federico.wachs wrote:

Oh no, I think I misunderstood when you said that my ranges were
contiguous.

I could have ranges like this:

1 TO 15
5 TO 30
50 TO 60

And so on... I'm not sure that what you proposed would work, right?

--
View this message in context: 
http://lucene.472066.n3.nabble.com/Is-there-a-way-to-implement-a-IntRangeField-in-Solr-tp3782083p3782202.html
Sent from the Solr - User mailing list archive at Nabble.com.
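[A sketch of what the denormalization could look like — one Solr document per apartment per booked range, so every range stays single-valued. The field names here are made up for illustration:]

```xml
<doc>
  <field name="apartment_id">123</field>
  <field name="range-start">1</field>
  <field name="range-end">15</field>
</doc>
<doc>
  <field name="apartment_id">123</field>
  <field name="range-start">50</field>
  <field name="range-end">60</field>
</doc>
```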
   


Re: Combining ShingleFilter and DisMaxParser, with a twist

2012-02-27 Thread Alexey Verkhovsky
Actually, "use raw parser unless query has dismax syntax" approach doesn't
fit, because it kills a lot of useful dismax-related functionality,
described here: http://wiki.apache.org/solr/DisMaxQParserPlugin#Parameters.

However, there is a little cleaner solution than what I originally had in
mind: "explicitly turn the query into a phrase query (by double-quoting
it), unless the query contains dismax syntax".

-- 
Alex Verkhovsky


Re: Is there a way to implement a IntRangeField in Solr?

2012-02-27 Thread federico.wachs
This is used on an apartment booking system, and what I store as Solr
documents can be seen as apartments. These apartments can be booked for a
certain number of days with a check-in and a check-out date, hence the ranges
I was speaking of before.

What I want to do is to filter out the apartments that are booked, so my
users won't have a bad user experience while trying to book an apartment
that suits their needs.

Did I make any sense? Please let me know; otherwise I can explain
further.



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Is-there-a-way-to-implement-a-IntRangeField-in-Solr-tp3782083p3782304.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Is there a way to implement a IntRangeField in Solr?

2012-02-27 Thread Mike Sokolov
Yes, I see - I think your best bet is to index every day as a distinct 
value.  Don't worry about having 100's of values.


-Mike

On 02/27/2012 05:11 PM, federico.wachs wrote:

This is used on an apartment booking system, and what I store as solr
documents can be seen as apartments. These apartments can be booked for a
certain amount of days with a check in and a check out date hence the ranges
I was speaking of before.

What I want to do is to filter off the apartments that are booked so my
users won't have a bad user experience while trying to book an apartment
that suits their needs.

Did I make any sense? Please let me know, otherwise I can explain
furthermore.



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Is-there-a-way-to-implement-a-IntRangeField-in-Solr-tp3782083p3782304.html
Sent from the Solr - User mailing list archive at Nabble.com.
   


Re: Is there a way to implement a IntRangeField in Solr?

2012-02-27 Thread federico.wachs
Yeah, that's what I'm doing right now.
But whenever I try to index an apartment that has many wide ranges, my
master Solr server throws an OutOfMemoryError (I have set the max heap to 1024m).
So I thought this could be a good workaround, but it is a lot harder than
it seems!


--
View this message in context: 
http://lucene.472066.n3.nabble.com/Is-there-a-way-to-implement-a-IntRangeField-in-Solr-tp3782083p3782347.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Inconsistent Results with ZooKeeper Ensemble and Four SOLR Cloud Nodes

2012-02-27 Thread Mark Miller
Hmmm...all of that looks pretty normal...

Did a commit somehow fail on the other machine? When you view the stats for the 
update handler, are there a lot of pending adds for one of the nodes? Do the 
commit counts match across nodes?

You can also query an individual node with distrib=false to check that.

If your build is a month old, I'd honestly recommend you try upgrading as well.

- Mark

On Feb 27, 2012, at 3:34 PM, Matthew Parker wrote:

> Here is most of the cluster state:
> 
> Connected to Zookeeper
> localhost:2181, localhost: 2182, localhost:2183
> 
> /(v=0 children=7) ""
>   /CONFIGS(v=0, children=1)
>  /CONFIGURATION(v=0 children=25)
> < all the configuration files, velocity info, xslt, etc.
> 
>  /NODE_STATES(v=0 children=4)
> MACHINE1:8083_SOLR (v=121)"[{"shard_id":"shard1",
> "state":"active","core":"","collection":"collection1","node_name:"..."
> MACHINE1:8082_SOLR (v=101)"[{"shard_id":"shard2",
> "state":"active","core":"","collection":"collection1","node_name:"..."
> MACHINE1:8081_SOLR (v=92)"[{"shard_id":"shard1",
> "state":"active","core":"","collection":"collection1","node_name:"..."
> MACHINE1:8084_SOLR (v=73)"[{"shard_id":"shard2",
> "state":"active","core":"","collection":"collection1","node_name:"..."
>  /ZOOKEEPER (v-0 children=1)
> QUOTA(v=0)
> 
> /CLUSTERSTATE.JSON(V=272)"{"collection1":{"shard1":{MACHINE1:8081_solr_":{shard_id":"shard1","leader":"true","..."
>  /LIVE_NODES (v=0 children=4)
> MACHINE1:8083_SOLR(ephemeral v=0)
> MACHINE1:8082_SOLR(ephemeral v=0)
> MACHINE1:8081_SOLR(ephemeral v=0)
> MACHINE1:8084_SOLR(ephemeral v=0)
>  /COLLECTIONS (v=1 children=1)
> COLLECTION1(v=0 children=2)"{"configName":"configuration1"}"
> LEADER_ELECT(v=0 children=2)
> SHARD1(V=0 children=1)
> ELECTION(v=0 children=2)
> 
> 87186203314552835-MACHINE1:8081_SOLR_-N_96(ephemeral v=0)
> 
> 87186203314552836-MACHINE1:8083_SOLR_-N_84(ephemeral v=0)
> SHARD2(v=0 children=1)
> ELECTION(v=0 children=2)
> 
> 231301391392833539-MACHINE1:8084_SOLR_-N_85(ephemeral v=0)
> 
> 159243797356740611-MACHINE1:8082_SOLR_-N_84(ephemeral v=0)
> LEADERS (v=0 children=2)
> SHARD1 (ephemeral
> v=0)"{"core":"","node_name":"MACHINE1:8081_solr","base_url":"
> http://MACHINE1:8081/solr"}";
> SHARD2 (ephemeral
> v=0)"{"core":"","node_name":"MACHINE1:8082_solr","base_url":"
> http://MACHINE1:8082/solr"}";
>  /OVERSEER_ELECT (v=0 children=2)
> ELECTION (v=0 children=4)
> 231301391392833539-MACHINE1:8084_SOLR_-N_000251(ephemeral v=0)
> 87186203314552835-MACHINE1:8081_SOLR_-N_000248(ephemeral v=0)
> 159243797356740611-MACHINE1:8082_SOLR_-N_000250(ephemeral v=0)
> 87186203314552836-MACHINE1:8083_SOLR_-N_000249(ephemeral v=0)
> LEADER (emphemeral
> v=0)"{"id":"87186203314552835-MACHINE1:8081_solr-n_00248"}"
> 
> 
> 
> On Mon, Feb 27, 2012 at 2:47 PM, Mark Miller  wrote:
> 
>> 
>> On Feb 27, 2012, at 2:22 PM, Matthew Parker wrote:
>> 
>>> Thanks for your reply Mark.
>>> 
>>> I believe the build was towards the begining of the month. The
>>> solr.spec.version is 4.0.0.2012.01.10.38.09
>>> 
>>> I cannot access the clusterstate.json contents. I clicked on it a couple
>> of
>>> times, but nothing happens. Is that stored on disk somewhere?
>> 
>> Are you using the new admin UI? That has recently been updated to work
>> better with cloud - it had some troubles not too long ago. If you are, you
>> should trying using the old admin UI's zookeeper page - that should show
>> the cluster state.
>> 
>> That being said, there has been a lot of bug fixes over the past month -
>> so you may just want to update to a recent version.
>> 
>>> 
>>> I configured a custom request handler to calculate an unique document id
>>> based on the file's url.
>>> 
>>> On Mon, Feb 27, 2012 at 1:13 PM, Mark Miller 
>> wrote:
>>> 
 Hey Matt - is your build recent?
 
 Can you visit the cloud/zookeeper page in the admin and send the
>> contents
 of the clusterstate.json node?
 
 Are you using a custom index chain or anything out of the ordinary?
 
 
 - Mark
 
 On Feb 27, 2012, at 12:26 PM, Matthew Parker wrote:
 
> TWIMC:
> 
> Environment
> =
> Apache SOLR rev-1236154
> Apache Zookeeper 3.3.4
> Windows 7
> JDK 1.6.0_23.b05
> 
> I have built a SOLR Cloud instance with 4 nodes using the embeded Jetty
> servers.
> 
> I created a 3 node zookeeper ensemble to manage the solr configuration
 data.
> 
> All the instances run on one server so I've had to move ports around
>> for
> the various applications.
> 
> I start the 3 zookeeper nodes.
> 
> I started the first instance of solr cloud with the parameter to have
>> two
> shards.
> 
> The start the remaining 3 solr nodes.
>
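[For reference, checking a single node without distribution looks like this; the host and collection names follow the thread's examples, and distrib=false keeps the request on that node:]

```
http://MACHINE1:8081/solr/collection1/select?q=*:*&distrib=false&rows=0
```

[rows=0 returns just numFound, which makes it easy to compare document counts across nodes.]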

Re: Is there a way to implement a IntRangeField in Solr?

2012-02-27 Thread Mike Sokolov
I don't know if this would help with OOM conditions, but are you using a 
tint type field for this?  That should be more efficient to search than 
a regular int or string.


-Mike

On 02/27/2012 05:27 PM, federico.wachs wrote:

Yeah, that's what I'm doing right now.
But whenever I try to index an apartment that has many wide ranges, my
master Solr server throws an OutOfMemoryError (I have set the max heap to 1024m).
So I thought this could be a good workaround, but it is a lot harder than
it seems!


--
View this message in context: 
http://lucene.472066.n3.nabble.com/Is-there-a-way-to-implement-a-IntRangeField-in-Solr-tp3782083p3782347.html
Sent from the Solr - User mailing list archive at Nabble.com.
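[For context, the tint type being discussed is the trie-coded integer from the stock example schema, roughly:]

```xml
<fieldType name="tint" class="solr.TrieIntField" precisionStep="8" omitNorms="true" positionIncrementGap="0"/>
```

[The precisionStep value may differ between versions; smaller steps index more terms per value but speed up range queries.]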
   


Re: Is there a way to implement a IntRangeField in Solr?

2012-02-27 Thread federico.wachs
Exactly, I'm using a tint field type and it works really well. The only problem
is when I have a set of very wide ranges, which makes Solr blow up out of
the blue.

Thank you a lot Michael, I appreciate your help on this one :)

--
View this message in context: 
http://lucene.472066.n3.nabble.com/Is-there-a-way-to-implement-a-IntRangeField-in-Solr-tp3782083p3782415.html
Sent from the Solr - User mailing list archive at Nabble.com.


Modify Standalone solr server to use it application without http request

2012-02-27 Thread Neel
Hi,

We are already using embedded Solr in our application. In production we have
3 app servers, and each app server has a copy of each index. These
indexes are built externally once a week and replaced.

We now want to allow incremental indexing and automatic updates to the other
servers, rather than building indexes externally and replacing them.

I see there are a few old posts
{http://wiki.apache.org/solr/SolrCollectionDistributionScripts,
http://wiki.apache.org/solr/CollectionDistribution} about
distribution using scripts and rsync. For me this solution looks difficult
to follow.

I see standalone Solr provides replication
(http://wiki.apache.org/solr/SolrReplication) using HTTP requests. I like
this approach, but I am worried about the additional HTTP requests from the
application to the standalone Solr server.

I now want to adapt the standalone Solr server to use it directly in my
application by removing the UI stuff etc., and also have replication work
automatically. Can you please provide guidance on how this can be done.

Thanks in advance,
Neel

--
View this message in context: 
http://lucene.472066.n3.nabble.com/Modify-Standalone-solr-server-to-use-it-application-without-http-request-tp3781826p3781826.html
Sent from the Solr - User mailing list archive at Nabble.com.
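[For reference, the HTTP replication on that wiki page is configured in solrconfig.xml along these lines; host, port, conf file list and poll interval are placeholders:]

```xml
<!-- on the master -->
<requestHandler name="/replication" class="solr.ReplicationHandler">
  <lst name="master">
    <str name="replicateAfter">commit</str>
    <str name="confFiles">schema.xml,stopwords.txt</str>
  </lst>
</requestHandler>

<!-- on each slave -->
<requestHandler name="/replication" class="solr.ReplicationHandler">
  <lst name="slave">
    <str name="masterUrl">http://master-host:8983/solr/replication</str>
    <str name="pollInterval">00:00:60</str>
  </lst>
</requestHandler>
```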


Re: TikaLanguageIdentifierUpdateProcessorFactory(since Solr3.5.0) to be used in Solr3.3.0?

2012-02-27 Thread bing
HI, Erick, 

I can write a SolrJ client to call Tika, but I am not certain where to invoke
the client. In my case, I work on DSpace, which calls Solr, and I suppose the
client should be invoked in between DSpace and Solr. That is, DSpace invokes
the SolrJ client when doing index/query, which calls Tika and Solr. Do you think
that is reasonable?

Best Regards, 
Bing 

--
View this message in context: 
http://lucene.472066.n3.nabble.com/TikaLanguageIdentifierUpdateProcessorFactory-since-Solr3-5-0-to-be-used-in-Solr3-3-0-tp3771620p3782793.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Inconsistent Results with ZooKeeper Ensemble and Four SOLR Cloud Nodes

2012-02-27 Thread Matthew Parker
I'll have to check on the commit situation. We have been pushing data from
SharePoint the last week or so. Would that somehow block the documents
moving between the solr instances?

I'll try another version tomorrow. Thanks for the suggestions.

On Mon, Feb 27, 2012 at 5:34 PM, Mark Miller  wrote:

> Hmmm...all of that looks pretty normal...
>
> Did a commit somehow fail on the other machine? When you view the stats
> for the update handler, are there a lot of pending adds for one of the
> nodes? Do the commit counts match across nodes?
>
> You can also query an individual node with distrib=false to check that.
>
> If your build is a month old, I'd honestly recommend you try upgrading as
> well.
>
> - Mark
>
> On Feb 27, 2012, at 3:34 PM, Matthew Parker wrote:
>
> > Here is most of the cluster state:
> >
> > Connected to Zookeeper
> > localhost:2181, localhost: 2182, localhost:2183
> >
> > /(v=0 children=7) ""
> >   /CONFIGS(v=0, children=1)
> >  /CONFIGURATION(v=0 children=25)
> > < all the configuration files, velocity info, xslt, etc.
> >
> >  /NODE_STATES(v=0 children=4)
> > MACHINE1:8083_SOLR (v=121)"[{"shard_id":"shard1",
> > "state":"active","core":"","collection":"collection1","node_name:"..."
> > MACHINE1:8082_SOLR (v=101)"[{"shard_id":"shard2",
> > "state":"active","core":"","collection":"collection1","node_name:"..."
> > MACHINE1:8081_SOLR (v=92)"[{"shard_id":"shard1",
> > "state":"active","core":"","collection":"collection1","node_name:"..."
> > MACHINE1:8084_SOLR (v=73)"[{"shard_id":"shard2",
> > "state":"active","core":"","collection":"collection1","node_name:"..."
> >  /ZOOKEEPER (v-0 children=1)
> > QUOTA(v=0)
> >
> >
> /CLUSTERSTATE.JSON(V=272)"{"collection1":{"shard1":{MACHINE1:8081_solr_":{shard_id":"shard1","leader":"true","..."
> >  /LIVE_NODES (v=0 children=4)
> > MACHINE1:8083_SOLR(ephemeral v=0)
> > MACHINE1:8082_SOLR(ephemeral v=0)
> > MACHINE1:8081_SOLR(ephemeral v=0)
> > MACHINE1:8084_SOLR(ephemeral v=0)
> >  /COLLECTIONS (v=1 children=1)
> > COLLECTION1(v=0 children=2)"{"configName":"configuration1"}"
> > LEADER_ELECT(v=0 children=2)
> > SHARD1(V=0 children=1)
> > ELECTION(v=0 children=2)
> >
> > 87186203314552835-MACHINE1:8081_SOLR_-N_96(ephemeral v=0)
> >
> > 87186203314552836-MACHINE1:8083_SOLR_-N_84(ephemeral v=0)
> > SHARD2(v=0 children=1)
> > ELECTION(v=0 children=2)
> >
> > 231301391392833539-MACHINE1:8084_SOLR_-N_85(ephemeral v=0)
> >
> > 159243797356740611-MACHINE1:8082_SOLR_-N_84(ephemeral v=0)
> > LEADERS (v=0 children=2)
> > SHARD1 (ephemeral
> > v=0)"{"core":"","node_name":"MACHINE1:8081_solr","base_url":"
> > http://MACHINE1:8081/solr"}";
> > SHARD2 (ephemeral
> > v=0)"{"core":"","node_name":"MACHINE1:8082_solr","base_url":"
> > http://MACHINE1:8082/solr"}";
> >  /OVERSEER_ELECT (v=0 children=2)
> > ELECTION (v=0 children=4)
> > 231301391392833539-MACHINE1:8084_SOLR_-N_000251(ephemeral
> v=0)
> > 87186203314552835-MACHINE1:8081_SOLR_-N_000248(ephemeral v=0)
> > 159243797356740611-MACHINE1:8082_SOLR_-N_000250(ephemeral
> v=0)
> > 87186203314552836-MACHINE1:8083_SOLR_-N_000249(ephemeral v=0)
> > LEADER (emphemeral
> > v=0)"{"id":"87186203314552835-MACHINE1:8081_solr-n_00248"}"
> >
> >
> >
> > On Mon, Feb 27, 2012 at 2:47 PM, Mark Miller 
> wrote:
> >
> >>
> >> On Feb 27, 2012, at 2:22 PM, Matthew Parker wrote:
> >>
> >>> Thanks for your reply Mark.
> >>>
> >>> I believe the build was towards the begining of the month. The
> >>> solr.spec.version is 4.0.0.2012.01.10.38.09
> >>>
> >>> I cannot access the clusterstate.json contents. I clicked on it a
> couple
> >> of
> >>> times, but nothing happens. Is that stored on disk somewhere?
> >>
> >> Are you using the new admin UI? That has recently been updated to work
> >> better with cloud - it had some troubles not too long ago. If you are,
> you
> >> should trying using the old admin UI's zookeeper page - that should show
> >> the cluster state.
> >>
> >> That being said, there has been a lot of bug fixes over the past month -
> >> so you may just want to update to a recent version.
> >>
> >>>
> >>> I configured a custom request handler to calculate an unique document
> id
> >>> based on the file's url.
> >>>
> >>> On Mon, Feb 27, 2012 at 1:13 PM, Mark Miller 
> >> wrote:
> >>>
>  Hey Matt - is your build recent?
> 
>  Can you visit the cloud/zookeeper page in the admin and send the
> >> contents
>  of the clusterstate.json node?
> 
>  Are you using a custom index chain or anything out of the ordinary?
> 
> 
>  - Mark
> 
>  On Feb 27, 2012, at 12:26 PM, Matthew Parker wrote:
> 
> > TWIMC:
> >
> > Environment
> > =
> > Apache SOLR rev-1236154
> > Apache Zookeeper 3.3.4
> > Windows 7
> > JDK 

Re: TikaLanguageIdentifierUpdateProcessorFactory(since Solr3.5.0) to be used in Solr3.3.0?

2012-02-27 Thread Erick Erickson
It runs any place that has access to the raw files and an HTTP connection
to the Solr server, which is another way of saying "sounds good to me".

Erick

On Mon, Feb 27, 2012 at 9:18 PM, bing  wrote:
> HI, Erick,
>
> I can write SolrJ client to call Tika, but I am not certain where to invoke
> the client. In my case, I work on Dspace to call Solr, and I suppose the
> client should be invoked in-between Dspace and Solr. That is, Dspace invokes
> SolrJ client when doing index/query,  which call Tika and Solr. Do you think
> it is reasonable?
>
> Best Regards,
> Bing
>
> --
> View this message in context: 
> http://lucene.472066.n3.nabble.com/TikaLanguageIdentifierUpdateProcessorFactory-since-Solr3-5-0-to-be-used-in-Solr3-3-0-tp3771620p3782793.html
> Sent from the Solr - User mailing list archive at Nabble.com.


Re: TIKA Errors Importing MS Word Documents into SOLR Cloud

2012-02-27 Thread Erick Erickson
You *probably* can update the Tika libraries in Solr, but it'll be "interesting"
to get all the right ones updated; there are a bunch of them in Tika. And I
make no guarantees.

If it proves difficult, it's not too hard to write a SolrJ program that does
the Tika extraction and run it on a client totally separated from the Solr
server.

Best
Erick

On Sun, Feb 26, 2012 at 7:33 PM, Matthew Parker
 wrote:
> I tried to import some documents into SOLR Cloud using Apache Manifold.
>
> TIKA started throwing exceptions for various documents
>
> The exception reads like the following:
>
> org.apache.solr.common.SolrException
> at org.apache.solr.handler.extraction.ExtractionDocumentLoader.load(
> ExtractingDocumentLoader.java: 213)
> ..
>
> Caused by:  org.apache.tika.exception.TikaException:
> UnexpectedRuntimeException from
> org.apche.tika.parser.microsoft.OfficeParser@d394424
> at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:244)
> ...
> Caused by: java.lang.ArrayIndexOutOfBoundsException
> at java.lang.System.arraycopy(NativeMethod)
> at
> org.apache.poi.hwpf.usermodel.Picture.fillRawImageContent(Picture.java:363)
>
> It seems to be related to the following fix now in Tika 1.1
>
> https://issues.apache.org/bugzilla/show_bug.cgi?id=51902
>
> Can the Tika libraries in the SOLR trunk be updated?
>
> --
> This e-mail and any files transmitted with it may be proprietary.  Please 
> note that any views or opinions presented in this e-mail are solely those of 
> the author and do not necessarily represent those of Apogee Integration.


Re: How to Index Custom XML structure

2012-02-27 Thread Erick Erickson
You might be able to do something with the XSL Transformer step in DIH.

It might also be easier to just write a SolrJ program to parse the XML and
construct a SolrInputDocument to send to Solr. It's really pretty
straightforward.

Best
Erick

On Sun, Feb 26, 2012 at 11:31 PM, Anupam Bhattacharya
 wrote:
> Hi,
>
> I am using ManifoldCF to Crawl data from Documentum repository. I am able
> to successfully read the metadata/properties for the defined document types
> in Documentum using the out-of-the box Documentum Connector in ManifoldCF.
> Unfortunately, there is one XML file also present which consists of a
> custom XML structure which I need to read and fetch the element values and
> add it for indexing in lucene through SOLR.
>
> Is there any mechanism to index any XML structure document in SOLR ?
>
> I checked the SOLR CELL framework, which supports the structure below:
>
> <add>
>   <doc>
>     <field name="id">9885A004</field>
>     <field name="name">Canon PowerShot SD500</field>
>     <field name="cat">camera</field>
>     <field name="features">3x optical zoom</field>
>     <field name="includes">aluminum case</field>
>     <field name="weight">6.4</field>
>     <field name="price">329.95</field>
>   </doc>
>   <doc>
>     <field name="id">9885A003</field>
>     <field name="name">Canon PowerShot SD504</field>
>     <field name="cat">camera1</field>
>     <field name="features">3x optical zoom1</field>
>     <field name="includes">aluminum case1</field>
>     <field name="weight">6.41</field>
>     <field name="price">329.956</field>
>   </doc>
> </add>
>
> My custom XML structure is of the following format, from which I need to
> read the *subject* and *abstract* fields for indexing. I checked the TIKA
> project but couldn't find anything useful.
>
> <record>
>   <id>1</id>
>   <abstract>This is an abstract.</abstract>
>   <subject>Text Subject</subject>
>   <!-- ...a number of empty elements omitted... -->
> </record>
>
> Appreciate any help on this.
>
> Regards
> Anupam


Re: Solr Cloud, Commits and Master/Slave configuration

2012-02-27 Thread Erick Erickson
As I understand it (and I'm just getting into SolrCloud myself), you can
essentially forget about master/slave stuff. If you're using NRT,
the soft commit will make the docs visible; you don't need to do a hard
commit (unlike the master/slave days). Essentially, the update is sent
to each shard leader and then fanned out into the replicas for that
leader. All automatically. Leaders are elected automatically. ZooKeeper
is used to keep the cluster information.

Additionally, SolrCloud keeps a transaction log of the updates, and replays
them if the indexing is interrupted, so you don't risk data loss the way
you used to.

There aren't really masters/slaves in the old sense any more, so
you have to get out of that thought-mode (it's hard, I know).

The code is under pretty active development, so any feedback is
valuable

Best
Erick

On Mon, Feb 27, 2012 at 3:26 AM, roz dev  wrote:
> Hi All,
>
> I am trying to understand features of Solr Cloud, regarding commits and
> scaling.
>
>
>   - If I am using Solr Cloud then do I need to explicitly call commit
>   (hard-commit)? Or, a soft commit is okay and Solr Cloud will do the job of
>   writing to disk?
>
>
>   - Do We still need to use  Master/Slave setup to scale searching? If we
>   have to use Master/Slave setup then do i need to issue hard-commit to make
>   my changes visible to slaves?
>   - If I were to use NRT with Master/Slave setup with soft commit then
>   will the slave be able to see changes made on master with soft commit?
>
> Any inputs are welcome.
>
> Thanks
>
> -Saroj
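[As a point of reference, soft and hard commits can also be issued automatically from solrconfig.xml; this is a sketch with illustrative times (in milliseconds), not a recommended configuration:]

```xml
<updateHandler class="solr.DirectUpdateHandler2">
  <autoCommit>
    <maxTime>60000</maxTime>          <!-- hard commit: flush to stable storage -->
    <openSearcher>false</openSearcher>
  </autoCommit>
  <autoSoftCommit>
    <maxTime>1000</maxTime>           <!-- soft commit: make new docs searchable -->
  </autoSoftCommit>
</updateHandler>
```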


Re: TikaLanguageIdentifierUpdateProcessorFactory(since Solr3.5.0) to be used in Solr3.3.0?

2012-02-27 Thread bing
Hi, Erick, 

I get your point. Thank you so much. 

Best Regards, 
Bing

--
View this message in context: 
http://lucene.472066.n3.nabble.com/TikaLanguageIdentifierUpdateProcessorFactory-since-Solr3-5-0-to-be-used-in-Solr3-3-0-tp3771620p3782938.html
Sent from the Solr - User mailing list archive at Nabble.com.


Reply:Re: Does solrj support compound type for field?

2012-02-27 Thread SuoNayi
Thanks Mikhail. What I mean is: when I index an instance of my POJO
that has a List property with the @Field annotation, and its elements are of a
complex type rather than a primitive type
(such as my own Contact class), can Solr index this instance successfully?
If so, how can I retrieve it via the complex type, here Contact?
Sorry for my poor English.
Sorry for my poor english.
 
SuoNayi 
 

At 2012-02-28 02:04:06, "Mikhail Khludnev" wrote:
>Hello,
>
>From what you are saying I can conclude you need something like
>http://blog.mikemccandless.com/2012/01/searching-relational-content-with.html
>News are not really great for you, work in progress:
>https://issues.apache.org/jira/browse/SOLR-3076
>
>I've heard that ElasticSearch has some sort of support of parent-child
>search.
>
>Regards
>
>2012/2/27 SuoNayi:
>> Hi all, I'm new to solr and just know that solrj generates index for an
>> instance of a POJO by transforming the instance into an instance of
>> SolrInputDocument with DocumentObjectBinder.
>> Supposing my POJO has a property of List type and its elements are compound
>> types which are my customized class, Contact class for example.
>> In my case does solr support indexing my List whose element is my own
>> customized class?
>> Thanks.
>> SuoNayi
>
>--
>Sincerely yours
>Mikhail Khludnev
>Lucid Certified
>Apache Lucene/Solr Developer
>Grid Dynamics

Delta-Import adding duplicate entry.

2012-02-27 Thread Suneel
Hello Friends,

I am working on delta-import. I have configured it according to the given article:
"http://wiki.apache.org/solr/DataImportHandler#head-9ee74e0ad772fd57f6419033fb0af9828222e041".

But every time I execute a delta-import through DIH, it picks up only the
changed data, which is OK; but rather than updating the existing records, it
adds duplicate records.

I have defined "deltaImportQuery" and "deltaQuery" in db-data-config.xml.

I am unable to find out the reason. Please help me and provide some examples.


Thanks & Regards

-
Suneel Pandey
Sr. Software Developer
--
View this message in context: 
http://lucene.472066.n3.nabble.com/Delta-Import-adding-duplicate-entry-tp3783114p3783114.html
Sent from the Solr - User mailing list archive at Nabble.com.
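[One common cause of this symptom: DIH only overwrites a document when the key selected by deltaImportQuery matches the field declared as uniqueKey in schema.xml; if they differ, re-imported rows come back as new documents. A hedged sketch of the wiki's delta pattern follows — the table and column names are placeholders:]

```xml
<entity name="item" pk="ID"
        query="SELECT * FROM item"
        deltaQuery="SELECT ID FROM item
                    WHERE last_modified &gt; '${dataimporter.last_index_time}'"
        deltaImportQuery="SELECT * FROM item WHERE ID='${dataimporter.delta.id}'"/>
```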