Re: HTML Indexing error

2012-04-18 Thread Gora Mohanty
On 18 April 2012 00:41, Chambeda  wrote:
> Hi All,
>
> I am trying to parse some text that contains embedded HTML elements and am
> getting the following error:
[...]
> According to the documentation the HTML tags should be removed correctly.
>
> Anything I am missing?

How are you indexing the XML documents? Using DIH? If so, please
show us the DIH configuration file.
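For reference, stripping embedded HTML in DIH is usually done with the
HTMLStripTransformer; a minimal sketch (the table and field names here are
hypothetical, not taken from the original post):

<entity name="doc" transformer="HTMLStripTransformer"
        query="select id, body from docs">
  <field column="body" stripHTML="true" />
</entity>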

Regards,
Gora


Re: searching and text highlighting

2012-04-18 Thread darul

rpc29y wrote
> 
> Good afternoon:
>  I would like to know if it can be indexed with SolR word documents or
> pdf.
> 
Yes. As a first step, look at Solr's Tika integration (the
ExtractingRequestHandler, also known as Solr Cell).

rpc29y wrote
> 
>  If so how do I modify the solrconfig.xml to search these documents and
> highlight the found text?
> 
I guess you should first follow the Solr tutorial to learn more about it: how
the query parser works and how to define your schema. Then you can use
highlighting the right way.

http://wiki.apache.org/solr/HighlightingParameters
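For example, a highlighting request looks like this (assuming a "content"
field populated by the extraction handler; the field name is hypothetical):

http://localhost:8983/solr/select?q=content:report&hl=true&hl.fl=content&hl.snippets=2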






Solr core not able to access latest data indexed by multiple servers

2012-04-18 Thread Paresh Modi
Hi,

I am using the Solr multicore approach in my app. We have two different
servers (ServerA1 and ServerA2) for load balancing; both servers access the
same index repository, and a request can go to either server as per the
load-balancing algorithm.

The problem occurs in the following way [note that both servers access the
same physical location (index)]:

- An ADD TO INDEX request for File1 goes to ServerA1 for core CR1; core CR1
is loaded in ServerA1 and indexing is done.
- An ADD TO INDEX request for File2 goes to ServerA2 for core CR1; core CR1
is loaded in ServerA2 and indexing is done.
- A SEARCH request for File2 goes to ServerA1. Core CR1 is already loaded
there, so it directly accesses the index, but File2, which was added by
ServerA2, is not found in the core loaded by ServerA1.

So this is the problem: File2, indexed by core CR1 loaded in ServerA2, is not
available in core CR1 loaded by ServerA1.


I have searched and found that the solution to this problem is to reload the
core: when you reload the core, it will have the latest indexed data. But
reloading the core for every request is a very heavy and time-consuming
process.
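(For reference, a core reload is issued through the CoreAdmin handler, e.g.

http://ServerA1:8983/solr/admin/cores?action=RELOAD&core=CR1

where the host and port are hypothetical, not taken from the setup above.)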

Please let me know if anyone has any solution for this.


Waiting for your expert advice.


Thanks
Paresh



Re: need help to integrate SolrJ with my web application.

2012-04-18 Thread Marcelo Carvalho Fernandes
Hi Vijaya,

Why not just make standard HTTP calls to Solr as if it were a RESTful
service? Use an HTTP/REST client in Spring, ask Solr to return JSON
responses, and get rid of all those WAR dependencies of SolrJ.
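For illustration, a minimal sketch with Spring's RestTemplate (the URL and
query below are assumptions, not taken from the setup being discussed):

import org.springframework.web.client.RestTemplate;

public class SolrHttpClient {
    public static void main(String[] args) {
        RestTemplate rest = new RestTemplate();
        // Ask Solr for a JSON response over plain HTTP; no SolrJ jars needed.
        String url = "http://localhost:8080/solr/select?q={q}&wt=json";
        String json = rest.getForObject(url, String.class, "title:solr");
        System.out.println(json);
    }
}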

---
Marcelo

On Monday, April 16, 2012, Ben McCarthy 
wrote:
> Hello,
>
> When I have seen this it usually means the SOLR you are trying to connect
to is not available.
>
> Do you have it installed on:
>
> http://localhost:8080/solr
>
> Try opening that address in your browser.  If you're running the example
solr using the embedded Jetty you won't be on 8080 :D
>
> Hope that helps
>
> -Original Message-
> From: Vijaya Kumar Tadavarthy [mailto:vijaya.tadavar...@ness.com]
> Sent: 16 April 2012 12:15
> To: 'solr-user@lucene.apache.org'
> Subject: need help to integrate SolrJ with my web application.
>
> Hi All,
>
> I am trying to integrate solr with my Spring application.
>
> I have performed following steps:
>
> 1) Added below list of jars to my webapp lib folder.
> apache-solr-cell-3.5.0.jar
> apache-solr-core-3.5.0.jar
> apache-solr-solrj-3.5.0.jar
> commons-codec-1.5.jar
> commons-httpclient-3.1.jar
> lucene-analyzers-3.5.0.jar
> lucene-core-3.5.0.jar
> 2) I have added Tika jar files for processing binary files.
> tika-core-0.10.jar
> tika-parsers-0.10.jar
> pdfbox-1.6.0.jar
> poi-3.8-beta4.jar
> poi-ooxml-3.8-beta4.jar
> poi-ooxml-schemas-3.8-beta4.jar
> poi-scratchpad-3.8-beta4.jar
> 3) I have modified web.xml, adding the setup below.
> <filter>
>    <filter-name>SolrRequestFilter</filter-name>
>    <filter-class>org.apache.solr.servlet.SolrDispatchFilter</filter-class>
> </filter>
> <filter-mapping>
>    <filter-name>SolrRequestFilter</filter-name>
>    <url-pattern>/dataimport</url-pattern>
> </filter-mapping>
> <servlet>
>    <servlet-name>SolrServer</servlet-name>
>    <servlet-class>org.apache.solr.servlet.SolrServlet</servlet-class>
>    <load-on-startup>1</load-on-startup>
> </servlet>
> <servlet>
>    <servlet-name>SolrUpdate</servlet-name>
>    <servlet-class>org.apache.solr.servlet.SolrUpdateServlet</servlet-class>
>    <load-on-startup>2</load-on-startup>
> </servlet>
> <servlet>
>    <servlet-name>Logging</servlet-name>
>    <servlet-class>org.apache.solr.servlet.LogLevelSelection</servlet-class>
> </servlet>
> <servlet-mapping>
>    <servlet-name>SolrUpdate</servlet-name>
>    <url-pattern>/update/*</url-pattern>
> </servlet-mapping>
> <servlet-mapping>
>    <servlet-name>Logging</servlet-name>
>    <url-pattern>/admin/logging</url-pattern>
> </servlet-mapping>
>
> I am trying to test this setup by running a simple Java program that
extracts the content of an MS Excel file, as below
>
> public SolrServer createNewSolrServer()
>{
>  try {
>// setup the server...
>String url = "http://localhost:8080/solr";
>CommonsHttpSolrServer s = new CommonsHttpSolrServer( url );
>s.setConnectionTimeout(100); // 1/10th sec
>s.setDefaultMaxConnectionsPerHost(100);
>s.setMaxTotalConnections(100);
>
>// where the magic happens
>s.setParser(new BinaryResponseParser());
>s.setRequestWriter(new BinaryRequestWriter());
>return s;
>  } catch (Exception e) {
>throw new RuntimeException(e);
>  }
>}
>
>

-- 

Marcelo Carvalho Fernandes
+55 21 8272-7970
+55 21 2205-2786


DIH + JNDI

2012-04-18 Thread Stephen Lacy
Hi All,

I'm new to Solr and I don't have much experience in Java.
I'm trying to set up two environments with configuration files that mirror
each other, so that it's easy to copy files across after changes have been
made. The problem is that they each access a different SQL server, so I want
to separate the data source from the data-import.xml.

I'm trying to do that with JNDI following this doc
http://tomcat.apache.org/tomcat-6.0-doc/jndi-datasource-examples-howto.html

I put the datasource as a resource in
my /etc/tomcat6/Catalina/localhost/solr.xml (Context)
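A Resource definition of the kind that howto describes looks roughly like
this (the driver class, URL, and credentials below are placeholders, not
taken from the original post):

<Resource name="jdbc/DATABASENAME" auth="Container" type="javax.sql.DataSource"
          driverClassName="com.mysql.jdbc.Driver"
          url="jdbc:mysql://dbhost:3306/dbname"
          username="dbuser" password="dbpass"
          maxActive="8" maxIdle="4"/>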



and the resource ref in /var/lib/tomcat6/webapps/solr/WEB-INF/web.xml

 <resource-ref>
   <description>DB Connection</description>
   <res-ref-name>jdbc/DATABASENAME</res-ref-name>
   <res-type>JdbcDataSource</res-type>
   <res-auth>Container</res-auth>
 </resource-ref>

Then I changed the data-config.xml to

<dataSource type="JdbcDataSource" jndiName="java:comp/env/jdbc/DATABASENAME" />
I restart the server and try to do a delta import and I get the following:

SEVERE: Delta Import Failed
org.apache.solr.handler.dataimport.DataImportHandlerException: Unable to
execute query: select 1 as report_id Processing Document # 1
at
org.apache.solr.handler.dataimport.DataImportHandlerException.wrapAndThrow(DataImportHandlerException.java:72)
at
org.apache.solr.handler.dataimport.JdbcDataSource$ResultSetIterator.&lt;init&gt;(JdbcDataSource.java:253)
at
org.apache.solr.handler.dataimport.JdbcDataSource.getData(JdbcDataSource.java:210)
at
org.apache.solr.handler.dataimport.JdbcDataSource.getData(JdbcDataSource.java:39)
at
org.apache.solr.handler.dataimport.SqlEntityProcessor.initQuery(SqlEntityProcessor.java:59)
at
org.apache.solr.handler.dataimport.SqlEntityProcessor.nextModifiedRowKey(SqlEntityProcessor.java:84)
at
org.apache.solr.handler.dataimport.EntityProcessorWrapper.nextModifiedRowKey(EntityProcessorWrapper.java:262)
at
org.apache.solr.handler.dataimport.DocBuilder.collectDelta(DocBuilder.java:893)
at
org.apache.solr.handler.dataimport.DocBuilder.doDelta(DocBuilder.java:285)
at
org.apache.solr.handler.dataimport.DocBuilder.execute(DocBuilder.java:179)
at
org.apache.solr.handler.dataimport.DataImporter.doDeltaImport(DataImporter.java:390)
at
org.apache.solr.handler.dataimport.DataImporter.runCmd(DataImporter.java:429)
at
org.apache.solr.handler.dataimport.DataImporter$1.run(DataImporter.java:408)
Caused by: javax.naming.NamingException: Cannot create resource instance
at
org.apache.naming.factory.ResourceFactory.getObjectInstance(ResourceFactory.java:143)
at
javax.naming.spi.NamingManager.getObjectInstance(NamingManager.java:321)
at org.apache.naming.NamingContext.lookup(NamingContext.java:793)
at org.apache.naming.NamingContext.lookup(NamingContext.java:140)
at org.apache.naming.NamingContext.lookup(NamingContext.java:781)
at org.apache.naming.NamingContext.lookup(NamingContext.java:140)
at org.apache.naming.NamingContext.lookup(NamingContext.java:781)
at org.apache.naming.NamingContext.lookup(NamingContext.java:140)
at org.apache.naming.NamingContext.lookup(NamingContext.java:781)
at org.apache.naming.NamingContext.lookup(NamingContext.java:153)
at
org.apache.naming.SelectorContext.lookup(SelectorContext.java:152)
at javax.naming.InitialContext.lookup(InitialContext.java:409)
at
org.apache.solr.handler.dataimport.JdbcDataSource$1.call(JdbcDataSource.java:140)
at
org.apache.solr.handler.dataimport.JdbcDataSource$1.call(JdbcDataSource.java:128)
at
org.apache.solr.handler.dataimport.JdbcDataSource.getConnection(JdbcDataSource.java:363)
at
org.apache.solr.handler.dataimport.JdbcDataSource.access$200(JdbcDataSource.java:39)
at
org.apache.solr.handler.dataimport.JdbcDataSource$ResultSetIterator.&lt;init&gt;(JdbcDataSource.java:240)
... 11 more

I've tried a couple of different alterations, but I've only really succeeded
in changing the error I get. Does anyone know how to fix this issue? I'm kind
of lost here.

Stephen


property substitution not working with multicore

2012-04-18 Thread jmlucjav
Hi,

I cannot seem to get the configuration right for using a properties file for
cores (with 3.6.0). In the Solr 3 Enterprise Search Server book they say this:
"This property substitution works in solr.xml, solrconfig.xml,
schema.xml, and DIH configuration files."

So my solr.xml is like this:

<solr>
  <cores adminPath="/admin/cores">
    <core name="core0" instanceDir="core0" properties="core0.properties" />
  </cores>
</solr>

core0.properties is in multicore/core0 (I tried with an absolute path too,
but that does not work either).

And my properties file has:
config.datadir=c:\\tmp\\core0\\data
config.db-data.jdbcUrl=jdbc:mysql:localhost\\...
config.db-data.username=root
config.db-data.password=
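For the substitution to take effect, the configuration files reference these
values with the ${...} syntax; for example, in solrconfig.xml (a sketch):

<dataDir>${config.datadir}</dataDir>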

None of those values are taken into account. I think I read in Jira that DIH
does not support properties, but since the book says it does, I just tried.
The path to the data dir should work, right? But not even that one works; I
always get the index in ./tmp/solr_data.

any hints?
xab



Re: Populating a filter cache by means other than a query

2012-04-18 Thread Erick Erickson
I guess my question is "what advantage are you trying
to get here?"

At the start, this feels like an "XY" problem. How are
you intending to use the fq after you've built it? Because
if there's any way to just create an "fq" clause, Solr
will take care of it for you: caching it, autowarming
it when searchers are re-opened, etc. Otherwise, it seems
to me you're going to be re-inventing a bunch of stuff;
you'll have to intercept the incoming queries in order
to apply the filter from the cache, etc.

Which also may be another way of asking "How big
is this set of document IDs?" If it's in the 100s, I'd
just go with an fq. If it's more than that, I'd index
some kind of set identifier that you could create for
your fqs.
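(For the small case, that would be something like fq=myid:(12 OR 47 OR 93);
for the indexed-set-identifier case, fq=setid:labelsetL. Field names and
values here are illustrative only.)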

And if this is gibberish, ignore me ..

Best
Erick

On Tue, Apr 17, 2012 at 4:34 PM, Chris Collins  wrote:
> Hi, I am a long time Lucene user but new to Solr.  I would like to use
> something like the filterCache, but build such a cache not from a query but
> from custom code.  I will ask my question using techniques and vocabulary I
> am familiar with; I'm not sure it's actually the right way, so I apologize
> if it's just the wrong approach.
>
> The scenario is that I would like to filter a result set by a set of labeled
> documents; I will call that set L.
> L contains app-specific document IDs that are indexed as literals in the
> Lucene field "myid".
> I would imagine I could build an OpenBitSet by enumerating the termdocs and
> looking for the intersecting ids in my label set.
> Now I have my bitset that I assume I could use in a filter.
>
> Another approach would be to implement a hits collector, compute a
> FieldCache from that myid field, and look for the intersection in a
> hashtable of L at scoring time, throwing out results that are not contained
> in the hashtable.
>
> Of course I am working within the confines / concepts that Solr has laid
> out.  Without going completely off the reservation, is there a neat way of
> doing such a thing with Solr?
>
> Glad to clarify if my question makes absolutely no sense.
>
> Best
>
> C
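For reference, the termdocs-enumeration approach described above looks
roughly like this with the Lucene 3.x API (a sketch: "myid" is the field
named above, and labelSet stands in for the label set L):

import java.io.IOException;
import java.util.Set;
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.index.Term;
import org.apache.lucene.index.TermDocs;
import org.apache.lucene.util.OpenBitSet;

public class LabelBits {
    // Build an OpenBitSet marking every document whose "myid" term is in L.
    public static OpenBitSet bitsForLabels(IndexReader reader,
                                           Set<String> labelSet)
            throws IOException {
        OpenBitSet bits = new OpenBitSet(reader.maxDoc());
        for (String id : labelSet) {
            TermDocs td = reader.termDocs(new Term("myid", id));
            try {
                while (td.next()) {
                    bits.set(td.doc());
                }
            } finally {
                td.close();
            }
        }
        return bits; // can back a Lucene Filter over this reader
    }
}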


Re: How does SolrCloud distribute data among shards of the same cluster?

2012-04-18 Thread Erick Erickson
Try looking at DistributedUpdateProcessor, there's
a "hash(cmd)" method in there.
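(For illustration, the manual Solr 3.x-style routing mentioned in the quoted
message below is commonly just shard = Math.abs(docId.hashCode()) % numShards;
the hash(cmd) method is where SolrCloud does its own version of this.)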

Best
Erick

On Tue, Apr 17, 2012 at 4:45 PM, emma1023  wrote:
> Thanks for your reply. In Solr 3.x, we need to manually hash the doc ID to
> the server. How does SolrCloud do this instead? I am working on a project
> using SolrCloud, but we need to monitor how SolrCloud distributes the
> data. I cannot find which part of the source code does this. Is it
> in the cloud part? Thanks.
>
>
> On Tue, Apr 17, 2012 at 3:16 PM, Mark Miller-3 [via Lucene] <
> ml-node+s472066n3918192...@n3.nabble.com> wrote:
>
>>
>> On Apr 17, 2012, at 9:56 AM, emma1023 wrote:
>>
>> It hashes the id. The doc distribution is fairly even - but sizes may be
>> fairly different.
>>
>> > How does SolrCloud distribute data among shards of the same cluster
>> > when you query? Does it distribute the data equally? What is the basis?
>> > In which part of the code can I find this? Thank you so much!
>> >
>> >
>>
>> - Mark Miller
>> lucidimagination.com


Solr hanging

2012-04-18 Thread Trym R. Møller

Hi

I am using Solr trunk and have 7 Solr instances running, with 28 leaders
and 28 replicas for a single collection.
After indexing for a while (a couple of days) the Solr instances start
hanging, and taking a thread dump of the JVM I see blocked threads like the
following:

Thread 2369: (state = BLOCKED)
 - sun.misc.Unsafe.park(boolean, long) @bci=0 (Compiled frame; 
information may be imprecise)
 - java.util.concurrent.locks.LockSupport.park(java.lang.Object) 
@bci=14, line=158 (Compiled frame)
 - 
java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await() 
@bci=42, line=1987 (Compiled frame)
 - java.util.concurrent.LinkedBlockingQueue.take() @bci=29, 
line=399 (Compiled frame)
 - java.util.concurrent.ExecutorCompletionService.take() @bci=4, 
line=164 (Compiled frame)
 - 
org.apache.solr.update.SolrCmdDistributor.checkResponses(boolean) 
@bci=27, line=350 (Compiled frame)
 - org.apache.solr.update.SolrCmdDistributor.finish() @bci=18, 
line=98 (Compiled frame)
 - 
org.apache.solr.update.processor.DistributedUpdateProcessor.doFinish() 
@bci=4, line=299 (Compiled frame)
 - 
org.apache.solr.update.processor.DistributedUpdateProcessor.finish() 
@bci=1, line=817 (Compiled frame)

...
 - org.mortbay.thread.QueuedThreadPool$PoolThread.run() @bci=25, 
line=582 (Interpreted frame)


I read the stack trace as: my indexing client has indexed a document, and
this Solr is now waiting for the replica(?) to respond before returning an
answer to the client.


The other Solrs have similar blocked threads.

Any ideas how I can get closer to the problem? Am I reading the stack
trace correctly? Is there any further information that would be relevant for
commenting on this problem?


Thanks for any comments.

Best regards Trym


Re: SOLR 4 / Date Query: Spurious Results: Is it me or ... ?

2012-04-18 Thread Erick Erickson
Your schema didn't come through, but...

1> why terms=-1 I don't know. I have a build from this
 morning and it's fine. When's yours?
2> date vs. tdate. Yes, that's kind of confusing, but
 the Trie types inject some extra stuff in the field
 that allows the faster range queries; I think of it
 as "navigation data". These get displayed as
 1970 dates (i.e. the epoch). Ignore them.
3> I don't quite understand here. If you're still talking about
 a tdate field, could the "navigation data" account
 for it? That data shouldn't belong to any document and
 isn't really putting multi-values in any doc. Changing the
 schema type to not be multivalued should show this is the
 case if so.

Best
Erick

On Tue, Apr 17, 2012 at 7:18 PM, vybe3142  wrote:
> I wrote a custom handler that uses externally injected metadata (bypassing
> Tika et al.)
>
> WRT Dates, I see them associated with the correct docs when retrieving all
> docs:
>
> BUT:
>
> looking at the schema analyzer, things look weird:
> 1. Top terms = -1
> 2. The dates are all mixed up with some spurious 1970 dates thrown in (I can
> get rid of the 1970 dates if I use type "date" vs "tdate")
> 3. Multi Valued values (should only be one per doc, as per input data, even
> though the schema allows it).
>
> Any ideas what, if anything, I'm doing wrong?
>
> See pic http://lucene.472066.n3.nabble.com/file/n3918636/Capture.jpg
>
> Here's my SOLR schema:
>
>
>
>


Solr file size limit?

2012-04-18 Thread Bram Rongen
Dear fellow Solr users,

I've been using Solr for a very short time now and I'm stuck. I'm trying to
index a Drupal website consisting of 1.2 million smaller nodes and 300k
larger nodes (~400kb avg).

I'm using Solr 3.5 on a dedicated Ubuntu 10.04 box with 3TB of disk space
and 16GB of memory. I've tried using the Sun JRE and OpenJDK, both
resulting in the same problem. Indexing works great until my .fdt file
reaches a size of 4.9GB (5217987319 bytes). At this point, when Solr starts
merging, it just keeps on merging, starting over and over. Java uses all
the available memory even though Xmx is set at 8G. When I restart Solr,
everything looks fine until merging is triggered. Whenever it hangs, the
server load averages 3; searching is possible but slow, and the Solr admin
interface is reachable, but sending new documents leads to a time-out.

I've tried several different settings for MergePolicy and started
reindexing a couple of times, but the behavior stays the same. My current
solrconfig.xml can be found here: http://pastebin.com/NXDT0B8f. I'm unable to
find errors in the log, which makes it really difficult to debug. Could
anyone point me in the right direction?
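For reference, a merge-policy block in a 3.x solrconfig.xml looks like the
following; the values are purely illustrative, not a recommended fix:

<mergePolicy class="org.apache.lucene.index.TieredMergePolicy">
  <int name="maxMergeAtOnce">10</int>
  <double name="segmentsPerTier">10.0</double>
</mergePolicy>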

I've already asked my question on stackoverflow without receiving a
solution:
http://stackoverflow.com/questions/9993633/apache-solr-3-5-hangs-when-indexing.
Maybe it can provide you with some more information.

Kind regards!
Bram Rongen


Re: Options for automagically Scaling Solr (without needing distributed index/replication) in a Hadoop environment

2012-04-18 Thread Jason Rutherglen
I'm curious how on the fly updates are handled as a new shard is added
to an alias.  Eg, how does the system know to which shard to send an
update?

On Tue, Apr 17, 2012 at 4:00 PM, Lukáš Vlček  wrote:
> Hi,
>
> speaking about ES, I think it would be fair to mention that one has to
> specify the number of shards upfront when the index is created - that is
> correct; however, it is possible to give an index one or more aliases, which
> basically means that you can add new indices on the fly and give them the
> same alias, which is then used to search against. Given that you can
> add/remove indices, nodes and aliases on the fly, I think there is a way to
> handle a growing data set with ease. If anyone is interested, such a
> scenario has been discussed in detail on the ES mailing list.
>
> Regards,
> Lukas
>
> On Tue, Apr 17, 2012 at 2:42 AM, Jason Rutherglen <
> jason.rutherg...@gmail.com> wrote:
>
>> One of the big weaknesses of Solr Cloud (and ES?) is the lack of the
>> ability to redistribute shards across servers - meaning, splitting a
>> shard as it grows too large, while taking live updates.
>>
>> How do you plan on elastically adding more servers without this feature?
>>
>> Cassandra and HBase handle elasticity in their own ways.  Cassandra
>> has successfully implemented the Dynamo model and HBase uses the
>> traditional BigTable 'split'.  Both systems are complex though are at
>> a singular level of maturity.
>>
>> Also Cassandra [successfully] implements multiple data center support,
>> is that available in SC or ES?
>>
>> On Thu, Apr 12, 2012 at 7:23 PM, Otis Gospodnetic
>>  wrote:
>> > Hello Ali,
>> >
>> >> I'm trying to setup a large scale *Crawl + Index + Search
>> *infrastructure
>> >
>> >> using Nutch and Solr/Lucene. The targeted scale is *5 Billion web
>> pages*,
>> >> crawled + indexed every *4 weeks, *with a search latency of less than
>> 0.5
>> >> seconds.
>> >
>> >
>> > That's fine.  Whether it's doable with any tech will depend on how much
>> hardware you give it, among other things.
>> >
>> >> Needless to mention, the search index needs to scale to 5Billion pages.
>> It
>> >> is also possible that I might need to store multiple indexes -- one for
>> >> crawled content, and one for ancillary data that is also very large.
>> Each
>> >> of these indices would likely require a logically distributed and
>> >> replicated index.
>> >
>> >
>> > Yup, OK.
>> >
>> >> However, I would like for such a system to be homogenous with the Hadoop
>> >> infrastructure that is already installed on the cluster (for the
>> crawl). In
>> >> other words, I would much prefer if the replication and distribution of
>> the
>> >> Solr/Lucene index be done automagically on top of Hadoop/HDFS, instead
>> of
>> >> using another scalability framework (such as SolrCloud). In addition, it
>> >> would be ideal if this environment was flexible enough to be dynamically
>> >> scaled based on the size requirements of the index and the search
>> traffic
>> >> at the time (i.e. if it is deployed on an Amazon cluster, it should be
>> easy
>> >> enough to automatically provision additional processing power into the
>> >> cluster without requiring server re-starts).
>> >
>> >
>> > There is no such thing just yet.
>> > There is no Search+Hadoop/HDFS in a box just yet.  There was an attempt
>> to automatically index HBase content, but that was either not completed or
>> not committed into HBase.
>> >
>> >> However, I'm not sure which Solr-based tool in the Hadoop ecosystem
>> would
>> >> be ideal for this scenario. I've heard mention of Solr-on-HBase,
>> Solandra,
>> >> Lily, ElasticSearch, IndexTank etc, but I'm really unsure which of
>> these is
>> >> mature enough and would be the right architectural choice to go along
>> with
>> >> a Nutch crawler setup, and to also satisfy the dynamic/auto-scaling
>> aspects
>> >> above.
>> >
>> >
>> > Here is a summary on all of them:
>> > * Search on HBase - I assume you are referring to the same thing I
>> mentioned above.  Not ready.
>> > * Solandra - uses Cassandra+Solr, plus DataStax now has a different
>> (commercial) offering that combines search and Cassandra.  Looks good.
>> > * Lily - data stored in HBase cluster gets indexed to a separate Solr
>> instance(s)  on the side.  Not really integrated the way you want it to be.
>> > * ElasticSearch - solid at this point, the most dynamic solution today,
>> can scale well (we are working on a many-billion-document index and hundreds
>> of nodes with ElasticSearch right now), etc.  But again, not integrated
>> with Hadoop the way you want it.
>> > * IndexTank - has some technical weaknesses, not integrated with Hadoop,
>> not sure about its future considering LinkedIn uses Zoie and Sensei already.
>> > * And there is SolrCloud, which is coming soon and will be solid, but is
>> again not integrated.
>> >
>> > If I were you and I had to pick today - I'd pick ElasticSearch if I were
>> completely open.  If I had Solr bias I'd give SolrCloud a try first.
>> >
>>

Re: Multiple document structure

2012-04-18 Thread Gora Mohanty
On 18 April 2012 10:05, abhijit bashetti  wrote:
>
> Hi ,
> Is it possible to have 2 document structures in solr?
[...]

Do not think so, but why do you need it? Use two separate
indices, either in a multi-core setup, or in separate Solr
instances.

Regards,
Gora


Re: Options for automagically Scaling Solr (without needing distributed index/replication) in a Hadoop environment

2012-04-18 Thread Lukáš Vlček
AFAIK it can not. You can only add new shards by creating a new index, and
you will then need to index new data into that new index. Index aliases are
useful mainly for the searching part. So it means that you need to plan for
this when you implement your indexing logic. On the other hand, the query
logic does not need to change, as you only add new indices and give them all
the same alias.
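For illustration, pointing an existing alias at an additional index in
elasticsearch is a single _aliases call (index and alias names here are
hypothetical):

curl -XPOST 'http://localhost:9200/_aliases' -d '{
  "actions" : [
    { "add" : { "index" : "items-2012-04", "alias" : "items" } }
  ]
}'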

I am not an expert on this, but I think that index splitting and re-sharding
can be expensive for a [near] real-time search system, and the point is that
you can probably use different techniques to support your large-scale
needs. Index aliasing and routing in elasticsearch can help a lot in
supporting various large-scale data scenarios; check the following thread
on the ES ML for some examples:
https://groups.google.com/forum/#!msg/elasticsearch/49q-_AgQCp8/MRol0t9asEcJ

Just to sum it up: the fact that elasticsearch has a fixed number of
shards per index and does not support resharding and index splitting does
not mean you cannot scale your data easily.

(I was not following this whole thread in every detail, so you may
have specific needs that can be solved only by splitting or resharding; in
such a case I would recommend asking on the ES ML with further questions. I
do not want to run into a system X vs. system Y flame war here...)

Regards,
Lukas

On Wed, Apr 18, 2012 at 2:22 PM, Jason Rutherglen <
jason.rutherg...@gmail.com> wrote:

> I'm curious how on the fly updates are handled as a new shard is added
> to an alias.  Eg, how does the system know to which shard to send an
> update?
>
> On Tue, Apr 17, 2012 at 4:00 PM, Lukáš Vlček 
> wrote:
> > Hi,
> >
> > speaking about ES I think it would be fair to mention that one has to
> > specify number of shards upfront when the index is created - that is
> > correct, however, it is possible to give index one or more aliases which
> > basically means that you can add new indices on the fly and give them
> same
> > alias which is then used to search against. Given that you can add/remove
> > indices, nodes and aliases on the fly I think there is a way how to
> handle
> > growing data set with ease. If anyone is interested such scenario has
> been
> > discussed in detail in ES mail list.
> >
> > Regards,
> > Lukas
> >
> > On Tue, Apr 17, 2012 at 2:42 AM, Jason Rutherglen <
> > jason.rutherg...@gmail.com> wrote:
> >
> >> One of big weaknesses of Solr Cloud (and ES?) is the lack of the
> >> ability to redistribute shards across servers.  Meaning, as a single
> >> shard grows too large, splitting the shard, while live updates.
> >>
> >> How do you plan on elastically adding more servers without this feature?
> >>
> >> Cassandra and HBase handle elasticity in their own ways.  Cassandra
> >> has successfully implemented the Dynamo model and HBase uses the
> >> traditional BigTable 'split'.  Both systems are complex though are at
> >> a singular level of maturity.
> >>
> >> Also Cassandra [successfully] implements multiple data center support,
> >> is that available in SC or ES?
> >>
> >> [...]

pushing updates to solr from postgresql

2012-04-18 Thread Welty, Richard
I have a setup right this instant where the DataImportHandler is being used
to pull data for an index from a PostgreSQL server.

I'd like to switch over to push, and am looking for some validation of my
approach.

I have Perl installed as an untrusted language on my PostgreSQL server and am
planning to set up triggers on the tables where insert/update/delete
operations should cause an update of the relevant Solr indexes. The trigger
functions will build XML in the format for UpdateXmlMessages and notify Solr
via HTTP requests.
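For reference, the update message such a trigger would POST to Solr's
/update handler is just the following (the field names are hypothetical):

<add>
  <doc>
    <field name="id">42</field>
    <field name="title">value taken from the trigger's NEW row</field>
  </doc>
</add>

and a delete trigger would send <delete><id>42</id></delete>.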


Is this sensible, or am I missing something easier?

Also, does anyone have any thoughts about coordinating initial indexing /
full reindexing via DataImportHandler with the trigger-based push operations?

thanks,
   richard


hierarchical faceting?

2012-04-18 Thread sam
I have hierarchical colors:
<field name="colors" type="text_path" indexed="true" stored="true" multiValued="true"/>
text_path is TextField with PathHierarchyTokenizerFactory as tokenizer.

Given these two documents,
Doc1: red
Doc2: red/pink

I want the result to be the following:
?fq=red
==> Doc1, Doc2

?fq=red/pink
==> Doc2

But, with PathHierarchyTokenizer, Doc1 is included for the query:
?fq=red/pink
==> Doc1, Doc2

How can I query for hierarchical facets?
http://wiki.apache.org/solr/HierarchicalFaceting describes facet.prefix,
but it looks too cumbersome to me.

Is there a simpler way to implement hierarchical facets?


Problems with edismax parser and solr3.6

2012-04-18 Thread Bernd Fehling

I just looked through my logs for Solr 3.6 and saw several "0 hits" results
which were not seen with Solr 3.5.

While tracing this down, it turned out that edismax no longer likes queries of
the form "...&q=(text:ide)&...".

With parentheses around the query term, edismax fails with Solr 3.6.

Can anyone confirm this and give me feedback?

Bernd


Re: hierarchical faceting?

2012-04-18 Thread Darren Govoni
Put the parent term in all the child documents at index time
and the re-issue the facet query when you expand the parent using the
parent's term. works perfect.

On Wed, 2012-04-18 at 10:56 -0400, sam wrote:
> I have hierarchical colors:
> <field name="colors" type="text_path" indexed="true" stored="true" multiValued="true"/>
> text_path is TextField with PathHierarchyTokenizerFactory as tokenizer.
> 
> Given these two documents,
> Doc1: red
> Doc2: red/pink
> 
> I want the result to be the following:
> ?fq=red
> ==> Doc1, Doc2
> 
> ?fq=red/pink
> ==> Doc2
> 
> But, with PathHierarchyTokenizer, Doc1 is included for the query:
> ?fq=red/pink
> ==> Doc1, Doc2
> 
> How can I query for hierarchical facets?
> http://wiki.apache.org/solr/HierarchicalFaceting describes facet.prefix..
> But it looks too cumbersome to me.
> 
> Is there a simpler way to implement hierarchical facets?




How to add/remove/customize search tabs

2012-04-18 Thread Valentin, AJ
I have Apache Solr installed with my Drupal 7 site and noticed some default 
tabs available (Content, Site, Users).  Is there a way to add/change that tabs 
section?





Re: How to add/remove/customize search tabs

2012-04-18 Thread Dave Stuart
This question is probably better asked on the Drupal groups page for Apache
Solr, http://groups.drupal.org/lucene-nutch-and-solr , as this is more of a
Drupal issue than a Solr issue.


On 18 Apr 2012, at 16:11, Valentin, AJ wrote:

> I have Apache Solr installed with my Drupal 7 site and noticed some default 
> tabs available (Content, Site, Users).  Is there a way to add/change that 
> tabs section?

David Stuart
M  +44(0) 778 854 2157
T   +44(0) 845 519 5465
www.axistwelve.com
Axis12 Ltd | 7 Wynford Road
| London | N1 9QN | UK

AXIS12 - Enterprise Web Solutions

Reg Company No. 7215135
VAT No. 997 4801 60







Re: hierarchical faceting?

2012-04-18 Thread sam
Yah, that's exactly what PathHierarchyTokenizer does.

<fieldType name="text_path" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.PathHierarchyTokenizerFactory" delimiter="/"/>
  </analyzer>
</fieldType>

I think I have a query time tokenizer that tokenizes at /

?q=colors:red
==> Doc1, Doc2

?q=colors:redfoobar
==>

?q=colors:red/foobarasdfoaijao
==> Doc1, Doc2



On Wed, Apr 18, 2012 at 11:10 AM, Darren Govoni  wrote:

> Put the parent term in all the child documents at index time
> and the re-issue the facet query when you expand the parent using the
> parent's term. works perfect.
>
> On Wed, 2012-04-18 at 10:56 -0400, sam wrote:
> > I have hierarchical colors:
> > <field name="colors" type="text_path" indexed="true" stored="true" multiValued="true"/>
> > text_path is TextField with PathHierarchyTokenizerFactory as tokenizer.
> >
> > Given these two documents,
> > Doc1: red
> > Doc2: red/pink
> >
> > I want the result to be the following:
> > ?fq=red
> > ==> Doc1, Doc2
> >
> > ?fq=red/pink
> > ==> Doc2
> >
> > But, with PathHierarchyTokenizer, Doc1 is included for the query:
> > ?fq=red/pink
> > ==> Doc1, Doc2
> >
> > How can I query for hierarchical facets?
> > http://wiki.apache.org/solr/HierarchicalFaceting describes
> facet.prefix..
> > But it looks too cumbersome to me.
> >
> > Is there a simpler way to implement hierarchical facets?
>
>
>


minimum match and not matched words / term frequency in query result

2012-04-18 Thread giovanni.bricc...@banzai.it

Hi

I have a dismax query with a minimum match setting; this allows some
terms to be missing in query results.


I would like to give feedback to the user, highlighting the words that did
not match. It would also be interesting to show the words with a very low
frequency.


For instance, searching for "purple pendrive" I would highlight that the
results ignore the term "purple", because we don't have any.


Can you suggest how to approach the problem?

I was thinking about the debugQuery output, but since I will not get
details about all the results, I will probably miss something.


I am trying to write a new SearchComponent, but I don't know how to get
term frequency data from a ResponseBuilder object... I am new to
Solr/Lucene programming.
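A starting point, as a sketch: inside a SearchComponent's process() method,
the index is reachable through the searcher on the ResponseBuilder, and
document frequency tells you whether a term matched anything at all (the
field and term below are hypothetical):

import java.io.IOException;
import org.apache.lucene.index.Term;
import org.apache.solr.handler.component.ResponseBuilder;
import org.apache.solr.search.SolrIndexSearcher;

// Inside a custom SearchComponent subclass:
@Override
public void process(ResponseBuilder rb) throws IOException {
    SolrIndexSearcher searcher = rb.req.getSearcher();
    // df == 0 means the term occurs in no document at all.
    int df = searcher.docFreq(new Term("text", "purple"));
    rb.rsp.add("purpleDocFreq", df);
}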


Thanks a lot






Solr 3.6 parsing and extraction files

2012-04-18 Thread Tod
Could someone possibly provide me with a list of the jars I need to
extract from the apache-solr-3.6.0.tgz file to enable parsing and
remote streaming of office-style documents?  I assume (for a multicore
configuration) they would go into ./tomcat/webapps/solr/WEB-INF/lib -
correct?
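For reference, the extraction jars ship under contrib/extraction/lib in the
tgz, plus dist/apache-solr-cell-3.6.0.jar, and the related solrconfig.xml
pieces look like this (a sketch based on the stock example config; the field
mapping is illustrative):

<!-- inside <requestDispatcher> -->
<requestParsers enableRemoteStreaming="true" multipartUploadLimitInKB="2048" />

<requestHandler name="/update/extract"
                class="solr.extraction.ExtractingRequestHandler">
  <lst name="defaults">
    <str name="fmap.content">text</str>
  </lst>
</requestHandler>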



Thanks - Tod


Re: pushing updates to solr from postgresql

2012-04-18 Thread Otis Gospodnetic
Hi Richard,

One thing to think about here is what you will do when Solr is unavailable to
take a new document for whatever reason.  If you send docs to Solr from PG,
docs either get indexed or not, so you may have to catch errors and then mark
documents in PG as not indexed.  You may want to keep track of the initial
and/or last index attempt and the total number of indexing attempts (new DB
columns), and you will probably want to use DIH to "pick up" unindexed
documents from PG and get them indexed.

Also keep in mind that sending docs to Solr one by one will not be as efficient
as sending batches of them, or as efficient as getting a batch of them via DIH.
If your data volume is low this likely won't be a problem, but if it is high
or growing, you'll want to keep this in mind.

Otis

Performance Monitoring SaaS for Solr - 
http://sematext.com/spm/solr-performance-monitoring/index.html



>
> From: "Welty, Richard" 
>To: solr-user@lucene.apache.org 
>Sent: Wednesday, April 18, 2012 10:48 AM
>Subject: pushing updates to solr from postgresql
> 
>[...]

Re: Options for automagically Scaling Solr (without needing distributed index/replication) in a Hadoop environment

2012-04-18 Thread Jason Rutherglen
The main point being made is that established NoSQL solutions (e.g.,
Cassandra, HBase, et al.) have solved the update problem (among many
other scalability issues) for several years now.

If an update is being performed and it is not known where the record
exists, the update capability of the system is inefficient.  In
addition, in a production system, the mere possibility of losing data
or of inaccurate updates is usually a red flag.

On Wed, Apr 18, 2012 at 6:40 AM, Lukáš Vlček  wrote:
> AFAIK it can not. You can only add new shards by creating a new index and
> you will then need to index new data into that new index. Index aliases are
> useful mainly for searching part. So it means that you need to plan for
> this when you implement your indexing logic. On the other hand the query
> logic does not need to change as you only add new indices and give them all
> the same alias.
>
> I am not an expert on this but I think that index splitting and re-sharding
> can be expensive for [near] real-time search system and the point is that
> you can probably use different techniques to support your large scale
> needs. Index aliasing and routing in elasticsearch can help a lot in
> supporting various large scale data scenarios, check the following thread
> in ES ML for some examples:
> https://groups.google.com/forum/#!msg/elasticsearch/49q-_AgQCp8/MRol0t9asEcJ
>
> Just to sum it up, the fact that elasticsearch does have fixed number of
> shards per index and does not support resharding and index splitting does
> not mean you can not scale your data easily.
>
> (I was not following this whole thread in every detail. So may be you may
> have specific needs that can be solved only by splitting or resharding, in
> such case I would recommend you to ask on ES ML with further questions, I
> do not want to run into system X vs system Y flame here...)
>
> Regards,
> Lukas
>
> On Wed, Apr 18, 2012 at 2:22 PM, Jason Rutherglen <
> jason.rutherg...@gmail.com> wrote:
>
>> I'm curious how on the fly updates are handled as a new shard is added
>> to an alias.  Eg, how does the system know to which shard to send an
>> update?
>>
>> On Tue, Apr 17, 2012 at 4:00 PM, Lukáš Vlček 
>> wrote:
>> > Hi,
>> >
>> > speaking about ES I think it would be fair to mention that one has to
>> > specify number of shards upfront when the index is created - that is
>> > correct, however, it is possible to give index one or more aliases which
>> > basically means that you can add new indices on the fly and give them
>> same
>> > alias which is then used to search against. Given that you can add/remove
>> > indices, nodes and aliases on the fly I think there is a way how to
>> handle
>> > growing data set with ease. If anyone is interested such scenario has
>> been
>> > discussed in detail in ES mail list.
>> >
>> > Regards,
>> > Lukas
>> >
>> > On Tue, Apr 17, 2012 at 2:42 AM, Jason Rutherglen <
>> > jason.rutherg...@gmail.com> wrote:
>> >
>> >> One of big weaknesses of Solr Cloud (and ES?) is the lack of the
>> >> ability to redistribute shards across servers.  Meaning, as a single
>> >> shard grows too large, splitting the shard, while live updates.
>> >>
>> >> How do you plan on elastically adding more servers without this feature?
>> >>
>> >> Cassandra and HBase handle elasticity in their own ways.  Cassandra
>> >> has successfully implemented the Dynamo model and HBase uses the
>> >> traditional BigTable 'split'.  Both systems are complex though are at
>> >> a singular level of maturity.
>> >>
>> >> Also Cassandra [successfully] implements multiple data center support,
>> >> is that available in SC or ES?
>> >>
>> >> [...]

[Job] Search Engineer Lead at Sematext International

2012-04-18 Thread Otis Gospodnetic
Hello,

If you've always wanted a full-time job working with Solr, ElasticSearch, or
Lucene, we have a position that is all about that, offers a path to team
leadership, and will expose you to a healthy mixture of engineering and
business.  If you are interested, please send your resume to j...@sematext.com .

Otis

Sematext International is looking for a strong Search Engineer with interest 
and ability to interact with clients and with potential to build and lead local 
and/or remote development teams.  By “client-facing” we really mean primarily 
email, phone, Skype.


A person in this role needs to be able to:
* design large scale search systems
* have solid knowledge of either Solr or ElasticSearch or both
* efficiently troubleshoot performance, relevance, and other search-related 
issues
* speak and interact with clients


Pluses – beyond pure engineering:
* ability and desire to expand and lead a development/consulting teams
* ability to think both business and engineering
* ability to build products based on observed client needs
* ability to present in public, at meetups, conferences, etc
* ability to contribute to blog.sematext.com
* active participation in online search communities
* attention to detail
* desire to share knowledge and teach
* positive attitude, humor, agility


Location:
    * New York

Travel:
    * Minimal


Relevant pointers:
* http://sematext.com/about/jobs.html
* http://sematext.com/about/jobs.html#advantages
* http://sematext.com/engineering/index.html


solr stats component

2012-04-18 Thread Peter Markey
Hello,

I am using the stats component and I wanted help with a range-like function
(as in the facet component). To be more clear, we would like to have
functionality similar to facet.range (i.e., with gap and so on) for the
statistics component. That is, with one call we would like to do faceting in
the stats component and have it return facets only for a specified range,
broken down into several buckets (based on the gap). We know that this
functionality is not available in Solr, but wanted to see if there's any
other indirect way of doing it. Any thoughts would be highly appreciated.
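For reference, the facet.range behavior being asked about is driven by
parameters like these (the field name is hypothetical), and the stats
component has no equivalent of them:

&stats=true&stats.field=price
&facet=true&facet.range=price&facet.range.start=0&facet.range.end=100&facet.range.gap=10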

Thanks


Maximum Open Cursors using JdbcDataSource and cacheImpl

2012-04-18 Thread Keith Naas
After upgrading from 3.5.0 to 3.6.0, we have noticed that when we use a
cacheImpl on a nested JdbcDataSource entity, the database runs out of cursors.
It does not matter what transactionIsolation, autoCommit, or holdability
setting we use.  I have only been using Solr for a few months, but after looking
at EntityProcessorBase, DIHCacheSupport, and JdbcDataSource.ResultSetIterator,
it may be that the ResultSet or Statement is never closed.  In
EntityProcessorBase.getNext(), if there is no cacheSupport, it likely closes
the resources it was using immediately.  Whereas with caching, it might be
leaving them open, because the rowIterator is never set to null: since it has
a reference to the resultSet and stmt, it holds onto them and neither is ever
closed.

On a related note, there appear to be other possible leaks in
JdbcDataSource.ResultSetIterator. The close() method attempts to close both
the resultSet and the stmt; however, if it fails closing the resultSet, it
will not close the stmt.  They should probably be wrapped in separate
try/catch blocks.  It will also not close the stmt or resultSet if the
ResultSetIterator throws an exception in its constructor.  In my experience,
one cannot count on the closing of the connection to clean up those resources
consistently.
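A sketch of the separated close logic being suggested (not the actual patch;
LOG stands in for whatever logger the class uses):

import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;

// Close the ResultSet and Statement independently, so a failure
// closing one does not leak the other.
static void closeQuietly(ResultSet rs, Statement stmt) {
    try {
        if (rs != null) rs.close();
    } catch (SQLException e) {
        LOG.warn("Could not close ResultSet", e);
    }
    try {
        if (stmt != null) stmt.close();
    } catch (SQLException e) {
        LOG.warn("Could not close Statement", e);
    }
}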

  2012-04-18 12:02:22,017 ERROR 
[org.apache.solr.handler.dataimport.DataImporter] Full Import 
failed:java.lang.RuntimeException: java.lang.RuntimeException: 
org.apache.solr.handler.dataimport.DataImportHandlerException: Unable to 
execute query: select distinct DISPLAY_NAME from dimension where 
dimension.DIMENSION_ID = 'M' Processing Document # 11
at 
org.apache.solr.handler.dataimport.DocBuilder.execute(DocBuilder.java:264)
at 
org.apache.solr.handler.dataimport.DataImporter.doFullImport(DataImporter.java:375)
at 
org.apache.solr.handler.dataimport.DataImporter.runCmd(DataImporter.java:445)
at 
org.apache.solr.handler.dataimport.DataImporter$1.run(DataImporter.java:426)
Caused by: java.lang.RuntimeException: 
org.apache.solr.handler.dataimport.DataImportHandlerException: Unable to 
execute query: select distinct DISPLAY_NAME from dimension where 
dimension.DIMENSION_ID = 'M' Processing Document # 11
at 
org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:621)
at 
org.apache.solr.handler.dataimport.DocBuilder.doFullDump(DocBuilder.java:327)
at 
org.apache.solr.handler.dataimport.DocBuilder.execute(DocBuilder.java:225)
... 3 more
Caused by: 
org.apache.solr.handler.dataimport.DataImportHandlerException: Unable to 
execute query: select distinct DISPLAY_NAME from dimension where 
dimension.DIMENSION_ID = 'M' Processing Document # 11
at 
org.apache.solr.handler.dataimport.DataImportHandlerException.wrapAndThrow(DataImportHandlerException.java:72)
at 
org.apache.solr.handler.dataimport.JdbcDataSource$ResultSetIterator.&lt;init&gt;(JdbcDataSource.java:253)
at 
org.apache.solr.handler.dataimport.JdbcDataSource.getData(JdbcDataSource.java:210)
at 
org.apache.solr.handler.dataimport.JdbcDataSource.getData(JdbcDataSource.java:39)
at 
org.apache.solr.handler.dataimport.SqlEntityProcessor.initQuery(SqlEntityProcessor.java:59)
at 
org.apache.solr.handler.dataimport.SqlEntityProcessor.nextRow(SqlEntityProcessor.java:73)
at 
org.apache.solr.handler.dataimport.EntityProcessorWrapper.pullRow(EntityProcessorWrapper.java:330)
at 
org.apache.solr.handler.dataimport.EntityProcessorWrapper.nextRow(EntityProcessorWrapper.java:296)
at 
org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:683)
at 
org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:709)
at 
org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:709)
at 
org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:619)
... 5 more
Caused by: java.sql.SQLException: ORA-01000: maximum open cursors 
exceeded

at 
oracle.jdbc.driver.DatabaseError.throwSqlException(DatabaseError.java:112)
at oracle.jdbc.driver.T4CTTIoer.processError(T4CTTIoer.java:331)
at oracle.jdbc.driver.T4CTTIoer.processError(T4CTTIoer.java:288)
at oracle.jdbc.driver.T4C8Oall.receive(T4C8Oall.java:745)
at oracle.jdbc.driver.T4CStatement.doOall8(T4CStatement.java:210)
at 
oracle.jdbc.driver.T4CStatement.executeForDescribe(T4CStatement.java:804)
at 
oracle.jdbc.driver.OracleStatement.executeMaybeDescribe(OracleStatement.java:1049)
at 
oracle.jdbc.driver.T4CStatement.executeMaybeDescribe(T4CStatement.java:845)
at 
oracle.jdbc.driver.OracleStatement.doExecuteWithTimeout(OracleStatement.java:1146)
at 
oracle.jdbc.driver.OracleStatement.executeInterna

RE: Maximum Open Cursors using JdbcDataSource and cacheImpl

2012-04-18 Thread Dyer, James
Keith,

Can you supply your data-config.xml ?

James Dyer
E-Commerce Systems
Ingram Content Group
(615) 213-4311


-Original Message-
From: Keith Naas [mailto:keithn...@dswinc.com] 
Sent: Wednesday, April 18, 2012 11:43 AM
To: solr-user@lucene.apache.org
Subject: Maximum Open Cursors using JdbcDataSource and cacheImpl

[...]

Re: SOLR 4 / Date Query: Spurious Results: Is it me or ... ?

2012-04-18 Thread vybe3142
Thanks for clarifying.

I figured out the (terms=-1). It was my fault. I attempted to truncate the
index in my test case setup by issuing a delete query, and I think the
commit might not have taken effect by the time the subsequent
index queries started.

--
View this message in context: 
http://lucene.472066.n3.nabble.com/SOLR-4-Date-Query-Spurious-Results-Is-it-me-or-tp3918636p3920652.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: hierarchical faceting?

2012-04-18 Thread sam ”
It looks like TextField is the problem.

This fixed:

<fieldType name="text_path" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.PathHierarchyTokenizerFactory" delimiter="/"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
  </analyzer>
</fieldType>


I am assuming the text_path fields won't include whitespace characters.

?q=colors:red/pink
==> Doc2   (Doc1, which has colors = red isn't included!)


Is there a tokenizer that tokenizes the string as one token?
I tried to extend Tokenizer myself, but it fails:
public class AsIsTokenizer extends Tokenizer {
@Override
public boolean incrementToken() throws IOException {
return true;//or false;
}
}
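
(For the record, solr.KeywordTokenizerFactory emits the entire input as a single
token, which is what's being asked for here. A rough equivalent on the Lucene 3.x
attribute API might look like the sketch below; the class name and details are
illustrative only, and in Solr you would still need a TokenizerFactory around it,
so using KeywordTokenizerFactory directly is the simpler route.)

import java.io.IOException;
import java.io.Reader;

import org.apache.lucene.analysis.Tokenizer;
import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;

// Sketch: emit the whole input as exactly one token.
public final class WholeStringTokenizer extends Tokenizer {
    private final CharTermAttribute termAtt = addAttribute(CharTermAttribute.class);
    private boolean done = false;

    public WholeStringTokenizer(Reader input) {
        super(input);
    }

    @Override
    public boolean incrementToken() throws IOException {
        if (done) {
            return false; // only one token per stream
        }
        clearAttributes();
        char[] buffer = new char[256];
        int length;
        while ((length = input.read(buffer)) > 0) {
            termAtt.append(new String(buffer, 0, length));
        }
        done = true;
        return termAtt.length() > 0;
    }

    @Override
    public void reset() throws IOException {
        super.reset();
        done = false;
    }
}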


On Wed, Apr 18, 2012 at 11:33 AM, sam ”  wrote:

> Yah, that's exactly what PathHierarchyTokenizer does.
> <fieldType name="text_path" class="solr.TextField" positionIncrementGap="100">
>   <analyzer>
>     <tokenizer class="solr.PathHierarchyTokenizerFactory" delimiter="/"/>
>   </analyzer>
> </fieldType>
>
> I think I have a query time tokenizer that tokenizes at /
>
> ?q=colors:red
> ==> Doc1, Doc2
>
> ?q=colors:redfoobar
> ==>
>
> ?q=colors:red/foobarasdfoaijao
> ==> Doc1, Doc2
>
>
>
>
> On Wed, Apr 18, 2012 at 11:10 AM, Darren Govoni wrote:
>
>> Put the parent term in all the child documents at index time
>> and then re-issue the facet query when you expand the parent using the
>> parent's term. Works perfectly.
>>
>> On Wed, 2012-04-18 at 10:56 -0400, sam ” wrote:
>> > I have hierarchical colors:
>> > <field name="colors" type="text_path" indexed="true" stored="true" multiValued="true"/>
>> > text_path is TextField with PathHierarchyTokenizerFactory as tokenizer.
>> >
>> > Given these two documents,
>> > Doc1: red
>> > Doc2: red/pink
>> >
>> > I want the result to be the following:
>> > ?fq=red
>> > ==> Doc1, Doc2
>> >
>> > ?fq=red/pink
>> > ==> Doc2
>> >
>> > But, with PathHierarchyTokenizer, Doc1 is included for the query:
>> > ?fq=red/pink
>> > ==> Doc1, Doc2
>> >
>> > How can I query for hierarchical facets?
>> > http://wiki.apache.org/solr/HierarchicalFaceting describes
>> facet.prefix..
>> > But it looks too cumbersome to me.
>> >
>> > Is there a simpler way to implement hierarchical facets?
>>
>>
>>
>


Can you suggest a method or pattern to consistently promote a document with any query?

2012-04-18 Thread Chris Warner
Hi, folks,

Perhaps I'm overlooking an obvious solution to a common desire... I'd like to 
return a specific document with every query, as the first result. As well, I'd 
like to have that document be the first result in a *:* query.

I'm looking into index time boosting using the boost attribute on the 
appropriate doc. I haven't tested this yet, and I'm not sure this would do 
anything for the *:* queries.

Thanks for any suggested reading or patterns...

Best,
Chris

 
--
chris_war...@yahoo.com


Re: Can you suggest a method or pattern to consistently promote a document with any query?

2012-04-18 Thread Jeevanandam Madanagopal
Chris -

Take a look - QueryElevationComponent

http://wiki.apache.org/solr/QueryElevationComponent

-Jeevanandam

On Apr 18, 2012, at 10:46 PM, Chris Warner wrote:

> Hi, folks,
> 
> Perhaps I'm overlooking an obvious solution to a common desire... I'd like to 
> return a specific document with every query, as the first result. As well, 
> I'd like to have that document be the first result in a *:* query.
> 
> I'm looking into index time boosting using the boost attribute on the 
> appropriate doc. I haven't tested this yet, and I'm not sure this would do 
> anything for the *:* queries.
> 
> Thanks for any suggested reading or patterns...
> 
> Best,
> Chris
> 
>  
> --
> chris_war...@yahoo.com



Re: Can you suggest a method or pattern to consistently promote a document with any query?

2012-04-18 Thread Otis Gospodnetic
Chris,

I haven't checked if Elevate Component has an easy way to push a specific doc 
for *all* queries, but have a 
look http://wiki.apache.org/solr/QueryElevationComponent

Otis 

Performance Monitoring SaaS for Solr - 
http://sematext.com/spm/solr-performance-monitoring/index.html



- Original Message -
> From: Chris Warner 
> To: "solr-user@lucene.apache.org" 
> Cc: 
> Sent: Wednesday, April 18, 2012 1:16 PM
> Subject: Can you suggest a method or pattern to consistently promote a 
> document with any query?
> 
> Hi, folks,
> 
> Perhaps I'm overlooking an obvious solution to a common desire... I'd 
> like to return a specific document with every query, as the first result. As 
> well, I'd like to have that document be the first result in a *:* query.
> 
> I'm looking into index time boosting using the boost attribute on the 
> appropriate doc. I haven't tested this yet, and I'm not sure this would 
> do anything for the *:* queries.
> 
> Thanks for any suggested reading or patterns...
> 
> Best,
> Chris
> 
>  
> --
> chris_war...@yahoo.com
>


Re: Can you suggest a method or pattern to consistently promote a document with any query?

2012-04-18 Thread Chris Warner
Thanks, Jeevanandam and Otis,

I'll take another look at Elevate. My first attempts did not yield success, as 
I was not able to find a way to elevate a document with a *:* query. Perhaps 
I'll try a * query to see what happens.

Cheers,
Chris

 


- Original Message -
From: Jeevanandam Madanagopal 
To: solr-user@lucene.apache.org; Chris Warner 
Cc: 
Sent: Wednesday, April 18, 2012 10:21 AM
Subject: Re: Can you suggest a method or pattern to consistently promote a 
document with any query?

Chris -

Take a look - QueryElevationComponent

http://wiki.apache.org/solr/QueryElevationComponent

-Jeevanandam

On Apr 18, 2012, at 10:46 PM, Chris Warner wrote:

> Hi, folks,
> 
> Perhaps I'm overlooking an obvious solution to a common desire... I'd like to 
> return a specific document with every query, as the first result. As well, 
> I'd like to have that document be the first result in a *:* query.
> 
> I'm looking into index time boosting using the boost attribute on the 
> appropriate doc. I haven't tested this yet, and I'm not sure this would do 
> anything for the *:* queries.
> 
> Thanks for any suggested reading or patterns...
> 
> Best,
> Chris
> 
>  
> --


Re: Can you suggest a method or pattern to consistently promote a document with any query?

2012-04-18 Thread Walter Underwood
That is not a useful test. Users don't look for *:*.

Test with real queries.

wunder

On Apr 18, 2012, at 10:27 AM, Chris Warner wrote:

> Thanks, Jeevanandam and Otis,
> 
> I'll take another look at Elevate. My first attempts did not yield success, 
> as I was not able to find a way to elevate a document with a *:* query. 
> Perhaps I'll try a * query to see what happens.
> 
> Cheers,
> Chris
> 
>  
> 
> 
> - Original Message -
> From: Jeevanandam Madanagopal 
> To: solr-user@lucene.apache.org; Chris Warner 
> Cc: 
> Sent: Wednesday, April 18, 2012 10:21 AM
> Subject: Re: Can you suggest a method or pattern to consistently promote a 
> document with any query?
> 
> Chris -
> 
> Take a look - QueryElevationComponent
> 
> http://wiki.apache.org/solr/QueryElevationComponent
> 
> -Jeevanandam
> 
> On Apr 18, 2012, at 10:46 PM, Chris Warner wrote:
> 
>> Hi, folks,
>> 
>> Perhaps I'm overlooking an obvious solution to a common desire... I'd like 
>> to return a specific document with every query, as the first result. As 
>> well, I'd like to have that document be the first result in a *:* query.
>> 
>> I'm looking into index time boosting using the boost attribute on the 
>> appropriate doc. I haven't tested this yet, and I'm not sure this would do 
>> anything for the *:* queries.
>> 
>> Thanks for any suggested reading or patterns...
>> 
>> Best,
>> Chris
>> 
>>   
>> --

--
Walter Underwood
wun...@wunderwood.org





Re: Can you suggest a method or pattern to consistently promote a document with any query?

2012-04-18 Thread Chris Warner
Browsing all documents and all facets, skipper.


Cheers,
Chris

 

- Original Message -
From: Walter Underwood 
To: solr-user@lucene.apache.org
Cc: 
Sent: Wednesday, April 18, 2012 10:29 AM
Subject: Re: Can you suggest a method or pattern to consistently promote a 
document with any query?

That is not a useful test. Users don't look for *:*.

Test with real queries.

wunder

On Apr 18, 2012, at 10:27 AM, Chris Warner wrote:

> Thanks, Jeevanandam and Otis,
> 
> I'll take another look at Elevate. My first attempts did not yield success, 
> as I was not able to find a way to elevate a document with a *:* query. 
> Perhaps I'll try a * query to see what happens.
> 
> Cheers,
> Chris
> 
>  
> 
> 
> - Original Message -
> From: Jeevanandam Madanagopal 
> To: solr-user@lucene.apache.org; Chris Warner 
> Cc: 
> Sent: Wednesday, April 18, 2012 10:21 AM
> Subject: Re: Can you suggest a method or pattern to consistently promote a 
> document with any query?
> 
> Chris -
> 
> Take a look - QueryElevationComponent
> 
> http://wiki.apache.org/solr/QueryElevationComponent
> 
> -Jeevanandam
> 
> On Apr 18, 2012, at 10:46 PM, Chris Warner wrote:
> 
>> Hi, folks,
>> 
>> Perhaps I'm overlooking an obvious solution to a common desire... I'd like 
>> to return a specific document with every query, as the first result. As 
>> well, I'd like to have that document be the first result in a *:* query.
>> 
>> I'm looking into index time boosting using the boost attribute on the 
>> appropriate doc. I haven't tested this yet, and I'm not sure this would do 
>> anything for the *:* queries.
>> 
>> Thanks for any suggested reading or patterns...
>> 
>> Best,
>> Chris
>> 
>>  
>> --

--
Walter Underwood
wun...@wunderwood.org


Re: hierarchical faceting?

2012-04-18 Thread Darren Govoni
I don't use any of that stuff in my app, so not sure how it works.

I just manage my taxonomy outside of solr at index time and don't need
any special fields or tokenizers. I use a string field type and insert
the proper field at index time and query it normally. Nothing special
required.

On Wed, 2012-04-18 at 13:00 -0400, sam ” wrote:
> It looks like TextField is the problem.
> 
> This fixed:
> <fieldType name="text_path" class="solr.TextField" positionIncrementGap="100">
>   <analyzer type="index">
>     <tokenizer class="solr.PathHierarchyTokenizerFactory" delimiter="/"/>
>   </analyzer>
>   <analyzer type="query">
>     <tokenizer class="solr.WhitespaceTokenizerFactory"/>
>   </analyzer>
> </fieldType>
> 
> I am assuming the text_path fields won't include whitespace characters.
> 
> ?q=colors:red/pink
> ==> Doc2   (Doc1, which has colors = red isn't included!)
> 
> 
> Is there a tokenizer that tokenizes the string as one token?
> I tried to extend Tokenizer myself  but it fails:
> public class AsIsTokenizer extends Tokenizer {
> @Override
> public boolean incrementToken() throws IOException {
> return true;//or false;
> }
> }
> 
> 
> On Wed, Apr 18, 2012 at 11:33 AM, sam ”  wrote:
> 
> > Yah, that's exactly what PathHierarchyTokenizer does.
> > <fieldType name="text_path" class="solr.TextField" positionIncrementGap="100">
> >   <analyzer>
> >     <tokenizer class="solr.PathHierarchyTokenizerFactory" delimiter="/"/>
> >   </analyzer>
> > </fieldType>
> >
> > I think I have a query time tokenizer that tokenizes at /
> >
> > ?q=colors:red
> > ==> Doc1, Doc2
> >
> > ?q=colors:redfoobar
> > ==>
> >
> > ?q=colors:red/foobarasdfoaijao
> > ==> Doc1, Doc2
> >
> >
> >
> >
> > On Wed, Apr 18, 2012 at 11:10 AM, Darren Govoni wrote:
> >
> >> Put the parent term in all the child documents at index time
> >> and then re-issue the facet query when you expand the parent using the
> >> parent's term. Works perfectly.
> >>
> >> On Wed, 2012-04-18 at 10:56 -0400, sam ” wrote:
> >> > I have hierarchical colors:
> >> > <field name="colors" type="text_path" indexed="true" stored="true" multiValued="true"/>
> >> > text_path is TextField with PathHierarchyTokenizerFactory as tokenizer.
> >> >
> >> > Given these two documents,
> >> > Doc1: red
> >> > Doc2: red/pink
> >> >
> >> > I want the result to be the following:
> >> > ?fq=red
> >> > ==> Doc1, Doc2
> >> >
> >> > ?fq=red/pink
> >> > ==> Doc2
> >> >
> >> > But, with PathHierarchyTokenizer, Doc1 is included for the query:
> >> > ?fq=red/pink
> >> > ==> Doc1, Doc2
> >> >
> >> > How can I query for hierarchical facets?
> >> > http://wiki.apache.org/solr/HierarchicalFaceting describes
> >> facet.prefix..
> >> > But it looks too cumbersome to me.
> >> >
> >> > Is there a simpler way to implement hierarchical facets?
> >>
> >>
> >>
> >




Re: Can you suggest a method or pattern to consistently promote a document with any query?

2012-04-18 Thread Chris Warner
Thanks to those who responded. A more thorough reading of the wiki shows 
the need for forceElevation=true in the elevate query.
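
For the archives, a sketch of the moving parts (the query text and doc id here 
are made up): an entry in conf/elevate.xml such as

<elevate>
  <query text="voicemail">
    <doc id="promo-1" />
  </query>
</elevate>

plus request parameters like &enableElevation=true&forceElevation=true, so the 
elevated document stays on top even when a sort is specified.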

Cheers,
Chris


- Original Message -
From: Otis Gospodnetic 
To: "solr-user@lucene.apache.org" ; Chris Warner 

Cc: 
Sent: Wednesday, April 18, 2012 10:23 AM
Subject: Re: Can you suggest a method or pattern to consistently promote a 
document with any query?

Chris,

I haven't checked if Elevate Component has an easy way to push a specific doc 
for *all* queries, but have a 
look http://wiki.apache.org/solr/QueryElevationComponent

Otis 

Performance Monitoring SaaS for Solr - 
http://sematext.com/spm/solr-performance-monitoring/index.html



- Original Message -
> From: Chris Warner 
> To: "solr-user@lucene.apache.org" 
> Cc: 
> Sent: Wednesday, April 18, 2012 1:16 PM
> Subject: Can you suggest a method or pattern to consistently promote a 
> document with any query?
> 
> Hi, folks,
> 
> Perhaps I'm overlooking an obvious solution to a common desire... I'd 
> like to return a specific document with every query, as the first result. As 
> well, I'd like to have that document be the first result in a *:* query.
> 
> I'm looking into index time boosting using the boost attribute on the 
> appropriate doc. I haven't tested this yet, and I'm not sure this would 
> do anything for the *:* queries.
> 
> Thanks for any suggested reading or patterns...
> 
> Best,
> Chris
> 
>  
> --
> chris_war...@yahoo.com
>



Date granularity

2012-04-18 Thread vybe3142
A query search on a particular date:

returns 1valid result (as expected).

How can I alter the granularity of the search for example , to all matches
on the particular DAY?

Reading through various docs, I attempt to append "/DAY" but this doesn't
seem to work (in fact I get 0 results back when querying).

What am I neglecting? 
Thanks



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Date-granularity-tp3920890p3920890.html
Sent from the Solr - User mailing list archive at Nabble.com.



Re: hierarchical faceting?

2012-04-18 Thread Charlie Maroto
 The PathHierarchyTokenizerFactory is intended for file paths, and therefore
assumes that all documents should be indexed with all of the paths to their
parent folders; you are trying to use it for a taxonomy, so you can't
simply use the PathHierarchyTokenizerFactory as-is.   Use the analysis page (
http://localhost:8983/solr/admin/analysis.jsp) so that you can see what's
happening with the content both at index and query time.

Field  (Type)  text_path
Field value (Index)  red/pink
Field value (Query) red/pink
You'd notice that the result of both is identical, therefore explaining why
both documents are retrieved:

Index Analyzer:
   red
   red/pink
Query Analyzer:
   red
   red/pink

 Carlos

-Original Message-
From: Darren Govoni [mailto:dar...@ontrenet.com]
Sent: Wednesday, April 18, 2012 8:10 AM
To: solr-user@lucene.apache.org
Subject: Re: hierarchical faceting?



Put the parent term in all the child documents at index time and then
re-issue the facet query when you expand the parent using the parent's
term. Works perfectly.

On Wed, 2012-04-18 at 10:56 -0400, sam ” wrote:

> I have hierarchical colors:
> <field name="colors" type="text_path" indexed="true" stored="true" multiValued="true"/>
> text_path is TextField with PathHierarchyTokenizerFactory as tokenizer.
>
> Given these two documents,
> Doc1: red
> Doc2: red/pink
>
> I want the result to be the following:
> ?fq=red
> ==> Doc1, Doc2
>
> ?fq=red/pink
> ==> Doc2
>
> But, with PathHierarchyTokenizer, Doc1 is included for the query:
> ?fq=red/pink
> ==> Doc1, Doc2
>
> How can I query for hierarchical facets?
> http://wiki.apache.org/solr/HierarchicalFaceting describes facet.prefix..
> But it looks too cumbersome to me.
>
> Is there a simpler way to implement hierarchical facets?


Suggester

2012-04-18 Thread John
Using Solr 3.6, I am trying to get suggestions for phrases.
I managed getting prefixed suggestions, but not suggestions for middle of
phrase.
Can this be achieved with built in Solr suggest, or do I need to create a
special core for this purpose?

Thanks in advance.


Re: Can you suggest a method or pattern to consistently promote a document with any query?

2012-04-18 Thread Jeevanandam Madanagopal
Chris -

If you have defined 'last-components' in the search handler, forceElevation=true 
may not be required.  It gets invoked in the search life cycle:

<arr name="last-components">
  <str>elevator</str>
</arr>

-Jeevanandam


On Apr 18, 2012, at 11:37 PM, Chris Warner wrote:

> Thanks to those who responded. A more thorough reading of the wiki and I see 
> the need for forceElevation=true in the elevate query.
> 
> Cheers,
> Chris
> 
> 
> - Original Message -
> From: Otis Gospodnetic 
> To: "solr-user@lucene.apache.org" ; Chris Warner 
> 
> Cc: 
> Sent: Wednesday, April 18, 2012 10:23 AM
> Subject: Re: Can you suggest a method or pattern to consistently promote a 
> document with any query?
> 
> Chris,
> 
> I haven't checked if Elevate Component has an easy way to push a specific doc 
> for *all* queries, but have a look 
> http://wiki.apache.org/solr/QueryElevationComponent
> 
> Otis 
> 
> Performance Monitoring SaaS for Solr - 
> http://sematext.com/spm/solr-performance-monitoring/index.html
> 
> 
> 
> - Original Message -
>> From: Chris Warner 
>> To: "solr-user@lucene.apache.org" 
>> Cc: 
>> Sent: Wednesday, April 18, 2012 1:16 PM
>> Subject: Can you suggest a method or pattern to consistently promote a 
>> document with any query?
>> 
>> Hi, folks,
>> 
>> Perhaps I'm overlooking an obvious solution to a common desire... I'd 
>> like to return a specific document with every query, as the first result. As 
>> well, I'd like to have that document be the first result in a *:* query.
>> 
>> I'm looking into index time boosting using the boost attribute on the 
>> appropriate doc. I haven't tested this yet, and I'm not sure this would 
>> do anything for the *:* queries.
>> 
>> Thanks for any suggested reading or patterns...
>> 
>> Best,
>> Chris
>> 
>>  
>> --
>> chris_war...@yahoo.com
>> 
> 



Re: Solr file size limit?

2012-04-18 Thread Shawn Heisey

On 4/18/2012 6:17 AM, Bram Rongen wrote:

I'm using Solr 3.5 on a dedicated Ubuntu 10.04 box with 3TB of diskspace
and 16GB of memory. I've tried using the sun JRE and OpenJDK, both
resulting in the same problem. Indexing works great until my .fdt file
reaches the size of 4.9GB/ 5217987319b. At this point when Solr starts
merging it just keeps on merging, starting over and over.. Java is using
all the available memory even though Xmx is set at 8G. When I restart Solr
everything looks fine until merging is triggered. Whenever it hangs the
server load averages 3, searching is possible but slow, the solr admin
interface is reachable but sending new documents leads to a time-out.


Solr 3.5 works a little differently than previous versions (MMAPs all 
the index files), so if you look at the memory usage as reported by the 
OS, it's going to look all wrong.  I've got my max heap set to 8192M, 
but this is what top looks like:


Mem:  64937704k total, 58876376k used,  6061328k free,   379400k buffers
Swap:  8388600k total,    77844k used,  8310756k free, 47080172k cached

  PID USER  PR  NI  VIRT  RES  SHR S %CPU %MEMTIME+  COMMAND
22798 ncindex   20   0 75.6g  21g  12g S  1.0 34.3  14312:55 java

If you add up the 47GB it says it's using for the disk cache, the 6GB 
that it says is free, and the 21GB it says that Java has resident, you 
end up with considerably more than the 64GB total RAM the machine has, 
even if you include the 77MB of swap that's used.  You can use the jstat 
command to get a better idea of how much RAM java really is using:


jstat -gc -t <pid> 5000

Add up the S0C, S1C, EC, OC, and PC columns.  The alignment is often 
wrong on this output, so you'll have to count the columns.  If I do this 
for my system, I end up with 8462972 KB.  Alternatively, if you have a 
GUI installed on the server or you have set up remote JMX, you can use 
JConsole to very easily get a correct number.


The extra memory reported by the OS is not really being used, it is a 
side effect of the memory mapping used by the Lucene indexes.



I've tried using several different settings for MergePolicy and started
reindexing a couple of times but the behavior stays the same. My current
solrconf.xml can be found here: http://pastebin.com/NXDT0B8f. I'm unable to
find errors in the log which makes it really difficult to debug.. Could
anyone point me in the right direction?


A MergeFactor of 4 is extremely low and will result in very frequent 
merging.  The default is 10.  I use a value of 36, but that is unusually 
high.


Looking at one of my indexes on that machine, the largest fdt file is 
7657412 KB, the other three are tiny - 9880, 12160, and 28 KB.  That 
index was recently optimized.  The total index size is over 20GB.  I 
have three indexes that size running in different cores on that 
machine.  You're definitely not running into any limits as far as Solr 
is concerned.


You might be running into I/O issues.  Are you relying on autocommit, or 
explicitly committing your updates and waiting for the commit to finish 
before doing more updates?  When there is segment merging, commits can 
take a really long time.  If you are using autocommit or not waiting for 
manual commits to finish, it might get bad enough that one commit has 
not yet finished when another is ready to take place.  I don't know what 
this would actually do, but it would not be a good situation.


How have you created your 3TB of disk space?  If you are using RAID5 or 
RAID6, you can run into very serious and unavoidable performance 
problems with writes.  If it is a single disk, it may not provide enough 
IOPS for good performance.  My servers also have 3TB of disk space, 
using six 1TB SATA drives in RAID10.  The worst-case scenario for your 
merges is equivalent to an optimize.  An optimize of one of my 20GB 
indexes takes 15 minutes even on RAID10, so I only optimize one large 
index per day, which means each large index gets optimized every six days.


I hope this helps, but I'll be happy to try and offer more, within my 
skill set.


Thanks,
Shawn



Difference between Search result from Admin console and solr/browse

2012-04-18 Thread srini
I have imported my xml documents from an Oracle database and indexed them. When
I search *:* in the *admin console* I do get results. My xml format is not close
to what Solr expects, but still, when I search for any word that is part of
my xml document, Solr displays the whole xml document. For example, if I search
for the word "voicemail", Solr displays the xml documents that contain the word
"voicemail".

Now when I go to solr/browse and give *:* I do see something, but each
result is like below (no data), even if I search for the same word "voicemail".
Can somebody please advise!

Price:
Features:
In Stock

There are only two things I can think of; one is the settings in
solrconfig.xml (like below).

<requestHandler name="/browse" class="solr.SearchHandler">
  <lst name="defaults">
    <str name="echoParams">explicit</str>

    <!-- VelocityResponseWriter settings -->
    <str name="wt">velocity</str>

    <str name="v.template">browse</str>
    <str name="v.layout">layout</str>
    <str name="title">Solritas</str>

    <str name="df">text</str>
    <str name="defType">edismax</str>
    <str name="q.alt">*:*</str>
    <str name="rows">10</str>
    <str name="fl">*,score</str>
    <str name="qf">
      text^0.5 features^1.0 name^1.2 sku^1.5 id^10.0 manu^1.1 cat^1.4
    </str>
    <str name="mlt.fl">text,features,name,sku,id,manu,cat</str>
    <int name="mlt.count">3</int>
    <str name="mlt.qf">
      text^0.5 features^1.0 name^1.2 sku^1.5 id^10.0 manu^1.1 cat^1.4
    </str>
  </lst>
</requestHandler>

--
View this message in context: 
http://lucene.472066.n3.nabble.com/Difference-between-Search-result-from-Admin-console-and-solr-browse-tp3921323p3921323.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Solr file size limit?

2012-04-18 Thread Shawn Heisey

On 4/18/2012 6:17 AM, Bram Rongen wrote:

I've been using Solr for a very short time now and I'm stuck. I'm trying to
index a drupal website consisting of 1.2 million smaller nodes and 300k
larger nodes (~400kb avg)..


A followup to my previous reply: Your ramBufferSizeMB is only 32, the 
default in the example config.  I have seen recommendations indicating 
that going beyond 128MB is not usually helpful.  With such large input 
documents, that may not apply to you - try setting it to 512 or 1024.  
That will result in far fewer index segments being created.  They will 
be larger, so merges will be much less frequent but take longer.
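
For example, in solrconfig.xml (the 3.x example config keeps this under 
indexDefaults):

<indexDefaults>
  <!-- buffer more documents in RAM before flushing a segment to disk -->
  <ramBufferSizeMB>512</ramBufferSizeMB>
</indexDefaults>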


Thanks,
Shawn



Re: Populating a filter cache by means other than a query

2012-04-18 Thread Chris Collins
Great question. 

The set could be in the millions.  I oversimplified the use case somewhat to 
protect the innocent :-}.  If a user is querying a large set of documents (for 
the sake of argument let's say it's high tens of millions, but it could be in the 
small billions), they want to potentially mark a result set or a subset of those 
docs with a label/tag and use that label/tag later. Now let's throw in that it's a 
multi-tenant system and we don't want to keep re-indexing documents to add these 
tags.  Really what I would want to do is execute a query filtering by this 
labeled set: the server fetches the labeled set out of local cache, or over the 
wire, or off disk, and then incorporates it by one means or another as a filter 
(docset, or hashtable in the hit collector).

Personally I think the dictionary approach wouldn't be a good one.  It may 
produce the most optimal filter mechanism but will cost a bunch to construct 
the OpenBitSet.

In a prior company I built a more generic version of this, for not only 
filtering but for sorting, aggregate stats, etc.   We didn't use Solr.   I was 
curious whether there was any methodology for plugging in such a scheme without 
taking a branch of solr and hacking at it.  That was a multi-tenant system 
where we were producing aggregate graphs, filtering and ranking by things such 
as entity-level sentiment, so we produced a rather generic solution that, as 
you pointed out, perhaps reinvented some things that smell similar.  It was 
about 7B docs and was multi-tenant.  Users were able to override these 
"features" on a document level, which was necessary so their counts, sorts, etc. 
worked correctly.  Seeing how long it took me to build and debug it, if I can 
take something close off the shelf... well, you know the rest of the story :-}

C


On Apr 18, 2012, at 4:38 AM, Erick Erickson wrote:

> I guess my question is "what advantage are you trying
> to get here?"
> 
> At the start, this feels like an "XY" problem. How are
> you intending to use the fq after you've built it? Because
> if there's any way to just create an "fq" clause, Solr
> will take care of it for you. Caching it, autowarming
> it when searchers are re-opened, etc. Otherwise, you're
> going to be re-inventing a bunch of stuff it seems to me,
> you'll have to intercept the queries coming in in order
> to apply the filter from the cache, etc.
> 
> Which also may be another way of asking "How big
> is this set of document IDs?" If it's in the 100s, I'd
> just go with an fq. If it's more than that, I'd index
> some kind of set identifier that you could create for
> your fqs.
> 
> And if this is gibberish, ignore me ..
> 
> Best
> Erick
> 
> On Tue, Apr 17, 2012 at 4:34 PM, Chris Collins  wrote:
>> Hi, I am a long time Lucene user but new to solr.  I would like to use 
>> something like the filterCache but build a such a cache not from a query but 
>> custom code.  I guess I will ask my question by using techniques and vocab I 
>> am familiar with.  Not sure its actually the right way so I appologize if 
>> its just the wrong approach.
>> 
>> The scenario is that I would like to filter a result set by a set of labeled 
>> documents, I will call that set L.
>> L contains app specific document IDs that are indexed as literals in the 
>> lucenefield "myid".
>> I would imagine I could build a OpenBitSet from enumerating the termdocs and 
>> look for the intersecting ids in my label set.
>> Now I have my bitset that I assume I could use in a filter.
>> 
>> Another approach would be to implement a hits collector, compute a 
>> fieldcache from that myid field and look for the intersection in a hashtable 
>> of L at scoring time, throwing out results that are not contained in the 
>> hashtable.
>> 
>> Of course I am working within the confines / concepts that SOLR has layed 
>> out.  Without going completely off the reservation is their a neat way of 
>> doing such a thing with SOLR?
>> 
>> Glad to clarify if my question makes absolutely no sense.
>> 
>> Best
>> 
>> C
> 



RE: Changing precisionStep without a re-index

2012-04-18 Thread Michael Ryan
In case anyone tries to do this... If you facet on a TrieField and change the 
precisionStep to 0, you'll need to re-index. Changing precisionStep to 0 
changes the prefix returned by TrieField.getMainValuePrefix(FieldType), which 
then causes facets with a value of "0" to be returned.

-Michael


Re: Date granularity

2012-04-18 Thread Peter Markey
you could use a filter query like: fq=datefield:[NOW/DAY-1DAY TO
NOW/DAY+1DAY]

*replace datefield with your field that contains the time info
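
And if the target is a specific day rather than today, the same /DAY rounding 
works on an explicit timestamp, e.g. (field name and date are just examples):

fq=datefield:[2012-04-18T00:00:00Z/DAY TO 2012-04-18T00:00:00Z/DAY+1DAY]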

On Wed, Apr 18, 2012 at 11:11 AM, vybe3142  wrote:

> A query search on a particular date:
>
> returns 1valid result (as expected).
>
> How can I alter the granularity of the search for example , to all matches
> on the particular DAY?
>
> Reading through various docs, I attempt to append "/DAY" but this doesn't
> seem to work (in fact I get 0 results back when querying).
>
> What am I neglecting?
> Thanks
>
>
>
> --
> View this message in context:
> http://lucene.472066.n3.nabble.com/Date-granularity-tp3920890p3920890.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>


Re: Difference between Search result from Admin console and solr/browse

2012-04-18 Thread Jan Høydahl
Hi,

The /browse Request Handler is built to showcase the xml documents in 
solr/example/exampledata and if you want to use it for your own data and schema 
you must modify the templates in solr/example/conf/velocity/ to display 
whatever you want to display.

Given that you use an unmodified example schema, you should be able to get 
more or less the same results as in Admin console (which uses the Lucene query 
parser on default field "text" ootb) by querying for text:voicemail. If you 
then click the "enable debug" link at the bottom of the page and then click the 
"toggle all fields" links below each result hit, you will see what is contained 
in each and every field.

What you probably *should* do is to transform your oracle XMLs into XML that 
corresponds with Solr's schema, and you should tweak your schema and Velocity 
templates to match what you'd like to output in the results. A simple way to 
prototype transforms is to write an XSL and use the XSLTUpdateRequestHandler 
at solr/update/xslt instead of the XML handler. See 
http://wiki.apache.org/solr/XsltUpdateRequestHandler
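
For instance, something along these lines (the stylesheet and file names are 
made up; the .xsl goes in the core's conf/xslt/ directory):

curl "http://localhost:8983/solr/update/xslt?tr=oracle2solr.xsl&commit=true" \
  -H "Content-Type: text/xml" --data-binary @oracle-docs.xml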

--
Jan Høydahl, search solution architect
Cominvent AS - www.cominvent.com
Solr Training - www.solrtraining.com

On 18. apr. 2012, at 22:49, srini wrote:

> I have imported my xml documents from oracle database and indexed them. When
> I search *:* in *admin console *I do get results. My xml format is not close
> to what solr expects. but still when I search for any word that is part of
> my xml document Solr displays whole xml document. for example if I search
> for word "voicemail" solr displays xml documents that has word "voicemail"
> 
> Now when I go to solr/browse and give *:* I do see some thing but each
> result is like below (no data) even if i search for same word "voicemail" I
> am getting below. Can some body !!please Advice!
> 
> Price:
> Features:
> In Stock
> 
> there are only two things I can think off, one is settings in
> solrconfig.xml(like below). 
> 
> <requestHandler name="/browse" class="solr.SearchHandler">
>   <lst name="defaults">
>     <str name="echoParams">explicit</str>
> 
>     <!-- VelocityResponseWriter settings -->
>     <str name="wt">velocity</str>
> 
>     <str name="v.template">browse</str>
>     <str name="v.layout">layout</str>
>     <str name="title">Solritas</str>
> 
>     <str name="df">text</str>
>     <str name="defType">edismax</str>
>     <str name="q.alt">*:*</str>
>     <str name="rows">10</str>
>     <str name="fl">*,score</str>
>     <str name="qf">
>       text^0.5 features^1.0 name^1.2 sku^1.5 id^10.0 manu^1.1 cat^1.4
>     </str>
>     <str name="mlt.fl">text,features,name,sku,id,manu,cat</str>
>     <int name="mlt.count">3</int>
>     <str name="mlt.qf">
>       text^0.5 features^1.0 name^1.2 sku^1.5 id^10.0 manu^1.1 cat^1.4
>     </str>
>   </lst>
> </requestHandler>
> 
> 
> 
> --
> View this message in context: 
> http://lucene.472066.n3.nabble.com/Difference-between-Search-result-from-Admin-console-and-solr-browse-tp3921323p3921323.html
> Sent from the Solr - User mailing list archive at Nabble.com.



Re: minimum match and not matched words / term frequency in query result

2012-04-18 Thread Jan Høydahl
Hi,

Which query terms match may of course vary from document to document, so 
it would be hard to globally print non-matching terms. But for each individual 
document match, you could deduce which terms do not match by enumerating the 
terms that DO match - using the explain output, for instance.

--
Jan Høydahl, search solution architect
Cominvent AS - www.cominvent.com
Solr Training - www.solrtraining.com

On 18. apr. 2012, at 17:34, giovanni.bricc...@banzai.it wrote:

> Hi
> 
> I have a dismax query with a minimum match setting; this allows some terms 
> to be missing from the query results.
> 
> I would like to give feedback to the user, highlighting the words that did not 
> match. It would also be interesting to show the words with a very low frequency.
> 
> For instance, searching for "purple pendrive" I would highlight that the 
> results ignore the term "purple", because we don't have any.
> 
> Can you suggest how to approach the problem?
> 
> I was thinking about the debugQuery output, but since I will not get details 
> about all the results I will probably miss something.
> 
> I am trying to write a new SearchComponent but I don't know how to get term 
> frequency data from a ResponseBuilder object... I am new to solr/lucene 
> programming.
> 
> Thanks a lot
> 
> 
> 
> 



Re: Solr 3.6 parsing and extraction files

2012-04-18 Thread Jan Høydahl
Hi,

I suppose you want to POST office docs into Solr for text extraction using the 
Extracting RequestHandler (SolrCell).
Have you read this page? http://wiki.apache.org/solr/ExtractingRequestHandler
You basically need all the libs provided by contrib/extraction. You can see in the 
example solr/conf/solrconfig.xml which <lib> directives are included near 
the top of the file; this should give you a hint of how to configure your own 
solrconfig.xml depending on where you put those libs.
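
For instance, the stock example pulls them in with directives along these lines 
(paths are relative to the example's solr home; adjust them for your install):

<lib dir="../../contrib/extraction/lib" />
<lib dir="../../dist/" regex="apache-solr-cell-\d.*\.jar" />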

--
Jan Høydahl, search solution architect
Cominvent AS - www.cominvent.com
Solr Training - www.solrtraining.com

On 18. apr. 2012, at 17:36, Tod wrote:

> Could someone possibly provide me with a list of jars that I need to extract 
> from the apache-solr-3.6.0.tgz file to enable the parsing and remote 
> streaming of office style documents?  I assume (for a multicore 
> configuration) they would go into ./tomcat/webapps/solr/WEB-INF/lib - correct?
> 
> 
> Thanks - Tod



Re: Populating a filter cache by means other than a query

2012-04-18 Thread Erick Erickson
Pesky users. Life would be so much easier if they'd just leave
devs alone 


Right. Well, you can certainly create your own SearchComponent and attach your
custom filter at that point, note how I'm skimping on the details here.
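
As a rough sketch of the OpenBitSet-from-termdocs idea on the Lucene 3.x API 
(the class and its wiring into a SearchComponent are hypothetical):

import java.io.IOException;
import java.util.Set;

import org.apache.lucene.index.IndexReader;
import org.apache.lucene.index.Term;
import org.apache.lucene.index.TermDocs;
import org.apache.lucene.search.DocIdSet;
import org.apache.lucene.search.Filter;
import org.apache.lucene.util.OpenBitSet;

// Builds a filter from an externally supplied label set instead of a query.
// "myid" is the application-specific id field from the original post.
public class LabelSetFilter extends Filter {
    private final Set<String> labeledIds;

    public LabelSetFilter(Set<String> labeledIds) {
        this.labeledIds = labeledIds;
    }

    @Override
    public DocIdSet getDocIdSet(IndexReader reader) throws IOException {
        OpenBitSet bits = new OpenBitSet(reader.maxDoc());
        TermDocs td = reader.termDocs();
        try {
            for (String id : labeledIds) {
                td.seek(new Term("myid", id));
                while (td.next()) {
                    bits.set(td.doc());
                }
            }
        } finally {
            td.close();
        }
        return bits; // OpenBitSet is itself a DocIdSet
    }
}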

From left field, you might create a custom FunctionQuery that returns 0 in the
case of excluded documents. Since that gets multiplied into the score, the
resulting score is 0. Returning 1 for docs that should be kept wouldn't change
the score.

But other than that, I'll leave it to the folks in the code. Chris,
you there? ..

Best
Erick

On Wed, Apr 18, 2012 at 5:14 PM, Chris Collins  wrote:
> Great question.
>
> The set could be in the millions.  I over simplified the use case somewhat to 
> protect the innocent :-}.  If a user is querying a large set of documents 
> (for the sake of argument lets say its high tens of millions but could be in 
> the small billions), they want to potentially mark a result set or subset of 
> those docs with a label/tag and use that label /tag later. Now lets throw in 
> its multi tenant system and we dont want to keep re-indexing documents to add 
> these tags.  Really what I would want todo is to execute a query filtering by 
> this labeled set, the server fetches the labeled set out of local cache or 
> over the wire or off disk and then incorporates it by one means or another as 
> a filter (docset or hashtable in the hitcollector).
>
> Personally I think the dictionary approach wouldnt be a good one.  It may 
> produce the most optimal filter mechanism but will cost a bunch to construct 
> the OpenBitSet.
>
> In a prior company I built a more generic version of this for not only 
> filtering but for sorting, aggregate stats, etc.   We didn't use Solr.   I 
> was curious if there was any methodology for plugging in such a scheme 
> without taking a branch of solr and hacking at it.  This was a multi tenant 
> system where we were producing aggregate graphs, filtering and ranking by 
> things such as entity level sentiment so we produced a rather generic 
> solution here that as you pointed out reinvented perhaps some things that 
> smell similar.  It was about 7B docs and was multi tenant.  Users were able 
> to overide these "features" on a document level which was necessary so their 
> counts, sorts etc worked correctly.  Saying how long it took me to build and 
> debug it if I can take something close off the shelf.well you know the 
> rest of the story :-}
>
> C
>
>
> On Apr 18, 2012, at 4:38 AM, Erick Erickson wrote:
>
>> I guess my question is "what advantage are you trying
>> to get here?"
>>
>> At the start, this feels like an "XY" problem. How are
>> you intending to use the fq after you've built it? Because
>> if there's any way to just create an "fq" clause, Solr
>> will take care of it for you. Caching it, autowarming
>> it when searchers are re-opened, etc. Otherwise, you're
>> going to be re-inventing a bunch of stuff it seems to me,
>> you'll have to intercept the queries coming in in order
>> to apply the filter from the cache, etc.
>>
>> Which also may be another way of asking "How big
>> is this set of document IDs?" If it's in the 100s, I'd
>> just go with an fq. If it's more than that, I'd index
>> some kind of set identifier that you could create for
>> your fqs.
>>
>> And if this is gibberish, ignore me ..
>>
>> Best
>> Erick
>>
>> On Tue, Apr 17, 2012 at 4:34 PM, Chris Collins  wrote:
>>> Hi, I am a long time Lucene user but new to solr.  I would like to use 
>>> something like the filterCache but build a such a cache not from a query 
>>> but custom code.  I guess I will ask my question by using techniques and 
>>> vocab I am familiar with.  Not sure its actually the right way so I 
>>> appologize if its just the wrong approach.
>>>
>>> The scenario is that I would like to filter a result set by a set of 
>>> labeled documents, I will call that set L.
>>> L contains app specific document IDs that are indexed as literals in the 
>>> lucenefield "myid".
>>> I would imagine I could build a OpenBitSet from enumerating the termdocs 
>>> and look for the intersecting ids in my label set.
>>> Now I have my bitset that I assume I could use in a filter.
>>>
>>> Another approach would be to implement a hits collector, compute a 
>>> fieldcache from that myid field and look for the intersection in a 
>>> hashtable of L at scoring time, throwing out results that are not contained 
>>> in the hashtable.
>>>
>>> Of course I am working within the confines / concepts that SOLR has layed 
>>> out.  Without going completely off the reservation is their a neat way of 
>>> doing such a thing with SOLR?
>>>
>>> Glad to clarify if my question makes absolutely no sense.
>>>
>>> Best
>>>
>>> C
>>
>


Re: Multiple document structure

2012-04-18 Thread Erick Erickson
Solr does not enforce anything about documents conforming to
the schema except:
1> a field specified in a doc must be present in the schema
2> any field in the schema with ' required="true" ' must be present
in the doc.

Additionally there is no penalty for NOT putting all the fields
defined in the schema into a particular document.

What this means:
Just create your schema with all the fields you'll need for both
types of documents, probably along with a "type" field to
distinguish the two. Now just index the separate document
types in the same index.
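
A sketch of what that can look like in schema.xml (the field and type names 
here are invented for illustration):

<!-- shared fields -->
<field name="id" type="string" indexed="true" stored="true" required="true"/>
<field name="doctype" type="string" indexed="true" stored="true"/>

<!-- fields only populated by "article" documents -->
<field name="title" type="text" indexed="true" stored="true"/>

<!-- fields only populated by "product" documents -->
<field name="price" type="float" indexed="true" stored="true"/>

Then restrict a search to one document type with fq=doctype:article.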

Best
Erick

On Wed, Apr 18, 2012 at 9:28 AM, Gora Mohanty  wrote:
> On 18 April 2012 10:05, abhijit bashetti  
> wrote:
>>
>> Hi ,
>> Is it possible to have 2 document structures in solr?
> [...]
>
> Do not think so, but why do you need it? Use two separate
> indices, either in a multi-core setup, or in separate Solr
> instances.
>
> Regards,
> Gora


Re: Date granularity

2012-04-18 Thread Erick Erickson
If Peter's suggestion doesn't work, please post the results
of adding &debugQuery=on to your query. The date math
stuff is sensitive to spaces, for instance, and it's impossible
to tell whether you're making a simple error like that without
seeing what you're actually doing.

Best
Erick

On Wed, Apr 18, 2012 at 6:46 PM, Peter Markey  wrote:
> you could use a filter query like: fq=datefield:[NOW/DAY-1DAY TO
> NOW/DAY+1DAY]
>
> *replace datefield with your field that contains the time info
>
> On Wed, Apr 18, 2012 at 11:11 AM, vybe3142  wrote:
>
>> A query search on a particular date:
>>
>> returns 1valid result (as expected).
>>
>> How can I alter the granularity of the search for example , to all matches
>> on the particular DAY?
>>
>> Reading through various docs, I attempt to append "/DAY" but this doesn't
>> seem to work (in fact I get 0 results back when querying).
>>
>> What am I neglecting?
>> Thanks
>>
>>
>>
>> --
>> View this message in context:
>> http://lucene.472066.n3.nabble.com/Date-granularity-tp3920890p3920890.html
>> Sent from the Solr - User mailing list archive at Nabble.com.
>>


Re: Solr Core not able access latest data indexed by multiple server.

2012-04-18 Thread Erick Erickson
I think you're trying to do something that you shouldn't. The trunk
SolrCloud stuff will address this issue, but for the 3.x code line, having
multiple servers opening up a shared index and writing to it will produce
unpredictable results. This is really bad practice.

You'd be far ahead setting up one of these machines as a master,
the other as a slave, and always indexing to the master.
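
A minimal sketch of that setup with the standard ReplicationHandler (host name 
and poll interval are examples):

<!-- solrconfig.xml on the master -->
<requestHandler name="/replication" class="solr.ReplicationHandler">
  <lst name="master">
    <str name="replicateAfter">commit</str>
  </lst>
</requestHandler>

<!-- solrconfig.xml on the slave -->
<requestHandler name="/replication" class="solr.ReplicationHandler">
  <lst name="slave">
    <str name="masterUrl">http://masterhost:8983/solr/CR1/replication</str>
    <str name="pollInterval">00:00:60</str>
  </lst>
</requestHandler>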

Best
Erick

On Wed, Apr 18, 2012 at 1:17 AM, Paresh Modi  wrote:
> Hi,
>
> I am using Solr multicore approach in my app. we have two different servers
> (ServerA1 and ServerA2) for load balancing, both the server accessing the
> same index repository and request will go to any server as per load balance
> algorithm.
>
> Problem occurs in following way [Note that both the servers accessing the
> same physical location(index)].
>
> - ADD TO INDEX request for File1 go to ServerA1 for core CR1, core CR1
> loaded in ServerA1 and indexing done.
> - ADD TO INDEX request for File2 go to ServerA2 for core CR1, core CR1
> loaded in ServerA2 and indexing done.
> - SEARCH request for File2 go to ServerA1, now here core CR1 is already
> loaded so it directly access the index but File2 added by ServerA2 is not
> found in core loaded by ServerA1.
>
> So this is the problem, File2 indexed by core CR1 loaded in ServerA2 is not
> available in core CR1 loaded by ServerA1.
>
>
> I have searched and found that the solution to this problem is reload the
> CORE. when you reload the core, it will have latest indexed data. but
> reloading the Core for every request is very heavy and time consuming
> process.
>
> Please let me know if anyone has any solution for this.
>
>
> Waiting for your expert advice.
>
>
> Thanks
> Paresh
>
> --
> View this message in context: 
> http://lucene.472066.n3.nabble.com/Solr-Core-not-able-access-latest-data-indexed-by-multiple-server-tp3919113p3919113.html
> Sent from the Solr - User mailing list archive at Nabble.com.


Re: Problems with edismax parser and solr3.6

2012-04-18 Thread Erick Erickson
Happened to see that Jan confirms this as a bug, see:
https://issues.apache.org/jira/browse/SOLR-3377

On Wed, Apr 18, 2012 at 11:00 AM, Bernd Fehling
 wrote:
>
> I just looked through my logs of solr 3.6 and saw several "0 hits" which were 
> not seen with solr 3.5.
>
> While tracing this down it turned out that edismax doesn't like queries of the 
> type "...&q=(text:ide)&..." any more.
>
> If there are parentheses around the query term, edismax fails with solr 3.6.
>
> Can anyone confirm this and give me feedback?
>
> Bernd


Re: Problems with edismax parser and solr3.6

2012-04-18 Thread Jan Høydahl
Hi,

Thanks for reporting this. I've created a bug ticket for this at 
https://issues.apache.org/jira/browse/SOLR-3377

--
Jan Høydahl, search solution architect
Cominvent AS - www.cominvent.com
Solr Training - www.solrtraining.com

On 18. apr. 2012, at 17:00, Bernd Fehling wrote:

> 
> I just looked through my logs of solr 3.6 and saw several "0 hits" which were 
> not seen with solr 3.5.
> 
> While tracing this down it turned out that edismax doesn't like queries of the 
> type "...&q=(text:ide)&..." any more.
> 
> If there are parentheses around the query term, edismax fails with solr 3.6.
> 
> Can anyone confirm this and give me feedback?
> 
> Bernd