Re: Re: Searching for empty fields possible?
>> I'm not sure; theoretically, fields with a null value (PHP-side) should end
>> up not having the field. But then again I don't think it's relevant just
>> yet. What bugs me is that if I add the -puid:[* TO *], all results for
>> puid:[0 TO *] disappear, even though I am using "OR".
>
> The - operator does not work with the OR operator the way you think.
> Your query can be re-written as (puid:[0 TO *] OR (*:* -puid:[* TO *]))
>
> Does this new query satisfy your needs? And more importantly, does type="integer" support correct numeric range queries? In Solr 1.4.0 range queries work correctly with type="tint".

Strangely enough, when I rewrote my query to ((puid:[0 TO *]) OR (-puid:[* TO *])) I did actually get results. Whether they were correct I currently cannot verify properly, since my index does not actually contain null values for the column. I will, however, check whether your query gets me any different results :)

Speaking of your query, I don't quite understand what the *:* does there and how it gets parsed.

Best,
Jan
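A note for anyone hitting the same wall: a boolean query made up only of negative clauses matches nothing by itself, so *:* (the match-all-documents query) gives the negation something to subtract from. A minimal sketch of the two forms, reusing the puid field from this thread (exact behaviour can vary by Solr version and query parser):

  puid:[0 TO *] OR (*:* -puid:[* TO *])    finds docs with puid of 0 or more, plus docs with no puid value at all
  -puid:[* TO *]                           purely negative; only works where the parser adds the match-all step for you

Solr handles a top-level pure-negative query by adding the match-all step implicitly, which is likely why the rewritten query above returned results, but nested pure-negative clauses are less predictable, hence the explicit *:*.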
Re: StreamingUpdateSolrServer seems to hang on indexing big batches
2010/1/26 Jake Brownell : > I swapped our indexing process over to the streaming update server, but now > I'm seeing places where our indexing code adds several documents, but > eventually hangs. It hangs just before the completion message, which comes > directly after sending to solr. I found this issue in jira > > https://issues.apache.org/jira/browse/SOLR-1711 > > which may be what I'm seeing. If this is indeed what we're running up against > is there any best practice to work around it? I experience this too I think. My indexing script has been running all night and has accomplished nothing. I see lots of disk activity though, which is weird. To me it doesn't look like the patch is added to version control, so you need to apply it to your own svn checkout of solrj. /Tim
Re: Solr wiki link broken
All seems well now. The wiki does have its flakey moments though. Erik On Jan 26, 2010, at 1:23 AM, Teruhiko Kurosaka wrote: In http://lucene.apache.org/solr/ the wiki tab and "Docs (wiki)" hyper text in the side bar text after expansion are the link to http://wiki.apache.org/solr But the wiki site seems to be broken. The above link took me to a generic help page of the Wiki system. What's going on? Did I just hit the site in a maintenance time? Kuro
Re: Solr wiki link broken
Hi, you might want to try the link called Frontpage on the generic wiki page. But well, this seems to be kind of broken for some locales. Regards, Sven --On Dienstag, 26. Januar 2010 01:23 -0500 Teruhiko Kurosaka wrote: In http://lucene.apache.org/solr/ the wiki tab and "Docs (wiki)" hyper text in the side bar text after expansion are the link to http://wiki.apache.org/solr But the wiki site seems to be broken. The above link took me to a generic help page of the Wiki system. What's going on? Did I just hit the site in a maintenance time? Kuro
Re: DataImportHandler TikaEntityProcessor FieldReaderDataSource
Hi Shah, I am assuming you are talking about the integration of SOLR-1358, i am very interested in this feature as well. Did you get it to work ? Is there a snapshot build available for this somewhere or do i have to build solr from source myself ? Thanks, Jorg On Mon, Jan 25, 2010 at 6:27 PM, Shah, Nirmal wrote: > Hi, > > > > I am fairly new to Solr and would like to use the DIH to pull rich text > files (pdfs, etc) from BLOB fields in my database. > > > > There was a suggestion made to use the FieldReaderDataSource with the > recently commited TikaEntityProcessor. Has anyone accomplished this? > > This is my configuration, and the resulting error - I'm not sure if I'm > using the FieldReaderDataSource correctly. If anyone could shed light > on whether I am going the right direction or not, it would be > appreciated. > > > > ---Data-config.xml: > > > > > >url="jdbc:oracle:thin:un/p...@host:1521:sid" /> > > > > > > dataField="attach.attachment" format="text"> > > > > > > > > > > > > > > > > -Debug error: > > > > > > 0 > > 203 > > > > > > > > testdb-data-config.xml > > > > > > full-import > > debug > > > > > > > > > > select id as name, attachment from testtable2 > > 0:0:0.32 > > --- row #1- > > java.math.BigDecimal:2 > > oracle.sql.BLOB:oracle.sql.b...@1c8e807 > > - > > > > > > org.apache.solr.handler.dataimport.DataImportHandlerException: No > dataSource :f1 available for entity :253433571801723 Processing Document > # 1 > >at > org.apache.solr.handler.dataimport.DataImporter.getDataSourceInstance(Da > taImporter.java:279) > >at > org.apache.solr.handler.dataimport.ContextImpl.getDataSource(ContextImpl > .java:93) > >at > org.apache.solr.handler.dataimport.TikaEntityProcessor.nextRow(TikaEntit > yProcessor.java:97) > >at > org.apache.solr.handler.dataimport.EntityProcessorWrapper.nextRow(Entity > ProcessorWrapper.java:237) > >at > org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.j > ava:357) > >at > org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.j > ava:383) > >at > org.apache.solr.handler.dataimport.DocBuilder.doFullDump(DocBuilder.java > :242) > >at > org.apache.solr.handler.dataimport.DocBuilder.execute(DocBuilder.java:18 > 0) > >at > org.apache.solr.handler.dataimport.DataImporter.doFullImport(DataImporte > r.java:331) > >at > org.apache.solr.handler.dataimport.DataImporter.runCmd(DataImporter.java > :389) > >at > org.apache.solr.handler.dataimport.DataImportHandler.handleRequestBody(D > ataImportHandler.java:203) > >at > org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerB > ase.java:131) > >at > org.apache.solr.core.SolrCore.execute(SolrCore.java:1316) > >at > org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.ja > va:338) > >at > org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.j > ava:241) > >at > org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHan > dler.java:1089) > >at > org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:365) > >at > org.mortbay.jetty.security.SecurityHandler.handle(SecurityHandler.java:2 > 16) > >at > org.mortbay.jetty.servlet.SessionHandler.handle(SessionHandler.java:181) > >at > org.mortbay.jetty.handler.ContextHandler.handle(ContextHandler.java:712) > >at > org.mortbay.jetty.webapp.WebAppContext.handle(WebAppContext.java:405) > >at > org.mortbay.jetty.handler.ContextHandlerCollection.handle(ContextHandler > Collection.java:211) > >at > org.mortbay.jetty.handler.HandlerCollection.handle(HandlerCollection.jav > 
a:114) > >at > org.mortbay.jetty.handler.HandlerWrapper.handle(HandlerWrapper.java:139) > >at org.mortbay.jetty.Server.handle(Server.java:285) > >at > org.mortbay.jetty.HttpConnection.handleRequest(HttpConnection.java:502) > >at > org.mortbay.jetty.HttpConnection$RequestHandler.headerComplete(HttpConne > ction.java:821) > >at > org.mortbay.jetty.HttpParser.parseNext(HttpParser.java:513) > >at > org.mortbay.jetty.HttpParser.parseAvailable(HttpParser.java:208) > >at > org.mortbay.jetty.HttpConnection.handle(HttpConnection.java:378) > >at > org.mortbay.jetty.bio.SocketConnector$Connection.run(SocketConnector.jav > a:226) > >at > org.mortbay.thread.BoundedThreadPool$PoolThread.run(BoundedThreadPool.ja > va:442) > > > > Thanks, > > Nirmal > >
Need hardware recommendation
I am trying to do the following:

- Index 6 million database records (SQL Server 2008). Full index daily; differential every 15 minutes.
- Index 2 million rich documents. Full index weekly; differential every 15 minutes.
- Search queries: 1 per minute.
- 20 cores.

I am looking for hardware recommendations. Any advice/recommendation will be appreciated.

-Jayesh Wadhwani
Re: StreamingUpdateSolrServer seems to hang on indexing big batches
<<< My indexing script has been running all night and has accomplished nothing. I see lots of disk activity though, which is weird.>>> One explanation would be that you're memory-starved and the disk activity you see is thrashing. How much memory do you allocate to your JVM? A further indication that this is where you should start looking would be if your CPU usage is very low at the same time. Erick 2010/1/26 Tim Terlegård > 2010/1/26 Jake Brownell : > > > I swapped our indexing process over to the streaming update server, but > now I'm seeing places where our indexing code adds several documents, but > eventually hangs. It hangs just before the completion message, which comes > directly after sending to solr. I found this issue in jira > > > > https://issues.apache.org/jira/browse/SOLR-1711 > > > > which may be what I'm seeing. If this is indeed what we're running up > against is there any best practice to work around it? > > I experience this too I think. My indexing script has been running all > night and has accomplished nothing. I see lots of disk activity > though, which is weird. > > To me it doesn't look like the patch is added to version control, so > you need to apply it to your own svn checkout of solrj. > > /Tim >
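For anyone checking Erick's question on their own install: heap size is set with the standard JVM flags on the command line that launches Solr. A minimal sketch assuming the bundled Jetty example (the numbers are placeholders, not a recommendation):

  java -Xms512m -Xmx1024m -jar start.jar

While the indexer appears hung, jconsole (or jmap -heap <pid> on Sun JDKs) will show how much of that heap is actually in use, which quickly confirms or rules out the memory-starvation theory.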
Re: Invalid CRLF - StreamingUpdateSolrServer ?
I've patched the solrj release(tag) 1.4 with SOLR-1595, it's online for about two weeks now and It's working just fine. Thanks a lot. Patrick. P.S.: It's a pity there is no plan for a 1.4.1 release Yonik Seeley a écrit : It could be this bug, fixed in trunk: * SOLR-1595: StreamingUpdateSolrServer used the platform default character set when streaming updates, rather than using UTF-8 as the HTTP headers indicated, leading to an encoding mismatch. (hossman, yonik) Could you try a recent nightly build (or build your own from trunk) and see if it fixes it? -Yonik http://www.lucidimagination.com On Thu, Dec 31, 2009 at 5:07 AM, Patrick Sauts wrote: I'm using solr 1.4 on tomcat 5.0.28, with client StreamingUpdateSolrServer with 10threads and xml communication via Post method. Is there a way to avoid this error (data lost)? And is StreamingUpdateSolrServer reliable ? GRAVE: org.apache.solr.common.SolrException: Invalid CRLF at org.apache.solr.handler.XMLLoader.load(XMLLoader.java:72) at org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:54) at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:131) at org.apache.solr.core.SolrCore.execute(SolrCore.java:1316) at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:338) at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:241) at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:215) at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:188) at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:213) at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:174) at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:127) at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:117) at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:108) at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:174) at org.apache.coyote.http11.Http11Processor.process(Http11Processor.java:874) at org.apache.coyote.http11.Http11BaseProtocol$Http11ConnectionHandler.processConnection(Http11BaseProtocol.java:665) at org.apache.tomcat.util.net.PoolTcpEndpoint.processSocket(PoolTcpEndpoint.java:528) at org.apache.tomcat.util.net.LeaderFollowerWorkerThread.runIt(LeaderFollowerWorkerThread.java:81) at org.apache.tomcat.util.threads.ThreadPool$ControlRunnable.run(ThreadPool.java:689) at java.lang.Thread.run(Thread.java:619) Caused by: com.ctc.wstx.exc.WstxIOException: Invalid CRLF
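For reference when reproducing this, the client side of the thread looks roughly like the following SolrJ sketch; the URL, queue size, and thread count are placeholder values, not recommendations:

import org.apache.solr.client.solrj.impl.StreamingUpdateSolrServer;
import org.apache.solr.common.SolrInputDocument;

public class StreamingUpdateExample {
    public static void main(String[] args) throws Exception {
        // Buffer up to 100 documents and stream them to Solr over 10 background threads.
        StreamingUpdateSolrServer server =
                new StreamingUpdateSolrServer("http://localhost:8983/solr", 100, 10);

        SolrInputDocument doc = new SolrInputDocument();
        doc.addField("id", "doc-1");
        doc.addField("title", "UTF-8 test: héllo wörld");
        server.add(doc);     // queued; sent asynchronously by the runner threads
        server.commit();     // blocks until the queued documents are flushed and committed
    }
}

With the SOLR-1595 fix applied, the request body is encoded as UTF-8 to match the HTTP headers, so non-ASCII field values like the one above no longer trigger the encoding mismatch.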
RE: Solr vs. Compass
Ultimately... You're right, to some extent, the transaction synchronisation isn't ideal for sheer throughput if you many small transactions (as Lucene benefits from batching documents when you index...). However, the subindex feature gives you decidedly more throughput since the locking is at the subindex level. >> It is just blatant advertisement, trick; even JavaDocs remain unchanged... Such sneaky developers While I suspect its changed a bit since you last looked, I only ever used the local tx synch support, and not terribly interested in arguing the point... -N -Original Message- From: Funtick [mailto:f...@efendi.ca] Sent: 26 January 2010 02:44 To: solr-user@lucene.apache.org Subject: RE: Solr vs. Compass Minutello, Nick wrote: > > Maybe spend some time playing with Compass rather than speculating ;) > I spent few weeks by studying Compass source code, it was three years ago, and Compass docs (3 years ago) were saying the same as now: "Compass::Core provides support for two phase commits transactions (read_committed and serializable), implemented on top of Lucene index segmentations. The implementation provides fast commits (faster than Lucene), though they do require the concept of Optimizers that will keep the index at bay. Compass::Core comes with support for Local and JTA transactions, and Compass::Spring comes with Spring transaction synchronization. When only adding data to the index, Compass comes with the batch_insert transaction, which is the same IndexWriter operation with the same usual suspects for controlling performance and memory. " It is just blatant advertisement, trick; even JavaDocs remain unchanged... Clever guys from Compass can re-apply transaction log to Lucene in case of server crash (for instance, server was 'killed' _before_ Lucene flushed new segment to disk). Internally, it is implemented as a background thread. Nothing says in docs "lucene is part of transaction"; I studied source - it is just 'speculating'. Minutello, Nick wrote: > > If it helps, on the project where I last used compass, we had what I > consider to be a small dataset - just a few million documents. Nothing > related to indexing/searching took more than a second or 2 - mostly it > was 10's or 100's of milliseconds. That app has been live almost 3 > years. > I did the same, and I was happy with Compass: I got Lucene-powered search without any development. But I got performance problems after few weeks... I needed about 300 TPS, and Compass-based approach didn't work. With SOLR, I have 4000 index updates per second. -Fuad http://www.tokenizer.org -- View this message in context: http://old.nabble.com/Solr-vs.-Compass-tp27259766p27317213.html Sent from the Solr - User mailing list archive at Nabble.com. === Please access the attached hyperlink for an important electronic communications disclaimer: http://www.credit-suisse.com/legal/en/disclaimer_email_ib.html ===
Re: StreamingUpdateSolrServer seems to hang on indexing big batches
2010/1/26 Erick Erickson : > > My indexing script has been running all > > night and has accomplished nothing. I see lots of disk activity > > though, which is weird. > > > One explanation would be that you're memory-starved and > the disk activity you see is thrashing. How much memory > do you allocate to your JVM? A further indication that > this is where you should start looking would be if your > CPU usage is very low at the same time. CPU usage was very low. There were lots of free memory. I immediately thought solr caused the disk activity, but I might have been wrong, because the disk activity stopped after a while and the indexing still showed no progress. Does this thread dump reveal anything? It doesn't look like solr is doing much? /Tim example 1.5.0_22-147 Java HotSpot(TM) Server VM 20 23 3 64 pool-8-thread-1 WAITING 10430,6650ms 8602,0210ms at sun.misc.Unsafe.park(Native Method) at java.util.concurrent.locks.LockSupport.park(LockSupport.java:118) at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:1841) at java.util.concurrent.DelayQueue.take(DelayQueue.java:131) at java.util.concurrent.ScheduledThreadPoolExecutor$DelayedWorkQueue.take(ScheduledThreadPoolExecutor.java:533) at java.util.concurrent.ScheduledThreadPoolExecutor$DelayedWorkQueue.take(ScheduledThreadPoolExecutor.java:526) at java.util.concurrent.ThreadPoolExecutor.getTask(ThreadPoolExecutor.java:470) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:675) at java.lang.Thread.run(Thread.java:613) 60 pool-7-thread-1 WAITING 132,0080ms 13,6950ms at sun.misc.Unsafe.park(Native Method) at java.util.concurrent.locks.LockSupport.park(LockSupport.java:118) at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:1841) at java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:359) at java.util.concurrent.ThreadPoolExecutor.getTask(ThreadPoolExecutor.java:470) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:675) at java.lang.Thread.run(Thread.java:613) 26 DestroyJavaVM RUNNABLE 1453,5030ms 1317,7670ms 25 Timer-2 TIMED_WAITING java.util.taskqu...@6a58c4 3,2500ms 0,8090ms at java.lang.Object.wait(Native Method) at java.util.TimerThread.mainLoop(Timer.java:509) at java.util.TimerThread.run(Timer.java:462) 24 pool-1-thread-1 WAITING 41,5590ms 39,1740ms at sun.misc.Unsafe.park(Native Method) at java.util.concurrent.locks.LockSupport.park(LockSupport.java:118) at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:1841) at java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:359) at java.util.concurrent.ThreadPoolExecutor.getTask(ThreadPoolExecutor.java:470) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:675) at java.lang.Thread.run(Thread.java:613) 22 Timer-1 TIMED_WAITING java.util.taskqu...@e9d100 96,9640ms 74,5150ms at java.lang.Object.wait(Native Method) at java.util.TimerThread.mainLoop(Timer.java:509) at java.util.TimerThread.run(Timer.java:462) 21 btpool0-9 - Acceptor0 SocketConnector @ 0.0.0.0:8983 RUNNABLE 26,0110ms 23,3400ms at java.net.PlainSocketImpl.socketAccept(Native Method) at java.net.PlainSocketImpl.accept(PlainSocketImpl.java:384) at java.net.ServerSocket.implAccept(ServerSocket.java:450) at java.net.ServerSocket.accept(ServerSocket.java:421) at org.mortbay.jetty.bio.SocketConnector.accept(SocketConnector.java:97) at 
org.mortbay.jetty.AbstractConnector$Acceptor.run(AbstractConnector.java:516) at org.mortbay.thread.BoundedThreadPool$PoolThread.run(BoundedThreadPool.java:442) 20 btpool0-8 TIMED_WAITING org.mortbay.thread.boundedthreadpool$poolthr...@7a17 734105,5810ms 727677,9460ms at java.lang.Object.wait(Native Method) at org.mortbay.thread.BoundedThreadPool$PoolThread.run(BoundedThreadPool.java:482) 19 btpool0-7 TIMED_WAITING org.mortbay.thread.boundedthreadpool$poolthr...@7414c8 798010,4300ms 785039,2820ms at java.lang.Object.wait(Native Method) at org.mortbay.thread.BoundedThreadPool$PoolThread.run(BoundedThreadPool.java:482) 18 btpool0-6 TIMED_WAITING org.mortbay.thread.boundedthreadpool$poolthr...@e5c339 719254,0510ms 710319,6850ms at java.lang.Object.wait(Native Method) at org.mortbay.thread.BoundedThreadPool$PoolThread.run(BoundedThreadPool.java:482) 17 btpool0-5 TIMED_WAITING org.mortbay.thread.boundedthreadpool$poolthr...@d38976 243756,7410ms 240759,1390ms at java.lang.Object.wait(Native Method) at org.mortbay.thread.BoundedThreadPool$PoolThread.run(BoundedThreadPool.java:482) 16 btpool0-4 TIMED_WAITING org.mortbay.thread.boundedthreadpool$poolthr...@ad97f5 501531,8820ms 496494,6760ms at java.lang.Object.wait(Native Method) at org.mortbay.thread.BoundedThreadPool$PoolThread.run(BoundedThreadPool.java:482)
RE: Solr vs. Compass
Hi, Well, I thought I would jump here as the creator of Compass (up until this point, the discussion was great and very objective). Compass is here for about 5/6 years now (man, how time passes). Concentrating on the transactional implementation it provides, there have been changes to it along the years. The funny thing about that, by the way, is that the first implementation based on Lucene 1.9 was a combination of Lucene NRT and how it handles segments in the latest Lucene version. I will try and focus on the latest implementation, which uses latest Lucene IndexWriter features. Lucene IndexWriter provides the ability to prepare a commit point, and them commit it. The idea is that most of the heavy operations and things that might go wrong are done on the prepare phase, with the commit basically just updating the segments file. In its nature, its very close to what databases do with their 2 phase commit implementation (though, admittedly, the second phase probably has higher chances of 2 phase success). What Compass does, with its transactional integration with other transactional mechanisms, like JTA, is the ability to act as an XA Resource, and use the IndexWriter prepare and commit within the appropriate XA resource phases. Ultimately thought, even XA is not 100% safe, for example, what happens when you have 5 resources, all gone through the prepare phase, and the 4th failed in the commit phase ... (simplified example, but proves the point). Another point, is how Compass handles transactions. Basically, it has what I call transaction processors. The read committed one provides just that, a read committed transactional isolation level (you do changes, you see them while within the transaction, other see them when you commit the transaction). It does come with its overhead compared with other paradigms of how to use Lucene, but it gives you other things that a lot of people find good. There are other transaction processors that work differently, each with its own use case (heavy indexing, non real time search, async indexing, and so on). At the end, its really hard to compare Compass to Solr. One evident difference is the fact that Solr is more geared to be a Server solution, while Compass at being more embeddable. There are difference in features that each provides, and each comes with its own benefits. I think the rest of the mails on this thread have already covered that very objectively. In any case, you, the user, should use the right tool for the job, if it happens to be either Compass or Solr, I wish you all the best (and luck) at succeeding in it. Shay Minutello, Nick wrote: > > > > Ultimately... You're right, to some extent, the transaction > synchronisation isn't ideal for sheer throughput if you many small > transactions (as Lucene benefits from batching documents when you > index...). However, the subindex feature gives you decidedly more > throughput since the locking is at the subindex level. > >>> It is just blatant advertisement, trick; even JavaDocs remain > unchanged... > Such sneaky developers > While I suspect its changed a bit since you last looked, I only ever > used the local tx synch support, and not terribly interested in arguing > the point... > > -N > > > -Original Message- > From: Funtick [mailto:f...@efendi.ca] > Sent: 26 January 2010 02:44 > To: solr-user@lucene.apache.org > Subject: RE: Solr vs. 
Compass > > > > Minutello, Nick wrote: >> >> Maybe spend some time playing with Compass rather than speculating ;) >> > > I spent few weeks by studying Compass source code, it was three years > ago, and Compass docs (3 years ago) were saying the same as now: > "Compass::Core provides support for two phase commits transactions > (read_committed and serializable), implemented on top of Lucene index > segmentations. The implementation provides fast commits (faster than > Lucene), though they do require the concept of Optimizers that will keep > the index at bay. Compass::Core comes with support for Local and JTA > transactions, and Compass::Spring comes with Spring transaction > synchronization. When only adding data to the index, Compass comes with > the batch_insert transaction, which is the same IndexWriter operation > with the same usual suspects for controlling performance and memory. " > > It is just blatant advertisement, trick; even JavaDocs remain > unchanged... > > > Clever guys from Compass can re-apply transaction log to Lucene in case > of server crash (for instance, server was 'killed' _before_ Lucene > flushed new segment to disk). > > Internally, it is implemented as a background thread. Nothing says in > docs "lucene is part of transaction"; I studied source - it is just > 'speculating'. > > > > > Minutello, Nick wrote: >> >> If it helps, on the project where I last used compass, we had what I >> consider to be a small dataset - just a few million documents. Nothing > >> related to indexing/searching took mor
Re: determine which value produced a hit in multivalued field type
Hi, SIREn [1] could provide you such information (return the value index in the multi-valued field). But actually, only a Lucene extension is available, and you'll have to modified a little bit the SIREn query operator to returns you the value position in the query results. [1] http://siren.sindice.com/ -- Renaud Delbru On 22/01/10 22:52, Harsch, Timothy J. (ARC-TI)[PEROT SYSTEMS] wrote: Hi, If I have a multiValued field type of text, and I put values [cat,dog,green,blue] in it. Is there a way to tell when I execute a query against that field for dog, that it was in the 1st element position for that multiValued field? Thanks! Tim
Re: StreamingUpdateSolrServer seems to hang on indexing big batches
I'll have to defer that one for now. 2010/1/26 Tim Terlegård > 2010/1/26 Erick Erickson : > > > My indexing script has been running all > > > night and has accomplished nothing. I see lots of disk activity > > > though, which is weird. > > > > > > One explanation would be that you're memory-starved and > > the disk activity you see is thrashing. How much memory > > do you allocate to your JVM? A further indication that > > this is where you should start looking would be if your > > CPU usage is very low at the same time. > > CPU usage was very low. There were lots of free memory. I immediately > thought solr caused the disk activity, but I might have been wrong, > because the disk activity stopped after a while and the indexing still > showed no progress. > > Does this thread dump reveal anything? It doesn't look like solr is doing > much? > > /Tim > > > > > example > > > 1.5.0_22-147 > Java HotSpot(TM) Server VM > > > 20 > 23 > 3 > > > > 64 > pool-8-thread-1 > WAITING > 10430,6650ms > 8602,0210ms > > at sun.misc.Unsafe.park(Native Method) > at java.util.concurrent.locks.LockSupport.park(LockSupport.java:118) > > at > java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:1841) > > at java.util.concurrent.DelayQueue.take(DelayQueue.java:131) > at > java.util.concurrent.ScheduledThreadPoolExecutor$DelayedWorkQueue.take(ScheduledThreadPoolExecutor.java:533) > > at > java.util.concurrent.ScheduledThreadPoolExecutor$DelayedWorkQueue.take(ScheduledThreadPoolExecutor.java:526) > > at > java.util.concurrent.ThreadPoolExecutor.getTask(ThreadPoolExecutor.java:470) > > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:675) > > at java.lang.Thread.run(Thread.java:613) > > > > 60 > pool-7-thread-1 > WAITING > 132,0080ms > 13,6950ms > > at sun.misc.Unsafe.park(Native Method) > at java.util.concurrent.locks.LockSupport.park(LockSupport.java:118) > > at > java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:1841) > > at > java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:359) > > at > java.util.concurrent.ThreadPoolExecutor.getTask(ThreadPoolExecutor.java:470) > > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:675) > > at java.lang.Thread.run(Thread.java:613) > > > > 26 > DestroyJavaVM > RUNNABLE > 1453,5030ms > 1317,7670ms > > > > 25 > Timer-2 > TIMED_WAITING > java.util.taskqu...@6a58c4 > 3,2500ms > 0,8090ms > > at java.lang.Object.wait(Native Method) > at java.util.TimerThread.mainLoop(Timer.java:509) > at java.util.TimerThread.run(Timer.java:462) > > > > 24 > pool-1-thread-1 > WAITING > 41,5590ms > 39,1740ms > > at sun.misc.Unsafe.park(Native Method) > at java.util.concurrent.locks.LockSupport.park(LockSupport.java:118) > > at > java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:1841) > > at > java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:359) > > at > java.util.concurrent.ThreadPoolExecutor.getTask(ThreadPoolExecutor.java:470) > > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:675) > > at java.lang.Thread.run(Thread.java:613) > > > > 22 > Timer-1 > TIMED_WAITING > java.util.taskqu...@e9d100 > 96,9640ms > 74,5150ms > > at java.lang.Object.wait(Native Method) > at java.util.TimerThread.mainLoop(Timer.java:509) > at java.util.TimerThread.run(Timer.java:462) > > > > 21 > btpool0-9 - Acceptor0 
SocketConnector @ 0.0.0.0:8983 > RUNNABLE > > 26,0110ms > 23,3400ms > > at java.net.PlainSocketImpl.socketAccept(Native Method) > at java.net.PlainSocketImpl.accept(PlainSocketImpl.java:384) > at java.net.ServerSocket.implAccept(ServerSocket.java:450) > at java.net.ServerSocket.accept(ServerSocket.java:421) > at > org.mortbay.jetty.bio.SocketConnector.accept(SocketConnector.java:97) > > at > org.mortbay.jetty.AbstractConnector$Acceptor.run(AbstractConnector.java:516) > > at > org.mortbay.thread.BoundedThreadPool$PoolThread.run(BoundedThreadPool.java:442) > > > > > 20 > btpool0-8 > TIMED_WAITING > org.mortbay.thread.boundedthreadpool$poolthr...@7a17 > 734105,5810ms > 727677,9460ms > > at java.lang.Object.wait(Native Method) > at > org.mortbay.thread.BoundedThreadPool$PoolThread.run(BoundedThreadPool.java:482) > > > > > 19 > btpool0-7 > TIMED_WAITING > org.mortbay.thread.boundedthreadpool$poolthr...@7414c8 > 798010,4300ms > 785039,2820ms > > at java.lang.Object.wait(Native Method) > at > org.mortbay.thread.BoundedThreadPool$PoolThread.run(BoundedThreadPool.java:482) > > > > > 18 > btpool0-6 > TIMED_WAITING > org.mortbay.thread.boundedthreadpool$poolthr...@e5c339 > 719254,0510ms > 710319,6850ms > > at java.lang.Object.wait(Native Method) > at > org.mortbay.thread.BoundedThreadPool$PoolThread.run(BoundedThreadPool.java:482) > > > > >
Re: Lock problems: Lock obtain timed out
We traced one of the lock files, and it had been around for 3 hours. A restart removed it - but is 3 hours normal for one of these locks? Ian. On Mon, Jan 25, 2010 at 4:14 PM, mike anderson wrote: > I am getting this exception as well, but disk space is not my problem. What > else can I do to debug this? The solr log doesn't appear to lend any other > clues.. > > Jan 25, 2010 4:02:22 PM org.apache.solr.core.SolrCore execute > INFO: [] webapp=/solr path=/update params={} status=500 QTime=1990 > Jan 25, 2010 4:02:22 PM org.apache.solr.common.SolrException log > SEVERE: org.apache.lucene.store.LockObtainFailedException: Lock obtain > timed > out: NativeFSLock@ > /solr8984/index/lucene-98c1cb272eb9e828b1357f68112231e0-write.lock > at org.apache.lucene.store.Lock.obtain(Lock.java:85) > at org.apache.lucene.index.IndexWriter.init(IndexWriter.java:1545) > at org.apache.lucene.index.IndexWriter.(IndexWriter.java:1402) > at org.apache.solr.update.SolrIndexWriter.(SolrIndexWriter.java:190) > at > > org.apache.solr.update.UpdateHandler.createMainIndexWriter(UpdateHandler.java:98) > at > > org.apache.solr.update.DirectUpdateHandler2.openWriter(DirectUpdateHandler2.java:173) > at > > org.apache.solr.update.DirectUpdateHandler2.addDoc(DirectUpdateHandler2.java:220) > at > > org.apache.solr.update.processor.RunUpdateProcessor.processAdd(RunUpdateProcessorFactory.java:61) > at org.apache.solr.handler.XMLLoader.processUpdate(XMLLoader.java:139) > at org.apache.solr.handler.XMLLoader.load(XMLLoader.java:69) > at > > org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:54) > at > > org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:131) > at org.apache.solr.core.SolrCore.execute(SolrCore.java:1316) > at > > org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:338) > at > > org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:241) > at > > org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1089) > at org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:365) > at > org.mortbay.jetty.security.SecurityHandler.handle(SecurityHandler.java:216) > at org.mortbay.jetty.servlet.SessionHandler.handle(SessionHandler.java:181) > at org.mortbay.jetty.handler.ContextHandler.handle(ContextHandler.java:712) > at org.mortbay.jetty.webapp.WebAppContext.handle(WebAppContext.java:405) > at > > org.mortbay.jetty.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:211) > at > > org.mortbay.jetty.handler.HandlerCollection.handle(HandlerCollection.java:114) > at org.mortbay.jetty.handler.HandlerWrapper.handle(HandlerWrapper.java:139) > at org.mortbay.jetty.Server.handle(Server.java:285) > at org.mortbay.jetty.HttpConnection.handleRequest(HttpConnection.java:502) > at > > org.mortbay.jetty.HttpConnection$RequestHandler.content(HttpConnection.java:835) > at org.mortbay.jetty.HttpParser.parseNext(HttpParser.java:641) > at org.mortbay.jetty.HttpParser.parseAvailable(HttpParser.java:208) > at org.mortbay.jetty.HttpConnection.handle(HttpConnection.java:378) > at > > org.mortbay.jetty.bio.SocketConnector$Connection.run(SocketConnector.java:226) > at > > org.mortbay.thread.BoundedThreadPool$PoolThread.run(BoundedThreadPool.java:442) > > > Should I consider changing the lock timeout settings (currently set to > defaults)? If so, I'm not sure what to base these values on. 
> > Thanks in advance, > mike > > > On Wed, Nov 4, 2009 at 8:27 PM, Lance Norskog wrote: > > > This will not ever work reliably. You should have 2x total disk space > > for the index. Optimize, for one, requires this. > > > > On Wed, Nov 4, 2009 at 6:37 AM, Jérôme Etévé > > wrote: > > > Hi, > > > > > > It seems this situation is caused by some No space left on device > > exeptions: > > > SEVERE: java.io.IOException: No space left on device > > >at java.io.RandomAccessFile.writeBytes(Native Method) > > >at java.io.RandomAccessFile.write(RandomAccessFile.java:466) > > >at > > > org.apache.lucene.store.SimpleFSDirectory$SimpleFSIndexOutput.flushBuffer(SimpleFSDirectory.java:192) > > >at > > > org.apache.lucene.store.BufferedIndexOutput.flushBuffer(BufferedIndexOutput.java:96) > > > > > > > > > I'd better try to set my maxMergeDocs and mergeFactor to more > > > adequates values for my app (I'm indexing ~15 Gb of data on 20Gb > > > device, so I guess there's problem when solr tries to merge the index > > > bits being build. > > > > > > At the moment, they are set to 100 and > > > 2147483647 > > > > > > Jerome. > > > > > > -- > > > Jerome Eteve. > > > http://www.eteve.net > > > jer...@eteve.net > > > > > > > > > > > -- > > Lance Norskog > > goks...@gmail.com > > >
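For reference, the lock-related knobs live in solrconfig.xml; whether they help depends on the root cause (in the quoted thread it was simply a full disk). A sketch using the stock example values, not recommendations:

  <indexDefaults>
    <!-- how long (in ms) an IndexWriter waits to acquire the write lock before failing -->
    <writeLockTimeout>1000</writeLockTimeout>
    <!-- native | simple | single -->
    <lockType>native</lockType>
  </indexDefaults>

With the native lock type the lock is held through OS-level file locking, so a lock file left on disk after a crash does not by itself mean the index is still locked; a lock that stays held for hours usually means an IndexWriter is still open somewhere (for example a second Solr instance or a stuck indexing process), which is worth ruling out before raising the timeout.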
Re: Solr wiki link broken
Hi Erik, one observation from me who is using the wiki from a browser living in a non-US locale: I usually get the standard wiki frontpage (in German) and not (!) the Solr-Frontpage I get, if I use a US locale (or click on the link FrontPage). B.t.w I know that this does not strictly belong to this list. Cheers, Sven --On Dienstag, 26. Januar 2010 04:05 -0500 Erik Hatcher wrote: All seems well now. The wiki does have its flakey moments though. Erik On Jan 26, 2010, at 1:23 AM, Teruhiko Kurosaka wrote: In http://lucene.apache.org/solr/ the wiki tab and "Docs (wiki)" hyper text in the side bar text after expansion are the link to http://wiki.apache.org/solr But the wiki site seems to be broken. The above link took me to a generic help page of the Wiki system. What's going on? Did I just hit the site in a maintenance time? Kuro
solr1.5
Hi quick question: Is there any release date scheduled for solr 1.5 with all the wonderful patches (StreamingUpdateSolrServer etc ...)? Thank you !
Behaviour Indicative of Throttling
I've been working on benchmarking our Solr response times in relation to a variable number of concurrent queries. With maxThreads=150 I've tried running between 20 and 100 queries concurrently against our Solr instance, and have noted that for all n-way (>20) loads, throughput flatlines at 20-30 requests/second.

We've tried tuning caches, and while part of the poor performance is down to poor query formulation, the fact that I see neither improvement nor degradation as concurrency rises strikes me as indicative of some kind of throttling. I'm not sure if this is the case or not; as a novice in these realms I would appreciate some guidance as to what I should be looking at and where we might be able to tune/investigate. We've ruled out disk contention and network latency.

Useful metrics:
maxThreads: 150
filterCache size: 16384
queryResultCache size: 16384
documentCache size: 10502

I'm running Solr/Lucene version:
Solr Specification Version: 1.3.0.2009.08.19.15.54.27
Solr Implementation Version: 1.4-dev ${svnversion} - rafiq - 2009-08-19 15:54:27
Lucene Specification Version: 2.9-dev

Would be grateful for any pointers and can furnish more details.

--
Raf Gemmail
Senior Developer
www.tmdr.com
d: 0207 3489 912
Beaumont House, Kensington Village, Avonmore Road, London, W14 8TS
Re: Behaviour Indicative of Throttling
Have you tried watching the threads in a monitoring program like VisualVM? We have found that at a certain point the solr software starts locking in the synchronous calls including logging. -- Jeff Newburn Software Engineer, Zappos.com jnewb...@zappos.com - 702-943-7562 > From: Raf Gemmail > Reply-To: > Date: Tue, 26 Jan 2010 15:25:51 + > To: > Subject: Behaviour Indicitive of Throttling > > I've been working on benchmarking our solr response times in relation to > the a variable number of concurrent queries. With maxThreads=150 - I've > tried running between 20-100 queries concurrently against our solr > instance and have noted that for all n-way (>20) queries I'm finding > that performance flatlines at 20-30 requests/second. > > We've tried tuning caches and while part of the poor performance is > down to poor query formulation - I find the lack of seeing either > performance improvement or degradation as being indicative of some kind > of throttling. > > Not sure if this is the case or not however as a novice in these realms > I would appreciate some guidance as to what I should be looking at and > where we might be able to tune/investigate? > > We've ruled out disk contention and network latency. > > Useful metrics: > maxThreads:150 > filterCache Size; 16384 > queryResultCache size: 16384 > documentCache size: 10502 > > I'm running Solr/Lucene version: > Solr Specification Version: 1.3.0.2009.08.19.15.54.27Solr Implementation > Version: 1.4-dev ${svnversion} - rafiq - 2009-08-19 15:54:27Lucene > Specification Version: 2.9-dev > > Would be grateful for any pointers and can furnish more details. > > -- > Raf Gemmail > > Software Engineer > www.tmdr.com > 0207 3489 912 > Extension: 5112 > Raf Gemmail > Senior Developer > www.tmdr.com > d: 0207 3489 912 > t: 0845 468 0568 > f: 0845 468 0868 > m: > Beaumont House, Kensington Village, Avonmore Road, London, W14 8TS > > - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - > - - - - - - - - - - - - - - - - > This message is sent in confidence for the addressee only. It may contain > privileged > information. The contents are not to be disclosed to anyone other than the > addressee. > Unauthorised recipients are requested to preserve this confidentiality and to > advise > us of any errors in transmission. Thank you. > Trinity Mirror Digital Recruitment ltd is registered in England & Wales. > Registered office: One Canada Square, Canary Wharf, London E14 5AP. > Registered No: 01904765.
replication setup
Hi,

I have set up replication following the wiki. I downloaded the latest apache-solr-1.4 release and exploded it in 2 different directories, and I modified the solrconfig.xml for both the master & the slave as described on the wiki page. In both directories, I started Solr from the example directory.

On the master:
java -Dsolr.solr.home=multicore -Djetty.host=0.0.0.0 -Djetty.port=8983 -DSTOP.PORT=8078 -DSTOP.KEY=stop.now -jar start.jar

And on the slave:
java -Dsolr.solr.home=multicore -Djetty.host=0.0.0.0 -Djetty.port=8982 -DSTOP.PORT=8077 -DSTOP.KEY=stop.now -jar start.jar

I can see core0 and core1 when I open the Solr URL. However, I don't see a replication link, and the <solr url>/replication URL returns a 404 error.

I must be doing something wrong. I would appreciate any help!

Thanks a lot,
matt
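In case it helps: with the multicore example, the /replication handler has to be declared in each core's own solrconfig.xml (multicore/core0/conf/solrconfig.xml and so on), and the handler URL then includes the core name. If the handler is missing from the core's config (the multicore example configs are fairly stripped down), the admin page shows no replication link and /replication returns 404, which matches the symptom above. A minimal sketch along the lines of the wiki, with placeholder host/port/interval values:

Master core's solrconfig.xml:

  <requestHandler name="/replication" class="solr.ReplicationHandler">
    <lst name="master">
      <str name="replicateAfter">commit</str>
      <str name="confFiles">schema.xml,stopwords.txt</str>
    </lst>
  </requestHandler>

Slave core's solrconfig.xml:

  <requestHandler name="/replication" class="solr.ReplicationHandler">
    <lst name="slave">
      <str name="masterUrl">http://localhost:8983/solr/core0/replication</str>
      <str name="pollInterval">00:00:60</str>
    </lst>
  </requestHandler>

With the ports used above, the slave's handler would then be reachable at http://localhost:8982/solr/core0/replication.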
Solr wiki link broken
In http://lucene.apache.org/solr/ the wiki tab and "Docs (wiki)" hyper text in the side bar text after expansion are the link to http://wiki.apache.org/solr But the wiki site seems to be broken. The above link took me to a generic help page of the Wiki system. What's going on? Did I just hit the site in a maintenance time? Kuro
RE: Solr wiki link broken
I'm sorry. Please ignore this duplicate posting. From: Teruhiko Kurosaka Sent: Tuesday, January 26, 2010 8:32 AM To: solr-user@lucene.apache.org Subject: Solr wiki link broken In http://lucene.apache.org/solr/ the wiki tab and "Docs (wiki)" hyper text in the side bar text after expansion are the link to http://wiki.apache.org/solr But the wiki site seems to be broken. The above link took me to a generic help page of the Wiki system. What's going on? Did I just hit the site in a maintenance time? Kuro
RE: Solr wiki link broken
Sven, You are right. The wiki can't be read if the preferred language is not English. The wiki system seems to implement or be configured to use a wrong way of choosing its locale. Erik, let me know if I can help solving this. Kuro From: Sven Maurmann [sven.maurm...@kippdata.de] Sent: Tuesday, January 26, 2010 7:24 AM To: solr-user@lucene.apache.org Subject: Re: Solr wiki link broken Hi Erik, one observation from me who is using the wiki from a browser living in a non-US locale: I usually get the standard wiki frontpage (in German) and not (!) the Solr-Frontpage I get, if I use a US locale (or click on the link FrontPage). B.t.w I know that this does not strictly belong to this list. Cheers, Sven --On Dienstag, 26. Januar 2010 04:05 -0500 Erik Hatcher wrote: > All seems well now. The wiki does have its flakey moments though. > > Erik > > On Jan 26, 2010, at 1:23 AM, Teruhiko Kurosaka wrote: > >> In >> http://lucene.apache.org/solr/ >> the wiki tab and "Docs (wiki)" hyper text in the side bar text after >> expansion are the link to >> http://wiki.apache.org/solr >> >> But the wiki site seems to be broken. The above link took me to a >> generic help page of the Wiki system. >> >> What's going on? Did I just hit the site in a maintenance time? >> >> Kuro >
Re: Specify logging options from command line in Solr 1.4?
On Mon, Jan 18, 2010 at 19:15, Mark Miller wrote: > Mat Brown wrote: >> Hi all, >> >> Wondering if anyone can point me at a simple way to specify basic >> logging options (log level, log file location) when starting the Solr >> example jar from the command line. >> >> As a bit of background, I maintain a Ruby library for Solr called >> Sunspot that ships with a Solr installation for ease of use. Sunspot >> includes a script for starting Solr with various options, including >> logging options. With Solr 1.3, I was able to write out a >> logging.properties file and then set the system property >> java.util.logging.config.file via the command line; this no longer >> seems to work with Solr 1.4. >> >> I understand that Solr 1.4 has moved to SLF4J, but I haven't been able >> to find a readily available answer to the above question in the SLF4J >> or Solr logging documentation. To be honest, I've always found logging >> in Java rather mystifying. >> >> Any help much appreciated! >> Mat >> > By default, even though Solr uses SLF4J, it will actually use the Java > Util logging Impl: > > http://wiki.apache.org/solr/SolrLogging > > So you just specify a util logging properties file on the sommand line with: > > -Djava.util.logging.config.file=myLoggingConfigFilePath > > An example being: > > handlers=java.util.logging.FileHandler, java.util.logging.ConsoleHandler > > # Default global logging level. > # Loggers and Handlers may override this level > .level=INFO > > java.util.logging.ConsoleHandler.level=INFO > java.util.logging.ConsoleHandler.formatter=java.util.logging.SimpleFormatter > > > # --- FileHandler --- > # Override of global logging level > java.util.logging.FileHandler.level=ALL > > # Naming style for the output file: > # (The output file is placed in the directory > # defined by the "user.home" System property.) > java.util.logging.FileHandler.pattern=%h/java%u.log > > # Limiting size of output file in bytes: > java.util.logging.FileHandler.limit=5 > > # Number of output files to cycle through, by appending an > # integer to the base file name: > java.util.logging.FileHandler.count=1 > > # Style of output (Simple or XML): > java.util.logging.FileHandler.formatter=java.util.logging.SimpleFormatter > > > -- > - Mark > > http://www.lucidimagination.com > > > > Hey Mark, Thanks very much for this - using the java.util.logging properties does indeed work just fine. Cheers, Mat
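For completeness, the whole invocation with the example Jetty launcher ends up as a single line, with the system property placed before -jar (the path here is just whatever file Sunspot generates):

  java -Djava.util.logging.config.file=/path/to/logging.properties -jar start.jar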
Re: StreamingUpdateSolrServer seems to hang on indexing big batches
On Mon, Jan 25, 2010 at 7:27 PM, Jake Brownell wrote: > I swapped our indexing process over to the streaming update server, but now > I'm seeing places where our indexing code adds several documents, but > eventually hangs. It hangs just before the completion message, which comes > directly after sending to solr. I found this issue in jira > > https://issues.apache.org/jira/browse/SOLR-1711 I just reviewed and committed this patch, if you want to try solr-trunk. -Yonik http://www.lucidimagination.com
RE: Solr wiki link broken
One more comment on this. I can see this page http://wiki.apache.org/solr/SolrTomcat w/o a problem, for example. Or I can see this: http://wiki.apache.org/solr/FrontPage I think it's only the main page without actual page name http://wiki.apache.org/solr/ that is having the problem. So the quick fix to this is to avoid solr/ and use the solr/FrontPage link. Kuro
Mail config
Hi, I do not want to receive all the emails from this mail list, I only want to receive the answers to my questions, is this possible? If I am not mistaken when I unsubscribed I sent an email which did not reach the mail list at all (therefore there was of course no chance to get any replies). How can I send questions and receive the replies but not to receive all other posts? I am newbie for Solr and I doubt I can contribute much by answering to other posts. -- Best regards, Bogdan
To store or not to store serialized objects in solr
Hi,

We currently store all of our data in an SQL database and use Solr for indexing. We get a list of ids from Solr and retrieve the data from the db.

We are considering storing all the data in Solr to simplify administration and remove any synchronisation, and are weighing the following options:

1. storing the data in individual fields in Solr (indexed=true, stored=true)
2. storing the data in a serialized form in a binary field in Solr (using Google protocol buffers or similar) and keeping the rest of the Solr fields as indexed=true, stored=*false*
3. keeping things as they are: data stored in the db, and Solr fields kept as indexed=true, stored=false

Can anyone provide some advice on the performance of the different approaches? Are there any obvious pitfalls to options 1 and 2 that I need to be mindful of?

I am thinking option 2 would be the fastest, as it would be reading the data in one contiguous block. I will be doing some performance tests to verify this soon.

FYI, we are looking at 5-10M records; a serialised object is 500 to 1000 bytes and we index approx 20 fields.

Thanks for any advice.
andre
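A rough sketch of what options 1 and 2 look like in schema.xml terms. The field names are made up, and for option 2 the serialized bytes are assumed to be base64-encoded into a plain stored string field, which avoids depending on a dedicated binary field type being available in your Solr version:

  <!-- option 1: each attribute indexed and stored individually -->
  <field name="id"    type="string" indexed="true" stored="true" required="true"/>
  <field name="title" type="text"   indexed="true" stored="true"/>
  <field name="price" type="tfloat" indexed="true" stored="true"/>

  <!-- option 2: searchable fields not stored; the whole record stored once as an opaque payload -->
  <field name="title"   type="text"   indexed="true"  stored="false"/>
  <field name="price"   type="tfloat" indexed="true"  stored="false"/>
  <field name="payload" type="string" indexed="false" stored="true"/>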
Query 2 Cats
Sorry if this is a poor Q, but I can't seem to get it to work.

I have a field called cat set up so I can query against specific categories.

It's OK if I search all or one, but I can't seem to make it search over multiples.

ie q=string AND cat:name1 AND cat:name2

I have tried the following variations.

cat:name1,name2
cat:name1+name2

I have also tried using & instead of AND, with still the same results.

Hope you can help!!

Thank you in advance
RE: DataImportHandler TikaEntityProcessor FieldReaderDataSource
Hi Jorg,

This is working now. If you look at SOLR-1583 (http://issues.apache.org/jira/browse/SOLR-1583) you can see that an InputStream was needed from the DataSource for file and URL data sources. The same is true for the FieldReaderDataSource. I created a class, BinFieldReaderDataSource, that returns an InputStream rather than a Reader for the BLOB.

I am working off the trunk code from a few days ago, which I checked out using TortoiseSVN and compiled using the ant that was in my Eclipse plugin directory, a fairly painless process. I am somewhat new to open source development, so for now I have just copied the text of the java file and my xml config below.

# BinFieldReaderDataSource.java

package org.apache.solr.handler.dataimport;

import static org.apache.solr.handler.dataimport.DataImportHandlerException.SEVERE;
import static org.apache.solr.handler.dataimport.DataImportHandlerException.wrapAndThrow;

import java.io.InputStream;
import java.io.Reader;
import java.io.UnsupportedEncodingException;
import java.lang.reflect.InvocationTargetException;
import java.lang.reflect.Method;
import java.lang.reflect.Modifier;
import java.sql.Blob;
import java.sql.Clob;
import java.util.Properties;

import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

public class BinFieldReaderDataSource extends DataSource<InputStream> {
  private static final Logger LOG = LoggerFactory.getLogger(FieldReaderDataSource.class);

  protected VariableResolver vr;
  protected String dataField;
  private String encoding;
  private EntityProcessorWrapper entityProcessor;

  public void init(Context context, Properties initProps) {
    dataField = context.getEntityAttribute("dataField");
    encoding = context.getEntityAttribute("encoding");
    entityProcessor = (EntityProcessorWrapper) context.getEntityProcessor();
    /* no op */
  }

  public InputStream getData(String query) {
    Object o = entityProcessor.getVariableResolver().resolve(dataField);
    if (o == null) {
      throw new DataImportHandlerException(SEVERE, "No field available for name : " + dataField);
    }
    if (o instanceof String) {
      throw new DataImportHandlerException(SEVERE, "Unsupported field type: String");
    } else if (o instanceof Clob) {
      throw new DataImportHandlerException(SEVERE, "Unsupported field type: CLOB");
    } else if (o instanceof Blob) {
      Blob blob = (Blob) o;
      try {
        // Most of the JDBC drivers have getBinaryStream defined as public
        // so let us just check it
        Method m = blob.getClass().getDeclaredMethod("getBinaryStream");
        if (Modifier.isPublic(m.getModifiers())) {
          return getInputStream(m, blob);
        } else {
          // force invoke
          m.setAccessible(true);
          return getInputStream(m, blob);
        }
      } catch (Exception e) {
        LOG.info("Unable to get data from BLOB");
        return null;
      }
    } else {
      return null;
    }
  }

  static Reader readCharStream(Clob clob) {
    try {
      Method m = clob.getClass().getDeclaredMethod("getCharacterStream");
      if (Modifier.isPublic(m.getModifiers())) {
        return (Reader) m.invoke(clob);
      } else {
        // force invoke
        m.setAccessible(true);
        return (Reader) m.invoke(clob);
      }
    } catch (Exception e) {
      wrapAndThrow(SEVERE, e, "Unable to get reader from clob");
      return null; // unreachable
    }
  }

  private InputStream getInputStream(Method m, Blob blob)
      throws IllegalAccessException, InvocationTargetException, UnsupportedEncodingException {
    InputStream is = (InputStream) m.invoke(blob);
    return is;
  }

  public void close() {
  }
}

## Tika-data-config.xml

Nirmal Shah

-----Original Message-----
From: Jorg Heymans [mailto:jorg.heym...@gmail.com]
Sent: Tuesday, January 26, 2010 3:43 AM
To: solr-user@lucene.apache.org
Subject: Re: DataImportHandler TikaEntityProcessor FieldReaderDataSource

Hi Shah,

I am assuming you are talking about the integration of SOLR-1358, i am very interested in this feature as well. Did you get it to work ? Is there a snapshot build available for this somewhere or do i have to build solr from source myself ?

Thanks,
Jorg

On Mon, Jan 25, 2010 at 6:27 PM, Shah, Nirmal wrote: > Hi, > > > > I am
Re: To store or not to store serialized objects in solr
Hello Andre, We have used this approach before. We did keep all our data in a RDBMS but added serialized objects to the index so we could simply query the record and display it as is, without any hassle and SQL connections. Although storing this data sounds a bit strange, it actually works well and keeps things a bit simpler. The performance of querying the index is the same (or with extremely tiny differences). However, it does take some additional disk space and transfer time for it to reach your application. On the other hand, performance would surely be weaker if you would transfer the same data (although in a not so verbose XML format) and need to connect and query a SQL server. Cheers, Andre Parodi said: > Hi, > > We currently are storing all of our data in sql database and use solr > for indexing. We get a list of id's from solr and retrieve the data from > the db. > > We are considering storing all the data in solr to simplify > administration and remove any synchronisation and are considering the > following: > > 1. storing the data in individual fields in solr (indexed=true, > store=true) 2. storing the data in a serialized form in a binary field > in solr (using google proto buffers or similar) and keep the rest of > the solr fields as indexed=true, stored=*false*. > 3. keep as is. data stored in db and just keep solr fields as > indexed=true, stored=false > > Can anyone provide some advice in terms of performance of the different > approaches. Are there any obvious pitfalls to option 1 and 2 that i need > to be mindful of? > > I am thinking option 2 would be the fastest as it would be reading the > data in one contiguous block. Will be doing some preformance test to > verify this soon. > > FYI we are looking at 5-10M records, a serialised object is 500 to 1000 > bytes and we index approx 20 fields. > > Thanks for any advice. > andre
Re: Query 2 Cats
Tell us more about the cat field. Is there one (and only one) value per document? Or are there multiple values per document? Because if there's only one cat value/doc, you want something like q=string AND (cat:name1 OR cat:name2) Erick On Tue, Jan 26, 2010 at 1:52 PM, Lee Smith wrote: > Sorry of this is a poor Q but cant seem to get it to work. > > I have a field called cat setup so I can query against specific categories. > > It ok I search all or one but cant seem to make it search over multiples. > > ie q=string AND cat:name1 AND cat:name2 > > I have tried the following variations. > > cat:name1,name2 > cat:name1+name2 > > I have also tried using & instead of AND with still same results. > > Hope you can help !! > > Thank you in advance > >
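Two equivalent ways to write the same thing, assuming cat is the only constraint besides the keyword (field grouping and filter queries are both standard syntax):

  q=string AND cat:(name1 OR name2)
  q=string&fq=cat:(name1 OR name2)

The fq form has the side benefit that the category filter is cached in the filterCache independently of the keyword part of the query.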
Re: Query 2 Cats
Try > q=string AND (cat:name1 OR cat:name2) On 26 Jan 2010, at 18:53, "Lee Smith" wrote: > Sorry of this is a poor Q but cant seem to get it to work. > > I have a field called cat setup so I can query against specific > categories. > > It ok I search all or one but cant seem to make it search over > multiples. > > ie q=string AND cat:name1 AND cat:name2 > > I have tried the following variations. > > cat:name1,name2 > cat:name1+name2 > > I have also tried using & instead of AND with still same results. > > Hope you can help !! > > Thank you in advance >
Re: Query 2 Cats
Thank you Dave, Eric Worked a charm On 26 Jan 2010, at 18:58, Dave Searle wrote: > Try > >> q=string AND (cat:name1 OR cat:name2) > > > On 26 Jan 2010, at 18:53, "Lee Smith" wrote: > >> Sorry of this is a poor Q but cant seem to get it to work. >> >> I have a field called cat setup so I can query against specific >> categories. >> >> It ok I search all or one but cant seem to make it search over >> multiples. >> >> ie q=string AND cat:name1 AND cat:name2 >> >> I have tried the following variations. >> >> cat:name1,name2 >> cat:name1+name2 >> >> I have also tried using & instead of AND with still same results. >> >> Hope you can help !! >> >> Thank you in advance >>
Basic questions about Solr cost in programming time
Hi, I hope this message is OK for this list. I'm looking into search solutions for an intranet site built with Drupal. Eventually we'd like to scale to enterprise search, which would include the Drupal site, a document repository, and Jive SBS (collaboration software). I'm interested in Lucene/Solr because of its scalability, faceted search and optimization features, and because it is free. Our problem is that we are a non-profit organization with only three very busy programmers/sys admins supporting our employees around the world. To help me argue for Solr in terms of total cost, I'm hoping that members of this list can share their insights about the following: * About how many hours of programming did it take you to set up your instance of Lucene/Solr (not counting time spent on optimization)? * Are there any disadvantages of going with a certified distribution rather than the standard distribution? Thanks and best regards, Jeff Jeff Crump jcr...@hq.mercycorps.org
Re: Basic questions about Solr cost in programming time
On Tue, Jan 26, 2010 at 3:00 PM, Jeff Crump wrote: > Hi, > I hope this message is OK for this list. > > I'm looking into search solutions for an intranet site built with Drupal. > Eventually we'd like to scale to enterprise search, which would include the > Drupal site, a document repository, and Jive SBS (collaboration software). > I'm interested in Lucene/Solr because of its scalability, faceted search > and > optimization features, and because it is free. Our problem is that we are a > non-profit organization with only three very busy programmers/sys admins > supporting our employees around the world. > > To help me argue for Solr in terms of total cost, I'm hoping that members > of > this list can share their insights about the following: > > * About how many hours of programming did it take you to set up your > instance of Lucene/Solr (not counting time spent on optimization)? > > For me this generally took 30 to 70 hours to create the entire search application depending on the features on the web application and the complexity of the site. > * Are there any disadvantages of going with a certified distribution rather > than the standard distribution? > > > The people at Lucid Imagination can probably provide a better answer for this. It is not really a disadvantage to go with the certified version but you may have to pay in order to get the certified distribution. However, you will get dedicated support if you happen to run into any issues or need technical assistance. If you use the standard version you can always get help from the mailing list if you have any issues. > Thanks and best regards, > Jeff > > Jeff Crump > jcr...@hq.mercycorps.org > > > > > > > > > > > -- "Good Enough" is not good enough. To give anything less than your best is to sacrifice the gift. Quality First. Measure Twice. Cut Once. http://www.israelekpo.com/
SOLR index file system size estimate
We want to estimate the file system size requirements for the index. Although disk space is usually cheap, it isn't so here, as we have to go through a process to add space to the file system, and we don't want to underestimate and have to kick that process off again. Is there an estimation tool that can give a number based on the estimated size of each document? What percentage should we add to the raw document size, considering we run all kinds of analysis/filters on the text? We currently have only 70 documents of about 20k each, but the number of documents will soon grow to more than 10K. We would like to request space with the future in mind. Any help is appreciated. Thanks, Pavan.
RE: matching exact/whole phrase
Extending this thread. Is it safe to say in order to do exact matches the field should be a string. Let say for example i have two fields on is caption which is of type string and the other is regular text. So if i index caption as "my car is the best car in the world" it will be stored and i copy the caption to the text field. Since text has all anylysers defined so lets assume only the following words are indexed after stop words and other filters "my", "car","best","world" Now in my dismax handler if i have the qf defined as text field and run a phrase search on text field "my car is the best car in the world" i dont get back any results. looking with debugQuery=on this is the parsedQuery text:"my tire pressure warning light came my honda civic" This will not work since text was indexed by removing all stop words. But if i remove the double quotes it matches that document. Now if i add extra query field &qf=caption and then do a phrase search i get back that document since caption is of type string and it maintains all the stop words and other stuff. Is my assumption correct. After i get a response i will put some more questions. Thanks darniz Sandeep Shetty-2 wrote: > > That was the answer I was looking for, I will try that one out > > Thanks Daniel > > -Original Message- > From: Daniel Papasian [mailto:daniel.papas...@chronicle.com] > Sent: 01 April 2008 16:03 > To: solr-user@lucene.apache.org > Subject: Re: matching exact/whole phrase > > Sandeep Shetty wrote: >> Hi people, >> >> I am looking to provide exact phrase match, along with the full text >> search with solr. I want to achieve the same effect in solr rather >> than use a separate SQL query. I want to do the following as an >> example >> >> The indexed field has the text "car repair" (without the double >> quotes) for a document and I want this document to come in the >> search result only if someone searches for "car repair". The document >> should not show up for "repair" and "car" searches. >> >> Is it possible to do this type of exact phrase matching if needed >> with solr itself? > > It sounds like you want to do an exact string match, and not a text > match, so I don't think there's anything complex you'd need to do... > just store the field with "car repair" as type="string" and do all of > the literal searches you want. > > But if you are working off a field that contains something beyond the > exact match of what you want to search for, you'll just need to define a > new field type and use only the analysis filters that you need, and > you'll have to think more about what you need if that's the case. > > Daniel > > Sandeep Shetty > Technical Development Manager > > Touch Local > 89 Albert Embankment, London, SE1 7TP, UK > D: 020 7840 4335 > E: sandeep.she...@touchlocal.com > T: 020 7840 4300 > F: 020 7840 4301 > > This email is confidential and may also be privileged. If you are not the > intended recipient please notify us immediately by calling 020 7840 4300 > or email postmas...@touchlocal.com. You should not copy it or use it for > any purpose nor disclose its contents to any other person. Touch Local Ltd > cannot accept liability for statements made which are clearly the sender's > own and are not made on behalf of the firm. > Registered in England and Wales. Registration Number: 2885607 VAT Number: > GB896112114 > > Help to save some trees. Print e-mails only if you really need to. 
> > -- View this message in context: http://old.nabble.com/matching-exact-whole-phrase-tp16424969p27329651.html Sent from the Solr - User mailing list archive at Nabble.com.
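To make the arrangement darniz describes concrete, a minimal schema.xml sketch (the field names follow the thread; the rest is illustrative): keep the literal value in a string field for exact matches and copy it into an analyzed text field for free-text search.

  <field name="caption" type="string" indexed="true" stored="true"/>
  <field name="text" type="text" indexed="true" stored="false" multiValued="true"/>
  <copyField source="caption" dest="text"/>

A query that must match the caption verbatim can then be pointed at the string field, e.g. fq=caption:"my car is the best car in the world", while ordinary keyword queries go against the analyzed text field.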
How to Create dynamic field names using script transformers
Hi, I am trying to generate a dynamic field name using custom transformers but couldn't achieve the expected results. My requirement is that I do not want to hardcode some of the field names used by SOLR for indexing; instead, the field names should be generated from data retrieved from a table. Any help in this regard is greatly appreciated. Thanks, Barani -- View this message in context: http://old.nabble.com/How-to-Create-dynamic-field-names-using-script-transformers-tp27329876p27329876.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: How to Create dynamic field names using script transformers
Barani - Give us some details of what you tried, what you expected to happen, and what actually happened. Erik On Jan 26, 2010, at 4:15 PM, JavaGuy84 wrote: Hi, I am trying to generate a dynamic fieldname using custom transformers but couldn't achieve the expected results. My requirement is that I do not want to hardcode some of field names used by SOLR for indexing, instead the field name should be generated using the data retreieved from a table. Any help on this regard is greatly appreciated. Thanks, Barani -- View this message in context: http://old.nabble.com/How-to-Create-dynamic-field-names-using-script-transformers-tp27329876p27329876.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: How to Create dynamic field names using script transformers
Hey Erik, Thanks a lot for your reply.. I am a newbie to SOLR ... I am just trying to use the example present in Apache WIKI to understand "how" the scriptTransformer works. I want to know how to pass the data from table.field to transformer and get back the data from transformer and set the value to any field. Basically I want a field like... and index this field so that users can search on this dynamic field and get the corresponding data also. Thanks, Barani Erik Hatcher-4 wrote: > > Barani - > > Give us some details of what you tried, what you expected to happen, > and what actually happened. > > Erik > > > On Jan 26, 2010, at 4:15 PM, JavaGuy84 wrote: > >> >> Hi, >> >> I am trying to generate a dynamic fieldname using custom >> transformers but >> couldn't achieve the expected results. >> >> My requirement is that I do not want to hardcode some of field names >> used by >> SOLR for indexing, instead the field name should be generated using >> the data >> retreieved from a table. >> >> Any help on this regard is greatly appreciated. >> >> Thanks, >> Barani >> -- >> View this message in context: >> http://old.nabble.com/How-to-Create-dynamic-field-names-using-script-transformers-tp27329876p27329876.html >> Sent from the Solr - User mailing list archive at Nabble.com. >> > > > -- View this message in context: http://old.nabble.com/How-to-Create-dynamic-field-names-using-script-transformers-tp27329876p27330330.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: How to Create dynamic field names using script transformers
To add some more details, this is what I am trying to achieve: there are 2 fields present in a database table and I am trying to turn those 2 fields into a key/value pair. E.g.: consider 2 fields associated with each other (Propertyid and propertyValue); I want the property id as the field name and the property value as its field value... something like <111>Test<1> Thanks, Barani JavaGuy84 wrote: > Hi, > > I am trying to generate a dynamic fieldname using custom transformers but > couldn't achieve the expected results. > > My requirement is that I do not want to hardcode some of field names used > by SOLR for indexing, instead the field name should be generated using the > data retreieved from a table. > > Any help on this regard is greatly appreciated. > > Thanks, > Barani > -- View this message in context: http://old.nabble.com/How-to-Create-dynamic-field-names-using-script-transformers-tp27329876p27330470.html Sent from the Solr - User mailing list archive at Nabble.com.
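One way to turn key/value rows into field names is DIH's ScriptTransformer; the sketch below follows the wiki's general pattern, and the data source, column names, and prop_ prefix are assumptions rather than anything confirmed in the thread (the script engine needs Java 6).

  <dataConfig>
    <dataSource driver="your.jdbc.Driver" url="jdbc:..." />  <!-- placeholder connection -->
    <script><![CDATA[
      function makeDynamicField(row) {
          // use one column's value as the field name and the other's as the field value
          row.put('prop_' + row.get('PROPERTYID'), row.get('PROPERTYVALUE'));
          row.remove('PROPERTYID');
          row.remove('PROPERTYVALUE');
          return row;
      }
    ]]></script>
    <document>
      <entity name="props" transformer="script:makeDynamicField"
              query="select propertyid, propertyvalue from properties"/>
    </document>
  </dataConfig>

schema.xml then needs a matching dynamic field so the generated names are accepted, e.g. <dynamicField name="prop_*" type="string" indexed="true" stored="true"/>.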
RE: determine which value produced a hit in multivalued field type
I guess it's not possible for all types then: int, sdate, etc. Because, Highlighting will only work on text fields. -Original Message- From: Lance Norskog [mailto:goks...@gmail.com] Sent: Monday, January 25, 2010 3:47 PM To: solr-user@lucene.apache.org Subject: Re: determine which value produced a hit in multivalued field type Thanks Erik, I did not know about the order guarantee for indexed multivalue fields. Timothy, it could be more than one term matches the queries. Highlighting will show you which terms matched your query. You'll have to post-process the results. On Mon, Jan 25, 2010 at 7:26 AM, Harsch, Timothy J. (ARC-TI)[PEROT SYSTEMS] wrote: > If a simple "no" is the answer I'd be glad if anyone could confirm. > > Thanks. > > -Original Message- > From: Harsch, Timothy J. (ARC-TI)[PEROT SYSTEMS] > [mailto:timothy.j.har...@nasa.gov] > Sent: Friday, January 22, 2010 2:53 PM > To: solr-user@lucene.apache.org > Subject: determine which value produced a hit in multivalued field type > > Hi, > If I have a multiValued field type of text, and I put values > [cat,dog,green,blue] in it. Is there a way to tell when I execute a query > against that field for dog, that it was in the 1st element position for that > multiValued field? > > Thanks! > Tim > > -- Lance Norskog goks...@gmail.com
Re: SOLR index file system size estimate
10K documents of 20K each is only 200M as a base, so I don't think you need to worry. Especially since your question is unanswerable given the number of variables About the only thing you can really do is measure, with the understanding that the first documents are more expensive space-wise than later documents. So, assuming your documents are similar, index the first 5,000, then index the next 2000 and use the size delta to calculate the average index growth/document. That'll give you a pretty good idea in *your* environment with *your* index structure.. But, again, this is not much data to index, so I really think you'll be fine. HTH Erick On Tue, Jan 26, 2010 at 3:41 PM, SHS SOLR wrote: > We wanted to estimate the file system size requirements for index. Although > space very cheap, its not so here as we have to go through a process to add > space to the file system. So we don't want to end up estimating less and > get > the process to kick in. > > Is there a estimate tool for index sizes that can give a number based on > estimated size of each document? How much % should we add to the actual > document size considering we do all kinds of analysis/filters on text? > > We are currently looking at only 70 documents each 20k size. But the number > of documents will increase to more than 10K soon. We would like to request > for some space keeping in mind about the future. > > Any help is appreciated. > > Thanks, > Pavan. >
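A cheap way to run the measurement Erick describes is to watch the index directory on disk between batches (the path below is the stock example layout; adjust for your install, and commit before each measurement so segments are flushed):

  $ du -sh example/solr/data/index   # after the first 5,000 docs
  $ du -sh example/solr/data/index   # after the next 2,000 docs

The difference divided by 2,000 gives a rough bytes-per-document figure to multiply by the document count you expect; it is also worth leaving headroom, since an optimize can transiently need on the order of twice the index size while segments are merged.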
Re: Dynamic boosting of ids at search time
: I mean, if for query x, ids to be boosted are 243452,346563,773567, then for : query y the ids to be boosted won't be the same. They are calculated at the : search time. : Also, I cant keep them in the lucene query as the list goes in thousands. : Please suggest a good resolution to it. I'm at a loss here ... your first sentence seems to suggest that every unique request needs to specify a distinct list of IDs to give a bosted score too, but your second sentence clarifies that it's infeasible for you to include the IDs in the query. that seems tantamount to saying "everytime i do a solr search, the rules about what is important change; but the rules are too long for me to tell solr what they are everytime i do a search." ... that's a catch-22. My best suggestion based on what little i understand of the information you're provided is to suggest that perhaps you could write a custom plugin ... either a RequestHandler, or a SearchComponent, or a QParser depending on what works best for your use cases ... where the client might be able to pass some "key" that can be used by the plugin to "look up" the list of IDs from some other data source and to build the query that way. ...but given how little i understnad about what it is you are trying to do, i suspect my best guess really isnt' a very good one. Frankly, this is starting to smell like an XY Problem http://people.apache.org/~hossman/#xyproblem XY Problem Your question appears to be an "XY Problem" ... that is: you are dealing with "X", you are assuming "Y" will help you, and you are asking about "Y" without giving more details about the "X" so that we can understand the full issue. Perhaps the best solution doesn't involve "Y" at all? See Also: http://www.perlmonks.org/index.pl?node_id=542341 -Hoss
Re: Comparison of Solr with Sharepoint Search
: Has anyone done a functionality comparison of Solr with Sharepoint/Fast : Search? there's been some discussion on this over the years comparing Solr with FAST if you go looking for it... http://old.nabble.com/SOLR-X-FAST-to14284618.html http://old.nabble.com/Replacing-FAST-functionality-at-sesam.no-td19186109.html http://old.nabble.com/Experiences-from-migrating-from-FAST-to-Solr-td26371613.html http://sesat.no/moving-from-fast-to-solr-review.html ...i have no idea about Sharepoint Search (isn't that actaully a seperate system? ... Microsoft Search Server or something?) -Hoss
Re: How can I boost bq in FieldQParserPlugin?
: My original query is: : http://myhost:8080/solr/select?q=ipod&*bq=userId:12345^0.5* : &fq=&start=0&rows=10&fl=*%2Cscore&qt=dismax&wt=standard&debugQuery=on&explainOther=&hl.fl= : But I would like to place bq phrase in the default solrconfig.xml : configuration to make the query string more brief, so I did the following? : http://myhost:8080/solr/select?q=ipod&*bq={!field f=userId v=$qq}&qq=12345* : However, filedQueryParser doesn't accespt a boost parameter, then what shall ...the issue is not that "filedQueryParser doesn't accespt a boost parameter"; the problem is that the weight syntax from your original bq (the "^0.5" part) is actually syntax from the standard parser -- and you aren't using that parser any more (the distinction between query syntax and params is significant). I haven't tried this, but I think it might do what you want... q=ipod&bq={!dismax qf=userId^0.5 v=$qq}&qq=12345&qt=dismax ...but you might have to put other blank params inside that {!dismax} block to keep them from getting inherited from the outer query (I can't remember how that logic works off the top of my head) -Hoss
Re: Design Question - Dynamic Field Names (*)
: - We are indexing CSV files and generating field names dynamically from the : "header" line. : User should be able to *list all the possible header names* (i.e. dynamic : field names), and filter results based on some of the field names. : - Also, list* all possible values* associated to for a given field name. #1) the LukeRequestHandler can list all field names in the index. #2) the TermsComponent or Faceting can list all *indexed* values in a given field ... which one you'll want to use depends largely on what you want to do with that list. -Hoss
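For reference, against the example server both of those look roughly like this (core URL and field name are placeholders):

  http://localhost:8983/solr/admin/luke?numTerms=0
  http://localhost:8983/solr/select?q=*:*&rows=0&facet=true&facet.field=headerName&facet.limit=-1

The first returns every field actually present in the index without the per-field top-terms computation; the second returns all indexed values of one field along with their document counts.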
Multiple Cores Vs. Single Core for the following use case
Hi Shall I set up Multiple Core or Single core for the following use case: I have X number of users. When I do a search, I always know for which user I am doing a search Shall I set up X cores, 1 for each user ? Or shall I set up 1 core and add a userId field to each document? If I choose the 1 core solution then I am concerned with performance. Let's say I search for "NewYork" ... If lucene returns all "New York" matches for all users and then filters based on the userId, then this is going to be less efficient than if I have sharded per user and send the request for "New York" to the user's core Thank you for your help matt
RE: Solr wiki link broken
: You are right. The wiki can't be read if the preferred language is not English. : The wiki system seems to implement or be configured to use a wrong way of choosing its locale. : Erik, let me know if I can help solving this. Interesting. When accessing "http://wiki.apache.org/solr/" MoinMoin evidently picks a "translated" version of the page to show each user based on the "Accept-Language" header sent by the browser. If it's "en" or unset, you get the same thing as http://wiki.apache.org/solr/FrontPage -- but if you have some other preferred language configured in your browser, then you get a different page; for example "de" causes http://wiki.apache.org/solr/StartSeite to be loaded instead. (This behavior can be forced in spite of the "Accept-Language" header sent by the browser if you are logged into the wiki and change the "Preferred language" setting from "" to something else ... but I don't recommend it, since I was stuck with German for about 10 minutes and got 500 errors every time I tried to change my preferences back.) This is presumably designed to make it easy to support a multilanguage wiki, with users getting language-specific "homepages" that can then link out to language-specific versions of pages -- but that doesn't really help us much since we don't have any meaningful content on those language-specific homepages. According to this... http://wiki.apache.org/solr/HelpOnLanguages ...we should be deleting all those unused pages, or have INFRA change our wiki config so that something other than FrontPage is our default (which now explains why Lucene-Java has "FrontPageEN" as the default). Any volunteers to help purge the wiki of (effectively) blank translation pages? ... it looks like they all (probably) have the comment "##master-page:FrontPage" at the top, so they should be easy to identify even if you don't speak the language ... but they aren't very easy to search for, since those comments don't appear in the generated page. -Hoss
How to index the fields as key value pair if a query returns multiple rows
Hi all, I have a scenario where a particular query returns multiple results and I need to map those results as a key value pair. Ex: http://old.nabble.com/How-to-index-the-fields-as-key-value-pair-if-a-query-returns-multiple-rows-tp27332475p27332475.html Sent from the Solr - User mailing list archive at Nabble.com.
RE: Comparison of Solr with Sharepoint Search
I can only tell that Liferay Portal (WebDAV) Document Library Portlet has same functionality as Sharepoint (it has even /servlet/ URL with suffix '/sharepoint'); Liferay also has plugin (web-hook) for SOLR (it has generic search wrapper; any kind of search service provider can be hooked in Liferay) All assets (web content, message board posts, documents, and etc.) can implement "indexing" interface and get indexed (Lucene, SOLR, etc) So far, it is the best approach. You can enjoy configuring SOLR analyzers/fields/language/stemmers/dictionaries/... You can't do it with MS-Sharepoint (or, for instance, their close competitors Alfresco)!!! -Fuad http://www.tokenizer.ca > -Original Message- > From: Chris Hostetter [mailto:hossman_luc...@fucit.org] > Sent: January-26-10 7:49 PM > To: solr-user@lucene.apache.org > Subject: Re: Comparison of Solr with Sharepoint Search > > > : Has anyone done a functionality comparison of Solr with > Sharepoint/Fast > : Search? > > there's been some discussion on this over the years comparing Solr with > FAST if you go looking for it... > > http://old.nabble.com/SOLR-X-FAST-to14284618.html > http://old.nabble.com/Replacing-FAST-functionality-at-sesam.no- > td19186109.html > http://old.nabble.com/Experiences-from-migrating-from-FAST-to-Solr- > td26371613.html > http://sesat.no/moving-from-fast-to-solr-review.html > > ...i have no idea about Sharepoint Search (isn't that actaully a > seperate > system? ... Microsoft Search Server or something?) > > > -Hoss
Re: Basic questions about Solr cost in programming time
Having worked quite a bit on the Drupal integration - here's my quick take: If you have someone help you the first time, you can have a basic implementation running in Jetty in about 15 minutes. On your own, a couple hours maybe. For a non-public site (intranet) with modest traffic and no requirements for high availability, that is likely going to hold you for a while. If you are not already using tomcat6 and want a more robust deployment, getting that right will take you a couple days work I'd guess. There are already some options for indexing/searching documents via the Drupal integration, but that's still a little rough. Of course, we'd also be happy to have you get Drupal support and a hosted Solr index from us at Acquia. http://acquia.com/products-services/acquia-search-features However, I don't think you'll readily be able to use our service with Jive at the moment - you don't really describe why you'd be using both Jive and Drupal. If you are not doing any customization and compiling the java isn't something you enjoy, I'd think the certified distribution is a fine place to start and you can get with it Lucid's free PDF book, which is, I think, by far the best and most comprehensive Solr 1.4 reference work that exists at the moment. -Peter On Tue, Jan 26, 2010 at 3:00 PM, Jeff Crump wrote: > Hi, > I hope this message is OK for this list. > > I'm looking into search solutions for an intranet site built with Drupal. > Eventually we'd like to scale to enterprise search, which would include the > Drupal site, a document repository, and Jive SBS (collaboration software). > I'm interested in Lucene/Solr because of its scalability, faceted search and > optimization features, and because it is free. Our problem is that we are a > non-profit organization with only three very busy programmers/sys admins > supporting our employees around the world. > > To help me argue for Solr in terms of total cost, I'm hoping that members of > this list can share their insights about the following: > > * About how many hours of programming did it take you to set up your > instance of Lucene/Solr (not counting time spent on optimization)? > > * Are there any disadvantages of going with a certified distribution rather > than the standard distribution? > > > Thanks and best regards, > Jeff > > Jeff Crump > jcr...@hq.mercycorps.org > > > > > > > > > > > -- Peter M. Wolanin, Ph.D. Momentum Specialist, Acquia. Inc. peter.wola...@acquia.com
Re: Solr 1.4 - stats page slow
Sorry for not following up sooner- been a busy last couple weeks. We do see a significant instanity count - could this be due to updating indexes from the dev Solr build? E.g. on one server I see 61 and entries like: SUBREADER: Found caches for decendents of org.apache.lucene.index.readonlydirectoryrea...@2b8d6cbf+created 'org.apache.lucene.index.readonlydirectoryrea...@2b8d6cbf'=>'created',class org.apache.lucene.search.FieldCache$StringIndex,null=>org.apache.lucene.search.FieldCache$StringIndex#2002656056 (size =~ 74.4 KB) 'org.apache.lucene.store.niofsdirectory$niofsindexin...@47adeb94'=>'created',class org.apache.lucene.search.FieldCache$StringIndex,null=>org.apache.lucene.search.FieldCache$StringIndex#1099177573 (size =~ 74.4 KB) SUBREADER: Found caches for decendents of org.apache.lucene.index.readonlydirectoryrea...@d0340a9+created 'org.apache.lucene.index.readonlydirectoryrea...@d0340a9'=>'created',class org.apache.lucene.search.FieldCache$StringIndex,null=>org.apache.lucene.search.FieldCache$StringIndex#868132357 (size =~ 831.2 KB) 'org.apache.lucene.store.niofsdirectory$niofsindexin...@78802615'=>'created',class org.apache.lucene.search.FieldCache$StringIndex,null=>org.apache.lucene.search.FieldCache$StringIndex#1542727931 (size =~ 831.2 KB) And I think it's higher on the one associated with the screenshot. using the lucene checkIndex tool does not show any errors. Most of what we want is returned by the Luke handler, except for the pending adds and deletes and the index size. I can hack around this by creating a greatly reduced stats.jsp, but I'd also liek to understand what we are experiencing. -Peter On Fri, Jan 8, 2010 at 1:38 PM, Mark Miller wrote: > Yonik Seeley wrote: >> On Fri, Jan 8, 2010 at 1:03 PM, Mark Miller wrote: >> >>> It should be fixed in trunk, but that was after 1.4. Currently, it >>> should only do it if it sees insanity - which there shouldn't be any >>> with stock Solr. >>> >> >> http://svn.apache.org/viewvc/lucene/solr/tags/release-1.4.0/src/java/org/apache/solr/search/SolrFieldCacheMBean.java >> http://svn.apache.org/viewvc?view=revision&revision=826788 >> Seems like it's there? Or was it a different commit? >> >> Perhaps there is just real instanity... which may be unavoidable at >> this point since not everything in solr is done per-segment yet. >> >> -Yonik >> http://www.lucidimagination.com >> > > Your right - when looking at the Solr release date, I quickly took the > 10 as October - but it was 11/10, so it is in 1.4. > > So people seeing this should also being seeing an insanity count over one. > > I'd think that would be rarer than one this sounds like though ... whats > left that could cause insanity? > > We should prob switch to never calculating the size unless an explicit > param is pass to the stats page. > > > -- > - Mark > > http://www.lucidimagination.com > > > > -- Peter M. Wolanin, Ph.D. Momentum Specialist, Acquia. Inc. peter.wola...@acquia.com
Re: schema.xml and Xinclude
It doesn't really work with the schema.xml - I beat my head on it for a few hours not long ago - maybe I sent an e-mail to this list about it? Yes, here: http://www.lucidimagination.com/search/document/ba68aa6f2f7702c3/is_it_possible_to_use_xinclude_in_schema_xml -Peter On Wed, Jan 6, 2010 at 8:36 AM, Patrick Sauts wrote: > As in schema.xml are the same between all our indexes, I'd like to > make them an XInclude so I tried : > > > > xmlns:xi="http://www.w3.org/2001/XInclude";> > > > > - > - > - > > > My Syntax might not be correct ? > Or it is not possible ? yet ? > > Thank you again for your time. > > Patrick. > -- Peter M. Wolanin, Ph.D. Momentum Specialist, Acquia. Inc. peter.wola...@acquia.com
Re: Solr 1.4 - stats page slow
On Tue, Jan 26, 2010 at 8:49 PM, Peter Wolanin wrote: > Sorry for not following up sooner- been a busy last couple weeks. > > We do see a significant instanity count - could this be due to > updating indexes from the dev Solr build? E.g. on one server I see Do you both sort (or use a function query) and facet on the "created" field? Faceting on single-valued fields is still currently done at the top-level reader, while sorting and function queries are at a segment level. -Yonik http://www.lucidimagination.com
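For anyone else reading along, the pattern being asked about is a request that both sorts and facets on the same single-valued field, e.g. (illustrative URL):

  http://localhost:8983/solr/select?q=*:*&sort=created+asc&facet=true&facet.field=created

On 1.4 the facet counts use a top-level FieldCache entry while the sort uses per-segment entries, so the sanity checker reports the overlap under SUBREADER even though nothing is broken; the practical cost is that the field's values end up cached twice.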
Re: Multiple Cores Vs. Single Core for the following use case
Hi Matt, In most cases you are going to be better off going with the userid method unless you have a very small number of users and a very large number of docs/user. The userid method will likely be much easier to manage, as you won't have to spin up a new core every time you add a new user. I would start here and see if the performance is good enough for your requirements before you start worrying about it not being efficient. That being said, I really don't have any idea what your data looks like. How many users do you have? How many documents per user? Are any documents shared by multiple users? -Trey On Tue, Jan 26, 2010 at 7:27 PM, Matthieu Labour wrote: > Hi > > > > Shall I set up Multiple Core or Single core for the following use case: > > > > I have X number of users. > > > > When I do a search, I always know for which user I am doing a search > > > > Shall I set up X cores, 1 for each user ? Or shall I set up 1 core and add > a userId field to each document? > > > > If I choose the 1 core solution then I am concerned with performance. > Let's say I search for "NewYork" ... If lucene returns all "New York" > matches for all users and then filters based on the userId, then this > is going to be less efficient than if I have sharded per user and send > the request for "New York" to the user's core > > > > Thank you for your help > > > > matt > > > > > > >
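On the efficiency concern from the original question: in the single-core setup the per-user restriction is normally sent as a filter query rather than folded into the main query, so Solr intersects the query with a cached per-user bitset instead of matching every user's documents and discarding most of them. A sketch with made-up values:

  http://localhost:8983/solr/select?q=new+york&fq=userId:12345

The first request for a given userId pays to build the filter; subsequent requests for that user hit the filterCache.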
Re: NullPointerException in ReplicationHandler.postCommit + question about compression
never keep a 0. It is better to leave not mention the deletionPolicy at all. The defaults are usually fine. On Fri, Jan 22, 2010 at 11:12 AM, Stephen Weiss wrote: > Hi Shalin, > > Thanks for your reply. Please see below. > > > On Jan 18, 2010, at 4:19 AM, Shalin Shekhar Mangar wrote: > >> On Wed, Jan 13, 2010 at 12:51 AM, Stephen Weiss >> wrote: >> ... > >>> When we replicate >>> manually (via the admin page) things seem to go well. However, when >>> replication is triggered by a commit event on the master, the master gets >>> a >>> NullPointerException and no replication seems to take place. >>> >>> SEVERE: java.lang.NullPointerException at org.apache.solr.handler.ReplicationHandler$4.postCommit(ReplicationHandler.java:922) at... >>> >>> Does anyone know off the top of their head what this might indicate, or >>> know what further troubleshooting steps we should be taking to isolate >>> the >>> issue? >>> >> >> That is a strange one. It looks like the latest commit point was null. Do >> you have a deletion policy section in your solrconfig.xml? Are you always >> able to reproduce the exception? > > We are always able to reproduce the exception. > > The master has committed changes many times for over a year now... so if > that's what's being reported, it's not quite accurate. > > This is our deletion policy. I don't believe that I've edited it, it is > probably verbatim from the example (the example of what version of Solr, I > can't tell you for sure, but I imagine it's from 1.2 or 1.4 - we never > updated the config when using 1.3). > >> >> 1 >> 0 >> > > (removing comments per Noble Paul's request... we keep these in the file > for our own readability purposes but agreed, we have no need to e-mail them > along) > > I would have never thought to look there but it does seem suspicious now > that you mention it. For a proper replication configuration where we > replicate on commit, is there a recommended setting? > >>> ... >>> >> During our tests we found that enabling compression on a gigabit ethernet >> actually degrades transfer rate because of the compress/de-compress >> overhead. Just comment out that line to disable compression. > > Thank you for the clarification. We will comment it out. > > -- > Steve -- - Noble Paul | Systems Architect| AOL | http://aol.com
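For anyone comparing against their own config: the deletion-policy block in the stock solrconfig.xml looks roughly like the following (values shown are the example defaults, not a recommendation), and per the advice above it is usually safer to omit the whole section and take the defaults.

  <deletionPolicy class="solr.SolrDeletionPolicy">
    <str name="maxCommitsToKeep">1</str>
    <str name="maxOptimizedCommitsToKeep">0</str>
  </deletionPolicy>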
Re: Wildcard Search and Filter in Solr
Hi just looked at the analysis.jsp and found out what it does during index / query Index Analyzer Intel intel intel intel intel intel Query Analyzer Inte* Inte* inte* inte inte inte int I think somewhere my configuration or my definition of the type "text" is wrong. This is my configuration . I think i am missing some basic configuration for doing wildcard searches . but could not figure it out . can someone help please Ahmet Arslan wrote: > > >> Hi , >> I m trying to use wildcard keywords in my search term and >> filter term . but >> i didnt get any results. >> Searched a lot but could not find any lead . >> Can someone help me in this. >> i m using solr 1.2.0 and have few records indexed with >> vendorName value as >> Intel >> >> In solr admin interface i m trying to do the search like >> this >> >> http://localhost:8983/solr/select?indent=on&version=2.2&q=intel&start=0&rows=10&fl=*%2Cscore&qt=standard&wt=standard&explainOther=&hl.fl= >> >> and i m getting the result properly >> >> but when i use q=inte* no records are returned. >> >> the same is the case for Filter Query on using >> &fq=VendorName:"Intel" i get >> my results. >> >> but on using &fq=VendorName:"Inte*" no results are >> returned. >> >> I can guess i doing mistake in few obvious things , but >> could not figure it >> out .. >> Can someone pls help me out :) :) > > If &q=intel returns documents while q=inte* does not, it means that > fieldType of your defaultSearchField is reducing the token intel into > something. > > Can you find out it by using /admin/anaysis.jsp what happens to "Intel > intel" at index and query time? > > What is your defaultSearchField? Is it VendorName? > > It is expected that &fq=VendorName:Intel returns results while > &fq=VendorName:Inte* does not. Because prefix queries are not analyzed. > > > But it is strange that q=inte* does not return anything. Maybe your index > analyzer is reducing Intel into int or ıntel? > > I am not 100% sure but solr 1.2.0 may use default locale in lowercase > operation. What is your default locale? > > It is better to see what happens word Intel using analysis.jsp page. > > > > > -- View this message in context: http://old.nabble.com/Wildcard-Search-and-Filter-in-Solr-tp27306734p27334486.html Sent from the Solr - User mailing list archive at Nabble.com.
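A common way out of the wildcard problem described here, sketched from the usual wiki advice (type and field names below are made up): copy the value into a field whose analyzer only tokenizes and lowercases, with no stemming, and run prefix queries against that field, lowercasing the term on the client side since wildcard queries are not analyzed.

  <fieldType name="text_prefix" class="solr.TextField" positionIncrementGap="100">
    <analyzer>
      <tokenizer class="solr.WhitespaceTokenizerFactory"/>
      <filter class="solr.LowerCaseFilterFactory"/>
    </analyzer>
  </fieldType>

  <field name="VendorName_prefix" type="text_prefix" indexed="true" stored="false"/>
  <copyField source="VendorName" dest="VendorName_prefix"/>

A query such as VendorName_prefix:inte* then matches documents whose indexed token is intel, which the stemmed text field cannot do.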
Re: DataImportHandler TikaEntityProcessor FieldReaderDataSource
There is no corresponding DataSurce which can be used with TikaEntityProcessor which reads from BLOB I have opened an issue.https://issues.apache.org/jira/browse/SOLR-1737 On Mon, Jan 25, 2010 at 10:57 PM, Shah, Nirmal wrote: > Hi, > > > > I am fairly new to Solr and would like to use the DIH to pull rich text > files (pdfs, etc) from BLOB fields in my database. > > > > There was a suggestion made to use the FieldReaderDataSource with the > recently commited TikaEntityProcessor. Has anyone accomplished this? > > This is my configuration, and the resulting error - I'm not sure if I'm > using the FieldReaderDataSource correctly. If anyone could shed light > on whether I am going the right direction or not, it would be > appreciated. > > > > ---Data-config.xml: > > > > > > url="jdbc:oracle:thin:un/p...@host:1521:sid" /> > > > > > > dataField="attach.attachment" format="text"> > > > > > > > > > > > > > > > > -Debug error: > > > > > > 0 > > 203 > > > > > > > > testdb-data-config.xml > > > > > > full-import > > debug > > > > > > > > > > select id as name, attachment from testtable2 > > 0:0:0.32 > > --- row #1- > > java.math.BigDecimal:2 > > oracle.sql.BLOB:oracle.sql.b...@1c8e807 > > - > > > > > > org.apache.solr.handler.dataimport.DataImportHandlerException: No > dataSource :f1 available for entity :253433571801723 Processing Document > # 1 > > at > org.apache.solr.handler.dataimport.DataImporter.getDataSourceInstance(Da > taImporter.java:279) > > at > org.apache.solr.handler.dataimport.ContextImpl.getDataSource(ContextImpl > .java:93) > > at > org.apache.solr.handler.dataimport.TikaEntityProcessor.nextRow(TikaEntit > yProcessor.java:97) > > at > org.apache.solr.handler.dataimport.EntityProcessorWrapper.nextRow(Entity > ProcessorWrapper.java:237) > > at > org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.j > ava:357) > > at > org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.j > ava:383) > > at > org.apache.solr.handler.dataimport.DocBuilder.doFullDump(DocBuilder.java > :242) > > at > org.apache.solr.handler.dataimport.DocBuilder.execute(DocBuilder.java:18 > 0) > > at > org.apache.solr.handler.dataimport.DataImporter.doFullImport(DataImporte > r.java:331) > > at > org.apache.solr.handler.dataimport.DataImporter.runCmd(DataImporter.java > :389) > > at > org.apache.solr.handler.dataimport.DataImportHandler.handleRequestBody(D > ataImportHandler.java:203) > > at > org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerB > ase.java:131) > > at > org.apache.solr.core.SolrCore.execute(SolrCore.java:1316) > > at > org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.ja > va:338) > > at > org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.j > ava:241) > > at > org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHan > dler.java:1089) > > at > org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:365) > > at > org.mortbay.jetty.security.SecurityHandler.handle(SecurityHandler.java:2 > 16) > > at > org.mortbay.jetty.servlet.SessionHandler.handle(SessionHandler.java:181) > > at > org.mortbay.jetty.handler.ContextHandler.handle(ContextHandler.java:712) > > at > org.mortbay.jetty.webapp.WebAppContext.handle(WebAppContext.java:405) > > at > org.mortbay.jetty.handler.ContextHandlerCollection.handle(ContextHandler > Collection.java:211) > > at > org.mortbay.jetty.handler.HandlerCollection.handle(HandlerCollection.jav > a:114) > > at > 
org.mortbay.jetty.handler.HandlerWrapper.handle(HandlerWrapper.java:139) > > at org.mortbay.jetty.Server.handle(Server.java:285) > > at > org.mortbay.jetty.HttpConnection.handleRequest(HttpConnection.java:502) > > at > org.mortbay.jetty.HttpConnection$RequestHandler.headerComplete(HttpConne > ction.java:821) > > at > org.mortbay.jetty.HttpParser.parseNext(HttpParser.java:513) > > at > org.mortbay.jetty.HttpParser.parseAvailable(HttpParser.java:208) > > at > org.mortbay.jetty.HttpConnection.handle(HttpConnection.java:378) > > at > org.mortbay.jetty.bio.SocketConnector$Connection.run(SocketConnector.jav > a:226) > > at > org.mortbay.thread.BoundedThreadPool$PoolThread.run(BoundedThreadPool.ja > va:442) > > > > Thanks, > > Nirmal > > -- - Noble Paul | Systems Architect| AOL | http://aol.com
Re: Fastest way to use solrj
if you write only a few docs you may not observe much difference in size. if you write large no:of docs you may observe a big difference. 2010/1/27 Tim Terlegård : > I got the binary format to work perfectly now. Performance is better > than with xml. Thanks! > > Although, it doesn't look like a binary file is smaller in size than > an xml file? > > /Tim > > 2010/1/27 Noble Paul നോബിള് नोब्ळ् : >> 2010/1/21 Tim Terlegård : >>> Yes, it worked! Thank you very much. But do I need to use curl or can >>> I use CommonsHttpSolrServer or StreamingUpdateSolrServer? If I can't >>> use BinaryWriter then I don't know how to do this. >> if your data is serialized using JavaBinUpdateRequestCodec, you may >> POST it using curl. >> If you are writing directly , use CommonsHttpSolrServer >>> >>> /Tim >>> >>> 2010/1/20 Noble Paul നോബിള് नोब्ळ् : 2010/1/20 Tim Terlegård : BinaryRequestWriter does not read from a file and post it >>> >>> Is there any other way or is this use case not supported? I tried this: >>> >>> $ curl /solr/update/javabin -F stream.file=/tmp/data.bin >>> $ curl /solr/update -F stream.body=' ' >>> >>> Solr did read the file, because solr complained when the file wasn't >>> in the format the JavaBinUpdateRequestCodec expected. But no data is >>> added to the index for some reason. > >> how did you create the file /tmp/data.bin ? what is the format? > > I wrote this in the first email. It's in the javabin format (I think). > I did like this (groovy code): > > fieldId = new NamedList() > fieldId.add("name", "id") > fieldId.add("val", "9-0") > fieldId.add("boost", null) > fieldText = new NamedList() > fieldText.add("name", "text") > fieldText.add("val", "Some text") > fieldText.add("boost", null) > fieldNull = new NamedList() > fieldNull.add("boost", null) > doc = [fieldNull, fieldId, fieldText] > docs = [doc] > root = new NamedList() > root.add("docs", docs) > fos = new FileOutputStream("data.bin") > new JavaBinCodec().marshal(root, fos) > > /Tim > JavaBin is a format. use this method JavaBinUpdateRequestCodec# marshal(UpdateRequest updateRequest, OutputStream os) The output of this can be posted to solr and it should work -- - Noble Paul | Systems Architect| AOL | http://aol.com >>> >> >> >> >> -- >> - >> Noble Paul | Systems Architect| AOL | http://aol.com >> > -- - Noble Paul | Systems Architect| AOL | http://aol.com