Kinda-sorta realtime?

2010-04-16 Thread Don Werve
We're using Solr as the backbone for our shiny new helpdesk application, and
by and large it's been a big win... especially in terms of search
performance.  But before I pat myself on the back because the Solr devs have
done a great job, I had a question regarding commit frequency.

While our app doesn't need truly realtime search, documents get updated and
replaced somewhat frequently, and those changes need to be visible in the
index within 500ms.  At the moment, I'm using autocommit to satisfy this,
but I've run across a few threads mentioning that frequent commits may cause
some serious performance issues.

Our average document size is quite small (less than 10k), and I'm expecting
that we're going to have a maximum of around 100k documents per day on any
given index; most of these will be replacing existing documents.

So, rather than getting bitten by this down the road, I figure I may as well
(a) ask if anybody else here is running a similar setup or has any input,
and then (b) do some heavy load testing via a fake data generator.

Thanks-in-advance!
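
(For reference, a minimal autocommit sketch for solrconfig.xml, assuming the
stock DirectUpdateHandler2 -- the 500ms/1000-doc thresholds are illustrative,
not a recommendation:)

<updateHandler class="solr.DirectUpdateHandler2">
  <autoCommit>
    <!-- commit at most 500ms after a pending add, or every 1000 docs -->
    <maxTime>500</maxTime>
    <maxDocs>1000</maxDocs>
  </autoCommit>
</updateHandler>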


problem querying date field

2010-04-16 Thread Jan-Olav Eide
I have the following field in my schema:

<field name="indextime" type="date" default="NOW" multiValued="false"/>
Querying for another field, I verify that the value is being set as expected.

http://localhost:8080/apache-solr-1.4.0/select/?q=url:www.vg.no 


...
<date name="indextime">2010-04-16T10:47:25.282Z</date>
...

however, querying on a date-range that definitely includes this document, I get 
no documents returned:

http://localhost:8080/apache-solr-1.4.0/select/?q=indextime:NOW-1DAY%20TO%20NOW%20+1DAY


What is wrong with my query here?

--
jo

Re: Kinda-sorta realtime?

2010-04-16 Thread Peter Sturge
Hi Don,

We've got a similar requirement in our environment - here's what we've
found..
Every time you commit, you're doing a relatively disk I/O intensive task to
insert the document(s) into the index.

For very small indexes (say, <10,000 docs), the commit time is pretty short
and you can get away with doing frequent commits. With large indexes,
commits can take seconds to complete, and use a fair bit of CPU & disk
resource along the way. This of course impacts search performance, and it
won't get your docs searchable within your 500ms requirement.

The planned NRT (near real-time) feature (I believe scheduled for 1.5?) is
probably what you need, where Lucene commits are done on a per-segment
basis.

You could also check out the Zoie plugin, but make sure you're not also
committing to disk straightaway, and that you don't mind having to re-submit
some data if your server crashes (Zoie uses an in-memory lookup for new doc
insertions).

HTH
Peter


On Fri, Apr 16, 2010 at 10:13 AM, Don Werve  wrote:

> We're using Solr as the backbone for our shiny new helpdesk application,
> and
> by and large it's been a big win... especially in terms of search
> performance.  But before I pat myself on the back because the Solr devs
> have
> done a great job, I had a question regarding commit frequency.
>
> While our app doesn't need truly realtime search, documents get updated and
> replaced somewhat frequently, and those changes need to be visible in the
> index within 500ms.  At the moment, I'm using autocommit to satisfy this,
> but I've run across a few threads mentioning that frequent commits may
> cause
> some serious performance issues.
>
> Our average document size is quite small (less than 10k), and I'm expecting
> that we're going to have a maximum of around 100k documents per day on any
> given index; most of these will be replacing existing documents.
>
> So, rather than getting bitten by this down the road, I figure I may as
> well
> (a) ask if anybody else here is running a similar setup or has any input,
> and then (b) do some heavy load testing via a fake data generator.
>
> Thanks-in-advance!
>


Re: problem querying date field

2010-04-16 Thread Erik Hatcher

You need to use brackets around range queries.  See 
http://wiki.apache.org/solr/SolrQuerySyntax
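
For this example, the bracketed form would be (note that the "+" in NOW+1DAY
must be URL-escaped as %2B, or it is decoded as a space):

http://localhost:8080/apache-solr-1.4.0/select/?q=indextime:[NOW-1DAY%20TO%20NOW%2B1DAY]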

Erik

On Apr 16, 2010, at 7:08 AM, Jan-Olav Eide wrote:


I have the following field in my schema:

<field name="indextime" type="date" default="NOW" multiValued="false"/>

Querying for another field, I verify that the value is being set as expected.

http://localhost:8080/apache-solr-1.4.0/select/?q=url:www.vg.no

...
<date name="indextime">2010-04-16T10:47:25.282Z</date>
...

however, querying on a date-range that definitely includes this document, I
get no documents returned:

http://localhost:8080/apache-solr-1.4.0/select/?q=indextime:NOW-1DAY%20TO%20NOW%20+1DAY

What is wrong with my query here?

--
jo




Getting the length of a field?

2010-04-16 Thread Oliver Beattie
Hi there,

I'm looking around to see if there's a function that will return the length
of a string in a field, but not seeing one. This is a field whose data I
store, but don't use for querying generally, but I want to be able to take
its length into account. Is this possible?

Any help much appreciated :)

—Oliver


Re: StreamingUpdateSolrServer hangs

2010-04-16 Thread Sascha Szott

Hi Yonik,

Yonik Seeley wrote:

Stephen, were you running stock Solr 1.4, or did you apply any of the
SolrJ patches?
I'm trying to figure out if anyone still has any problems, or if this
was fixed with SOLR-1711:
I'm using the latest trunk version (rev. 934846) and constantly running 
into the same problem. I'm using StreamingUpdateSolrServer with 3 threads 
and a queue size of 20 (not really knowing if this configuration is 
optimal). My multi-threaded application indexes 200k data items 
(bibliographic metadata in Dublin Core format) and constantly hangs 
after running for some time.


Below you can find the thread dump of one of my index threads (after the 
app hangs all dumps are the same)


"thread 19" prio=10 tid=0x7fe8c0415800 nid=0x277d waiting on 
condition [0x42d05000]

   java.lang.Thread.State: WAITING (parking)
at sun.misc.Unsafe.park(Native Method)
	- parking to wait for  <0x7fe8cdcb7598> (a 
java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)

at java.util.concurrent.locks.LockSupport.park(LockSupport.java:158)
	at 
java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:1925)
	at 
java.util.concurrent.LinkedBlockingQueue.put(LinkedBlockingQueue.java:254)
	at 
org.apache.solr.client.solrj.impl.StreamingUpdateSolrServer.request(StreamingUpdateSolrServer.java:216)
	at 
org.apache.solr.client.solrj.request.AbstractUpdateRequest.process(AbstractUpdateRequest.java:105)

at org.apache.solr.client.solrj.SolrServer.add(SolrServer.java:64)
	at 
de.kobv.ked.index.SolrIndexWriter.addIndexDocument(SolrIndexWriter.java:29)
	at 
de.kobv.ked.index.SolrIndexWriter.addIndexDocument(SolrIndexWriter.java:10)
	at 
de.kobv.ked.index.AbstractIndexThread.addIndexDocument(AbstractIndexThread.java:59)

at de.kobv.ked.rss.RssThread.indiziere(RssThread.java:30)
at de.kobv.ked.rss.RssThread.run(RssThread.java:58)



and of the three SUSS threads:

"pool-1-thread-3" prio=10 tid=0x7fe8c7b7f000 nid=0x2780 in 
Object.wait() [0x409ac000]

   java.lang.Thread.State: WAITING (on object monitor)
at java.lang.Object.wait(Native Method)
	- waiting on <0x7fe8cdcb6f10> (a 
org.apache.commons.httpclient.MultiThreadedHttpConnectionManager$ConnectionPool)
	at 
org.apache.commons.httpclient.MultiThreadedHttpConnectionManager.doGetConnection(MultiThreadedHttpConnectionManager.java:518)
	- locked <0x7fe8cdcb6f10> (a 
org.apache.commons.httpclient.MultiThreadedHttpConnectionManager$ConnectionPool)
	at 
org.apache.commons.httpclient.MultiThreadedHttpConnectionManager.getConnectionWithTimeout(MultiThreadedHttpConnectionManager.java:416)
	at 
org.apache.commons.httpclient.HttpMethodDirector.executeMethod(HttpMethodDirector.java:153)
	at 
org.apache.commons.httpclient.HttpClient.executeMethod(HttpClient.java:397)
	at 
org.apache.commons.httpclient.HttpClient.executeMethod(HttpClient.java:323)
	at 
org.apache.solr.client.solrj.impl.StreamingUpdateSolrServer$Runner.run(StreamingUpdateSolrServer.java:153)
	at 
java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
	at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)

at java.lang.Thread.run(Thread.java:619)

"pool-1-thread-2" prio=10 tid=0x7fe8c7afa000 nid=0x277f in 
Object.wait() [0x40209000]

   java.lang.Thread.State: WAITING (on object monitor)
at java.lang.Object.wait(Native Method)
	- waiting on <0x7fe8cdcb6f10> (a 
org.apache.commons.httpclient.MultiThreadedHttpConnectionManager$ConnectionPool)
	at 
org.apache.commons.httpclient.MultiThreadedHttpConnectionManager.doGetConnection(MultiThreadedHttpConnectionManager.java:518)
	- locked <0x7fe8cdcb6f10> (a 
org.apache.commons.httpclient.MultiThreadedHttpConnectionManager$ConnectionPool)
	at 
org.apache.commons.httpclient.MultiThreadedHttpConnectionManager.getConnectionWithTimeout(MultiThreadedHttpConnectionManager.java:416)
	at 
org.apache.commons.httpclient.HttpMethodDirector.executeMethod(HttpMethodDirector.java:153)
	at 
org.apache.commons.httpclient.HttpClient.executeMethod(HttpClient.java:397)
	at 
org.apache.commons.httpclient.HttpClient.executeMethod(HttpClient.java:323)
	at 
org.apache.solr.client.solrj.impl.StreamingUpdateSolrServer$Runner.run(StreamingUpdateSolrServer.java:153)
	at 
java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
	at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)

at java.lang.Thread.run(Thread.java:619)

"pool-1-thread-1" prio=10 tid=0x7fe8c79f2800 nid=0x277e in 
Object.wait() [0x42e06000]

   java.lang.Thread.State: WAITING (on object monitor)
at java.lang.Object.wait(Native Method)
	- waiting on <0x7fe8cdcb6f10> (a 
org.apache.commons.httpclient.MultiThreadedHttpConnectionManager$ConnectionPool)
	at 
org.apache.commons.httpclient.MultiThreadedHttpConnectionManager.

Re: StreamingUpdateSolrServer hangs

2010-04-16 Thread Yonik Seeley
Thanks for the report Sascha.
So after the hang, it never recovers?  Some amount of hanging could be
visible if there was a commit on the Solr server or something else to
cause the solr requests to block for a while... but it should return
to normal on its own...

Looking at the stack trace, it looks like threads are blocked waiting
to get an http connection.

I'm traveling all next week, but I'll open a JIRA issue for this now.
Anything that would help us reproduce this is much appreciated.

-Yonik
Apache Lucene Eurocon 2010
18-21 May 2010 | Prague



On Fri, Apr 16, 2010 at 8:57 AM, Sascha Szott  wrote:
> Hi Yonik,
>
> Yonik Seeley wrote:
>>
>> Stephen, were you running stock Solr 1.4, or did you apply any of the
>> SolrJ patches?
>> I'm trying to figure out if anyone still has any problems, or if this
>> was fixed with SOLR-1711:
>
> I'm using the latest trunk version (rev. 934846) and constantly running into
> the same problem. I'm using StreamingUpdateSolrServer with 3 threads and a
> queue size of 20 (not really knowing if this configuration is optimal). My
> multi-threaded application indexes 200k data items (bibliographic metadata
> in Dublin Core format) and constantly hangs after running for some time.
>
> Below you can find the thread dump of one of my index threads (after the app
> hangs all dumps are the same)
>
> "thread 19" prio=10 tid=0x7fe8c0415800 nid=0x277d waiting on condition
> [0x42d05000]
>   java.lang.Thread.State: WAITING (parking)
>        at sun.misc.Unsafe.park(Native Method)
>        - parking to wait for  <0x7fe8cdcb7598> (a
> java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
>        at java.util.concurrent.locks.LockSupport.park(LockSupport.java:158)
>        at
> java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:1925)
>        at
> java.util.concurrent.LinkedBlockingQueue.put(LinkedBlockingQueue.java:254)
>        at
> org.apache.solr.client.solrj.impl.StreamingUpdateSolrServer.request(StreamingUpdateSolrServer.java:216)
>        at
> org.apache.solr.client.solrj.request.AbstractUpdateRequest.process(AbstractUpdateRequest.java:105)
>        at org.apache.solr.client.solrj.SolrServer.add(SolrServer.java:64)
>        at
> de.kobv.ked.index.SolrIndexWriter.addIndexDocument(SolrIndexWriter.java:29)
>        at
> de.kobv.ked.index.SolrIndexWriter.addIndexDocument(SolrIndexWriter.java:10)
>        at
> de.kobv.ked.index.AbstractIndexThread.addIndexDocument(AbstractIndexThread.java:59)
>        at de.kobv.ked.rss.RssThread.indiziere(RssThread.java:30)
>        at de.kobv.ked.rss.RssThread.run(RssThread.java:58)
>
>
>
> and of the three SUSS threads:
>
> "pool-1-thread-3" prio=10 tid=0x7fe8c7b7f000 nid=0x2780 in Object.wait()
> [0x409ac000]
>   java.lang.Thread.State: WAITING (on object monitor)
>        at java.lang.Object.wait(Native Method)
>        - waiting on <0x7fe8cdcb6f10> (a
> org.apache.commons.httpclient.MultiThreadedHttpConnectionManager$ConnectionPool)
>        at
> org.apache.commons.httpclient.MultiThreadedHttpConnectionManager.doGetConnection(MultiThreadedHttpConnectionManager.java:518)
>        - locked <0x7fe8cdcb6f10> (a
> org.apache.commons.httpclient.MultiThreadedHttpConnectionManager$ConnectionPool)
>        at
> org.apache.commons.httpclient.MultiThreadedHttpConnectionManager.getConnectionWithTimeout(MultiThreadedHttpConnectionManager.java:416)
>        at
> org.apache.commons.httpclient.HttpMethodDirector.executeMethod(HttpMethodDirector.java:153)
>        at
> org.apache.commons.httpclient.HttpClient.executeMethod(HttpClient.java:397)
>        at
> org.apache.commons.httpclient.HttpClient.executeMethod(HttpClient.java:323)
>        at
> org.apache.solr.client.solrj.impl.StreamingUpdateSolrServer$Runner.run(StreamingUpdateSolrServer.java:153)
>        at
> java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
>        at
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
>        at java.lang.Thread.run(Thread.java:619)
>
> "pool-1-thread-2" prio=10 tid=0x7fe8c7afa000 nid=0x277f in Object.wait()
> [0x40209000]
>   java.lang.Thread.State: WAITING (on object monitor)
>        at java.lang.Object.wait(Native Method)
>        - waiting on <0x7fe8cdcb6f10> (a
> org.apache.commons.httpclient.MultiThreadedHttpConnectionManager$ConnectionPool)
>        at
> org.apache.commons.httpclient.MultiThreadedHttpConnectionManager.doGetConnection(MultiThreadedHttpConnectionManager.java:518)
>        - locked <0x7fe8cdcb6f10> (a
> org.apache.commons.httpclient.MultiThreadedHttpConnectionManager$ConnectionPool)
>        at
> org.apache.commons.httpclient.MultiThreadedHttpConnectionManager.getConnectionWithTimeout(MultiThreadedHttpConnectionManager.java:416)
>        at
> org.apache.commons.httpclient.HttpMethodDirector.executeMethod(HttpMethodDirector.java:153)
>        at
> org.apach

Re: SOLR Exact match problem - Punctuations, double quotes etc.

2010-04-16 Thread Erick Erickson
Well, I think that's part of your problem. WhitespaceAnalyzer does
exactly what it says, splits on whitespace. So indexing "carbon" and
searching "carbon." won't generate a hit.

If KeywordAnalyzer doesn't work for you, you could consider either
using one of the Pattern* guys or writing your own.

HTH
Erick
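
(A sketch of the Pattern* route -- the fieldType name and pattern below are
illustrative, not from the original post; this strips trailing punctuation
before lowercasing, so that indexing "carbon" and searching "carbon." line up:)

<fieldType name="text_exactish" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <!-- drop trailing punctuation: "carbon." -> "carbon" -->
    <filter class="solr.PatternReplaceFilterFactory"
            pattern="\p{Punct}+$" replacement="" replace="all"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>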



On Fri, Apr 16, 2010 at 12:18 AM, Hid-Mubarmij  wrote:

>
> Hi Erick,
>
> Thanks, I am using solr.WhitespaceTokenizerFactory and
> solr.LowerCaseFilterFactory for both index and query time.
> Following is the complete field type I am using in schema.xml:
> ==
> <fieldType name="..." class="solr.TextField" positionIncrementGap="100">
>   <analyzer type="index">
>     <tokenizer class="solr.WhitespaceTokenizerFactory"/>
>     <filter class="solr.LowerCaseFilterFactory"/>
>   </analyzer>
>   <analyzer type="query">
>     <tokenizer class="solr.WhitespaceTokenizerFactory"/>
>     <filter class="solr.LowerCaseFilterFactory"/>
>   </analyzer>
> </fieldType>
> ==
>
>


RE: XSD for Solrv1.4

2010-04-16 Thread Stefan Maric
Thanks, I'm taking a look at SolrJ.

Longer term I'd still like to have access to an XSD - then I can see us
integrating this better in the Oracle Service Bus and writing less Java code
in our webApp.
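
(In the meantime, a minimal SolrJ sketch against Solr 1.4 -- the URL and field
name are illustrative; SolrJ returns typed values, so no XSD is needed:)

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.SolrServer;
import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;
import org.apache.solr.client.solrj.response.QueryResponse;
import org.apache.solr.common.SolrDocument;

public class SolrJExample {
    public static void main(String[] args) throws Exception {
        SolrServer server = new CommonsHttpSolrServer("http://localhost:8080/solr");
        QueryResponse rsp = server.query(new SolrQuery("url:www.vg.no"));
        for (SolrDocument doc : rsp.getResults()) {
            System.out.println(doc.getFieldValue("indextime")); // a java.util.Date, not raw XML
        }
    }
}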



-Original Message-
From: hkmortensen [mailto:ko...@yahoo.com]
Sent: 15 April 2010 21:26
To: solr-user@lucene.apache.org
Subject: Re: XSD for Solrv1.4




Smaric-2 wrote:
>
> Are there any plans to release an xsd (& preferably a set of JAXB classes)
> so we can process the xml returned for a search request
>
>
>

I do not know. I would recommend using SolrJ, the Java client. I always do
that myself; do you have a reason not to?




Solr Index Lock Issue

2010-04-16 Thread Sethi, Parampreet
Hi All,
 
We are facing an issue with the Solr server during the DMOZ data migration.
Solr has 0 records when the migration starts, and data is added into Solr
in batches of 2 records. Commit is called on Solr after 20k records are
processed.
 
While committing the data into Solr, a lucene lock file is created in the
/data/index folder which is automatically released once the
successful commit happens. But after 4-5 batches, the lock file remains
there and Solr just hangs and does not add any new records. Sometimes
the whole migration goes through without any errors.
 
Kindly let me know if any setting is required on the Solr side to ensure
that the next set of records is not added until Solr has committed the
index.
 
Thanks,
Param


Re: StreamingUpdateSolrServer hangs

2010-04-16 Thread Sascha Szott

Hi Yonik,

thanks for your fast reply.

Yonik Seeley wrote:

Thanks for the report Sascha.
So after the hang, it never recovers?  Some amount of hanging could be
visible if there was a commit on the Solr server or something else to
cause the solr requests to block for a while... but it should return
to normal on its own...
In my case the whole application hangs and never recovers (CPU 
utilization goes down to near 0%). Interestingly, the problem 
reproducibly occurs only if SUSS is created with *more than 2* threads.



Looking at the stack trace, it looks like threads are blocked waiting
to get an http connection.
I forgot to mention that my index app has exclusive access to the Solr 
instance, so concurrent searches against the same Solr instance while 
indexing can be ruled out.



I'm traveling all next week, but I'll open a JIRA issue for this now.

Thank you!


Anything that would help us reproduce this is much appreciated.

Are there any other who have experienced the same problem?

-Sascha



On Fri, Apr 16, 2010 at 8:57 AM, Sascha Szott  wrote:

Hi Yonik,

Yonik Seeley wrote:


Stephen, were you running stock Solr 1.4, or did you apply any of the
SolrJ patches?
I'm trying to figure out if anyone still has any problems, or if this
was fixed with SOLR-1711:


I'm using the latest trunk version (rev. 934846) and constantly running into
the same problem. I'm using StreamingUpdateSolrServer with 3 threads and a
queue size of 20 (not really knowing if this configuration is optimal). My
multi-threaded application indexes 200k data items (bibliographic metadata
in Dublin Core format) and constantly hangs after running for some time.

Below you can find the thread dump of one of my index threads (after the app
hangs all dumps are the same)

"thread 19" prio=10 tid=0x7fe8c0415800 nid=0x277d waiting on condition
[0x42d05000]
   java.lang.Thread.State: WAITING (parking)
at sun.misc.Unsafe.park(Native Method)
- parking to wait for<0x7fe8cdcb7598>  (a
java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
at java.util.concurrent.locks.LockSupport.park(LockSupport.java:158)
at
java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:1925)
at
java.util.concurrent.LinkedBlockingQueue.put(LinkedBlockingQueue.java:254)
at
org.apache.solr.client.solrj.impl.StreamingUpdateSolrServer.request(StreamingUpdateSolrServer.java:216)
at
org.apache.solr.client.solrj.request.AbstractUpdateRequest.process(AbstractUpdateRequest.java:105)
at org.apache.solr.client.solrj.SolrServer.add(SolrServer.java:64)
at
de.kobv.ked.index.SolrIndexWriter.addIndexDocument(SolrIndexWriter.java:29)
at
de.kobv.ked.index.SolrIndexWriter.addIndexDocument(SolrIndexWriter.java:10)
at
de.kobv.ked.index.AbstractIndexThread.addIndexDocument(AbstractIndexThread.java:59)
at de.kobv.ked.rss.RssThread.indiziere(RssThread.java:30)
at de.kobv.ked.rss.RssThread.run(RssThread.java:58)



and of the three SUSS threads:

"pool-1-thread-3" prio=10 tid=0x7fe8c7b7f000 nid=0x2780 in Object.wait()
[0x409ac000]
   java.lang.Thread.State: WAITING (on object monitor)
at java.lang.Object.wait(Native Method)
- waiting on<0x7fe8cdcb6f10>  (a
org.apache.commons.httpclient.MultiThreadedHttpConnectionManager$ConnectionPool)
at
org.apache.commons.httpclient.MultiThreadedHttpConnectionManager.doGetConnection(MultiThreadedHttpConnectionManager.java:518)
- locked<0x7fe8cdcb6f10>  (a
org.apache.commons.httpclient.MultiThreadedHttpConnectionManager$ConnectionPool)
at
org.apache.commons.httpclient.MultiThreadedHttpConnectionManager.getConnectionWithTimeout(MultiThreadedHttpConnectionManager.java:416)
at
org.apache.commons.httpclient.HttpMethodDirector.executeMethod(HttpMethodDirector.java:153)
at
org.apache.commons.httpclient.HttpClient.executeMethod(HttpClient.java:397)
at
org.apache.commons.httpclient.HttpClient.executeMethod(HttpClient.java:323)
at
org.apache.solr.client.solrj.impl.StreamingUpdateSolrServer$Runner.run(StreamingUpdateSolrServer.java:153)
at
java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
at
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
at java.lang.Thread.run(Thread.java:619)

"pool-1-thread-2" prio=10 tid=0x7fe8c7afa000 nid=0x277f in Object.wait()
[0x40209000]
   java.lang.Thread.State: WAITING (on object monitor)
at java.lang.Object.wait(Native Method)
- waiting on<0x7fe8cdcb6f10>  (a
org.apache.commons.httpclient.MultiThreadedHttpConnectionManager$ConnectionPool)
at
org.apache.commons.httpclient.MultiThreadedHttpConnectionManager.doGetConnection(MultiThreadedHttpConnectionManager.java:518)
- locked<0x7fe8cdcb6f10>  (a
org.apache.commons.http

Re: Handling missing date fields in a date-oriented function query

2010-04-16 Thread Chris Harris
I still like this approach, but I've discovered one wrinkle, which is
that I have dates in my dataset dated at the epoch (i.e. midnight Jan
1, 1970), as well as before the epoch (e.g. midnight Jan 1, 1950).

The docs dated *before* the epoch so far don't seem to be a problem;
they end up having a negative numeric value, but that seems workable.

The docs dated exactly *at* the epoch, though, are trouble, because I
can't tell those docs apart from the undated docs in my function
query. (Both end up with numeric value of 0 in the function query
code. So missing date is the same as midnight Jan 1, 1970.)

So far, in my case, the best bet seems to be changing the time
component of my dates. So rather than rounding dates to the nearest
midnight (e.g. 1970-01-01T00:00:00Z), I could round them to the
nearest, say, 1AM (e.g. 1970-01-01T01:00:00Z), with the goal of making
sure that none of my legitimate date field values will evaluate to
numeric value 0. Since I don't show the time component of dates to my
users, I don't think this would cause any real trouble. It feels
slightly unclean, though.

On Thu, Apr 8, 2010 at 1:05 PM, Chris Harris  wrote:
> If anyone is curious, I've created a patch that creates a variant of
> map that can be used in the way indicated below. See
> http://issues.apache.org/jira/browse/SOLR-1871
>
> On Wed, Apr 7, 2010 at 3:41 PM, Chris Harris  wrote:
>
>> Option 1. Use map
>>
>> The most obvious way to do this would be to wrap the reference to
>> mydatefield inside a map, like this:
>>
>>    recip(ms(NOW,map(mydatefield,0,0,ms(NOW))),3.16e-11,1,1)
>>
>> However, this throws an exception because map is hard-coded to take
>> float constants, rather than arbitrary subqueries.
>


Re: Handling missing date fields in a date-oriented function query

2010-04-16 Thread Yonik Seeley
On Fri, Apr 16, 2010 at 4:42 PM, Chris Harris  wrote:
> The docs dated exactly *at* the epoch, though, are trouble, because I
> can't tell those docs apart from the undated docs in my function
> query.

Neither can Solr currently... it's a Lucene FieldCache limitation.
The other thing we can't do because of this limitation is
sortMissingFirst, sortMissingLast like we can do with string based
fields.  I'm hopeful we'll be able to get this added to Lucene, and
that will enable us to truly deprecate the string-based numeric
fields.  We'll also be able to add true defaults in function queries
for documents without a value.

-Yonik
Apache Lucene Eurocon 2010
18-21 May 2010 | Prague


Re: Solr Index Lock Issue

2010-04-16 Thread Otis Gospodnetic
Hi,

What you are doing sounds fine.  You don't need to commit while indexing, 
though; just commit/optimize at the end.  I'm not saying this will solve your 
problem, but give it a try.
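
(A sketch of that flow in SolrJ -- "Record", "records", and toSolrDoc() are
hypothetical stand-ins for the migration's own types:)

SolrServer server = new CommonsHttpSolrServer("http://localhost:8983/solr");
Collection<SolrInputDocument> batch = new ArrayList<SolrInputDocument>();
for (Record r : records) {            // hypothetical source records
    batch.add(toSolrDoc(r));          // hypothetical mapping helper
    if (batch.size() == 1000) {
        server.add(batch);            // add only -- no commit per batch
        batch.clear();
    }
}
if (!batch.isEmpty()) server.add(batch);
server.commit();                      // one commit at the very end
server.optimize();                    // optional, once everything is in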
 
Otis

Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
Hadoop ecosystem search :: http://search-hadoop.com/



- Original Message 
> From: "Sethi, Parampreet" 
> To: solr-user@lucene.apache.org
> Sent: Fri, April 16, 2010 1:13:57 PM
> Subject: Solr Index Lock Issue
> 
> Hi All,
>
> We are facing an issue with the Solr server during the DMOZ data migration.
> Solr has 0 records when the migration starts, and data is added into Solr
> in batches of 2 records. Commit is called on Solr after 20k records are
> processed.
>
> While committing the data into Solr, a lucene lock file is created in the
> /data/index folder which is automatically released once the successful
> commit happens. But after 4-5 batches, the lock file remains there and
> Solr just hangs and does not add any new records. Sometimes the whole
> migration goes through without any errors.
>
> Kindly let me know if any setting is required on the Solr side to ensure
> that the next set of records is not added until Solr has committed the
> index.
>
> Thanks,
> Param


Sum of return fields

2010-04-16 Thread Jim Adams
Is it possible to add or subtract a value and return that field from
the index in solr? Or do you have to do it programmatically
afterwards?

Thanks!


Re: Getting the length of a field?

2010-04-16 Thread Otis Gospodnetic
Hm, I don't follow what you are looking to do, Oliver.  You want to take the 
field length into account when indexing?  Or when searching?  You want it 
to affect relevance?

You can certainly get the length of the String (original value) in a field 
*after* you get your result set, but that's probably not what you are after, 
because that's just a length()-type call.

 Otis

Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
Lucene ecosystem search :: http://search-lucene.com/



- Original Message 
> From: Oliver Beattie 
> To: solr-user 
> Sent: Fri, April 16, 2010 8:52:58 AM
> Subject: Getting the length of a field?
> 
> Hi there,
>
> I'm looking around to see if there's a function that will return the length
> of a string in a field, but not seeing one. This is a field whose data I
> store, but don't use for querying generally, but I want to be able to take
> its length into account. Is this possible?
>
> Any help much appreciated :)
>
> —Oliver
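
(If the goal is sorting or boosting by length, a common workaround is to
compute the length client-side at index time into a separate int field --
"body" and "body_length" are hypothetical fields that would need to exist in
schema.xml, and "text"/"server" stand in for the app's own variables:)

SolrInputDocument doc = new SolrInputDocument();
doc.addField("id", "42");
doc.addField("body", text);
doc.addField("body_length", text.length()); // sortable, usable in function queries
server.add(doc);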


Re: Sum of return fields

2010-04-16 Thread Otis Gospodnetic
Jim, like this:

https://issues.apache.org/jira/browse/SOLR-1298 ?

 Otis

Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
Lucene ecosystem search :: http://search-lucene.com/



- Original Message 
> From: Jim Adams 
> To: solr-user@lucene.apache.org
> Sent: Fri, April 16, 2010 6:22:00 PM
> Subject: Sum of return fields
> 
> > Is it possible to add or subtract a value and return that field from
> > the index in solr? Or do you have to do it programmatically
> > afterwards?
>
> Thanks!
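
(For reference, SOLR-1298 proposes letting function queries appear in the fl
parameter; a hypothetical request once it lands might look like the following,
with the exact syntax depending on the final patch:)

http://localhost:8983/solr/select?q=*:*&fl=id,price,sum(price,10)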


Re: run in background

2010-04-16 Thread Otis Gospodnetic
Better yet, use screen: http://www.manpagez.com/man/1/screen/
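
(A minimal sketch -- the session name "solr" is arbitrary:)

screen -dmS solr java -jar start.jar   # start Solr in a detached session
screen -r solr                         # reattach to it later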

 Otis

Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
Lucene ecosystem search :: http://search-lucene.com/



- Original Message 
> From: Walter Underwood 
> To: solr-user@lucene.apache.org
> Sent: Thu, April 15, 2010 11:31:31 PM
> Subject: Re: run in background
> 
> nohup my_command &
>
> That will run "my_command" in the background, and "nohup" ignores the
> SIGHUP signal sent when you log out. Or, originally, "hang up" the modem.
>
> wunder
>
> On Apr 15, 2010, at 8:27 PM, Dan Yamins wrote:
>
>> Hi,
>>
>> Normally I've been starting solr like so:
>>
>>   java -jar start.jar
>>
>> However, I need to have this process executed over a remote ssh connection
>> that cannot be blocking. I'd therefore like to execute the process "in the
>> background", somehow in a forked process, so that the command returns
>> while having set solr to run in the child process. Is there a simple way to
>> do this?
>>
>> Thanks,
>>
>> dan


Re: DIH dataimport.properties with

2010-04-16 Thread Otis Gospodnetic
Hm, why not just go to the MySQL master then?

Otis

Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
Hadoop ecosystem search :: http://search-hadoop.com/



- Original Message 
> From: Michael Tibben 
> To: solr-user@lucene.apache.org
> Sent: Thu, April 15, 2010 10:15:14 PM
> Subject: DIH dataimport.properties with
> 
> Hi,
>
> I am using the DIH to import data from a mysql slave. However, the slave
> sometimes runs behind the master. The delay is variable; most of the time
> it is in sync, but sometimes it can run behind by a few minutes.
>
> This is a problem, because DIH uses dataimport.properties to determine the
> last_index_time for delta updates. This last_index_time does not correspond
> to the position of the slave, and so documents are being missed.
>
> What I need to be able to do is tell DIH what the last_index_time should be.
> Or alternatively, be able to specify another property in
> dataimport.properties, perhaps called datasource_version or similar.
>
> Is this possible?
>
> I have thought of a sneaky way to hack around the issue. Just before the
> delta update is run, I will switch the system time to the mysql slave's
> replication time. The system is used for nothing but solr master, so I
> think this should work OK. Any thoughts?
>
> Regards,
>
> Michael
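
(One possible angle, sketched with hypothetical table/column names: DIH can
read arbitrary request parameters via ${dataimporter.request.*}, so the caller
could pass the slave's replication position as its own cutoff instead of
relying on last_index_time:)

<entity name="item" pk="id"
        query="SELECT * FROM item"
        deltaQuery="SELECT id FROM item
                    WHERE updated &gt; '${dataimporter.request.index_from}'"
        deltaImportQuery="SELECT * FROM item WHERE id = '${dataimporter.delta.id}'"/>

Triggered with, e.g.:

http://localhost:8983/solr/dataimport?command=delta-import&index_from=2010-04-16+10%3A00%3A00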


Re: Sum of return fields

2010-04-16 Thread Jim Adams
Yes, that's exactly it.  Looks like it is going into 1.5... hmmm... guess
I'll have to do something programmatically instead as I'm not there yet.

On Fri, Apr 16, 2010 at 4:24 PM, Otis Gospodnetic <
otis_gospodne...@yahoo.com> wrote:

> Jim, like this:
>
> https://issues.apache.org/jira/browse/SOLR-1298 ?
>
>  Otis
> 
> Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
> Lucene ecosystem search :: http://search-lucene.com/
>
>
>
> - Original Message 
> > From: Jim Adams 
> > To: solr-user@lucene.apache.org
> > Sent: Fri, April 16, 2010 6:22:00 PM
> > Subject: Sum of return fields
> >
> > Is it possible to add or subtract a value and return that field from
> > the index in solr? Or do you have to do it programmatically
> > afterwards?
>
> Thanks!
>


Re: StreamingUpdateSolrServer hangs

2010-04-16 Thread Rich Cariens
I experienced the hang described with the Solr 1.4.0 build.

Yonik - I also thought the streaming updater was blocking on commits but
updates never resumed.

To be honest I was in a bit of a rush to meet a deadline so after spending a
day or so tinkering I bailed out and just wrote a component by hand.  I have
not tried to reproduce this using the current trunk.  I was using the 32-bit
Sun JRE on a Red Hat EL 5 HP server.

I'm not sure if the following enriches this thread, but I'll include it
anyways: write a document generator and start adding a ton of 'em to a Solr
server instance using the streaming updater.  You *should* experience the
hang.

HTH,
Rich
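
(A sketch of such a generator -- the URL, queue size of 20, and 3 threads
mirror the earlier reports but are otherwise arbitrary:)

import org.apache.solr.client.solrj.impl.StreamingUpdateSolrServer;
import org.apache.solr.common.SolrInputDocument;

public class HangRepro {
    public static void main(String[] args) throws Exception {
        StreamingUpdateSolrServer server =
            new StreamingUpdateSolrServer("http://localhost:8983/solr", 20, 3);
        for (int i = 0; i < 1000000; i++) {
            SolrInputDocument doc = new SolrInputDocument();
            doc.addField("id", "gen-" + i);
            doc.addField("title", "generated document " + i);
            server.add(doc);  // the reported hangs appear after enough adds
        }
        server.commit();
    }
}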

On Fri, Apr 16, 2010 at 1:34 PM, Sascha Szott  wrote:

> Hi Yonik,
>
> thanks for your fast reply.
>
>
> Yonik Seeley wrote:
>
>> Thanks for the report Sascha.
>> So after the hang, it never recovers?  Some amount of hanging could be
>> visible if there was a commit on the Solr server or something else to
>> cause the solr requests to block for a while... but it should return
>> to normal on its own...
>>
> In my case the whole application hangs and never recovers (CPU utilization
> goes down to near 0%). Interestingly, the problem reproducibly occurs only
> if SUSS is created with *more than 2* threads.
>
>
>  Looking at the stack trace, it looks like threads are blocked waiting
>> to get an http connection.
>>
> I forgot to mention that my index app has exclusive access to the Solr
> instance. Therefore, concurrent searches against the same Solr instance
> while indexing are excluded.
>
>
>  I'm traveling all next week, but I'll open a JIRA issue for this now.
>>
> Thank you!
>
>
>  Anything that would help us reproduce this is much appreciated.
>>
> Are there any other who have experienced the same problem?
>
> -Sascha
>
>
>
>> On Fri, Apr 16, 2010 at 8:57 AM, Sascha Szott  wrote:
>>
>>> Hi Yonik,
>>>
>>> Yonik Seeley wrote:
>>>

 Stephen, were you running stock Solr 1.4, or did you apply any of the
 SolrJ patches?
 I'm trying to figure out if anyone still has any problems, or if this
 was fixed with SOLR-1711:

>>>
>>> I'm using the latest trunk version (rev. 934846) and constantly running
>>> into
>>> the same problem. I'm using StreamingUpdateSolrServer with 3 threads and a
>>> queue size of 20 (not really knowing if this configuration is optimal).
>>> My
>>> multi-threaded application indexes 200k data items (bibliographic
>>> metadata
>>> in Dublin Core format) and constantly hangs after running for some time.
>>>
>>> Below you can find the thread dump of one of my index threads (after the
>>> app
>>> hangs all dumps are the same)
>>>
>>> "thread 19" prio=10 tid=0x7fe8c0415800 nid=0x277d waiting on
>>> condition
>>> [0x42d05000]
>>>   java.lang.Thread.State: WAITING (parking)
>>>at sun.misc.Unsafe.park(Native Method)
>>>- parking to wait for<0x7fe8cdcb7598>  (a
>>> java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
>>>at
>>> java.util.concurrent.locks.LockSupport.park(LockSupport.java:158)
>>>at
>>>
>>> java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:1925)
>>>at
>>>
>>> java.util.concurrent.LinkedBlockingQueue.put(LinkedBlockingQueue.java:254)
>>>at
>>>
>>> org.apache.solr.client.solrj.impl.StreamingUpdateSolrServer.request(StreamingUpdateSolrServer.java:216)
>>>at
>>>
>>> org.apache.solr.client.solrj.request.AbstractUpdateRequest.process(AbstractUpdateRequest.java:105)
>>>at org.apache.solr.client.solrj.SolrServer.add(SolrServer.java:64)
>>>at
>>>
>>> de.kobv.ked.index.SolrIndexWriter.addIndexDocument(SolrIndexWriter.java:29)
>>>at
>>>
>>> de.kobv.ked.index.SolrIndexWriter.addIndexDocument(SolrIndexWriter.java:10)
>>>at
>>>
>>> de.kobv.ked.index.AbstractIndexThread.addIndexDocument(AbstractIndexThread.java:59)
>>>at de.kobv.ked.rss.RssThread.indiziere(RssThread.java:30)
>>>at de.kobv.ked.rss.RssThread.run(RssThread.java:58)
>>>
>>>
>>>
>>> and of the three SUSS threads:
>>>
>>> "pool-1-thread-3" prio=10 tid=0x7fe8c7b7f000 nid=0x2780 in
>>> Object.wait()
>>> [0x409ac000]
>>>   java.lang.Thread.State: WAITING (on object monitor)
>>>at java.lang.Object.wait(Native Method)
>>>- waiting on<0x7fe8cdcb6f10>  (a
>>>
>>> org.apache.commons.httpclient.MultiThreadedHttpConnectionManager$ConnectionPool)
>>>at
>>>
>>> org.apache.commons.httpclient.MultiThreadedHttpConnectionManager.doGetConnection(MultiThreadedHttpConnectionManager.java:518)
>>>- locked<0x7fe8cdcb6f10>  (a
>>>
>>> org.apache.commons.httpclient.MultiThreadedHttpConnectionManager$ConnectionPool)
>>>at
>>>
>>> org.apache.commons.httpclient.MultiThreadedHttpConnectionManager.getConnectionWithTimeout(MultiThreadedHttpConnectionManager.java:416)
>>>at
>>>
>>> org.apache.commons.httpclient.HttpMethodDirector.e

admin-extra file in multicore

2010-04-16 Thread Jon Baer
Hi,

It looks like I'm trying to do the same thing as this open JIRA here ...

https://issues.apache.org/jira/browse/SOLR-975

I noticed in index.jsp it has a reference to:

<%
 // a quick hack to get rid of get-file.jsp -- note this still spits out 
invalid HTML
 out.write( 
org.apache.solr.handler.admin.ShowFileRequestHandler.getFileContents( 
"admin-extra.html" ) );
%>

Instead of resolving with the core.getName() path ...

I was trying to avoid building a custom solr.war for this project. Is there 
another quick hack to include content in the admin backend, or is patching the 
only way?

Thanks.

- Jon




Re: bug using distributed search, highlighting and q.alt

2010-04-16 Thread Otis Gospodnetic
Marc - Mind creating a ticket in JIRA and attaching your patch?
 Otis

Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
Hadoop ecosystem search :: http://search-hadoop.com/



- Original Message 
> From: Marc Sturlese 
> To: solr-user@lucene.apache.org
> Sent: Thu, April 15, 2010 1:30:22 PM
> Subject: bug using distributed search, highlighting and q.alt
> 
> 
> I have noticed when using q.alt that even if hl=true, highlights are not
> returned. When using distributed search, q.alt and hl,
> HighlightComponent.java's finishStage expects the highlighting NamedList of
> each shard (if hl=true) but it will never be returned. It will end up with a
> NullPointerException. I have temporarily solved it by checking that the
> highlighting NamedList is always returned for each shard. If that is not the
> case, highlights are not added to the response:
>
>   @Override
>   public void finishStage(ResponseBuilder rb) {
>     boolean hasHighlighting = true;
>     if (rb.doHighlights && rb.stage == ResponseBuilder.STAGE_GET_FIELDS) {
>
>       Map.Entry<String, Object>[] arr =
>           new NamedList.NamedListEntry[rb.resultIds.size()];
>
>       // TODO: make a generic routine to do automatic merging of id keyed data
>       for (ShardRequest sreq : rb.finished) {
>         if ((sreq.purpose & ShardRequest.PURPOSE_GET_HIGHLIGHTS) == 0)
>           continue;
>         for (ShardResponse srsp : sreq.responses) {
>           NamedList hl = (NamedList)
>               srsp.getSolrResponse().getResponse().get("highlighting");
>           if (hl != null) {
>             for (int i = 0; i < hl.size(); i++) {
>               String id = hl.getName(i);
>               ShardDoc sdoc = rb.resultIds.get(id);
>               int idx = sdoc.positionInResponse;
>               arr[idx] = new NamedList.NamedListEntry<String, Object>(id, hl.getVal(i));
>             }
>           } else {
>             hasHighlighting = false;
>           }
>         }
>       }
>
>       // remove nulls in case not all docs were able to be retrieved
>       if (hasHighlighting) {
>         rb.rsp.add("highlighting", removeNulls(new SimpleOrderedMap(arr)));
>       }
>     }
>   }


Re: Supporting multiple index / query analyzer stacks

2010-04-16 Thread Otis Gospodnetic
Gert,

You could:
* run 1 Solr instance with N cores. Each core would have a different 
flavour/stack of otherwise the same schema
* run 1 Solr instance with 1 core and in it N copies of each field, each copy 
with its flavour/stack
* run N Solr instances, each with a different flavour/stack of otherwise the 
same schema


If I had to do this, I'd go with the first option - it's the least management, 
not super resource hungry, and each stack would be cleanly and truly 
separate. 
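
(A minimal solr.xml sketch for that option -- core names and directories are
illustrative; each instanceDir carries its own conf/schema.xml flavour:)

<solr persistent="true">
  <cores adminPath="/admin/cores">
    <core name="stack1" instanceDir="stack1"/>
    <core name="stack2" instanceDir="stack2"/>
    <!-- ...one core per analyzer stack... -->
  </cores>
</solr>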
Otis

Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
Lucene ecosystem search :: http://search-lucene.com/



- Original Message 
> From: "Villemos, Gert" 
> To: solr-user@lucene.apache.org
> Sent: Thu, April 15, 2010 5:19:52 AM
> Subject: Supporting multiple index / query analyzer stacks
> 
> Having developed a system based on SOLr, we are now optimizing the ranking
> of the search results to give the user a better search experience.
>
> We would like to create multiple index / query analyzer stacks in the SOLr
> configuration to test how this affects the results. We would index the same
> text field with all stacks and then at runtime allow the user to select the
> stack to be used to execute the search. He can thus perform the same search
> in, for example, 5 ways, and tell us which search stack gave him the best
> set of results.
>
> How can we do this?
>
> We were thinking along the lines:
>
> * In the schema.xml define the different index / query stacks for different
>   field types ("text_stack1", "text_stack2", "text_stack3", ...).
> * Create a field of each type ("<field name="text_s1" type="text_stack1" ...>",
>   "<field name="text_s2" type="text_stack2" ...>", ...).
> * Create a copy field definition, copying the same text into the five
>   different fields ("<copyField source="text_in" dest="text_s1"/>",
>   "<copyField source="text_in" dest="text_s2"/>", ...).
>
> Is this the smart way of doing it? Is there a better way?
>
> Thanks,
>
> Gert.





Re: DIH

2010-04-16 Thread Lance Norskog
Oops, haven't checked that. The Wiki page generally marks new stuff
with a "Solr 1.5" marker.

On Wed, Apr 14, 2010 at 9:30 PM, Sandhya Agarwal  wrote:
> Thanks a lot, Lance.
>
> So, are these part of solr 1.4 release ?
>
> -Original Message-
> From: Lance Norskog [mailto:goks...@gmail.com]
> Sent: Thursday, April 15, 2010 9:53 AM
> To: solr-user@lucene.apache.org
> Subject: Re: DIH
>
> FileListEntityProcessor -> BinFileDataSource -> TikaEntityProcessor (I think)
>
> FLEP walks the directory and supplies a separate record per file.
> BFDS pulls the file and supplies it to TikaEntityProcessor.
>
> BinFileDataSource is not documented, but you need it for binary data
> streams like PDF & Word. For text files, use FileDataSource.
>
> On 4/14/10, Sandhya Agarwal  wrote:
>> Hello,
>>
>> We want to design a solution where we have one polling directory (data
>> source directory) containing the xml files, of all data that must be
>> indexed. These XML files contain a reference to the content file. So, we
>> need another datasource that must be created for the content files. Could
>> somebody please tell me what is the best way to get this working using the
>> DIH / tika processor.
>>
>> Thanks,
>> Sandhya
>>
>>
>>
>
>
> --
> Lance Norskog
> goks...@gmail.com
>



-- 
Lance Norskog
goks...@gmail.com
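
(A data-config.xml sketch of that chain -- paths, the file pattern, and field
names are illustrative, and TikaEntityProcessor is trunk/1.5-era rather than
stock 1.4:)

<dataConfig>
  <dataSource name="bin" type="BinFileDataSource"/>
  <document>
    <entity name="files" processor="FileListEntityProcessor" rootEntity="false"
            baseDir="/data/inbox" fileName=".*\.(pdf|doc)" recursive="true">
      <entity name="content" processor="TikaEntityProcessor" dataSource="bin"
              url="${files.fileAbsolutePath}" format="text">
        <field column="text" name="body"/>
      </entity>
    </entity>
  </document>
</dataConfig>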


Re: SOLR Exact match problem - Punctuations, double quotes etc.

2010-04-16 Thread Hid-Mubarmij

Thanks a lot Erick,

I just used this "solr.PatternReplaceFilterFactory" in my field and the
problem is solved.

Thanks
