LockObtainFailedException and older segment_N files are present

2012-10-23 Thread jame vaalet
Hi,
I have a searcher server replicating its index from a master server. Recently
I noticed a huge difference in index size between the master and the
slave, followed by a LockObtainFailedException in the catalina.out log. When I
debugged the searcher index folder, I could see more than 100 segments_N
files in it. After debugging I found the root cause to be a misconfigured
solrconfig.xml: I am using Solr 3.4 and the lock was configured in the wrong
section instead of under mainIndex, and hence it was using the simple file
lock rather than the configured native lock
(http://wiki.apache.org/solr/SolrConfigXml#indexConfig). Rectifying this
configuration corrected the error; replication subsequently worked, the
older segment files got deleted, and the searcher core size finally synced
with the indexer core size.
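For reference, a sketch of roughly what the corrected Solr 3.x configuration looks like (the exact enclosing elements depend on your solrconfig.xml version; see the wiki page linked above):

```xml
<!-- Inside solrconfig.xml; in Solr 3.x the lock type is set under
     <mainIndex> (and optionally <indexDefaults>). "native" uses OS-level
     file locking instead of the simple lock-file implementation. -->
<mainIndex>
  <lockType>native</lockType>
</mainIndex>
```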

What I would like to understand here is:

   1. Why would it cause a lock-obtain timeout with the simple file lock? (Maybe
   the default was too short.)
   2. What would happen in case of this LockObtainFailedException during
   replication? Will it fail to replicate the docs? However, I have observed
   that the searcher has an equal numFound to the indexer, which means
   clients searching wouldn't see any missing documents.
   3. Why did the index size bloat, and why were older segments_N files present?


thanks in advance !



-- 

-JAME


Forming Solr Query for multiple operators against multiple fields

2012-10-23 Thread Sandeep Mestry
Dear All,

I have a requirement to search against multiple fields (title,
description, annotations, comments, text), and the query can contain multiple
boolean operators.
Can someone point me in the right direction?

If the user enters a query like:

- (day AND world) NOT night

I want to form a query:

*(title:day AND title:world NOT title:night) OR (description:day
AND description:world NOT description:night) OR (annotations:day
AND annotations:world NOT annotations:night) OR (comments:day
AND comments:world NOT comments:night) OR (text:day AND text:world
NOT text:night) *
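The expansion above is mechanical, so it can be generated client-side before handing the query to the standard parser. A rough sketch (field names taken from the post; parsing the user's boolean syntax into clauses is assumed to be done already):

```python
# Sketch: expand pre-parsed clauses across every searchable field, producing
# the per-field grouping shown above. The clause list is assumed to come from
# whatever step parses the user's boolean syntax.
FIELDS = ["title", "description", "annotations", "comments", "text"]

def expand_per_field(clauses):
    """clauses: list of (operator, term) pairs,
    e.g. [("", "day"), ("AND", "world"), ("NOT", "night")]."""
    groups = []
    for field in FIELDS:
        parts = [f"{op + ' ' if op else ''}{field}:{term}" for op, term in clauses]
        groups.append("(" + " ".join(parts) + ")")
    return " OR ".join(groups)

query = expand_per_field([("", "day"), ("AND", "world"), ("NOT", "night")])
```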

I've tried Lucene's MultiFieldQueryParser to form the query and, after some
string manipulation, produced a query as below; however, it does not
give me the correct relevancy.

*(title:day OR description:day OR annotations:day OR comments:day OR
text:day) AND (title:world OR description:world OR annotations:world OR
comments:world OR text:world) NOT (title:night OR description:night
OR annotations:night OR comments:night OR text:night)*

For the record, the project is still on Solr 1.4 and hence I'm using the
standard query parser (the upgrade is due in the coming months). But for now, I
need to make it work for the above requirement.

Please suggest if there is any straightforward approach, or should I take
the route of writing the query grammar myself?

Many Thanks,
Sandeep


Solr - Use Regex in the Query Phrase

2012-10-23 Thread Daisy
Hi; 

I am working with apache-solr-3.6.0 on a Windows machine. I would like to be
able to search for a certain phrase which includes a regex.

For example, I want my query to be:  "art(.*?)le"
or another example of a phrase:  "he sa*"

I don't know how to do that in the URL that will be sent to Solr. I would
like to be able to do something like this:
http://localhost:8983/solr/core0/select/?q="ar(.*?)cle"&version=2.2&start=0&rows=2&debugQuery=on&hl=true&hl.fl=*

or

http://localhost:8983/solr/core0/select/?q="he
sa*"&version=2.2&start=0&rows=2&debugQuery=on&hl=true&hl.fl=*


Here is the part of my schema that I am using:

[schema snippet lost in the archive]
Any Ideas, please?




--
View this message in context: 
http://lucene.472066.n3.nabble.com/Solr-Use-Regex-in-the-Query-Phrase-tp4015335.html
Sent from the Solr - User mailing list archive at Nabble.com.


RE: Solr - Use Regex in the Query Phrase

2012-10-23 Thread Markus Jelsma
Hi - Regex is not available in Solr 3.6:
https://issues.apache.org/jira/browse/LUCENE-2604
 


Re: Solr - Use Regex in the Query Phrase

2012-10-23 Thread Jack Krupansky
In 4.0, your first query can be done with a regex query - enclose the 
pattern in slashes:


q=/art(.*?)le/

Wildcards, fuzzy query, and regex query can only be applied to a single 
term, so you can't do your latter "proximity" query. The best you can do is 
a conjunction:


q=he+sa*

Although in that specific case you might simply expand the terms and use 
explicit terms in a proximity query:


q="he+says"+OR+"he+said"+OR+"he+saw"+OR+"he+saved"+OR+"he+saves"

I think there is an alternative query parser that supports wildcards in 
phrases, but the name escapes me at the moment.
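When sending such a regex query over HTTP, the slashes and regex metacharacters need URL-encoding; a quick sketch (host, core, and handler path are simply the ones from the original post):

```python
from urllib.parse import urlencode

# Build a Solr 4 select URL for the regex query /art(.*?)le/.
# The slashes and regex metacharacters must be percent-encoded.
params = {"q": "/art(.*?)le/", "debugQuery": "on"}
url = "http://localhost:8983/solr/core0/select?" + urlencode(params)
```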


-- Jack Krupansky




Re: solr 4.0 point type distance unit

2012-10-23 Thread Erick Erickson
Not quite sure what you're asking here. In bold on the page
you mentioned is:

"NOTE: Unless otherwise specified, all units of distance are
kilometers and points are in degrees of latitude,longitude."

Asking for the "default distance" of the point type is meaningless: a point
is a lat/lon coordinate for a spot on the earth's surface. Distance _between_
two points is in kilometers, as is, say, the radius of a circle with a point
at its center.

Best
Erick
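One way to sanity-check which unit a computed distance is in: compare it against a hand-rolled great-circle distance in kilometers. A sketch (using the common mean earth radius; Solr's exact constant may differ slightly):

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean earth radius; an approximation

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometers between two lat/lon points (degrees)."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))

# One degree of latitude is roughly 111 km; if Solr reports ~1 for the same
# pair of points, it is almost certainly returning degrees, not kilometers.
d = haversine_km(0.0, 0.0, 1.0, 0.0)
```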

On Mon, Oct 22, 2012 at 10:31 AM, PORTO aLET  wrote:
> Hi forum,
>
> What is the default distance unit for "point" type ?
>
> Reading solr wiki http://wiki.apache.org/solr/SpatialSearch
> it implies that the distance unit is kilometers,
> however based on my testing, it seems like the distance unit is in "degree"
>
> How do I make sure if it's in "degree" or "kilometers" ?
>
> I also posted similar question at
> http://stackoverflow.com/questions/1215/solr-4-0-default-distance-unit
>
> Thanks


Re: Solr - Use Regex in the Query Phrase

2012-10-23 Thread Ahmet Arslan

> I think there is an alternative query parser that supports
> wildcards in phrases, but the name escapes me at the
> moment.

Yes, https://issues.apache.org/jira/browse/SOLR-1604


Re: uniqueKey not enforced

2012-10-23 Thread Erick Erickson
From left field:

Try looking at your admin/schema browser page for the ID in question. That
actually gets stuff out of your index (the actual indexed terms). See if you
have two values for that ID, in which case you _might_ have spaces before or
after the value somehow. I notice your comment says something about
"computed", so... Since string types are totally unanalyzed, spaces would
count.

You can also use the TermsComponent to see what's there; see:
http://wiki.apache.org/solr/TermsComponent
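Since a string field is unanalyzed, even invisible whitespace makes two "equal" keys distinct terms; a trivial guard in the indexing code (illustrative only, not from the thread):

```python
def normalized_key(value):
    # A Solr "string" field is stored verbatim, so "abc" and "abc " would be
    # indexed as two different terms; trim before sending.
    return value.strip()

raw_keys = ["abc", "abc ", " abc"]
unique = {normalized_key(k) for k in raw_keys}
```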

Best
Erick

On Mon, Oct 22, 2012 at 12:37 PM, Robert Krüger  wrote:
> On Mon, Oct 22, 2012 at 6:01 PM, Jack Krupansky  
> wrote:
>> And, are you using UUID's or providing specific key values?
> specific key values


Re: Forming Solr Query for multiple operators against multiple fields

2012-10-23 Thread Ahmet Arslan



Probably you can make use of (e)dismax query parser.
http://wiki.apache.org/solr/DisMax
http://wiki.apache.org/solr/ExtendedDisMax



Re: Best and quickest Solr Search Front end

2012-10-23 Thread Muwonge Ronald
Yes, this was the person I wanted to befriend, Eric; good to hear from
you. I won't hide anything: I am working on a search engine for particular
content. I crawled some data with Nutch and indexed it to Solr. I have
been in search of a good solution, and when reading
Apache_Solr_3_Enterprise_Search_Server I fell in love with Blacklight
(nice name). I went ahead, and here I am.

http://waatu.com:3001

These will be the fields I need: URL, content, title (something like the
fields used by Google, Bing, etc.). What would you advise? I tried to follow
this page
https://github.com/projectblacklight/blacklight/wiki/How-to-configure-Blacklight-to-talk-to-your-(pre-existing)-Solr-index
but failed to get results from my Solr index ;-(

By the way, I like its use at those universities and will introduce it
at the university where I work soon; thanks for the job on
https://hydra.hull.ac.uk/
If all goes well, wait for my donation ;-)
Regards
Ronny

On Mon, Oct 22, 2012 at 5:55 PM, Erik Hatcher  wrote:
> Further on that, in recent versions of Solr it's /browse, not the sillier
> /itas handler name.
>
> As far as the "best" search front end, it's such an opinionated answer here.  
> It all really depends on what technologies you'd like to deploy.  The library 
> world has created two nice front-ends that are more or less general purpose 
> enough to use for other (non-library) schemas, with a bit of configuration.  
> There's Blacklight (Ruby on Rails) and VuFind (PHP).  As the initial creator 
> of Blacklight, I'll toss in my vote for that one as the best :)  But again, 
> it depends on many factors what's the Right choice for your environment.
>
> You can learn more about Blacklight at http://projectblacklight.org/, and see 
> many examples of it deployed in production here: 
> 
>
> Erik
>
>
> On Oct 22, 2012, at 08:13 , Paul Libbrecht wrote:
>
>> My experience for the easiest query is solr/itas (aka velocity solr).
>>
>> paul
>>
>>
>> Le 22 oct. 2012 à 11:15, Muwonge Ronald a écrit :
>>
>>> Hi all,
>>> have done some crawls for certain urls with nutch and indexed them  to
>>> solr.I kindly request for assistance in getting the best search
>>> interface but have no choice.Could you please assist me on this with
>>> examples and guide lines looked at solr-php-client but failed.
>>> Thnx
>>> Ronny
>>
>


Re: Forming Solr Query for multiple operators against multiple fields

2012-10-23 Thread Sandeep Mestry
Thanks Ahmet; however, as I mentioned in my e-mail, we're using Solr
1.4 here, and edismax is only supported from Solr 3.1.

:-)



Re: Forming Solr Query for multiple operators against multiple fields

2012-10-23 Thread Ahmet Arslan
> Thanks Ahmet, however as I have
> mentioned in my e-mail, we're using Solr
> 1.4 here and edismax is supported from Solr 3.1.

I think 1.4 has http://wiki.apache.org/solr/DisMaxQParserPlugin

But you need to use the +/- unary operators with this:

(day AND world) NOT night => +day +world -night
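For simple queries of exactly that shape, the translation to unary operators can be mechanized; a toy sketch (it only handles a flat list of required and prohibited terms, nothing nested):

```python
def to_unary(required, prohibited):
    # (day AND world) NOT night  ->  +day +world -night
    # '+' marks a required term, '-' a prohibited one (dismax-friendly syntax).
    return " ".join([f"+{t}" for t in required] + [f"-{t}" for t in prohibited])

q = to_unary(["day", "world"], ["night"])
```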




Partitioning data to Cores using a Solr plugin

2012-10-23 Thread Shahar Davidson
Hi all,

I would like to partition my data (by date, for example) into Solr cores by
implementing some sort of *pluggable component* for Solr.
In other words, I want Solr to handle distribution to partitions (rather than
implementing an external "Solr proxy" for sending requests to the right Solr
core).

Initially I thought that DistributedUpdateProcessor might help here but, as I
understand it, it is not intended for partitioning into cores but rather into
shards across several machines. In addition, one cannot control the logic by
which distribution is done.
I thought about implementing an UpdateRequestProcessor that forwards document
updates to the right cores (like DistributedUpdateProcessor), yet I want to
check with you (all Solr users out there) whether this can be avoided by
doing it differently.

In other words, is there any other way of implementing a pluggable component 
for Solr that can forward/route updates (using predefined logic) to Cores?
Is there, for instance, a way to catch an update request before it enters the 
update-request processor-chain?

Thanks,

Shahar.
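For what it's worth, the routing logic itself (whatever component ends up hosting it) can be as simple as a date-to-core mapping; a hypothetical sketch, with the core naming scheme made up for illustration:

```python
from datetime import date

def core_for(doc_date):
    # Hypothetical naming scheme: one core per calendar month.
    # Whatever component hosts this (e.g. a custom UpdateRequestProcessor)
    # would forward the document to the returned core.
    return f"core_{doc_date.year}_{doc_date.month:02d}"

target = core_for(date(2012, 10, 23))
```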


Re: Occasional Solr performance issues

2012-10-23 Thread Erick Erickson
Maybe you've been looking at it, but one thing that I didn't see on a fast
scan was that maybe the committing is the problem. When you commit,
eventually the segments will be merged and a new searcher will be opened
(this is true even if you're NOT optimizing). So you're effectively committing
every 1-2 seconds, creating many segments which get merged, but more
importantly opening new searchers (which you are getting since you pasted
the message: Overlapping onDeckSearchers=2).

You could pinpoint this by NOT committing explicitly, just set your autocommit
parameters (or specify commitWithin in your indexing program, which is
preferred). Try setting it at a minute or so and see if your problem goes away
perhaps?

The NRT stuff happens on soft commits, so you have that option to have the
documents immediately available for search.
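The commitWithin suggestion can be passed as a parameter on the update request itself; a sketch of building such a request URL (not tied to any particular client library):

```python
from urllib.parse import urlencode

# Ask Solr to make the docs searchable within 60 seconds instead of issuing
# an explicit commit after every batch of ~50 documents.
params = {"commitWithin": 60000}  # milliseconds
update_url = "http://localhost:8983/solr/update?" + urlencode(params)
```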

Best
Erick

On Mon, Oct 22, 2012 at 10:44 AM, Dotan Cohen  wrote:
> I've got a script writing ~50 documents to Solr at a time, then
> committing. Each of these documents is no longer than 1 KiB of text,
> some much less. Usually the write-and-commit will take 1-2 seconds or
> less, but sometimes it can go over 60 seconds.
>
> During a recent time of over-60-second write-and-commits, I saw that
> the server did not look overloaded:
>
> $ uptime
>  14:36:46 up 19:20,  1 user,  load average: 1.08, 1.16, 1.16
> $ free -m
>              total   used   free  shared  buffers  cached
> Mem:         14980   2091  12889       0      233    1243
> -/+ buffers/cache:    613  14366
> Swap:            0      0      0
>
> Other than Solr, nothing is running on this machine other than stock
> Ubuntu Server services (no Apache, no MySQL). The machine is running
> on an Extra Large Amazon EC2 instance, with a virtual 4-core 2.4 GHz
> Xeon processor and ~16 GiB of RAM. The solr home is on a mounted EBS
> volume.
>
> What might make some queries take so long, while others perform fine?
>
> Thanks.
>
>
> --
> Dotan Cohen
>
> http://gibberish.co.il
> http://what-is-what.com


Re: SolrCloud - loop in recovery mode

2012-10-23 Thread Mark Miller
This sounds like two issues. One, you are triggering a leader election
somehow. Do you see session expiration errors in the logs? You may
need to raise the timeout.

The second issue is probably related to what came out of
https://issues.apache.org/jira/browse/SOLR-3939

Shards that have no versions (because they are empty or just
replicated) cannot become the leader.

So if your indexes are empty and a leader election is triggered, no
one ends up becoming the leader.

This will be addressed in a release shortly, but the simple workaround
for now is to restart the shard and you will get a leader.

Next, you want to figure out why a leader election is being triggered
at that time (eg a zk session timeout?). Is the system under heavy
load at that point?

- Mark


Re: Best and quickest Solr Search Front end

2012-10-23 Thread Erik Hatcher
Ronny -

Your best bets will be to engage the Blacklight team at their mailing list or 
in the #blacklight IRC room, both of which are active every day.

The links in your search UI will need to be adjusted, as Rails doesn't like 
having the URL formatted id's as part of the path - this is something I've 
ended up customizing when I've demo'd Blacklight with a non-library data set 
that had URLs for id's (but I don't recall how I customized it exactly, but 
basically overrode how the document URLs are formed to put the id in a query 
string parameter).

You did get results though!   Press search :)   You're just not showing any 
initial facets, but that's a config option.

What University do you work for?   I don't know whether you meant to imply Hull 
below or not, but Hull is using Blacklight within their Hydra project among 
others.

Erik





Re: Best and quickest Solr Search Front end

2012-10-23 Thread Muwonge Ronald
On Tue, Oct 23, 2012 at 4:49 PM, Erik Hatcher  wrote:
> Ronny -
>
> Your best bets will be to engage the Blacklight team at their mailing list or 
> in the #blacklight IRC room, both of which are active every day.
Thanks will do that
>
> The links in your search UI will need to be adjusted, as Rails doesn't like 
> having the URL formatted id's as part of the path - this is something I've 
> ended up customizing when I've demo'd Blacklight with a non-library data set 
> that had URLs for id's (but I don't recall how I customized it exactly, but 
> basically overrode how the document URLs are formed to put the id in a query 
> string parameter).
>
> You did get results though!   Press search :)
Now when I press search I get something, but why don't I get a
single result when I search for something? :-) Funny, haha, I can afford
a smile at least.

> You're just not showing any initial facets, but that's a config option.
>
> What University do you work for?   I don't know whether you meant to imply 
> Hull below or not, but Hull is using Blacklight within their Hydra project 
> among others.
>
Well, I am in Africa, Uganda, at the International University of East Africa.
> Erik


WordDelimiterFilter preserveOriginal & position increment

2012-10-23 Thread Jay Luker
Hi,

I'm having an issue with the WDF preserveOriginal="1" setting and the
matching of a phrase query. Here's an example of the text that is
being indexed:

"...obtained with the Southern African Large Telescope,SALT..."

A lot of our text is extracted from PDFs, so this kind of formatting
junk is very common.

The phrase query that is failing is:

"Southern African Large Telescope"

From looking at the analysis debugger I can see that the WDF is
getting the term "Telescope,SALT" and correctly splitting on the
comma. The problem seems to be that the original term occupies a
position of its own, pushing the split parts later, e.g.:

Pos  Term
1  Southern
2  African
3  Large
4  Telescope,SALT  <-- original term
5  Telescope
6  SALT

Only by adding a phrase slop of "~1" do I get a match.

I realize that the WDF is behaving correctly in this case (or at least
I can't imagine a rational alternative). But I'm curious if anyone can
suggest a way to work around this issue that doesn't involve adding
phrase-query slop.

Thanks,
--jay
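The slop requirement follows directly from the positions in the table above; a toy model:

```python
# Toy model of the token positions reported in the analysis debugger above.
positions = {"Southern": 1, "African": 2, "Large": 3,
             "Telescope,SALT": 4, "Telescope": 5, "SALT": 6}

def phrase_gap(terms):
    # Maximum deviation from strictly consecutive positions.
    # 0 means an exact phrase match; a positive value is the slop needed.
    pos = [positions[t] for t in terms]
    return max(pos[i + 1] - pos[i] - 1 for i in range(len(pos) - 1))

# The failing phrase needs slop 1 because "Telescope" sits at position 5,
# one past "Large" + 1.
gap = phrase_gap(["Southern", "African", "Large", "Telescope"])
```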


Re: WordDelimiterFilter preserveOriginal & position increment

2012-10-23 Thread Jack Krupansky
Your "query" analyzer should not have preserveOriginal="1". You should have
separate "index" and "query" analyzers; they may be almost identical, but
the "query" analyzer must not have preserveOriginal="1", so that it generates
a clean sequence of terms that were indexed in that exact order.


-- Jack Krupansky




Re: WordDelimiterFilter preserveOriginal & position increment

2012-10-23 Thread Shawn Heisey

On 10/23/2012 8:16 AM, Jay Luker wrote:

 From looking at the analysis debugger I can see that the WDF is
getting the term "Telescope,SALT" and correctly splitting on the
comma. The problem seems to be that the original term is given the 1st
position, e.g.:

Pos  Term
1  Southern
2  African
3  Large
4  Telescope,SALT  <-- original term
5  Telescope
6  SALT


Jay, I have WDF with preserveOriginal turned on. I get the following
from WDF parsing on the analysis page on either 3.5 or 4.1-SNAPSHOT, and
the analyzer shows that all four of the query words are found at
consecutive positions. On the new version, I had to slide a scrollbar to
the right to see the last term. Visually they were not in consecutive
cells on the new version (they were on 3.5), but the position numbers
say otherwise.


1  Southern
2  African
3  Large
4  Telescope,SALT
4  Telescope
5  SALT
5  TelescopeSALT

My full WDF parameters:
index: {preserveOriginal=1, splitOnCaseChange=1, generateNumberParts=1, 
catenateWords=1, splitOnNumerics=1, stemEnglishPossessive=1, 
luceneMatchVersion=LUCENE_35, generateWordParts=1, catenateAll=0, 
catenateNumbers=1}
query: {preserveOriginal=1, splitOnCaseChange=1, generateNumberParts=1, 
catenateWords=0, splitOnNumerics=1, stemEnglishPossessive=1, 
luceneMatchVersion=LUCENE_35, generateWordParts=1, catenateAll=0, 
catenateNumbers=0}


I understand from other messages on the mailing list that I should not 
have preserveOriginal on the query side, but I have not yet changed it.


If your position numbers really are what you indicated, you may have 
found a bug.  I have not tried the released 4.0.0 version, I expect to 
deploy from the 4.x branch under development.


Thanks,
Shawn



Re: Best and quickest Solr Search Front end

2012-10-23 Thread Erik Hatcher
A blank search (in Blacklight) is actually a search underneath for *:* (match 
all documents), so searching is working.  But searching for actual words 
depends on a lot of things: what fields they're in, what analyzers are or are 
not being used, what query parser is being used, etc.  And in the Blacklight 
case, you probably just need to tune some configuration to get it querying how 
you'd like.   Making &debugQuery=true queries to Solr can shed a lot of light 
on what's happening, especially combining that with looking at the logs to see 
what requests Blacklight is sending, to reverse-engineer what's going on.

Erik

On Oct 23, 2012, at 10:06 , Muwonge Ronald wrote:

> On Tue, Oct 23, 2012 at 4:49 PM, Erik Hatcher  wrote:
>> Ronny -
>> 
>> Your best bets will be to engage the Blacklight team at their mailing list 
>> or in the #blacklight IRC room, both of which are active every day.
> Thanks will do that
>> 
>> The links in your search UI will need to be adjusted, as Rails doesn't like 
>> having the URL formatted id's as part of the path - this is something I've 
>> ended up customizing when I've demo'd Blacklight with a non-library data set 
>> that had URLs for id's (but I don't recall how I customized it exactly, but 
>> basically overrode how the document URLs are formed to put the id in a query 
>> string parameter).
>> 
>> You did get results though!   Press search :)
> Now when I did press search I get something but why don't I get a
> single result when I search for something :-).FUNNY hahah I can afford
> a smile at least
> 
> You're just not showing any initial facets, but that's a config option.
>> 
>> What University do you work for?   I don't know whether you meant to imply 
>> Hull below or not, but Hull is using Blacklight within their Hydra project 
>> among others.
>> 
> Well am in Africa Uganda International University of East Africa
>>Erik
>> 
>> 
>> On Oct 23, 2012, at 08:46 , Muwonge Ronald wrote:
>> 
>>> Yes this was the person I wanted to befriend Eric good to hear from
>>> you.I won't hide anything am working a search engine for particular
>>> content.I crawled some data  with nutch and indexed to solr.I have
>>> been in search of a good solution and when reading the
>>> Apache_Solr_3_Enterprise_Search_Server i fell in love with Blacklight
>>> (nice name) I wend ahead and here iam.
>>> 
>>> http://waatu.com:3001
>>> 
>>> These will be the fields I need, URL,content,Title  something like
>>> fields used by google,bing etc.What would yo advise tried to follow
>>> this page 
>>> https://github.com/projectblacklight/blacklight/wiki/How-to-configure-Blacklight-to-talk-to-your-(pre-existing)-Solr-index
>>> but failed to get results form my solr index ;-(
>>> .
>>> By the way I like it's use at those universities and will introduce it
>>> at the University I work soon thanks for job
>>> https://hydra.hull.ac.uk/
>>> IF all goes well wait for my donation ;-)
>>> Regards
>>> Ronny
>>> 
>>> On Mon, Oct 22, 2012 at 5:55 PM, Erik Hatcher  
>>> wrote:
 Further on that in recent versions of Solr, it's /browse, not the 
 sillier /itas handler name.
 
 As far as the "best" search front end, it's such an opinionated answer 
 here.  It all really depends on what technologies you'd like to deploy.  
 The library world has created two nice front-ends that are more or less 
 general purpose enough to use for other (non-library) schemas, with a bit 
 of configuration.  There's Blacklight (Ruby on Rails) and VuFind (PHP).  
 As the initial creator of Blacklight, I'll toss in my vote for that one as 
 the best :)  But again, it depends on many factors what's the Right choice 
 for your environment.
 
 You can learn more about Blacklight at http://projectblacklight.org/, and 
 see many examples of it deployed in production here: 
 
 
   Erik
 
 
 On Oct 22, 2012, at 08:13 , Paul Libbrecht wrote:
 
> My experience for the easiest query is solr/itas (aka velocity solr).
> 
> paul
> 
> 
> Le 22 oct. 2012 à 11:15, Muwonge Ronald a écrit :
> 
>> Hi all,
>> I have done some crawls for certain URLs with Nutch and indexed them to
>> Solr. I kindly request assistance in getting the best search
>> interface, but have not made a choice yet. Could you please assist me on this with
>> examples and guidelines? I looked at solr-php-client but failed.
>> Thnx
>> Ronny
> 
 
>> 



SolrCloud - Replication - Runtime

2012-10-23 Thread balaji.gandhi
Hi,

I am trying to add new Solr nodes to an existing cluster for replication.
Only newly added documents are added to the replicas. Please let me know the
config for syncing the existing documents when adding the new nodes.

Thanks,
Balaji



--
View this message in context: 
http://lucene.472066.n3.nabble.com/SolrCloud-Replication-Runtime-tp4015391.html
Sent from the Solr - User mailing list archive at Nabble.com.


Anyone running Solr in WebSphere on a mainframe?

2012-10-23 Thread re_buchanan
Yes, you heard that correctly on a mainframe.  

I've had no trouble setting up Solr on Tomcat on a Linux box sitting on my
desk; however, it's getting closer to time to go to production and I need it
in our "real" environment.

We've managed to get Solr installed and running without too much trouble
(layout and set-up the solr home directory, create a WebSphere server
instance, and deploy the war file.)  This runs fine for the most part (I can
query, I can update, etc.)

However, things aren't working so well doing the initial loads.  I keep
getting "#500 Internal Server" error messages.

We're using post.jar to manage this (about 5 million records in 12 XML
files) - I know post.jar is not ideal, but it runs like a champ loading on the
not particularly powerful Linux box on my desk.

We're going through the logs, but any general advice on running in this
environment would be appreciated.

Thanks



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Anyone-running-Solr-in-WebSphere-on-a-mainframe-tp4015417.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Solr 4.0.0 - index version and generation not changed after delete by query on master

2012-10-23 Thread Chris Hostetter
: Just discovered that the replication admin REST API reports the correct
: index version and generation:
: 
: http://master_host:port/solr/replication?command=indexversion
: 
: So is this a bug in the admin UI?

Ya gotta be specific Bill: where in the admin UI do you think it's 
displaying the incorrect information?

The Admin UI just adds pretty markup to information fetched from the 
admin handlers using javascript, so if there is a problem it's either in 
the admin handlers, or in the javascript possibly caching the olds values.

Off the cuff, this reminds me of...

https://issues.apache.org/jira/browse/SOLR-3681

The root confusion there was that /admin/replication explicitly shows data 
about the commit point available for replication -- not the current commit 
point being "searched" on the master.

So if you are seeing a disconnect, then perhaps it's just that same 
discrepancy? -- although if you are *only* seeing a disconnect after a 
deleteByQuery (and not after document adds, or a deleteById) then that 
does smell fishy, and makes me wonder if there is a code path where the 
"userData" for the commits isn't being set properly.

Can you file a bug with a unit test to reproduce?  Or, at the very least, a 
set of specific commands to run against the solr example, including which 
request handler URLs to hit (so there's no risk of confusion about the ui 
javascript behavior), to see the problem?


-Hoss


Re: WordDelimiterFilter preserveOriginal & position increment

2012-10-23 Thread Jay Luker
Bah... While attempting to duplicate this on our 4.x instance I
realized I was mis-reading the analysis output. In the example I
mentioned it was actually a SynonymFilter in the analysis chain that
was affecting the term position (we have several synonyms for
"telescope").

Regardless, it seems to not be a problem in 4.x.

Thanks,
--jay

On Tue, Oct 23, 2012 at 10:45 AM, Shawn Heisey  wrote:
> On 10/23/2012 8:16 AM, Jay Luker wrote:
>>
>>  From looking at the analysis debugger I can see that the WDF is
>> getting the term "Telescope,SALT" and correctly splitting on the
>> comma. The problem seems to be that the original term is given the 1st
>> position, e.g.:
>>
>> Pos  Term
>> 1  Southern
>> 2  African
>> 3  Large
>> 4  Telescope,SALT  <-- original term
>> 5  Telescope
>> 6  SALT
>
>
> Jay, I have WDF with preserveOriginal turned on.  I get the following from
> WDF parsing in the analysis page on either 3.5 or 4.1-SNAPSHOT, and the
> analyzer shows that all four of the query words are found in consecutive
> fields.  On the new version, I had to slide a scrollbar to the right to see
> the last term.  Visually they were not in consecutive fields on the new
> version (they were on 3.5), but the position number says otherwise.
>
>
> 1Southern
> 2African
> 3Large
> 4Telescope,SALT
> 4Telescope
> 5SALT
> 5TelescopeSALT
>
> My full WDF parameters:
> index: {preserveOriginal=1, splitOnCaseChange=1, generateNumberParts=1,
> catenateWords=1, splitOnNumerics=1, stemEnglishPossessive=1,
> luceneMatchVersion=LUCENE_35, generateWordParts=1, catenateAll=0,
> catenateNumbers=1}
> query: {preserveOriginal=1, splitOnCaseChange=1, generateNumberParts=1,
> catenateWords=0, splitOnNumerics=1, stemEnglishPossessive=1,
> luceneMatchVersion=LUCENE_35, generateWordParts=1, catenateAll=0,
> catenateNumbers=0}
>
> I understand from other messages on the mailing list that I should not have
> preserveOriginal on the query side, but I have not yet changed it.
>
> If your position numbers really are what you indicated, you may have found a
> bug.  I have not tried the released 4.0.0 version, I expect to deploy from
> the 4.x branch under development.
>
> Thanks,
> Shawn
>
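[Archive editor's note] The 4.x position assignment Shawn pasted above (the original token sharing a position with the first subword) can be modeled with a small sketch. The splitting rule and function below are illustrative only, not Solr's actual WordDelimiterFilter implementation:

```python
import re

def word_delimiter(term, preserve_original=True):
    # Toy model of WordDelimiterFilter position assignment as seen on 4.x:
    # subwords get consecutive positions, and with preserveOriginal=1 the
    # original term is emitted at the SAME position as the first subword
    # (position increment 0), so phrase queries are not pushed off by one.
    parts = [p for p in re.split(r"[^A-Za-z0-9]+", term) if p]
    tokens = []
    if preserve_original and len(parts) > 1:
        tokens.append((1, term))  # original shares position 1
    tokens.extend((i + 1, p) for i, p in enumerate(parts))
    return tokens

print(word_delimiter("Telescope,SALT"))
# [(1, 'Telescope,SALT'), (1, 'Telescope'), (2, 'SALT')]
```

The behavior Jay originally reported would correspond to the original term consuming position 1 on its own and pushing "Telescope" to position 2, which is what breaks consecutive-position phrase matching.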


Failure to open existing log file (non fatal)

2012-10-23 Thread Markus Jelsma
Hi,

We're testing a 10 node cluster running trunk and write a few million documents 
to it from Hadoop. We just saw a node die for no apparent reason. Tomcat was 
completely dead before it was automatically restarted again. Indexing failed 
when it received the typical Internal Server Error. The log only shows:

2012-10-23 19:07:09,291 ERROR [solr.update.UpdateLog] - [main] - : Failure to 
open existing log file (non fatal) 
/opt/solr/cores/shard_f/data/tlog/tlog.0010484:org.apache.solr.common.SolrException:
 java.io.EOFException
at org.apache.solr.update.TransactionLog.<init>(TransactionLog.java:182)
at org.apache.solr.update.UpdateLog.init(UpdateLog.java:216)
at org.apache.solr.update.UpdateHandler.initLog(UpdateHandler.java:82)
at org.apache.solr.update.UpdateHandler.<init>(UpdateHandler.java:111)
at 
org.apache.solr.update.DirectUpdateHandler2.<init>(DirectUpdateHandler2.java:97)
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at 
sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57)
at 
sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
at java.lang.reflect.Constructor.newInstance(Constructor.java:532)
at org.apache.solr.core.SolrCore.createInstance(SolrCore.java:483)
at org.apache.solr.core.SolrCore.createUpdateHandler(SolrCore.java:551)
at org.apache.solr.core.SolrCore.<init>(SolrCore.java:714)
at org.apache.solr.core.SolrCore.<init>(SolrCore.java:573)
at org.apache.solr.core.CoreContainer.create(CoreContainer.java:850)
at org.apache.solr.core.CoreContainer.load(CoreContainer.java:534)
at org.apache.solr.core.CoreContainer.load(CoreContainer.java:356)
at 
org.apache.solr.core.CoreContainer$Initializer.initialize(CoreContainer.java:308)
at 
org.apache.solr.servlet.SolrDispatchFilter.init(SolrDispatchFilter.java:107)
..lots of catalina traces...
Caused by: java.io.EOFException
at 
org.apache.solr.common.util.FastInputStream.readUnsignedByte(FastInputStream.java:72)
at 
org.apache.solr.common.util.FastInputStream.readInt(FastInputStream.java:206)
at 
org.apache.solr.update.TransactionLog.readHeader(TransactionLog.java:266)
at org.apache.solr.update.TransactionLog.<init>(TransactionLog.java:160)
... 44 more

According to syslog, Tomcat was not killed by the OOM-killer, which I initially 
suspected. Syslog is also still running ;)

It seems the error is more fatal than it claims: the indexing error and 
the exception happened within a few seconds of each other. Any ideas? Existing 
issue? Should I file a bug?

Thanks
Markus


Re: SolrCloud - Replication - Runtime

2012-10-23 Thread Otis Gospodnetic
Are you looking to rebalance existing docs?  I don't think that is
currently possible.

Otis
--
Performance Monitoring - http://sematext.com/spm
On Oct 23, 2012 1:03 PM, "balaji.gandhi"  wrote:

> Hi,
>
> I am trying to add new Solr nodes to an existing cluster for replication.
> Only newly added documents are added to the replicas. Please let me know
> the
> config for syncing the documents on adding the new nodes.
>
> Thanks,
> Balaji
>
>
>
> --
> View this message in context:
> http://lucene.472066.n3.nabble.com/SolrCloud-Replication-Runtime-tp4015391.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>


Solr/Lucene + Oracle Database seamless integration

2012-10-23 Thread Maximiliano Keen
*Scotas* brings a remarkable advancement to Enterprise Text Search. Scotas
combines and synchronize the high-performance, full-featured Solr/Lucene
text search engine with the industry leading Oracle Database's performance,
scalability, security, and reliability.

Main features are:

   - Full Integration with Near Real Time index updating
   - Solr syntax embedded in PLSQL queries
   - Oracle Scalability, Performance and Security


Don't hesitate to contact us for further questions.




[image: Imágenes integradas 1]
*
*
*Maximiliano Keen*
mk...@scotas.com

PS: Scotas is pleased to announce the release of the new website. *
http://www.scotas.com*


Questions about HttpSolrServer

2012-10-23 Thread Benjamin, Roy
Assuming one has hundreds of Solr nodes, should an indexing application pool 
HttpSolrServer instances, like a connection pool?

Thanks
Roy


Solr 3.6


Re: Questions about HttpSolrServer

2012-10-23 Thread Anirudha Jadhav
Try reading up on LBHttpSolrServer. We have a layer on top of Solr to
manage such scenarios.

On Tue, Oct 23, 2012 at 4:52 PM, Benjamin, Roy  wrote:

> Assuming one has hundreds of Solr nodes should an indexing application
> pool HttpSolrServer instances ala a connection pool ?
>
> Thanks
> Roy
>
>
> Solr 3.6
>



-- 
Anirudha P. Jadhav
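[Archive editor's note] LBHttpSolrServer is essentially a round-robin load balancer over a list of Solr base URLs (it additionally tracks dead servers and retries them). A minimal sketch of just the round-robin idea, with placeholder URLs:

```python
import itertools

class RoundRobinPool:
    # Toy round-robin selector over Solr base URLs. SolrJ's LBHttpSolrServer
    # layers dead-server detection and retry on top of this basic rotation.
    def __init__(self, urls):
        self._cycle = itertools.cycle(urls)

    def next_server(self):
        return next(self._cycle)

pool = RoundRobinPool(["http://solr1:8983/solr", "http://solr2:8983/solr"])
for _ in range(3):
    print(pool.next_server())
```

With hundreds of nodes, reusing one client instance per base URL (rather than creating clients per request) is the usual pattern, since the underlying HTTP connections are pooled.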


Re: Failure to open existing log file (non fatal)

2012-10-23 Thread Mark Miller
It depends I think. Is that shard missing any data?

Those two exceptions belong together, it appears to me - the first says that 
there was an error loading the transaction log and that it's not a fatal error; 
the second looks like the stack trace from the low-level spot where loading 
the tran log failed, and it looks like it could not read the header.

I think that's a normal situation in the case of a crash. You might not have a 
well-formed transaction log file after a crash - as long as no request that 
should have ended up in that log was ack'd, you are fine. Though, unless you 
have replicas, you will want to enable fsync per request to guarantee that on 
hard crashes.

Perhaps we can improve the appearance of this - but it's expected to happen in 
crash cases.

- Mark

On Oct 23, 2012, at 3:22 PM, Markus Jelsma  wrote:

> Hi,
> 
> We're testing a 10 node cluster running trunk and write a few million 
> documents to it from Hadoop. We just saw a node die for no apparent reason. 
> Tomcat was completely dead before it was automatically restarted again. 
> Indexing failed when it received the typical Internal Server Error. The log 
> only shows:
> 
> 2012-10-23 19:07:09,291 ERROR [solr.update.UpdateLog] - [main] - : Failure to 
> open existing log file (non fatal) 
> /opt/solr/cores/shard_f/data/tlog/tlog.0010484:org.apache.solr.common.SolrException:
>  java.io.EOFException
>at 
> org.apache.solr.update.TransactionLog.<init>(TransactionLog.java:182)
>at org.apache.solr.update.UpdateLog.init(UpdateLog.java:216)
>at org.apache.solr.update.UpdateHandler.initLog(UpdateHandler.java:82)
>at org.apache.solr.update.UpdateHandler.<init>(UpdateHandler.java:111)
>at 
> org.apache.solr.update.DirectUpdateHandler2.<init>(DirectUpdateHandler2.java:97)
>at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native 
> Method)
>at 
> sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57)
>at 
> sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
>at java.lang.reflect.Constructor.newInstance(Constructor.java:532)
>at org.apache.solr.core.SolrCore.createInstance(SolrCore.java:483)
>at org.apache.solr.core.SolrCore.createUpdateHandler(SolrCore.java:551)
>at org.apache.solr.core.SolrCore.<init>(SolrCore.java:714)
>at org.apache.solr.core.SolrCore.<init>(SolrCore.java:573)
>at org.apache.solr.core.CoreContainer.create(CoreContainer.java:850)
>at org.apache.solr.core.CoreContainer.load(CoreContainer.java:534)
>at org.apache.solr.core.CoreContainer.load(CoreContainer.java:356)
>at 
> org.apache.solr.core.CoreContainer$Initializer.initialize(CoreContainer.java:308)
>at 
> org.apache.solr.servlet.SolrDispatchFilter.init(SolrDispatchFilter.java:107)
> ..lots of catalina traces...
> Caused by: java.io.EOFException
>at 
> org.apache.solr.common.util.FastInputStream.readUnsignedByte(FastInputStream.java:72)
>at 
> org.apache.solr.common.util.FastInputStream.readInt(FastInputStream.java:206)
>at 
> org.apache.solr.update.TransactionLog.readHeader(TransactionLog.java:266)
>at 
> org.apache.solr.update.TransactionLog.<init>(TransactionLog.java:160)
>... 44 more
> 
> According to syslog Tomcat was not killed by the OOM-killer, what i initially 
> expected. Syslog is also still running ;)
> 
> It seem the error is more fatal than the error tells me, the indexing error 
> and the exception happened within a few seconds of eachother. Any ideas? 
> Existing issue? File bug?
> 
> Thanks
> Markus



Re: Failure to open existing log file (non fatal)

2012-10-23 Thread Chris Hostetter

: Perhaps we can improve the appearance of this - but it's expected to happen 
in crash cases.

in case it wasn't clear: there's no indication that this "Failure to open 
existing log file" *caused* any sort of crash -- it comes from the 
initialization code of the UpdateHandler when a SolrCore is being created 
as part of Solr startup.

So this is an error that was definitely logged after tomcat was restarted.

: > It seem the error is more fatal than the error tells me, the indexing 
: error and the exception happened within a few seconds of eachother. Any 
: ideas? Existing issue? File bug?


-Hoss


RE: Failure to open existing log file (non fatal)

2012-10-23 Thread Markus Jelsma
Hi,

I checked the logs and it confirms the error is not fatal; it was logged just a 
few seconds before the node was restarted. The node runs fine after the restart 
but logged this non-fatal error and replayed the log twice. This leaves the 
question why it died; there is no log of it dying anywhere. We didn't restart 
rsyslogd, so it was running all the time, and there is no report of an OOM-killer 
there.

Any more thoughts to share?

Thanks
Markus

 
-Original message-
> From:Chris Hostetter 
> Sent: Wed 24-Oct-2012 00:38
> To: solr-user@lucene.apache.org
> Subject: Re: Failure to open existing log file (non fatal)
> 
> 
> : Perhaps we can improve the appearance of this - but it's expected to happen 
> in crash cases.
> 
> in case it wasn't clear: there's no indication that this "Failure to open 
> existing log file" *caused* any sort of crash -- it comes from the 
> initialization code of the UpdateHandler when a SolrCore is being created 
> as part of Solr startup.
> 
> So this is an error that was definitely logged after tomcat was restarted.
> 
> : > It seem the error is more fatal than the error tells me, the indexing 
> : error and the exception happened within a few seconds of eachother. Any 
> : ideas? Existing issue? File bug?
> 
> 
> -Hoss
> 


Re: Data Writing Performance of Solr 4.0

2012-10-23 Thread Otis Gospodnetic
Hideki,

Mark's answer is right. It depends.  Solr 4 has NRT built in.

Otis
--
Performance Monitoring - http://sematext.com/spm
On Oct 20, 2012 3:43 PM, "Nagendra Nagarajayya" <
nnagaraja...@transaxtions.com> wrote:

>
> You may want to look at realtime NRT for this kind of performance:
> https://issues.apache.org/jira/browse/SOLR-3816
>
> You can download realtime NRT integrated with Apache Solr from here:
> http://solr-ra.tgels.org
>
>
> Regards,
>
> - Nagendra Nagarajayya
> http://solr-ra.tgels.org
> http://rankingalgorithm.tgels.org
>
>
>
> On 10/18/2012 11:50 PM, higashihara_hdk wrote:
> > Hello everyone.
> >
> > I have two questions. I am considering using Solr 4.0 to perform full
> > searches on the data output in real-time by a Storm cluster
> > (http://storm-project.net/).
> >
> > 1. In particular, I'm concerned whether Solr would be able to keep up
> > with the 2000-message-per-second throughput of the Storm cluster. What
> > kind of throughput would I be able to expect from Solr 4.0, for example
> > on a Xeon 2.5GHz 4-core with HDD?
> >
> > 2. Also, how efficiently would Solr scale with clustering?
> >
> > Any pertinent information would be greatly appreciated.
> >
> > Hideki Higashihara
> >
> >
>
>


Re: Solr Cloud Questions

2012-10-23 Thread Otis Gospodnetic
Some of our clients have been using it. It still has a few problems as you
can see in jira, but nothing major.

Otis
--
Performance Monitoring - http://sematext.com/spm
On Oct 22, 2012 9:18 PM, "Mark"  wrote:

> I have a few questions regarding Solr Cloud. I've been following it for
> quite some time but I believe it wasn't ever production ready. I see that
> with the release of 4.0 it's considered stable… is that the case? Can
> anyone out there share your experiences with Solr Cloud in a production
> environment?
>
>
>


Re: Failure to open existing log file (non fatal)

2012-10-23 Thread Mark Miller
On Tue, Oct 23, 2012 at 6:33 PM, Chris Hostetter
 wrote:
>
> : Perhaps we can improve the appearance of this - but it's expected to happen 
> in crash cases.
>
> in case it wasn't clear: there's no indication that this "Failure to open
> existing log file" *caused* any sort of crash -- it comes from the
> initialization code of the UpdateHandler when a SolrCore is being created
> as part of Solr startup.
>
> So this is an error that was definitely logged after tomcat was restarted.
>

No, it would not cause a crash - you would see it on startup after a crash.

-- 
- Mark


Re: Failure to open existing log file (non fatal)

2012-10-23 Thread Mark Miller
Why the process died, I cannot say. Seems like the world of guesses is
just too large :) If there is nothing in the logs, it's likely at the
OS level? But if there are no dump files or evidence of it in system
logs, I don't even know where to start.

All I can help with is that the exception is an expected possibility
after a Solr crash (on the next startup).

- Mark

On Tue, Oct 23, 2012 at 6:48 PM, Markus Jelsma
 wrote:
> Hi,
>
> I checked the logs and it confirms the error is not fatal, it was logged just 
> a few seconds before it was restarted. The node runs fine after it was 
> restarted but logged this non fatal error replayed the log twice. This leaves 
> the question why it died, there is no log of it dying anywhere. We don't 
> recover rsyslogd so it was running all the time and there is no report of an 
> OOM-killer there.
>
> Any more thoughts to share?
>
> Thanks
> Markus
>
>
> -Original message-
>> From:Chris Hostetter 
>> Sent: Wed 24-Oct-2012 00:38
>> To: solr-user@lucene.apache.org
>> Subject: Re: Failure to open existing log file (non fatal)
>>
>>
>> : Perhaps we can improve the appearance of this - but it's expected to 
>> happen in crash cases.
>>
>> in case it wasn't clear: there's no indication that this "Failure to open
>> existing log file" *caused* any sort of crash -- it comes from the
>> initialization code of the UpdateHandler when a SolrCore is being created
>> as part of Solr startup.
>>
>> So this is an error that was definitely logged after tomcat was restarted.
>>
>> : > It seem the error is more fatal than the error tells me, the indexing
>> : error and the exception happened within a few seconds of eachother. Any
>> : ideas? Existing issue? File bug?
>>
>>
>> -Hoss
>>



-- 
- Mark


Re: Solr - Use Regex in the Query Phrase

2012-10-23 Thread Daisy
Thanks very much, I have upgraded my Solr to apache-solr-4.0 and my first
query works fine.
For the second one I have tried what iorixxx pointed out, but I couldn't
proceed. I am not experienced with patching; however, I tried the steps
recommended by Ahmet Arslan here.
But it didn't work for me as it always says class not found.
Could you please guide me on how to use it to execute my second query? (I am
using Windows)



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Solr-Use-Regex-in-the-Query-Phrase-tp4015335p4015473.html
Sent from the Solr - User mailing list archive at Nabble.com.


Solr UI for File Search

2012-10-23 Thread 122jxgcn
Hello,

I'm almost done with my file (rich document) search system on the server and
client side.
All I have to do now is configure the search result interface so that
it displays results properly and attaches a link to the matched files.
(It just shows the XML result now.)
I cannot simply use another application because I added my own file parsers on
top of Tika.
So what would be my best option for adding a nice UI to my system without
messing with it?
Thanks.



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Solr-UI-for-File-Search-tp4015476.html
Sent from the Solr - User mailing list archive at Nabble.com.


SolrJ CloudSolrServer throws ClassCastException

2012-10-23 Thread Kevin Osborn
I am getting a ClassCastException when i call Solr. My code is pretty
simple.

SolrServer mySolrServer = new CloudSolrServer(zookeeperHost);
((CloudSolrServer) mySolrServer).setDefaultCollection("manufacturer");
((CloudSolrServer) mySolrServer).connect();


The actual error is thrown on line 300 of ClusterState.java:
new ZkNodeProps(sliceMap.get(shardName))

It is trying to convert a String to a Map which causes the
ClassCastException.

My zookeeperHost string is simply "myHost:6200". My SolrCloud has 2 shards
in a single collection, and two instances are running. I also tried an
external ZooKeeper with the same results.


-- 
*KEVIN OSBORN*
LEAD SOFTWARE ENGINEER
CNET Content Solutions
OFFICE 949.399.8714
CELL 949.310.4677  SKYPE osbornk
5 Park Plaza, Suite 600, Irvine, CA 92614
[image: CNET Content Solutions]


Re: Data Writing Performance of Solr 4.0

2012-10-23 Thread higashihara_hdk

Mark, Otis,

Thanks for the replies.
I'll consider it.

Hideki


(2012/10/24 8:01), Otis Gospodnetic wrote:

Hideki,

Mark's answer is right. It depends.  Solr 4 has NRT built in.

Otis
--
Performance Monitoring - http://sematext.com/spm
On Oct 20, 2012 3:43 PM, "Nagendra Nagarajayya" <
nnagaraja...@transaxtions.com> wrote:


You may want to look at realtime NRT for this kind of performance:
https://issues.apache.org/jira/browse/SOLR-3816

You can download realtime NRT integrated with Apache Solr from here:
http://solr-ra.tgels.org


Regards,

- Nagendra Nagarajayya
http://solr-ra.tgels.org
http://rankingalgorithm.tgels.org



On 10/18/2012 11:50 PM, higashihara_hdk wrote:

Hello everyone.

I have two questions. I am considering using Solr 4.0 to perform full
searches on the data output in real-time by a Storm cluster
(http://storm-project.net/).

1. In particular, I'm concerned whether Solr would be able to keep up
with the 2000-message-per-second throughput of the Storm cluster. What
kind of throughput would I be able to expect from Solr 4.0, for example
on a Xeon 2.5GHz 4-core with HDD?

2. Also, how efficiently would Solr scale with clustering?

Any pertinent information would be greatly appreciated.

Hideki Higashihara








Re: Solr - Use Regex in the Query Phrase

2012-10-23 Thread Ahmet Arslan
> For the second one I have tried what iorixxx pointed out,
> but I couldnt
> proceed. I am not experienced with patching, However i tried
> the steps
> recommended by Ahmet Arslan  here
>  
> .
> But it didnt works for me as it is always says class not
> found.
> Could you please guide me how to use it to execute my second
> query? (I am
> using windows)

I added a new zip file to SOLR-1604. It includes README.txt that has step by 
step instructions. It simply follows 
http://wiki.apache.org/solr/SolrPlugins#The_Old_Way
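[Archive editor's note] Solr 4.0's standard (lucene) query parser also accepts a regular expression between forward slashes, which may cover simple regex queries without any patch. A sketch of building such a request URL; the host, core, and field names are placeholders:

```python
from urllib.parse import urlencode

def regex_query_url(base_url, field, pattern):
    # The 4.x lucene parser treats /.../ as a regular-expression term,
    # e.g. q=title:/ab[a-z]+/ ; the whole query is URL-encoded here.
    params = {"q": "%s:/%s/" % (field, pattern), "wt": "json"}
    return base_url + "/select?" + urlencode(params)

print(regex_query_url("http://localhost:8983/solr/collection1",
                      "title", "ab[a-z]+"))
```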




Re: Solr Cloud Questions

2012-10-23 Thread Tomás Fernández Löbbe
> Some of our clients have been using it. It still has a few problems as you
> can see in jira, but nothing major.
>

Same here.


> Otis
> --
> Performance Monitoring - http://sematext.com/spm
> On Oct 22, 2012 9:18 PM, "Mark"  wrote:
>
> > I have a few questions regarding Solr Cloud. I've been following it for
> > quite some time but I believe it wasn't ever production ready. I see that
> > with the release of 4.0 it's considered stable… is that the case? Can
> > anyone out there share your experiences with Solr Cloud in a production
> > environment?
> >
> >
> >
>


Re: Solr Implementation Plan and FTE for Install/Maintenance

2012-10-23 Thread Otis Gospodnetic
1 :) I'm not kidding.
1 server if HA is not needed. RAM depends, say 8 or 16GB
1 day, 1 person

Devil's in the details though.

Otis
--
Performance Monitoring - http://sematext.com/spm
On Oct 22, 2012 10:44 AM, "Seth Hynes"  wrote:

> All - I'm a bit new to Solr and looking for documentation or guides on
> implementing Solr as an enterprise search solution over some other products
> we are currently using. Ideally, I'd like to find out information about
>
>
> * General Solr server hardware requirements and approx. starting
> size for a 3 million document index
>
> * Approximate time to setup and configure Solr for a 3 million
> document index
>
> * Number of FTE's typically that folks see to setup and configure
> Solr
>
> * Approximate number of FTE's necessary to maintain Solr on an
> ongoing basis
>
> Any general FTE information, implementation timeline information, or cost
> comparison data you may have, I'd find extremely interesting.
>
> I've looked for this type of data blogs and on the Lucene site but haven't
> been able to find much information in these areas.
>
> Thanks!
> Seth
>
>
>
>


Re: SOLR capacity planning and Disaster relief

2012-10-23 Thread Otis Gospodnetic
Hi Worthy,

On Sun, Oct 21, 2012 at 2:30 AM, Worthy LaFollette  wrote:
> CAVEAT: I am a newbie w/r to SOLR (some Lucene experience, but not SOLR
> itself). Trying to come up to speed.
>
>
> What have you all done w/r to SOLR capacity planning and disaster relief?

Re capacity planning - performance testing with realistic datasets,
query types and rates combined with monitoring tools that show you
system and Solr metrics so you can understand what is going on will
get you far.  Ongoing monitoring and observation of a running system
will let you understand trends, bottlenecks, and figure out if you
need to get ready to buy more RAM or add servers or ...

> I am curious to the following metrics:
>
>  - File handles and other ulimit/profile concerns

Not often a concern any more.  Typical Linux systems come with 1024
max open files, which is often insufficient, so people change that to
20K, 30K, etc.
I *think* we have this system metric in SPM for Solr, but I'm not sure
right now.
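[Archive editor's note] Raising the per-user open-file limit on a typical Linux box is a two-line change; the user name and values below are illustrative:

```
# /etc/security/limits.conf -- user name and values are illustrative
solr  soft  nofile  32768
solr  hard  nofile  32768
```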

>  - Space calculations (particularly w/r to optimizations, etc.)

Monitoring again is the best way to tell and to keep an eye on this.
Optimization can take ~3x disk space, if I remember correctly.  You
can also check ML archives for recent emails re index optimization.

>  - Taxonomy considerations

I think this is typically DIY.

>  - Single Core vs. Multi-core

Not sure what to say here.  Typically one type of data goes in one
core.  You typically don't put both people records and product records
and order records in the same core because these three things have
different structure/schema.

>  - ?
>
> Also, anyone plan for Disaster relief for SOLR across non-metro data
> centers?   Currently not an issue for me, but will be shortly.

Have a look at http://wiki.apache.org/solr/SolrReplication#Setting_up_a_Repeater

Otis
--
Search Analytics - http://sematext.com/search-analytics/index.html
Performance Monitoring - http://sematext.com/spm/index.html
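[Archive editor's note] Regarding the repeater link above: a repeater is simply a core whose ReplicationHandler is configured as both master and slave, so slaves in a remote data center can poll it instead of the primary master. A sketch of the solrconfig.xml fragment, with the master URL and poll interval as placeholders:

```xml
<requestHandler name="/replication" class="solr.ReplicationHandler">
  <lst name="master">
    <str name="replicateAfter">commit</str>
    <str name="confFiles">schema.xml,stopwords.txt</str>
  </lst>
  <lst name="slave">
    <str name="masterUrl">http://primary-master:8983/solr/core/replication</str>
    <str name="pollInterval">00:00:60</str>
  </lst>
</requestHandler>
```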


Re: SolrJ CloudSolrServer throws ClassCastException

2012-10-23 Thread Kevin Osborn
It looks like this is where the problem lies. Here is the JSON that SolrJ
is receiving from Zookeeper:

"data":"{\\"manufacturer\\":{\\n\\"shard1\\":{\\n
 \\"range\\":\\"8000-\\",\\n
 \\"replicas\\":{\\"myhost:5270_solr_manufacturer\\":{\\n
 \\"shard\\":\\"shard1\\",\\n  \\"roles\\":null,\\n
 \\"state\\":\\"active\\",\\n  \\"core\\":\\"manufacturer\\",\\n
   \\"collection\\":\\"manufacturer\\",\\n
 \\"node_name\\":\\"phx2-ccs-apl-dev-wax1.cnet.com:5270_solr\\",\\n
 \\"base_url\\":\\"http://myhost:5270/solr\\",\\n
 \\"leader\\":\\"true\\"}}},\\n\\"shard2\\":{\\n
 \\"range\\":\\"0-7fff\\",\\n
 \\"replicas\\":{\\"myhost:5275_solr_manufacturer\\":{\\n
 \\"shard\\":\\"shard2\\",\\n  \\"roles\\":null,\\n
 \\"state\\":\\"active\\",\\n  \\"core\\":\\"manufacturer\\",\\n
   \\"collection\\":\\"manufacturer\\",\\n
 \\"node_name\\":\\"myhost:5275_solr\\",\\n  \\"base_url\\":\\"
http://myhost:5275/solr\\",\\n
 \\"leader\\":\\"true\\"}"}},{"data":{

Where SolrJ is expecting the shard name, it is actually getting "range" as
the shard name and "8000-" as the value. Any ideas? Did I
configure something wrong?


On Tue, Oct 23, 2012 at 5:17 PM, Kevin Osborn  wrote:

> I am getting a ClassCastException when i call Solr. My code is pretty
> simple.
>
> SolrServer mySolrServer = new CloudSolrServer(zookeeperHost);
> ((CloudSolrServer)mySolrServer).setDefaultCollection("manufacturer")
> ((CloudSolrServer)mySolrServer).connect()
>
>
> The actual error is thrown on line 300 of ClusterState.java:
> new ZkNodeProps(sliceMap.get(shardName))
>
> It is trying to convert a String to a Map which causes the
> ClassCastException.
>
> My zookeepHost string is simply  "myHost:6200". My SolrCloud has 2 shards
> over a single collection. And two instances are running. I also tried an
> external Zookeeper with the same results.
>
>
> --
> *KEVIN OSBORN*
> LEAD SOFTWARE ENGINEER
> CNET Content Solutions
> OFFICE 949.399.8714
> CELL 949.310.4677  SKYPE osbornk
> 5 Park Plaza, Suite 600, Irvine, CA 92614
> [image: CNET Content Solutions]
>
>
>


-- 
*KEVIN OSBORN*
LEAD SOFTWARE ENGINEER
CNET Content Solutions
OFFICE 949.399.8714
CELL 949.310.4677  SKYPE osbornk
5 Park Plaza, Suite 600, Irvine, CA 92614
[image: CNET Content Solutions]
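[Archive editor's note] For anyone hitting the same ClassCastException: ClusterState expects each shard entry in clusterstate.json to be a map of slice properties, with the replicas nested one level deeper. A small sketch of the expected nesting; all names and range values below are placeholders, not the poster's real cluster data:

```python
import json

# Illustrative clusterstate.json shape: collection -> shard -> slice props,
# where "replicas" is itself a map of core-node names to property maps.
raw = """
{"manufacturer":
    {"shard1": {"range": "placeholder-range",
                "replicas": {"host_solr_manufacturer":
                                 {"state": "active", "leader": "true"}}}}}
"""
state = json.loads(raw)
for shard_name, slice_props in state["manufacturer"].items():
    # Each shard value must be a map; a parser walking one level too deep
    # would see ("range", "<string>") and a String where a Map is expected,
    # which is the shape of the ClassCastException reported above.
    assert isinstance(slice_props["replicas"], dict)
    print(shard_name, slice_props["range"])
```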


DIH nested entities don't work

2012-10-23 Thread mroosendaal
Hi,

We have several oracle views which contain the result of an ETL process. I
want to index information from those views and have the following
data-config.xml

:1521/  user="a" password="a"/>




   
   
 
...

products can be everything and sometimes are musicproducts.

It works up to the point where the 'item'/product is indexed, but the rest is
not. The only strange thing I see is that (almost) every product is indexed
with one and the same songtitle for some reason.

So for some reason it does not do a 'join all'.

Any suggestions?

Thanks,
Maarten




--
View this message in context: 
http://lucene.472066.n3.nabble.com/DIH-nested-entities-don-t-work-tp4015514.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: DIH nested entities don't work

2012-10-23 Thread Gora Mohanty
On 24 October 2012 10:30, mroosendaal  wrote:
>
> Hi,
>
> We have several oracle views which contain the result of an ETL process. I
> want to index information from those views and have the following
> data-config.xml
>
>  driver="oracle.jdbc.driver.OracleDriver"
> url="jdbc:oracle:thin:@//:1521/  user="a" password="a"/>
> 
> 
> 
> 
>
>
>  
> ...
>
> products can be everything and sometimes are musicproducts.
>
> it works up to the point where the 'item'/product is indexed but the rest is
> not. The only strange thing i see is that (almost) every product is indexed
> with one and the same songtitle for some reason.


What does the DIH summary say after the import is complete?
Could you share your schema.xml? Are the proper fields defined
there? Have you tried the various SELECTs manually?

For the second issue with the entity "songtitel", it might be because you are
missing a '$': you have {item.pdt_id} instead of ${item.pdt_id}.

Regards,
Gora
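[Archive editor's note] To make the '$' point concrete, a nested DIH entity typically references the parent row's column via ${parent.column}. A purely illustrative fragment; the view and column names are guesses, not the poster's real schema:

```xml
<document>
  <entity name="item" query="SELECT pdt_id, title FROM products_view">
    <field column="title" name="title"/>
    <entity name="songtitel"
            query="SELECT songtitle FROM songs_view
                   WHERE pdt_id = '${item.pdt_id}'">
      <field column="songtitle" name="songtitle"/>
    </entity>
  </entity>
</document>
```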